<a href="https://colab.research.google.com/github/prithvikannan/facebook-business/blob/master/FacebookCaseStudy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 0. Imports

This section installs ChromeDriver and Selenium, imports `json`, `re`, `time`, and `pandas`, and starts a headless Chrome session.

In [1]:
!apt update
!apt install chromium-chromedriver
!pip install selenium

Hit:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
Hit:2 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Hit:3 http://security.ubuntu.com/ubuntu bionic-security InRelease
Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Ign:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:6 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:9 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic InRelease
Hit:10 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:12 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Reading package lists... Done
Building dependency tree       
Reading state information... Done
42 packages can be upgraded. Run 'apt

In [0]:
import json
import re
import time
from collections import OrderedDict

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

In [0]:
# run Chrome headless; the sandbox and /dev/shm flags are commonly needed in containers such as Colab
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# start the browser session that every function below reuses as `wd`
wd = webdriver.Chrome('chromedriver', options=options)

# 1. Functions

Starting from the base URL, `getAllCategories()` scans the page's dropdown menus for links to small-business categories and returns a dictionary that maps each category name to its link.

In [0]:
def getAllCategories():
    # queries the base small business link
    wd.get("https://www.facebook.com/business/success?categories[0]=small-business")

    # collects every anchor whose href points at a small-business subcategory
    all_categories = {}
    elems = wd.find_elements_by_xpath("//a[contains(@href, '/business/success/?categories%5B0%5D=small-business&categories%5B1%5D')]")
    for elem in elems:
        link = elem.get_attribute("href")
        # the category name is everything after the last '=' in the link
        name = re.search(r'[^=]+$', link).group()
        # the dictionary is keyed by name, so check the name (not the link) for duplicates
        if name not in all_categories:
            all_categories[name] = link

    # return a dictionary of {category: link} pairs
    return all_categories
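
The name-parsing step can be tried without a browser. The sketch below applies the same `[^=]+$` pattern to one of the category links used later in this notebook. `category_name` is a hypothetical helper name, not part of the original code, and returning `None` for a link with no trailing name is an assumption about how such a case should be handled.

```python
import re

def category_name(link):
    """Return the text after the last '=' in a category link (hypothetical helper)."""
    match = re.search(r"[^=]+$", link)
    # a link that ends in '=' has no trailing name, so report None instead of raising
    return match.group() if match else None

link = "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=automotive"
print(category_name(link))
```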

Given a category URL, `getCompaniesFromCategory()` clicks the load-more button until the full listing is shown, then collects the link to every company in the category.

By default it also scrapes each company page with `getInfoForLink()` and returns the resulting dictionaries. Set `recursive=False` to return only the links.

Set `silent=False` to log progress to the console.

In [0]:
def getCompaniesFromCategory(url, recursive=True, silent=True):
    wd.get(url)

    # keeps clicking the load more button at the bottom until it can no longer be found or clicked
    while True:
        try:
            loadMoreButton = wd.find_element_by_xpath("//a[contains(@class, '_3cr5 _5j3- _53m5 _7p5k _1s6a _quh _88-d')]")
            time.sleep(2)
            loadMoreButton.click()
            time.sleep(5)
            if not silent:
                print('clicked')
        except Exception as e:
            if not silent:
                print(e)
            break

    # populates a list of unique company links for the category, in page order
    all_links = []
    elems = wd.find_elements_by_xpath("//a[contains(@class, '_3cr5 _8p_9 _53m5 _8xo6 _8xo4 _1s6c _93rg _8p_b')]")
    for elem in elems:
        href = elem.get_attribute("href")
        if href not in all_links:
            all_links.append(href)

    # by default, gets info for each link, passing the silent flag through
    if recursive:
        return [getInfoForLink(link, silent=silent) for link in all_links]

    # otherwise, just returns the links
    return all_links
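
The link-collection loop removes duplicates while keeping page order. A minimal sketch of that step on plain strings, assuming nothing about the page itself: `unique_in_order` is a hypothetical helper, and the example URLs are made up for illustration.

```python
def unique_in_order(hrefs):
    """Drop repeated links while preserving first-seen order (hypothetical helper)."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(hrefs))

hrefs = [
    "https://example.com/success/a",
    "https://example.com/success/b",
    "https://example.com/success/a",
]
print(unique_in_order(hrefs))
```

Compared with testing membership in a list on every iteration, this avoids rescanning the collected links for each new element.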

`getInfoForLink()` loads the page at a given URL and returns a dictionary containing the client's name, goal, solution, results, and link. Any field that cannot be found on the page is set to `[can't find]`.

Set `silent=False` to also print the fields to the console.

In [0]:
def getInfoForLink(url, silent=True):
    wd.get(url)

    # looks for client name, defaults to "[can't find]"
    try:
        name = wd.find_element_by_xpath("//h1").text
    except NoSuchElementException:
        name = "[can't find]"

    # looks for client goal, defaults to "[can't find]"
    try:
        goal = wd.find_element_by_class_name('_5yvq').text
    except NoSuchElementException:
        goal = "[can't find]"

    # looks for client solutions, defaults to "[can't find]"
    # (find_elements returns an empty list instead of raising when nothing matches)
    solution = "".join(element.text for element in wd.find_elements_by_class_name("_4971"))
    if not solution:
        solution = "[can't find]"

    # looks for client results, defaults to "[can't find]"
    try:
        results = wd.find_element_by_class_name('_8grt').text
    except NoSuchElementException:
        results = "[can't find]"
    
    # prints if requested
    if not silent:
        print("NAME: ", name)
        print("GOAL: ", goal)
        print("SOLUTION: ", solution)
        print("RESULTS: ", results)

    # returns dictionary containing all data for the client
    return {
        "name": name,
        "goal": goal, 
        "solution": solution, 
        "results": results, 
        "link": url
    }
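
Because missing fields are stored as the `[can't find]` placeholder rather than raised as errors, it can be useful to check a record for gaps after scraping. A small sketch, assuming records shaped like the dictionary returned above: `missing_fields` and `PLACEHOLDER` are hypothetical names, and the record values are made up.

```python
PLACEHOLDER = "[can't find]"

def missing_fields(record):
    """List the keys of a scraped record whose value is the placeholder."""
    return [key for key, value in record.items() if value == PLACEHOLDER]

record = {
    "name": "Example Co",  # made-up values for illustration
    "goal": PLACEHOLDER,
    "solution": "Ran a campaign",
    "results": PLACEHOLDER,
    "link": "https://example.com/success/example-co",
}
print(missing_fields(record))  # ['goal', 'results']
```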

# 2. Usage

The output of `getAllCategories()` was used to build `industry_categories`, a subset of the categories that is hard-coded below (the call itself is commented out).

`test_categories` is a smaller, two-category subset for debugging.

In [0]:
# all_categories = getAllCategories()
industry_categories = {"automotive": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=automotive",
    "b2b": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=b2b",
    "consumer-goods": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=consumer-goods",
    "ecommerce": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=ecommerce",
    "education": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=education",
    "entertainment-media": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=entertainment-media",
    "financial-services": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=financial-services",
    "gaming": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=gaming",
    "health-pharmaceuticals": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=health-pharmaceuticals",
    "non-profits-organizations": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=non-profits-organizations",
    "professional-services": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=professional-services",
    "real-estate": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=real-estate",
    "restaurant": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=restaurant",
    "retail": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=retail",
    "sports": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=sports",
    "technology": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=technology",
    "telecommunication": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=telecommunication",
    "travel": "https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=travel"}

test_categories = {'automotive': 'https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=automotive',
    'b2b': 'https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=b2b'}
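
Every link in these dictionaries differs only in its final category slug, so the same URLs can be generated rather than typed out. A sketch using the standard library's `urlencode`: `BASE_URL` and `category_url` are hypothetical names introduced here, and only three of the slugs are shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.facebook.com/business/success/"

def category_url(slug):
    """Build the small-business listing URL for one category slug."""
    # urlencode percent-encodes the brackets: categories[0] -> categories%5B0%5D
    query = urlencode({"categories[0]": "small-business", "categories[1]": slug})
    return f"{BASE_URL}?{query}"

slugs = ["automotive", "b2b", "consumer-goods"]
generated = {slug: category_url(slug) for slug in slugs}
print(generated["automotive"])
```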

In [8]:
everything = {}
for category, link in industry_categories.items():
    print(category)
    print(link)
    everything[category] = getCompaniesFromCategory(link)

automotive
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=automotive
b2b
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=b2b
consumer-goods
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=consumer-goods
ecommerce
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=ecommerce
education
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=education
entertainment-media
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=entertainment-media
financial-services
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=financial-services
gaming
https://www.facebook.com/business/success/?categories%5B0%5D=small-business&categories%5B1%5D=gaming
health-pharmaceuticals
https://www.facebook.com/business
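
The loop above loads many pages, and an exception in one category stops it before the remaining categories are visited. One possible hardening is sketched below with a stand-in scraper so that it runs without a browser. `scrape_all` and `fake_scrape` are hypothetical names, and the URLs and error message are made up.

```python
def scrape_all(categories, scrape):
    """Run scrape(link) for every category, collecting failures instead of stopping."""
    results, failures = {}, {}
    for category, link in categories.items():
        try:
            results[category] = scrape(link)
        except Exception as exc:  # broad on purpose: any failure should skip only this category
            failures[category] = str(exc)
    return results, failures

def fake_scrape(link):
    # stand-in for getCompaniesFromCategory so the sketch runs without a browser
    if link.endswith("b2b"):
        raise RuntimeError("page did not load")
    return [{"name": "Example Co", "link": link}]

results, failures = scrape_all(
    {"automotive": "https://example.com/?c=automotive", "b2b": "https://example.com/?c=b2b"},
    fake_scrape,
)
print(sorted(results), failures)
```

In the notebook, `scrape` would be `getCompaniesFromCategory`, and the categories listed in `failures` could be retried afterwards.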

# 3. Data Analysis

In [0]:
print(json.dumps(everything, indent=4))

In [0]:
with open('result.json', 'w') as fp:
    json.dump(everything, fp)
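
Saving to JSON means the analysis can be resumed later without scraping again. The sketch below shows the round trip on a made-up stand-in for `everything`, writing to a temporary directory so that it does not touch the real `result.json`. The name `everything_example` is hypothetical.

```python
import json
import os
import tempfile

# made-up stand-in for the scraped `everything` dictionary
everything_example = {"ecommerce": [{"name": "Example Co", "results": "2X return"}]}

# write to a temporary directory so the sketch leaves the real result.json alone
path = os.path.join(tempfile.mkdtemp(), "result.json")
with open(path, "w") as fp:
    json.dump(everything_example, fp)

# read the file back and confirm nothing was lost
with open(path) as fp:
    reloaded = json.load(fp)

print(reloaded == everything_example)
```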

In [11]:
pd.json_normalize(everything['ecommerce'])

| | name | goal | solution | results | link |
|---|---|---|---|---|---|
| 0 | Unit 1 | This wearable tech company earned 4,700 leads ... | Unit 1 worked with Facebook Marketing Partner ... | 4,700 leads in 14 days\n2X return on ad spend\... | https://www.facebook.com/business/redirect/per... |
| 1 | The Hair Bow Company | This apparel and accessories online retailer l... | When The Hair Bow Company was looking for new ... | 15% average increase in revenue attributable t... | https://www.facebook.com/business/redirect/per... |
| 2 | Hammitt | The luxury accessories brand used Facebook dyn... | Hammitt partnered with marketing agency MuteSi... | Core audiences were selected based on gender, ... | https://www.facebook.com/business/redirect/per... |
| 3 | Bombas | The ecommerce sock company launched a unique P... | Bombas has increased its success with Facebook... | 33% lift in conversions over business-as-usual... | https://www.facebook.com/business/redirect/per... |
| 4 | Perpetual Kid | The unique gifts online retailer listed its pr... | When Perpetual Kid learned about Facebook Mark... | 21% average increase in revenue attributable t... | https://www.facebook.com/business/redirect/per... |
| ... | ... | ... | ... | ... | ... |
| 186 | ReSnap | The smart technology-based photobook company s... | ReSnap’s strategy focused on getting people ex... | 1,963X increase in turnover between September ... | https://www.facebook.com/business/success/2-re... |
| 187 | Boux Avenue | This premier lingerie brand used striking Face... | With guidance from agency Tomorrow TTH, Boux A... | 11.7X return on ad spend\n53% lower cost per a... | https://www.facebook.com/business/success/boux... |
| 188 | BYIC | This online women’s fashion store used dynamic... | Facebook has long been BYIC’s number one chann... | 3.59X increase in revenue\n2.1X increase in ov... | https://www.facebook.com/business/success/byic |
| 189 | Perkbox | This employee rewards and engagement platform ... | As a fast growing company with a rapidly expan... | 1.5% click-through rate\n9.3 million people re... | https://www.facebook.com/business/success/perkbox |
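
The table above covers a single category. To analyse all categories together, the nested `everything` dictionary can first be flattened into one list of rows, each tagged with its category. A sketch with made-up records, where `flatten` is a hypothetical helper:

```python
def flatten(everything):
    """Turn {category: [record, ...]} into one list of rows with a 'category' key."""
    return [
        {"category": category, **record}
        for category, records in everything.items()
        for record in records
    ]

sample = {  # made-up records for illustration
    "ecommerce": [{"name": "Shop A"}, {"name": "Shop B"}],
    "retail": [{"name": "Store C"}],
}
rows = flatten(sample)
print(len(rows), rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` would then give a single table with a `category` column alongside the scraped fields.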
