### Three imports are required: the first two power our web scraper, and the third is for analyzing the data we scrape

In [1]:
from splinter import Browser
from bs4 import BeautifulSoup
import pandas as pd

### Establish path that your computer can use to find chromedriver.exe, which this script assumes is in the same folder as this notebook.

#### Mac Users

In [None]:
# # https://splinter.readthedocs.io/en/latest/drivers/chrome.html
# !which chromedriver

In [None]:
# executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
# browser = Browser('chrome', **executable_path, headless=False)

#### Windows Users

In [88]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)
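
The two platform-specific cells above could be folded into one lookup. Below is a minimal sketch, assuming the driver is either in the notebook's folder or somewhere on the system PATH. `find_chromedriver` is a hypothetical helper, not part of splinter.

```python
import platform
import shutil
from pathlib import Path


def find_chromedriver(folder="."):
    """Return a path to chromedriver as a string, or None if it cannot be found."""
    # Windows ships the driver as chromedriver.exe; Mac and Linux drop the extension
    name = "chromedriver.exe" if platform.system() == "Windows" else "chromedriver"

    # First look next to the notebook, as this script assumes
    local = Path(folder) / name
    if local.is_file():
        return str(local)

    # Otherwise fall back to the PATH (the same lookup `which chromedriver` does)
    return shutil.which(name)


driver_path = find_chromedriver()
print(driver_path)
```

If `driver_path` is not `None`, it can be dropped into the same dictionary used above: `executable_path = {'executable_path': driver_path}`.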

### Establish the base URL from which to scrape open positions, and point our browser at it

In [89]:
url = 'https://careers.zillowgroup.com/List-Jobs/location/San-francisco/'
browser.visit(url)

### Each job posting is a "td" element injected into the page. We know this from right-clicking on the page and choosing "Inspect" (or pressing Ctrl+Shift+I), then clicking the box-and-arrow icon in the top-left corner of the image below, next to the word "Elements" underlined in blue. This is the "Inspect Element" tool in Chrome's developer tools, which shows how each element of a web page fits into the page's overall structure. We can use what it tells us to pull the data we want to analyze.

![](Images/Job_Posting.png)
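
To see how that "td" lookup behaves before pointing it at the live site, here is a small sketch that runs on a hand-written HTML snippet. The snippet itself is made up for illustration, including the `Location-cell` class. Only the `JobTitle-cell` class name comes from this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two rows, each with a job-title cell
sample_html = """
<table>
  <tr><td class="JobTitle-cell"> DevOps Manager </td><td class="Location-cell">San Francisco</td></tr>
  <tr><td class="JobTitle-cell">DevOps Engineer</td><td class="Location-cell">San Francisco</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Keep only the <td> elements whose class is "JobTitle-cell"
cells = soup.find_all("td", class_="JobTitle-cell")

# get_text(strip=True) trims the surrounding whitespace from each title
sample_titles = [cell.get_text(strip=True) for cell in cells]
print(sample_titles)
```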

### The "page-turner" was created with a text descriptor ("arrow-e"), which we will use to, well, turn the page and continue scraping.

![](Images/Arrow.png)

### Parse all job titles from page, print page number and job titles, stick them in a list, "click" next page arrow, and repeat until the end of page 10.

In [114]:
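titles = []

# Scrape pages 1 through 10
for x in range(1, 11):

    # Parse the HTML currently loaded in the browser
    html = browser.html
    soup = BeautifulSoup(html, 'html.parser')

    # Each job title sits in a <td> with class "JobTitle-cell"
    quotes = soup.find_all('td', class_='JobTitle-cell')

    # Print the page number with every title, then store the title
    for quote in quotes:
        print('page:', x, '-------------')
        print(quote.text)
        titles.append(quote.text)

    # "Click" the next-page arrow
    browser.click_link_by_partial_text('arrow-e')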

page: 1 -------------
DevOps Manager
page: 1 -------------
DevOps Engineer
page: 1 -------------
Associate Visual Designer (contract)
page: 1 -------------
Principal Applied Scientist - Document Understanding AI
page: 1 -------------
Senior Technical Recruiter
page: 1 -------------
Principal Software Engineer (Trulia Shopping API Team)
page: 1 -------------
Senior Software Engineer (Trulia Shopping API Team)
page: 1 -------------
Director of Product & Engineering
page: 1 -------------
Senior Software Engineer, Android
page: 2 -------------
DevOps Manager
page: 2 -------------
DevOps Engineer
page: 2 -------------
Associate Visual Designer (contract)
page: 2 -------------
Principal Applied Scientist - Document Understanding AI
page: 2 -------------
Senior Technical Recruiter
page: 2 -------------
Principal Software Engineer (Trulia Shopping API Team)
page: 2 -------------
Senior Software Engineer (Trulia Shopping API Team)
page: 2 -------------
Director of Product & Engineering
page: 2 
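
In the output above, page 2 lists the same titles as page 1. That may mean the click did not advance the page, or that the new page had not finished loading when the HTML was read. A guard like the sketch below would catch that. It is written against plain lists so it can run without a browser, and `collect_pages` and `fake_pages` are hypothetical names used only for illustration.

```python
def collect_pages(pages):
    """Collect titles page by page, stopping as soon as a page repeats the previous one."""
    collected = []
    previous = None
    for number, page_titles in enumerate(pages, start=1):
        if page_titles == previous:
            # Same titles twice in a row: the page probably did not turn
            print(f"page {number} repeats page {number - 1}; stopping")
            break
        collected.extend(page_titles)
        previous = page_titles
    return collected


# Made-up data: the second "page" is an exact repeat of the first
fake_pages = [
    ["DevOps Manager", "DevOps Engineer"],
    ["DevOps Manager", "DevOps Engineer"],
    ["Data Analyst"],
]
unique_titles = collect_pages(fake_pages)
print(unique_titles)
```

In the scraping loop, the same comparison could be made between the titles parsed on this pass and the titles parsed on the previous one.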

### Let's take a peek at what we have so far. It's not that beautiful or as helpful as it could be yet, but we will clean it up and arrive at some insights shortly.

In [116]:
titles = pd.DataFrame(titles)
titles.head(5)

|   | 0 |
|---|---|
| 0 | DevOps Manager |
| 1 | DevOps Engineer |
| 2 | Associate Visual Designer (contract) |
| 3 | Principal Applied Scientist - Document Underst... |
| 4 | Senior Technical Recruiter |


### Rename column and create a new one with counts of job titles

In [117]:
titles = titles.rename({0 : 'Job Title'}, axis = 1)
titles = pd.DataFrame(titles.groupby('Job Title')['Job Title'].count())
titles = titles.rename({'Job Title' : 'Count'}, axis = 1)
titles = titles.reset_index()
titles = titles.sort_values('Count', ascending = False)
titles.head(5)

|   | Job Title | Count |
|---|---|---|
| 0 | Associate Visual Designer (contract) | 10 |
| 1 | DevOps Engineer | 10 |
| 2 | DevOps Manager | 10 |
| 3 | Director of Product & Engineering | 10 |
| 4 | Principal Applied Scientist - Document Underst... | 10 |
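
The rename, groupby, rename, reset and sort steps can also be done in one chain with `value_counts`, which counts and sorts in descending order in a single call. This is a sketch on a tiny made-up frame (`raw` is a hypothetical name), not the scraped data.

```python
import pandas as pd

# Made-up titles standing in for the scraped list
raw = pd.DataFrame({"Job Title": ["DevOps Engineer", "DevOps Manager", "DevOps Engineer"]})

counts = (
    raw["Job Title"]
    .value_counts()                # count each title, most frequent first
    .rename_axis("Job Title")      # name the index so it becomes a proper column
    .reset_index(name="Count")     # move titles into a column, call the counts "Count"
)
print(counts)
```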


### How many job titles were scraped?

In [123]:
titles.Count.sum()

90

### How many contain the word (or abbreviation for) "senior"?

In [119]:
# '.' is a regex wildcard, so escape it to match a literal "Sr."
senior_count = titles[titles['Job Title'].str.contains(r'Senior|Sr\.')]
senior_count.Count.sum()

30
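
One thing to watch with this kind of filter: `str.contains` treats its pattern as a regular expression by default, so an unescaped `.` matches any character. The sketch below shows the difference on a few made-up titles ("Srx Placeholder" is deliberately fake).

```python
import pandas as pd

# Made-up titles for illustration only
sample = pd.Series(["Sr. Data Engineer", "Senior Recruiter", "Srx Placeholder", "DevOps Manager"])

# Unescaped dot: "Sr" followed by ANY character, so "Srx" matches too
loose = sample.str.contains("Sr.")

# Escaped dot plus alternation: a literal "Sr." or the word "Senior"
strict = sample.str.contains(r"Senior|Sr\.")

print(loose.tolist())
print(strict.tolist())
```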

### Meaning that 60 shouldn't...

In [120]:
junior_count = titles[~titles.isin(senior_count)].dropna()
int(junior_count.Count.sum())

60

### Excellent. Let's see what we've got:

In [121]:
junior_count

|   | Job Title | Count |
|---|---|---|
| 0 | Associate Visual Designer (contract) | 10.0 |
| 1 | DevOps Engineer | 10.0 |
| 2 | DevOps Manager | 10.0 |
| 3 | Director of Product & Engineering | 10.0 |
| 4 | Principal Applied Scientist - Document Underst... | 10.0 |
| 5 | Principal Software Engineer (Trulia Shopping A... | 10.0 |
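
The counts show up as `10.0` here because masking the whole DataFrame with `isin` fills the excluded rows with `NaN` before `dropna` removes them, which turns the integer column into floats. An alternative is to build one boolean mask on the title column and reuse its complement. This is a sketch on made-up rows (`demo` and "Sr. Designer" are hypothetical).

```python
import pandas as pd

# Made-up rows shaped like the counted titles
demo = pd.DataFrame({
    "Job Title": ["Senior Technical Recruiter", "DevOps Manager", "Sr. Designer"],
    "Count": [10, 10, 10],
})

# One mask, built once: True where the title looks senior
is_senior = demo["Job Title"].str.contains(r"Senior|Sr\.")

senior_rows = demo[is_senior]    # rows that match
other_rows = demo[~is_senior]    # everything else, with Count still an integer

print(other_rows)
```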


### How many Data Analyst positions??

In [122]:
len(titles[titles['Job Title'].str.contains('Data Analyst')])

0

### Oh, okay. Next.