# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper that extracts job listings from a job search platform, here Google's careers site. For each listing we'll extract the job title, location(s), seniority level, and minimum requirements.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have some basic knowledge of Python programming and HTML structure. The project uses the following packages:

- requests
- BeautifulSoup (pip package `beautifulsoup4`, imported as `bs4`)
- csv (part of the Python standard library)
- datetime (part of the Python standard library)

These packages should already be available in Coursera's Jupyter Notebook environment. If you are working outside that environment, or need a package it does not include, install it from a notebook cell with `!pip install packagename`. For example:

- `!pip install requests`
- `!pip install beautifulsoup4`
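You can check which of these packages are already importable before running `pip`. Below is a minimal sketch using the standard library's `importlib.util.find_spec`. It takes import names rather than pip package names (`bs4`, not `beautifulsoup4`). The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names, not pip names: beautifulsoup4 is imported as "bs4".
PACKAGES = ["requests", "bs4", "csv", "datetime"]


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_packages(PACKAGES))
```

Anything this prints is a candidate for `!pip install`. An empty list means nothing needs installing.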

#### Objective:
To develop a web scraping tool that retrieves job postings from Google's career website based on a specified role. The tool will extract job title, location, seniority level, and minimum requirements, providing this information to users efficiently.

#### Project Steps:

##### 1. Requirements Gathering
- Understand User Needs: Determine the specific roles and job details users are interested in.
- Target Website: Identify the structure of Google's career website and the specific pages to scrape.

##### 2. Tool Selection
- Web Scraping Tools: Use libraries like BeautifulSoup, Scrapy, or Selenium.
  - BeautifulSoup: For parsing HTML and XML documents.
  - Scrapy: For more advanced crawling and scraping needs.
  - Selenium: For dynamic content rendering that requires JavaScript.

##### 3. Data Fields to Extract
- Job title
- Location
- Seniority level
- Minimum requirements
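The notebook below stores each posting as a plain dict. If you prefer a typed record for these four fields, here is one possible sketch using a dataclass. The class name `JobPosting` and its `from_dict` helper are hypothetical and do not appear in the notebook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobPosting:
    """One scraped vacancy, mirroring the keys of the dicts in `openings`."""
    title: str
    seniority: str
    locations: List[str] = field(default_factory=list)
    requirements: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d):
        """Build a JobPosting from one of the notebook's opening dicts."""
        return cls(
            title=d["title"],
            seniority=d["seniority"],
            locations=list(d["locations"]),
            requirements=list(d["requirements"]),
        )


example = JobPosting(
    title="Software Engineer II, Engineering Productivity, YouTube",
    seniority="Early",
    locations=["Bengaluru, Karnataka, India"],
    requirements=["Bachelor’s degree or equivalent practical experience."],
)
print(example.title)
```

`field(default_factory=list)` gives every instance its own empty list, so postings never share a mutable default.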

## Step 1: Importing Required Libraries

In [1]:
!pip install requests


In [2]:
!pip install beautifulsoup4


In [3]:
def create_url(url, position, site):
    if site.lower() == "google":
        return url + "/?q=%22"+ "%20".join(position.split())+"%22"
    else: 
        return url
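`create_url` percent-encodes only spaces and the surrounding quotes. A title such as "C++ Developer" would therefore go out with a raw `+`, which many servers decode as a space in a query string. Below is a sketch of the Google branch using the standard library's `urllib.parse.quote`, which escapes every reserved character. The function name `create_url_quoted` is hypothetical.

```python
from urllib.parse import quote


def create_url_quoted(base_url, position):
    """Build an exact-phrase search URL for Google's careers site.

    quote() encodes the double quotes as %22 and spaces as %20, matching
    create_url for plain titles, and also escapes characters such as '+'.
    """
    # Collapse repeated whitespace, as create_url does with split()/join().
    normalized = " ".join(position.split())
    return base_url + "/?q=" + quote(f'"{normalized}"')


base = "https://www.google.com/about/careers/applications/jobs/results"
print(create_url_quoted(base, "Software Engineer II"))
print(create_url_quoted(base, "C++ Developer"))
```

For "Software Engineer II" this produces the same URL as `create_url`. For "C++ Developer" the plus signs become `%2B`.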

In [4]:
position = input("Enter the job role you are interested in: ")

Enter the job role you are interested in: Software Engineer II


In [5]:
url = "https://www.google.com/about/careers/applications/jobs/results"

query_url = create_url(url,position, "Google")
print(query_url)

https://www.google.com/about/careers/applications/jobs/results/?q=%22Software%20Engineer%20II%22


In [6]:

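import requests

# Fetch the search results page; time out instead of hanging, and raise on HTTP errors.
r = requests.get(query_url, timeout=10)
r.raise_for_status()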
# print(r.content) 
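While refining the parser (step 6), re-running the request cell hits Google's servers every time. One option is to cache the HTML on disk after the first fetch. Below is a minimal sketch that uses only `pathlib`. The helper name `load_or_fetch` and the cache filename are hypothetical.

```python
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return cached HTML if the file exists; otherwise call fetch() and cache it.

    fetch is any zero-argument callable that returns the page HTML as a str.
    """
    path = Path(cache_path)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch()
    path.write_text(html, encoding="utf-8")
    return html
```

In the notebook this might be called as `html = load_or_fetch("google_jobs.html", lambda: requests.get(query_url, timeout=10).text)`. Delete the cache file whenever you want fresh listings.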

In [7]:
# r.content above holds the raw HTML; BeautifulSoup parses it into a navigable tree
from bs4 import BeautifulSoup


In [8]:
soup = BeautifulSoup(r.content, 'html.parser')

In [9]:
# Each job card is a <div> with the class "sMn82b". This looks like an
# auto-generated class name, so it may change when Google updates the page.
positions_soup = soup.find_all("div", class_="sMn82b")

In [10]:
from bs4.element import Tag

openings = []
for pos in positions_soup:
    # Direct children of a job card, as observed on the page when scraped:
    # [0] title block, [1] details block, [3] requirements block.
    ch = list(pos.children)

    # The details block holds a row of spans: [1] locations, [2] seniority.
    spans = list(ch[1].div.children)
    locations = [loc.string for loc in spans[1].children]

    seniority = spans[2].span.div.button.span.string

    # Keep only the <li> elements, skipping whitespace strings between them.
    requirements = [c.text for c in ch[3].ul.children if isinstance(c, Tag)]

    openings.append({"title":ch[0].div.h3.string, "locations":locations, "seniority":seniority, "requirements":requirements})
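The loop above indexes into `.children` by position (`ch[1]`, `ch[3]`, `spans[2]`). It therefore depends on the exact sequence of child nodes, including any whitespace text nodes. Below is a sketch of a less position-dependent approach. It filters children down to `Tag` objects and looks elements up by tag name. It runs against a small hypothetical snippet, not the real page markup, and it requires `beautifulsoup4`. The helper names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical job card markup; the real page's structure and class names
# may differ and can change without notice.
SAMPLE_HTML = """
<div class="sMn82b">
  <div><h3>Software Engineer II, YouTube</h3></div>
  <div><span>Early</span></div>
  <ul>
    <li>Bachelor's degree or equivalent practical experience.</li>
    <li>1 year of experience with data structures or algorithms.</li>
  </ul>
</div>
"""


def element_children(tag):
    """Return only the child elements, skipping whitespace text nodes."""
    return [child for child in tag.children if isinstance(child, Tag)]


def parse_card(card):
    """Extract fields by tag name instead of by position among the children."""
    title = card.find("h3").get_text(strip=True)
    requirements = [li.get_text(strip=True) for li in card.find("ul").find_all("li")]
    return {"title": title, "requirements": requirements}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
card = soup.find_all("div", class_="sMn82b")[0]
print(len(list(card.children)), len(element_children(card)))
print(parse_card(card))
```

Here the card has three element children but more raw children, because `html.parser` keeps the indentation as text nodes. Positional indexes into `.children` shift whenever that whitespace changes.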
    


## Printing the data

In [11]:
for pos in openings:
    res = f'\nPosition        :       {pos["title"]}\n\nSeniority level     :      {pos["seniority"]} \n\nOpening at         :          '
    for loc in pos["locations"]:
        res+=loc+" "
    res+="\n\nMinimum Requirements      :       \n "
    for req in pos["requirements"]:
        res+=req+ "\n "
    res+="\n\n "+ "-"*100
    print(res)


Position        :       Software Engineer II, Site Reliability Engineering, Google Cloud

Seniority level     :      Early 

Opening at         :          place Dublin, Ireland ; San Bruno, CA, USA 

Minimum Requirements      :       
 Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
 Experience with data structures/algorithms and software development in one or more programming languages. 
 

 ----------------------------------------------------------------------------------------------------

Position        :       Software Engineer II, Engineering Productivity, YouTube

Seniority level     :      Early 

Opening at         :          place Bengaluru, Karnataka, India 

Minimum Requirements      :       
 Bachelor’s degree or equivalent practical experience.
 1 year of experience with software development in one or more programming languages (e.g., Python, C, C++, Java, JavaScript)
 1 year of experience with data structures or algorithms.
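The printing cell builds its output by string concatenation with hand-typed padding. Below is a sketch of the same report as a reusable function. It uses f-string alignment (`:<width`) and `str.join`, and separates multiple locations with "; ". The function name `format_opening` and the `width` parameter are hypothetical.

```python
def format_opening(pos, width=22):
    """Render one opening dict (title/seniority/locations/requirements) as aligned text."""
    lines = [
        f"{'Position':<{width}}: {pos['title']}",
        f"{'Seniority level':<{width}}: {pos['seniority']}",
        f"{'Opening at':<{width}}: {'; '.join(pos['locations'])}",
        f"{'Minimum requirements':<{width}}:",
    ]
    # One indented bullet per requirement.
    lines += [f"  - {req}" for req in pos["requirements"]]
    return "\n".join(lines)


sample = {
    "title": "Software Engineer II, Engineering Productivity, YouTube",
    "seniority": "Early",
    "locations": ["Bengaluru, Karnataka, India"],
    "requirements": ["Bachelor’s degree or equivalent practical experience."],
}
print(format_opening(sample))
print("-" * 100)
```

Because the function returns a string instead of printing, the same text can also be logged or written to a file.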
 

## Transforming the extracted data

In [12]:
# Build the CSV text by hand. Commas inside fields are replaced with spaces so
# they don't break the column layout; multiple locations and requirements are
# separated by tabs within a single column.
data_in_csv_format = "title,seniority,locations,minimum requirements\n"

for pos in openings:
    data_in_csv_format += " ".join(pos["title"].split(",")) + "," + pos["seniority"] + ","
    location_data = ""
    for loc in pos["locations"]:
        location_data += " ".join(loc.split(",")) + "\t"
    data_in_csv_format += location_data + ","
    needs = ""
    for req in pos["requirements"]:
        needs += " ".join(req.split(",")) + "\t"
    data_in_csv_format += needs + "\n"

with open("GoogleJobs.csv", "w", encoding="utf-8") as f:
    f.write(data_in_csv_format)
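Replacing commas with spaces keeps the columns aligned, but it loses information: the original title cannot be recovered from the file. The standard `csv` module quotes any field that contains a comma instead. The short comparison below uses one title from the output above and writes to an in-memory buffer, not a file.

```python
import csv
import io

row = ["Software Engineer II, Site Reliability Engineering, Google Cloud", "Early"]

# Manual approach: commas inside the title become spaces, so the original
# text cannot be recovered from the file.
manual_line = " ".join(row[0].split(",")) + "," + row[1]

# csv module: a field containing commas is wrapped in double quotes instead.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted_line = buffer.getvalue()

print(manual_line)
print(quoted_line)
# Reading the quoted line back yields the original fields unchanged.
print(next(csv.reader(io.StringIO(quoted_line))))
```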
    

## Another way to save the data as CSV

In [13]:
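import csv

csv_file_path = 'GoogleOpenings.csv'
with open(csv_file_path, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(["title", "seniority", "locations", "minimum requirements"])
    # One list per row: passing a plain string to writerows() would write each character as its own row.
    # csv.writer quotes fields that contain commas, so no comma stripping is needed.
    for pos in openings:
        writer.writerow([pos["title"], pos["seniority"],
                         "; ".join(pos["locations"]), "\n".join(pos["requirements"])])

print(f"CSV file '{csv_file_path}' created successfully.")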

CSV file 'GoogleOpenings.csv' created successfully.
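A variant of this approach uses `csv.DictWriter`, which maps dict keys to columns, together with `csv.DictReader` to load the file back into the same dict shape. In the sketch below, `save_openings`, `load_openings`, and the "; " and newline separators for list fields are hypothetical choices. A location that itself contained "; " would be split incorrectly on reading.

```python
import csv

FIELDS = ["title", "seniority", "locations", "requirements"]


def save_openings(openings, path):
    """Write opening dicts with csv.DictWriter; list fields are joined into one cell."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for pos in openings:
            writer.writerow({
                "title": pos["title"],
                "seniority": pos["seniority"],
                "locations": "; ".join(pos["locations"]),
                "requirements": "\n".join(pos["requirements"]),
            })


def load_openings(path):
    """Read the file back, splitting the joined cells into lists again."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {
                "title": row["title"],
                "seniority": row["seniority"],
                "locations": row["locations"].split("; ") if row["locations"] else [],
                "requirements": row["requirements"].split("\n") if row["requirements"] else [],
            }
            for row in csv.DictReader(f)
        ]
```

Opening the file with `newline=""` matters in both directions. It lets the `csv` module handle the quoted newlines inside the requirements cell itself.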


In [14]:
!ls

GoogleJobs.csv	GoogleOpenings.csv  WebScraping.ipynb
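The prerequisites list `datetime`, but neither CSV cell uses it. One plausible use is a timestamped output filename, so each scrape is kept instead of overwriting the previous file. The sketch below takes that approach. The helper name and filename pattern are hypothetical.

```python
from datetime import datetime


def timestamped_filename(stem, when=None, ext="csv"):
    """Return a name like 'GoogleOpenings_2024-05-01_1530.csv' for the given time (default: now)."""
    if when is None:
        when = datetime.now()
    return f"{stem}_{when:%Y-%m-%d_%H%M}.{ext}"


print(timestamped_filename("GoogleOpenings"))
```

The returned name could replace the fixed `'GoogleOpenings.csv'` value of `csv_file_path`.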
