<a href="https://colab.research.google.com/github/meabhaykr/Job-Listing-Web-Scraper-with-Python/blob/main/Abhay_Kumar_Numerical_Programming_in_Python_Analyze_it_Yourself.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Job Listing Web Scraper with Python



##### **Project Type**    - Web Scraping
##### **Contribution**    - Individual
##### **Name**            - Abhay Kumar
##### **Cohort**            - Monaco

## **General Guidelines** : -


### **Problem Statement: Navigating the Data Science Job Landscape**

🚀 Unleash your creativity in crafting a solution that taps into the heartbeat of the data science job market! Envision an ingenious project that seamlessly wields cutting-edge web scraping techniques and illuminating data analysis.

🔍 Your mission? To engineer a tool that effortlessly gathers job listings from a multitude of online sources, extracting pivotal nuggets such as job descriptions, qualifications, locations, and salaries.

🧩 However, the true puzzle lies in deciphering this trove of data. Can your solution discern patterns that spotlight the most coveted skills? Are there threads connecting job types to compensation packages? How might it predict shifts in industry demand?

🎯 The core objectives of this challenge are as follows:

1. Web Scraping Mastery: Forge an adaptable and potent web scraping mechanism. Your creation should adeptly harvest data science job postings from a diverse array of online platforms. Be ready to navigate evolving website structures and process hefty data loads.

2. Data Symphony: Skillfully distill vital insights from the harvested job listings. Extract and cleanse critical information like job titles, company names, descriptions, qualifications, salaries, locations, and deadlines. Think data refinement and organization.

3. Market Wizardry: Conjure up analytical tools that draw meaningful revelations from the gathered data. Dive into job demand trends, geographic distribution, salary variations tied to experience and location, favored qualifications, and emerging skill demands.

4. Visual Magic: Weave a tapestry of visualization magic. Design captivating charts, graphs, and visual representations that paint a crystal-clear picture of the analyzed data. Make these visuals the compass that guides users through job market intricacies.

🌐 While the web scraping universe is yours to explore, consider these platforms as potential stomping grounds:

* LinkedIn Jobs
* Indeed
* Naukri
* Glassdoor
* AngelList

🎈 Your solution should not only decode the data science job realm but also empower professionals, job seekers, and recruiters to harness the dynamic shifts of the industry. The path is open, the challenge beckons – are you ready to embark on this exciting journey?






## **GitHub Link -**

https://github.com/meabhaykr/Job-Listing-Web-Scraper-with-Python

## **Project Summary:**

The project scrapes job listings from JobInventory.com for a specific search query and location, cleans and processes the data, and stores it in a CSV file for later use. It uses Python with the requests library for making GET requests, BeautifulSoup for parsing HTML content, regular expressions for data cleaning, and pandas for tabular storage. The main objective is to gather the job title, company name, location, and job description of each listing. The scraping process handles multiple pages of job listings, and the resulting data is stored in a structured format for further analysis.

## **Explanation:**

The project begins by defining the search query ("data scientist") and the location ("New York City, NY"), along with the base URL of JobInventory.com. It sets up empty lists to store the job details, including titles, companies, locations, and descriptions.

A `while` loop iterates through the result pages, with the maximum number of pages limited to 5 in this example. For each page, the script constructs the URL with the query, location, and page number as parameters and sends a GET request to retrieve the HTML content. BeautifulSoup parses the HTML, and job listings are identified as `li` elements with the class `resultBlock`. If a page contains no listings, the loop stops early.
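As a minimal sketch of the pagination logic described above, the loop can be separated from the network call so that it can be tried offline. The names `build_page_url`, `scrape_pages`, `fetch_listings`, and `fake_fetch` are hypothetical and not part of the original script; the `q`, `l`, and `start` parameters mirror the URL built in the code cell of this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.jobinventory.com"  # same site as the script in this notebook


def build_page_url(search_query, location, page_num, base_url=BASE_URL):
    """Return the search URL for one results page, with the query URL-encoded."""
    query = urlencode({"q": search_query, "l": location, "start": page_num})
    return f"{base_url}/search?{query}"


def scrape_pages(fetch_listings, search_query, location, max_pages=5):
    """Collect listings page by page, stopping at max_pages or the first empty page.

    fetch_listings is a hypothetical callback: it takes a URL and returns a
    list of listings (in the real script this is requests + BeautifulSoup).
    """
    collected = []
    for page_num in range(1, max_pages + 1):
        listings = fetch_listings(build_page_url(search_query, location, page_num))
        if not listings:  # an empty page means the results have run out
            break
        collected.extend(listings)
    return collected


# Offline stand-in for the network call: pages 1 and 2 have results, page 3 is empty.
def fake_fetch(url):
    fake_pages = {"start=1": ["job A", "job B"], "start=2": ["job C"]}
    for marker, listings in fake_pages.items():
        if url.endswith(marker):
            return listings
    return []


print(build_page_url("data scientist", "New York City, NY", 1))
print(scrape_pages(fake_fetch, "data scientist", "New York City, NY"))
```

Passing the fetcher in as an argument is only a convenience for experimenting without hitting the site; the original script performs the request inline.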

The script then iterates through each job listing on the current page, extracting job details like title, company, location, and description. These details are stored in the respective lists.
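The extraction step can be sketched on a small inline HTML snippet. The markup below is hypothetical and only imitates the class names the script looks for (`resultBlock`, `title`, `company`, `state`, `description`); the real page may differ. The helpers `text_or_empty` and `parse_listing` are illustrative names, added here to show one way of tolerating listings with missing fields.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two results; the second one lacks most fields.
SAMPLE_HTML = """
<ul>
  <li class="resultBlock">
    <div class="title"> Data Scientist </div>
    <span class="company">Example Corp</span>
    <div class="state">Example Corp\xa0-\xa0New York, NY</div>
    <div class="description">Build models for clients.</div>
  </li>
  <li class="resultBlock">
    <div class="title">Analyst</div>
  </li>
</ul>
"""


def text_or_empty(parent, tag, css_class):
    """Return the stripped text of the first matching child, or "" if it is missing."""
    node = parent.find(tag, class_=css_class)
    return node.get_text().strip() if node is not None else ""


def parse_listing(job):
    """Extract the four fields from one listing element."""
    state_text = text_or_empty(job, "div", "state")
    return {
        "title": text_or_empty(job, "div", "title"),
        "company": text_or_empty(job, "span", "company"),
        # The location is the part after the non-breaking-space separator.
        "location": state_text.split("\xa0-\xa0")[-1].strip(),
        "description": text_or_empty(job, "div", "description"),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [parse_listing(job) for job in soup.find_all("li", class_="resultBlock")]
for row in rows:
    print(row)
```

Calling `.text` directly on the result of `find` raises `AttributeError` when the element is absent, which is why the sketch checks for `None` first.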

After scraping all the job listings, the project uses regular expressions to clean up the job descriptions: runs of whitespace are collapsed into single spaces, and each description is split on " - " to separate it from the location information, keeping the part after the separator.
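A minimal sketch of this cleaning step, using only the standard library, is shown below. The function name `clean_description` and the sample strings are hypothetical; the fallback for text without a " - " separator is an addition for robustness, not something the description above specifies.

```python
import re

WHITESPACE = re.compile(r"\s+")


def clean_description(raw):
    """Collapse whitespace runs, then keep the segment after the first " - "."""
    parts = WHITESPACE.sub(" ", raw).strip().split(" - ")
    # Fall back to the full text when there is no " - " separator,
    # instead of raising IndexError.
    return parts[1] if len(parts) > 1 else parts[0]


# Hypothetical raw descriptions, not taken from the scraped site.
print(clean_description("New York, NY  -  Build\n   models for   clients."))
print(clean_description("No separator   here"))
```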

A Pandas DataFrame is created to structure and organize the scraped data, with columns for job title, company, location, and cleaned job description. Finally, the DataFrame is exported to a CSV file named "job_listings.csv."
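The DataFrame and CSV step can be tried in isolation with a couple of placeholder rows. The data below is hypothetical, and the sketch writes to an in-memory buffer rather than to "job_listings.csv" so that it leaves no file behind.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the scraped lists.
titles = ["Data Scientist", "Lead Data Scientist"]
companies = ["Example Corp", "Sample LLC"]
locations = ["New York, NY", "Newark, NJ"]
clean_descriptions = ["Build models.", "Lead a team."]

df = pd.DataFrame(
    {
        "Job Title": titles,
        "Company Name": companies,
        "Location": locations,
        "Description": clean_descriptions,
    }
)

# Write to an in-memory buffer here; the script writes to "job_listings.csv".
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves the row index out of the file
csv_text = buffer.getvalue()
print(csv_text)

# Reading the CSV back gives a table with the same columns and row count.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.shape)
```

Values that contain commas, such as "New York, NY", are quoted automatically by `to_csv`, so they survive the round trip as a single field.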

### Requirements:
Before you start, make sure you have the following libraries installed in your Python environment:

1. requests
2. BeautifulSoup (installed as `beautifulsoup4`, imported as `bs4`)
3. pandas
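
One way to check the requirements before running the scraper is sketched below. The helper `missing_modules` is a hypothetical name; it relies only on the standard library's `importlib.util.find_spec` and looks modules up by their import names.

```python
import importlib.util

# Import names used by the script (bs4 is the import name of the beautifulsoup4 package).
REQUIRED_MODULES = ["requests", "bs4", "pandas"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```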

### Project Steps:

1. Define Search Query and Location:
   - Set your desired search query (e.g., "data scientist") and location (e.g., "New York City, NY").

2. Construct the Base URL:
   - Define the base URL for JobInventory.com.

3. Create Lists to Store Job Details:
   - Initialize empty lists for job titles, companies, locations, and descriptions.

4. Scrape Job Listings from Multiple Pages:
   - Use a loop to scrape job listings from multiple pages, up to a specified limit (e.g., 5 pages).
   - Construct the URL for each page by incrementing the "start" parameter.
   - Send a GET request to the URL and parse the HTML content using BeautifulSoup.
   - Find all the job listings on the page and loop through each one.
   - Extract job details such as title, company, location, and description and append them to the respective lists.
   - If there are no job listings on the current page, break out of the loop.

5. Clean Up Job Descriptions:
   - Use regular expressions to collapse extra whitespace in the job descriptions, then split on " - " to drop the text that precedes the description.

6. Create a Pandas DataFrame:
   - Store the job details in a Pandas DataFrame, with columns for Job Title, Company Name, Location, and Description.

7. Export to CSV:
   - Export the DataFrame to a CSV file for future analysis.

This project demonstrates how to scrape job listings from JobInventory.com across multiple pages, extract key information, and save it for further analysis. The same techniques apply to other web scraping tasks that collect data from paginated listing pages.

In [None]:
# Import necessary libraries

import requests  # To make HTTP requests
from bs4 import BeautifulSoup  # To parse HTML content
import re  # Regular expressions for text cleaning
import pandas as pd  # To work with data in tabular form

# Define the search query and location
search_query = "data scientist"
location = "New York City, NY"

# Construct the base URL for job listings
# The website where you'll be scraping job listings
base_url = "http://www.jobinventory.com"

# Define empty lists to store job details
titles = []
companies = []
locations = []
descriptions = []

# Set the maximum number of pages to scrape
max_pages = 5
page_num = 1

# Loop through pages to scrape job listings
while page_num <= max_pages:

    # Construct the URL for the current page
    url = f"{base_url}/search?q={search_query}&l={location}&start={page_num}"

    # Send a GET request to the URL to retrieve the HTML content
    # (the timeout keeps the script from hanging on an unresponsive server)
    response = requests.get(url, timeout=10)

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all the job listings on the page
    job_listings = soup.find_all("li", class_="resultBlock")

    # If there are no job listings on the current page, we have reached the end of the results
    if not job_listings:
        break

    # Loop through each job listing and extract the relevant details
    for job in job_listings:
        title = job.find("div", class_="title").text.strip()
        company = job.find("span", class_="company").text.strip()
        # Use a separate name so the search `location` used in the URL is not overwritten
        job_location = job.find("div", class_="state").text.split("\xa0-\xa0")[-1].strip()
        description = job.find("div", class_="description").text.strip()

        # Append the extracted details to respective lists
        titles.append(title)
        companies.append(company)
        locations.append(job_location)
        descriptions.append(description)

    # Increment the page number for the next iteration
    page_num += 1

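# Clean up the job descriptions using regular expressions
regex = re.compile(r"\s+")
clean_descriptions = []
for d in descriptions:
    parts = regex.sub(" ", d).split(" - ")
    # Keep the segment after the first " - "; fall back to the whole text
    # when there is no separator, which would otherwise raise IndexError
    clean_descriptions.append(parts[1] if len(parts) > 1 else parts[0])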

# Create a Pandas DataFrame to store the job details
df = pd.DataFrame(
    {
        "Job Title": titles,
        "Company Name": companies,
        "Location": locations,
        "Description": clean_descriptions,
    }
)

# Export the DataFrame to a CSV file
df.to_csv("job_listings.csv", index=False)  # Save the job details to a CSV file

# Print a message to indicate that scraping is complete
print("Scraping has been successfully completed. Check out 'job_listings.csv' for the results.")

# Display the first 40 rows of the DataFrame as a sample
df.head(40)


Scraping has been successfully completed. Check out 'job_listings.csv' for the results.


| | Job Title | Company Name | Location | Description |
|---|---|---|---|---|
| 0 | Lead Data Scientist | Tiro | New York, NY | Lead Data Scientist Enigma is seekingand visua... |
| 1 | Lead Data Scientist | Thomas | New York, NY | looking for a Lead Data Scientist to lead and ... |
| 2 | Scientist Data Curator - Remote | Rancho BioSciences LLC | Newark, NJ | work and its quality. \* Collaborate frequently... |
| 3 | Data Scientist | Simpson Thacher & Bartlett LLP | New York, NY | Simpson Thacher's Data Scientist will play a k... |
| 4 | Data Scientist & Machine Learning Specialist | Vaco Technology | New York, NY | Title: Data Scientist & Machine as data modeli... |
| 5 | Customer Data Scientist- NYC | h2o.ai | New York, NY | Opportunity Are you a data scientist or machin... |
| 6 | Principal Data Scientist | InVitro Cell Research, LLC | Leonia, NJ | Principal Data Scientist with expertise in pre... |
| 7 | Machine learning/Data scientist | Themesoft Inc. | Hoboken, NJ | Machine learning/Data scientist Hoboken, NJdev... |
| 8 | Data Scientist - LLM | Spartan Technologies | New York, NY | an experienced Data Scientist |
| 9 | Scientist | Talent Software Services | Rahway, NJ | Services is in search of a Scientist for a con... |


## **Conclusion:**

This project demonstrates a web scraping process for collecting job listings from JobInventory.com based on a specific search query and location. It provides a practical example of using Python libraries such as requests, BeautifulSoup, and Pandas to automate data extraction and organization.

The script is designed to handle multiple pages of job listings, making it a useful tool for aggregating data for analysis or job search purposes. Regular expressions normalize the whitespace in the job descriptions and separate them from the location text before the data is stored.

By exporting the data to a CSV file, the project ensures that the scraped job listings can be easily stored, shared, and further analyzed. This project serves as a valuable template for similar web scraping tasks and can be customized for different websites and data requirements.
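
As one small example of such further analysis, the sketch below counts listings per location using only the standard library. The location values are copied from the ten sample rows shown in the output above; the variable names are illustrative, and a full analysis would read the complete "job_listings.csv" instead.

```python
from collections import Counter

# Location column of the ten sample rows shown in the output above.
sample_locations = [
    "New York, NY", "New York, NY", "Newark, NJ", "New York, NY", "New York, NY",
    "New York, NY", "Leonia, NJ", "Hoboken, NJ", "New York, NY", "Rahway, NJ",
]

# Count how many listings fall in each location, most frequent first.
location_counts = Counter(sample_locations)
for place, count in location_counts.most_common():
    print(f"{place}: {count}")
```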