---
title: "Webscraping Indeed Job Portal"
description: "Web scraping with Python"
author: "Aakash Basnet"
date: "2024/02/03"
categories:
  - webscraping
  - code
  - ETL
  - python
format:
  html:
    code-fold: true
jupyter: python3
---

## Building the URL
After inspecting Indeed's job listing pages with the browser developer tools, I found the pattern in the URL query string for each job title and location search. We can use this pattern to build the URL. The link printed by the code below takes you to the Indeed page listing Python developer jobs in Dallas, TX.

In [None]:
def url_builder(job_title, location):
    # Indeed's search URL joins the words of each query value with "+"
    job_title = "+".join(job_title.split(" "))
    location = "+".join(location.split(" "))
    base_url = "https://www.indeed.com/jobs"
    # q = job title keywords, l = location
    query_str = f"?q={job_title}&l={location}"
    url = f"{base_url}{query_str}"

    return url

print(url_builder(job_title="python developer", location="Dallas, TX"))
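
The hand-joined query string works for plain search terms, but a title containing `&` or `#` would change the meaning of the URL. As a minimal sketch, assuming the same `q` and `l` parameters observed above, the standard library's `urllib.parse.urlencode` can handle the escaping. The name `build_search_url` is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(job_title, location, base_url="https://www.indeed.com/jobs"):
    """Build a search URL, percent-encoding special characters in both values."""
    # urlencode turns spaces into "+" and escapes characters such as "," "&" "#"
    query = urlencode({"q": job_title, "l": location})
    return f"{base_url}?{query}"


url = build_search_url("python developer", "Dallas, TX")
print(url)

# Round trip: decoding the query string should give back the original values
print(parse_qs(urlsplit(url).query))
```

The comma in the location comes out as `%2C`, which decodes back to the same value on the server side.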

## Scraping the Indeed page with Selenium
The script below scrapes the listings for a given job title and location. It uses the Selenium web driver to automate the browser: after reading the job cards on a page, the driver clicks the 'next' button in the pagination bar and repeats until no next button can be found.

In [9]:
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

import time

def get_data(job_title, location):

    url = url_builder(job_title=job_title, location=location)
    driver = webdriver.Chrome()
    driver.get(url)

    jobs = []
    has_next = True
    count = 1
    try:
        while has_next:
            # Give the page time to render the job cards
            time.sleep(2)
            cards = driver.find_elements(By.CLASS_NAME, 'cardOutline')
            for card in cards:
                # Names differ from the function arguments so they are not shadowed
                title_element = card.find_element(By.CLASS_NAME, 'jobTitle')
                job_title_text = title_element.text
                job_id = title_element.find_element(By.TAG_NAME, 'a').get_attribute('data-jk')
                job_location = card.find_element(By.CLASS_NAME, 'company_location').text
                job_description = card.find_element(By.CLASS_NAME, 'underShelfFooter').text

                # First line is the pay rate; the remaining lines are tags such as job type
                try:
                    pay, *_metadata = card.find_element(By.CLASS_NAME, 'heading6').text.split('\n')
                except WebDriverException:
                    pay = 'NA'
                    _metadata = []

                jobs.append({
                    'job_title': job_title_text,
                    'location': job_location,
                    'description': job_description,
                    'pay rate': pay,
                    'metadata': _metadata,
                    'job_id': job_id,
                    'job_url': f"https://www.indeed.com/viewjob?jk={job_id}",
                })

            # Move to the next results page; stop when the button is missing or not clickable
            try:
                driver.find_element(By.CSS_SELECTOR, "[data-testid='pagination-page-next']").click()
                count += 1
            except WebDriverException:
                print(f"Ending at page {count}")
                has_next = False
    finally:
        # quit() ends the whole browser session, even if scraping raised an error
        driver.quit()

    return jobs
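
The starred assignment that separates the pay rate from the remaining tags can be tried without a browser. The sketch below assumes, as the scraper does, that the first line of the card's `heading6` text is the pay rate and that every later line is a tag. The helper name `split_pay_line` is hypothetical.

```python
def split_pay_line(text):
    """Split the card's heading text into (pay, metadata_tags)."""
    # The first element goes to `pay`; the starred name collects the rest as a list
    pay, *metadata = text.split("\n")
    return pay, metadata


print(split_pay_line("$76 - $88 an hour\nTemp-to-hire"))
print(split_pay_line("$55 - $63 an hour\nFull-time\n40 hours per week\n8 hour shift"))
# An empty heading still unpacks cleanly: pay is "" and the tag list is empty
print(split_pay_line(""))
```

Because `str.split` always returns at least one element, the unpacking itself never fails; the `try` block in the scraper only guards against the element being absent from the card.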



In [10]:
job_data = get_data(job_title='python developer', location='Fort Worth,TX')



Ending at page 40


In [11]:
import pandas as pd
df = pd.DataFrame(job_data)
df.head(40)

| | job_title | location | description | pay rate | metadata | job_id | job_url |
|---|---|---|---|---|---|---|---|
| 0 | Software Developer - AI Trainer (Contract) | DataAnnotation\n4.6\nRemote in Dallas, TX | You will work with the chatbots that we are bu... | $40 an hour | [Contract, 1 to 40 hours per week, Choose your... | cd937da8c0f30efd | https://www.indeed.com/viewjob?jk=cd937da8c0f3... |
| 1 | Python Developer | Robert Half\n3.9\nPlano, TX 75024 | Periodically exercise code-review and band tog... | $76 - $88 an hour | [Temp-to-hire] | aeb409d79f39f2e1 | https://www.indeed.com/viewjob?jk=aeb409d79f39... |
| 2 | Back End Developer | IWP Services, LLC\nHybrid remote in Fort Worth... | Integration of user-facing elements developed ... | $100,000 - $110,000 a year | [Full-time, +1, Monday to Friday, Employee sto... | dc3e79c2af261bbe | https://www.indeed.com/viewjob?jk=dc3e79c2af26... |
| 3 | DevOps Engineer (309801) | Internal Data Resources\n3.7\nRemote in Coppel... | Our client is looking for a DevOps Engineer to... | $55 - $63 an hour | [Full-time, 40 hours per week, 8 hour shift] | 1ebb8e02e8eb7b07 | https://www.indeed.com/viewjob?jk=1ebb8e02e8eb... |
| 4 | Python Developer | Emonics LLC\nFort Worth, TX 76107 \n(Arlington... | 8.Coordinating with front-end developers..\nTo... | $70,000 - $130,000 a year | [Full-time, +1] | 8d0dfc501ff3fb89 | https://www.indeed.com/viewjob?jk=8d0dfc501ff3... |
| 5 | Software Engineer III (API / Scripting / Python) | JPMorgan Chase & Co\n3.9\nPlano, TX 75024 | Your collaboration will be crucial in advancin... | Pay information not provided | [Full-time] | b7109d2b1ce04d16 | https://www.indeed.com/viewjob?jk=b7109d2b1ce0... |
| 6 | Software Developer | Boston Enterprises Investment Group LLC\nDeSot... | The ideal candidate will be passionate about d... | $68,000 - $77,000 a year | [Full-time, +1, 8 hour shift] | 8e5cb88d329ff384 | https://www.indeed.com/viewjob?jk=8e5cb88d329f... |
| 7 | Principal Artificial Intelligence / Machine Le... | Raytheon\n3.9\nRichardson, TX 75082 | In this role, you will work with data scientis... | $96,000 - $200,000 a year | [Full-time] | ab702435728c8649 | https://www.indeed.com/viewjob?jk=ab702435728c... |
| 8 | Python Developer | Qatalys Software Technologies\nIrving, TX | Analyzes business and technical requirements t... | | [] | a2a8cbd917cbf446 | https://www.indeed.com/viewjob?jk=a2a8cbd917cb... |
| 9 | Software Engineer - Mid-Career (HYBRID TELEWORK) | Lockheed Martin Corporation\nFort Worth, TX | Design, modify, develop, write, and implement ... | Pay information not provided | [Full-time, 4x10] | 91cdaee932850a17 | https://www.indeed.com/viewjob?jk=91cdaee93285... |


In [8]:
df.shape

(518, 7)
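
The `location` column in the output above packs the company name, an optional rating, and the place into one newline-separated string. A rough sketch for separating them follows, assuming the company always comes first and the rating, when present, is the second line and parses as a number. The function name `split_location` and the output keys are hypothetical.

```python
def split_location(raw):
    """Split a scraped 'company\\nrating\\nplace' string into its parts."""
    # Drop empty fragments and surrounding whitespace
    parts = [p.strip() for p in raw.split("\n") if p.strip()]
    company = parts[0] if parts else None
    rest = parts[1:]

    rating = None
    if rest:
        try:
            # Treat the second line as a rating only if it is numeric
            rating = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass

    return {"company": company, "rating": rating, "place": " ".join(rest)}


print(split_location("Robert Half\n3.9\nPlano, TX 75024"))
print(split_location("Qatalys Software Technologies\nIrving, TX"))
```

Applied to the DataFrame, something like `pd.DataFrame(df['location'].map(split_location).tolist())` would yield three separate columns that can be joined back onto `df`.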