---
title: "Webscraping Indeed Job Portal"
description: "webscraping with python "
author: "Aakash Basnet"
date: "2024/02/03"
categories:
  - webscraping
  - code
  - ETL
  - python
format:
  html:
    code-fold: true
jupyter: python3
---

##  Building URL
After navigating the developer toolbar for Indeed job listing, I found the pattern in the url query for each job title search and location. We can use this info to build the url. The link printed from the code below will take you to the Indeed page having listing for python developer in Dalla, TX

In [70]:
def url_builder(job_title, location, page_number=10 ):
    job_title = "+".join(job_title.split(" "))
    location = "+".join(location.split(" "))
    base_url = "https://www.indeed.com/jobs"
    query_str = f"?q={job_title}&l={location}"
    url = f"{base_url}{query_str}"
     
    return url

print(url_builder(job_title="python developer", location="Dallas, TX"))

https://www.indeed.com/jobs?q=python+developer&l=Dallas,+TX


## Scraping the indeep page with selenium
The script below scrapes the data for the given job title and location. It uses selenium web driver to automate the data scraping. The web driver clicks 'next' button on pagination until the end of the page.

In [71]:
from selenium import webdriver
from selenium.webdriver.common.by import By

import time

def get_data(job_title, location):

    url = url_builder(job_title=job_title, location=location)
    driver = webdriver.Chrome()
    driver.get(url)

    jobs = []
    has_next = True
    count = 1
    while has_next:
        
        time.sleep(2)
        cards = driver.find_elements(By.CLASS_NAME,'cardOutline')
        for card in cards:
            job_title = card.find_element(By.CLASS_NAME,'jobTitle').text
            job_title_text = job_title.text
            job_id = job_title.find_element(By.TAG_NAME, 'a').get_attribute('data-jk')
            location = card.find_element(By.CLASS_NAME,'company_location').text
            job_description = card.find_element(By.CLASS_NAME,'underShelfFooter').text
            
            try:
                pay,*_metadata = card.find_element(By.CLASS_NAME,'heading6').text.split('\n')
            except Exception as e:
                pay = 'NA'
                _metadata = []
        
            
            jobs.append({
                'job_id':job_id,
                'job_url': f"https://www.indeed.com/viewjob?jk={job_id}",
                'job_title':job_title,
                'location': location,
                'description': job_description,
                'pay rate': pay,
                'metadata': _metadata
            })

        try:
            driver.find_element(By.CSS_SELECTOR,"[data-testid='pagination-page-next']").click()
            count += 1
        except Exception as e:
            print(f"Ending at page {count}")
            has_next = False

    driver.close()
    return jobs



In [72]:
import pandas as pd

job_data = get_data(job_title='python developer', location='Fort Worth,TX')
df = pd.DataFrame(job_data)
df.head(20)

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


Ending at page 20


Unnamed: 0,job_id,job_url,job_title,location,description,pay rate,metadata
0,507a2954dc423d0d,https://www.indeed.com/viewjob?jk=507a2954dc42...,<selenium.webdriver.remote.webelement.WebEleme...,"Fidelity TalentSource\n3.6\nWestlake, TX 76262","Are you ready to develop cutting edge, best-in...",Pay information not provided,[Temporary]
1,cd937da8c0f30efd,https://www.indeed.com/viewjob?jk=cd937da8c0f3...,<selenium.webdriver.remote.webelement.WebEleme...,"DataAnnotation\n4.6\nRemote in Dallas, TX",You will work with the chatbots that we are bu...,$40 an hour,"[Contract, 1 to 40 hours per week, Choose your..."
2,3d210c1baa3a3633,https://www.indeed.com/viewjob?jk=3d210c1baa3a...,<selenium.webdriver.remote.webelement.WebEleme...,"Signify Health\n3.0\nDallas, TX 75201","Build and maintain business-critical, enterpri...","$118,000 - $189,700 a year",[]
3,8f182ee73fc19368,https://www.indeed.com/viewjob?jk=8f182ee73fc1...,<selenium.webdriver.remote.webelement.WebEleme...,"PCI Enterprises\n4.3\nDallas, TX","2+ years experience with Javascript, HTML5 & C...","$65,000 - $75,000 a year","[Full-time, Monday to Friday, +1]"
4,32ded569bf22c1ac,https://www.indeed.com/viewjob?jk=32ded569bf22...,<selenium.webdriver.remote.webelement.WebEleme...,"Ford Audio Video\n3.0\nDallas, TX",Ford offers two types of Control System Progra...,"$60,000 - $110,000 a year","[Full-time, Monday to Friday]"
5,3ef4be01459f25a0,https://www.indeed.com/viewjob?jk=3ef4be01459f...,<selenium.webdriver.remote.webelement.WebEleme...,"EasyHiring\nDallas, TX","As a GCP security engineer, you will be a part...","$99,000 - $148,000 a year",[Full-time]
6,dc3e79c2af261bbe,https://www.indeed.com/viewjob?jk=dc3e79c2af26...,<selenium.webdriver.remote.webelement.WebEleme...,"IWP Services, LLC\nHybrid remote in Fort Worth...",Integration of user-facing elements developed ...,"$100,000 - $110,000 a year","[Full-time, +1, Monday to Friday]"
7,8d0dfc501ff3fb89,https://www.indeed.com/viewjob?jk=8d0dfc501ff3...,<selenium.webdriver.remote.webelement.WebEleme...,"Emonics LLC\nFort Worth, TX 76107 \n(Arlington...",8.Coordinating with front-end developers..\nTo...,"$70,000 - $130,000 a year","[Full-time, +1]"
8,8e5cb88d329ff384,https://www.indeed.com/viewjob?jk=8e5cb88d329f...,<selenium.webdriver.remote.webelement.WebEleme...,Boston Enterprises Investment Group LLC\nDeSot...,The ideal candidate will be passionate about d...,"$68,000 - $77,000 a year","[Full-time, +1, 8 hour shift]"
9,91cdaee932850a17,https://www.indeed.com/viewjob?jk=91cdaee93285...,<selenium.webdriver.remote.webelement.WebEleme...,"Lockheed Martin Corporation\nFort Worth, TX","Design, modify, develop, write, and implement ...",Pay information not provided,"[Full-time, 4x10]"
