In this step, we use **Selenium + ChromeDriver** to scrape job postings from [Naukri.com](https://www.naukri.com).
The key tasks are:
- Open the Naukri job search page
- Extract job details (title, company, location, experience, description, skills, posted date)
- Handle pagination (multiple pages of results)
- Save the data to `data/raw/naukri_jobs_bengaluru.csv`

If live scraping fails, we will generate a **sample dataset** for testing further steps.
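
The fallback mentioned above could look like the following minimal sketch. It writes a few made-up rows with the same seven columns the scraper collects, using only the standard library. The function name `write_sample_dataset`, the placeholder rows, and any output path passed to it are assumptions for illustration; they are not part of the original notebook.

```python
import csv
from pathlib import Path

# Column names match the fields the scraper collects.
COLUMNS = ["title", "company", "experience", "location",
           "description", "skills", "posted"]

# Hypothetical placeholder rows; these are not real job postings.
SAMPLE_ROWS = [
    {"title": "Data Analyst", "company": "Example Corp",
     "experience": "1-3 Yrs", "location": "Bengaluru",
     "description": "Build dashboards and weekly reports.",
     "skills": "sql, excel, python", "posted": "2 days ago"},
    {"title": "Business Analyst", "company": "Sample Ltd",
     "experience": "3-6 Yrs", "location": "Bengaluru",
     "description": "Gather requirements and document processes.",
     "skills": "requirements gathering, jira", "posted": "Today"},
]


def write_sample_dataset(path):
    """Write SAMPLE_ROWS to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

Because the columns match the scraped data, later steps can read either file without changes.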


In [1]:
! pip install selenium pandas duckdb 



In [2]:
! pip install webdriver_manager



In [3]:
# Imports
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager

In [4]:
# Setup ChromeDriver
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
options.add_argument("--disable-blink-features=AutomationControlled")  # reduce the chance of being flagged as an automated browser


In [5]:
# Initialize driver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)


In [6]:
# Navigate to Naukri job search page
url = "https://www.naukri.com/analyst-jobs-in-bengaluru"
driver.get(url)

In [7]:
time.sleep(5)  # wait for page to load
print("Page Title:", driver.title)

Page Title: Analyst Jobs In Bengaluru - 40589 Analyst Job Vacancies In Bengaluru - Naukri.com
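
The page title printed above includes the total number of vacancies, which gives a rough upper bound on how many result pages exist. The sketch below pulls that number out of the title and converts it to a page count. It assumes the title keeps the "<count> ... Job Vacancies" wording seen in this one run and that each page holds 20 cards; the helper names `vacancies_from_title` and `pages_needed` are hypothetical. The site may not actually serve every page this arithmetic suggests.

```python
import math
import re


def vacancies_from_title(title):
    """Return the vacancy count embedded in the page title, or None if absent."""
    match = re.search(r"(\d[\d,]*)\s+[\w\s]*?Job Vacancies", title)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def pages_needed(total_jobs, jobs_per_page=20):
    """Number of result pages required to list total_jobs postings."""
    return math.ceil(total_jobs / jobs_per_page)


title = ("Analyst Jobs In Bengaluru - 40589 Analyst Job Vacancies "
         "In Bengaluru - Naukri.com")
total = vacancies_from_title(title)
print(total, pages_needed(total))
```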


In [8]:
# Find all job cards on the page
job_cards = driver.find_elements(By.CLASS_NAME, "cust-job-tuple")

print("Jobs found on page:", len(job_cards))



Jobs found on page: 20


In [9]:
jobs_data = []

In [10]:
# Each field is optional on a card, so a failed lookup falls back to None.
# `except Exception` is used instead of a bare `except` so that
# KeyboardInterrupt can still stop the loop.
for job in job_cards:
    try:
        title = job.find_element(By.CSS_SELECTOR, "a.title").text
    except Exception:
        title = None

    try:
        company = job.find_element(By.CSS_SELECTOR, "a.comp-name").text
    except Exception:
        company = None

    try:
        exp = job.find_element(By.CSS_SELECTOR, "span.expwdth").text
    except Exception:
        exp = None

    try:
        location = job.find_element(By.CSS_SELECTOR, "span.locWdth").text
    except Exception:
        location = None

    try:
        desc = job.find_element(By.CSS_SELECTOR, "span.job-desc").text
    except Exception:
        desc = None

    try:
        skills = [li.text for li in job.find_elements(By.CSS_SELECTOR, "ul.tags-gt li")]
        skills = ", ".join(skills)
    except Exception:
        skills = None

    try:
        posted = job.find_element(By.CSS_SELECTOR, "span.job-post-day").text
    except Exception:
        posted = None

    jobs_data.append({
        "title": title,
        "company": company,
        "experience": exp,
        "location": location,
        "description": desc,
        "skills": skills,
        "posted": posted
    })
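
The seven near-identical `try`/`except` blocks above could be folded into one small helper. The sketch below is one possible way to do that, not the notebook's own code: `safe_text`, `parse_card`, `FIELD_SELECTORS` and `COLUMNS` are hypothetical names, and the selectors are copied from the cell above. To keep the sketch free of imports it passes the string `"css selector"` (the value of Selenium's `By.CSS_SELECTOR`) and catches `Exception`; in the notebook itself, `By.CSS_SELECTOR` and `NoSuchElementException` would be the more precise choices.

```python
# CSS selectors copied from the cell above, keyed by output column.
FIELD_SELECTORS = {
    "title": "a.title",
    "company": "a.comp-name",
    "experience": "span.expwdth",
    "location": "span.locWdth",
    "description": "span.job-desc",
    "posted": "span.job-post-day",
}
SKILLS_SELECTOR = "ul.tags-gt li"
COLUMNS = ["title", "company", "experience", "location",
           "description", "skills", "posted"]


def safe_text(card, selector):
    """Return the text of the first match inside card, or None if lookup fails."""
    try:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        return card.find_element("css selector", selector).text
    except Exception:
        # Selenium raises NoSuchElementException when nothing matches.
        return None


def parse_card(card):
    """Build one row dict, in the notebook's column order, from a job card."""
    row = {field: safe_text(card, sel) for field, sel in FIELD_SELECTORS.items()}
    tags = card.find_elements("css selector", SKILLS_SELECTOR)
    row["skills"] = ", ".join(tag.text for tag in tags)
    return {col: row[col] for col in COLUMNS}
```

With this in place, the loop body would shrink to `jobs_data.append(parse_card(job))`.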


In [11]:
# Convert to DataFrame
df_page1 = pd.DataFrame(jobs_data)
df_page1.head()

Unnamed: 0,title,company,experience,location,description,skills,posted
0,Analyst,EY,,Bengaluru,,,Starts in 1-3 months
1,Analyst,Deloitte Consulting,3-6 Yrs,Bengaluru,Provide Level 1 & 2 support for CAD/CAE applic...,"PLM, Supply chain, Delmia, HyperMesh, Data man...",4 days ago
2,Analyst,Goldman Sachs,1-3 Yrs,Bengaluru,Bachelor s degree in a relevant fieldRequired ...,"python, project management, confluence, proces...",2 days ago
3,Analyst,Grant Thornton,0-1 Yrs,Bengaluru,Bachelor / Post Graduation degree along with ....,"microsoft office applications, data research, ...",Today
4,Analyst,Grant Thornton,2-4 Yrs,"Bengaluru, Kolkata",Conduct and interact with US team via Skype ca...,"analysts, project management, auditing, accoun...",1 day ago


In [12]:
all_jobs = []

In [13]:
# Scrape result pages 1 through 54 (range(1, 55) stops before 55)
for page in range(1, 55):
    url = f"https://www.naukri.com/analyst-jobs-in-bengaluru-{page}" if page > 1 else "https://www.naukri.com/analyst-jobs-in-bengaluru"
    driver.get(url)
    time.sleep(5)  # let page load
    
    job_cards = driver.find_elements(By.CLASS_NAME, "cust-job-tuple")
    print(f"Page {page} - Jobs found:", len(job_cards))
    
    # Same per-field extraction as for page 1; a failed lookup becomes None.
    for job in job_cards:
        try:
            title = job.find_element(By.CSS_SELECTOR, "a.title").text
        except Exception:
            title = None

        try:
            company = job.find_element(By.CSS_SELECTOR, "a.comp-name").text
        except Exception:
            company = None

        try:
            exp = job.find_element(By.CSS_SELECTOR, "span.expwdth").text
        except Exception:
            exp = None

        try:
            location = job.find_element(By.CSS_SELECTOR, "span.locWdth").text
        except Exception:
            location = None

        try:
            desc = job.find_element(By.CSS_SELECTOR, "span.job-desc").text
        except Exception:
            desc = None

        try:
            skills = [li.text for li in job.find_elements(By.CSS_SELECTOR, "ul.tags-gt li")]
            skills = ", ".join(skills)
        except Exception:
            skills = None

        try:
            posted = job.find_element(By.CSS_SELECTOR, "span.job-post-day").text
        except Exception:
            posted = None

        all_jobs.append({
            "title": title,
            "company": company,
            "experience": exp,
            "location": location,
            "description": desc,
            "skills": skills,
            "posted": posted
        })


Page 1 - Jobs found: 20
Page 2 - Jobs found: 20
Page 3 - Jobs found: 20
Page 4 - Jobs found: 20
Page 5 - Jobs found: 20
Page 6 - Jobs found: 20
Page 7 - Jobs found: 20
Page 8 - Jobs found: 20
Page 9 - Jobs found: 20
Page 10 - Jobs found: 20
Page 11 - Jobs found: 20
Page 12 - Jobs found: 20
Page 13 - Jobs found: 20
Page 14 - Jobs found: 20
Page 15 - Jobs found: 20
Page 16 - Jobs found: 20
Page 17 - Jobs found: 20
Page 18 - Jobs found: 20
Page 19 - Jobs found: 20
Page 20 - Jobs found: 20
Page 21 - Jobs found: 20
Page 22 - Jobs found: 20
Page 23 - Jobs found: 20
Page 24 - Jobs found: 20
Page 25 - Jobs found: 20
Page 26 - Jobs found: 20
Page 27 - Jobs found: 20
Page 28 - Jobs found: 20
Page 29 - Jobs found: 20
Page 30 - Jobs found: 20
Page 31 - Jobs found: 20
Page 32 - Jobs found: 20
Page 33 - Jobs found: 20
Page 34 - Jobs found: 20
Page 35 - Jobs found: 20
Page 36 - Jobs found: 20
Page 37 - Jobs found: 20
Page 38 - Jobs found: 20
Page 39 - Jobs found: 20
Page 40 - Jobs found: 20
Page 41 -
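
The pagination rule used in the loop (page 1 is the bare search URL, later pages append `-<page>`) can be isolated in a small function, which also makes the expected row count easy to check. This is a minimal sketch: `page_url` is a hypothetical helper name, and the figure of 20 cards per page is taken from the log lines above and assumed to hold for every page.

```python
BASE_URL = "https://www.naukri.com/analyst-jobs-in-bengaluru"


def page_url(page, base_url=BASE_URL):
    """Page 1 is the bare search URL; later pages append '-<page>'."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return base_url if page == 1 else f"{base_url}-{page}"


pages = list(range(1, 55))            # same range as the loop: pages 1..54
urls = [page_url(p) for p in pages]
expected_rows = len(pages) * 20       # assumes 20 cards on every page

print(len(urls), expected_rows)
```

Under that assumption the loop collects 54 × 20 = 1080 rows, which is consistent with the `raw_jobs` row count reported in the DuckDB section.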

In [14]:
# Quit after loop
driver.quit()


In [15]:
# Convert to DataFrame
df = pd.DataFrame(all_jobs)


In [16]:
# Show preview
df.head()

Unnamed: 0,title,company,experience,location,description,skills,posted
0,Analyst,EY,,Bengaluru,,,Starts in 1-3 months
1,Analyst,Deloitte Consulting,3-6 Yrs,Bengaluru,Provide Level 1 & 2 support for CAD/CAE applic...,"PLM, Supply chain, Delmia, HyperMesh, Data man...",4 days ago
2,Analyst,Goldman Sachs,1-3 Yrs,Bengaluru,Bachelor s degree in a relevant fieldRequired ...,"python, project management, confluence, proces...",2 days ago
3,Analyst,Grant Thornton,0-1 Yrs,Bengaluru,Bachelor / Post Graduation degree along with ....,"microsoft office applications, data research, ...",Today
4,Analyst,Grant Thornton,2-4 Yrs,"Bengaluru, Kolkata",Conduct and interact with US team via Skype ca...,"analysts, project management, auditing, accoun...",1 day ago


In [17]:
# Save as CSV
df.to_csv("/Users/priyankamalavade/Desktop/job-market-skill-analyzer/data/raw/naukri_jobs_bengaluru.csv", index=False)
print("Saved to naukri_jobs_bengaluru.csv")

Saved to naukri_jobs_bengaluru.csv
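
The save cell uses an absolute path that only exists on one machine. A path built relative to the project folder is easier to reuse. The sketch below shows one way to do that with `pathlib`; `raw_csv_path` is a hypothetical helper, and it assumes the caller knows the project root (for example the folder the notebook is launched from).

```python
from pathlib import Path


def raw_csv_path(project_root, filename="naukri_jobs_bengaluru.csv"):
    """Return <project_root>/data/raw/<filename>, creating the folder if missing."""
    out_dir = Path(project_root) / "data" / "raw"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename
```

In the notebook this could be used as `df.to_csv(raw_csv_path(Path.cwd()), index=False)`, provided the working directory is the project root.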


## DuckDB

In [18]:
import duckdb

In [20]:
# Connect to your DuckDB database
con = duckdb.connect(r"/Users/priyankamalavade/Desktop/job-market-skill-analyzer/data/jobs.duckdb")


In [21]:
# Show tables
print(con.execute("SHOW TABLES").fetchdf())


       name
0  raw_jobs


In [22]:
# Count rows
print(con.execute("SELECT COUNT(*) FROM raw_jobs").fetchdf())


   count_star()
0          1080


In [23]:
# Preview first 5 rows
print(con.execute("SELECT * FROM raw_jobs LIMIT 5").fetchdf())

     title              company experience            location  \
0  Analyst                   EY       None           Bengaluru   
1  Analyst  Deloitte Consulting    3-6 Yrs           Bengaluru   
2  Analyst        Goldman Sachs    1-3 Yrs           Bengaluru   
3  Analyst       Grant Thornton    0-1 Yrs           Bengaluru   
4  Analyst       Grant Thornton    2-4 Yrs  Bengaluru, Kolkata   

                                         description  \
0                                               None   
1  Provide Level 1 & 2 support for CAD/CAE applic...   
2  Bachelor s degree in a relevant fieldRequired ...   
3  Bachelor / Post Graduation degree along with ....   
4  Conduct and interact with US team via Skype ca...   

                                              skills                posted  
0                                               None  Starts in 1-3 months  
1  PLM, Supply chain, Delmia, HyperMesh, Data man...            4 days ago  
2  python, project management, conf