### **Web Scraping**
#### • Web scraping is the process of extracting information from websites.
#### • Two popular tools for this task are Selenium and BeautifulSoup.
#### • Each has its own strengths, and the two are often used together to combine their capabilities.

### **BeautifulSoup**
#### • BeautifulSoup is a library for parsing HTML and XML documents.
#### • It helps to navigate the HTML structure, search for elements, and extract data.
#### • It is particularly effective for handling and cleaning up the HTML after fetching it, making it easier to extract the desired information.
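As a minimal sketch of the navigation and extraction described above, the snippet below parses a small inline HTML string. The markup, class names, and links are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page.
html = """
<ul id="jobs">
  <li class="job"><a href="/jobs/1">Data Analyst</a></li>
  <li class="job"><a href="/jobs/2">BI Developer</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Search with a CSS selector, then extract text and attributes.
anchors = soup.select("li.job a")
titles = [a.get_text(strip=True) for a in anchors]
links = [a["href"] for a in anchors]

print(titles)
print(links)
```

`select` returns every match as a list, while `select_one` returns the first match or `None`, which is worth checking before calling `get_text`.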

### **Selenium**
#### • Selenium is a powerful tool for automating web browsers.
#### • It can simulate user interactions, such as clicking buttons, filling out forms, and scrolling through pages.
#### • This makes it especially useful for scraping dynamic content that is loaded via JavaScript or requires user actions.

## Import Required Packages

In [1]:
import pandas as pd

# requests
import requests

# BeautifulSoup
from bs4 import BeautifulSoup

import time

# selenium
from selenium import webdriver 
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains



### ***Libraries Used***

#### ***Pandas***
#### • This library is used for data manipulation and analysis.
#### • It provides powerful data structures like DataFrames which are great for organizing and analyzing data scraped from websites.
#### • For example, after scraping data you can use pandas to clean and save the data in various formats like CSV or Excel.
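A small sketch of that workflow, assuming the scraped rows arrive as a list of dictionaries. The sample rows and the `"Unknown"` placeholder are illustrative choices, and the CSV is written to an in-memory buffer rather than a file.

```python
import io

import pandas as pd

# Hypothetical rows standing in for scraped results.
rows = [
    {"Job Role": "Data Analyst", "Company Name": "Example Corp", "Location": "Chennai"},
    {"Job Role": "BI Analyst", "Company Name": None, "Location": "Bengaluru"},
]

df = pd.DataFrame(rows)

# Light cleaning: replace a missing company name with a placeholder.
df["Company Name"] = df["Company Name"].fillna("Unknown")

# Write CSV to an in-memory buffer; pass a file path instead to save to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

`df.to_excel(path, index=False)` works the same way for Excel output, provided an Excel writer such as `openpyxl` is installed.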

#### ***Requests***
#### • This library allows you to send HTTP requests using Python.
#### • It's often used to fetch the HTML content of a webpage.
#### • With requests, you can easily retrieve the page source which can then be parsed to extract the desired information.

#### ***BeautifulSoup***
#### • This library is used for parsing HTML and XML documents.
#### • It makes it easy to navigate and search the HTML structure of a webpage.
#### • After retrieving the HTML content using requests you can use BeautifulSoup to parse and extract specific elements of the webpage.

#### ***Selenium***
#### • This library is used for automating web browsers.
#### • It allows you to interact with web pages, which is particularly useful for scraping dynamic content that is loaded via JavaScript.
#### • Selenium can simulate user interactions such as clicking buttons, filling out forms and scrolling.

#### ***Webdriver_manager***
#### • This library is used to automatically manage browser drivers for Selenium.
#### • Instead of manually downloading and setting up browser drivers, webdriver_manager handles the installation and setup for you.
#### • This simplifies the process of ensuring you have the correct driver version for your browser.

### ***Selenium Functions Used***
##### *webdriver, By, Service (Chrome), ChromeDriverManager, WebDriverWait, expected_conditions, ActionChains*

#### Open Chrome Browser

In [13]:
# Setting Up Selenium WebDriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

#### Log in to Naukri

In [14]:
def login_naukri():
    
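    # ==== Your Naukri Login Credentials ====
    # Read from environment variables instead of hardcoding secrets in the notebook.
    # NAUKRI_EMAIL and NAUKRI_PASSWORD are assumed names; set them before running.
    import os
    EmailId = os.environ['NAUKRI_EMAIL']
    Pass_Word = os.environ['NAUKRI_PASSWORD']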
    
    # Step 1: Open Naukri login page
    driver.get("https://www.naukri.com/nlogin/login?URL=https://www.naukri.com/mnjuser/homepage")
    time.sleep(1)
    
    # Step 2: Enter email and password
    driver.find_element(By.ID, 'usernameField').send_keys(EmailId)
    driver.find_element(By.ID, 'passwordField').send_keys(Pass_Word)
    
    # Step 3: Click login button
    driver.find_element(By.XPATH, "//button[@type='submit']").click()
    time.sleep(3)  # Wait to ensure login is complete
    
    print("✅ Logged in successfully!")

login_naukri()

✅ Logged in successfully!


#### Search for Job-Related Keywords

In [4]:
def searchBox_Activity(SEARCH_KEYWORDS, LOCATION):
    """Fill in the search bar (keywords, experience, location) and run the search."""

    # Step 4: Click the search bar to expand it
    # (click() returns None, so its result is not stored)
    driver.find_element(By.CLASS_NAME, 'nI-gNb-sb__main').click()

    # Step 5: Find the keyword input field
    search_box_keywords = driver.find_element(By.XPATH, "//input[@placeholder = 'Enter keyword / designation / companies']")

    # Clear it and type the keywords
    search_box_keywords.clear()
    #search_box_keywords.send_keys("Data Analytics")
    search_box_keywords.send_keys(", ".join(SEARCH_KEYWORDS))

    # Step 6: Click on the Experience dropdown input box
    exp_input = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "experienceDD"))
    )
    exp_input.click()

    # Click on the "1 year" option
    one_year_option = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//li[@title='1 year']"))
    )
    one_year_option.click()

    # Step 7: Find the location input field and type the locations
    search_box_location = driver.find_element(By.XPATH, "//input[@placeholder= 'Enter location']")
    search_box_location.send_keys(", ".join(LOCATION))

    # Step 8: Click the search icon to run the search
    driver.find_element(By.CLASS_NAME, 'nI-gNb-sb__icon-wrapper').click()



#### Using Filters for ***Department***-Specific Job Selection

In [5]:
### Using Filter Option from Naukri
def filter_department(Dept_Keys):

    ## Expanding/collapsing the Department filter section
    # It is expanded by default, so this step is currently not needed
    '''
    # Expand Department filter (optional, if collapsed)
    dept_section = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//span[text()='Department']"))
    )
    dept_section.click()

    time.sleep(1)
    '''
    # Select "Data Science & Analytics" checkbox
    DS_DA_checkbox = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, f"//div[@class='styles_chckBoxCont__t_dRs']//label[contains(., '{Dept_Keys}')]"))
    )
    
    DS_DA_checkbox.click()


#### Using Filters for ***Role***-Specific Job Selection

In [6]:
# Filter the specific role 
def select_role(option_text):
    '''
    # Expand Role filter (optional, if collapsed)
    role_section = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//span[text()='Role category']"))
    )
    #role_section.click()
    '''
    DS_DA_checkbox = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, f"//div[@class='styles_chckBoxCont__t_dRs']//label[contains(., '{option_text}')]"))
    )
    DS_DA_checkbox.click()
    print(f"✅ Selected option: {option_text}")



#### Using Filters for ***Location***-Specific Job Selection

In [7]:
## Filter Location

def filter_location(location):
    DS_DA_checkbox = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, f"//div[@class='styles_chckBoxCont__t_dRs']//label[contains(., '{location}')]"))
    )
    DS_DA_checkbox.click()
    print(f"✅ Selected option: {location}")



#### Using Filters for ***Salary***-Specific Job Selection

In [8]:
## Filter Salary
def filter_salary(salary):
    
    DS_DA_checkbox = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, f"//div[@class='styles_chckBoxCont__t_dRs']//label[contains(., '{salary}')]"))
    )
    
    '''
    DS_DA_checkbox = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, f"//label[contains(@class, 'styles_chkLbl__n2x09') and contains(., '{salary}')]"))
    )
    '''
    DS_DA_checkbox.click()
    print(f"✅ Selected option: {salary}")
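The department, role, location, and salary filters above all build the same XPath and differ only in the label text. A possible sketch of a shared helper is shown below. The names `xpath_literal` and `checkbox_label_xpath` are hypothetical; the container class is the one used in the functions above and may change whenever the site is redeployed. XPath 1.0 has no escape character inside string literals, so text containing quotes is assembled with `concat()`.

```python
def xpath_literal(text: str) -> str:
    """Return `text` as a valid XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: split on single quotes and rejoin with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def checkbox_label_xpath(text: str,
                         container_class: str = "styles_chckBoxCont__t_dRs") -> str:
    """Build the XPath for a filter checkbox label containing `text`."""
    return (
        f"//div[@class='{container_class}']"
        f"//label[contains(., {xpath_literal(text)})]"
    )


print(checkbox_label_xpath("3-6 Lakhs"))
```

Each filter function could then pass `(By.XPATH, checkbox_label_xpath(option_text))` to `EC.element_to_be_clickable`, which keeps the locator in one place if the class name changes.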



https://www.naukri.com/data-analyst-jobs-in-chennai?k=data%20analyst&l=chennai%2C%20coimbatore%2C%20bangalore&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=97&cityTypeGid=183&cityTypeGid=184&ctcFilter=3to6
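The search and filter selections end up encoded in the query string of the results URL above. The sketch below decodes it with the standard library. Reading `experience` as the experience filter and `ctcFilter` as the salary range is an interpretation of the parameter names, not something documented by the site.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://www.naukri.com/data-analyst-jobs-in-chennai?k=data%20analyst"
    "&l=chennai%2C%20coimbatore%2C%20bangalore&nignbevent_src=jobsearchDeskGNB"
    "&experience=1&functionAreaIdGid=3&cityTypeGid=97&cityTypeGid=183"
    "&cityTypeGid=184&ctcFilter=3to6"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # repeated keys are collected into lists

print(parts.path)
for key, values in params.items():
    print(f"{key}: {values}")
```

Repeated keys such as `cityTypeGid` come back as a single list, which is why every value in `params` is a list even when a key appears only once.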


#### ***Scraping the job and company details returned after applying the search keywords and filters***
#### Column Names - *Job Role, Company Name, Company URL, Experience, Location, Technical Skills, Rating, Review, Job Posted*

In [9]:

def scrapping():
    # Step 1 : Get current page url
    base_url = driver.current_url
    print(base_url)
    
    # Step 2: Get page source and parse
    soup = BeautifulSoup(driver.page_source, "html.parser")
    
    # Step 3: Find job cards using your div structure
    job_cards = soup.select('div.cust-job-tuple')
    
    jobs = []
    
    for card in job_cards:
        try:
            # Company Name
            company = card.select_one('a.comp-name').get_text(strip=True)
        except:
            company = None
    
    
        try:
            # Job Role (it's usually inside h2 but sometimes empty, fallback to job title)
            job_title = card.select_one('a.title').get_text(strip=True)
        except:
            job_title = None
    
        try:
            experience = card.select_one('span.expwdth').get_text(strip=True)
        except:
            experience = None
            
        try:
            location = card.select_one('span.locWdth').get_text(strip=True)
        except:
            location = None
            
        try:
            job_posted = card.select_one('span.job-post-day').get_text(strip=True)
        except:
            job_posted = None
        try:
            # Technical Skills (from tags)
            skills = ", ".join([tag.get_text(strip=True) for tag in card.select('ul.tags-gt li')])
        except:
            skills = None
    
        try:
            rating = card.select_one('a.rating').get_text(strip=True)
        except:
            rating = None
            
        try:
            review = card.select_one('a.review').get_text(strip=True)
        except:
            review = None
    
        try:
            company_url = card.select_one('a.comp-name')['href']
        except:
            company_url = None
    
        jobs.append({
            'Job Role': job_title,
            'Company Name': company,
            'Company URL': company_url,
            'Experience': experience,
            'Location': location,
            'Technical Skills': skills,
            'Rating': rating,
            'Review': review,
            'Job Posted': job_posted
        })


    
    return jobs
    # Step 4: Convert to DataFrame
    #df = pd.DataFrame(jobs)
    #print(df,"<---- df")
    
    # Optional: Save to Excel
    #df.to_excel("D:/Job Related/New folder/naukri_jobs_scraped.xlsx", index=False)

# driver.quit()
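The repeated `try`/`except` blocks in `scrapping()` guard against `select_one` returning `None` when a card lacks a field. One alternative, sketched below, is a pair of small helpers that check for `None` explicitly. The helper names `text_or_none` and `attr_or_none` and the sample card are hypothetical; only the CSS class names are taken from the function above.

```python
from bs4 import BeautifulSoup


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def attr_or_none(card, selector, attr):
    """Return an attribute of the first match, or None if it is missing."""
    node = card.select_one(selector)
    return node.get(attr) if node else None


# Hypothetical card with no location element, to show the fallback.
sample = """
<div class="cust-job-tuple">
  <a class="title" href="https://example.com/job/1">Data Analyst</a>
  <a class="comp-name" href="https://example.com/company">Example Corp</a>
  <span class="expwdth">1-3 Yrs</span>
</div>
"""

card = BeautifulSoup(sample, "html.parser").select_one("div.cust-job-tuple")
row = {
    "Job Role": text_or_none(card, "a.title"),
    "Company Name": text_or_none(card, "a.comp-name"),
    "Company URL": attr_or_none(card, "a.comp-name", "href"),
    "Experience": text_or_none(card, "span.expwdth"),
    "Location": text_or_none(card, "span.locWdth"),
}
print(row)
```

Because every field goes through the same helper, a copy-paste slip such as assigning the wrong variable in an `except` branch cannot occur.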


#### Scrape the job list across multiple pages and save it to an Excel file

In [10]:
def multipage_excel():
    columns = [
        'Job Role', 'Company Name', 'Company URL', 'Experience', 'Location',
        'Technical Skills', 'Rating', 'Review', 'Job Posted'
    ]
    all_jobs = []
    try:
        for i in range(1, 5):  # pages 1 to 4 (page numbering starts at 1)
            page = str(i)
            print(f"Clicking page {page}")

            pagination_div = driver.find_element(By.CLASS_NAME, 'styles_pages__v1rAK')
            page_link = pagination_div.find_element(By.XPATH, f'.//a[text()="{page}"]')
            page_link.click()

            time.sleep(15)  # wait for the new page to load before scraping it
            all_jobs.extend(scrapping())

        # Build the DataFrame once instead of concatenating row by row
        df = pd.DataFrame(all_jobs, columns=columns)

        # Optional: Save to Excel
        df.to_excel("D:/Job Related/New folder/naukri_jobs_scraped.xlsx", index=False)
        print("Shape of df", df.shape)
        print("First 10 rows", df.head(10))

    except Exception as e:
        # Report any failure (missing pagination link, file write error, etc.)
        print(f"An unexpected error occurred: {e}")
    finally:
        # This block always runs, whether there was an error or not
        print("Execution completed.")


#multipage_excel()
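When rows are gathered page by page, the same job card can be collected more than once, for example if a page is scraped before the next one has finished loading. A minimal sketch of combining per-page results and dropping exact duplicates follows. The function name `build_dataframe` and the sample rows are hypothetical; the column names match the ones used above.

```python
import pandas as pd

COLUMNS = [
    "Job Role", "Company Name", "Company URL", "Experience", "Location",
    "Technical Skills", "Rating", "Review", "Job Posted",
]


def build_dataframe(pages):
    """Flatten per-page lists of row dicts into one de-duplicated DataFrame."""
    rows = [row for page in pages for row in page]
    df = pd.DataFrame(rows, columns=COLUMNS)  # keys missing from a row become NaN
    return df.drop_duplicates(ignore_index=True)


# Hypothetical results: page 2 repeats one row from page 1.
page_1 = [
    {"Job Role": "Data Analyst", "Company Name": "Example Corp", "Location": "Chennai"},
    {"Job Role": "BI Analyst", "Company Name": "Sample Ltd", "Location": "Chennai"},
]
page_2 = [
    {"Job Role": "Data Analyst", "Company Name": "Example Corp", "Location": "Chennai"},
]

df = build_dataframe([page_1, page_2])
print(df[["Job Role", "Company Name", "Location"]])
```

Building the frame once from a list is also cheaper than calling `pd.concat` for every row, since each `concat` copies the data accumulated so far.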

##### ***Run the functions with the chosen keywords and filters***

In [15]:
SEARCH_KEYWORDS = ["Data Analyst"]
LOCATION = ["Chennai"]
# "Coimbatore","Bangalore"

searchBox_Activity(SEARCH_KEYWORDS, LOCATION)
time.sleep(5)

Dept_Keys = 'Data Science & Analytics'
filter_department(Dept_Keys)
time.sleep(5)

#####   __Role Selection__   #####
# 1) Select "Data Science & Machine Learning"
# select_role("Data Science & Machine Learning")
# time.sleep(1)

# 2) Select "Data Science & Analytics - Other"
# select_role("Data Science & Analytics - Other")
# time.sleep(1)

# 3) Select "Business Intelligence & Analytics"
# select_role("Business Intelligence & Analytics")

#####   __Locations__   #####
filter_location("Chennai")
time.sleep(3)
# filter_location("Coimbatore")

#filter_location("Bengaluru")
# time.sleep(3)

#####   __Salary__   #####
salary = "3-6 Lakhs"
filter_salary(salary)
time.sleep(3)
#filter_salary("6-10 Lakhs")

#####   __Scraping Function__   #####
scrapping()
time.sleep(15)

#####   __Multipage DataFrame to Excel Conversion__   #####
multipage_excel()

✅ Selected option: Chennai
✅ Selected option: 3-6 Lakhs
https://www.naukri.com/data-analyst-jobs-in-chennai?k=data%20analyst&l=chennai&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=183&ctcFilter=3to6
Clicking page 1
https://www.naukri.com/data-analyst-jobs-in-chennai?k=data%20analyst&l=chennai&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=183&ctcFilter=3to6
Clicking page 2
https://www.naukri.com/data-analyst-jobs-in-chennai?k=data%20analyst&l=chennai&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=183&ctcFilter=3to6
Clicking page 3
https://www.naukri.com/data-analyst-jobs-in-chennai-2?k=data+analyst&l=chennai&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=183&ctcFilter=3to6
Clicking page 4
https://www.naukri.com/data-analyst-jobs-in-chennai-3?k=data+analyst&l=chennai&nignbevent_src=jobsearchDeskGNB&experience=1&functionAreaIdGid=3&cityTypeGid=183&ctcFilter=3to6
Shape

In [None]:
#####   __Multipage DataFrame to Excel Conversion__   #####
multipage_excel()

In [144]:
# Step 4: Find search box, enter keywords, and press Enter
#SEARCH_KEYWORDS = ["Data Analytics", "Data Analyst", "Data analysis Analyst", "Data analytics Analyst","Data Manipulation","Data visualization", "Data Cleansing", ]
#SEARCH_KEYWORDS = ["Data Analytics", "Data Analyst", "Data visualization", "Data Manipulation", "Data Cleansing"]
#SEARCH_KEYWORDS = ["Data analysis Analyst", "Data analytics Analyst", "Pandas", "Data Processing", "Data Manipulation", "data extraction"]
#SEARCH_KEYWORDS = ["python data analyst", "data analyst with python", ]
#["Python Data Analytics"]
#data analyst, data analysis, data analytics, data visualization, data manipulation, data cleaning, data transformation,  raw data into actionable business intelligence 
