# Web Scraping using Selenium

Q1: Write a Python program to scrape data for the “Data Analyst” job position in the “Bangalore” location. You
have to scrape the job title, job location, company name and experience required for the first 10 jobs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Analyst” in the “Skill, Designations, Companies” field and “Bangalore” in the “enter the
location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally create a dataframe of the scraped data.

#installing selenium library
!pip install selenium

In [1]:
#importing libraries
import pandas as pd
import selenium
from selenium import webdriver
import warnings
warnings.filterwarnings('ignore')

In [2]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')
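
The cells in this notebook use the Selenium 3 API. In Selenium 4 the `find_element_by_*` helpers were removed in favour of `find_element(by, value)`, and the driver path is passed through a `Service` object. A minimal sketch of the equivalents, assuming Selenium 4 is installed; `open_browser`, `find_by_class` and `find_all_by_xpath` are hypothetical helper names, and the two strategy strings are the values of Selenium's `By.CLASS_NAME` and `By.XPATH`:

```python
# Locator strategy strings; these are the values of selenium's
# By.CLASS_NAME and By.XPATH constants.
CLASS_NAME = "class name"
XPATH = "xpath"


def open_browser(driver_path=None):
    """Start Chrome with the Selenium 4 API (hypothetical helper)."""
    # Imported lazily so the helpers below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service)


def find_by_class(driver, class_name):
    """Selenium 4 form of driver.find_element_by_class_name(class_name)."""
    return driver.find_element(CLASS_NAME, class_name)


def find_all_by_xpath(driver, xpath):
    """Selenium 4 form of driver.find_elements_by_xpath(xpath)."""
    return driver.find_elements(XPATH, xpath)
```

With these, `search_job = find_by_class(driver, 'suggestor-input')` stands in for the Selenium 3 call used in the cells that follow.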

In [3]:
url= 'https://www.naukri.com/'
driver.get(url)

In [4]:
#finding web element for the job search bar using its class name
search_job= driver.find_element_by_class_name('suggestor-input')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="afb0d825-4dfa-4741-8d62-f6f3661d76ed")>

In [5]:
#write on search bar
search_job.send_keys('Data Analyst')

In [6]:
#finding web element for search location using absolute xpath
search_location= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[3]/div/div/div/input')
search_location

<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="34e28d42-177b-4645-9c4e-9ee71949312b")>

In [7]:
#write the location in the location field
search_location.send_keys('Bangalore')

In [8]:
#finding web element for the search button using absolute xpath
search_button= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[6]')
search_button

In [9]:
search_button.click()

In [10]:
salary_check= driver.find_element_by_xpath('/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[4]/div[2]/div[1]/label/i')
salary_check

<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="27febff0-90f2-493d-8024-9132d776d684")>

In [11]:
salary_check.click()

In [12]:
title_tags= driver.find_elements_by_xpath('//a[@class="title fw500 ellipsis"]')
len(title_tags)
title_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="de4f8336-3ca9-45d5-ad0b-9b34907b091e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="e4ba6ab2-3889-4ea4-80fb-2a1fb69a8d77")>]

In [13]:
#to scrape the job title using for loop
job_titles=[]
for i in title_tags:
    job_titles.append(i.text)
len(job_titles)
job_titles[0:2]

['Job openings For Data Analyst - AOA',
 'Hiring Data Analyst-Coimbatore/Bangalore']

In [14]:
#extracting company tags
company_tags= driver.find_elements_by_xpath('//a[@class="subTitle ellipsis fleft"]')
company_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="491d829e-2457-4df4-a77f-07f824025114")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="60485992-d6d8-446c-adad-8e41d8bc7aab")>]

In [15]:
#extracting company names using for loop
company_names=[]
for i in company_tags:
    company_names.append(i.text)
len(company_names)
company_names[0:2]

['izmo ltd', 'KGISL BSS- Division of KG Information System Private Limited']

In [16]:
#extracting location tags
location_tags= driver.find_elements_by_xpath('//li[@class="fleft grey-text br2 placeHolderLi location"]')
len(location_tags)
location_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="5e555d7c-f64b-4721-94e2-4e9952443ed3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="d39c07db-5149-4a7e-ae0e-af5ee6d98806")>]

In [17]:
#extracting location names
location_names= []
for i in location_tags:
    location_names.append(i.text)
len(location_names)
location_names[0:2]

['Bangalore/Bengaluru(4th Phase JP Nagar)', 'Coimbatore, Bangalore/Bengaluru']

In [18]:
#extracting experience tags required for the job
experience_tags= driver.find_elements_by_xpath('//li[@class="fleft grey-text br2 placeHolderLi experience"]')
experience_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="31a8d7ec-125b-42f6-a07f-8390b293e275")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3ddcf9586568dd8f463016d867a16e45", element="893e07dd-2c3e-4b32-b603-449fa971019e")>]

In [19]:
experience_years=[]
for i in experience_tags:
    experience_years.append(i.text)
len(experience_years)
experience_years[0:2]

['2-3 Yrs', '5-10 Yrs']

In [20]:
len(job_titles), len(company_names), len(location_names),len(experience_years)

(20, 20, 20, 18)
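
The counts differ (18 experience values against 20 titles), so slicing each list independently can pair a job with the wrong experience value. One way to keep the fields aligned is to locate each job card first and read the fields inside it. A minimal sketch, assuming Selenium 4's `find_elements(by, value)` interface; `scrape_cards` and `JOB_FIELDS` are hypothetical names, and the field XPaths are the ones used above, made relative to a card with a leading `.`:

```python
XPATH = "xpath"  # value of selenium's By.XPATH


def scrape_cards(driver, card_xpath, field_xpaths, limit=10):
    """Return one dict per result card; a missing field becomes None."""
    rows = []
    for card in driver.find_elements(XPATH, card_xpath)[:limit]:
        row = {}
        for name, rel_xpath in field_xpaths.items():
            # Search inside this card only, so values cannot shift rows.
            found = card.find_elements(XPATH, rel_xpath)
            row[name] = found[0].text if found else None
        rows.append(row)
    return rows


# Field locators from this notebook, made relative to a card.
JOB_FIELDS = {
    "Job Title": './/a[@class="title fw500 ellipsis"]',
    "Company": './/a[@class="subTitle ellipsis fleft"]',
    "Location": './/li[@class="fleft grey-text br2 placeHolderLi location"]',
    "Years of Experience": './/li[@class="fleft grey-text br2 placeHolderLi experience"]',
}
```

A call such as `scrape_cards(driver, card_xpath, JOB_FIELDS)` returns a list that `pd.DataFrame` accepts directly. Here `card_xpath` is the XPath of the element wrapping one job result, which has to be read from the page's HTML.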

In [21]:
jobs= pd.DataFrame()
jobs['Job Title']= job_titles[:10]
jobs['Company']=company_names[:10]
jobs['Location']=location_names[:10]
jobs['Years of Experience']= experience_years[:10]
jobs

Unnamed: 0,Job Title,Company,Location,Years of Experience
0,Job openings For Data Analyst - AOA,izmo ltd,Bangalore/Bengaluru(4th Phase JP Nagar),2-3 Yrs
1,Hiring Data Analyst-Coimbatore/Bangalore,KGISL BSS- Division of KG Information System P...,"Coimbatore, Bangalore/Bengaluru",5-10 Yrs
2,software developer & Testing / Business Analys...,SECRET TECHNOLOGIES INDIA VMS GROUP,"Pune, Bangalore/Bengaluru(Shivaji Nagar), Mumb...",0-4 Yrs
3,Data Coordinator | Data Analyst | MS Excel | T...,Inspiration Manpower Consultancy Pvt. Ltd.,Bangalore/Bengaluru(Sadashiva Nagar),0-2 Yrs
4,Looking For Data Analyst,Trellance,"Ahmedabad, Bangalore/Bengaluru",1-3 Yrs
5,Data Analyst || Advance Excel || D Limit || Co...,Inspiration Manpower Consultancy Pvt. Ltd.,Bangalore/Bengaluru,0-5 Yrs
6,Data Analyst,Capillary Technologies,Bangalore/Bengaluru,1-2 Yrs
7,Data Analyst Work FROM Home,Fine Homes and Interior,Bangalore/Bengaluru\n(WFH during Covid),1-5 Yrs
8,MIS Executive and Analyst | Data Analyst | Dat...,D2 Retro,"Bangalore/Bengaluru, Mumbai (All Areas)",2-3 Yrs
9,Hiring For Data Analyst from 1 yr of exp in Ba...,PROMANTUS INDIA PRIVATE LIMITED,"Chennai, Bangalore/Bengaluru",0-4 Yrs


Q2: Write a Python program to scrape data for the “Data Scientist” job position in the “Bangalore” location. You
have to scrape the job title, job location and company name for the first 10 jobs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill, Designations, Companies” field and “Bangalore” in the “enter the
location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally create a dataframe of the scraped data.
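
These steps repeat the Q1 search with a different designation, so they can be wrapped in one function. A minimal sketch, assuming Selenium 4's `find_element(by, value)` interface; `search_naukri` is a hypothetical helper, and the locators are the ones used in this notebook:

```python
CLASS_NAME = "class name"  # value of selenium's By.CLASS_NAME
XPATH = "xpath"            # value of selenium's By.XPATH

NAUKRI_URL = "https://www.naukri.com/"
LOCATION_XPATH = "/html/body/div/div[2]/div[3]/div/div/div[3]/div/div/div/input"
SEARCH_BUTTON_XPATH = "/html/body/div/div[2]/div[3]/div/div/div[6]"


def search_naukri(driver, designation, location=None):
    """Open naukri.com, fill in the search fields and click search."""
    driver.get(NAUKRI_URL)
    driver.find_element(CLASS_NAME, "suggestor-input").send_keys(designation)
    if location:
        driver.find_element(XPATH, LOCATION_XPATH).send_keys(location)
    driver.find_element(XPATH, SEARCH_BUTTON_XPATH).click()
```

For this question the call would be `search_naukri(driver, 'Data Scientist', 'Bangalore')`.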

In [22]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [23]:
url= 'https://www.naukri.com/'
driver.get(url)

In [24]:
#finding web element for the job search bar using its class name
search_job= driver.find_element_by_class_name('suggestor-input')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="8d76168f-325f-4c4f-af10-19bf8c3dbb22")>

In [25]:
#write on search bar
search_job.send_keys('Data Scientist')

In [26]:
#finding web element for search location using absolute xpath
search_location= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[3]/div/div/div/input')
search_location

<selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="b5a10923-c172-4ed2-b8ae-06f0bee1a9d4")>

In [27]:
#write the location in the location field
search_location.send_keys('Bangalore')

In [28]:
#finding web element for the search button using absolute xpath
search_button= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[6]')
search_button

In [29]:
search_button.click()

In [30]:
title_tags= driver.find_elements_by_xpath('//a[@class="title fw500 ellipsis"]')
len(title_tags)
title_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="7ff7f0ba-25cb-4bc2-ab17-00fdde76d4ff")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="d751b19a-761f-468a-a76e-2991d0a680b2")>]

In [31]:
#to scrape the job title using for loop
job_titles=[]
for i in title_tags:
    job_titles.append(i.text)
len(job_titles)
job_titles[0:2]

['Sr Data Scientist', 'HCL Tech Opening - Lead Data Scientist']

In [32]:
#extracting location tags
location_tags= driver.find_elements_by_xpath('//li[@class="fleft grey-text br2 placeHolderLi location"]')
len(location_tags)
location_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="092280b1-7799-437a-b60e-a37bb967fe49")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="7f2a2eef-9f8e-4209-af6d-fef3968f58f4")>]

In [33]:
#extracting location names
location_names= []
for i in location_tags:
    location_names.append(i.text)
len(location_names)
location_names[0:2]

['Bangalore/Bengaluru',
 'Kolkata, Hyderabad/Secunderabad, Pune, Chennai, Bangalore/Bengaluru, Delhi / NCR']

In [34]:
#extracting company tags
company_tags= driver.find_elements_by_xpath('//a[@class="subTitle ellipsis fleft"]')
company_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="007057c1-ab9a-4b5d-b5de-c921473dee44")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d89e672a96ae59952a429812053cc1ee", element="1d115088-d2c3-457a-986a-638b26b7904b")>]

In [35]:
#extracting company names using for loop
company_names=[]
for i in company_tags:
    company_names.append(i.text)
len(company_names)
company_names[0:2]

['Uber', 'HCL']

In [36]:
jobs= pd.DataFrame()
jobs['Job Title']= job_titles[:10]
jobs['Company']=company_names[:10]
jobs['Location']=location_names[:10]
jobs

Unnamed: 0,Job Title,Company,Location
0,Sr Data Scientist,Uber,Bangalore/Bengaluru
1,HCL Tech Opening - Lead Data Scientist,HCL,"Kolkata, Hyderabad/Secunderabad, Pune, Chennai..."
2,Senior Data Scientist Payments,AirSeva,Bangalore/Bengaluru
3,Senior Data Scientist (R Programming),Ignitho,Remote
4,Data Scientist/Senior Data Scientist - Python,ApicalGo Consultancy,Bangalore/Bengaluru
5,Senior Data Scientist - Python/Machine Learnin...,Altimax Business Solutions,"Mumbai, Hyderabad/Secunderabad, Pune, Bangalor..."
6,Sr . Data Scientist,Visa,Bangalore/Bengaluru
7,Need Data scientists and data engineers - WFH-...,Covalense Technologies Private Limited,"Hyderabad/Secunderabad, Bangalore/Bengaluru, M..."
8,Data Scientist,AirSeva,Bangalore/Bengaluru
9,Data Scientist,Korea Trade Center,"New Delhi, Gurgaon/Gurugram, Bangalore/Bengalu..."


Q3: In this question you have to scrape data using the filters available on the webpage.
You have to use the location and salary filters.
You have to scrape data for the “Data Scientist” designation for the first 10 job results.
You have to scrape the job title, job location, company name and experience required.
The location filter to be used is “Delhi/NCR”. The salary filter to be used is “3-6” lakhs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill, Designations, and Companies” field.
3. Then click the search button.
4. Then apply the location filter and salary filter by checking the respective boxes.
5. Then scrape the data for the first 10 job results you get.
6. Finally create a dataframe of the scraped data.
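
Step 4 checks filter boxes. An absolute XPath to a checkbox breaks whenever the page layout shifts, so one alternative is to locate a filter by its visible label text. A minimal sketch, assuming Selenium 4's `find_element(by, value)` interface and assuming each filter option sits inside a `<label>` that contains its text; `label_xpath` and `apply_filter` are hypothetical helpers, and the label strings must be checked against the live page:

```python
XPATH = "xpath"  # value of selenium's By.XPATH


def label_xpath(label_text):
    """Build an XPath for a <label> whose text contains label_text."""
    if '"' in label_text:
        raise ValueError("label_text must not contain double quotes")
    return f'//label[contains(normalize-space(.), "{label_text}")]'


def apply_filter(driver, label_text):
    """Click the first filter label that matches label_text."""
    driver.find_element(XPATH, label_xpath(label_text)).click()
```

A call could look like `apply_filter(driver, '3-6 Lakhs')`, where the exact label wording is an assumption.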

In [37]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [38]:
url= 'https://www.naukri.com/'
driver.get(url)

In [39]:
#finding web element for the job search bar using its class name
search_job= driver.find_element_by_class_name('suggestor-input')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="c4ebad2d-fb43-4c9f-9464-2785a55aa810")>

In [40]:
#write on search bar
search_job.send_keys('Data Scientist')

In [41]:
#finding web element for search location using absolute xpath
search_location= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[3]/div/div/div/input')
search_location

<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="ba1c9be8-6862-470f-bd6a-91a73296b6f6")>

In [42]:
#write the location in the location field
search_location.send_keys('Delhi/NCR')

In [43]:
#finding web element for the search button using absolute xpath
search_button= driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div/div/div[6]')
search_button

search_button.click()

In [47]:
salary_check= driver.find_element_by_xpath('/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[4]/div[2]/div[2]/label/i')
salary_check

<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="1eb8d55c-e4ee-439b-83a0-2751ae3318bd")>

In [48]:
salary_check.click()

In [51]:
title_tags= driver.find_elements_by_xpath('//a[@class="title fw500 ellipsis"]')
len(title_tags)
title_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="5a2b59b3-3ed9-4998-80eb-3c086007ae2b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="3bb8e6a1-d84e-4440-8801-f40ef387080e")>]

In [54]:
#to scrape the job title using for loop
job_titles=[]
for i in title_tags:
    job_titles.append(i.text)
len(job_titles)
job_titles[0:2]

['Hiring For Senior Data Scientist-Noida',
 'Excellent Opportunity For Freshers For AI/ML, Data Scientist, BI, QA']

In [55]:
#extracting company tags
company_tags= driver.find_elements_by_xpath('//a[@class="subTitle ellipsis fleft"]')
company_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="0e0d8355-78cd-4c9a-b212-a771876d15ec")>,
 <selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="7e7a53e8-e1af-42f8-b1ad-4b35e80f5ddd")>]

In [56]:
#extracting company names using for loop
company_names=[]
for i in company_tags:
    company_names.append(i.text)
len(company_names)
company_names[0:2]

['Lumiq.ai', 'NTT Data']

In [57]:
#extracting location tags
location_tags= driver.find_elements_by_xpath('//li[@class="fleft grey-text br2 placeHolderLi location"]')
len(location_tags)
location_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="7f357197-deb5-418f-b9b7-cb5a0be13a43")>,
 <selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="60186779-5197-4a72-a8bb-ea256fac685d")>]

In [58]:
#extracting location names
location_names= []
for i in location_tags:
    location_names.append(i.text)
len(location_names)
location_names[0:2]

['Noida, New Delhi, Greater Noida',
 'Noida, Kolkata, Hyderabad/Secunderabad, Pune, Chennai, Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)\n(WFH during Covid)']
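
Some location strings contain an embedded newline (`\n(WFH during Covid)`), which carries over into the dataframe. A minimal sketch of a cleanup step; `clean_text` is a hypothetical helper, not part of the notebook:

```python
def clean_text(text):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


print(clean_text("Delhi / NCR\n(WFH during Covid)"))
# prints: Delhi / NCR (WFH during Covid)
```

It can be applied while collecting the text, for example `location_names = [clean_text(i.text) for i in location_tags]`.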

In [59]:
#extracting experience tags required for the job
experience_tags= driver.find_elements_by_xpath('//li[@class="fleft grey-text br2 placeHolderLi experience"]')
experience_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="11039be3-7e5c-41af-9bbc-bdfaecc16c5b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="f81207bb08bd16802d829c17e3573c91", element="b7f23767-28de-4671-9e71-1b890cfe8a04")>]

In [60]:
experience_years=[]
for i in experience_tags:
    experience_years.append(i.text)
len(experience_years)
experience_years[0:2]

['2-6 Yrs', '0-0 Yrs']

In [61]:
jobs= pd.DataFrame()
jobs['Job Title']= job_titles[:10]
jobs['Company']=company_names[:10]
jobs['Location']=location_names[:10]
jobs['Years of Experience Required']= experience_years[:10]
jobs

Unnamed: 0,Job Title,Company,Location,Years of Experience Required
0,Hiring For Senior Data Scientist-Noida,Lumiq.ai,"Noida, New Delhi, Greater Noida",2-6 Yrs
1,"Excellent Opportunity For Freshers For AI/ML, ...",NTT Data,"Noida, Kolkata, Hyderabad/Secunderabad, Pune, ...",0-0 Yrs
2,Data Analyst / Data Scientist / Business Analy...,GABA Consultancy services,"Noida, New Delhi, Delhi / NCR",0-0 Yrs
3,Data Scientist,Mount Talent Consulting Private Limited,"Hyderabad/Secunderabad, Pune, Gurgaon/Gurugram...",1-4 Yrs
4,Data scientist- Python,TeamPlus Staffing Solution Pvt Ltd,Gurgaon/Gurugram,3-6 Yrs
5,Data Scientist _NLP,EXL,"Bangalore/Bengaluru, Delhi / NCR\n(WFH during ...",3-8 Yrs
6,Data Scientist (freelance),2Coms,"New Delhi, Delhi",2-7 Yrs
7,Data Scientist - MIND Infotech,MOTHERSONSUMI INFOTECH & DESIGNS LIMITED,Noida,4-8 Yrs
8,Lead Data Scientist,Indihire HR Consultants Private Limited,Delhi / NCR\n(WFH during Covid),2-4 Yrs
9,Only Fresher / Python Data Scientist / Trainee...,GABA Consultancy services,"Noida, New Delhi, Gurgaon/Gurugram",0-0 Yrs


Q4: Scrape data for the first 100 sunglasses listings on flipkart.com. You have to scrape three attributes:
1. Brand
2. Product Description
3. Price
To scrape the data you have to go through the following steps:
1. Go to the Flipkart webpage by URL: https://www.flipkart.com/
2. Enter “sunglasses” in the search field where “search for products, brands and more” is written and
click the search icon.
3. You will then reach a page listing many sunglasses. From this page you can scrape the
required data as usual.
4. After scraping data from the first page, go to the “Next” button at the bottom of the page, then
click on it.
5. Now scrape data from this page as usual.
6. Repeat this until you get data for 100 sunglasses.
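
Steps 3 to 6 describe a loop: scrape the current page, stop once enough rows are collected, otherwise move to the next page. A minimal sketch of that loop, independent of any particular site; `collect_pages`, `scrape_page` and `go_next` are hypothetical names. `scrape_page` returns the rows of the current page, and `go_next` clicks “Next” and returns `False` when there is no further page:

```python
def collect_pages(scrape_page, go_next, target=100, max_pages=10):
    """Scrape page by page until `target` rows are collected."""
    rows = []
    for _ in range(max_pages):
        rows.extend(scrape_page())
        if len(rows) >= target:
            break
        if not go_next():  # no further page to visit
            break
    return rows[:target]
```

The `max_pages` cap keeps the loop from running indefinitely if the “Next” button never disappears. The locator for the “Next” button is left out because it depends on the live page.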

In [149]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [150]:
url= 'https://www.flipkart.com/'
driver.get(url)

In [151]:
#finding web element for the search bar using its class name
search= driver.find_element_by_class_name('_3704LK')
search

<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="7ba5c2af-c608-494b-a3e1-1644c5b5d3fb")>

In [152]:
#write on search bar
search.send_keys('Sunglasses')

In [153]:
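#press Enter to run the search; clicking the input box alone does not submit it
from selenium.webdriver.common.keys import Keys
search.send_keys(Keys.ENTER)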

In [154]:
#we will first scrape name tags of the products
name_tags= driver.find_elements_by_xpath('//div[@class="_2WkVRV"]')
len(name_tags)
name_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="ae70f1d8-978a-4bfe-b42c-b0cd36872380")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="3a68d3b3-c01b-428e-8748-bb74dffc0616")>]

In [155]:
#to scrape the product name using for loop
product_name=[]
for i in name_tags:
    product_name.append(i.text)
len(product_name)
product_name[0:2]

['PIRASO', 'PIRASO']

In [156]:
#to scrape description tags of the products
description_tags= driver.find_elements_by_xpath('//a[@class="IRpwTa"]')
len(description_tags)
description_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="feb8f8fe-f0b3-4b12-985b-0a01e1481103")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="2c413cea-f2d0-431b-bec1-4b23d49e9dfd")>]

In [157]:
#to scrape the product description using for loop
product_description=[]
for i in description_tags:
    product_description.append(i.text)
len(product_description)
product_description[0:2]

['UV Protection Rectangular Sunglasses (Free Size)',
 'UV Protection Rectangular Sunglasses (52)']

In [158]:
#to scrape price tags of the products
price_tags= driver.find_elements_by_xpath('//div[@class="_25b18c"]')
len(price_tags)
price_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c064c3a5-54b3-4ff5-aa84-0f94a646d8f7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c4947537-5e4c-4f13-8b84-1558fdd8e9db")>]

In [159]:
#to scrape the product price using for loop
product_price=[]
for i in price_tags:
    product_price.append(i.text)
len(product_price)
product_price[0:2]

['₹228₹2,59991% off', '₹263₹2,59989% off']
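
Each price string fuses three values: the selling price, the original price and the discount (`₹228`, `₹2,599` and `91% off` in the first entry). A minimal sketch of splitting them, assuming the text always has the shape shown above; `parse_price` is a hypothetical helper. Because the original price and the discount run together, the sketch tries a one-digit and a two-digit discount and keeps the split whose stated discount best matches the two prices:

```python
import re

# "₹<price>₹<original price digits><discount digits>% off"
PRICE_RE = re.compile(r"₹([\d,]+)₹([\d,]+)% off")


def parse_price(text):
    """Split a fused price string into price, original price and discount."""
    match = PRICE_RE.fullmatch(text.strip())
    if match is None:
        return None
    price = int(match.group(1).replace(",", ""))
    tail = match.group(2).replace(",", "")  # original price + discount digits
    best = None
    for k in (1, 2):  # the discount has one or two digits
        if len(tail) <= k:
            continue
        mrp, stated = int(tail[:-k]), int(tail[-k:])
        if mrp <= 0:
            continue
        implied = 100 * (mrp - price) / mrp
        error = abs(implied - stated)
        if best is None or error < best[0]:
            best = (error, mrp, stated)
    if best is None:
        return None
    return {"price": price, "mrp": best[1], "discount_pct": best[2]}


print(parse_price("₹228₹2,59991% off"))
# prints: {'price': 228, 'mrp': 2599, 'discount_pct': 91}
```

Strings that do not match the assumed shape return `None` rather than a guess.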

In [160]:
#scrape the name tags again; "Next" has not been clicked, so this repeats the first page
name1_tags= driver.find_elements_by_xpath('//div[@class="_2WkVRV"]')
len(name1_tags)
name1_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="ae70f1d8-978a-4bfe-b42c-b0cd36872380")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="3a68d3b3-c01b-428e-8748-bb74dffc0616")>]

In [161]:
#to scrape the product name using for loop
product1_name=[]
for i in name1_tags:
    product1_name.append(i.text)
len(product1_name)
product1_name[0:2]

['PIRASO', 'PIRASO']

In [162]:
#to scrape description tags of the products
description1_tags= driver.find_elements_by_xpath('//a[@class="IRpwTa"]')
len(description1_tags)
description1_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="feb8f8fe-f0b3-4b12-985b-0a01e1481103")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="2c413cea-f2d0-431b-bec1-4b23d49e9dfd")>]

In [163]:
#to scrape the product description using for loop
product1_description=[]
for i in description1_tags:
    product1_description.append(i.text)
len(product1_description)
product1_description[0:2]

['UV Protection Rectangular Sunglasses (Free Size)',
 'UV Protection Rectangular Sunglasses (52)']

In [164]:
#to scrape price tags of the products
price1_tags= driver.find_elements_by_xpath('//div[@class="_25b18c"]')
len(price1_tags)
price1_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c064c3a5-54b3-4ff5-aa84-0f94a646d8f7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c4947537-5e4c-4f13-8b84-1558fdd8e9db")>]

In [165]:
#to scrape the product price using for loop
product1_price=[]
for i in price1_tags:
    product1_price.append(i.text)
len(product1_price)
product1_price[0:2]

['₹228₹2,59991% off', '₹263₹2,59989% off']

In [166]:
#we will scrape name tags of the products
name2_tags= driver.find_elements_by_xpath('//div[@class="_2WkVRV"]')
len(name2_tags)
name2_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="ae70f1d8-978a-4bfe-b42c-b0cd36872380")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="3a68d3b3-c01b-428e-8748-bb74dffc0616")>]

In [167]:
#to scrape the product name using for loop
product2_name=[]
for i in name2_tags:
    product2_name.append(i.text)
len(product2_name)
product2_name[0:2]

['PIRASO', 'PIRASO']

In [168]:
#to scrape description tags of the products
description2_tags= driver.find_elements_by_xpath('//a[@class="IRpwTa"]')
len(description2_tags)
description2_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="feb8f8fe-f0b3-4b12-985b-0a01e1481103")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="2c413cea-f2d0-431b-bec1-4b23d49e9dfd")>]

In [169]:
#to scrape the product description using for loop
product2_description=[]
for i in description2_tags:
    product2_description.append(i.text)
len(product2_description)
product2_description[0:2]

['UV Protection Rectangular Sunglasses (Free Size)',
 'UV Protection Rectangular Sunglasses (52)']

In [170]:
#to scrape price tags of the products
price2_tags= driver.find_elements_by_xpath('//div[@class="_25b18c"]')
len(price2_tags)
price2_tags[0:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c064c3a5-54b3-4ff5-aa84-0f94a646d8f7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b4d02279bf99adcd06521ad7a236fdc3", element="c4947537-5e4c-4f13-8b84-1558fdd8e9db")>]

In [171]:
#to scrape the product price using for loop
product2_price=[]
for i in price2_tags:
    product2_price.append(i.text)
len(product2_price)
product2_price[0:2]

['₹228₹2,59991% off', '₹263₹2,59989% off']

In [172]:
#each line is a separate tuple expression, so only the four lengths on the last line are displayed
len(product_name),len(product1_name),len(product2_name),len(product_description),len(product1_description),
len(product2_description), len(product_price),len(product1_price),len(product2_price)

(40, 45, 45, 45)

In [176]:
#join the lists from the three scrapes; assigning the same column three times keeps only the last list
sunglasses= pd.DataFrame()
sunglasses['Product Name']= (product_name + product1_name + product2_name)[:100]
sunglasses['Product Description']= (product_description + product1_description + product2_description)[:100]
sunglasses['Product Price']= (product_price + product1_price + product2_price)[:100]
sunglasses.head(5)

Unnamed: 0,Product Name,Product Description,Product Price
0,PIRASO,UV Protection Rectangular Sunglasses (Free Size),"₹228₹2,59991% off"
1,PIRASO,UV Protection Rectangular Sunglasses (52),"₹263₹2,59989% off"
2,SRPM,UV Protection Wayfarer Sunglasses (50),"₹198₹1,29984% off"
3,SUNBEE,"UV Protection, Polarized Wayfarer Sunglasses (...","₹253₹1,29980% off"
4,ROZZETTA CRAFT,"Polarized, Night Vision, Riding Glasses Sports...","₹474₹1,99976% off"


Q5: Scrape data for 100 reviews of the iPhone 11 from flipkart.com. You have to go to the link:

https://www.flipkart.com/apple-iphone-11-black-64-gb-includes-earpods-power-adapter/p/itm0f37c2240b217?pid=MOBFKCTSVZAXUHGR&lid=LSTMOBFKCTSVZAXUHGREPBFGI&marketplace.

Opening the link takes you to the product page. On that page you have to scrape these attributes:
1. Rating
2. Review summary
3. Full review
You have to scrape this data for the first 100 reviews.

In [2]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [3]:
url= 'https://www.flipkart.com/apple-iphone-11-black-64-gb-includes-earpods-power-adapter/p/itm0f37c2240b217?pid=MOBFKCTSVZAXUHGR&lid=LSTMOBFKCTSVZAXUHGREPBFGI&marketplace.'
driver.get(url)

In [4]:
rating_tags= driver.find_elements_by_xpath('//div[@class="_3LWZlK _1BLPMq"]')
rating_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="0bcd19b5-6a4b-46b5-a407-42f7afcd9587")>,
 <selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="584d7239-5436-40d9-a38d-1bf3516bd684")>]

In [5]:
ratings=[]
for i in rating_tags:
    ratings.append(i.text)
len(ratings)
ratings[:2]

['5', '5']

In [6]:
summary_tags= driver.find_elements_by_xpath('//p[@class="_2-N8zT"]')
summary_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="14b365f4-c093-4b15-aafe-35ef923d6931")>,
 <selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="77d723ce-da06-423d-98f2-bfa163fd3f82")>]

In [7]:
summary_reviews=[]
for i in summary_tags:
    summary_reviews.append(i.text)
len(summary_reviews)
summary_reviews[:2]

['Brilliant', 'Simply awesome']

In [8]:
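#this xpath names no class value, so it matches every div that has a class attribute, not only reviews
review_tags= driver.find_elements_by_xpath('//div[@class]')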
review_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="29d5c079-3e51-4ef5-b826-d5f50f41e9a1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="99df94a9d254f4372218936bd04e1ea7", element="e1b27a11-66d4-4f70-a8f7-1d11e309b098")>]

In [9]:
full_review=[]
for i in review_tags:
    full_review.append(i.text)
len(full_review)
full_review[:2]

['Explore Plus\nLogin\nMore\nCart', 'Explore Plus\nLogin\nMore\nCart']

In [12]:
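#build the columns from Series so that lists of unequal length are padded with NaN
phone= pd.DataFrame({
    'Rating': pd.Series(ratings[:10]),
    'Review Summary': pd.Series(summary_reviews[:10]),
    'Full Review': pd.Series(full_review[:10]),
})
phone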

Unnamed: 0,Rating,Review Summary,Full Review
0,5.0,Brilliant,Explore Plus\nLogin\nMore\nCart
1,5.0,Simply awesome,Explore Plus\nLogin\nMore\nCart
2,5.0,Best in the market!,
3,,,Explore Plus\nLogin\nMore\nCart
4,,,Explore Plus
5,,,Explore Plus
6,,,
7,,,
8,,,
9,,,Login


Q6: Scrape data for the first 100 sneakers you find when you visit flipkart.com and search for “sneakers” in the
search field.
You have to scrape 3 attributes of each sneaker:
1. Brand
2. Product Description
3. Price
These are the attributes tick-marked in the task's reference image.

In [13]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [14]:
url= 'https://www.flipkart.com/'
driver.get(url)

In [15]:
search= driver.find_element_by_class_name('_3704LK')
search

<selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="86fe4b9c-0ffa-4e0a-962a-0104948ae7fc")>

In [16]:
search.send_keys('Sneakers')

In [17]:
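#press Enter to run the search; clicking the input box alone does not submit it
from selenium.webdriver.common.keys import Keys
search.send_keys(Keys.ENTER)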

In [18]:
brand_tags= driver.find_elements_by_xpath('//div[@class="_2WkVRV"]')
brand_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="80d03a75-4704-456c-a78e-33137f1a1f28")>,
 <selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="f625ae1d-3a9d-455e-addb-5ea3ba8bdc7c")>]

In [19]:
brand_name=[]
for i in brand_tags:
    brand_name.append(i.text)
len(brand_name)
brand_name[:2]

['HIGHLANDER', 'HIGHLANDER']

In [31]:
description_tags= driver.find_elements_by_xpath('//a[@class="IRpwTa"]')
description_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="bc68966d-6eae-4ca0-81a3-418be3902540")>,
 <selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="7cc807c8-5972-42f8-ae4e-cd88cddb757a")>]

In [32]:
brand_description=[]
for i in description_tags:
    brand_description.append(i.text)
len(brand_description)
brand_description[:2]

['Sneakers For Men',
 'Super Stylish & Trendy Combo Pack of 02 Pairs Sneakers ...']

In [22]:
price_tags= driver.find_elements_by_xpath('//div[@class="_25b18c"]')
price_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="780049fd-559f-4122-8995-06081b8e9adb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="0d464859026e02fff917b9605ad9318a", element="5acf789a-946d-4311-afc4-595d200b22ae")>]

In [23]:
price=[]
for i in price_tags:
    price.append(i.text)
len(price)
price[:2]

['₹796₹1,99060% off', '₹796₹1,99060% off']

In [33]:
len(brand_name),len(brand_description),len(price)

(40, 31, 45)
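The three lists have different lengths (40, 31, 45), so slicing each one to the same size does not guarantee that row i of every column belongs to the same product. One possible way around this, sketched here under assumptions, is to locate each product card first and then read the fields inside that card. The names `first_text`, `rows_from_cards` and `FIELDS` are hypothetical helpers, and the locator for a product card is not known from this notebook, so it is left to the caller. The string "xpath" is the value of selenium's `By.XPATH` constant, which keeps the sketch free of imports.

```python
# Locator strategy string; selenium's By.XPATH constant has this same value.
XPATH = "xpath"

# Relative XPaths (note the leading dot) reusing the class names from the cells above.
FIELDS = {
    "Brand": './/div[@class="_2WkVRV"]',
    "Brand Description": './/a[@class="IRpwTa"]',
    "Price": './/div[@class="_25b18c"]',
}

def first_text(card, xpath):
    """Return the text of the first match inside `card`, or None if nothing matches."""
    matches = card.find_elements(XPATH, xpath)
    return matches[0].text if matches else None

def rows_from_cards(cards, field_xpaths):
    """Build one dict per product card so that fields cannot shift between rows."""
    return [
        {name: first_text(card, xpath) for name, xpath in field_xpaths.items()}
        for card in cards
    ]
```

With a live session, `cards` could come from `driver.find_elements("xpath", card_xpath)`, where `card_xpath` is a placeholder for whatever element wraps a single product, and the returned list of dicts can be passed to `pd.DataFrame`. A product with no description then shows up as a missing value instead of shifting the rows below it.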

In [35]:
sneakers= pd.DataFrame()
sneakers['Brand']= brand_name[:30]
sneakers['Brand Description']=brand_description[:30]
sneakers['Price']=price[:30]
sneakers

Unnamed: 0,Brand,Brand Description,Price
0,HIGHLANDER,Sneakers For Men,"₹796₹1,99060% off"
1,HIGHLANDER,Super Stylish & Trendy Combo Pack of 02 Pairs ...,"₹796₹1,99060% off"
2,Chevit,Sneakers For Men,"₹649₹1,59859% off"
3,World Wear Footwear,Sneakers For Men,₹199₹49960% off
4,DUNKASTON,Sneakers For Men,"₹278₹1,49981% off"
5,Magnolia,Modern Trendy Sneakers Shoes Sneakers For Men,₹448₹99955% off
6,BRUTON,Sneakers For Men,"₹259₹1,29980% off"
7,LEVI'S,STYLISH MENS BLACK AND WHITE SNEAKER Sneakers ...,"₹1,119₹2,79960% off"
8,TR,Sneaker Sneakers For Men,₹322₹99967% off
9,URBANBOX,Modern & Trendy Collection Combo Pack of 02 Sh...,₹198₹99980% off
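The Price column holds the sale price, the list price and the discount fused into one string, for example '₹796₹1,99060% off'. A minimal sketch of pulling the three numbers apart is given here. It assumes the layout seen in the output above (sale price, list price, then a one or two digit discount) and picks the split whose discount arithmetic best reproduces the sale price. The function name `parse_price` is hypothetical.

```python
import re

def parse_price(text):
    """Return (sale_price, list_price, discount_pct), or None if the text does not fit."""
    match = re.fullmatch(r"₹(\d[\d,]*)₹(\d[\d,]*)% off", text.strip())
    if match is None:
        return None
    sale = int(match.group(1).replace(",", ""))
    tail = match.group(2)  # list price digits fused with the discount digits
    best = None
    for k in (1, 2):  # assumption: the discount has one or two digits
        list_part, pct_part = tail[:-k], tail[-k:]
        if not pct_part.isdigit() or not list_part[-1:].isdigit():
            continue
        list_price = int(list_part.replace(",", ""))
        pct = int(pct_part)
        # Distance between the sale price implied by this split and the scraped one.
        error = abs(list_price * (100 - pct) / 100 - sale)
        if best is None or error < best[0]:
            best = (error, sale, list_price, pct)
    return best[1:] if best else None

print(parse_price("₹796₹1,99060% off"))  # (796, 1990, 60)
```

Applying this to every entry of `price` would give three numeric columns in place of the single fused string.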


Go to the link - https://www.myntra.com/shoes
Set the Price filter to “Rs. 7149 to Rs. 14099” and the Color filter to “Black”, as shown in the below image.
Then scrape the data for the first 100 shoes you get. The data should include the “Brand” of the shoes, a short shoe
description, and the price of the shoe, as shown in the below image.

In [36]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [37]:
url= 'https://www.myntra.com/shoes'
driver.get(url)

In [38]:
price_check= driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/main/div[3]/div[1]/section/div/div[5]/ul/li[2]/label/div')
price_check

<selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="80b57324-10ba-456b-a370-68b1a24ef077")>

In [39]:
price_check.click()

In [40]:
color_check= driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/main/div[3]/div[1]/section/div/div[6]/ul/li[1]/label/div')
price_check  #note: this echoes price_check (the element shown below); color_check was probably intended

<selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="80b57324-10ba-456b-a370-68b1a24ef077")>

In [41]:
color_check.click()

In [42]:
brand_tags= driver.find_elements_by_xpath('//h3[@class="product-brand"]')
brand_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="01eba34c-811e-4947-81e2-0fddf8a0af6f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="e4bf99b3-7507-4c28-a524-eace3fbe1b37")>]

In [43]:
brand_name=[]
for i in brand_tags:
    brand_name.append(i.text)
len(brand_name)
brand_name[:2]

['ALDO', 'Nike']

In [44]:
description_tags= driver.find_elements_by_xpath('//h4[@class="product-product"]')
description_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="ff7a1791-7b69-402c-aede-7f91dfd5eba0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="b678fa38-da49-405f-a231-643c31bbd435")>]

In [45]:
description=[]
for i in description_tags:
    description.append(i.text)
len(description)
description[:2]

['Men Printed Sneakers', 'Men Winflo 7 Running Shoes']

In [46]:
price_tags= driver.find_elements_by_xpath('//div[@class="product-price"]')
price_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="fcd40a92-7d38-4113-a684-b501f861ab66")>,
 <selenium.webdriver.remote.webelement.WebElement (session="b92c81c35475bf4cfb1a0057a9c43b3c", element="6105c2a2-fe06-4bdc-946c-0f1d0e0f9f7c")>]

In [47]:
prices=[]
for i in price_tags:
    prices.append(i.text)
len(prices)
prices[:2]

['Rs. 9099Rs. 12999(30% OFF)', 'Rs. 7995']

In [48]:
shoes= pd.DataFrame()
shoes['Brand Name']= brand_name[:100]
shoes['Description']=description[:100]
shoes['Prices']=prices[:100]
shoes

Unnamed: 0,Brand Name,Description,Prices
0,ALDO,Men Printed Sneakers,Rs. 9099Rs. 12999(30% OFF)
1,Nike,Men Winflo 7 Running Shoes,Rs. 7995
2,ALDO,Men Leather Driving Shoes,Rs. 12999
3,Puma,Electrify Nitro Running Shoes,Rs. 9999
4,ALDO,Men Woven Design Sneakers,Rs. 13999
5,Hush Puppies,Men Solid Leather Formal Slip-Ons,Rs. 7649Rs. 8999(Rs. 1350 OFF)
6,Puma,Men Jamming 2.0 Running Shoes,Rs. 12999
7,Bugatti,Men Solid Leather Formal Derbys,Rs. 9499
8,Nike,Men Air Max Dawn Sneakers,Rs. 10995
9,Puma,Men Training or Gym Shoes,Rs. 7999
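The slices `[:100]` above return fewer than 100 rows whenever a single results page holds fewer than 100 products. A generic way to keep going across pages is sketched here. Both callables are assumptions supplied by the caller: `scrape_page` would re-run the three element queries and return the rows of the current page, and `go_to_next_page` would click a "next" control (its locator is not known from this notebook) and return False when there is none. The name `collect_first_n` is hypothetical.

```python
def collect_first_n(scrape_page, go_to_next_page, n):
    """Accumulate rows page by page until n rows are collected or the pages run out."""
    rows = []
    while len(rows) < n:
        page_rows = scrape_page()
        if not page_rows:
            break  # an empty page means there is nothing more to read
        rows.extend(page_rows)
        if len(rows) >= n or not go_to_next_page():
            break
    return rows[:n]
```

The result is cut to exactly `n` rows at the end, so a last page that overshoots the target does not change the size of the dataframe.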


Go to the webpage https://www.amazon.in/
Enter “Laptop” in the search field and then click the search icon.
Then set the CPU Type filter to “Intel Core i7” and “Intel Core i9” as shown in the below image.
After setting the filters, scrape the data for the first 10 laptops. You have to scrape 3 attributes for each laptop:
1. Title
2. Ratings
3. Price
These are the tick-marked attributes shown in the below image.

In [49]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [50]:
url= 'https://www.amazon.in/'
driver.get(url)

In [59]:
search_laptop= driver.find_element_by_class_name('nav-search-field ')
search_laptop

<selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="ab78d59c-a24b-4f42-898b-a680d3a62bf6")>

In [63]:
cpu_check= driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[1]/div[2]/div/div[3]/span/div[1]/div/div/div[6]/ul[5]/li[12]/span/a/div/label/i')
cpu_check

<selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="4b11bad2-e01c-4316-8122-71461d03cf55")>

In [64]:
cpu_check.click()

In [65]:
title_tags= driver.find_elements_by_xpath('//span[@class="a-size-medium a-color-base a-text-normal"]')
title_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="9c5004e9-8eb1-440b-b396-4df407d47a9d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="29d89452-e010-4205-ae2d-472f8c9cc2a7")>]

In [66]:
title_name=[]
for i in title_tags:
    title_name.append(i.text)
len(title_name)
title_name[:2]

['Fujitsu UH-X 11th Gen Intel Core i7 13.3” FHD IPS 400Nits Thin & Light Laptop(16GB/512GB SSD/Windows 11/Office 2021/Iris Xe Graphics/Backlit Kb/Fingerprint Reader/2Yr Warranty/Black/878gms),4ZR1F38024',
 'LG Gram Intel Evo 11th Gen Core i7 17 inches Ultra-Light Laptop (16 GB RAM, 512 GB SSD, New Windows 11 Home Preload, Iris Xe Graphics, USC -C x 2 (with Power), 1.35 kg, 17Z90P-G.AH85A2, Black)']

In [67]:
rating_tags= driver.find_elements_by_xpath('//span[@class="a-size-base s-underline-text"]')
rating_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="d517169d-e83d-4e63-839a-d985329be55a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="9988319e-aaa8-4bce-8e67-1a24a0c4cfac")>]

In [68]:
ratings=[]
for i in rating_tags:
    ratings.append(i.text)
len(ratings)
ratings[:2]

['39', '140']

In [69]:
price_tags= driver.find_elements_by_xpath('//span[@class="a-price-whole"]')
price_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="41aaa45d-ba54-423c-9e9c-a10b0d0a92ab")>,
 <selenium.webdriver.remote.webelement.WebElement (session="04a91bd56e38e6e1ceb3eae41279afe2", element="e05b4800-7a87-4302-bd80-1db94c5d411f")>]

In [70]:
pricing=[]
for i in price_tags:
    pricing.append(i.text)
len(pricing)
pricing[:2]

['83,990', '96,999']

In [71]:
laptops= pd.DataFrame()
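#note: [:9] keeps 9 rows and the scraped ratings are not added, though the task asks for 10 laptops with ratings
laptops['Title Name']= title_name[:9]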
laptops['Prices']=pricing[:9]
laptops

Unnamed: 0,Title Name,Prices
0,Fujitsu UH-X 11th Gen Intel Core i7 13.3” FHD ...,83990
1,LG Gram Intel Evo 11th Gen Core i7 17 inches U...,96999
2,"ASUS VivoBook 14 (2021), 14-inch (35.56 cms) F...",57490
3,Mi Notebook Ultra 3.2K Resolution Display Inte...,77499
4,LG Gram Intel Evo 11th Gen Core i7 17 inches U...,96999
5,LG Gram 16 inches Intel Evo 11th Gen Core i7 U...,89999
6,"ASUS TUF Gaming F15 (2021), 15.6"" (39.62 cms) ...",89990
7,ASUS ZenBook 13 OLED (2021) Intel Core i7-1165...,92900
8,Lenovo ThinkBook 13s Intel 11th Gen Core i7 13...,89990
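The scraped prices and ratings are strings such as '83,990' and '39', so they cannot be sorted or averaged as numbers. A small conversion helper is sketched here; the name `to_int` is hypothetical, and the sample values are the first two entries from the outputs above.

```python
def to_int(text):
    """Convert '83,990' to 83990; return None for blank or non-numeric text."""
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None

pricing = ['83,990', '96,999']
ratings = ['39', '140']

print([to_int(p) for p in pricing])  # [83990, 96999]
print([to_int(r) for r in ratings])  # [39, 140]
```

Returning None instead of raising keeps one malformed entry from stopping the whole conversion.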


Write a python program to scrape data for the first 10 job results for the Data Scientist designation in the Noida
location. You have to scrape the company name, the number of days ago the job was posted, and the rating of the company.
This task will be done in the following steps:
1. First get the webpage https://www.ambitionbox.com/
2. Click on the Job option as shown in the image.
3. After reaching the next webpage, in place of “Search by Designations, Companies, Skills” enter
“Data Scientist” and click on the search button.
4. On the resulting web page, click on location, and in place of “Search location” enter
“Noida” and select the location “Noida”.
5. Then scrape the data for the first 10 job results you get on the above shown page.
6. Finally create a dataframe of the scraped data.

In [115]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [117]:
url= 'https://www.ambitionbox.com/'
driver.get(url)

In [74]:
search_job= driver.find_element_by_class_name('link.jobs')
search_job 

<selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="bb97ddbf-9dd2-405b-a3cc-46c3e2fda388")>

In [75]:
search_job.click()

In [76]:
search_position= driver.find_element_by_xpath('/html/body/div/div/div/div[2]/div[1]/div/div/div/div/span/input')
search_position

<selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="95549bf2-f3d6-42f4-b640-9ffc06b56114")>

In [88]:
search_position.send_keys('Data Scientist')

In [89]:
search_position.click()

In [107]:
name_tags= driver.find_elements_by_xpath('//a[@class="title noclick"]')
name_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="86cc8741-86d1-4ece-a695-fc3ec0ff1554")>,
 <selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="12df5da9-0aa9-4394-ad23-bf7a92a14749")>]

In [113]:
names=[]
for i in name_tags:
    names.append(i.text)
len(names)
names[:2]

['Excellent Opportunity For Freshers For AI/ML, Data Scientist, BI, QA',
 "HCL Hiring Data Scientist (Loc: Noida / Chennai / B'lore)"]

In [102]:
day_tags= driver.find_elements_by_xpath('//span[@class="body-small-l"]')
day_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="ccd6c3aa-616a-4786-8e66-507667750ac1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="71244f8d6d9aeaa56e3db467490d220a", element="e0f39190-dbef-4843-9b20-802b436d6ca5")>]

In [103]:
days=[]
for i in day_tags:
    days.append(i.text)
len(days)
days[:2]

['10d ago', 'via naukri.com']

In [118]:
rating_tags= driver.find_elements_by_xpath('//span[@class="body-small"]')
rating_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="0c0fa710b812f2c753df6f15228d95f0", element="a422bca9-879f-4b8e-b959-91734280fe46")>,
 <selenium.webdriver.remote.webelement.WebElement (session="0c0fa710b812f2c753df6f15228d95f0", element="dd9489fb-702b-4921-ac07-385ecc509db3")>]

In [119]:
ratings=[]
for i in rating_tags:
    ratings.append(i.text)
len(ratings)
ratings[:2]

['4.3', '4.3']

In [121]:
jobs= pd.DataFrame()
jobs['Company Names']= names[:10]
jobs['No of Days ago ad was posted']= days[:10]
jobs['Ratings']=ratings[:10]
jobs


Unnamed: 0,Company Names,No of Days ago ad was posted,Ratings
0,"Excellent Opportunity For Freshers For AI/ML, ...",10d ago,4.3
1,HCL Hiring Data Scientist (Loc: Noida / Chenna...,via naukri.com,4.3
2,Data Scientist,11d ago,4.1
3,Data Scientist-II,via naukri.com,4.1
4,HCL Tech Opening - Senior Data Scientist,11d ago,4.1
5,Data Scientist,via naukri.com,4.1
6,Urgent Requirement || Data Scientist || Noida,22d ago,4.2
7,Data Scientist with NLP & Python,via naukri.com,4.1
8,Data Scientist,19d ago,4.1
9,Opportunity | Tavant India,via naukri.com,4.1
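In the table above every second entry of the days column is 'via naukri.com', because the same span class is used for both the posting age and the source label. A minimal sketch of keeping only the posting ages is given here. It assumes the 'Nd ago' format seen in the output; other wordings the site may use are not covered. The names `AGE_PATTERN` and `posted_days` are hypothetical.

```python
import re

# Matches strings such as '10d ago' and captures the number of days.
AGE_PATTERN = re.compile(r"(\d+)d ago")

def posted_days(texts):
    """Keep only entries like '10d ago' and return the day counts as integers."""
    days = []
    for text in texts:
        match = AGE_PATTERN.fullmatch(text.strip())
        if match:
            days.append(int(match.group(1)))
    return days

sample = ['10d ago', 'via naukri.com', '11d ago', 'via naukri.com']
print(posted_days(sample))  # [10, 11]
```

Filtering before slicing to 10 gives one posting age per job instead of five ages and five source labels.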


Write a python program to scrape the salary data for the Data Scientist designation.
You have to scrape Company name, Number of salaries, Average salary, Min salary, Max salary.
The above task will be done as shown in the below steps:
1. First get the webpage https://www.ambitionbox.com/
2. Click on the salaries option as shown in the image.
3. After reaching the following webpage, in place of “Search Job Profile” enter “Data Scientist” and
then click on “Data Scientist”.
You have to scrape the data ticked in the above image.
4. Scrape the data for the first 10 companies. Scrape the company name, total salary record, average
salary, minimum salary, maximum salary, experience required.
5. Store the data in a dataframe.

In [122]:
driver= webdriver.Chrome(r'C:/Users/Harshita/Downloads/chromedriver_win32/chromedriver')

In [124]:
url= 'https://www.ambitionbox.com/'
driver.get(url)

In [126]:
search_salary= driver.find_element_by_class_name('link.salaries')
search_salary 

<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="a06ab7e8-e7ef-4be6-92aa-40d70c4056b9")>

In [127]:
search_salary.click()

In [130]:
search_job= driver.find_element_by_xpath('/html/body/div/div/div/main/section[1]/div[2]/div[1]/span/input')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="85a8b3ca-a190-48a7-8c6f-285393bec7a0")>

In [133]:
search_job.click()

In [135]:
name_tags= driver.find_elements_by_xpath('//div[@class="name"]')
name_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="9020756f-97a7-4824-807c-7274d6225c60")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="0f87d781-abdb-4e90-91cd-0b421222f73a")>]

In [136]:
names=[]
for i in name_tags:
    names.append(i.text)
len(names)
names[:2]

['Ab Inbev\nbased on 28 salaries', 'ZS\nbased on 15 salaries']
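Each entry of `names` carries two pieces of information separated by a newline: the company name and the number of salaries it is based on. A minimal sketch of splitting them is given here, assuming the 'based on N salaries' wording seen in the output above. The function name `split_company` is hypothetical.

```python
import re

def split_company(text):
    """Split a name cell into (company, salary_count); the count is None if not found."""
    name, _, rest = text.partition("\n")
    match = re.search(r"based on (\d+) salaries", rest)
    count = int(match.group(1)) if match else None
    return name.strip(), count

print(split_company('Ab Inbev\nbased on 28 salaries'))  # ('Ab Inbev', 28)
```

This gives the "Number of salaries" value the task asks for without a separate element query.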

In [142]:
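#note: ["data-v-2bae05f7"] is a string literal, which XPath treats as always true, so this matches every <span>;
#an attribute test would be written //span[@data-v-2bae05f7]
record_tags= driver.find_elements_by_xpath('//span["data-v-2bae05f7"]')
record_tags[:2]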

[<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="d473f218-a856-4a65-9ed2-28e1c5b5b2b6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="ef081e4d-0143-453a-9e4e-ab497a36d9cf")>]

In [143]:
salary_record=[]
for i in record_tags:
    salary_record.append(i.text)
len(salary_record)
salary_record

['1',
 '',
 '',
 '',
 '',
 'Search',
 'Filter salaries by',
 'Salary Comparison',
 '10 results are available, use up and down arrow keys to navigate.',
 '10 results are available, use up and down arrow keys to navigate.',
 'based on 28 salaries',
 ' . ',
 'based on 15 salaries',
 ' . ',
 'based on 25 salaries',
 ' . ',
 'based on 77 salaries',
 ' . ',
 'based on 33 salaries',
 ' . ',
 'based on 52 salaries',
 ' . ',
 'based on 14 salaries',
 ' . ',
 'based on 13 salaries',
 ' . ',
 'based on 43 salaries',
 ' . ',
 'based on 57 salaries',
 ' . ',
 'Salaries for Popular Roles',
 '₹ 2.7L',
 'Low',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '₹ 13.5L',
 'Highest',
 '',
 '',
 '₹ 3.0L',
 'Low',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '₹ 15.9L',
 'Highest',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 

In [137]:
salary_tags= driver.find_elements_by_xpath('//p[@class="averageCtc"]')
salary_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="fbd76cad-cc3c-47aa-9921-a3ff52bcf4a7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="d87d1774-9945-40a4-9366-7b49ee0646fb")>]

In [138]:
salaries=[]
for i in salary_tags:
    salaries.append(i.text)
len(salaries)
salaries[:2]

['₹ 20.3L', '₹ 15.3L']

In [144]:
minsalary_tags= driver.find_elements_by_xpath('//div[@class="value body-medium"]')
minsalary_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="69685882-7073-4396-abcf-322c63dd7dcf")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="32948586-552a-4407-bb7d-6dace7d603a7")>]

In [145]:
min_salaries=[]
for i in minsalary_tags:
    min_salaries.append(i.text)
len(min_salaries)
min_salaries[:2]

['₹ 15.0L', '₹ 25.5L']

In [146]:
#note: this is the same XPath as minsalary_tags, so max_salaries repeats min_salaries
maxsalary_tags= driver.find_elements_by_xpath('//div[@class="value body-medium"]')
maxsalary_tags[:2]

[<selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="a8f38eb3-ea31-48d3-915d-7b6e4a2a18bc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3a5125cb17be21fa0e31a04cbc9b14c2", element="df5024c5-7c89-42fc-b89a-487c0cd4116d")>]

In [147]:
max_salaries=[]
for i in maxsalary_tags:
    max_salaries.append(i.text)
len(max_salaries)
max_salaries[:2]

['₹ 15.0L', '₹ 25.5L']

In [148]:
jobs= pd.DataFrame()
jobs['Company Names']= names[:10]
jobs['Average Salary']= salaries[:10]
jobs['Minimum Salary']=min_salaries[:10]
jobs['Maximum Salary']=max_salaries[:10]
jobs

Unnamed: 0,Company Names,Average Salary,Minimum Salary,Maximum Salary
0,Ab Inbev\nbased on 28 salaries,₹ 20.3L,₹ 15.0L,₹ 15.0L
1,ZS\nbased on 15 salaries,₹ 15.3L,₹ 25.5L,₹ 25.5L
2,Optum\nbased on 25 salaries,₹ 15.1L,₹ 9.5L,₹ 9.5L
3,Fractal Analytics\nbased on 77 salaries,₹ 15.1L,₹ 20.0L,₹ 20.0L
4,Tiger Analytics\nbased on 33 salaries,₹ 14.4L,₹ 11.0L,₹ 11.0L
5,UnitedHealth\nbased on 52 salaries,₹ 13.9L,₹ 21.3L,₹ 21.3L
6,Verizon\nbased on 14 salaries,₹ 12.7L,₹ 9.5L,₹ 9.5L
7,Ganit Business Solutions\nbased on 13 salaries,₹ 12.4L,₹ 22.0L,₹ 22.0L
8,Ericsson\nbased on 43 salaries,₹ 11.9L,₹ 8.3L,₹ 8.3L
9,Deloitte\nbased on 57 salaries,₹ 11.7L,₹ 20.0L,₹ 20.0L
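In the table above the Minimum Salary and Maximum Salary columns are identical, because both lists come from the same XPath. Read in order, the scraped values look like alternating minimum and maximum figures (15.0 and 25.5 around an average of 20.3, then 9.5 and 20.0 around 15.3). A minimal sketch of splitting them on that assumption is given here. The names `split_min_max` and `lakhs` are hypothetical, and the alternation is inferred from the output, not confirmed by the page structure.

```python
def split_min_max(values):
    """Assume values alternate minimum, maximum and split them into two lists."""
    return values[0::2], values[1::2]

def lakhs(text):
    """Convert a string such as '₹ 15.0L' to the float 15.0."""
    return float(text.replace("₹", "").replace("L", "").strip())

# First four scraped values, in the order shown in the table above.
values = ['₹ 15.0L', '₹ 25.5L', '₹ 9.5L', '₹ 20.0L']
mins, maxs = split_min_max(values)

print([lakhs(v) for v in mins])  # [15.0, 9.5]
print([lakhs(v) for v in maxs])  # [25.5, 20.0]
```

Ten companies need the first twenty scraped values, so the split has to happen before the lists are cut to 10.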
