# Q1: Write a Python program to scrape data for the “Data Analyst” job position in the “Bangalore” location.
You have to scrape the job-title, job-location, company_name and experience_required fields for the first 10 jobs. The task is done in the following steps:

1. First get the webpage https://www.naukri.com/
2. Enter “Data Analyst” in the “Skill, Designations, Companies” field and enter “Bangalore” in the “enter the location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally, create a dataframe of the scraped data. Note: all of the above steps have to be done in code; no step is to be done manually.

In [1]:
# let's first install the selenium library
! pip install selenium



In [2]:
import selenium
import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")
import time

In [3]:
driver = webdriver.Chrome('chromedriver.exe')


In [4]:
driver.get('https://www.naukri.com')


In [5]:
search_field_designation = driver.find_element_by_class_name("suggestor-input ")
search_field_designation.send_keys("Data Analyst")

In [6]:
search_field_location = driver.find_element_by_xpath("/html/body/div/div[2]/div[3]/div/div/div[5]/div/div/div/input")
search_field_location.send_keys("Bangalore")

In [8]:
search_button = driver.find_element_by_xpath("/html/body/div/div[2]/div[3]/div/div/div[6]")
search_button.click()

Now let's create 4 empty lists, one for each of the 4 features we have to extract. The scraped data will be stored in these lists:

1. job_titles
2. job_locations
3. company_names
4. experience_list

In [9]:
job_titles = []
job_locations = []
company_names = []
experience_list = []

First we will extract all the tags where we have the job titles. 

In [12]:
#so let's extract all the tags having the job-titles
titles_tag = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']") #locating web element of title
titles_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="9ddab952-11f0-4d7c-9263-924df9e0a3a9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="2fb84f91-3cf1-4930-adfc-9bc58175d664")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="3237570b-df18-4bce-bd71-cddcf413f5c2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="256bd79e-4055-46d0-b2a0-a4016304aba2")>]

In [13]:
for i in titles_tag:
    title = i.text  # visible text of each title element
    job_titles.append(title)
job_titles[0:4]

['Business Data Analyst',
 'EY GDS Data Analyst-Finland based project',
 'Data Analyst - Data and Analytics',
 'Data Analyst - Data and Analytics']

Now we will extract the html tags where we have the company names

In [14]:
#so let's extract all the tags having company names 
companies_tag = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")  # locate the company-name elements
companies_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="04681ccf-9dc3-4539-a642-341eeb8a7c10")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="ad2b1e29-164b-4a71-9589-ffaf4efe9441")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="3a8302e9-9e05-4dca-98da-1a620dec1eae")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="ebc1c7a0-e684-4a67-b76a-b107fb01e035")>]

In [15]:
for i in companies_tag:
    company_name = i.text  # visible text of each company element
    company_names.append(company_name)
company_names[0:4]

['NXP Semiconductors', 'EY', 'Intel', 'Intel']

Now we will extract the html tags where we have the experience required data

In [16]:
# extract all the tags that contain the required experience
experience_tag = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi experience']/span")  # locate the experience elements
experience_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="dee3fac0-a474-4d4e-8932-a78072273198")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="95638141-66d1-4e93-9fc5-d5e0bbc0025a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="a7a78f6c-fce3-485a-a570-81894e895fed")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="9db257ff-5326-4c02-8cec-24e1ffc4301b")>]

In [17]:
for i in experience_tag:
    experience = i.text  # visible text of each experience element
    experience_list.append(experience)
experience_list[0:4]

['2-5 Yrs', '0-1 Yrs', '3-6 Yrs', '3-6 Yrs']
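The experience values come back as free text such as `'2-5 Yrs'`. If you later want to sort or filter on experience, a small parser helps. This is a minimal sketch that assumes every value has the form `<min>-<max> Yrs`, which holds for the rows shown here but is not guaranteed by Naukri; anything else returns `None`. The name `parse_experience` is a hypothetical helper, not part of any library.

```python
import re

# Assumed format: "<min>-<max> Yrs", as in the values scraped above.
EXPERIENCE_PATTERN = re.compile(r'^\s*(\d+)\s*-\s*(\d+)\s*Yrs\s*$')

def parse_experience(text):
    """Return (min_years, max_years) as ints, or None if the text does not match."""
    match = EXPERIENCE_PATTERN.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print([parse_experience(e) for e in ['2-5 Yrs', '0-1 Yrs', '3-6 Yrs', '3-6 Yrs']])
# [(2, 5), (0, 1), (3, 6), (3, 6)]
```

Returning `None` instead of raising keeps unexpected formats visible in the dataframe without stopping the scrape.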

Now we will extract the HTML tags that hold the job location.

In [18]:
# extract all the tags that contain the job locations
location_tag = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']/span[1]")  # locate the location elements
# XPath indexing starts from 1, not 0, so span[1] is the first span
location_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="9ecaf60b-69b3-4049-8170-ac43d65d63d7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="ef684230-f763-4300-8f05-f804943b5814")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="0ca6ae88-02c1-470e-bdb7-b8dbc5376f1c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fe3e3fccfaa84968f5bb0338e0c57e18", element="74408d13-91b9-42c8-8d60-74efb8bfa07a")>]

In [19]:
for i in location_tag:
    location = i.text  # visible text of each location element
    job_locations.append(location)
job_locations[0:4]

['Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru']

In [21]:
#printing length of each list or number of elements present inside each list

print(len(job_titles), len(company_names), len(experience_list), len(job_locations))

20 20 20 20


In [22]:
#import pandas as pd

jobs = pd.DataFrame({})
jobs['Job Title'] = job_titles[:10]
jobs['Company'] = company_names[:10]
jobs['Experience Reqd'] = experience_list[:10]
jobs['Job Location'] = job_locations[:10]

jobs

|   | Job Title | Company | Experience Reqd | Job Location |
|---|---|---|---|---|
| 0 | Business Data Analyst | NXP Semiconductors | 2-5 Yrs | Bangalore/Bengaluru |
| 1 | EY GDS Data Analyst-Finland based project | EY | 0-1 Yrs | Bangalore/Bengaluru |
| 2 | Data Analyst - Data and Analytics | Intel | 3-6 Yrs | Bangalore/Bengaluru |
| 3 | Data Analyst - Data and Analytics | Intel | 3-6 Yrs | Bangalore/Bengaluru |
| 4 | Data Analyst (CSD) | Siemens | 2-6 Yrs | Bangalore/Bengaluru |
| 5 | Data Analyst - Data Science, 3 To 5 Years | Rise Finconnect Private Limited | 2-6 Yrs | Bangalore/Bengaluru |
| 6 | Data Analyst / Business Analyst | METRO Cash & Carry | 3-8 Yrs | Bangalore/Bengaluru |
| 7 | Data Analyst | Cigna TTK | 2-4 Yrs | Bangalore/Bengaluru |
| 8 | SAS/SQL - Healthcare Data Analyst - Bangalore | Genpact | 7-10 Yrs | Bangalore/Bengaluru |
| 9 | Business & Data Analyst - Alteryx (London) | Imaginative Brains LLP | 5-10 Yrs | Bangalore/Bengaluru, Delhi / NCR, Mumbai (All ... |


# Q2: Write a Python program to scrape data for the “Data Scientist” job position in the “Bangalore” location.
You have to scrape the job-title, job-location and company_name fields for the first 10 jobs. The task is done in the following steps:

1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill, Designations, Companies” field and enter “Bangalore” in the “enter the location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally, create a dataframe of the scraped data.

In [23]:
# let's first install the selenium library
! pip install selenium



In [24]:
import selenium
import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")
import time

In [25]:
driver = webdriver.Chrome('chromedriver.exe')

In [26]:
driver.get('https://www.naukri.com')

In [27]:
search_field_designation = driver.find_element_by_class_name("suggestor-input ")
search_field_designation.send_keys("Data Scientist")

In [28]:
search_field_location = driver.find_element_by_xpath("/html/body/div/div[2]/div[3]/div/div/div[5]/div/div/div/input")
search_field_location.send_keys("Bangalore")

In [29]:
search_button = driver.find_element_by_xpath("/html/body/div/div[2]/div[3]/div/div/div[6]")
search_button.click()

Now let's create 3 empty lists, one for each of the 3 features we have to extract. The scraped data will be stored in these lists:

1. job_titles
2. company_names
3. locations_list

In [30]:
job_titles = []
company_names = []
locations_list = []

In [31]:
#so let's extract all the tags having the job-titles
titles_tag = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']") #locating web element of title
titles_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="1b94d501-00b0-47ae-b74f-7b89dff21af2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="d1aacfa0-9a89-492c-9e02-c40ea77e7440")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="fc09ef2b-23e2-414f-9192-abaabf363c91")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="bd63fa87-736c-4f05-80d5-eee60a11efc0")>]

In [32]:
for i in titles_tag:
    title = i.text  # visible text of each title element
    job_titles.append(title)
job_titles[0:4]

['Senior Data Scientist',
 'Data Science - Engineering Manager',
 'AI Technologist Vacancy',
 'Job Opening with Wipro For Data Scientist position']

In [33]:
#so let's extract all the tags having company names 
companies_tag = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")  # locate the company-name elements
companies_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="7cb37ab3-88d8-4ab7-b5dd-e1720ef786e4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="1efe07cb-a78c-4ef7-9f4c-8666fdcd1cdb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="21ec9522-6205-41a5-833b-8eecbf4da020")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="029a8ebf-7983-44d4-866f-9c24b6310a8b")>]

In [34]:
for i in companies_tag:
    company_name = i.text  # visible text of each company element
    company_names.append(company_name)
company_names[0:4]

['Baker Hughes', 'Paytm', 'Wipro', 'Wipro']

In [35]:
# extract all the tags that contain the job locations
location_tag = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']/span[1]")  # locate the location elements
location_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="3b1eb60c-ee70-4f2b-bdc1-65871994ea4b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="e5ca2163-affa-4e8a-a3b3-2f698a030ba6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="b9fc4ae9-44e9-45b9-bc5f-9ed0a139a3db")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bf142a8d817a9d78f086e6682dbcbce8", element="ff3f954e-1e22-4b60-b20d-5d087f6271a3")>]

In [36]:
for i in location_tag:
    location = i.text  # visible text of each location element
    locations_list.append(location)
locations_list[0:4]

['Mumbai, Bangalore/Bengaluru',
 'Noida, Mumbai, Bangalore/Bengaluru',
 'Kolkata, Hyderabad/Secunderabad, Pune, Ahmedabad, Chennai, Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)',
 'Kolkata, Hyderabad/Secunderabad, Chennai, Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)']

In [37]:
#printing length of each list or number of elements present inside each list

print(len(job_titles), len(company_names), len(locations_list))

20 20 20


In [38]:
#import pandas as pd

jobs = pd.DataFrame({})
jobs['Job Title'] = job_titles[:10]
jobs['Company'] = company_names[:10]
jobs['Job Location'] = locations_list[:10]

jobs

|   | Job Title | Company | Job Location |
|---|---|---|---|
| 0 | Senior Data Scientist | Baker Hughes | Mumbai, Bangalore/Bengaluru |
| 1 | Data Science - Engineering Manager | Paytm | Noida, Mumbai, Bangalore/Bengaluru |
| 2 | AI Technologist Vacancy | Wipro | Kolkata, Hyderabad/Secunderabad, Pune, Ahmedab... |
| 3 | Job Opening with Wipro For Data Scientist posi... | Wipro | Kolkata, Hyderabad/Secunderabad, Chennai, Bang... |
| 4 | DATA Scientist with Fraud Analytics Experience | Concentrix Daksh Services | Bangalore/Bengaluru |
| 5 | Data Scientist | Applied Materials | Bangalore/Bengaluru |
| 6 | Data Scientist | Applied Materials | Bangalore/Bengaluru |
| 7 | Data Scientist | Applied Materials | Bangalore/Bengaluru |
| 8 | Data Scientist | Applied Materials | Bangalore/Bengaluru |
| 9 | Principal - Data Scientist | Schneider Electric | Bangalore/Bengaluru |
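Rows 5 to 8 of this result are four identical “Data Scientist / Applied Materials / Bangalore/Bengaluru” entries, so the first 10 rows hold only 7 distinct jobs. They may be separate postings, but if you want 10 distinct rows, one option is to drop exact repeats before slicing. A minimal sketch follows; `first_n_unique` is a hypothetical helper that treats two rows as duplicates only when title, company and location all match.

```python
def first_n_unique(titles, companies, locations, n=10):
    """Return up to n (title, company, location) tuples, skipping exact repeats.

    Order is preserved, so the result is still "the first n" distinct jobs.
    """
    seen = set()
    rows = []
    for row in zip(titles, companies, locations):
        if row in seen:
            continue  # identical to a row we already kept
        seen.add(row)
        rows.append(row)
        if len(rows) == n:
            break
    return rows
```

Usage: `pd.DataFrame(first_n_unique(job_titles, company_names, locations_list), columns=['Job Title', 'Company', 'Job Location'])`. pandas' `jobs.drop_duplicates()` removes the same repeats on a finished dataframe, but deduplicating before slicing to 10 is what keeps 10 rows; with only 20 results scraped, you may need another results page to fill them.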


# Q3: In this question you have to scrape data using the filters available on the webpage as shown below:
You have to use the location and salary filters. You have to scrape data for the “Data Scientist” designation for the first 10 job results, collecting the job-title, job-location, company name and experience required. The location filter to be used is “Delhi/NCR” and the salary filter to be used is “3-6” lakhs. The task will be done as shown in the steps below:

1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill, Designations, and Companies” field.
3. Then click the search button.
4. Then apply the location filter and salary filter by checking the respective boxes.
5. Then scrape the data for the first 10 job results you get.
6. Finally, create a dataframe of the scraped data. Note: all of the above steps have to be done in code; no step is to be done manually.

In [39]:
import selenium
import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")
import time

In [40]:
driver = webdriver.Chrome('chromedriver.exe')

In [41]:
driver.get('https://www.naukri.com')

In [42]:
search_field_designation = driver.find_element_by_class_name("suggestor-input ")
search_field_designation.send_keys("Data Scientist")

In [44]:
search_button = driver.find_element_by_xpath("/html/body/div/div[2]/div[3]/div/div/div[6]")
search_button.click()

In [45]:
location_checkbox = driver.find_element_by_xpath("/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[5]/div[2]/div[3]/label/i")
location_checkbox.click()

In [46]:
salary_checkbox = driver.find_element_by_xpath("/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[6]/div[2]/div[2]/label/i")
salary_checkbox.click()

Now let's create 4 empty lists, one for each of the 4 features we have to extract. The scraped data will be stored in these lists:

1. job_titles
2. company_names
3. locations_list
4. experience_list

In [47]:
job_titles = []
company_names = []
locations_list = []
experience_list = []

In [48]:
#so let's extract all the tags having the job-titles
titles_tag = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']") #locating web element of title
titles_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="5649f099-fd53-40eb-be0a-b45e6a370fae")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="379ad053-fd19-4f44-b380-edd2f7a71c79")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="de59722a-4c03-4a79-8f2c-6ae52fb3bede")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="b3fb028a-5b57-407a-8bf9-5d5a0eb90381")>]

In [49]:
for i in titles_tag:
    title = i.text  # visible text of each title element
    job_titles.append(title)
job_titles[0:4]

['Job Opening with Wipro For Data Scientist position',
 'Data Scientist - Machine learning AI',
 'Data Scientist -Machine Learning with Python',
 'Data Scientist']

In [50]:
#so let's extract all the tags having company names 
companies_tag = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")  # locate the company-name elements
companies_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="2f07cf60-2314-48b8-a0af-3cd8fb6082ae")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="6c97e90d-d97d-4997-9a29-9ea581d33831")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="bff6d4bb-6324-4d7f-81bf-d2691be1fa9a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="bde9de86-5d13-44d4-b715-68a9712b9bf6")>]

In [51]:
for i in companies_tag:
    company_name = i.text  # visible text of each company element
    company_names.append(company_name)
company_names[0:4]

['Wipro',
 'Teq Analytics',
 'Genpact',
 'SS Supply Chain Solutions Pvt. Ltd. (3SC)']

In [52]:
# extract all the tags that contain the required experience
experience_tag = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi experience']/span")  # locate the experience elements
experience_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="308f3464-6e93-4281-a177-4c5ecb6e37bb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="df887fa5-a2dc-4b0d-b7f6-340edc589f4f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="554648d6-8762-482c-b74f-31418eae9930")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="50e6ff44-d13a-4f81-9d03-f83d756d7df1")>]

In [53]:
for i in experience_tag:
    experience = i.text  # visible text of each experience element
    experience_list.append(experience)
experience_list[0:4]

['2-7 Yrs', '3-8 Yrs', '1-4 Yrs', '2-5 Yrs']

In [54]:
# extract all the tags that contain the job locations
location_tag = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']/span[1]")  # locate the location elements
# XPath indexing starts from 1, not 0, so span[1] is the first span
location_tag[0:4]  # slice to display only the first 4 elements

[<selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="95e3acad-fdbc-41bc-b4d5-309f74978abb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="4b17bc8f-2c22-4d87-a542-9af2860db769")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="1c59c3dd-56d5-4880-bdf4-7885fe5dc87a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="8d97d9148742cd0019c0422e7b8ec4e1", element="e0f39a7e-bd7d-4b08-8354-58abe9a4974a")>]

In [55]:
for i in location_tag:
    location = i.text  # visible text of each location element
    locations_list.append(location)
locations_list[0:4]

['Kolkata, Hyderabad/Secunderabad, Chennai, Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)',
 'Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)',
 'Noida, New Delhi, Gurgaon/Gurugram, Delhi / NCR',
 'Pune, Gurgaon/Gurugram, Bangalore/Bengaluru']

In [56]:
#import pandas as pd

jobs = pd.DataFrame({})
jobs['Job Title'] = job_titles[:10]
jobs['Company'] = company_names[:10]
jobs['Experience Reqd'] = experience_list[:10]
jobs['Job Location'] = locations_list[:10]

jobs

|   | Job Title | Company | Experience Reqd | Job Location |
|---|---|---|---|---|
| 0 | Job Opening with Wipro For Data Scientist posi... | Wipro | 2-7 Yrs | Kolkata, Hyderabad/Secunderabad, Chennai, Bang... |
| 1 | Data Scientist - Machine learning AI | Teq Analytics | 3-8 Yrs | Bangalore/Bengaluru, Delhi / NCR, Mumbai (All ... |
| 2 | Data Scientist -Machine Learning with Python | Genpact | 1-4 Yrs | Noida, New Delhi, Gurgaon/Gurugram, Delhi / NCR |
| 3 | Data Scientist | SS Supply Chain Solutions Pvt. Ltd. (3SC) | 2-5 Yrs | Pune, Gurgaon/Gurugram, Bangalore/Bengaluru |
| 4 | Data Scientist - MIND Infotech | MOTHERSONSUMI INFOTECH & DESIGNS LIMITED | 4-8 Yrs | Noida |
| 5 | Data Scientist - MIND Infotech | MOTHERSONSUMI INFOTECH & DESIGNS LIMITED | 4-8 Yrs | Noida |
| 6 | Data Scientist - Predictive Analytics | Confidential | 1-6 Yrs | Noida, Mumbai, Chandigarh, Hyderabad/Secundera... |
| 7 | Data Scientist - Internet Jobs - II | Jobs Territory | 3-6 Yrs | Bangalore/Bengaluru, Delhi / NCR, Mumbai (All ... |
| 8 | Machine Learning Engineer \| Data Engineer \| Da... | Tidyquant (OPC) Private Limited | 1-3 Yrs | Chennai, Bangalore/Bengaluru, Delhi / NCR(Sect... |
| 9 | Dot Net Developer | Nibha Infotech Private Limited | 3-8 Yrs | Gurgaon/Gurugram, Delhi / NCR |
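The location filter is applied by clicking a checkbox found through an absolute XPath, so it is worth checking that each scraped row actually mentions the Delhi/NCR region. A minimal sketch: split the comma-separated location string and test each part against a set of NCR place-name prefixes. `NCR_PREFIXES` is an assumption chosen to match the spellings seen in this output (`Delhi / NCR`, `New Delhi`, `Noida`, `Gurgaon/Gurugram`) plus two other NCR cities; it is not a list published by Naukri.

```python
# Assumed NCR place-name prefixes; the spellings follow the strings scraped above.
NCR_PREFIXES = ('Delhi', 'New Delhi', 'Noida', 'Gurgaon', 'Gurugram', 'Ghaziabad', 'Faridabad')

def mentions_ncr(location_text):
    """True if any comma-separated part of a location string starts with an NCR name."""
    parts = (part.strip() for part in location_text.split(','))
    # str.startswith accepts a tuple and returns True if any prefix matches
    return any(part.startswith(NCR_PREFIXES) for part in parts)
```

Prefix matching (rather than equality) also accepts entries such as `Delhi / NCR(...)` where extra text follows the city. Usage: `[i for i, loc in enumerate(locations_list[:10]) if not mentions_ncr(loc)]` lists the rows that fail the check. The `...` in the table above is only pandas display truncation; `locations_list` holds the full strings.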


# Q4: Scrape data for the first 100 sunglasses listings on flipkart.com. You have to scrape four attributes:
1. Brand
2. Product Description
3. Price
4. Discount

In [57]:
import selenium
import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")
import time

In [58]:
driver = webdriver.Chrome('chromedriver.exe')

In [59]:
driver.get('https://www.flipkart.com')

In [62]:
try:
    # close the pop-up shown on page load, if any
    button = driver.find_element_by_xpath("//button[@class='_2KpZ6l _2doB4z']")
    button.click()
except Exception:
    pass  # no pop-up this time, continue

In [63]:
search_sunglass = driver.find_element_by_xpath("//input[@class='_3704LK']")
search_sunglass.send_keys("Sunglasses")
search_sunglass.send_keys(u'\ue007')  # '\ue007' is the WebDriver code for the Enter key (Keys.ENTER)

In [65]:
desc1 = []
for page in range(4):  # 4 result pages x 40 cards = 160 listings, enough for 100
    description = driver.find_elements_by_xpath("//div[@class='_2B099V']")
    for i in description:
        desc1.append(i.text.split('\n'))  # one list of text lines per product card
    time.sleep(10)
    next_button = driver.find_element_by_xpath("/html/body/div[1]/div/div[3]/div[1]/div[2]/div[12]/div/div/nav/a[11]")
    next_button.click()  # go to the next results page
    time.sleep(15)  # wait for the next page to load
print(len(desc1))

160


In [66]:
brand = [i[0] for i in desc1]

In [67]:
description = [i[1] for i in desc1]

In [68]:
price = [i[2] for i in desc1]

In [69]:
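prod_price = [i.split('₹', 2)[1] for i in price]  # text between the first and second '₹' is the selling price
discount = [i[-7:] for i in price]  # last 7 characters, e.g. '80% off'; assumes a two-digit discount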
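The slicing above depends on the layout of the card's price text, where selling price, MRP and discount are fused into one string such as `'₹559₹99944% off'` (₹559, MRP ₹999, 44% off). Taking the last 7 characters works only for two-digit discounts: for a one-digit discount like `5% off` it would also pick up the last digit of the MRP. The sketch below resolves the fused digits by keeping the split that is arithmetically consistent, assuming the stated discount is within one percentage point of `(mrp - price) / mrp`. `parse_flipkart_price` is a hypothetical helper name.

```python
import re

def parse_flipkart_price(text):
    """Split a fused Flipkart price string such as '₹559₹99944% off'.

    Returns (price, mrp, discount_percent) as ints. mrp and discount are None
    when the string carries no '<digits>% off' part.
    """
    parts = text.split('₹')[1:]              # ['559', '99944% off']
    price = int(parts[0].replace(',', ''))
    if len(parts) < 2:
        return price, None, None
    match = re.fullmatch(r'([\d,]+)% off', parts[1].strip())
    if match is None:
        return price, None, None
    digits = match.group(1)                  # MRP and discount fused: '999' + '44'
    # Try a 2-digit discount first, then a 1-digit one, and keep the split whose
    # implied discount (mrp - price) / mrp is within 1 point of the stated one.
    for width in (2, 1):
        mrp_text = digits[:-width].replace(',', '')
        discount_text = digits[-width:]
        if not mrp_text.isdigit() or not discount_text.isdigit():
            continue
        mrp, discount = int(mrp_text), int(discount_text)
        if mrp > price and abs((mrp - price) * 100 / mrp - discount) <= 1:
            return price, mrp, discount
    return price, None, None
```

Usage: `parsed = [parse_flipkart_price(p) for p in price]` gives numeric columns for price, MRP and discount. The one-point tolerance is there because the site may round or truncate the percentage it displays.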

In [72]:
print(len(brand), len(description), len(prod_price), len(discount))

160 160 160 160


In [73]:
Flipkart = pd.DataFrame()
Flipkart['Brand'] = brand[:100]
Flipkart['Description'] = description[:100]
Flipkart['Price'] = prod_price[:100]
Flipkart['Discount'] = discount[:100]
Flipkart

|   | Brand | Description | Price | Discount |
|---|---|---|---|---|
| 0 | Singco India | Gradient, Toughened Glass Lens, UV Protection ... | 598 | 80% off |
| 1 | Arnette | Others Oval Sunglasses (53) | 2879 | 54% off |
| 2 | Mi | Polarized Aviator Sunglasses (Free Size) | 839 | 30% off |
| 3 | SRPM | UV Protection Wayfarer Sunglasses (50) | 211 | 83% off |
| 4 | Fastrack | UV Protection Wayfarer Sunglasses (Free Size) | 799 | 20% off |
| ... | ... | ... | ... | ... |
| 95 | Arnette | Mirrored Rectangular Sunglasses (34) | 2834 | 54% off |
| 96 | Urbanic | Others Retro Square Sunglasses (Free Size) | 399 | 55% off |
| 97 | Ray-Ban | Gradient Round Sunglasses (54) | 9259 | 10% off |
| 98 | kingsunglasses | Mirrored, UV Protection Wayfarer Sunglasses (F... | 269 | 86% off |


# Q5: Scrape 100 reviews from flipkart.com for the iPhone 11 phone.

In [74]:
import selenium
import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")
import time

In [77]:
driver = webdriver.Chrome('chromedriver.exe')

In [82]:
driver.get("https://www.flipkart.com/apple-iphone-11-black-64-gb-includes-earpods-power-adapter/p/itm0f37c2240b217?pid=MOBFKCTSVZAXUHGR&lid=LSTMOBFKC")

In [87]:
review_button = driver.find_element_by_xpath("/html/body/div[1]/div/div[3]/div[1]/div[2]/div[9]/div/div/div[5]/div/a/div/span")
review_button.click()

In [88]:
time.sleep(5)

In [90]:
ratings=[]
review_summary=[]
full_review = []
for page in range(11):  # 11 review pages x 10 reviews = up to 110 reviews
    rating = driver.find_elements_by_xpath("//div[@class='_3LWZlK _1BLPMq']")
    for i in rating:
        ratings.append(i.text)
    summary = driver.find_elements_by_xpath("//p[@class='_2-N8zT']")
    for i in summary:
        review_summary.append(i.text)
    review = driver.find_elements_by_xpath("//div[@class='t-ZTKy']")
    for i in review:
        full_review.append(i.text.replace('\n', '.'))  # flatten multi-line reviews
    time.sleep(5)
    next_button = driver.find_element_by_xpath("/html/body/div[1]/div/div[3]/div/div/div[2]/div[13]/div/div/nav/a[11]")
    next_button.click()  # go to the next page of reviews
    time.sleep(15)  # wait for the next page to load

In [91]:
print(len(ratings))
print(len(review_summary))
print(len(full_review))

107
110
110
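The three lists have different lengths: 107 ratings against 110 summaries and 110 reviews. Possibly some rating elements carry a different `class` attribute (the XPath uses an exact `@class=` match) and are skipped. If a rating is missing in the middle of a page, pairing the lists by position shifts every later rating onto the wrong review; row 98 in the dataframe below (a 5 rating next to “Not recommended at all”) looks like such a mismatch. A minimal safeguard is to refuse to combine lists of unequal length; `zip_columns` is a hypothetical helper for that.

```python
def zip_columns(columns, limit=None):
    """Combine equal-length lists into row tuples, refusing to guess on mismatch.

    columns: dict mapping column name -> list of scraped values.
    Raises ValueError if the lists differ in length, because positional pairing
    would then attach values to the wrong rows.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'column lengths differ: {lengths}')
    rows = list(zip(*columns.values()))
    return rows if limit is None else rows[:limit]
```

With the lengths printed above, `zip_columns({'Ratings': ratings, 'Review Summary': review_summary, 'Full Review': full_review}, limit=100)` raises instead of silently misaligning. A more robust fix is to loop over each review card and read its rating, summary and text from inside that card, recording `None` when a field is absent.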


In [92]:
iphone = pd.DataFrame()
iphone['Ratings'] = ratings[:100]
iphone['Review Summary'] = review_summary[:100]
iphone['Full Review'] = full_review[:100]
iphone

|   | Ratings | Review Summary | Full Review |
|---|---|---|---|
| 0 | 5 | Simply awesome | Really satisfied with the Product I received..... |
| 1 | 5 | Best in the market! | Great iPhone very snappy experience as apple k... |
| 2 | 5 | Perfect product! | Amazing phone with great cameras and better ba... |
| 3 | 5 | Worth every penny | Previously I was using one plus 3t it was a gr... |
| 4 | 5 | Highly recommended | What a camera .....just awesome ..you can feel... |
| ... | ... | ... | ... |
| 95 | 5 | Worth every penny | Really a giant for battery backup and really g... |
| 96 | 4 | Wonderful | loved it |
| 97 | 5 | Wonderful | camera is very good. |
| 98 | 5 | Not recommended at all | Very bad experience on buying iphone 11 on fli... |


# Q6: Scrape data for the first 100 sneakers you find when you visit flipkart.com and search for “sneakers” in the search field.
You have to scrape 4 attributes of each sneaker:
1. Brand
2. Product Description
3. Price

In [93]:
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.flipkart.com')
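try:
    # close the pop-up shown on page load, if any
    button = driver.find_element_by_xpath("//button[@class='_2KpZ6l _2doB4z']")
    button.click()
except Exception:
    pass  # no pop-up this time, continue
search_sneakers = driver.find_element_by_xpath("//input[@class='_3704LK']")
search_sneakers.send_keys("Sneakers")
search_sneakers.send_keys(u'\ue007')  # '\ue007' is the WebDriver code for the Enter key (Keys.ENTER)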

In [94]:
desc1 = []
for page in range(4):  # 4 result pages x 40 cards = 160 listings, enough for 100
    description = driver.find_elements_by_xpath("//div[@class='_2B099V']")
    for i in description:
        desc1.append(i.text.split('\n'))  # one list of text lines per product card
    time.sleep(10)
    next_button = driver.find_element_by_xpath("/html/body/div[1]/div/div[3]/div[1]/div[2]/div[12]/div/div/nav/a[11]")
    next_button.click()  # go to the next results page
    time.sleep(15)  # wait for the next page to load
print(len(desc1))

160


In [95]:
desc1

[['Hot & Knot',
  'High Top Casual Party Wear Boot Stylish Sneakers For Me...',
  '₹559₹99944% off',
  'Free delivery',
  'Deal of the Day'],
 ['RODDICK SHOES',
  'Fashion Outdoor Canvas Casual Light Weight Lace-up Even...',
  '₹485₹99951% off',
  'Free delivery'],
 ['Magnolia',
  'Sneakers For Men',
  '₹374₹99962% off',
  'Delivery by 2 PM, Tomorrow'],
 ['BRUTON',
  'Lightweight Pack Of 1 Trendy Sneakers Sneakers For Men',
  '₹188₹59968% off',
  'Free delivery',
  'Deal of the Day'],
 ['BIRDE',
  'Stylish Comfortable Lightweight, Breathable Walking Sho...',
  '₹314₹99968% off',
  'Free delivery',
  'Deal of the Day'],
 ['BRUTON',
  'Modern Trendy Sneakers Shoes Sneakers For Men',
  '₹299₹1,29976% off',
  'Free delivery',
  'Deal of the Day'],
 ['Layasa',
  'Sneakers For Men',
  '₹399₹99960% off',
  'Free delivery',
  'Lowest price since launch'],
 ['ZF - ALFIYA', 'Sneakers For Men', '₹439₹99956% off', 'Free delivery'],
 ['Robbie jones',
  'Casual Sneakers Green Shoes For Men And Boys 

In [96]:
brand = [i[0] for i in desc1]

In [97]:
description = [i[1] for i in desc1]

In [98]:
price = [i[2] for i in desc1]

In [99]:
prod_price = [i.split('₹', 2)[1] for i in price]

In [105]:
print(len(brand), len(description), len(prod_price))


160 160 160


In [106]:
Flipkart = pd.DataFrame()
Flipkart['Brand'] = brand[:100]
Flipkart['Description'] = description[:100]
Flipkart['Price'] = prod_price[:100]
Flipkart

|   | Brand | Description | Price |
|---|---|---|---|
| 0 | Hot & Knot | High Top Casual Party Wear Boot Stylish Sneake... | 559 |
| 1 | RODDICK SHOES | Fashion Outdoor Canvas Casual Light Weight Lac... | 485 |
| 2 | Magnolia | Sneakers For Men | 374 |
| 3 | BRUTON | Lightweight Pack Of 1 Trendy Sneakers Sneakers... | 188 |
| 4 | BIRDE | Stylish Comfortable Lightweight, Breathable Wa... | 314 |
| ... | ... | ... | ... |
| 95 | Ardeo | A STAR BLACK SNEAKERS FOR MAN Sneakers For Men | 548 |
| 96 | Chevit | Speed Set of 5 Pairs Sneakers Outdoors Casuals... | 719 |
| 97 | SPARX | SM-322 Sneakers For Men | 882 |
| 98 | asics | STORMER LS Sneakers For Men | 2558 |


# Q7: Go to the link https://www.myntra.com/shoes
Set the Price filter to “Rs. 7149 to Rs. 14099” and the Color filter to “Black”, as shown in the image below. Then scrape the data for the first 100 shoes you get. The data should include the “Brand” of the shoe, a short shoe description and the price of the shoe, as shown in the image below.

In [138]:
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.myntra.com/shoes')
time.sleep(10)

In [143]:
price_selector = driver.find_element_by_xpath("/html/body/div[2]/div/div[1]/main/div[3]/div[1]/section/div/div[5]/ul/li[2]/label/div")
price_selector.click()

In [142]:
color_selector = driver.find_element_by_xpath("/html/body/div[2]/div/div[1]/main/div[3]/div[1]/section/div/div[6]/ul/li[1]/label/div")
color_selector.click()

In [144]:
desc1 = []
for page in range(3):
    time.sleep(5)
    if page == 0:
        # extra wait on the first pass: without it the elements were not loaded yet
        # and a StaleElementReferenceException was raised
        time.sleep(45)
    # NOTE: nothing in this loop moves to the next results page, so each pass
    # re-reads the products currently on screen
    description = driver.find_elements_by_xpath("//div[@class='product-productMetaInfo']")
    for i in description:
        desc1.append(i.text.split('\n'))  # one list of text lines per product
print(len(desc1))

150


In [145]:
brand = [i[0] for i in desc1]

In [146]:
description = [i[1] for i in desc1]

In [147]:
price = [i[2] for i in desc1]
prod_price = [i.split('Rs.', 2)[1].strip() for i in price]  # selling price: the text after the first 'Rs.'

In [148]:
print(len(brand), len(description), len(prod_price))

150 150 150


In [149]:
Myntra = pd.DataFrame()
Myntra['Brand'] = brand[:100]
Myntra['Description'] = description[:100]
Myntra['Price'] = prod_price[:100]
Myntra

|   | Brand | Description | Price |
|---|---|---|---|
| 0 | Skechers | Men Go Walk 5 Walking Shoes | 8499 |
| 1 | Skechers | Men ENIGMA Running Shoes | 7124 |
| 2 | Skechers | Men Max Cushioning Running | 8999 |
| 3 | Puma | Men Running Shoes | 9999 |
| 4 | Nike | Men Zoom C Pro HC Tennis Shoes | 7116 |
| ... | ... | ... | ... |
| 95 | Geox | Men Textured Leather Driving Shoes | 7693 |
| 96 | Geox | Men Textured Leather Driving Shoes | 6993 |
| 97 | Geox | Men Textured Leather Slip-On Sneakers | 8991 |
| 98 | Geox | Men Textured Leather Driving Shoes | 7693 |
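Several prices in this table (7124 in row 1, 7116 in row 4, 6993 in row 96) are below the lower bound of the Rs. 7149 to Rs. 14099 filter. One possible explanation, not verified here, is that Myntra applies the range to a different price than the discounted one shown first on the card. A quick check that flags rows outside the requested range makes such cases visible; `out_of_range` is a hypothetical helper, and the default bounds are the ones from the task.

```python
def out_of_range(prices, low=7149, high=14099):
    """Return (index, price) pairs whose price falls outside [low, high].

    prices: scraped price strings such as '8499' or ' 7124'; commas are ignored.
    """
    flagged = []
    for index, text in enumerate(prices):
        value = int(text.replace(',', '').strip())
        if not low <= value <= high:
            flagged.append((index, value))
    return flagged

print(out_of_range(['8499', '7124', '8999', '9999', '7116']))
# [(1, 7124), (4, 7116)]
```

Usage: `out_of_range(prod_price[:100])` lists every row of the dataframe that does not satisfy the price filter.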


# Q8: Go to the webpage https://www.amazon.in/
Enter “Laptop” in the search field and then click the search icon. Then set the CPU Type filter to “Intel Core i7” and “Intel Core i9”. After setting the filters, scrape the data for the first 10 laptops. You have to scrape 3 attributes for each laptop:

1. Title
2. Ratings
3. Price

In [279]:
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.amazon.in/')

In [280]:
search_field_designation = driver.find_element_by_xpath("/html/body/div[1]/header/div/div[1]/div[2]/div/form/div[2]/div[1]/input")
search_field_designation.send_keys("Laptop")

In [281]:
search_button = driver.find_element_by_xpath("/html/body/div[1]/header/div/div[1]/div[2]/div/form/div[3]/div/span/input")
search_button.click()

In [282]:
cpu_selector = driver.find_element_by_xpath("/html/body/div[1]/div[2]/div[1]/div[2]/div/div[3]/span/div[1]/div/div/div[6]/ul[4]/li[13]/span/a/div/label/i")
cpu_selector.click()

In [283]:
titles = []
ratings = []
price = []

In [284]:
titles_tag = driver.find_elements_by_xpath("//span[@class='a-size-medium a-color-base a-text-normal']") #locating web element of title
titles_tag[0:4] #using range to print only top 4 results

[<selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="42aa3146-6f31-44ac-ad7e-494b9222f9e2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="74600991-f0ff-4bd2-ae4e-212249e0fac1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="24568b07-4855-4ca7-bc94-ffa472f1d507")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="856f187f-11da-4958-9088-ab6118898edb")>]

In [285]:
for i in titles_tag:
    title = i.text  # visible text of each title element
    titles.append(title)
titles[0:4]

['ASUS TUF Gaming F15 (2021), 15.6" (39.62 cms) FHD 144Hz, Intel Core i7-11600H 11th Gen, 4GB RTX 3050 Graphics, Gaming Laptop (16GB/512GB SSD/Windows 10/Office 2019/Gray/2.3 Kg), FX566HCB-HN299TS',
 'ASUS VivoBook 14 (2021), 14-inch (35.56 cms) FHD, Intel Core i7-1065G7 10th Gen, Thin and Light Laptop (16GB/512GB SSD/Integrated Graphics/Office 2021/Windows 11/Silver/1.6 Kg), X415JA-EK701WS',
 'Lenovo IdeaPad 5 Pro 11th Gen Intel Core i7 14 inches QHD IPS Thin and Light Laptop (16GB/512GB SSD/Iris Xe Graphics/Windows 11/Office 2021/Backlit/300Nits/Storm Grey/1.41Kg), 82L3006YIN',
 'Samsung Galaxy Book2 Intel 12th Gen core i7 39.6cm (15.6") FHD LED Thin & Light Laptop (16 GB/512 GB SSD/Windows 11/MS Office/Backlit Keyboard/Fingerprint Reader/Silver/1.55Kg), NP750XED-KC2IN']

In [286]:
price_tag = driver.find_elements_by_xpath("//span[@class='a-price']")  # locating the price elements
price_tag[0:4] #using range to print only top 4 results

[<selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="0f5e871b-e050-46bd-81c6-66a3d973bad2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="c3911677-7377-4405-8c10-341a890514a0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="2140201d-c21f-4d6f-b515-73ade66df1a8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="453ad00ec05d4239b1a88b526cab859a", element="303c6bb4-bc84-4889-9d26-f62bb0bbaec9")>]

In [287]:
for i in price_tag:
    cost = i.text  # visible text of each price element
    price.append(cost)
price[0:4]

['₹83,990', '₹57,490', '₹75,309', '₹79,990']
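The scraped prices are strings such as `'₹83,990'`, so they cannot be compared or averaged directly. A small conversion sketch (`rupees_to_int` is a hypothetical helper; it keeps only the whole-rupee part if a decimal part ever appears):

```python
import re


def rupees_to_int(text):
    """Convert a price string such as '₹83,990' to 83990 (None if no digits).

    Only the whole-rupee part is kept: '₹83,990.50' -> 83990.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


print([rupees_to_int(p) for p in ['₹83,990', '₹57,490', '₹75,309', '₹79,990']])
# [83990, 57490, 75309, 79990]
```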

In [288]:
print(len(titles), len(price))

24 24


In [290]:
laptop = pd.DataFrame()
laptop['Title'] = titles[:10]
laptop['Price'] = price[:10]
laptop

|   | Title | Price |
|---|-------|-------|
| 0 | ASUS TUF Gaming F15 (2021), 15.6" (39.62 cms) ... | ₹83,990 |
| 1 | ASUS VivoBook 14 (2021), 14-inch (35.56 cms) F... | ₹57,490 |
| 2 | Lenovo IdeaPad 5 Pro 11th Gen Intel Core i7 14... | ₹75,309 |
| 3 | Samsung Galaxy Book2 Intel 12th Gen core i7 39... | ₹79,990 |
| 4 | LG Gram 17 Intel Evo 11th Gen i7 Thin & Light ... | ₹88,490 |
| 5 | HP Pavilion x360 11th Gen Intel Core i7 14 inc... | ₹84,490 |
| 6 | LG Gram 16 Intel Evo 11th Gen i7 Thin & Light ... | ₹86,490 |
| 7 | ASUS Vivobook X515JA-EJ701WS Intel Core I7-106... | ₹57,290 |
| 8 | HP Pavilion 15 12th Gen Intel Core i7 16GB SDR... | ₹89,009 |
| 9 | Lenovo IdeaPad Flex 5 11th Gen Intel Core i7 1... | ₹81,490 |
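The task also asks for each laptop's Ratings, which the cells above do not collect. Once the rating text has been scraped it still has to be turned into a number. A sketch of that parsing step, assuming the text reads like `4.3 out of 5 stars` (an assumption about Amazon's star-rating labels that should be checked against the live page); listings without a rating map to `None` so the list stays aligned with the titles:

```python
import re


def parse_star_rating(text):
    """Return the leading score from text like '4.3 out of 5 stars', else None."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5\b", text or "")
    return float(match.group(1)) if match else None


# Hypothetical scraped values; '' stands for a listing with no rating shown
raw = ["4.3 out of 5 stars", "", "3.9 out of 5 stars"]
print([parse_star_rating(r) for r in raw])  # [4.3, None, 3.9]
```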



# Q9: Write a python program to scrape data for the first 10 job results for the Data Scientist designation in Noida.

In [203]:
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.ambitionbox.com/')
time.sleep(5)

In [204]:
jobs = driver.find_element_by_xpath("//a[@class='link jobs']")
jobs.click()

In [205]:
designation = driver.find_element_by_xpath("//input[@class='input tt-input']")
designation.send_keys("Data Scientist")

In [208]:
search = driver.find_element_by_xpath("//button[@class='ab_btn search-btn round']")
search.click()

In [210]:
location = driver.find_element_by_xpath("//div[@title='Location']")
location.click()

In [211]:
location_noida = driver.find_element_by_xpath("//input[@placeholder='Search locations']")
location_noida.send_keys("Noida")

In [212]:
label_noida = driver.find_element_by_xpath("//input[@id='location_Noida']")
label_noida.click()

In [None]:
more_jobs = driver.find_element_by_xpath("//button[@class='ab_btn load-more-btn invert']")
more_jobs.click()

In [214]:
company = driver.find_elements_by_xpath("//p[@class='company body-medium']")
company_name = []
for i in company:
    company_name.append(i.text)

In [215]:
company_name

['GENPACT India Private Limited',
 'Optum Global Solutions (India) Private Limited',
 'GENPACT India Private Limited',
 'Hcl Technologies Limited',
 'EXL Services.com ( I ) Pvt. Ltd.',
 'Paytm',
 'Om Software Internet Solutions Private Limited',
 'Paytm',
 'MOTHERSONSUMI INFOTECH & DESIGNS LIMITED',
 'Ashkom Media India Private Limited']

In [216]:
other_info = driver.find_elements_by_xpath("//div[@class='other-info']")
info = []
for i in other_info:
    info.append(i.text.split('\n'))

In [217]:
days = [i[-1] for i in info]
days_ago = [i.split('·', 2)[0] for i in days]
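The labels produced by this split (for example `8d ago`) are strings, and the split can leave trailing spaces. To sort or filter jobs by posting age, a small parser helps. This sketch assumes the `<number>d ago` form seen in this notebook's output and returns `None` for anything else; other forms (hours, weeks, "today") are not handled. `days_from_label` is a hypothetical name.

```python
import re


def days_from_label(label):
    """'8d ago' -> 8; returns None for labels not in '<n>d ago' form."""
    match = re.search(r"(\d+)\s*d\s+ago", label)
    return int(match.group(1)) if match else None


print([days_from_label(s) for s in ["8d ago", "16d ago ", "29d ago", "Just now"]])
# [8, 16, 29, None]
```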

In [218]:
rate = driver.find_elements_by_xpath("//div[@class='rating-wrapper']")


In [219]:
rating = []
for i in rate:
    rating.append(i.text.split('\n'))
rating[3:]

[['4.0', 'based on 19.2k Reviews'],
 ['4.0', '(19.2k Reviews)'],
 ['4.2', '(1.8k Reviews)'],
 ['4.0', '(19.2k Reviews)'],
 ['3.9', '(20.6k Reviews)'],
 ['3.9', '(4.8k Reviews)'],
 ['3.7', '(4.1k Reviews)'],
 ['4.5', '(44 Reviews)'],
 ['3.7', '(4.1k Reviews)'],
 ['3.3', '(280 Reviews)'],
 ['3.5', '(24 Reviews)']]

In [220]:
ratings = [i[0] for i in rating]
ratings = ratings[3:]
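Slicing with a fixed `[3:]` assumes exactly three non-job rating widgets precede the job cards. In the `rating[3:]` output above, the first entry (`'based on 19.2k Reviews'`) has a different shape from the job-card entries (`'(19.2k Reviews)'`), and in the resulting table the repeated companies get different ratings (GENPACT 4.0 and 4.2, Paytm 3.9 and 4.5), which suggests the ratings are shifted by one position relative to the companies. A sketch that selects ratings by shape instead of position, assuming every job card shows its review count in parentheses (the first three rows are not shown above, so they should be checked as well); `job_card_ratings` is a hypothetical helper:

```python
def job_card_ratings(rating_rows):
    """Keep the score from rows shaped like ['4.0', '(19.2k Reviews)']."""
    scores = []
    for row in rating_rows:
        # Each row is one rating widget's text split on newlines
        if len(row) >= 2 and row[1].startswith("(") and row[1].endswith("Reviews)"):
            scores.append(row[0])
    return scores


sample = [['4.0', 'based on 19.2k Reviews'],  # different shape: not a job card
          ['4.0', '(19.2k Reviews)'],
          ['4.2', '(1.8k Reviews)'],
          ['4.0', '(19.2k Reviews)']]
print(job_card_ratings(sample))  # ['4.0', '4.2', '4.0']
```

Used as `ratings = job_card_ratings(rating)`, the list no longer depends on how many widgets sit above the results.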

In [221]:
ambitionbox = pd.DataFrame()
ambitionbox['Company Name'] = company_name[:10]
ambitionbox['Days Ago'] = days_ago[:10]
ambitionbox['Ratings'] = ratings[:10]
ambitionbox

|   | Company Name | Days Ago | Ratings |
|---|--------------|----------|---------|
| 0 | GENPACT India Private Limited | 8d ago | 4.0 |
| 1 | Optum Global Solutions (India) Private Limited | 16d ago | 4.0 |
| 2 | GENPACT India Private Limited | 15d ago | 4.2 |
| 3 | Hcl Technologies Limited | 18d ago | 4.0 |
| 4 | EXL Services.com ( I ) Pvt. Ltd. | 29d ago | 3.9 |
| 5 | Paytm | 11d ago | 3.9 |
| 6 | Om Software Internet Solutions Private Limited | 19d ago | 3.7 |
| 7 | Paytm | 29d ago | 4.5 |
| 8 | MOTHERSONSUMI INFOTECH & DESIGNS LIMITED | 16d ago | 3.7 |
| 9 | Ashkom Media India Private Limited | 15d ago | 3.3 |


# Q10: Write a python program to scrape the salary data for Data Scientist designation.

In [229]:
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.ambitionbox.com/')

In [230]:
salaries = driver.find_element_by_xpath("//a[@class='link salaries']")
salaries.click()

In [237]:
designation = driver.find_element_by_xpath("/html/body/div/div/div/main/section[1]/div[2]/div[1]/span/input")
designation.send_keys("Data Scientist")

In [None]:
data_scientist = driver.find_element_by_xpath("/html/body/div/div/div/div[2]/div/div/div/div[1]/span/div/div[2]/div[2]/div[2]/div/p")
data_scientist.click()

In [240]:
search_button = driver.find_element_by_xpath("/html/body/div/div/div/main/section[1]/div[2]/div[1]/i[1]")
search_button.click()

In [241]:
result = driver.find_elements_by_xpath("//div[@class='result-row']")
results = []
for i in result:
    results.append(i.text.split('\n'))

In [242]:
company_name = [i[0] for i in results]


In [243]:
salary_record = [i[1] for i in results]


In [244]:
average_salary = [i[4] for i in results]


In [245]:
min_salary = [i[5] for i in results]


In [246]:
max_salary = [i[6] for i in results]


In [247]:
exp_reqd = [i[3] for i in results]


In [248]:
Ambition_Box = pd.DataFrame()
Ambition_Box['Company Name'] = company_name
Ambition_Box['Total Salary Record'] = salary_record
Ambition_Box['Average Salary'] = average_salary
Ambition_Box['Minimum Salary'] = min_salary
Ambition_Box['Maximum Salary'] = max_salary
Ambition_Box['Experience Required'] = exp_reqd
Ambition_Box

|   | Company Name | Total Salary Record | Average Salary | Minimum Salary | Maximum Salary | Experience Required |
|---|--------------|---------------------|----------------|----------------|----------------|---------------------|
| 0 | Walmart | based on 12 salaries | 3 yrs exp | ₹ 30.2L | ₹ 25.0L | . |
| 1 | Ab Inbev | based on 33 salaries | 3-4 yrs exp | ₹ 20.6L | ₹ 15.0L | . |
| 2 | American Express | based on 10 salaries | 4 yrs exp | ₹ 19.9L | ₹ 14.1L | . |
| 3 | ZS | based on 15 salaries | 2 yrs exp | ₹ 16.7L | ₹ 11.0L | . |
| 4 | Optum | based on 33 salaries | 3-4 yrs exp | ₹ 16.1L | ₹ 11.0L | . |
| 5 | Reliance Jio | based on 21 salaries | 3-4 yrs exp | ₹ 15.7L | ₹ 5.6L | . |
| 6 | Fractal Analytics | based on 83 salaries | 2-4 yrs exp | ₹ 15.4L | ₹ 10.0L | . |
| 7 | Tiger Analytics | based on 50 salaries | 2-4 yrs exp | ₹ 14.8L | ₹ 9.0L | . |
| 8 | UnitedHealth | based on 57 salaries | 2-4 yrs exp | ₹ 14.0L | ₹ 8.3L | . |
| 9 | EXL Service | based on 10 salaries | 4 yrs exp | ₹ 13.3L | ₹ 8.9L | . |
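The table shows that the positional indices do not line up with the row layout: the Average Salary column holds experience strings, Minimum Salary is larger than Maximum Salary in every row, and Experience Required is just `.`. Classifying each piece of a row by its pattern avoids depending on positions. A sketch, assuming the pieces look like those in the table (`based on 12 salaries`, `3-4 yrs exp`, `₹ 30.2L`); which collected amount is the average, minimum, or maximum still has to be confirmed against the page, so amounts are returned in page order. `parse_salary_row` and the sample row order are hypothetical.

```python
import re

RECORDS_RE = re.compile(r"based on \d+ salaries")
EXPERIENCE_RE = re.compile(r"\d+(?:-\d+)? yrs? exp")
AMOUNT_RE = re.compile(r"₹\s*(\d+(?:\.\d+)?)L")


def parse_salary_row(fields):
    """Sort the text pieces of one salary row into named fields by pattern."""
    row = {"company": fields[0] if fields else None,
           "records": None, "experience": None, "amounts_lakh": []}
    for text in fields[1:]:
        text = text.strip()
        if RECORDS_RE.fullmatch(text):
            row["records"] = text
        elif EXPERIENCE_RE.fullmatch(text):
            row["experience"] = text
        else:
            amount = AMOUNT_RE.fullmatch(text)
            if amount:
                # Salary figures in lakh rupees, kept in page order
                row["amounts_lakh"].append(float(amount.group(1)))
    return row


# Hypothetical row: same pieces as the Walmart line in the table, order assumed
fields = ["Walmart", "based on 12 salaries", ".", "3 yrs exp", "₹ 30.2L", "₹ 25.0L"]
print(parse_salary_row(fields))
# {'company': 'Walmart', 'records': 'based on 12 salaries', 'experience': '3 yrs exp', 'amounts_lakh': [30.2, 25.0]}
```

Applied as `[parse_salary_row(r) for r in results]`, the output can be passed straight to `pd.DataFrame`.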
