1) Collect and use job market data to explore data related positions or machine learning related 
positions in a particular region (e.g., US, India). Suggested websites for this assignment are:
    
- Indeed
- Glassdoor
- Naukri
- Monster

You will have to use appropriate keywords (for e.g., “data scientist”, “data engineer”, “ML 
engineer”) to extract information on the positions available on websites of your choice.
While it is not compulsory to do so, you could use web scraping techniques to collect data from these websites. 
Also, note that you are not limited to the sites 
mentioned above. 

Note: Any one of the recommended websites can be used to extract data. Or you can select any job-related 
website of your choosing. You may select 1 keyword of your choice.
Please make sure to use Web scraping technique for data extraction.


In [1]:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

url_template = "https://www.naukri.com/machine-learning-engineer-jobs-in-india?page={page}&k=machine%20learning%20engineer&l=india&nignbevent_src=jobsearchDeskGNB&experience=0"
num_pages = 20

Title = []
Company = []
Location = []
Ratings = []
Reviewscount = []
Key_skills = []
link_to_apply = []

driver = webdriver.Chrome()

for page in range(1, num_pages + 1):
    url = url_template.format(page=page)
    driver.get(url)
    time.sleep(3)
   
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    job_elements = soup.find_all('article', class_='jobTuple')
   
    for job_elem in job_elements:
        title = job_elem.find('a', class_='title')
        company = job_elem.find('a', class_='subTitle')
        location = job_elem.find('span', class_='ellipsis fleft locWdth')
        rating = job_elem.find('span', class_='starRating fleft')
        reviewscount = job_elem.find('span', class_='reviewsCount fleft')
        key_skills = job_elem.find('li', class_='fleft dot')
        link = job_elem.find("a", href=True)
        base_url = "https://www.naukri.com"
        final_link = base_url + link["href"]
       
        Title.append(title.text.replace('\n', " ").strip())
        Company.append(company.text.replace('\n', " ").strip())
        Location.append(location.text)
        Ratings.append(rating.text if rating else "N/A")
        Reviewscount.append(reviewscount.text if reviewscount else "N/A")
        Key_skills.append(key_skills.text.replace('\n', " ").strip())
        link_to_apply.append(final_link.replace('\n', " "))
       
    # Close the page after extracting data
    driver.close()
    driver = webdriver.Chrome()  # Re-initialize WebDriver
   
# Create and save DataFrame
data = list(zip(Title, Company, Location, Ratings, Reviewscount, Key_skills, link_to_apply))
df = pd.DataFrame(data, columns=["Title", "Company", "Location", "Ratings", "Reviewscount", "Key_skills","Link"])
df.to_csv("job_data.csv", index=False)


In [2]:
data=pd.read_csv("job_data.csv")

In [3]:
data.shape

(400, 7)

In [4]:
data.head(5)

Unnamed: 0,Title,Company,Location,Ratings,Reviewscount,Key_skills,Link
0,Machine Learning Engineer,Viga Entertainment Technology,Bangalore/Bengaluru,4.5,2 Reviews,Computer vision,https://www.naukri.comhttps://www.naukri.com/j...
1,Machine Learning Engineer,Mantras2success Consultants,Hyderabad/Secunderabad,4.9,8 Reviews,BPO,https://www.naukri.comhttps://www.naukri.com/j...
2,Machine Learning Engineer,Cubicle Compass - Pointing You To Success,Chennai,,,Telecom,https://www.naukri.comhttps://www.naukri.com/j...
3,Machine Learning Engineer,e con Systems,Chennai,3.7,45 Reviews,3D,https://www.naukri.comhttps://www.naukri.com/j...
4,Machine Learning Engineer,Samyak Infotech,Ahmedabad,4.3,18 Reviews,Machine learning,https://www.naukri.comhttps://www.naukri.com/j...


In [5]:
data.tail(5)

Unnamed: 0,Title,Company,Location,Ratings,Reviewscount,Key_skills,Link
395,Data Scientist,Yulu Bikes,Bangalore/Bengaluru,3.9,27 Reviews,Mining,https://www.naukri.comhttps://www.naukri.com/j...
396,data scientist,Global Talent Pool,Bangalore/Bengaluru,,,Computer vision,https://www.naukri.comhttps://www.naukri.com/j...
397,Data Scientist,Akaike Technologies,Bangalore/Bengaluru,,,deep learning,https://www.naukri.comhttps://www.naukri.com/j...
398,Data Scientist,Pacific Placements &amp;amp; Consultancy,Kolhapur,,,Computer science,https://www.naukri.comhttps://www.naukri.com/j...
399,Data Scientist (Ph.D. Must),NGI Ventures,Remote,4.1,56 Reviews,Python,https://www.naukri.comhttps://www.naukri.com/j...


In [6]:
data.isnull().sum()

Title             0
Company           0
Location          0
Ratings         180
Reviewscount    180
Key_skills        0
Link              0
dtype: int64

In [7]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline 

In [8]:
data['Company'].nunique()

20

In [9]:
data.dtypes

Title            object
Company          object
Location         object
Ratings         float64
Reviewscount     object
Key_skills       object
Link             object
dtype: object