# Project 4: Web Scraping Job Postings
Business Case Overview

You're working as a data scientist for a contracting firm that's rapidly expanding. Now that they have their most valuable employee (you!), they need to leverage data to win more contracts. Your firm offers technology and scientific solutions and wants to be competitive in the hiring market. Your principal has two main objectives:


1. **Determine the industry factors that are most important in predicting the salary amounts for these data.**
2. **Determine the factors that distinguish job categories and titles from each other. For example, can required skills accurately predict job title?**

To limit the scope, your principal has suggested that you focus on data-related job postings, e.g. **data scientist, data analyst, research scientist, business intelligence**, and any others you might think of. You may also want to decrease the scope by limiting your search to a **single region**.

Hint: Aggregators like [Indeed.com](https://www.indeed.com.sg/?r=us) regularly pool job postings from a variety of markets and industries.

Goal: Scrape your own data from a job aggregation tool like Indeed.com in order to collect the data to best answer these two questions.

## Scraping data from Indeed
Example website screen capture:
![image.png](attachment:image.png)

References:<br>
[Web Scraping Indeed for Key Data Science Job Skills - Jesse Steinweg-Woods](https://jessesw.com/Data-Science-Skills/) <br>
[Web Scraping Indeed.com to Predict Salaries - Nathan Mitchell](https://ntmitchell.github.io/indeed-jobs-postings/) <br>
https://bigdata-madesimple.com/top-big-data-tools-used-to-store-and-analyse-data/

The scraping process mainly deals with the main search results page; the long run time makes it extremely difficult to also visit each individual job posting. A further challenge is that access to the website is unstable, which makes iterating very difficult: the error shown below often popped up for no apparent reason. For this reason, each search keyword was scraped separately.
![image.png](attachment:image.png)
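Since access to the site was unstable, one way to make iterating less painful is to retry a failed request after a growing delay. The sketch below is an assumption, not part of the original scraper: `fetch_with_retry` and its parameters are hypothetical names, and `fetch` stands for any zero-argument callable, for example `lambda: requests.get(url, headers=headers)`.

```python
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between attempts and re-raises the last error if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OSError as error:
            # Built-in ConnectionError and TimeoutError are OSError subclasses,
            # and the exceptions raised by requests derive from IOError/OSError.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            print('Attempt', attempt + 1, 'failed:', error)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper easy to try out without actually waiting.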

In [1]:
#web scraping
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import requests

import re

import pandas as pd
import numpy as np

In [2]:
# #key words
# #job title keywords
# job_title=['data scientist', 'data analyst', 'data engineer','research scientist', 'business intelligence','analytics']

# #job types
# job_type=['permanent','fulltime','contract','temporary','internship','parttime','apprenticeship']

First, look at the first page of results for a search for data analyst jobs in the UK, with 50 results displayed per page: <br>
https://www.indeed.co.uk/jobs?q=data+analyst&jt=permanent&limit=50<br> and open the browser's web developer tools to inspect the page. <br>
It seems that the Indeed UK website displays at most 50 search results per page; increasing the number after `limit` any further does not actually increase the number of results displayed. <br>

*The main focus* is on two job types, permanent and fulltime, and on six positions: 'data scientist', 'data analyst', 'data engineer', 'research scientist', 'business intelligence' and 'analytics'.
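The search grid described above can be generated rather than typed by hand. The following is a minimal sketch, not the notebook's own code: `search_url`, `page_urls` and the job count of 120 are hypothetical names and values chosen for illustration, while the base URL and the `q`, `jt`, `limit` and `start` parameters are the ones used in this notebook's searches.

```python
import math
from urllib.parse import urlencode

BASE_URL = 'https://www.indeed.co.uk/jobs'
RESULTS_PER_PAGE = 50  # the largest page size the site appeared to honour

positions = ['data scientist', 'data analyst', 'data engineer',
             'research scientist', 'business intelligence', 'analytics']
job_types = ['permanent', 'fulltime']


def search_url(position, job_type, start=0):
    """Build the URL of one results page; start is the index of its first result."""
    params = {'q': position, 'jt': job_type, 'limit': RESULTS_PER_PAGE}
    if start > 0:
        params['start'] = start
    # urlencode turns spaces into '+', e.g. 'data analyst' -> 'data+analyst'
    return BASE_URL + '?' + urlencode(params)


def page_urls(position, job_type, job_count):
    """Return one URL per results page needed to cover job_count postings."""
    pages = math.ceil(job_count / RESULTS_PER_PAGE)
    return [search_url(position, job_type, RESULTS_PER_PAGE * page)
            for page in range(pages)]


# One first-page URL for every (position, job type) combination.
first_pages = [search_url(p, t) for p in positions for t in job_types]
print(len(first_pages))

# Hypothetical example: a search that reports 120 postings needs three pages.
for url in page_urls('data analyst', 'permanent', 120):
    print(url)
```

Rounding up with `math.ceil` avoids requesting an extra, empty page when the job count is an exact multiple of 50.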



In [3]:
#test the function above 
#url: https://www.indeed.com/viewjob?jk=27162dddbf2459fa&from=serp&vjs=3
#website_cleaner('https://www.indeed.com/viewjob?jk=27162dddbf2459fa&from=serp&vjs=3')

In [9]:
#Start web scraping
#The main purpose is to extract as much information as possible
#yet keep the information succinct 

def generate_search_df(position,types):

    result_df=pd.DataFrame()
    
    print ('Searching for ',position,'and',types)
    
    title='+'.join(position.split(' '))
    
    
    #start with an empty list to collect the rows for this position and job type
    result_list=[]

    print ('Under ',position,'search for ',types,' job')

    url='https://www.indeed.co.uk/jobs?q={}&jt={}&limit=50'.format(title,types)
    response = requests.get(url,headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
    first_page=BeautifulSoup(response.content,'html.parser')

    job_counts=(first_page.find("div", attrs = {"id": "searchCount"}).text.split(' '))[-1]

    #loop over the pages, viewing 50 results per page at a time
    i = int(int(job_counts.replace(',',''))/50)

    for page_number in range(i+1):
        print ('Accessing page ',page_number,'\n')

        url_each_page = 'https://www.indeed.co.uk/jobs?q={}&jt={}&limit=50&start={}'.format(title,types,(str(50*page_number)))
        response = requests.get(url_each_page,headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
        soup_each_page = BeautifulSoup(response.content,'html.parser')
        # Get all advertised job descriptions
        results = soup_each_page.find_all('div', attrs={'data-tn-component': 'organicJob'})

        #extract job ID for each job listing
        for x in results:
            job_id = x.find(name='h2', attrs={"class": "jobtitle"})['id']
            job_title = x.find('a', attrs={'data-tn-element': "jobTitle"})
            job_link = "https://www.indeed.co.uk" + x.find('h2', attrs={"class": "jobtitle"}).find('a')['href']
            job_summary=x.find('span',{'class':'summary'})
            company=x.find(name='span', attrs={'class':'company'})
            location = x.find(name='span', attrs={'class':'location'})
            time_posted = x.find('span',{'class':'date'})
            salary = x.find(name='span', attrs={'class':'no-wrap'})


            #set default missing values
            if job_title != None:
                job_title_result = job_title.get_text().strip()
            else:
                job_title_result = np.nan

            if company != None:
                company_result = company.get_text().strip()
            else:
                company_result = np.nan

            if job_summary != None:
                job_summary_result = job_summary.get_text().strip()
            else:
                job_summary_result = np.nan

            if salary != None:
                salary_result = salary.get_text().strip()
            else:
                salary_result = np.nan

            if location != None:
                location_result = location.get_text().strip()
            else:
                location_result = np.nan

            if time_posted != None:
                time_posted_result = time_posted.get_text().strip()
            else:
                time_posted_result=np.nan

            for div in x.find_all(name='td',attrs={"class":"snip"}):
                try:
                    span = div.find(name="span", attrs={"class":"no-wrap"})
                    salary_result = span.get_text().strip()
                except AttributeError:
                    salary_result = np.nan

            review = x.find('a', attrs={"class": "ratingsLabel"})
            if review != None:
                review_result = review.get_text().strip()
            else:
                review_result=np.nan


            result_list.append([job_id,types, title,time_posted_result, job_title_result, company_result, salary_result, 
                                location_result, review_result,
                                job_summary_result])
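    #build the DataFrame once, after all pages have been collected
    #(appending result_list on every page re-added earlier pages, and DataFrame.append was removed in pandas 2.0)
    result_df=pd.DataFrame(result_list)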
    result_df.drop_duplicates(inplace=True)
        
    result_df.columns = ['job_id','job_types','job_tag','post_time', 'job_title', 'company', 'salary', 
                                    'location', 'no_of_review','job_summary']
    
    #save list into csv file

    print ('It\'s done now! You can look at the output csv file.')
    print ('There are ',len(result_df),'jobs successfully extracted. ')
    result_df.to_csv('./{}.csv'.format(position + types))

    return result_df.head(2)
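The six near-identical `if ... != None` blocks in the function above could be collapsed into one small helper. This is a sketch under assumptions rather than the notebook's method: `text_or_nan` is a hypothetical name, and `math.nan` is used in place of `np.nan` only to keep the example free of third-party imports (pandas treats both as missing values).

```python
import math


def text_or_nan(tag):
    """Return the stripped text of a BeautifulSoup tag, or NaN if the tag is None."""
    if tag is None:
        return math.nan
    return tag.get_text().strip()


# Possible use inside the loop over search results (x is one result block):
#   company_result = text_or_nan(x.find(name='span', attrs={'class': 'company'}))
#   location_result = text_or_nan(x.find(name='span', attrs={'class': 'location'}))
```

`find` returns `None` when nothing matches, so the helper covers the missing-field case in one place.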



In [6]:
generate_search_df('data scientist','permanent')

Searching for  data scientist and types
Under  data scientist search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

It's done now! You can look at the output csv file.
There are  944 jobs successfully extracted. 


Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_3becc778f10b2727,permanent,data+scientist,3 hours ago,Data Scientist,Barclays,,Glasgow,"4,332 reviews",Data Scientist - 90210626. Job Title – Data Sc...
1,jl_311f8c87f377fec1,permanent,data+scientist,9 days ago,Graduate Data Scientist,N Brown,"£25,000 a year",Manchester M60,,What a Data Scientist will be responsible for:...
2,jl_950a6428afc8792c,permanent,data+scientist,10 days ago,Data Scientist,Office for National Statistics,"£35,200 a year",Newport NP10,43 reviews,We are recruiting a Data Scientist to fill a v...
3,jl_8e7a2074815a0521,permanent,data+scientist,2 days ago,Data Scientist,Innovative Technology Ltd,,Oldham,11 reviews,"Are you an experienced Data Scientist, who’s l..."
4,jl_a48d2101a8c79123,permanent,data+scientist,8 days ago,Senior Data Scientist,Office for National Statistics,"£47,400 a year",Newport NP10,43 reviews,As a Senior Data Scientist within the Data Sci...
5,jl_2ef1238fcfa3f7ef,permanent,data+scientist,14 days ago,Data Scientist,Ambassador Theatre Group,"£45,000 - £55,000 a year",London,20 reviews,The Data Scientist is a key role within the au...
6,jl_1c2067a43b42f93c,permanent,data+scientist,2 days ago,Data Scientist,Inmarsat,,London,32 reviews,Our work analyses the masses of data gathered ...
7,jl_6ef0e4669d224598,permanent,data+scientist,3 days ago,Data Scientist,Asset Resourcing Ltd,"£60,000 - £80,000 a year",Leeds,,Data Science Consultant:. Work in collaboratio...
8,jl_4eb6eb7f377b9215,permanent,data+scientist,30+ days ago,Data Scientist (AVP),Barclays,,Northampton,"4,332 reviews",Data Scientist (AVP). Data Scientist (AVP) - 9...
9,jl_d69e84de1f1c38fc,permanent,data+scientist,30+ days ago,Data Scientist,Saint-Gobain,,West Midlands,"2,262 reviews",In this exciting Data Scientist role you will ...


In [7]:
generate_search_df('data analyst','permanent')

Searching for  data analyst and types
Under  data analyst search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_d8844c8a0c78b2ef,permanent,data+analyst,2 hours ago,Data Analyst,Wilkinson Hardware Stores,,Worksop,"1,246 reviews",Provide / support the Data Governance team wit...
1,jl_5c9490b54fdb4072,permanent,data+analyst,3 hours ago,Data Analyst,Barclays,,Glasgow,"4,332 reviews",Data Analyst - 90210723. Job Title – Data Anal...
2,jl_650cff5dfb789610,permanent,data+analyst,9 hours ago,Data Visualisation Analyst,Barclays,,Glasgow,"4,332 reviews",Data Visualisation Analyst. Data Visualisation...
3,jl_02c8bfacb7b2d1cb,permanent,data+analyst,2 days ago,Retail Data Analyst,Jet2.com and Jet2holidays,,Leeds,135 reviews,"Reporting to the Retail Revenue Manager, the R..."
4,jl_6b8942012bd01c63,permanent,data+analyst,2 days ago,Research and Data Analyst,Office of Qualifications and Examinations Regu...,"£31,000 - £36,540 a year",Coventry CV1,, Working with others in the Data and Analytic...
5,jl_c2f2e1627367aad3,permanent,data+analyst,30+ days ago,Data & Reporting Analyst,Attwood Perks,"£22,000 - £24,000 a year",Chelmsford,,Developing an understanding of data fields and...
6,jl_8639d5233cedec6a,permanent,data+analyst,1 day ago,Data Analyst,Legal & General Group Plc.,"£35,000 - £40,000 a year",Hove,10 reviews,"Able to adapt to and work with change, in a fa..."
7,jl_0c018b18cc1f4426,permanent,data+analyst,1 day ago,Research Data Analyst,Which?,,London,9 reviews,We have an exciting opportunity for a Research...
8,jl_2a2d280e3bb6893e,permanent,data+analyst,8 days ago,PowerBI Data Analyst,Rank Group,,Maidenhead,18 reviews,"Translates customer, transactional and behavio..."
9,jl_95469b8d44cf0d18,permanent,data+analyst,2 days ago,Systems Analyst - Data,Costa Coffee,,Dunstable,"1,684 reviews",Previous experience of data mapping and data m...


In [8]:
generate_search_df('data engineer','permanent')

Searching for  data engineer and types
Under  data engineer search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 


Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_0c101a183a2bb72e,permanent,data+engineer,8 days ago,Engineer (Data) - x 2,University of the West of England,"£29,515 - £33,199 a year",Bristol,40 reviews,In this role you would be supporting the Princ...
1,jl_1b6051b03b8d9c83,permanent,data+engineer,23 days ago,Data Engineer,Shop Direct,,Liverpool,225 reviews,We work cross-functionally to make our data co...
2,jl_55fc29e7d7c20bac,permanent,data+engineer,22 hours ago,Data Engineer,Barclays,,Northampton,"4,332 reviews",Data Engineer - 90208128. Conceptualize and qu...
3,jl_5687f2b75f153096,permanent,data+engineer,2 days ago,Data Engineer,Arnold Ash Ltd,"£60,000 - £75,000 a year",London,,Data Engineer - Data Science. Data Engineer - ...
4,jl_294d237b64fe6c8a,permanent,data+engineer,2 days ago,Data Engineer,Brightred Resourcing Limited,"£50,000 - £60,000 a year",London,,Main Job duties and Responsibilities of Data E...
5,jl_24eef8c5def82a90,permanent,data+engineer,30+ days ago,BI Data Engineer,Babylon Health,,London,,Bringing data from different data sources and ...
6,jl_4e21ef92dcd89d02,permanent,data+engineer,30+ days ago,Data Engineer,Argos,,Milton Keynes,"2,609 reviews","To help with this journey, we are looking for ..."
7,jl_740e28bccb2d441d,permanent,data+engineer,2 days ago,Data Analyst & Simulation Engineer,Buro Happold,,London,9 reviews,With the role of Data Analyst & Simulation Eng...
8,jl_3c3d8949a3f66650,permanent,data+engineer,2 days ago,Data Analyst & Simulation Engineer,BuroHappold Engineering,,London,12 reviews,With the role of Data Analyst & Simulation Eng...
9,jl_7bf59c332838f0b9,permanent,data+engineer,6 days ago,Data Engineer,Digitech Resourcing Ltd,"£60,000 - £80,000 a year",London,,Data Engineer benefits:. As the Data Engineer ...


In [11]:
generate_search_df('research scientist','permanent')

Searching for  research scientist and types
Under  research scientist search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

It's done now! You can look at the output csv file.
There are  491 jobs successfully extracted. 


Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_aa82ca0de298dbe6,permanent,research+scientist,22 days ago,Explosive Research Scientist,Defence Science and Technology Laboratory,"£31,500 - £35,500 a year",Salisbury SP4,17 reviews, Delivering high quality research to meet oft...
1,jl_7c6c809b5e0f60d9,permanent,research+scientist,8 days ago,Laboratory Scientist,Anthony Nolan,"£25,000 a year",London,5 reviews,We are looking for a Laboratory Scientist to j...


In [12]:
generate_search_df('business intelligence','permanent')

Searching for  business intelligence and types
Under  business intelligence search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_1c4bbb310c1865b4,permanent,business+intelligence,21 hours ago,Business Intelligence Developer,Codex,"£35,000 a year",Hampshire,10 reviews,Business Intelligence Developer (Qlikview) - L...
1,jl_8561f1b6cd4a665e,permanent,business+intelligence,6 days ago,Business Intelligence Developer,Joules Limited,,Corby,,Supporting the day to day running of Joules Bu...


In [15]:
generate_search_df('analytics','permanent')

Searching for  analytics and types
Under  analytics search for  permanent  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_aefa77662d5bfc1d,permanent,analytics,6 days ago,Head of Customer Insight & Analytics,Rank Group,,Maidenhead,18 reviews,Responsible for leading a team of Analysts a...
1,jl_b31a196a2fff5772,permanent,analytics,1 day ago,"Advanced Analytics Consultant (Predictive, AI,...",Capita Plc,,Edinburgh,"2,017 reviews",Become an Advanced Analytics Consultant with B...


In [17]:
generate_search_df('data scientist','fulltime')

Searching for  data scientist and types
Under  data scientist search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

It's done now! You can look at the output csv file.
There are  1000 jobs successfully extracted. 


Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_6635075b094accda,fulltime,data+scientist,1 day ago,Data Scientist Intern,Illumina,,England,144 reviews,Data Scientist Intern. Machine Learning on DNA...
1,jl_950a6428afc8792c,fulltime,data+scientist,10 days ago,Data Scientist,Office for National Statistics,"£35,200 a year",Newport NP10,43 reviews,We are recruiting a Data Scientist to fill a v...


In [18]:
generate_search_df('data analyst','fulltime')

Searching for  data analyst and types
Under  data analyst search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_3b7d740f0102253d,fulltime,data+analyst,22 days ago,Trainee Quality Improvement Data Analyst,East London NHS Foundation Trust,"£27,628 - £35,530 a year",London E1,21 reviews,The Trainee Data Analyst will form part of the...
1,jl_e35d91216e282815,fulltime,data+analyst,3 days ago,Data Analyst,Royal Cornwall Hospitals NHS Trust,"£28,050 - £36,644 a year",Truro TR1,2 reviews,Someone who can link data together to present ...


In [22]:
generate_search_df('data engineer','fulltime')

Searching for  data engineer and types
Under  data engineer search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 



Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_64eaa46804364549,fulltime,data+engineer,11 days ago,Graduate Big Data Engineer,Jagex,,Cambridge,4 reviews,"As Graduate Big Data Engineer, you will:. Jage..."
1,jl_7c235a26d266283a,fulltime,data+engineer,2 days ago,Data Engineer,RBS,,Edinburgh,"2,920 reviews","As a data engineer, you'll be supporting data ..."


In [24]:
generate_search_df('research scientist','fulltime')

Searching for  research scientist and types
Under  research scientist search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

It's done now! You can look at the output csv file.
There are  1000 jobs successfully extracted. 


Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_a407a4dfaf332ef1,fulltime,research+scientist,5 days ago,R&D Research Scientist,ConvaTec,,Deeside,136 reviews,Proven research & development laboratory testi...
1,jl_bd0cfe61a897cc7e,fulltime,research+scientist,9 days ago,Research Vascular Scientist,Chelsea and Westminster Hospital NHS Foundatio...,"£26,682 - £34,049 a year",Isleworth TW7,11 reviews,"Research Vascular Scientist, West Middlesex Lo..."


In [25]:
generate_search_df('business intelligence','fulltime')

Searching for  business intelligence and types
Under  business intelligence search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_a6aac63c6c160a07,fulltime,business+intelligence,3 days ago,Business Intelligence Internship,Kapeinternships,,London,,Distributing these templated Business Intellig...
1,jl_5873a76350c2f1ef,fulltime,business+intelligence,7 days ago,Business Intelligence Officer,Central and North West London NHS Foundation T...,"£26,565 - £42,046 a year",London NW1,16 reviews,An exciting opportunity has arisen for a Busin...


In [28]:
generate_search_df('analytics','fulltime')

Searching for  analytics and types
Under  analytics search for  fulltime  job
Accessing page  0 

Accessing page  1 

Accessing page  2 

Accessing page  3 

Accessing page  4 

Accessing page  5 

Accessing page  6 

Accessing page  7 

Accessing page  8 

Accessing page  9 

Accessing page  10 

Accessing page  11 

Accessing page  12 

Accessing page  13 

Accessing page  14 

Accessing page  15 

Accessing page  16 

Accessing page  17 

Accessing page  18 

Accessing page  19 

Accessing page  20 

Accessing page  21 

Accessing page  22 

Accessing page  23 

Accessing page  24 

Accessing page  25 

Accessing page  26 

Accessing page  27 

Accessing page  28 

Accessing page  29 

Accessing page  30 

Accessing page  31 

Accessing page  32 

Accessing page  33 

Accessing page  34 

Accessing page  35 

Accessing page  36 

Accessing page  37 

Accessing page  38 

Accessing page  39 

Accessing page  40 

Accessing page  41 

Accessing page  42 

Accessing page  43 

...

Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,jl_aefa77662d5bfc1d,fulltime,analytics,6 days ago,Head of Customer Insight & Analytics,Rank Group,,Maidenhead,18 reviews,Responsible for leading a team of Analysts a...
1,jl_719d91b70ae97f1c,fulltime,analytics,1 day ago,Head of Performance & Analytics,Barnet Clinical Commissioning Group,"£63,754 - £75,907 a year",Barnet,,"Reporting to the Director of QIPP, Planning an..."


In [32]:
import os

In [33]:
pwd

'C:\\Users\\zhixi\\Dropbox\\Postgraduate\\GA Data Immsersive\\Other jupyter notebook resources\\dsi-project-submission\\project-4_zhixin(jason)_wu'

In [40]:
df1=pd.read_csv('data/analyticsfulltime.csv')
df2=pd.read_csv('data/analyticspermanent.csv')

In [45]:
# test_df=pd.concat([df1,df2],ignore_index=False)
# test_df

In [46]:
# len(df1)

In [47]:
# len(df2)

In [48]:
# len(test_df)

In [49]:
df3=pd.read_csv('data/business intelligencefulltime.csv')
df4=pd.read_csv('data/business intelligencepermanent.csv')

In [50]:
df5=pd.read_csv('data/data analystfulltime.csv')
df6=pd.read_csv('data/data analystpermanent.csv')

In [51]:
df7=pd.read_csv('data/data engineerfulltime.csv')
df8=pd.read_csv('data/data engineerpermanent.csv')

In [52]:
df9=pd.read_csv('data/research scientistfulltime.csv')
df10=pd.read_csv('data/research scientistpermanent.csv')

In [53]:
final_df=pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9,df10])

In [54]:
final_df.drop_duplicates(inplace=True)
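Note that `drop_duplicates()` with no arguments only removes rows that match in every column, including `job_types` and the index column saved in each CSV. The same posting can therefore survive more than once, as `jl_aefa77662d5bfc1d` does in the fulltime and permanent analytics results above. The following is a minimal sketch, assuming one row per `job_id` is wanted; that choice, the toy frames, and keeping the first occurrence are assumptions for illustration, not the notebook's method.

```python
import pandas as pd

# Toy frames built from rows shown in the analytics outputs above.
fulltime = pd.DataFrame({
    'job_id': ['jl_aefa77662d5bfc1d', 'jl_719d91b70ae97f1c'],
    'job_types': ['fulltime', 'fulltime'],
    'job_title': ['Head of Customer Insight & Analytics',
                  'Head of Performance & Analytics'],
})
permanent = pd.DataFrame({
    'job_id': ['jl_aefa77662d5bfc1d'],
    'job_types': ['permanent'],
    'job_title': ['Head of Customer Insight & Analytics'],
})

combined = pd.concat([fulltime, permanent], ignore_index=True)

# Full-row comparison keeps both copies because job_types differs.
print(len(combined.drop_duplicates()))

# Comparing on job_id alone keeps the first occurrence of each posting.
unique_jobs = combined.drop_duplicates(subset='job_id', keep='first')
print(len(unique_jobs))
```

The trade-off is that deduplicating on `job_id` discards the fact that a posting was listed under both job types; if that matters, the types could be aggregated per `job_id` before dropping rows.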

In [59]:
final_df.to_csv('combined.csv')

In [57]:
final_df.head()

Unnamed: 0.1,Unnamed: 0,job_id,job_types,job_tag,post_time,job_title,company,salary,location,no_of_review,job_summary
0,0,jl_aefa77662d5bfc1d,fulltime,analytics,6 days ago,Head of Customer Insight & Analytics,Rank Group,,Maidenhead,18 reviews,Responsible for leading a team of Analysts a...
1,1,jl_719d91b70ae97f1c,fulltime,analytics,1 day ago,Head of Performance & Analytics,Barnet Clinical Commissioning Group,"£63,754 - £75,907 a year",Barnet,,"Reporting to the Director of QIPP, Planning an..."
2,2,jl_d123655931ba246c,fulltime,analytics,43 minutes ago,Insights & Analytics Intern | 8 week summer pl...,Asos.com,,London,233 reviews,Developing a great understanding of modern ana...
3,3,jl_72f97ac8201bc2a2,fulltime,analytics,30+ days ago,Intern (Data Analytics),Keppel Corporation,,Keppel,3 reviews,Supports the Data Analytics team on project ba...
4,4,jl_bbd4d855dae0fede,fulltime,analytics,19 hours ago,Data & Analytics Analyst,RBS,,Belfast,"2,920 reviews",Strong analytic and problem solving abilities....


In [58]:
len(final_df)

9742