<font color='#2F4F4F'>To use this notebook on Colaboratory, you will need to make a copy of it. Go to File > Save a Copy in Drive. You can then use the new copy that will appear in the new tab.</font>


# <font color='#2F4F4F'>AfterWork Data Science: Web Scraping with Python</font>

## Background Information
Together with a team of startup entrepreneurs, you decide to work on an idea that could change the way people search for jobs. You believe job scraping could be the next big thing, because many people in the country, in this case Kenya, are actively looking for jobs.
## Problem Statement
Many job listings are never seen by the job seekers they target. Working in a team, your task as the project's data scientist is to scrape job titles and links and put them into a single table that your team members can use to build a job aggregator.
You will scrape data from the following three technology job pages:

<br>
● PigiaMe: https://www.pigiame.co.ke/it-software-jobs<br>
● MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology<br>
● KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133<br>



## Success Criteria

The deliverable is a single table of job titles and links that the team can use to build a job aggregator.

## <font color='#2F4F4F'>Prerequisites</font>

In [None]:
# We first import the required libraries
# ---
#
import pandas as pd             # library for data manipulation
import requests                 # library for fetching a web page
from bs4 import BeautifulSoup   # library for extracting content from a web page

## <font color='#2F4F4F'>Step 1: Obtaining our Data</font>

In [None]:
# PigiaMe: https://www.pigiame.co.ke/it-software-jobs
# ---
#

pigia_url_list = [
  'https://www.pigiame.co.ke/it-software-jobs',
  'https://www.pigiame.co.ke/it-software-jobs?page=2',
  'https://www.pigiame.co.ke/it-software-jobs?page=3',
  'https://www.pigiame.co.ke/it-software-jobs?page=4'
]
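
The four PigiaMe URLs differ only in the page number. The first page has no query string, and later pages append `?page=N`. The sketch below builds the same list with a comprehension. It assumes that pattern holds and that the last page number (4 here) is still read off the site by hand. `make_pigia_urls` is a hypothetical helper name.

```python
def make_pigia_urls(base_url, last_page):
    """Return the URL of every results page from 1 to last_page.

    Assumes page 1 is the bare URL and later pages append ?page=N,
    as in pigia_url_list above.
    """
    return [base_url] + [f"{base_url}?page={n}" for n in range(2, last_page + 1)]


pigia_url_list = make_pigia_urls("https://www.pigiame.co.ke/it-software-jobs", 4)
```

If the site adds pages, only the `last_page` argument needs to change.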


In [None]:
# MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
# ---
#
myj_url_list = [
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/2',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/3',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/4',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/5',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/6',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/7',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/8',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/9',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/10',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/11',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/12',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/13',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/14',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/15',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/16',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/17',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/18',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/19',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/20',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/21',
  'https://www.myjobmag.co.ke/jobs-by-field/information-technology/22'

]

myj_base_url = 'https://www.myjobmag.co.ke'
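
`myj_base_url` exists because the MyJobMag scraper builds each full link by prefixing the `href` with it. That only works if every `href` is a root-relative path such as `/job/<slug>`. The standard library's `urllib.parse.urljoin` resolves a link against the site root the way a browser does, so it also handles hrefs that lack the leading slash or are already absolute. The three sample hrefs are made up, modeled on the links in the MyJobMag output.

```python
from urllib.parse import urljoin

myj_base_url = 'https://www.myjobmag.co.ke'

# Hypothetical hrefs in three shapes a listing page might use
sample_hrefs = [
    '/job/senior-ux-researcher-wasoko',                            # root-relative path
    'job/senior-ux-researcher-wasoko',                             # no leading slash
    'https://www.myjobmag.co.ke/job/senior-ux-researcher-wasoko',  # already absolute
]

# urljoin resolves each href against the site root, so all three give the same URL
absolute_links = [urljoin(myj_base_url, href) for href in sample_hrefs]

# Plain concatenation, as in the scraper loop, breaks on the last two shapes
concatenated_links = [myj_base_url + href for href in sample_hrefs]
```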


In [None]:
# KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133
# ---
#

kj_url_list = [
  'https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133',
  'https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133&page=1',
  'https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133&page=2'

]

kj_base_url = 'https://www.kenyajob.com'
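
The KenyaJob URLs carry a percent-encoded filter, `f[0]=im_field_offre_secteur:133`. Their `page` parameter starts counting at 0: the first URL has no `page` at all, and `page=1` fetches the second page. The sketch below uses `urllib.parse.urlencode` to rebuild the same three URLs from readable parameters. `make_kj_urls` is a hypothetical helper, and the page count (3) still has to be checked on the site.

```python
from urllib.parse import urlencode

kj_base_url = 'https://www.kenyajob.com'
kj_search_path = '/job-vacancies-search-kenya'
# The sector filter from the URLs above; urlencode escapes '[', ']' and ':' as %5B, %5D and %3A
kj_filter = {'f[0]': 'im_field_offre_secteur:133'}


def make_kj_urls(num_pages):
    """Return search URLs for the first num_pages pages (the page index starts at 0)."""
    urls = [f"{kj_base_url}{kj_search_path}?{urlencode(kj_filter)}"]
    for page in range(1, num_pages):
        query = urlencode({**kj_filter, 'page': page})
        urls.append(f"{kj_base_url}{kj_search_path}?{query}")
    return urls


kj_url_list = make_kj_urls(3)
```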


## <font color='#2F4F4F'>Step 2: Parsing & Extracting required elements</font>

<font color=green><i>NOTE: The section order differs from the original solution notebook. Each site's search results span several pages, so a for loop is used to fetch, parse and extract the data from every page of each website.</i></font>
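
Because every site is split into several pages, the three scrapers that follow all repeat the same fetch, parse and collect loop. The sketch below turns that loop into one reusable function, using `requests` and BeautifulSoup as imported above. `fetch_html`, `scrape_pages` and `parse_links` are hypothetical names. The 10-second timeout and 1-second pause are arbitrary, polite defaults, not values these sites require.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download one page and return its HTML; raise requests.HTTPError on a 4xx/5xx reply."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def scrape_pages(urls, parse_page, fetch=fetch_html, delay=1.0):
    """Fetch each URL in turn and collect the (title, link) pairs parse_page extracts.

    parse_page receives the BeautifulSoup object for one page and returns a list of
    (title, link) tuples. fetch is a parameter so the loop can be tried offline.
    """
    rows = []
    for url in urls:
        soup = BeautifulSoup(fetch(url), "html.parser")
        rows.extend(parse_page(soup))
        time.sleep(delay)  # pause between requests so the site is not hammered
    return rows


def parse_links(soup):
    """Example parser: every <a> tag on the page becomes one (text, href) pair."""
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


# Offline usage example: two fake "pages" stand in for real downloads
fake_pages = {
    "page-1": '<a href="/job/a">Data Scientist</a>',
    "page-2": '<a href="/job/b">Data Analyst</a>',
}
rows = scrape_pages(list(fake_pages), parse_links, fetch=fake_pages.get, delay=0)
```

With real pages, a call such as `scrape_pages(kj_url_list, my_parser)` would replace one of the loops. Here `my_parser` stands for a hypothetical site-specific parsing function.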

#### PigiaMe scraper

In [None]:
#temp data holders
job_titles  = list()
job_links = list()

In [None]:
# Parsing our document: pigia_me
# ---
# 
for u in pigia_url_list:
  #Fetch page 
  pigia_me = requests.get(u)
  
  # Parsing data using BeautifulSoup
  soup1 = BeautifulSoup(pigia_me.text, "html.parser")

  #Extract the needed elements
  tit1 = soup1.find_all('div', {'class': 'listing-card__header__title'})
  lin1 = soup1.find_all('a', {'class': "listing-card__inner"})

  for t in tit1:
    job_titles.append(t.get_text()) 

  for l in lin1:
    job_links.append(l.get('href'))
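
The PigiaMe loop collects titles and links with two separate `find_all` calls and relies on the two lists lining up. If a card ever lacks a title, the lists drift out of step, and `pd.DataFrame` raises a `ValueError` once their lengths differ. The sketch below reads both values from the same card instead. It assumes each `a.listing-card__inner` contains its `div.listing-card__header__title`; this is inferred from the class names, not verified against the live site. The sample HTML is made up.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the PigiaMe class names used above; the second card has no title
sample_html = """
<a class="listing-card__inner" href="https://www.pigiame.co.ke/listings/data-scientist-5451538">
  <div class="listing-card__header__title">
    Data Scientist
  </div>
</a>
<a class="listing-card__inner" href="https://www.pigiame.co.ke/listings/no-title-card">
</a>
"""


def parse_pigia_cards(soup):
    """Return (title, link) pairs, taking both values from the same card."""
    rows = []
    for card in soup.find_all('a', {'class': 'listing-card__inner'}):
        title = card.find('div', {'class': 'listing-card__header__title'})
        if title is None:  # skip a card without a title instead of misaligning the lists
            continue
        rows.append((title.get_text(strip=True), card.get('href')))
    return rows


rows = parse_pigia_cards(BeautifulSoup(sample_html, "html.parser"))
```

The resulting pairs can go straight into `pd.DataFrame(rows, columns=["Job Title", "Link"])`.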
  


In [None]:
# Remove the newline characters ('\n') from each title
t2 = [t.replace("\n", "") for t in job_titles]
#t2
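
`replace("\n", "")` removes line breaks but keeps any spaces that indented the title in the HTML source. BeautifulSoup's `get_text(strip=True)` trims whitespace from each text fragment instead. Because it joins fragments with no separator by default, titles that contain nested tags need a separator such as `" "`. The sample markup below is made up.

```python
from bs4 import BeautifulSoup

title_div = BeautifulSoup("<div>\n    Data Scientist\n</div>", "html.parser").div

replaced = title_div.get_text().replace("\n", "")  # newlines gone, indentation kept
stripped = title_div.get_text(strip=True)           # surrounding whitespace gone too

nested = BeautifulSoup("<div>Data <b>Scientist</b></div>", "html.parser").div
glued = nested.get_text(strip=True)        # fragments joined with nothing between them
spaced = nested.get_text(" ", strip=True)  # fragments joined with a single space

print(repr(replaced), repr(stripped), repr(glued), repr(spaced))
```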

In [None]:
#Load data into a df
df = pd.DataFrame({"Job Title": t2, "Link": job_links})
pd.set_option('display.max_colwidth', None)
df

|   | Job Title | Link |
|---|---|---|
| 0 | Senior Engineering Manager | https://www.pigiame.co.ke/listings/senior-engineering-manager-5453736 |
| 1 | Senior Cloud Infrastructure Engineer | https://www.pigiame.co.ke/listings/senior-cloud-infrastructure-engineer-5453727 |
| 2 | Junior Research Assistant-Data and Information Systems | https://www.pigiame.co.ke/listings/junior-research-assistant-data-and-information-systems-5453464 |
| 3 | Research Associate-Data and Information Systems | https://www.pigiame.co.ke/listings/research-associate-data-and-information-systems-5453442 |
| 4 | Data Scientist | https://www.pigiame.co.ke/listings/data-scientist-5451538 |
| 5 | AI Software Engineer | https://www.pigiame.co.ke/listings/ai-software-engineer-5451531 |
| 6 | Data Analytics Engineer | https://www.pigiame.co.ke/listings/data-analytics-engineer-5451518 |
| 7 | Data Manager | https://www.pigiame.co.ke/listings/data-manager-5447874 |
| 8 | Senior Software Engineer | https://www.pigiame.co.ke/listings/senior-software-engineer-5439676 |
| 9 | Data Analyst - Kisumu | https://www.pigiame.co.ke/listings/data-analyst-kisumu-5437315 |


#### MyJobMag scraper

In [None]:
#temp data holders
job_titles  = list()
job_links = list()

In [None]:
for u in myj_url_list:
  #Fetch page
  source = requests.get(u)
  
  # Parsing data using BeautifulSoup
  soup1 = BeautifulSoup(source.text, "html.parser")

  #Extract the needed elements
  tit1 = soup1.find_all('li', {'class': "mag-b"})

  for t in tit1:
    job_titles.append((t.get_text()).replace("\n", ""))
    job_links.append(myj_base_url + t.a.get('href'))
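
`t.a.get('href')` assumes every `li.mag-b` contains a link. If one does not, `t.a` is `None` and the loop stops with an `AttributeError`. The sketch below skips such items and builds the link with `urljoin`. The sample markup is made up, and the link-less "Advertisement" item is purely hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

myj_base_url = 'https://www.myjobmag.co.ke'

# Made-up markup: the second item has no link
sample_html = """
<ul>
  <li class="mag-b"><a href="/job/senior-ux-researcher-wasoko">Senior UX Researcher at Wasoko</a></li>
  <li class="mag-b">Advertisement</li>
</ul>
"""


def parse_myj_items(soup):
    """Return (title, link) pairs for the li.mag-b items that contain a link."""
    rows = []
    for item in soup.find_all('li', {'class': 'mag-b'}):
        anchor = item.find('a', href=True)  # first <a> that has an href attribute
        if anchor is None:
            continue  # nothing to link to, so skip the item
        rows.append((item.get_text(strip=True), urljoin(myj_base_url, anchor['href'])))
    return rows


rows = parse_myj_items(BeautifulSoup(sample_html, "html.parser"))
```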
 

In [None]:
df2 = pd.DataFrame({"Job Title": job_titles, "Link": job_links})
pd.set_option('display.max_colwidth', None)
df2

|   | Job Title | Link |
|---|---|---|
| 0 | M-Pesa Africa - Digital Workspace Engineer at Safaricom Kenya | https://www.myjobmag.co.ke/job/m-pesa-africa-digital-workspace-engineer-safaricom-kenya |
| 1 | Senior UX Researcher at Wasoko | https://www.myjobmag.co.ke/job/senior-ux-researcher-wasoko |
| 2 | ICT Sales Representative at HCS Africa | https://www.myjobmag.co.ke/job/hcs-africa-hcs-africa-1 |
| 3 | Cybersecurity Specialist, Security Awareness at KCB Bank Kenya | https://www.myjobmag.co.ke/job/cybersecurity-specialist-security-awareness-kcb-bank-kenya-1 |
| 4 | Change Evaluation Analyst at KCB Bank Kenya | https://www.myjobmag.co.ke/job/change-evaluation-analyst-kcb-bank-kenya-2 |
| ... | ... | ... |
| 391 | Hardware QA Engineer at Koko Networks | https://www.myjobmag.co.ke/job/hardware-qa-engineer-koko-networks-2 |
| 392 | Support Engineer Internship at Koko Networks | https://www.myjobmag.co.ke/job/support-engineer-internship-koko-networks |
| 393 | Acquiring Implementation & Tech Support at Absa Bank Limited | https://www.myjobmag.co.ke/job/acquiring-implementation-tech-support-absa-bank-limited-2 |
| 394 | Solution Architect Area Chapter Lead at Safaricom Kenya | https://www.myjobmag.co.ke/job/solution-architect-area-chapter-lead-safaricom-kenya |


#### KenyaJob scraper

In [None]:
#temp data holders
job_titles  = list()
job_links = list()

In [None]:
for u in kj_url_list:
  #Fetch the page
  source = requests.get(u)
  
  # Parsing data using BeautifulSoup
  soup1 = BeautifulSoup(source.text, "html.parser")

  #Extract the needed elements
  tit1 = soup1.find_all('div', {'class': "col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title"})

  for t in tit1:
    job_titles.append(t.a.get_text())
    job_links.append(kj_base_url + t.a.get('href'))
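
Passing the full class string to `find_all` only matches when the `class` attribute is written exactly that way, in that order. Bootstrap grid classes such as `col-lg-5` are layout details that a site redesign could easily reorder or change. A CSS selector through `select('div.job-title')` keys on the single class that names the element. The sketch below assumes `job-title` itself stays stable, and its sample markup is made up.

```python
from bs4 import BeautifulSoup

# Made-up markup: the classes the KenyaJob scraper expects, written in a different order
sample_html = """
<div class="job-title col-lg-5 col-md-5 col-sm-5 col-xs-12">
  <a href="/job-vacancies-kenya/software-developer-130123">Software Developer</a>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string match: finds nothing once the class order changes
exact = soup.find_all('div', {'class': "col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title"})

# CSS selector: matches any <div> whose classes include job-title
by_selector = soup.select('div.job-title')

print(len(exact), len(by_selector))
```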


In [None]:
df3 = pd.DataFrame({"Job Title": job_titles, "Link": job_links})
pd.set_option('display.max_colwidth', None)
df3

|   | Job Title | Link |
|---|---|---|
| 0 | JAVA EE / JAVA 8 Developer with SQL Skills | https://www.kenyajob.com/job-vacancies-kenya/java-ee-java-8-developer-sql-skills-130458 |
| 1 | Senior Freelance Web Designer | https://www.kenyajob.com/job-vacancies-kenya/senior-freelance-web-designer-130459 |
| 2 | Sales and Marketing Agent | https://www.kenyajob.com/job-vacancies-kenya/sales-marketing-agent-131641 |
| 3 | Accountant/Administrator | https://www.kenyajob.com/job-vacancies-kenya/accountantadministrator-126803 |
| 4 | CCTV and Fire Alarms Systems Technician | https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127106 |
| 5 | CCTV and Fire Alarms Systems Technician | https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127107 |
| 6 | Information Technology Sales Specialist | https://www.kenyajob.com/job-vacancies-kenya/information-technology-sales-specialist-129253 |
| 7 | AWS Cloud Architect (M/F) | https://www.kenyajob.com/job-vacancies-kenya/aws-cloud-architect-mf-129511 |
| 8 | AWS Solutions Architect (M/F) | https://www.kenyajob.com/job-vacancies-kenya/aws-solutions-architect-mf-129512 |
| 9 | AZURE Solutions Architect (M/F) | https://www.kenyajob.com/job-vacancies-kenya/azure-solutions-architect-mf-129513 |


## <font color='#2F4F4F'>Step 3: Saving our Data</font>

In [None]:
# Combine the three scraped DataFrames into one table and preview it
# ---
#
full_df = pd.concat([df, df2, df3], ignore_index=True)
full_df.head(30)
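
This step combines and previews the tables but does not yet write them anywhere the rest of the team can pick up. The sketch below tags each row with the site it came from, drops rows whose link was scraped more than once, and writes a CSV file. It drops duplicates by `Link` rather than by title, because distinct postings can share a title, as the two "CCTV and Fire Alarms Systems Technician" rows in the KenyaJob output show. `combine_and_save`, the `Source` column and the file names are hypothetical choices, and the sample rows are made up.

```python
import os
import tempfile

import pandas as pd


def combine_and_save(frames, sources, path):
    """Stack the per-site tables, tag each row with its site, drop repeated links, save as CSV."""
    tagged = [frame.assign(Source=source) for frame, source in zip(frames, sources)]
    combined = pd.concat(tagged, ignore_index=True)
    # A link scraped twice (e.g. a listing that shifted between pages) is kept once
    combined = combined.drop_duplicates(subset="Link", ignore_index=True)
    combined.to_csv(path, index=False)  # index=False keeps row numbers out of the file
    return combined


# Usage example with made-up rows, written to a temporary folder
sample_a = pd.DataFrame({"Job Title": ["Data Scientist", "Data Scientist"],
                         "Link": ["https://example.com/1", "https://example.com/1"]})
sample_b = pd.DataFrame({"Job Title": ["Software Developer"],
                         "Link": ["https://example.com/2"]})
out_path = os.path.join(tempfile.mkdtemp(), "it_jobs_sample.csv")
saved = combine_and_save([sample_a, sample_b], ["PigiaMe", "KenyaJob"], out_path)

# In the notebook this would be, for example:
# combine_and_save([df, df2, df3], ["PigiaMe", "MyJobMag", "KenyaJob"], "it_jobs_kenya.csv")
```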


|   | Job Title | Link |
|---|---|---|
| 0 | Senior Engineering Manager | https://www.pigiame.co.ke/listings/senior-engineering-manager-5453736 |
| 1 | Senior Cloud Infrastructure Engineer | https://www.pigiame.co.ke/listings/senior-cloud-infrastructure-engineer-5453727 |
| 2 | Junior Research Assistant-Data and Information Systems | https://www.pigiame.co.ke/listings/junior-research-assistant-data-and-information-systems-5453464 |
| 3 | Research Associate-Data and Information Systems | https://www.pigiame.co.ke/listings/research-associate-data-and-information-systems-5453442 |
| 4 | Data Scientist | https://www.pigiame.co.ke/listings/data-scientist-5451538 |
| 5 | AI Software Engineer | https://www.pigiame.co.ke/listings/ai-software-engineer-5451531 |
| 6 | Data Analytics Engineer | https://www.pigiame.co.ke/listings/data-analytics-engineer-5451518 |
| 7 | Data Manager | https://www.pigiame.co.ke/listings/data-manager-5447874 |
| 8 | Senior Software Engineer | https://www.pigiame.co.ke/listings/senior-software-engineer-5439676 |
| 9 | Data Analyst - Kisumu | https://www.pigiame.co.ke/listings/data-analyst-kisumu-5437315 |


In [None]:
# Preview the last rows of the combined DataFrame
full_df.tail(20)

|   | Job Title | Link |
|---|---|---|
| 468 | RUBY ON RAILS Developer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/ruby-on-rails-developer-mf-129539 |
| 469 | SWIFT Developer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/swift-developer-mf-129540 |
| 470 | Test and Validation Engineer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/test-validation-engineer-mf-129541 |
| 471 | Test Automation Engineer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/test-automation-engineer-mf-129542 |
| 472 | UX / UI Designer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/ux-ui-designer-mf-129543 |
| 473 | Software Developer | https://www.kenyajob.com/job-vacancies-kenya/software-developer-130123 |
| 474 | Freelance Product Photographers | https://www.kenyajob.com/job-vacancies-kenya/freelance-product-photographers-130438 |
| 475 | Motion Graphic Designer | https://www.kenyajob.com/job-vacancies-kenya/motion-graphic-designer-130876 |
| 476 | OpenStack Technical Support Manager | https://www.kenyajob.com/job-vacancies-kenya/openstack-technical-support-manager-129771 |
| 477 | Senior Web Developer - Workplace Engineering | https://www.kenyajob.com/job-vacancies-kenya/senior-web-developer-workplace-engineering-129772 |
