**Background Information** 

Together with a team of startup entrepreneurs, you decide to work on an idea that could change the way people search for jobs. You decide that job scraping could be the next big thing as there are actively many people looking for jobs in the country, in this case, Kenya. 
**Problem Statement** 

The problem is that there are many job listings which can not get visits for the target job seekers. While working in a team, your task as a data scientist for this project is to scrape for job titles and links and then put them in a single table that can be used by your team members to further build a job aggregator. You will be required to scrape for data from the following three technology webpages: 

● PigiaMe: https://www.pigiame.co.ke/it-software-jobs 

● MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology 

● KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offr e_secteur%3A133 


## <font color='#2F4F4F'>Prerequisites</font>

In [1]:
# We first import the required libraries
# ---
#
import pandas as pd             # library for data manupation
import requests                 # library for fetching a web page 
from bs4 import BeautifulSoup   # library for extrating contents from a webpage 
from urllib3.exceptions import InsecureRequestWarning
from urllib3 import disable_warnings

disable_warnings(InsecureRequestWarning)

## <font color='#2F4F4F'>Step 1: Obtaining our Data</font>

In [2]:
# PigiaMe: https://www.pigiame.co.ke/it-software-jobs
# ---
#
pigia_me = requests.get('https://www.pigiame.co.ke/it-software-jobs')
pigia_me

<Response [200]>

In [3]:
# MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
# ---
#
myjob_mag = requests.get('https://www.myjobmag.co.ke/jobs-by-field/information-technology')
myjob_mag

<Response [200]>

In [4]:
# KenyanJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133
# ---
kenyan_job = requests.get('https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133')
kenyan_job

<Response [200]>

## <font color='#2F4F4F'>Step 2: Parsing</font>

In [5]:
# Parsing our document: pigia_me
# ---
#
pigia_me_soup= BeautifulSoup(pigia_me.text, "html.parser")

In [6]:
# Parsing our document: my_job_mag
# ---
#  
myjob_mag_soup = BeautifulSoup(myjob_mag.text,"html.parser")

In [7]:
# Parsing our document: kenyan_job
# ---
kenyan_job_soup= BeautifulSoup(kenyan_job.text, "html.parser")

## <font color='#2F4F4F'>Step 3: Extracting Required Elements</font>

In [9]:

# Each job is enclosed in 'div' class 'listings-cards__list-item'
pigia_me_job =  pigia_me_soup.find_all('div', class_= 'listings-cards__list-item')

# 1. Extracting job titles and links: pigia me
# ---

# # 1. Extracting job titles
#job_title_pigia_me  =  pigiame_job.find('div', class_='listing-card__header__title').get_text().replace('\n', '')


#Extract links

# links are in href of class 'listing-card listing-card--tab listing-card--has-content listing-card--highlight-placeholder'
#link_pigia_me = pigiame_job.find('div', class_='listing-card listing-card--tab listing-card--has-content listing-card--highlight-placeholder').a['href']


#create a list for link and title
pigia_me_link_title = []
pigia_me_link_url = []

# loop in through the pigiame website and colleact the link and title 

for p in pigia_me_job:
    # getting the title and links
    job_title_pigia_me  =  p.find('div', class_='listing-card__header__title').get_text().replace('\n', '')
    link_pigia_me = p.find('div', class_='listing-card listing-card--tab listing-card--has-content listing-card--highlight-placeholder').a['href']

    #appending the title and links to the list
    # Then appending the text to our link_content list
    pigia_me_link_title.append(job_title_pigia_me)
      # Then appending the text to our link_url list
    pigia_me_link_url.append(link_pigia_me)
    
print(pigia_me_link_title)
['Bioinformatic and Software Developer', 'IT Sales - Cybersecurity & Cloud', 'Executive, Support & Services', 'Data Engineer', 'Software Development Trainer', 'Oracle Database Administrator', 'UI/UX & Frontend Developer – Limuru, Kenya', 'IT Sales - Cybersecurity & Cloud', 'Wordpress and Shopify Web Designer', 'Entry level Software Developer']
# 2. Extracting job titles: myjob_mag
# ---
# 
#jobtitle_myjob = soup_myjob.find('div', {'class':'listing-card__header__title'})[2].get_text()

# Extracting job titles
#Jobtilte is under <li class="job-list-li">
#<ul>
#<li class="job-logo">
#<a href="/jobs-at/cannon-general-insurance-k-ltd"><img src="/company_logo/86/61466Cannon General Insurance (K) Ltd.png" alt="Cannon General Insurance (K) Ltd logo" width="100%" height="auto" title="Cannon General Insurance (K) Ltd logo"></a>
#</li>
#<li class="job-info">
#<ul>
#<li class="mag-b">
#<h2><a href="/job/systems-developer-cannon-general-insurance-k-ltd">Systems Developer at Cannon General Insurance (K) Ltd</a></h2>
#</li>
#the job title and job link can be derived from 'li', class_ = 'mag-b'

myjob = myjob_mag_soup.find_all('li', class_ ='mag-b')

#create a list for link and title
myjob_link_title = []
myjob_link_url = []


#Extracting my job links and title

for link in myjob :
    myjob_link = 'https://www.myjobmag.co.ke' + link.find('h2').a['href']
    myjob_title = link.find('h2').a.text
    
     # Then appending the text to our link_content list
    myjob_link_title.append(myjob_title)

    # Then appending the text to our link_url list
    myjob_link_url.append(myjob_link)
    
    print(myjob_link_url)
# 3. Extracting job titles: kenya_job
# ---
#
# Each job is enclosed in 'div' class 'job-description-wrapper'
kenyanjob_job =  kenyan_job_soup.find_all('div', class_= 'job-description-wrapper')

# 1. Extracting job titles and links: 
# ---

# # 1. Extracting job titles . Titles are in calss 'col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title'.h5.a.text

#kenyanjob_jobtitle  = kenyanjob_job.find('div', class_='col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title').h5.a.text

#Extract links

# links are in calss 'col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title'.h5.a['href']

#kenyanjob_joblink  = 'https://www.kenyajob.com' + kenyanjob_job.find('div', class_='col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title').h5.a['href']


#create a list for link and title
kenyajob_link_title = []
kenyajob_link_url = []
# loop in through the pigiame website and colleact the link and title 

for k in kenyanjob_job :
    # getting the title and links
    
    kenyanjob_jobtitle  = k.find('div', class_='col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title').h5.a.text
    kenyanjob_joblink  = 'https://www.kenyajob.com' + k.find('div', class_='col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title').h5.a['href']

    #appending the title and links to the list
    # Then appending the text to our link_content list
    kenyajob_link_title.append(kenyanjob_jobtitle)

    # Then appending the text to our link_url list
    kenyajob_link_url.append(kenyanjob_joblink)
    
print(kenyajob_link_title)


['VP, Data at Flutterwave', 'Process Engineer at DryGro', 'Analyst,MLU 3 at Church of Jesus Christ of Latter-day Saints', 'MEAN Stack Developer-MongoDB, AngularJS, and Node.js', 'Process Analyst at I&M Bank', 'ICT Support Technician at KCA University', 'Director, Data Products, Community Pass at MasterCard', 'Market Development Analyst', 'IT Officer - Systems', 'Engineering Lead']
['https://www.myjobmag.co.ke/job/information-technology-it-assistant-kakuzi-plc']
['https://www.myjobmag.co.ke/job/information-technology-it-assistant-kakuzi-plc', 'https://www.myjobmag.co.ke/job/python-software-engineer-ubuntu-hardware-certification-team-canonical-3']
['https://www.myjobmag.co.ke/job/information-technology-it-assistant-kakuzi-plc', 'https://www.myjobmag.co.ke/job/python-software-engineer-ubuntu-hardware-certification-team-canonical-3', 'https://www.myjobmag.co.ke/job/ict-intern-spear-the-science-for-africa-foundation']
['https://www.myjobmag.co.ke/job/information-technology-it-assistant-kaku

## <font color='#2F4F4F'>Step 4: Saving our Data</font>

In [10]:
# Saving the scraped contents in a dataframe and preview our data
# ---
#
# Saving the scraped contents in a dataframe and preview our data
# ---
#pigiame dataframe
df_pigiame = pd.DataFrame({"Title": pigia_me_link_title , "Link": pigia_me_link_url})
df_pigiame.head()

#pigiame dataframe
df_myjob = pd.DataFrame({"Title":  myjob_link_title , "Link": myjob_link_url})


#pigiame dataframe
df_kenyajob = pd.DataFrame({"Title":kenyajob_link_title , "Link": kenyajob_link_url})
# concatenate the tables
df_job_search = pd.concat([df_pigiame, df_myjob,df_kenyajob], axis=0)

df_job_search

Unnamed: 0,Title,Link
0,"VP, Data at Flutterwave",https://www.pigiame.co.ke/listings/vp-data-at-...
1,Process Engineer at DryGro,https://www.pigiame.co.ke/listings/process-eng...
2,"Analyst,MLU 3 at Church of Jesus Christ of Lat...",https://www.pigiame.co.ke/listings/analystmlu-...
3,"MEAN Stack Developer-MongoDB, AngularJS, and N...",https://www.pigiame.co.ke/listings/mean-stack-...
4,Process Analyst at I&M Bank,https://www.pigiame.co.ke/listings/process-ana...
5,ICT Support Technician at KCA University,https://www.pigiame.co.ke/listings/ict-support...
6,"Director, Data Products, Community Pass at Mas...",https://www.pigiame.co.ke/listings/director-da...
7,Market Development Analyst,https://www.pigiame.co.ke/listings/market-deve...
8,IT Officer - Systems,https://www.pigiame.co.ke/listings/it-officer-...
9,Engineering Lead,https://www.pigiame.co.ke/listings/engineering...
