# Scraping jobs on Careers@Gov
Careers@Gov is the career portal for public service jobs in the Government of Singapore.
Government vacancies are usually posted here rather than on ordinary job sites, which makes the portal a useful source of job data.

## Get the job URLs
First, we will collect all the URLs from the search pages.

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
links = []
N_JOBS = 1832  # Number of available jobs on the portal. Change accordingly.
PAGES = -(-N_JOBS // 20)  # 20 jobs per page, rounded up (ceiling division)
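
As a quick check of the page arithmetic, the sketch below uses ceiling division. It assumes the listing shows 20 jobs per page, which is the figure implied by the division above; `PER_PAGE` and `page_count` are illustrative names and not part of the original notebook.

```python
import math

PER_PAGE = 20  # assumed page size, taken from the division by 20 above


def page_count(n_jobs, per_page=PER_PAGE):
    """Number of listing pages needed to show n_jobs results."""
    return math.ceil(n_jobs / per_page)


print(page_count(1832))  # 92
print(page_count(1840))  # 92 (an exact multiple needs no extra page)
```

Rounding up avoids requesting one empty page when the job count is an exact multiple of the page size.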

In [3]:
# Page numbers start from 1
for i in range(1, PAGES + 1):
    search_page = requests.get('https://careers.pageuppeople.com/688/cwlive/en/listing/?page={}'.format(i))
    search_page_soup = BeautifulSoup(search_page.text, 'lxml')
    # Job links are relative, so prefix them with the site root
    for a in search_page_soup.find_all("a", {"class": "job-link"}):
        links.append("https://careers.pageuppeople.com" + a["href"])
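
The loop above builds absolute URLs by string concatenation. A possible alternative, sketched here with the standard library's `urllib.parse.urljoin`, also copes with hrefs that are already absolute. The helper name `absolute_url` and the sample job path are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://careers.pageuppeople.com"


def absolute_url(href, base=BASE_URL):
    """Resolve a (possibly relative) href against the site root."""
    return urljoin(base, href)


# Hypothetical relative href, shaped like the ones found on a search page
print(absolute_url("/688/cwlive/en/job/000000/example-role"))
# An href that is already absolute is returned unchanged
print(absolute_url("https://example.com/other"))
```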

## Get the job pages
After collecting the URLs, remove duplicates and download the job pages.

In [4]:
links = list(set(links))
pages = []
got_links = []

In [5]:
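def getPage(link):
    # Return the response for a job URL, or None if the request fails
    try:
        req = requests.get(link)
    except requests.RequestException:
        return None
    # A Response is falsy when its status is 4xx/5xx, so skip it here
    # to keep got_links aligned with the filtered pages list
    if not req.ok:
        return None
    got_links.append(link)
    return req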

In [6]:
pages = list(filter(lambda page: page, map(getPage, links)))
link = got_links
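
The cells above rely on `got_links` and `pages` staying in the same order and having the same length. One way to make that explicit, sketched below, is to collect each link together with its page. `fetch_all` and `fake_fetch` are hypothetical names; the fetch function is passed in so the example runs without network access. With `requests`, the exception to catch would be `requests.RequestException`.

```python
def fetch_all(links, fetch):
    """Return (kept_links, kept_pages), dropping links whose fetch failed."""
    kept = []
    for link in links:
        try:
            page = fetch(link)
        except Exception:  # broad on purpose in this sketch
            continue
        if page is not None:
            kept.append((link, page))
    kept_links = [link for link, _ in kept]
    kept_pages = [page for _, page in kept]
    return kept_links, kept_pages


def fake_fetch(link):
    """Stand-in for a real download; fails for links ending in '/bad'."""
    if link.endswith("/bad"):
        raise ConnectionError("simulated failure")
    return "<html>{}</html>".format(link)


links_ok, pages_ok = fetch_all(["a/1", "a/bad", "a/2"], fake_fetch)
print(links_ok)  # ['a/1', 'a/2']
```

Because every link is stored next to its page, a failed download cannot shift the two lists out of step.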

## Convert the job pages into Beautiful Soup

In [7]:
texts = list(map(lambda response: BeautifulSoup(response.text, 'lxml'), pages))

## Get information from job pages

Not every job page states an organisation, so `findOrg` falls back to an empty string when it is missing.

In [8]:
def findOrg(txt):
    # The organisation is the second <span> in the jobDetails block, when present
    try:
        return txt.find("div", {"class": "jobDetails"}).find_all("span")[1].text
    except (AttributeError, IndexError):
        return ""
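
The same fallback idea could be applied to any field that may be absent. The sketch below wraps an arbitrary extractor so that a missing element yields a default value. `safe` is a hypothetical helper, shown here on plain Python values; with Beautiful Soup, a `find` call with no match returns `None`, so accessing `.text` on it raises the `AttributeError` caught here.

```python
def safe(extract, default=""):
    """Wrap an extractor so missing data gives `default` instead of an error."""
    def wrapper(txt):
        try:
            return extract(txt)
        except (AttributeError, IndexError):
            return default
    return wrapper


first_item = safe(lambda seq: seq[0])
stripped = safe(lambda s: s.strip())

print(first_item(["a"]))       # a
print(repr(first_item([])))    # '' (IndexError caught)
print(repr(stripped(None)))    # '' (AttributeError caught)
```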

In [9]:
job_title = list(map(lambda txt: txt.find("div", {"id": "job-content"}).find("h2").text, texts))
organisation = list(map(findOrg, texts))
job_code = list(map(lambda txt: txt.find("span", {"class": "job-externalJobNo"}).text, texts))
work_type = list(map(lambda txt: txt.find("span", {"class": "work-type"}).text, texts))
location = list(map(lambda txt: txt.find("span", {"class": "location"}).text, texts))
category = list(map(lambda txt: txt.find("span", {"class": "categories"}).text, texts))
job_description = list(map(lambda txt: txt.find("div", {"id": "job-details"}).text, texts))
date_retrieved = list(map(lambda txt: txt.find("span", {"class": "open-date"}).text, texts))

## Save as a DataFrame

In [10]:
import pandas as pd

In [11]:
col = {'Job Title':job_title, 'Organisation': organisation, 
      'Job Code': job_code, 'Work Type': work_type,
      'Location': location, 'Category': category,
      'Job Description': job_description, 'Date Retrieved': date_retrieved,
      'Link': link}
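
pandas rejects a dict of lists with unequal lengths, so it can help to compare the column lengths before building the DataFrame. The sketch below does this with plain Python; `column_lengths`, `check_aligned` and the sample data are illustrative and not part of the original notebook.

```python
def column_lengths(columns):
    """Map each column name to the number of values it holds."""
    return {name: len(values) for name, values in columns.items()}


def check_aligned(columns):
    """Return True if all columns have the same length, else raise ValueError."""
    lengths = column_lengths(columns)
    if len(set(lengths.values())) > 1:
        raise ValueError("Column lengths differ: {}".format(lengths))
    return True


sample = {"Job Title": ["A", "B"], "Link": ["u1", "u2"]}
print(check_aligned(sample))  # True
```

In the notebook, calling `check_aligned(col)` would report which column is short if some pages were dropped without their links.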

In [12]:
jobs = pd.DataFrame(col)

In [13]:
jobs.to_csv('careergov.csv', index=False)