# Creating a Job Board with Beautiful Soup

## Introduction and Objectives

#### The objective of this project is to create a job board from job offers published on the Internet.

#### For the sake of argument, let’s say I am interested in “Data Analysis” job offers in Japan and I want to extract the data from two well-known job sites.

## Importing Libraries

In [1]:
import requests 
import pandas as pd
import numpy as np  
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import time

## Iteration for Scraping Job Sites - Data Analysis

In [2]:
web_analysis = {'CareerCross' : 'https://www.careercross.com/en/job-search/result/67335683', 
                    'DaiJob' : 'https://www.daijob.com/en/jobs/search_result?job_search_form_hidden=1&keywords=Data+Analyst'} 

titles = []
urls = []
updates = []
locations = []
jobs = []
salaries = []
experiences = []
careers = []
english_level = []
japanese_level = []
educations = []
visas = []
skills = []
    
for w,j in web_analysis.items():
    
    page = 0
    
    while True:

        time.sleep(1)

        r = requests.get(j, params = {"page" : page+1})
        if 'No jobs were found that matched your search.' in r.text or r.status_code != 200:
            break
        else:
            html = r.content
            soup = BeautifulSoup(html, "lxml")
            print('\033[1m' + '{0}, page {1}'.format(w,page+1) + '\033[0m')
            

            if w == 'CareerCross':
                page_titles = [t.text.strip().replace('\n', '') for t in soup.find_all('a', {'class': 'job-details-url'})] #title
                full_url = [urljoin(j, l.get('href')) for l in soup.find_all('a', {'class': "btn btn-lg-14 btn-primary"})] #url
                i = 0
                for t, f in zip(page_titles, full_url): #going to each page
                    time.sleep(1) 
                    r = requests.get(f)
                    if r.status_code == 200:
                        print('#URL {0}: {1}'.format(i+1, f))
                    else:
                        print('ERROR {0}: Skipping #URL{1}: {2}'.format(r.status_code, i+1, f))
                        i = i+1
                        continue
                    titles.append(t) #keep title and URL only for fetched pages so every list stays the same length
                    urls.append(f)
                    r_html = r.content
                    r_soup = BeautifulSoup(r_html, 'lxml')
                    updates.append(r_soup.find_all('span', {'id':"jsonld-date-posted"})[0].text.strip()) #date of update
                    locations.append(r_soup.find_all('span', {'id': 'jsonld-job-location'})[0].text.strip()) #location
                    try:
                        job = r_soup.find_all('span', {'id': 'jsonld-employment-type'})[0].text.strip() #type of job
                    except IndexError:
                        jobs.append(np.nan)
                    else:
                        jobs.append(job)
                    salaries.append(r_soup.find_all('span', {'id': 'jsonld-base-salary'})[0].text.strip()) #salary
                    experiences.append(r_soup.find_all('span',  {'id' : "jsonld-experience-requirements"})[0].text.strip()) #experience
                    careers.append(r_soup.find_all('span',  {'id' : "jsonld-experience-requirements"})[1].text.strip()) #career
                    english_level.append(r_soup.find_all('span', {'id': "skill-english-text"})[0].text.strip()) #english
                    japanese_level.append(r_soup.find_all('span', {'id': "skill-japanese-text"})[0].text.strip()) #japanese
                    educations.append(r_soup.find_all('span', {'id': "jsonld-education-requirements"})[0].text.strip()) #education
                    visas.append(r_soup.find_all('span' , {'id' : "qualifications-visa-status"})[0].text.strip()) #visa
                    try:
                        skill=r_soup.find_all('span', {'id': 'qualifications-required-skills'})[0].text.strip() #skill description
                    except IndexError:
                        try:
                            skill = [s.find_all('ul') for s in r_soup.find_all('span', {'id':"jsonld-description"})][0][2].text.strip()
                            if len(skill) < 100:
                                skills.append([s.find_all('ul') for s in r_soup.find_all('span', {'id':"jsonld-description"})][0][1].text.strip())
                            else: 
                                skills.append(skill)
                        except IndexError: #fewer <ul> blocks than expected: fall back to the full description
                            skills.append(r_soup.find_all('span', {'id':"jsonld-description"})[0].text.strip())
                    else:
                        skills.append(skill)
                    i=i+1
                
            elif w == 'DaiJob':
                page_titles = [t.text.strip() for t in soup.find_all('a', {'id': '_job'})] #title
                full_url = [urljoin(j, l.get('href')) for l in soup.find_all('a', {'id': '_job'})] #url
                i = 0
                for t, f in zip(page_titles, full_url): #going to each page
                    time.sleep(1) 
                    r = requests.get(f)
                    if r.status_code == 200:
                        print('#URL {0}: {1}'.format(i+1, f))
                    else:
                        print('ERROR {0}: Skipping #URL{1}: {2}'.format(r.status_code, i+1, f))
                        i = i+1
                        continue
                    titles.append(t) #keep title and URL only for fetched pages so every list stays the same length
                    urls.append(f)
                    r_html = r.content
                    r_soup = BeautifulSoup(r_html, 'lxml')
                    updates.append(r_soup.find_all('span', {'class': 'roboto'})[1].text) #date of update
                    try:
                        loc = r_soup.find_all('td')[3].text.split('\n')[3].strip() #location
                    except IndexError:
                        locations.append(r_soup.find_all('td')[4].text.split('\n')[3].strip())
                    else:
                        locations.append(loc)
                    if 'Job Contract' in r_soup.find_all('tr')[-1].text: #type of job
                        jobs.append(r_soup.find_all('td')[-1].text.strip())
                    elif 'Job Contract' in r_soup.find_all('tr')[-2].text:
                        jobs.append(r_soup.find_all('td')[-2].text.strip())
                    else:
                        jobs.append(np.nan) 
                    for s in r_soup.find_all('td'): #salary
                        if s.find('a') and ('JPY' in s.text or 'Depends on experience' in s.text):
                            salaries.append(s.text.strip())
                            break #one salary per job so the lists stay aligned
                    else:
                        salaries.append(np.nan) #no salary cell found
                    experiences.append(np.nan) #experience
                    careers.append(r_soup.find('div', class_ = 'recruit_level').text.strip()) #career
                    if 'English Level' in r_soup.find('div', class_="jobs_box jobs_box_detail mb25").text: #english
                        for x, row in enumerate(r_soup.find_all('tr')[:20]):
                            if 'English Level' in row.text:
                                english_level.append(r_soup.find_all('td')[x].text.strip())
                                break
                        else:
                            english_level.append(np.nan) #label not found in the first 20 rows
                    else:
                        english_level.append(np.nan)
                    if 'Japanese Level' in r_soup.find('div', class_="jobs_box jobs_box_detail mb25").text: #japanese
                        for z, row in enumerate(r_soup.find_all('tr')[:20]):
                            if 'Japanese Level' in row.text:
                                japanese_level.append(r_soup.find_all('td')[z].text.strip())
                                break
                        else:
                            japanese_level.append(np.nan) #label not found in the first 20 rows
                    else:
                        japanese_level.append(np.nan)
                    educations.append(np.nan) #education
                    visas.append(np.nan) #visa
                    if 'Job Requirements' in r_soup.find('div', class_="jobs_box jobs_box_detail mb25").text: #skill description
                        for z, row in enumerate(r_soup.find_all('tr')[:20]):
                            if 'Job Requirements' in row.text:
                                skills.append(r_soup.find_all('td')[z].text.strip().replace("\n", " "))
                                break
                        else:
                            skills.append(np.nan) #label not found in the first 20 rows
                    else:
                        skills.append(np.nan)
                    i=i+1
            page +=1

CareerCross, page 1
#URL 1: https://www.careercross.com/en/job/detail-1054674?sid=67335683&page=1
#URL 2: https://www.careercross.com/en/job/detail-1066517?sid=67335683&page=1
#URL 3: https://www.careercross.com/en/job/detail-944095?sid=67335683&page=1
#URL 4: https://www.careercross.com/en/job/detail-1084723?sid=67335683&page=1
#URL 5: https://www.careercross.com/en/job/detail-1015848?sid=67335683&page=1
#URL 6: https://www.careercross.com/en/job/detail-1094338?sid=67335683&page=1
#URL 7: https://www.careercross.com/en/job/detail-907992?sid=67335683&page=1
#URL 8: https://www.careercross.com/en/job/detail-1083212?sid=67335683&page=1
#URL 9: https://www.careercross.com/en/job/detail-1085829?sid=67335683&page=1
#URL 10: https://www.careercross.com/en/job/detail-1068392?sid=67335683&page=1
CareerCross, page 2
#URL 1: https://www.careercross.com/en/job/detail-978587?sid=67335683&page=2
#URL 2: https://www.careercross.com/en/job/detail-1036420?sid=67335683&page=2
#URL 3: ht
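
The detail-page loop above reads most fields with `find_all(...)[0].text.strip()`, which raises `IndexError` as soon as one posting lacks the tag. A minimal sketch of a helper that returns a default value instead is shown here. The name `first_text` is hypothetical (it is not part of Beautiful Soup), and the HTML snippet is made up for illustration, reusing two of the ids from the CareerCross branch. It uses the built-in `html.parser` so it runs without `lxml`.

```python
from bs4 import BeautifulSoup


def first_text(soup, name, attrs, default=None):
    """Return the stripped text of the first matching tag, or `default`."""
    tag = soup.find(name, attrs)  # find() returns None when nothing matches
    if tag is None:
        return default
    return tag.text.strip()


# Made-up detail page: it has a location and a salary but no employment type.
sample_html = """
<div>
  <span id="jsonld-job-location"> Tokyo - 23 Wards </span>
  <span id="jsonld-base-salary">9 million yen ~ 11 million yen</span>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

location = first_text(sample_soup, "span", {"id": "jsonld-job-location"})
job_type = first_text(sample_soup, "span", {"id": "jsonld-employment-type"})

print(location)
print(job_type)
```

Inside the scraper, `np.nan` could be passed as `default` so that missing fields show up as `NaN` in the final data frame, as the existing `try`/`except IndexError` blocks already do for the type of job.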

## Create the Data Frame

In [3]:
columns = {'Title': titles, 'URL': urls, 'Update': updates, 'Location': locations, 'Type of job': jobs, 'Salary': salaries, 
           'Experience needed': experiences, 'Career': careers, 'English Level': english_level, 'Japanese Level': japanese_level,
          'Education': educations, 'Visa': visas, 'Skill Description': skills}


df_analyst = pd.DataFrame(columns)

def make_clickable(val):
    return '<a href="{0}">{0}</a>'.format(val)


# Show every row and column, and do not truncate long cell text
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

df_analyst.style.format({'URL': make_clickable})


Unnamed: 0,Title,URL,Update,Location,Type of job,Salary,Experience needed,Career,English Level,Japanese Level,Education,Visa,Skill Description
0,【Global Insurance】Data Scientist (Data Analyst),https://www.careercross.com/en/job/detail-1054674?sid=67335683&page=1,"October 28th, 2020",Tokyo - 23 Wards,Permanent Full-time,9 million yen ~ 11 million yen,Over 3 years,Mid Career,Daily Conversation,Fluent,Technical/Vocational College,Permission to work in Japan required,"･ Experience as a Data Scientist/Data Analyst･ Insurance industry experience mandatory･ Proficiency in Python, R, SAS, VBA and Microsoft Office Suite･ Technical, analytical, excellent problem solving skills with attention to details･ Ability to work independently with minimal supervision & team player･ Business oriented with business acumen"
1,Data Analyst,https://www.careercross.com/en/job/detail-1066517?sid=67335683&page=1,"November 3rd, 2020",Tokyo - 23 Wards,Permanent Full-time,10 million yen ~ 11 million yen,No experience,Mid Career,Business Level,Fluent,Bachelor's Degree,No permission to work in Japan required,"- A degree in Computing, Mathematics, Statistics or Engineering- Minimum 2 years working experience in a programming or data analyst role- Technical skills: Python, R, Java, Matlab, VBA- Good communication skills- Business level English and Japanese languages- Team player- Ability to perform under pressure- Good reporting skills"
2,Data Analyst,https://www.careercross.com/en/job/detail-944095?sid=67335683&page=1,"October 29th, 2020",Tokyo - 23 Wards,Permanent Full-time,"Negotiable, based on experience",Over 1 year,Mid Career,Daily Conversation,Business Level,Bachelor's Degree,Permission to work in Japan required,"■　Python, C++, Ruby, or Golang■　Experience in Cloud-based environments or container application platforms■　2+ years of experience working as a Data Analyst, preferably at an IT company or in healthcare or medical sector■　Experience doing Data Analysis, requirement gathering, and documentation■　Able to work on multiple projects/clients simultaneously■　Fluent communication skills in Japanese, English a plus■　Ability to interact professionally with a diverse group, Users, managers, vendor touch points etc. Preferred:■　a personal interest in health and fitness (company has its own gym that employees can use), or sports-oriented/active professionalsFor a quick consultation about this role and other roles at Next Move, please contact eri.aldea@nextmove.co.jp"
3,Data Analyst,https://www.careercross.com/en/job/detail-1084723?sid=67335683&page=1,"October 26th, 2020","Tokyo - 23 Wards, Minato-ku",Permanent Full-time,"Negotiable, based on experience",Over 6 years,Mid Career,Business Level,Business Level,Associate Degree/Diploma,Permission to work in Japan required,"Relevant Experience Required 8 to 10 yrs.Total IT Experience Minimum 8 to 10 yrs.8+ years direct experience in analyzing and deriving data governance, metadata management, data architecture, data quality and metadata related outputStrong experience in different type of “Data Analysis” covering business data, metadata, master data, analytical data.Strong hands-on technical experience in Scala, Hadoop or related technologies.Programming/Technical experience in working on technical platforms"
4,【Native/Web Game】Data Analyst,https://www.careercross.com/en/job/detail-1015848?sid=67335683&page=1,"October 30th, 2020",Tokyo - 23 Wards,Permanent Full-time,4.5 million yen ~ 7 million yen,Over 1 year,Mid Career,,Fluent,Bachelor's Degree,Permission to work in Japan required,【必須要件】 ・課題解決を推進するための資料作成、及びプレゼンテーション能力 ・課題解決へと導くための他部署との円滑なコミュニケーション能力 【歓迎要件】 ・ゲームやエンタメコンテンツに対する知識 ・データ分析業務経験 ・デジタル領域におけるコンサルティング業務経験 ・SQLによるデータ集計経験
5,NewData Analyst (Any programming language OK)｜Data Analysis&AI,https://www.careercross.com/en/job/detail-1094338?sid=67335683&page=1,"November 6th, 2020",Tokyo - 23 Wards,Permanent Full-time,4 million yen ~ 7 million yen,Over 3 years,Mid Career,Daily Conversation,Business Level,Bachelor's Degree,Permission to work in Japan required,"【Requirements】 ・Development experience using any programming language ・Proposal material creation and proposal experience for client companies ・Basic skills such as Excel, PowerPoint, Word, Access, Google Analytics ・Person with Japanese qualification N1[Welcome ] ・Experience extracting data from DB using SQL, Big Query, Red Shift, etc. ・SPSS, SAS, R, Python data analysis and reporting experience ・Development experience in SQL, Python, R language ・Business experience in marketing research and market research ・Knowledge related to statistics and machine learning"
6,(URGENT) Data Analyst / データアナリスト,https://www.careercross.com/en/job/detail-907992?sid=67335683&page=1,"November 4th, 2020",Tokyo - 23 Wards,Permanent Full-time,"Negotiable, based on experience",Over 1 year,Entry Level,Business Level,Business Level,Bachelor's Degree,Permission to work in Japan required,"■ 2 years + experience analysing data using Python and SQL■ Web service development expeirence■ Business level Japanese■ Strong teamwork, passion in learning For further details please feel free to contact me directly: matteo.giuberto@nextmove.co.jp / 03-4580-6615"
7,Data Analyst (ITS),https://www.careercross.com/en/job/detail-1083212?sid=67335683&page=1,"November 5th, 2020",Tokyo - 23 Wards,Contract,"Negotiable, based on experience",Over 1 year,Mid Career,Business Level,Business Level,Bachelor's Degree,Permission to work in Japan required,大手企業にてBAポジションを募集しております！ 企業概要 大手企業にてBAポジションを募集しております！ 業務内容 IT部門所属のビジネスアナリストとして、ビジネス部門と協働し、ビジネス部門の課題及びニーズを把握･分析し、ITでどう解決ができるのかを企画・提案していきます。 1.ビジネス部門の課題及びニーズの把握･分析 2.ITによる解決策の提案及び予算取得 3.製品・ベンダー選定 4.プロジェクトの計画・実行 5.新業務の定着化支援 応募条件 必須： l システム設計･開発の経験 をお持ちの方 l 業務分析の経験 をお持ちの方 l 企画及び提案の経験 をお持ちの方 l ビジネス英語力（TOEIC750目安） ご応募の際は 最新の職務経歴書をご送付の上ご連絡お待ちしております。
8,Data Analyst (ITS),https://www.careercross.com/en/job/detail-1085829?sid=67335683&page=1,"October 29th, 2020",Tokyo - 23 Wards,Contract,"Negotiable, based on experience",Over 1 year,Mid Career,Business Level,Business Level,Bachelor's Degree,Permission to work in Japan required,大手企業にてBAポジションを募集しております！ 企業概要 大手企業にてBAポジションを募集しております！ 業務内容 IT部門所属のビジネスアナリストとして、ビジネス部門と協働し、ビジネス部門の課題及びニーズを把握･分析し、ITでどう解決ができるのかを企画・提案していきます。 1.ビジネス部門の課題及びニーズの把握･分析 2.ITによる解決策の提案及び予算取得 3.製品・ベンダー選定 4.プロジェクトの計画・実行 5.新業務の定着化支援 応募条件 必須： l システム設計･開発の経験 をお持ちの方 l 業務分析の経験 をお持ちの方 l 企画及び提案の経験 をお持ちの方 l ビジネス英語力（TOEIC750目安） ご応募の際は 最新の職務経歴書をご送付の上ご連絡お待ちしております。
9,データ分析エンジニア/Data Analyst Engineer wanted!,https://www.careercross.com/en/job/detail-1068392?sid=67335683&page=1,"November 6th, 2020",Tokyo - 23 Wards,Permanent Full-time,"Negotiable, based on experience ~ 6.5 million yen",Over 3 years,Mid Career,Basic,Business Level,Bachelor's Degree,Permission to work in Japan required,"Required:● Japanese: Business level for client facing work (JLPT N1)● Data Analysis experience with Python, SQL, R, SAS ect ● Statistics, Data analysis knowledge Welcome● System development experience ● Experience with analysis tools such as Azure ML, SAS, SPSSThis opportunity is suited for those who want to learn new skills and work on a variety of tasks and are comfortable talking with Japanese clients/end users in high level Japanese."
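
The URLs in the table contain `&`, and `make_clickable` places them into HTML unchanged. Browsers usually tolerate this, but escaping the value first is the safer habit. The sketch below is one possible variant. The name `make_clickable_safe` is hypothetical, and the URL is the first one from the table above.

```python
from html import escape


def make_clickable_safe(val):
    """Render a URL as an HTML link, escaping characters such as & and quotes."""
    url = escape(str(val), quote=True)
    return '<a href="{0}">{0}</a>'.format(url)


link = make_clickable_safe(
    "https://www.careercross.com/en/job/detail-1054674?sid=67335683&page=1"
)
print(link)
```

It can be passed to `df_analyst.style.format` in the same way as `make_clickable`, since both take a cell value and return a string.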


In [4]:
df_analyst.shape

(122, 13)
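
`pd.DataFrame(columns)` only works when all thirteen lists have the same length, so a single posting that skips one `append` in the scraping loop either breaks the frame or shifts values into the wrong row. One alternative, sketched here under assumptions, is to build one dict per posting and fill absent fields explicitly. `FIELDS` and `make_record` are hypothetical names, and the sample values are shortened from the table above.

```python
FIELDS = ["Title", "URL", "Update", "Location", "Type of job", "Salary",
          "Experience needed", "Career", "English Level", "Japanese Level",
          "Education", "Visa", "Skill Description"]


def make_record(found):
    """Build one row with every column present; fields not scraped become None."""
    unknown = set(found) - set(FIELDS)
    if unknown:
        # Catch misspelled column names early instead of silently dropping them.
        raise KeyError("unexpected fields: {0}".format(sorted(unknown)))
    return {field: found.get(field) for field in FIELDS}


records = []
records.append(make_record({"Title": "Data Analyst",
                            "Location": "Tokyo - 23 Wards",
                            "Salary": "10 million yen ~ 11 million yen"}))
records.append(make_record({"Title": "Data Analyst (ITS)",
                            "Type of job": "Contract"}))

print(len(records), len(records[0]))
```

pandas can build the frame directly from such a list, for example with `pd.DataFrame(records, columns=FIELDS)`, and every row then has exactly one value per column by construction.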

#### More information on this project: https://jgarciaportillo.medium.com/web-scraping-project-create-a-job-board-with-beautiful-soup-65d4b1a498fb