# Scraping Job Offers From Get on Board

- Get on Board is a platform for finding and applying to jobs at startups and tech companies.
- Tools used: Python, pandas, requests, Beautiful Soup (with the lxml parser).

### Steps:
- Scrape https://www.getonbrd.com/
- Get a list of links to job fields
- For each link, get the items (job offers)
- Visit each job offer page and extract the:
    - Company Name
    - Gross Salary
    - Qualification
    - Job Title
    - Location
    - Working Hours
    - Published Date
    - Job Field Name
    - Job Offer Link
- Make a list of all job offers
- Make a DataFrame from the list
- Export the DataFrame to a CSV file

In [3]:
# Import the packages
import requests
import lxml
from bs4 import BeautifulSoup

In [47]:
# Request the main url
url = 'https://www.getonbrd.com'
r = requests.get(url)

In [48]:
# Check the request
print(r.status_code)
print(type(r.content))

200
<class 'bytes'>


In [50]:
# Parse the request content
soup = BeautifulSoup(r.content, 'lxml')

In [54]:
# Get the tags of the job fields
sections = soup.find_all('a', attrs = {'class': 'gb-tags__item'})
sections

[<a class="gb-tags__item" href="/jobs/design-ux">Design / UX</a>,
 <a class="gb-tags__item" href="/jobs/programming">Programming</a>,
 <a class="gb-tags__item" href="/jobs/data-science-analytics">Data Science / Analytics</a>,
 <a class="gb-tags__item" href="/jobs/mobile-developer">Mobile Development</a>,
 <a class="gb-tags__item" href="/jobs/digital-marketing">Digital Marketing</a>,
 <a class="gb-tags__item" href="/jobs/sysadmin-devops-qa">SysAdmin / DevOps / QA</a>,
 <a class="gb-tags__item" href="/jobs/sales">Sales</a>,
 <a class="gb-tags__item" href="/jobs/innovation-agile">Product, Innovation &amp; Agile</a>]

In [58]:
# Make a list of the relative paths to the job fields
links = [s.get('href') for s in sections]

links

['/jobs/design-ux',
 '/jobs/programming',
 '/jobs/data-science-analytics',
 '/jobs/mobile-developer',
 '/jobs/digital-marketing',
 '/jobs/sysadmin-devops-qa',
 '/jobs/sales',
 '/jobs/innovation-agile']
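These paths are relative, so they need the site's base URL in front of them before they can be requested. Plain string concatenation works when the base has no trailing slash, as it does here. As an alternative, here is a minimal sketch using the standard library's `urllib.parse.urljoin`, which also handles a trailing slash on the base. The names `base_url`, `relative_links` and `absolute_links` are illustrative only.

```python
from urllib.parse import urljoin

# Illustrative values copied from the output above
base_url = 'https://www.getonbrd.com'
relative_links = ['/jobs/design-ux', '/jobs/programming']

# urljoin resolves each relative path against the base URL
absolute_links = [urljoin(base_url, path) for path in relative_links]
print(absolute_links)
```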

In [165]:
# Function to extract the required attributes
def job_search(items):
    
    jobs = []
    
    # "Items" is the main container for job offers
    for item in items:

        # Get the link to the job offer page
        link = item.get('href')
        
        # Request the job offer url
        r = requests.get(link)
        
        # Parse the url content
        soup = BeautifulSoup(r.content, 'lxml')

        
        # Search for the required attributes
        company = soup.find('strong', attrs={'itemprop': 'name'}).get_text()
        date = soup.find('time').get_text()
        title = soup.find('span', attrs={'itemprop': 'title'}).text
        location = soup.find('span', attrs={'class': 'location'}).get_text()
        try:
            qualification = soup.find('span', attrs={'itemprop': 'qualifications'}).text
        except AttributeError:
            # soup.find returned None: the offer has no qualification tag
            qualification = 'Unknown'
        working_hours = soup.find('h2', attrs={'class': 'size1 mb-3 w400 lh2'}).text



        try:
            gross_salary = soup.find('span', attrs={'itemprop': 'baseSalary'}).text
        except AttributeError:
            # soup.find returned None: the offer does not publish a salary
            gross_salary = 'Unknown'


        # Append a dictionary of the attributes for each job to the list "Jobs"
        jobs.append(
            {
                'company': company,
                'date': date.replace('\n', ''),
                'title': title.replace('\n', ''),
                'location': location.replace('\n', '').replace('\xa0', ''),
                'qualification': qualification.replace('\n', ''),
                'working_hours': working_hours.replace('\n', '').replace('\xa0', '').split('|')[-2],
                'job_field': working_hours.replace('\n', '').replace('\xa0', '').split('|')[-1],
                'link': link,
                'gross_salary': gross_salary.replace('\n', '')
            }
        )
        
    # Return the list
    return jobs
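The optional fields (qualification and gross salary) are missing on some offers, in which case `soup.find` returns `None`. One way to handle this without a `try` block per field is a small helper that returns a default value. The sketch below is an assumption, not part of the notebook: `text_or_default` is a hypothetical name, the HTML snippet is made up for illustration, and it uses the built-in `html.parser` so it runs without lxml. Note that `get_text(strip=True)` trims surrounding whitespace, which is slightly different from the `replace('\n', '')` calls used above.

```python
from bs4 import BeautifulSoup

def text_or_default(soup, name, attrs, default='Unknown'):
    """Return the stripped text of the first matching tag, or a default if absent."""
    tag = soup.find(name, attrs=attrs)
    return tag.get_text(strip=True) if tag is not None else default

# Made-up snippet: it has a qualification tag but no salary tag
html = '<div><span itemprop="qualifications">\nSenior\n</span></div>'
soup = BeautifulSoup(html, 'html.parser')

qualification = text_or_default(soup, 'span', {'itemprop': 'qualifications'})
gross_salary = text_or_default(soup, 'span', {'itemprop': 'baseSalary'})
print(qualification, gross_salary)
```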

In [175]:
# Empty list
jobs = []

# For each job field URL apply the function "job_search"
for link in links:
    r = requests.get(url+link)
    s = BeautifulSoup(r.content, 'lxml')
    
    # "Items" is the main container for job offers
    items = s.find_all('a', attrs={'class': 'color-hierarchy2 gb-results-list__item'})
    
    # Extend the final list with the results of each loop
    jobs.extend(job_search(items))
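This loop requests every job field page and then every offer page back to back, which adds up to several hundred requests. A short pause between requests is a common courtesy to the server. The sketch below is an assumption, not something the notebook does: `fetch_politely` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `requests.get` so the example runs offline.

```python
import time

def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, sleeping between consecutive calls."""
    responses = []
    for index, page_url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause before every request except the first
        responses.append(fetch(page_url))
    return responses

# Stand-in for requests.get so the sketch runs without network access
def fake_fetch(page_url):
    return f'fetched {page_url}'

pages = fetch_politely(['/jobs/sales', '/jobs/programming'], fake_fetch, delay_seconds=0.1)
print(pages)
```

In the notebook, `fetch` could be `requests.get`, with the delay chosen to suit the site.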
    

In [176]:
# Number of jobs 
len(jobs)

695

In [178]:
# Overview
jobs[:5]

[{'company': 'MATCH · Agencia-consultora',
  'date': 'November 11, 2022',
  'title': 'Design UX/UI Semi Senior',
  'location': 'Santiago(hybrid)',
  'qualification': 'Semi Senior',
  'working_hours': 'Full time',
  'job_field': 'Design / UX',
  'link': 'https://www.getonbrd.com/jobs/design-ux/desing-ux-ui-semi-senior-match-agencia-consultora-santiago',
  'gross_salary': 'Gross Salary$1600 - 2200USD/month'},
 {'company': '2Brains',
  'date': 'November 11, 2022',
  'title': 'Practicante Digital Designer',
  'location': 'Santiago(hybrid)',
  'qualification': 'Unknown',
  'working_hours': 'Internship',
  'job_field': 'Design / UX',
  'link': 'https://www.getonbrd.com/jobs/design-ux/practicante-digital-designer-2brains-santiago',
  'gross_salary': 'Unknown'},
 {'company': '2Brains',
  'date': 'November 11, 2022',
  'title': 'Product Designer Senior',
  'location': 'Santiago(hybrid)',
  'qualification': 'Senior',
  'working_hours': 'Full time',
  'job_field': 'Design / UX',
  'link': 'https:

In [179]:
import pandas as pd

In [183]:
# Convert the list into a DataFrame
df = pd.DataFrame(jobs)
df.head()

| | company | date | title | location | qualification | working_hours | job_field | link | gross_salary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MATCH · Agencia-consultora | November 11, 2022 | Design UX/UI Semi Senior | Santiago(hybrid) | Semi Senior | Full time | Design / UX | https://www.getonbrd.com/jobs/design-ux/desing... | Gross Salary$1600 - 2200USD/month |
| 1 | 2Brains | November 11, 2022 | Practicante Digital Designer | Santiago(hybrid) | Unknown | Internship | Design / UX | https://www.getonbrd.com/jobs/design-ux/practi... | Unknown |
| 2 | 2Brains | November 11, 2022 | Product Designer Senior | Santiago(hybrid) | Senior | Full time | Design / UX | https://www.getonbrd.com/jobs/design-ux/produc... | Unknown |
| 3 | 2Brains | November 11, 2022 | UX Designer | Santiago(hybrid) | Semi Senior | Full time | Design / UX | https://www.getonbrd.com/jobs/design-ux/ux-des... | Unknown |
| 4 | 2Brains | November 11, 2022 | Email MKT Designer Semi Senior | Santiago(hybrid) | Semi Senior | Full time | Design / UX | https://www.getonbrd.com/jobs/design-ux/email-... | Unknown |
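The `gross_salary` column holds raw strings such as `'Gross Salary$1600 - 2200USD/month'`, which are hard to sort or filter. Below is a minimal sketch of splitting such a string into numeric parts with a regular expression. It assumes every published salary follows the range format seen in the first row; other formats on the site, if any, would need their own pattern. `SALARY_PATTERN` and `parse_salary` are hypothetical names.

```python
import re

# Assumed format: "$<min> - <max><3-letter currency>/<period>"
SALARY_PATTERN = re.compile(r'\$(\d+)\s*-\s*(\d+)\s*([A-Z]{3})/(\w+)')

def parse_salary(text):
    """Return the salary range as a dict, or None if the text does not match."""
    match = SALARY_PATTERN.search(text)
    if match is None:
        return None
    low, high, currency, period = match.groups()
    return {'min': int(low), 'max': int(high), 'currency': currency, 'period': period}

salary = parse_salary('Gross Salary$1600 - 2200USD/month')
missing = parse_salary('Unknown')
print(salary, missing)
```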


In [184]:
# Number of rows and columns
df.shape

(695, 9)
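The `date` column is stored as text such as `'November 11, 2022'`. Converting it to a real date makes sorting and filtering by publication date possible. Here is a minimal sketch with the standard library, assuming all dates follow that format and that the interpreter runs under an English locale (the `%B` directive is locale-dependent). `parse_published_date` is a hypothetical helper name.

```python
from datetime import datetime

def parse_published_date(text):
    """Parse a date like 'November 11, 2022' into a datetime.date."""
    return datetime.strptime(text.strip(), '%B %d, %Y').date()

published = parse_published_date('November 11, 2022')
print(published.isoformat())
```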

In [186]:
# Export the DataFrame to a CSV file
df.to_csv('getonboard_joboffers.csv')
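By default `to_csv` also writes the row index as an unnamed first column, and reading the file back with `read_csv` then produces an extra `Unnamed: 0` column. If the index carries no meaning, passing `index=False` avoids this. The sketch below shows the difference on a made-up one-row DataFrame, writing to strings instead of a file.

```python
import io
import pandas as pd

# Made-up sample row, for illustration only
sample = pd.DataFrame([{'company': '2Brains', 'title': 'UX Designer'}])

with_index = sample.to_csv()                # index written as an unnamed first column
without_index = sample.to_csv(index=False)  # index omitted

columns_with_index = list(pd.read_csv(io.StringIO(with_index)).columns)
columns_without_index = list(pd.read_csv(io.StringIO(without_index)).columns)
print(columns_with_index)
print(columns_without_index)
```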