<p>This notebook provides a script for scraping job announcement data from <b>staff.am</b>.
The general sequence of steps I followed is:
    
1. [Getting page urls with job announcements](#load)
2. [Getting job urls and defining a function getting data of interest from job announcements](#func)  
3. [Crawling, getting and saving the data into csv](#crawl)

 <h2>1. Getting page urls with job announcements</h2> <a name="load"></a>

In [1]:
#Importing all needed libraries
import pandas as pd
import time
import requests
from bs4 import BeautifulSoup
from pprint import pprint
import csv
import more_itertools

In [2]:
#getting staff.am pages
links=[f"https://staff.am/en/jobs?page={i}&per-page=50" for i in range(1,15)]

In [3]:
#the resulting staff.am page links look like this
links

['https://staff.am/en/jobs?page=1&per-page=50',
 'https://staff.am/en/jobs?page=2&per-page=50',
 'https://staff.am/en/jobs?page=3&per-page=50',
 'https://staff.am/en/jobs?page=4&per-page=50',
 'https://staff.am/en/jobs?page=5&per-page=50',
 'https://staff.am/en/jobs?page=6&per-page=50',
 'https://staff.am/en/jobs?page=7&per-page=50',
 'https://staff.am/en/jobs?page=8&per-page=50',
 'https://staff.am/en/jobs?page=9&per-page=50',
 'https://staff.am/en/jobs?page=10&per-page=50',
 'https://staff.am/en/jobs?page=11&per-page=50',
 'https://staff.am/en/jobs?page=12&per-page=50',
 'https://staff.am/en/jobs?page=13&per-page=50',
 'https://staff.am/en/jobs?page=14&per-page=50']
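
The page count (14) and page size (50) are hard-coded in the list comprehension above. As a minimal sketch, the same URLs can be built from parameters with the standard library. `build_page_urls` and its arguments are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://staff.am/en/jobs"  # listing endpoint used in the notebook

def build_page_urls(n_pages, per_page=50, base_url=BASE_URL):
    """Return listing-page URLs for pages 1..n_pages (hypothetical helper)."""
    return [
        f"{base_url}?{urlencode({'page': page, 'per-page': per_page})}"
        for page in range(1, n_pages + 1)
    ]

links = build_page_urls(14)
print(links[0])    # first listing page
print(len(links))  # number of listing pages
```

Keeping the page count as an argument makes it easier to rerun the scrape when the number of listing pages on the site changes.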

 <h2>2. Getting job urls and defining a function getting data of interest from job announcements</h2> <a name="func"></a>

In [4]:
#getting job urls out of staff.am pages
def get_job_links(url):
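    '''function fetching one staff.am listing page and returning the list of job hrefs found on it'''
    response=requests.get(url)
    time.sleep(3) #pause between requests to avoid overloading the site
    page=BeautifulSoup(response.text, "html.parser")
    #each job card is a div with a data-id attribute; the first element after it is expected to carry the href
    job_urls=[i.find_next().get("href") for i in page.find_all("div", attrs={"data-id":True})]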
    return job_urls

In [5]:
#flattening the list of lists into one list and building full urls
all_job_links=[get_job_links(i) for i in links]
all_job_links=list(more_itertools.collapse(all_job_links))
all_job_links=[f"https://staff.am{i}" for i in all_job_links ]
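
The flattening and prefixing steps above can also be done with the standard library alone. The sketch below assumes the hrefs are site-relative paths and that some entries may be `None` (which is what `.get("href")` returns when the attribute is missing). `to_full_urls` and the sample paths are made up for illustration.

```python
from itertools import chain
from urllib.parse import urljoin

def to_full_urls(nested_hrefs, site="https://staff.am"):
    """Flatten per-page href lists, drop empty entries, and build absolute URLs."""
    flat = chain.from_iterable(nested_hrefs)
    full = (urljoin(site, href) for href in flat if href)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(full))

# Made-up hrefs standing in for the output of get_job_links on two pages
sample = [["/en/job-a", "/en/job-b"], [None, "/en/job-b", "/en/job-c"]]
full_links = to_full_urls(sample)
print(full_links)
```

Dropping duplicates here avoids requesting the same announcement twice if a job appears on two listing pages while the crawl is running.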

In [6]:
#getting the selected data out of job announcement
def get_all_staff(url):
    '''function sending a request to a job announcement page and returning the outlined details'''
    response=requests.get(url)
    time.sleep(3) #pause between requests
    page=response.text
    page=BeautifulSoup(page, "html.parser")
    company_name=page.find("h1", class_="job_company_title").get_text()
    position=page.find("div", attrs={"id":"job-post"}).find_next("h2").get_text()
    location=page.find("span", text="Location:").next_sibling
    category=page.find("span", text="Category:").next_sibling
    job_load=page.find("span", text="Job type:").next_sibling
    responsibilities=page.find("h3", text="Job responsibilities").find_next().get_text()
    qualifications=page.find("h3", text="Required qualifications").find_next().get_text()
    return company_name,position,location,category,job_load,responsibilities,qualifications
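
`page.find(...)` returns `None` when nothing matches, so an announcement that lacks even one of these fields (for example, one with no "Required qualifications" heading) makes `get_all_staff` raise `AttributeError`. One possible way to keep such rows is to wrap each lookup in a small helper. `safe_field` is a hypothetical name, and stripping whitespace is an added choice here, not the notebook's behavior.

```python
def safe_field(extract, default=None):
    """Run a zero-argument lookup and return `default` if the element is missing."""
    try:
        value = extract()
    except AttributeError:
        # raised when a .find(...) returned None and a method was called on it
        return default
    # strip surrounding whitespace from text values; leave other values untouched
    return value.strip() if isinstance(value, str) else value

# Stand-in for a lookup whose .find(...) matched nothing
missing = None
print(safe_field(lambda: missing.get_text()))
print(safe_field(lambda: "\n\nHigher education\n"))
```

Inside `get_all_staff`, a lookup would then read, for instance, `location=safe_field(lambda: page.find("span", text="Location:").next_sibling)`, leaving `None` in the row instead of discarding the whole announcement.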

 <h2>3. Crawling, getting and saving the data into csv</h2> <a name="crawl"></a>

In [7]:
#crawling staff.am and saving the data into the final_data_staff list
#continue skips to the next job url whenever get_all_staff raises an exception (e.g. a missing field)
final_data_staff=[]
for i in all_job_links:
    try:
        all_data=get_all_staff(i)
    except Exception: #a bare except would also swallow KeyboardInterrupt
        continue
    final_data_staff.append(all_data)
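
The loop above silently drops every URL that fails, so there is no way to tell afterwards how many announcements were skipped or why. A minimal sketch of a variant that records failures follows. `crawl`, `fetch`, and `fake_fetch` are hypothetical names and the two URLs are made up; in the notebook, `fetch` would be `get_all_staff`.

```python
def crawl(urls, fetch):
    """Call fetch(url) for every url; return (rows, failures)."""
    rows, failures = [], []
    for url in urls:
        try:
            rows.append(fetch(url))
        except Exception as exc:
            # keep the url and the error so skipped pages can be inspected later
            failures.append((url, repr(exc)))
    return rows, failures

def fake_fetch(url):
    """Stand-in for get_all_staff that fails on urls ending in 'bad'."""
    if url.endswith("bad"):
        raise AttributeError("'NoneType' object has no attribute 'get_text'")
    return ("Company", "Position")

rows, failures = crawl(["https://staff.am/en/ok", "https://staff.am/en/bad"], fake_fetch)
print(len(rows), len(failures))
print(failures[0][0])
```

Comparing `len(failures)` with `len(all_job_links)` after a run gives a quick estimate of how much of the site the selectors actually cover.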

In [8]:
final_data_staff=pd.DataFrame(final_data_staff,columns=["company_name","position","location","category","job_load","responsibilities","qualifications"])

In [9]:
final_data_staff.head()

,company_name,position,location,category,job_load,responsibilities,qualifications
0,Digitain,Outbound and Retention Specialist,Yerevan,Marketing/Advertising/PR,Full time,"\n\nPresent, promote and sell company services...","\n\nHigher education\nProficient in Armenian, ..."
1,Digitain,Turkish Speaking Customer support Specialist,Yerevan,Sales/service management,Full time,"\n\nCustomer support through online chat, and...","\nOur customers call us 24 hours a day, 365 da..."
2,Digitain,English Speaking Customer Care Specialist,Yerevan,Sales/service management,Full time,\n\nCustomer support through online chat and ...,"\nOur customers call us 24 hours a day, 365 da..."
3,FINCA UCO CJSC,Loan Officer in Myasnikyan Branch,Armavir,Banking/credit,Full time,"\n\nԻրականացնել ուղիղ մարքեթինգ, պոտենցիալ հաճ...","\n\nԲարձրագույն կրթություն, նախընտրելի է ֆինան..."
4,FINCA UCO CJSC,Internal Control Specialist,Yerevan,Banking/credit,Full time,\n\nCarries out the internal control functions...,\n\n Bachelor degree in business/finance requi...


In [10]:
final_data_staff.to_csv("Final_data_staff.csv", header=True, index=False) #saving into csv
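
The responsibilities and qualifications columns contain embedded newlines and Armenian text, so it is worth checking that such values survive a round trip through CSV. The sketch below uses the standard-library `csv` module on an in-memory buffer with made-up sample rows; for a real file, opening it with `encoding="utf-8"` and `newline=""` is the usual practice.

```python
import csv
import io

columns = ["company_name", "responsibilities"]
# Made-up sample rows: one multi-line field, one non-ASCII field
sample_rows = [
    ["Digitain", "\n\nCustomer support\nvia chat"],
    ["FINCA UCO CJSC", "Բարձրագույն կրթություն"],
]

# newline="" mirrors how a real csv file should be opened
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(sample_rows)

# Read the buffer back and compare with what was written
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back[1:] == sample_rows)
```

The multi-line field is written inside quotes, which is why it reads back as a single value rather than being split into several rows.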