## LinkedIn Jobs Section parser

Looking for a job can be time-consuming and tiresome. Finding a position that suits both you and the employer often takes longer than the interviews themselves.

The goal of this project is to parse the LinkedIn Jobs section and extract key information about data analyst positions. The search covers jobs worldwide, but it can be narrowed down to a country, region, city, etc.

The result is a CSV file with the following fields: job title, company name, location, short description, and position link.

In [1]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import lxml

Since remote work is one of the most comfortable working options for me, I will use the **Remote** filter.

In [2]:
link = 'https://www.linkedin.com/jobs/search/?f_WRA=true&geoId=92000000&keywords=data%20analyst'
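The search settings are stored in the link's query string. Judging by this notebook, `keywords` holds the search phrase. `geoId=92000000` corresponds to the worldwide location, since the page's canonical link points to `data-analyst-jobs-worldwide`. `f_WRA=true` is presumably the **Remote** filter. These meanings are inferred from the link and are not taken from official LinkedIn documentation. The minimal sketch below rebuilds the same link from a dict, so the keyword or location can be changed without hand-encoding spaces. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode, quote

BASE_URL = 'https://www.linkedin.com/jobs/search/'

def build_search_url(keywords, geo_id='92000000', remote=True):
    """Hypothetical helper: assemble a LinkedIn job-search URL.

    The parameter meanings (f_WRA = Remote filter, geoId = location) are
    inferred from the original link, not from official documentation.
    """
    params = {}
    if remote:
        params['f_WRA'] = 'true'
    params['geoId'] = geo_id
    params['keywords'] = keywords
    # quote_via=quote encodes spaces as %20; the default quote_plus would use '+'
    return BASE_URL + '?' + urlencode(params, quote_via=quote)

url = build_search_url('data analyst')
print(url)
```

With the defaults, this reproduces the link above exactly. This notebook does not list `geoId` values for narrower locations.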

In [3]:
response = requests.get(link)
response

<Response [200]>

In [4]:
page = response.text
soup = bs(page, 'lxml')

In [5]:
soup

<!DOCTYPE html>
<html lang="en">
<head>
<meta content="d_jobs_guest_search" name="pageKey"/>
<meta content="urlType=jserp_custom;emptyResult=false" name="linkedin:pageTag"/>
<meta content="en_US" name="locale"/>
<meta data-app-version="2.0.517" data-browser-id="40cae822-a766-49aa-82bd-efadfc3af3db" data-call-tree-id="5Lw7dPdymBbwmgmccCsAAA==" data-enable-page-view-heartbeat-tracking="" data-multiproduct-name="jobs-guest-frontend" data-service-name="jobs-guest-frontend" id="config"/>
<link href="https://www.linkedin.com/jobs/data-analyst-jobs-worldwide" rel="canonical"/>
<!-- --><!-- -->
<!-- -->
<!-- -->
<!-- -->
<link href="https://static-exp1.licdn.com/sc/h/al2o9zrvru7aqj8e1x2rzsrca" rel="icon"/>
<script>
            function getDfd() {let yFn,nFn;const p=new Promise(function(y, n){yFn=y;nFn=n;});p.resolve=yFn;p.reject=nFn;return p;}
            window.lazyloader = getDfd();
            window.tracking = getDfd();
            window.impressionTracking = getDfd();
            window.i

In [6]:
type(soup)

bs4.BeautifulSoup

To extract each field of a posting, I select HTML elements by their tag name and class, as they appear in the page source.

In [7]:
def get_info(soup, tag, selection_class):
    '''
    Return the stripped text of every `tag` element
    whose class attribute contains `selection_class`.
    '''
    info = []
    elements = soup.find_all(tag, {'class': selection_class})
    for element in elements:
        info.append(element.text.strip())

    return info
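You can check what `get_info` returns without requesting LinkedIn by running it on a small hand-written HTML snippet. The markup below only imitates the class names used in this notebook and is not copied from a real page. It uses the built-in `html.parser`, so `lxml` is not needed. It also shows two other points: BeautifulSoup matches a class even when the element has several classes, and the CSS selector form `soup.select('a.base-card__full-link')` gives the same result.

```python
from bs4 import BeautifulSoup

def get_info(soup, tag, selection_class):
    """Return the stripped text of every <tag> whose class list contains selection_class."""
    return [el.text.strip() for el in soup.find_all(tag, {'class': selection_class})]

# Hand-written sample markup (an assumption, not real LinkedIn HTML)
sample = '''
<ul>
  <li><a class="base-card__full-link extra" href="/jobs/view/1">
        Data Analyst </a>
      <span class="job-search-card__location">Egypt</span></li>
  <li><a class="base-card__full-link" href="/jobs/view/2">Junior Data Analyst</a>
      <span class="job-search-card__location">Remote</span></li>
</ul>
'''
sample_soup = BeautifulSoup(sample, 'html.parser')

titles = get_info(sample_soup, 'a', 'base-card__full-link')
print(titles)  # ['Data Analyst', 'Junior Data Analyst']

# The same selection written as a CSS selector
css_titles = [el.text.strip() for el in sample_soup.select('a.base-card__full-link')]
print(css_titles == titles)  # True
```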

Tags and classes used to extract each field:

- `a`, `hidden-nested-link` - company name
- `a`, `base-card__full-link` - position title
- `span`, `job-search-card__location` - location
- `p`, `job-search-card__snippet` - short position description
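Every field follows the same tag-plus-class pattern, so the list above can also be stored as one mapping and processed in a loop. Adding a field then takes a single new entry. This is a minimal sketch: `FIELDS` and `extract_fields` are hypothetical names, and the sample HTML is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping: column name -> (tag name, class), taken from the list above
FIELDS = {
    'title': ('a', 'base-card__full-link'),
    'company': ('a', 'hidden-nested-link'),
    'location': ('span', 'job-search-card__location'),
    'description': ('p', 'job-search-card__snippet'),
}

def extract_fields(soup, fields=FIELDS):
    """Return {column: [stripped text of each matching element]} for every field."""
    return {
        name: [el.text.strip() for el in soup.find_all(tag, {'class': cls})]
        for name, (tag, cls) in fields.items()
    }

# Made-up markup for a single posting
sample = '''
<div>
  <a class="base-card__full-link" href="/jobs/view/1">Data Analyst</a>
  <a class="hidden-nested-link" href="/company/x">Example Corp</a>
  <span class="job-search-card__location">Egypt</span>
  <p class="job-search-card__snippet">We are looking for a Data Analyst.</p>
</div>
'''
data = extract_fields(BeautifulSoup(sample, 'html.parser'))
print(data['company'])  # ['Example Corp']
```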

In [8]:
job_title = get_info(soup, 'a', 'base-card__full-link')
company = get_info(soup, 'a', 'hidden-nested-link')
location = get_info(soup, 'span', 'job-search-card__location')
short_desc = get_info(soup, 'p', 'job-search-card__snippet')

Since the links are stored in the `href` attribute of the anchor tags rather than in their text, we extract them differently.
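A single made-up anchor shows the difference between an element's text and its attributes. `.text` returns the visible label, while `.get('href')` reads the attribute and returns `None` when the attribute is missing. Indexing with `['href']` would raise `KeyError` instead. Using `.get` keeps the list of links the same length as the other columns.

```python
from bs4 import BeautifulSoup

# Made-up markup: one anchor with an href, one without
sample = ('<a class="base-card__full-link" href="https://www.linkedin.com/jobs/view/123">'
          'Data Analyst</a>'
          '<a class="base-card__full-link">No link here</a>')
anchors = BeautifulSoup(sample, 'html.parser').find_all('a', class_='base-card__full-link')

print(anchors[0].text)         # Data Analyst
print(anchors[0].get('href'))  # https://www.linkedin.com/jobs/view/123
print(anchors[1].get('href'))  # None

# A missing href becomes None instead of raising, so positions stay aligned
links = [a.get('href') for a in anchors]
print(links)  # ['https://www.linkedin.com/jobs/view/123', None]
```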

In [9]:
links = []
for anchor in soup.find_all('a', class_='base-card__full-link'):
    links.append(anchor.get('href'))

In [10]:
df = pd.DataFrame(list(zip(job_title, company, location, short_desc, links)), columns = ['title', 'company', 'location', 'description', 'link'])
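`zip` stops at the shortest input without any warning. Suppose one selector matches fewer elements than the others, for example because a posting has no snippet. The DataFrame then silently loses rows, and the remaining values are paired with the wrong postings. The small guard below uses only the standard library and makes the mismatch visible before the DataFrame is built. `check_lengths` is a hypothetical helper, and the lists are made up.

```python
from itertools import zip_longest

def check_lengths(**columns):
    """Raise ValueError if the scraped lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'column lengths differ: {lengths}')
    return lengths

# Made-up example: one posting has no short description
titles = ['Data Analyst', 'BI Analyst', 'Junior Analyst']
descs = ['Snippet A', 'Snippet B']

print(list(zip(titles, descs)))          # 2 pairs: the third title is dropped
print(list(zip_longest(titles, descs)))  # 3 pairs: the missing value becomes None

try:
    check_lengths(title=titles, description=descs)
except ValueError as err:
    print(err)  # column lengths differ: {'title': 3, 'description': 2}
```

`zip_longest` keeps every row, but it only pads at the end. It cannot tell which posting is actually missing the field, so the length check is the safer default.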

In [11]:
df.head()

|   | title | company | location | description | link |
|---|-------|---------|----------|-------------|------|
| 0 | Data Analyst | EMPG | Egypt | We are looking for a passionate Data Analyst. ... | https://eg.linkedin.com/jobs/view/data-analyst... |
| 1 | Data Analyst \| Exp: 1-3 Yrs | grofers | Bengaluru, Karnataka, India | Excellent communication skills at all levels w... | https://in.linkedin.com/jobs/view/data-analyst... |
| 2 | Data Analyst | WEN- Women Entrepreneur Network | Bengaluru, Karnataka, India | Looking for a "Lead Data Analyst"having releva... | https://in.linkedin.com/jobs/view/data-analyst... |
| 3 | Data Analyst | Denken Solutions, Inc | United States | At Canva, our mission is to empower the world ... | https://www.linkedin.com/jobs/view/data-analys... |
| 4 | Data Analyst | Canva | Sydney, New South Wales, Australia | The ideal candidate will use their passion for... | https://au.linkedin.com/jobs/view/data-analyst... |
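The preview points to exactly this kind of problem. The "Denken Solutions, Inc" row has a description starting with "At Canva, our mission...", which suggests the description list is shifted by one relative to the company list. One way to keep fields aligned is to find each posting's card first and read every field inside that card, filling any gap with `None`. The sketch below assumes each posting is wrapped in an `<li>` element. The real container tag or class on LinkedIn's page is not shown in this notebook and would need to be checked in the page source. The `parse_cards` and `text_or_none` helpers and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Same tag/class pairs as in the list above; the card container is an assumption
FIELDS = {
    'title': ('a', 'base-card__full-link'),
    'company': ('a', 'hidden-nested-link'),
    'location': ('span', 'job-search-card__location'),
    'description': ('p', 'job-search-card__snippet'),
}

def text_or_none(card, tag, cls):
    """Stripped text of the first matching element inside card, or None."""
    el = card.find(tag, {'class': cls})
    return el.text.strip() if el is not None else None

def parse_cards(soup, card_tag='li'):
    """Hypothetical helper: one dict per posting, so fields cannot drift apart."""
    rows = []
    for card in soup.find_all(card_tag):
        row = {name: text_or_none(card, tag, cls) for name, (tag, cls) in FIELDS.items()}
        anchor = card.find('a', {'class': 'base-card__full-link'})
        row['link'] = anchor.get('href') if anchor is not None else None
        rows.append(row)
    return rows

# Made-up markup: the first card has no snippet
sample = '''
<ul>
  <li><a class="base-card__full-link" href="/jobs/view/1">Data Analyst</a>
      <a class="hidden-nested-link">Company A</a>
      <span class="job-search-card__location">Egypt</span></li>
  <li><a class="base-card__full-link" href="/jobs/view/2">Data Analyst</a>
      <a class="hidden-nested-link">Company B</a>
      <span class="job-search-card__location">Australia</span>
      <p class="job-search-card__snippet">Snippet for B</p></li>
</ul>
'''
rows = parse_cards(BeautifulSoup(sample, 'html.parser'))
for row in rows:
    print(row['company'], '|', row['description'])
# Company A | None
# Company B | Snippet for B
```

`pd.DataFrame(rows)` turns this list of dicts into the same five columns (title, company, location, description, link), with `None` where a posting lacks a field.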


Finally, we save the results to a CSV file.

In [12]:
df.to_csv(r'C:\Users\Olya\Desktop\job search.csv', index = False)
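The `index = False` argument matters when the file is read back. With the default `index=True`, pandas writes the row index as an unnamed first column, and `pd.read_csv` then shows that column as `Unnamed: 0`. Below is a quick in-memory check using `io.StringIO`, so no file is written to disk. The two-row frame is made up.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'title': ['Data Analyst', 'BI Analyst'],
                        'company': ['Company A', 'Company B']})

with_index = io.StringIO()
df_demo.to_csv(with_index)                  # default: index written as the first column
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)  # data columns only

cols_with = list(pd.read_csv(io.StringIO(with_index.getvalue())).columns)
cols_without = list(pd.read_csv(io.StringIO(without_index.getvalue())).columns)
print(cols_with)     # ['Unnamed: 0', 'title', 'company']
print(cols_without)  # ['title', 'company']
```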