# Project 01: **LinkedIn Jobs**

[Kaggle dataset of scraped LinkedIn jobs](https://www.kaggle.com/datasets/arshkon/linkedin-job-postings/)

## To do list:

- celé přepsat do **.py skriptu** (main) - **DONE**
- write into #logger that it will log max. 1000 rows. - **DONE**
- extract state names from 'location' column - **DONE**
- recalculate salaries (min, med, max) according to pay_period - **DONE**
- extract domain name from 'application_url' and 'posting_domain' -
- try Logging of the class Errors - **DONE**
- change searching by title using ALL title keywords, not only the exact match - maybe write different functions for 'exact match' or 'partial match' - **DONE**
- change the class interaction using **input method** - The client would be asked e.g. location, job title, etc. and the result would be job applications -
- create a method for filtering jobs by multiple conditions at the same time, e.g. job title and location, etc. -

## Setting-up an environment:

In [1]:
# Importing main libraries:
import datetime
import logging
import yaml
import pandas as pd
import numpy as np
from itertools import chain

# Statistics and EDA libraries:
import scipy as stats
import sweetviz as sv
from ydata_profiling import ProfileReport  # former pandas_profiling!

# Plotting libraries:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Setting-up the display options for Pandas dataframes to display all the columns (not truncated):
pd.options.display.max_columns = None
pd.options.display.max_rows = None

# Creating a class for different print styles:
class style:
    #-------------------
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    RED = '\033[91m'
    #-------------------
    YELLOW = '\033[93m'
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    #-------------------
    END = '\033[0m'
    #-------------------

In [2]:
### FUNCTIONS ###

# =================================================================================================================== #
def get_dataframe_name(dataframe: pd.DataFrame) -> str:
    """
    The function returns a name of the DataFrame variable as a string.
    
    Parameters: 
    - dataframe (pd.DataFrame): DataFrame variable
    
    Returns: 
    - str: name of the DataFrame variable as a string
    """
    
    for obj_name, obj in globals().items():
        if obj is dataframe and isinstance(obj, pd.DataFrame):
            return obj_name
# =================================================================================================================== #


# =================================================================================================================== #
# Function for loading the file:
def csv_load(file: str, delimiter: str) -> pd.DataFrame:
    """
    The function loads the files in the CSV format (with specified delimiter) into Pandas dataframe.
    
    Parameters:
    - file (string): the whole path of the CSV file (including a file name with file extension), e.g.: '../dataset/job_postings.csv'
    - delimiter (str): the delimiter used in the CSV, e.g.: comma, semicolon, etc.
    
    Return:
    - pd.Dataframe: the dataframe that holds the data from the CSV file
    """
    
    try:
        print(f'Loading the {file} file using a delimiter: {delimiter}')
        pd.read_csv(file, delimiter=delimiter)
        print(f'File was succesfully loaded into the dataframe using a delimiter: {delimiter}')
    except Exception as error:
        print(f'Error: {error}')
        print('File was not loaded into the dataframe!')
    else:
        return pd.read_csv(file, delimiter=delimiter)
# =================================================================================================================== #


# =================================================================================================================== #
def flatten_list(cell_list):
    """
    The function flattens a list structure.
    
    Parameters:
    - cell_list (string): a list to flatten
    
    Return:
    - list: a list with flattened structure
    """
    
    return list(chain.from_iterable(cell_list))
# =================================================================================================================== #

## Importing final CSV file

In [3]:
# Final dataset:
# Reading the CSV file into a Pandas DataFrame:
file = 'final_dataset.csv'
data = csv_load(file, ',')
display(data.loc[0:3])

Loading the final_dataset.csv file using a delimiter: ,
File was succesfully loaded into the dataframe using a delimiter: ,


Unnamed: 0,job_id,company_id,title,description,work_type,location,applies,original_listed_time,remote_allowed,views,job_posting_url,application_url,application_type,expiry_time,closed_time,experience_level,skills_desc,listed_time,posting_domain,sponsored,scraped,company_name,company_url,industry,salary_id,max_salary,med_salary,min_salary,pay_period,currency,compensation_type,employee_count,follower_count,skills_abbr,skills,original_listed_time_ms,expiry_time_ms,closed_time_ms,listed_time_ms,state,state_abbr,min_salary_normalized,med_salary_normalized,max_salary_normalized
0,3757940104,553718.0,Hearing Care Provider,Overview\n\nHearingLife is a national hearing ...,Full-time,"Little River, SC",,1699090000000.0,,9.0,https://www.linkedin.com/jobs/view/3757940104/...,https://careers-demant.icims.com/jobs/19601/he...,OffsiteApply,1701680000000.0,,Entry level,,1699090000000.0,careers-demant.icims.com,0,1699138101,HearingLife,https://www.linkedin.com/company/hearing-life,['Retail'],13493.0,,5250.0,,MONTHLY,USD,BASE_SALARY,[1171],[11417],['OTHR'],['Other'],2023-11-04 09:26:40,2023-12-04 08:53:20,,2023-11-04 09:26:40,South carolina,SC,,63000.0,
1,3757940025,2192142.0,Shipping & Receiving Associate 2nd shift (Beav...,Metalcraft of Mayville\nMetalcraft of Mayville...,Full-time,"Beaver Dam, WI",,1699080000000.0,,,https://www.linkedin.com/jobs/view/3757940025/...,https://www.click2apply.net/mXLQz5S5NEYEXsKjwH...,OffsiteApply,1701680000000.0,,,,1699080000000.0,www.click2apply.net,0,1699085420,"Metalcraft of Mayville, Inc.",https://www.linkedin.com/company/metalcraft-of...,['Industrial Machinery Manufacturing'],,,,,,,,[300],[2923],"['MGMT', 'MNFC']","['Management', 'Manufacturing']",2023-11-04 06:40:00,2023-12-04 08:53:20,,2023-11-04 06:40:00,Wisconsin,WI,,,
2,3757938019,474443.0,"Manager, Engineering",\nThe TSUBAKI name is synonymous with excellen...,Full-time,"Bessemer, AL",,1699080000000.0,,,https://www.linkedin.com/jobs/view/3757938019/...,https://www.click2apply.net/LwbOykH2yAJdahB5Ah...,OffsiteApply,1701680000000.0,,,Bachelor's Degree in Mechanical Engineering pr...,1699080000000.0,www.click2apply.net,0,1699085644,"U.S. Tsubaki Power Transmission, LLC",https://www.linkedin.com/company/u.s.-tsubaki-...,['Automation Machinery Manufacturing'],,,,,,,,[314],[8487],['ENG'],['Engineering'],2023-11-04 06:40:00,2023-12-04 08:53:20,,2023-11-04 06:40:00,Alabama,AL,,,
3,3757938018,18213359.0,Cook,descriptionTitle\n\n Looking for a great oppor...,Full-time,"Aliso Viejo, CA",,1699080000000.0,,1.0,https://www.linkedin.com/jobs/view/3757938018/...,https://jobs.apploi.com/view/854782?utm_campai...,OffsiteApply,1701680000000.0,,Entry level,,1699080000000.0,jobs.apploi.com,0,1699087461,Episcopal Communities & Services,https://www.linkedin.com/company/episcopal-com...,"['Non-profit Organization Management', 'Non-pr...",12025.0,,22.27,,HOURLY,USD,BASE_SALARY,[36],[305],"['MGMT', 'MNFC']","['Management', 'Manufacturing']",2023-11-04 06:40:00,2023-12-04 08:53:20,,2023-11-04 06:40:00,California,CA,,46321.6,


## Use Case - Searching for jobs:

Possible other search conditions: 
- remote
- views
- posting_domain
- listing_time
- linkedin_url
- application_url

In [41]:
###  VERSION 04 (Chaining filters)  ###
# Defining a class for finding jobs according to specific properties:

class JobFinder:
    '''
    A class for finding jobs according to specific properties.
    
    NOTE: Inthe end of each search query needs to be '.get_data()' method 
    in order to retrieve the final filtered data.
    Example: jobs = job_finder.location("California").salary_range(60000, 80000).get_data()
    '''
    
    def __init__(self, jobs):
        self.jobs = jobs
    
    def title(self, title):
        filtered_jobs = self.jobs[ self.jobs['title'].astype(str).str.contains(title, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def company_name(self, company_name):
        filtered_jobs = self.jobs[ self.jobs['company_name'].astype(str).str.contains(company_name, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def description(self, description):
        filtered_jobs = self.jobs[ self.jobs['description'].astype(str).str.contains(description, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def location(self, location):
        filtered_jobs = self.jobs[
            (self.jobs['state'].astype(str).str.lower() == location.lower()) |
            (self.jobs['state'].isna()) |
            (self.jobs['state_abbr'].astype(str).str.lower() == location.lower()) |
            (self.jobs['state_abbr'].isna())
        ]
        return JobFinder(filtered_jobs)
    
    def experience_level(self, experience_level):
        filtered_jobs = self.jobs[ self.jobs['experience_level'].apply(flatten_list).astype(str).str.contains(experience_level, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def skills(self, skills):
        filtered_jobs = self.jobs[ self.jobs['skills'].astype(str).str.contains(skills, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def work_type(self, work_type):
        filtered_jobs = self.jobs[ self.jobs['work_type'].astype(str).str.contains(work_type, case=False, regex=True) ]
        return JobFinder(filtered_jobs)
    
    def salary_range(self, min_salary, max_salary):
        filtered_jobs = self.jobs[
            (((self.jobs['min_salary_normalized'] >= min_salary) | self.jobs['min_salary_normalized'].isna()) & 
             ((self.jobs['max_salary_normalized'] <= max_salary) | self.jobs['max_salary_normalized'].isna())) | 
            (((self.jobs['med_salary_normalized'] >= min_salary) | self.jobs['med_salary_normalized'].isna()) & 
             ((self.jobs['med_salary_normalized'] <= max_salary) | self.jobs['med_salary_normalized'].isna()))
        ]
        return JobFinder(filtered_jobs)
    
    def salary_range_simple(self, min_salary, max_salary):
        filtered_jobs = self.jobs[ (self.jobs['min_salary_normalized'] >= min_salary) & (self.jobs['max_salary_normalized'] <= max_salary)  ]
        return JobFinder(filtered_jobs)
    
    def salary_range_median(self, min_salary, max_salary):
        filtered_jobs = self.jobs[ (self.jobs['med_salary_normalized'] >= min_salary) & (self.jobs['med_salary_normalized'] <= max_salary)  ]
        return JobFinder(filtered_jobs)
    
    
    # Function to retrieve the filtered data:
    def get_data(self):
        return self.jobs

# Creating an instance of the JobFinder class with data DataFrame:
job_finder = JobFinder(data)

In [35]:
# USE case: Filtering jobs by many conditions:
filtered_jobs = job_finder.description('python').location('California').salary_range(60000, 80000).skills('information').work_type('full-time').get_data()
print('Number of matches:', len(filtered_jobs))
filtered_jobs.iloc[0:5]

Number of matches: 303


Unnamed: 0,job_id,company_id,title,description,work_type,location,applies,original_listed_time,remote_allowed,views,job_posting_url,application_url,application_type,expiry_time,closed_time,experience_level,skills_desc,listed_time,posting_domain,sponsored,scraped,company_name,company_url,industry,salary_id,max_salary,med_salary,min_salary,pay_period,currency,compensation_type,employee_count,follower_count,skills_abbr,skills,original_listed_time_ms,expiry_time_ms,closed_time_ms,listed_time_ms,state,state_abbr,min_salary_normalized,med_salary_normalized,max_salary_normalized
15,3757935012,18583501.0,Quantitative Trader [5048],"Quantitative traders research, develop, and re...",Full-time,"New York, United States",36.0,1699080000000.0,,158.0,https://www.linkedin.com/jobs/view/3757935012/...,,ComplexOnsiteApply,1701670000000.0,,Associate,,1699080000000.0,,0,1699135011,Stealth Startup,https://www.linkedin.com/company/stealth-start...,['Software Development'],,,,,,,,"[16720, 16721, 16736]","[572517, 572617, 573230]",['IT'],['Information Technology'],2023-11-04 06:40:00,2023-12-04 06:06:40,,2023-11-04 06:40:00,,,,,
153,3757922135,2957445.0,Fullstack Engineer,"About FareHarbor\n\nAt FareHarbor, our mission...",Full-time,"San Francisco, CA",16.0,1696920000000.0,,40.0,https://www.linkedin.com/jobs/view/3757922135/...,https://fareharbor.com/careers/jobs/?gh_jid=68...,OffsiteApply,1701670000000.0,,Entry level,,1699080000000.0,fareharbor.com,0,1699081419,FareHarbor,https://www.linkedin.com/company/fareharbor,['Travel Arrangements'],11223.0,160610.0,,144591.0,YEARLY,USD,BASE_SALARY,[737],[12893],"['ENG', 'IT']","['Engineering', 'Information Technology']",2023-10-10 06:40:00,2023-12-04 06:06:40,,2023-11-04 06:40:00,California,CA,144591.0,,160610.0
170,3757920617,54350022.0,Full Stack Developer - API Integrations,Full Stack Developer - API IntegrationsJob Des...,Full-time,United States,4.0,1699080000000.0,1.0,6.0,https://www.linkedin.com/jobs/view/3757920617/...,,SimpleOnsiteApply,1701670000000.0,,,,1699080000000.0,,0,1699080309,Jobsrefer Indonesia,https://www.linkedin.com/company/jobsrefer-ind...,['Human Resources Services'],,,,,,,,[7],[18020],"['ENG', 'IT']","['Engineering', 'Information Technology']",2023-11-04 06:40:00,2023-12-04 06:06:40,,2023-11-04 06:40:00,,,,,
198,3757918639,,Staff Scientist,\nResponsibilitiesLooking for a Staff Scientis...,Full-time,"San Jose, CA",,1699080000000.0,,23.0,https://www.linkedin.com/jobs/view/3757918639/...,,ComplexOnsiteApply,1701670000000.0,,,,1699080000000.0,,0,1699132529,,,,12779.0,155000.0,,145000.0,YEARLY,USD,BASE_SALARY,,,"['RSCH', 'ANLS', 'IT']","['Research', 'Analyst', 'Information Technology']",2023-11-04 06:40:00,2023-12-04 06:06:40,,2023-11-04 06:40:00,California,CA,145000.0,,155000.0
224,3757917592,10091.0,Principal VMware Cloud Architect,Company Summary Statement\n\nAs one of the lar...,Full-time,United States,1.0,1696920000000.0,1.0,5.0,https://www.linkedin.com/jobs/view/3757917592/...,https://careers.pplweb.com/jobs/10012?lang=en-us,OffsiteApply,1701670000000.0,,Mid-Senior level,,1699080000000.0,careers.pplweb.com,0,1699085346,PPL Corporation,https://www.linkedin.com/company/pplcorporation,['Utilities'],,,,,,,,"[6098, 6120]","[37777, 37456]","['ENG', 'IT']","['Engineering', 'Information Technology']",2023-10-10 06:40:00,2023-12-04 06:06:40,,2023-11-04 06:40:00,,,,,
