In this analysis, we explore a dataset of Google job ads. Each ad was returned for one of two search strings: `machine learning engineer` or `data scientist`.

In [1]:
import os
import re

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
%matplotlib inline

In [3]:
plt.style.use('fivethirtyeight')
plt.rcParams['figure.dpi'] = 180

In [4]:
file_path = os.path.abspath(os.path.join(os.getcwd(), '..', 'job_ads.csv'))
ads = pd.read_csv(file_path, index_col=0)
ads.drop_duplicates(inplace=True)

In [5]:
ads.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 643 entries, 0 to 419
Data columns (total 7 columns):
id                  643 non-null object
job_descr           497 non-null object
location            638 non-null object
minimum_qual        497 non-null object
preferred_qual      497 non-null object
responsibilities    497 non-null object
title               643 non-null object
dtypes: object(7)
memory usage: 40.2+ KB


First, let us use simple string operations to derive additional information from the job title, such as seniority and function.

In [6]:
ads[['role', 'department', 'area', 'other']] = ads.title.str.split(',', expand=True)
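
The assignment above only works when the split produces exactly four columns, and `str.split(',')` keeps the space that follows each comma. As a minimal sketch of a more defensive variant, the hypothetical helper `split_title` below (not part of the original notebook) caps the number of splits, strips whitespace, and pads the result to four columns. The two sample titles are taken from the output shown further down.

```python
import pandas as pd


def split_title(titles, names=('role', 'department', 'area', 'other')):
    """Split comma-separated titles into a fixed set of named columns."""
    # Cap the number of splits so extra commas end up in the last column.
    parts = titles.str.split(',', n=len(names) - 1, expand=True)
    # Remove the whitespace left over after each comma.
    parts = parts.apply(lambda col: col.str.strip())
    # Pad with empty columns when no title has that many parts.
    parts = parts.reindex(columns=range(len(names)))
    parts.columns = list(names)
    return parts


sample = pd.Series(['Cloud AI Engineer, Professional Services',
                    'Data Scientist, Engineering'])
print(split_title(sample))
```

With the real data, this could be used as `ads[['role', 'department', 'area', 'other']] = split_title(ads.title)`, assuming `ads.title` contains only strings.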

In [7]:
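# Keywords that signal seniority or job function; both are matched case-insensitively.
seniority_re = r'.*(intern|senior|lead|staff|principal|manager|head).*'
function_re = r'.*(engineer|scientist|science|consultant|architect|advocate|developer|analyst|specialist).*'

# Roles without a seniority keyword fall back to 'L3-L4'.
ads['seniority'] = ads.role.str.extract(pat=seniority_re, flags=re.I).fillna('L3-L4')[0].str.lower()
# Map 'science' to 'scientist'; the text is already lowercase, so a literal replace suffices.
ads['function'] = (ads.role.str.extract(pat=function_re, flags=re.I)[0]
                   .str.lower().str.replace('science', 'scientist', regex=False))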
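
Two properties of the patterns above are worth keeping in mind. Without word boundaries, `intern` also matches inside words such as "International", and the greedy leading `.*` makes the group capture the last keyword in the role rather than the first. The sketch below shows a stricter alternative using only the standard `re` module. The helper `extract_seniority` and the example roles are illustrative assumptions, not part of the original analysis.

```python
import re

# Same keywords as above, but anchored on word boundaries.
SENIORITY_RE = re.compile(
    r'\b(intern|senior|lead|staff|principal|manager|head)\b', flags=re.I)


def extract_seniority(role, default='l3-l4'):
    """Return the first seniority keyword in `role`, or `default` if none is found."""
    match = SENIORITY_RE.search(role)
    return match.group(1).lower() if match else default


# Illustrative roles, not taken from the dataset.
for role in ['International Data Scientist', 'Senior Staff Engineer', 'Cloud AI Engineer']:
    print(role, '->', extract_seniority(role))
```

The trade-off is that word boundaries also skip plural or derived forms such as "Interns", so which variant is preferable depends on the titles actually present in the data. If adopted, it could be applied with `ads.role.map(extract_seniority)`.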

In [9]:
ads[['title', 'role', 'department', 'area', 'seniority', 'function']].head()

,title,role,department,area,seniority,function
0,"Cloud AI Engineer, Professional Services",Cloud AI Engineer,Professional Services,,l3-l4,engineer
1,"Cloud AI Engineer, Professional Services",Cloud AI Engineer,Professional Services,,l3-l4,engineer
2,"Conversational AI Engineer, Google Cloud Profe...",Conversational AI Engineer,Google Cloud Professional Services,,l3-l4,engineer
3,"Cloud AI Engineer, Professional Services",Cloud AI Engineer,Professional Services,,l3-l4,engineer
4,"Data Scientist, Engineering",Data Scientist,Engineering,,l3-l4,scientist


It would be interesting to see which terms in the qualifications and responsibilities are most relevant to each function. This could give us insight into the keywords that one should emphasize in a resume.
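
One possible way to score term relevance per function is TF-IDF, where the pooled text of each function is treated as a single document. The following is a minimal sketch under that assumption, using only the standard library. The function `top_terms_by_group` and the toy sentences are made up for illustration and do not reflect the method or the data used in the rest of this analysis.

```python
import math
import re
from collections import Counter, defaultdict


def top_terms_by_group(texts, groups, k=3):
    """Rank terms per group by TF-IDF over the pooled text of each group."""
    counts = defaultdict(Counter)
    for text, group in zip(texts, groups):
        counts[group].update(re.findall(r'[a-z]+', text.lower()))

    n_groups = len(counts)
    # Number of groups in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    ranked = {}
    for group, c in counts.items():
        total = sum(c.values())
        # Terms that occur in every group get an IDF of zero.
        scores = {t: (n / total) * math.log(n_groups / doc_freq[t])
                  for t, n in c.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked[group] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


# Toy inputs, invented for illustration only.
texts = ['build and deploy tensorflow models',
         'deploy models to production',
         'design experiments and analyze statistics',
         'analyze data with statistics']
groups = ['engineer', 'engineer', 'scientist', 'scientist']
result = top_terms_by_group(texts, groups, k=2)
print(result)
```

On the real data, `texts` could be a text column such as `ads.minimum_qual` (after dropping missing values) and `groups` the matching `ads.function` values. With only a handful of groups the IDF term is coarse, so stop-word removal would likely be needed as well.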