In [4]:
# !python -m spacy download en
# nltk.download('stopwords') # run this one time

### Resume tailoring

A company has openings for two job roles, Android Developer and Journalist, and your manager wants you to do some basic analysis for him.

For most job openings, a particular skill set is desired to perform specific tasks. Tailoring your resume is about recognizing those skills and responsibilities in the job description and making it obvious that you’re up to the task. The goal is to draw the shortest line possible between your experience and what’s stated in the job description.

Tailoring your resume connects the dots for recruiters and hiring managers who are overwhelmed by a flood of generic applicants. Instead of proving that you’re an experienced professional in general, it shows them that you’re a perfect fit for this specific job.

 


### About the dataset
When performing data science tasks, it’s common to use data found on the internet. You’ll usually be able to access this data in CSV format, or via an Application Programming Interface (API). However, there are times when the data you want can only be accessed as part of a web page. In cases like this, you’ll want to use a technique called web scraping to get the data from the web page into a format you can work with in your analysis.

You need to perform topic modelling on the given data and extract useful topics that will help your manager shortlist candidates for a specified job role.

The scraped data has been provided to you as a CSV file.

|Feature|Description|
|-----|-----|
|company|Name of the company|
|job|Job title|
|job_desc|Description of the job|
|location|Job location|
|url|Link to the job posting from which the data was scraped|
|job_type|Type of the job|

### Importing necessary libraries

In [2]:
import pandas as pd
import numpy as np
import re
import spacy
import gensim
from gensim import corpora
pd.set_option("display.max_colwidth", 200)

import operator
from collections import Counter
from string import punctuation

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer

# libraries for visualization
import pyLDAvis
import pyLDAvis.gensim  # note: renamed to pyLDAvis.gensim_models in pyLDAvis 3.3 and later
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
%matplotlib inline

### Loading the dataset 

In [5]:
jobs = pd.read_csv("merged_indeed_new.csv")
jobs.head()

Unnamed: 0,company,job,job_desc,location,url,job_type
0,Micro Focus,Core Java Developer,"Core Java Developer\r\r\n\r\r\nJob Description:\r\r\n\r\r\nAt Micro Focus, everything we do is based on a simple idea: The fastest way to get results is to build on what you have. Our software sol...",Bangalore,https://www.indeed.co.in/pagead/clk?mo=r&ad=-6NYlbfkN0BduEgvIgdT7EDM_O2GxHkw7QoaouEWjxefAvaX3ZwZ9tYBt705y4baMnhcBFo_61Er-rE59t0DIcl816jpSPhQlE2-XsX8ZtBLxXVsMZdq8VWbZfs3uXN1oqCxQ7zxUL2JTHVmAEZthPEL...,Android Developer
1,MS Clinical Research (P) Ltd,Android App Developer,"Job Summary\r\r\nOver 3 years experience designing, developing, integrating, and supporting Android App development\r\r\nApply in-depth understanding of business and IT requirements to streamline ...",Bangalore,https://www.indeed.co.in/pagead/clk?mo=r&ad=-6NYlbfkN0BduEgvIgdT7EDM_O2GxHkw7QoaouEWjxefAvaX3ZwZ9tYBt705y4baMnhcBFo_61Er-rE59t0DIcl816jpSPhQlE2-XsX8ZtBLxXVsMZdq8VWbZfs3uXN1oqCxQ7zxUL2JTHVmAEZthPEL...,Android Developer
2,Applied Materials Inc.,Software Engineer – Unity Developer,"Company Introduction\r\r\nApplied Materials , Inc. is the global leader in materials engineering solutions for the semiconductor, flat panel display and solar photovoltaic (PV) industries. applied...",Bangalore,https://www.indeed.co.in/pagead/clk?mo=r&ad=-6NYlbfkN0BduEgvIgdT7EDM_O2GxHkw7QoaouEWjxefAvaX3ZwZ9tYBt705y4baMnhcBFo_61Er-rE59t0DIcl816jpSPhQlE2-XsX8ZtBLxXVsMZdq8VWbZfs3uXN1oqCxQ7zxUL2JTHVmAEZthPEL...,Android Developer
3,Shaw Academy,Lead Mobile Developer,"Senior Mobile App Developer/Lead (Android or iOS)\r\r\nShaw Academy is seeking a Mobile Development Lead, initially to be the hands-on coder for our apps and then to build a team aroundyou. The ro...",Bangalore,https://www.indeed.co.in/pagead/clk?mo=r&ad=-6NYlbfkN0BduEgvIgdT7EDM_O2GxHkw7QoaouEWjxefAvaX3ZwZ9tYBt705y4baMnhcBFo_61Er-rE59t0DIcl816jpSPhQlE2-XsX8ZtBLxXVsMZdq8VWbZfs3uXN1oqCxQ7zxUL2JTHVmAEZthPEL...,Android Developer
4,Letsgettin Private Limited,Android Developer- Freshers,Job Summary\r\r\nPosition: Android developer\r\r\nEducation: Bachelor's\r\r\nRequired candidates: Freshers\r\r\nResponsibilities and Duties\r\r\nDesign and build advanced applications for the Andr...,Bangalore,https://www.indeed.co.in/pagead/clk?mo=r&ad=-6NYlbfkN0BduEgvIgdT7EDM_O2GxHkw7QoaouEWjxefAvaX3ZwZ9tYBt705y4baMnhcBFo_61Er-rE59t0DIcl816jpSPhQlE2-XsX8ZtBLxXVsMZdq8VWbZfs3uXN1oqCxQ7zxUL2JTHVmAEZthPEL...,Android Developer


### Drop unnecessary Columns

For the analysis of the job description, we are only interested in the text data associated with the jobs. We will analyze this text data using natural language processing. Since the file contains some metadata such as company, location and url, it is necessary to remove all the columns that do not contain useful text information.

In [6]:
jobs.drop(columns=['company', 'location', 'url'], inplace=True)

In [9]:
jobs.head()

Unnamed: 0,job,job_desc,job_type
0,Core Java Developer,"Core Java Developer\r\r\n\r\r\nJob Description:\r\r\n\r\r\nAt Micro Focus, everything we do is based on a simple idea: The fastest way to get results is to build on what you have. Our software sol...",Android Developer
1,Android App Developer,"Job Summary\r\r\nOver 3 years experience designing, developing, integrating, and supporting Android App development\r\r\nApply in-depth understanding of business and IT requirements to streamline ...",Android Developer
2,Software Engineer – Unity Developer,"Company Introduction\r\r\nApplied Materials , Inc. is the global leader in materials engineering solutions for the semiconductor, flat panel display and solar photovoltaic (PV) industries. applied...",Android Developer
3,Lead Mobile Developer,"Senior Mobile App Developer/Lead (Android or iOS)\r\r\nShaw Academy is seeking a Mobile Development Lead, initially to be the hands-on coder for our apps and then to build a team aroundyou. The ro...",Android Developer
4,Android Developer- Freshers,Job Summary\r\r\nPosition: Android developer\r\r\nEducation: Bachelor's\r\r\nRequired candidates: Freshers\r\r\nResponsibilities and Duties\r\r\nDesign and build advanced applications for the Andr...,Android Developer


### Calculate the number of jobs for each job type

In [27]:
jobs.groupby('job_type')['job'].value_counts()

job_type                   job                                                                    
Android Developer          Android Developer                                                          30
                           Android App Developer                                                       3
                           Android Application Developer                                               2
                           Junior Android Developer                                                    2
                           Android & PHP Software Developer                                            1
                           Android - Mobile Developer                                                  1
                           Android Developer (Trainee)                                                 1
                           Android Developer Required                                                  1
                           Android Developer for Pune, Hinjew

In [16]:
jobs['job'].value_counts()

Android Developer                                                      30
Data Scientist                                                         24
Front End Developer                                                    21
iOS Developer                                                          18
Machine Learning Engineer                                              14
Full Stack Developer                                                   14
Backend Developer                                                      12
Data Analyst                                                            9
Full Stack Web Developer                                                7
Web Developer                                                           6
Software Engineer                                                       6
Research Analyst                                                        6
PHP Developer                                                           6
Market Research Analyst               
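
The two cells above count postings per job title. To get the number of postings per job type directly, one option is `value_counts()` on the `job_type` column. A minimal sketch on a small made-up frame (the rows below are invented for illustration and are not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the `jobs` frame; the rows are invented for illustration.
toy_jobs = pd.DataFrame({
    "job": ["Android Developer", "Android App Developer", "Reporter",
            "Sub Editor", "Android Developer"],
    "job_type": ["Android Developer", "Android Developer", "Journalist",
                 "Journalist", "Android Developer"],
})

# Number of postings per job type, largest first.
jobs_per_type = toy_jobs["job_type"].value_counts()
print(jobs_per_type)
```

On the real data the same call would be `jobs['job_type'].value_counts()`.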

### Subset the jobs (i.e. Android Developer & Journalist) based on job_type and store only job_desc for each job type

Further analysis will be done only on the `job_desc` column.

In [14]:
jobs[jobs['job_type'] == 'Android Developer']

Unnamed: 0,job,job_desc,job_type
0,Core Java Developer,"Core Java Developer\r\r\n\r\r\nJob Description:\r\r\n\r\r\nAt Micro Focus, everything we do is based on a simple idea: The fastest way to get results is to build on what you have. Our software sol...",Android Developer
1,Android App Developer,"Job Summary\r\r\nOver 3 years experience designing, developing, integrating, and supporting Android App development\r\r\nApply in-depth understanding of business and IT requirements to streamline ...",Android Developer
2,Software Engineer – Unity Developer,"Company Introduction\r\r\nApplied Materials , Inc. is the global leader in materials engineering solutions for the semiconductor, flat panel display and solar photovoltaic (PV) industries. applied...",Android Developer
3,Lead Mobile Developer,"Senior Mobile App Developer/Lead (Android or iOS)\r\r\nShaw Academy is seeking a Mobile Development Lead, initially to be the hands-on coder for our apps and then to build a team aroundyou. The ro...",Android Developer
4,Android Developer- Freshers,Job Summary\r\r\nPosition: Android developer\r\r\nEducation: Bachelor's\r\r\nRequired candidates: Freshers\r\r\nResponsibilities and Duties\r\r\nDesign and build advanced applications for the Andr...,Android Developer
5,Android Developer,Job Summary\r\r\nWe are looking for an Android developer responsible for the development and maintenance of applications aimed at a vast number of diverse Android devices. Following are the requir...,Android Developer
6,Android Developer,We are looking for an Android Developer who possesses a passion for pushing mobile technologies to the limits. This Android app developer will work with our team of talented engineers to design an...,Android Developer
7,Android Developer,"Job Summary\r\r\nLooking for a Android Developer for Bangalore Location\r\r\nResponsibilities and Duties\r\r\nJob description\r\r\nImplementing, and enhancing the Android mobile application.\r\r\n...",Android Developer
8,Android Developer,As an Android Developer you will be responsible for design and development of Android applications. You must be comfortable with Android Studios and knowledgable of latest Android SDK including An...,Android Developer
9,Android Developer,"Responsibilities and Duties\r\r\n* Design and build advanced applications for the Android platform\r\r\n* Collaborate with cross-functional teams to define, design, and ship new features\r\r\n* Wo...",Android Developer
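
The cell above only displays the Android Developer rows. To store just the descriptions for each role, one possible approach is boolean indexing with `.loc`. The sketch below uses an invented three-row frame; the variable names `android_desc` and `journalist_desc` are hypothetical, and the `Journalist` label is assumed to appear in `job_type` exactly as written in the heading.

```python
import pandas as pd

# Toy stand-in for the `jobs` frame; the rows are invented for illustration.
toy_jobs = pd.DataFrame({
    "job_desc": ["Build Android apps in Kotlin",
                 "Report on local news",
                 "Maintain Android SDK integrations"],
    "job_type": ["Android Developer", "Journalist", "Android Developer"],
})

# Keep only the description text for each role of interest.
android_desc = toy_jobs.loc[toy_jobs["job_type"] == "Android Developer", "job_desc"]
journalist_desc = toy_jobs.loc[toy_jobs["job_type"] == "Journalist", "job_desc"]

print(len(android_desc), len(journalist_desc))
```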


### Retain alphabetic characters and remove unnecessary whitespace
Now, we will perform some simple preprocessing on the job description column (i.e. `job_desc`) in order to make it more amenable to analysis. We will use a regular expression to retain only alphabetic characters in the description and remove unnecessary whitespace.
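
A minimal sketch of this cleaning step, assuming that "alphabets" means ASCII letters only. The helper name `clean_text` is hypothetical, and the sample string is adapted from one of the descriptions shown earlier.

```python
import re

def clean_text(text):
    """Keep only ASCII letters; every run of other characters becomes one space."""
    # Whitespace, digits and punctuation are all non-letters, so a single
    # substitution both strips them and collapses repeated spaces.
    return re.sub(r"[^A-Za-z]+", " ", text).strip()

sample = "Job Summary\r\r\nOver 3 years experience designing, developing Android Apps"
print(clean_text(sample))
```

Applied to the whole column, this could look like `jobs['job_desc'].apply(clean_text)`.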

### Exploratory Analysis: Plot the word cloud of the most common words
In order to verify whether the preprocessing happened correctly, we can make a word cloud of the text of the job descriptions. This will give us a visual representation of the most common words. Visualization is key to understanding whether we are still on the right track! In addition, it allows us to verify whether we need additional preprocessing before further analyzing the text data. Python has a massive number of open-source libraries! Instead of trying to develop a method to create word clouds ourselves, we'll use Andreas Mueller's wordcloud library.
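
A word cloud sizes each word by how often it occurs, so the step can be sketched as counting words and handing the counts to the library. The helper names `word_frequencies` and `render_word_cloud` are hypothetical, the documents are invented, and the image size and background colour are arbitrary choices.

```python
from collections import Counter

def word_frequencies(documents):
    """Count how often each whitespace-separated token occurs across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())
    return counts

def render_word_cloud(frequencies):
    """Draw a word cloud image; requires the third-party `wordcloud` package."""
    # Imported lazily so the counting step runs without the optional dependency.
    from wordcloud import WordCloud
    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies).to_image()

# Invented, already-cleaned descriptions used only for illustration.
docs = ["android developer java android sdk", "android kotlin developer"]
freqs = word_frequencies(docs)
print(freqs.most_common(3))
```

Calling `render_word_cloud(freqs)` returns an image object that a notebook can display; words with higher counts are drawn larger.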

### Let's remove some common words that every job description contains. A common words list is provided to you (you can add more). Display the top 10 most frequently occurring words.
LDA does not work directly on text data. First, it is necessary to convert the documents into a simple vector representation. This representation will then be used by LDA to determine the topics. Each entry of a 'document vector' will correspond with the number of times a word occurred in the document.
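
The document-vector idea can be sketched with scikit-learn's `CountVectorizer`, which the notebook already imports. The documents and the `common_words` list below are invented for illustration; the list actually provided with the exercise is not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Invented descriptions and an invented common-words list, for illustration only.
docs = [
    "android developer with java experience",
    "journalist with writing experience",
    "android sdk and java skills",
]
common_words = ["with", "and", "experience"]

# Words in `common_words` are dropped before counting.
vectorizer = CountVectorizer(stop_words=common_words)
doc_term = vectorizer.fit_transform(docs)  # rows = documents, columns = words

# Total count of each vocabulary word across all documents.
totals = np.asarray(doc_term.sum(axis=0)).ravel()
word_counts = {word: int(totals[idx]) for word, idx in vectorizer.vocabulary_.items()}

# Ten most frequent words (fewer if the vocabulary is smaller).
top_words = sorted(word_counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top_words)
```

Each row of `doc_term` is one document vector: entry `(i, j)` is the number of times word `j` occurs in document `i`.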

It seems that the most frequent terms in our data are now relevant.

### Perform lowercasing and calculate the frequency of the top 10 words
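
One way to do this with pandas string methods is sketched below on an invented Series; on the real data the same chain would start from the cleaned `job_desc` column.

```python
import pandas as pd

# Invented descriptions; note the mixed casing of "Android" and "Developer".
desc = pd.Series([
    "Android Developer needed",
    "android SDK and Android Studio",
    "Senior DEVELOPER",
])

# Lowercase, split into words, flatten to one word per row, then count.
top_10 = desc.str.lower().str.split().explode().value_counts().head(10)
print(top_10)
```

Without the lowercasing step, "Android" and "android" would be counted as different words.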

### Model building with LSA
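
LSA (latent semantic analysis) factorizes a document-term matrix with a truncated singular value decomposition, and each retained component is read as a topic. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's `TfidfVectorizer` and `TruncatedSVD` rather than gensim, the four documents are invented, and the choice of two topics is arbitrary.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented, already-cleaned documents: two developer-like, two journalist-like.
docs = [
    "android java sdk mobile app",
    "android kotlin mobile app development",
    "news reporting writing editing",
    "writing editing news stories",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)             # document-term matrix (TF-IDF weights)

n_topics = 2                              # arbitrary choice for this sketch
lsa = TruncatedSVD(n_components=n_topics, random_state=0)
doc_topic = lsa.fit_transform(X)          # shape: (documents, topics)

# Invert the vocabulary so column indices can be mapped back to words.
index_to_word = {idx: word for word, idx in tfidf.vocabulary_.items()}
for topic_idx, weights in enumerate(lsa.components_):
    top = [index_to_word[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {topic_idx}: {top}")
```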

### Model building with LDA

In LSA we saw that Topic 1 has some different words which are not related to the Journalist job. Let's see if we can improve our topics by using the LDA algorithm.
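
The notebook imports gensim for this step. As a compact stand-in, the sketch below fits LDA with scikit-learn's `LatentDirichletAllocation` instead, on an invented four-document corpus with an arbitrary choice of two topics. It is meant to show the shape of the workflow, not to reproduce the notebook's model.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented, already-cleaned documents: two developer-like, two journalist-like.
docs = [
    "android java sdk mobile app",
    "android kotlin mobile app development",
    "news reporting writing editing",
    "writing editing news stories",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)   # LDA is fitted on raw word counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)     # each row is a topic distribution

# Invert the vocabulary so column indices can be mapped back to words.
index_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}
for topic_idx, weights in enumerate(lda.components_):
    top = [index_to_word[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {topic_idx}: {top}")
```

Unlike the LSA scores, each row of `doc_topic` here sums to 1 and can be read as the share of each topic in that document.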

### Analyzing with LDA model

To visualize the topics in a 2-dimensional space we will use the pyLDAvis library. This visualization is interactive in nature and displays topics along with the most relevant words.

The pyLDAvis package is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The interactive visualization pyLDAvis produces is helpful for both:

1. Better understanding and interpreting individual topics, and
2. Better understanding the relationships between the topics.

For (1), you can manually select each topic to view its most frequent and/or “relevant” terms, using different values of the λ parameter. This can help when you’re trying to assign a human interpretable name or “meaning” to each topic.

For (2), exploring the Intertopic Distance Plot can help you learn about how topics relate to each other, including potential higher-level structure between groups of topics.
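
The effect of λ can be illustrated numerically. Following the relevance definition from the paper behind pyLDAvis (Sievert and Shirley), the relevance of a term to a topic is λ·log P(term | topic) + (1 − λ)·log(P(term | topic) / P(term)). The sketch below applies that formula to three invented terms; the probabilities and the helper names `relevance` and `rank_terms` are hypothetical.

```python
import numpy as np

def relevance(p_word_given_topic, p_word, lam):
    """Relevance of each term to a topic for a given lambda in [0, 1]."""
    p_wt = np.asarray(p_word_given_topic, dtype=float)
    p_w = np.asarray(p_word, dtype=float)
    return lam * np.log(p_wt) + (1.0 - lam) * np.log(p_wt / p_w)

# Invented probabilities for three terms.
terms = ["android", "experience", "kotlin"]
p_wt = [0.30, 0.40, 0.10]   # P(term | topic)
p_w = [0.10, 0.40, 0.02]    # P(term) over the whole corpus

def rank_terms(lam):
    """Return the terms ordered from most to least relevant."""
    scores = relevance(p_wt, p_w, lam)
    return [terms[i] for i in np.argsort(scores)[::-1]]

for lam in (1.0, 0.0):
    print(lam, rank_terms(lam))
```

With λ = 1 the ranking follows raw within-topic probability, so a generic word like "experience" comes first. With λ = 0 it follows the ratio to the corpus-wide probability, which favours terms that are rare overall but concentrated in the topic.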


Here is the documentation for <a href="https://pyldavis.readthedocs.io/en/latest/readme.html">pyLDAvis</a>.