***
This notebook is part of the solution for the DSG: City of LA competition. The solution is split into 5 parts. Here is the list of notebooks in the correct order. The part you are currently reading is highlighted in bold.

[1. Introduction to the solution of DSG: City of LA](https://www.kaggle.com/niyamatalmass/1-introduction-to-the-solution-of-dsg-city-of-la)

[2. Raw Job Postings to structured CSV](https://www.kaggle.com/niyamatalmass/2-raw-job-bulletins-to-structured-csv)

[3. Identify biased language](https://www.kaggle.com/niyamatalmass/3-identify-biased-language)

[4. Improve the diversity and quality](https://www.kaggle.com/niyamatalmass/4-improve-the-diversity-and-quality)

[**5. Jobs Promotional Pathway**](https://www.kaggle.com/niyamatalmass/5-jobs-promotional-pathway)
***

In [None]:
!echo Y | apt-get install graphviz libgraphviz-dev pkg-config
!pip install pygraphviz
!pip install pyvis

<h1 align="center"><font color="#5831bc" face="Comic Sans MS">Jobs Promotional Pathway</font></h1> 

# <font color="#5831bc" face="Comic Sans MS">Notebook Overview</font>
Finally, we have come to the last part. In this notebook, we will try to make it easier to determine which promotions are available to employees in each job class. Put more simply, we have to find all the possible promotional pathways an employee can traverse from a given job. 

Most of the work for this problem was done in part 2, where we extracted the data fields and built a structured CSV dataset of the job descriptions. There is a column named ```EXP_JOB_CLASS_TITLE```. It stores the title of the job that one must have experience in to qualify for that job. We can use this information to build our promotional graph. Without further ado, let's get started. 

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import networkx as nx
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

%matplotlib inline

# <font color="#5831bc" face="Comic Sans MS">Load Dataset</font>
First, let's load our structured CSV file. We will use this dataset to find and create our promotional graph. 

In [None]:
df_jobs_struct = pd.read_csv('../input/2-raw-job-bulletins-to-structured-csv/jobs.csv')
df_jobs_struct[['JOB_CLASS_TITLE', 'JOB_CLASS_NO', 'EXP_JOB_CLASS_TITLE']].head(20)

That looks promising. We can easily use this dataset to create a graph-like structure that employees can use to see all the promotional pathways a job has. 

# <font color="#5831bc" face="Comic Sans MS">Define function for finding promotions</font>
In this section, we will define two important functions for building our graph. The first, ```get_upper_positions```, finds the jobs that list a given job title as required experience. The second, ```build_graph```, applies it recursively and returns a list of tuples covering all the promotional paths an employee can traverse from the given job title. 

Each tuple follows the pattern ```(PROMOTION JOB TITLE, GIVEN CURRENT JOB TITLE)``` so that we can easily create a graph data structure from it. 

In [None]:
def get_upper_positions(current_job_title):
    # rows whose required experience is the given job title,
    # i.e. the jobs this title can be promoted into
    return df_jobs_struct.loc[
        df_jobs_struct['EXP_JOB_CLASS_TITLE'].str.upper()
        == current_job_title.upper()]


def build_graph(job_title, list_relations=None, visited=None):
    # job_title is the lower position; collect every position above it
    # as (PROMOTION JOB TITLE, CURRENT JOB TITLE) tuples
    if list_relations is None:
        list_relations = []
    if visited is None:
        visited = set()

    job_title = job_title.upper()
    if job_title in visited:
        # already expanded; this also guards against cycles in the data
        return list_relations or None
    visited.add(job_title)

    # all positions that list this job_title as required experience
    upper_positions = get_upper_positions(job_title)
    list_relations.extend(
        zip(
            upper_positions.JOB_CLASS_TITLE.str.upper(),
            upper_positions.EXP_JOB_CLASS_TITLE.str.upper()))

    # recurse into every upper position, sharing the same result list
    for upper_title in upper_positions.JOB_CLASS_TITLE.dropna().unique():
        build_graph(upper_title, list_relations, visited)

    # keep the original contract: None when no promotion was found
    return list_relations or None

We have defined our functions; now let's test them. 

In [None]:
network_tuple = build_graph('MANAGEMENT ANALYST')
network_tuple

Wow! We passed ```MANAGEMENT ANALYST``` to our function and it returned a list of tuples. Each tuple contains promotional information in the form ```(PROMOTION JOB TITLE, GIVEN CURRENT JOB TITLE)```. Now we can easily build a network graph of all possible promotional pathways from this information. 
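
To make the tuple format concrete, here is a minimal, self-contained sketch on a made-up three-row table. The job titles and the helper name `collect_promotions` are illustrative assumptions, not part of the dataset or of the functions above, and the traversal is written as a simple queue-based walk rather than the recursive approach used in this notebook.

```python
import pandas as pd

# Toy stand-in for the structured CSV; these rows are invented for illustration.
toy_jobs = pd.DataFrame({
    "JOB_CLASS_TITLE": ["ANALYST II", "ANALYST III", "CHIEF ANALYST"],
    "EXP_JOB_CLASS_TITLE": ["ANALYST I", "ANALYST II", "ANALYST III"],
})


def collect_promotions(df, start_title):
    """Return (promotion title, required title) pairs reachable from start_title."""
    relations, visited, queue = [], set(), [start_title.upper()]
    while queue:
        current = queue.pop(0)
        if current in visited:
            continue  # skip titles already expanded (also guards against cycles)
        visited.add(current)
        # jobs that list the current title as required experience
        matches = df[df["EXP_JOB_CLASS_TITLE"].str.upper() == current]
        for promo in matches["JOB_CLASS_TITLE"].str.upper():
            relations.append((promo, current))
            queue.append(promo)
    return relations


toy_relations = collect_promotions(toy_jobs, "Analyst I")
print(toy_relations)
```

Each pair reads "to reach the first title, you need experience in the second", which is exactly the shape an edge list needs.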

# <font color="#5831bc" face="Comic Sans MS">Build job promotional graph and draw</font>
Previously, we built functions that return the relations to all possible promotional positions for a given job. Now we will use those functions to create a graph data structure using ```networkX```. NetworkX is a Python library that makes it very easy to create and draw graphs. We will use it in our solution. Without further ado, let's get to work. 

In [None]:
%config InlineBackend.figure_format = 'retina'
from networkx.drawing.nx_agraph import write_dot, graphviz_layout

network_tuple = build_graph('MANAGEMENT ANALYST')
# build a networkx graph
G = nx.DiGraph() 
# add main node our given job title
G.add_node('MANAGEMENT ANALYST') 
# add edges of our graph 
if network_tuple is not None:
    G.add_edges_from(network_tuple)
    
    
# now plot the graph using networkx's built-in functions
plt.figure(figsize=(20, 20))
pos = graphviz_layout(G, prog='dot')
# edges are stored as (promotion, current job); reverse them so arrows point toward the promotion
nx.draw(G.reverse(), pos, with_labels=False, arrows=True, arrowstyle='simple', arrowsize=18)
text = nx.draw_networkx_labels(G, pos)
for _,t in text.items():
    t.set_rotation(45)
    t.set_fontsize(22)
plt.show()

YES! That's really awesome! We finally have a way to draw a graph showing all the possible promotions for a given job. Now let's build a function to do that easily. 

# <font color="#5831bc" face="Comic Sans MS">Build function for easy drawing of promotion graphs</font>
We saw above that our technique works. Now let's put it all together in a function that draws a promotional graph for a given job title. If the job doesn't have any promotions, it will just plot the job title. 

In [None]:
#########################################
# function for plotting a promotion graph very easily
#########################################

def plot_promotion_graph(job_title):
    # build_graph returns upper-cased titles, so normalize the root the same way
    job_title = job_title.upper()
    network_tuple = build_graph(job_title)
    # build a networkx graph
    G = nx.DiGraph()
    # add the given job title as the main node
    G.add_node(job_title)
    # add the edges of our graph
    if network_tuple is not None:
        G.add_edges_from(network_tuple)

    # now plot the graph using networkx's built-in functions
    plt.figure(figsize=(20, 20))
    pos = graphviz_layout(G, prog='dot')
    # edges are stored as (promotion, current job); reverse them so arrows point toward the promotion
    nx.draw(G.reverse(), pos, with_labels=False, arrows=True, arrowstyle='simple', arrowsize=18)
    text = nx.draw_networkx_labels(G, pos)
    for _, t in text.items():
        t.set_rotation(45)
        t.set_fontsize(20)
    plt.show()

In [None]:
plot_promotion_graph('SENIOR MANAGEMENT ANALYST')

In [None]:
plot_promotion_graph("DETENTION OFFICER")

In [None]:
plot_promotion_graph("ANIMAL KEEPER")

Alright! Now we can see that our promotion graph works for these jobs. This is a very helpful feature: an employee can easily see which promotions are available from his/her current position. 

    This can help increase the quality and diversity of the applicants, because they may be motivated by seeing all the possible promotions/positions they can reach if they apply for the job. 
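
The same relations can also be listed as plain text, which may be easier to include in a job bulletin than a plot. The sketch below is one possible way to do it, assuming relations in the same `(PROMOTION JOB TITLE, GIVEN CURRENT JOB TITLE)` format; the helper name `list_promotion_paths` and the job titles are hypothetical.

```python
import networkx as nx

# Illustrative (promotion title, required title) pairs in the format
# described above; these titles are made up.
relations = [
    ("ANALYST II", "ANALYST I"),
    ("ANALYST III", "ANALYST II"),
    ("AUDITOR", "ANALYST I"),
]


def list_promotion_paths(relations, start_title):
    """Return every path from start_title up to a top position (hypothetical helper)."""
    graph = nx.DiGraph()
    graph.add_node(start_title)
    # flip each pair so edges point from the current job to the promotion
    graph.add_edges_from((current, promo) for promo, current in relations)
    # top positions are reachable nodes with no further promotion
    reachable = nx.descendants(graph, start_title)
    tops = [node for node in reachable if graph.out_degree(node) == 0]
    paths = []
    for top in tops:
        paths.extend(nx.all_simple_paths(graph, start_title, top))
    return sorted(paths)


promotion_paths = list_promotion_paths(relations, "ANALYST I")
for path in promotion_paths:
    print(" -> ".join(path))
```

A job with no promotions simply yields an empty list, mirroring the plot that shows only the job title.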

# <font color="#5831bc" face="Comic Sans MS">Build function for getting info for the promotion</font>
We have already seen our network graph. Now we want to go even further and write a function that takes a job title and outputs what a person needs in order to be promoted to that job. In the previous graph an employee can see which jobs are available for promotion; now we want to show them what they need to get there. 

In [None]:
def show_promotion_details(job_title):
    # compare case-insensitively, as in get_upper_positions
    temp = df_jobs_struct.loc[
        df_jobs_struct["JOB_CLASS_TITLE"].str.upper() == job_title.upper()]
    return temp[['EXP_JOB_CLASS_TITLE', 'EXP_JOB_CLASS_FUNCTION',
                 'EXPERIENCE_LENGTH', 'EDUCATION_MAJOR', 'EXP_JOB_COMPANY']]

In [None]:
show_promotion_details('SENIOR MANAGEMENT ANALYST')

Yes! This is a very simple function, but it is very useful. Suppose an employee working as a MANAGEMENT ANALYST sees from the first promotion graph that he/she can be promoted to SENIOR MANAGEMENT ANALYST, where there are more opportunities to grow. With this function, we print out that after working 2 years as a management analyst in any company, he/she could get a promotion to SENIOR MANAGEMENT ANALYST. 

# <font color="#5831bc" face="Comic Sans MS">Let's get even deeper</font>
Now, let's create an interactive version of the previous graph. This time, hovering over a job node will show the information needed to get there. Let's see! 

In [None]:
from pyvis.network import Network

In [None]:
def plot_interactive_graph(job_title):
    network_tuple = build_graph(job_title)
    # build a networkx graph
    G = nx.DiGraph() 
    # add main node our given job title
    G.add_node(job_title) 
    # add edges of our graph 
    if network_tuple is not None:
        G.add_edges_from(network_tuple)

    nt = Network(height="800px",
                     width="750px",
                     directed=True,
                     notebook=True,
                     bgcolor="#ffffff",
                     font_color=False,
                     layout=True)
    nt.from_nx(G)

    neighbor_map = nt.get_adj_list()
    # add requirement details to each node's hover text
    for node in nt.nodes:
        # rows describing this node's job; the node id is the job title
        job_rows = df_jobs_struct.loc[
            df_jobs_struct['JOB_CLASS_TITLE'].str.upper() == node['id'].upper()]
        node["title"] = (
            "Experience Job Title: "
            + job_rows['EXP_JOB_CLASS_TITLE'].str.upper().str.cat(sep=', ')
            + "; Experience Years: "
            + job_rows['EXPERIENCE_LENGTH'].str.upper().str.cat(sep=', '))
        # size the node by its number of neighbors
        node["value"] = len(neighbor_map[node["id"]])
    return nt
    

In [None]:
network = plot_interactive_graph('MANAGEMENT ANALYST')
network.show("mygraph.html")

In [None]:
network = plot_interactive_graph("ANIMAL KEEPER")
network.show("mygraph2.html")

That's really awesome! This interactive graph is really useful. We can zoom in and select any job. Hovering over a node shows which prior job and how many years of experience an employee needs to get there. In the interactive plot, node size is set from the number of neighbors in the graph's adjacency list, so **the bigger the node, the higher the position tends to be**. We hope this will be really helpful for the City of LA.
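
Neighbor count is only a rough proxy for seniority. If node size should follow the position's level more directly, one option is to size nodes by the number of promotions needed to reach them. The sketch below shows that idea under stated assumptions: the relations are made-up pairs in the `(PROMOTION JOB TITLE, GIVEN CURRENT JOB TITLE)` format, and `promotion_levels` is a hypothetical helper, not part of the code above.

```python
import networkx as nx

# Illustrative (promotion title, required title) pairs; the titles are made up.
relations = [
    ("ANALYST II", "ANALYST I"),
    ("ANALYST III", "ANALYST II"),
    ("AUDITOR", "ANALYST I"),
]


def promotion_levels(relations, start_title):
    """Map each reachable job to the fewest promotions needed to reach it."""
    graph = nx.DiGraph()
    graph.add_node(start_title)
    # flip each pair so edges point from the current job to the promotion
    graph.add_edges_from((current, promo) for promo, current in relations)
    return dict(nx.single_source_shortest_path_length(graph, start_title))


levels = promotion_levels(relations, "ANALYST I")
print(levels)
```

Inside a function like `plot_interactive_graph`, the resulting dictionary could then replace the neighbor count, for example `node["value"] = levels.get(node["id"], 0) + 1`.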

<div class="alert alert-block alert-info">
<p/>
<b>RECOMMENDATIONS FOR EXPLICIT PROMOTIONS:</b><br/>
    • We covered explicit promotions in this notebook and used several visualization methods to present the promotion graph. The City of LA can easily take this idea and build something more useful. All the code is modular, so it can be reused easily. <br/>
    <br/>
<b>RECOMMENDATIONS FOR IMPLICIT PROMOTIONS:</b><br/>
    • This notebook covers only explicit promotions, but the City of LA can also implement an implicit promotion graph. The structured version of the job bulletins has a column ```EXP_JOB_CLASS_FUNCTION```. By matching that column's values against the job duties of other job titles, we can create an implicit promotion graph.
<p/>
</div>
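
As a rough illustration of the implicit-promotion idea, the sketch below links two jobs when the experience function one job asks for overlaps with the duties of another. Everything here is an assumption made for illustration: the `JOB_DUTIES` column name, the toy rows, the word-overlap (Jaccard) score, and the `0.3` threshold. A real implementation would likely need better text matching.

```python
import pandas as pd

# Toy rows; JOB_DUTIES is an assumed column name and all text is invented.
toy_jobs = pd.DataFrame({
    "JOB_CLASS_TITLE": ["SENIOR PAINTER", "PAINTER", "ACCOUNTANT"],
    "JOB_DUTIES": [
        "supervises painting crews",
        "applies paint and coatings to buildings",
        "prepares financial reports and audits accounts",
    ],
    "EXP_JOB_CLASS_FUNCTION": [
        "applying paint and coatings",
        None,
        "preparing financial reports",
    ],
})


def jaccard(text_a, text_b):
    """Word-overlap similarity between two strings, from 0 to 1."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def implicit_relations(df, threshold=0.3):
    """Return (promotion title, feeder title) pairs whose texts overlap enough."""
    pairs = []
    # only jobs that state a required experience function can be matched
    for _, upper in df.dropna(subset=["EXP_JOB_CLASS_FUNCTION"]).iterrows():
        for _, lower in df.iterrows():
            if lower["JOB_CLASS_TITLE"] == upper["JOB_CLASS_TITLE"]:
                continue  # a job is not its own promotion
            score = jaccard(upper["EXP_JOB_CLASS_FUNCTION"], lower["JOB_DUTIES"])
            if score >= threshold:
                pairs.append((upper["JOB_CLASS_TITLE"], lower["JOB_CLASS_TITLE"]))
    return pairs


implicit_pairs = implicit_relations(toy_jobs)
print(implicit_pairs)
```

The resulting pairs have the same shape as the explicit relations, so they could be passed to `G.add_edges_from` in the same way.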

# <font color="#5831bc" face="Comic Sans MS">Conclusion</font>
We have finally come to the end of my solution. Thank you very much for reading it. I hope you enjoyed it. I also learned a lot while working on this competition; its unique problem taught me many different things. 

Finally, I want to describe the features of this solution in light of the grading rubric provided by Kaggle. 

<div class="alert alert-block alert-info">
<p/>
<b>Accuracy:</b><br/>
    • Successfully converted the folder of job postings into a structured CSV. I believe the quality of this CSV is better than that of the other CSVs shared in the forums, because those relied on regular expressions written for specific patterns. There are hundreds of such patterns, so regex-based extraction will sometimes fail, particularly on new or unusual job postings. My solution uses NER, which can also be applied to new jobs. <br/>
<p/>
<b>Documentation:</b><br/>
    • Provided comments for most of the code so that it is clear what it is doing. <br/>
    • Followed industry best coding practices. I built the necessary functions and methods for reuse and readability, so that they can be used in production by the City of LA. <br/>
    • Also provided extensive documentation of my methodology. I describe every technique and library I used in my solution. <br/>
<p/>
<b>Recommendation:</b><br/>
    • Used the structured CSV file for all my analysis. <br/>
    • Performed extensive EDA on the structured CSV and found very insightful stories. I documented those in my solution and provided a recommendation with each of my findings. All the recommendations are actionable by the City of LA. For example, I found that masculine-coded job postings are more common and identified the most popular masculine-coded words, which the City of LA can easily improve on.<br/>
    • The use of a machine learning model for identifying biased language is also an interesting and useful technique that I shared in my solution. It can easily be used for more complex identification of biased language. <br/>
</div>

This is the end of my solution. I hope you enjoyed it! Feel free to comment and upvote! Thank you very much! 