# Insight Data Science Consulting Project: 80,000 hours - Chapter 3

Note: this is a part of a consulting project with [80,000 hours](https://80000hours.org/).

## Stage 1: Ask a question

My objective is to rank skills (and possibly knowledge, tools & tech) by how valuable they are. The skills are listed by the US Department of Labor [here](https://www.onetonline.org/find/descriptor/browse/Skills/2.B.1/).

There is no performance measure for this ranking yet, since it is subjective. In the future, one could run a poll in which respondents compare skills pairwise.
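
As a rough illustration of how such a poll could be turned into a ranking, the sketch below orders items by the fraction of pairwise comparisons they won. The function name `rank_from_pairwise`, the `(winner, loser)` vote format, and the votes themselves are all assumptions made up for this example; no such poll exists yet, and other aggregation schemes would be equally reasonable.

```python
from collections import defaultdict

def rank_from_pairwise(votes):
    """Rank items by the fraction of pairwise comparisons they won.

    votes: iterable of (winner, loser) tuples from a hypothetical poll.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    win_rate = {item: wins[item] / appearances[item] for item in appearances}
    # Sort by win rate (descending), breaking ties alphabetically
    return sorted(win_rate, key=lambda item: (-win_rate[item], item))

# Made-up votes, for illustration only
example_votes = [
    ("Critical Thinking", "Installation"),
    ("Critical Thinking", "Programming"),
    ("Programming", "Installation"),
]
ranking = rank_from_pairwise(example_votes)
print(ranking)
```

A ranking obtained this way could then be compared against the score-based ranking built in this notebook.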

## Stage 2: Set the environment up and get data

First, set up a directory for data and link it to this workspace. Download the data into a directory of your choice.

In [206]:
#Set up the environment
import pandas as pd                        #Pandas
import numpy as np                         #Numpy
import pycurl                              #For saving file from url
import os                                  #For checking if a file exists
from pandas.errors import ParserError as CParserError  #For checking if a file contains a set of values (pandas.parser no longer exists in recent pandas)
import matplotlib.pyplot as plt            #For plotting
import matplotlib
%matplotlib inline

#Some machine learning tools
from sklearn.linear_model import LassoCV, LassoLarsCV, LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV

#For radar graph plot (numpy and matplotlib are already imported above)
import matplotlib.path as path
import matplotlib.patches as patches

# Set up data directory
DataDir = "C:/Users/Admin/Desktop/Insight/data/"
OutputDir = "C:/Users/Admin/Desktop/Insight/output/"


## Stage 3+4: Feature exploration and scores

See previous chapter.

## Stage 5: Combine results

In [207]:
interest = 'Skill'
#interest = 'Knowledge'

In [208]:
filename = "ScoreAllMeasures_" + interest + ".csv"
dSummary = pd.read_csv(DataDir+ filename)
dSummary = dSummary.drop(columns='Unnamed: 0')   #Drop the leftover row-number column
dSummary = dSummary.set_index('index')

In [209]:
d4dimensions = pd.DataFrame(index = dSummary.index)

In [210]:
d4dimensions['Income'] = dSummary['Wage_BLS']
d4dimensions['Satisfaction'] = 0.5*(dSummary['JobSatisfaction_GSS'] + dSummary['JobSatisfaction_PayScale'])
d4dimensions['Security'] = 0.5*(dSummary['Transferability'] - dSummary['RiskOfAutomation'])
d4dimensions['Learnability'] = -dSummary['JobZone']
d4dimensions['score'] = (d4dimensions['Satisfaction'] + d4dimensions['Income'] + \
                         d4dimensions['Security'] + d4dimensions['Learnability'])/4
d4dimensions['color'] = (d4dimensions['score'] - np.min(d4dimensions['score']))/ \
                         (np.max(d4dimensions['score']) - np.min(d4dimensions['score']))
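
To make the arithmetic in the cell above concrete, here is a minimal sketch on made-up numbers. The frame `toy`, its three skills `A`, `B`, `C`, and all values are hypothetical; only the two formulas (an unweighted mean of the four dimensions, then min-max scaling of that mean to the range [0, 1]) mirror the cell above.

```python
import pandas as pd

# Made-up standardized scores for three hypothetical skills
toy = pd.DataFrame(
    {
        "Income": [2.0, 0.0, -1.0],
        "Satisfaction": [0.4, 0.2, 0.0],
        "Security": [2.0, 1.0, -1.0],
        "Learnability": [-2.0, -1.0, 0.6],
    },
    index=["A", "B", "C"],
)
dims = ["Income", "Satisfaction", "Security", "Learnability"]

# Unweighted mean of the four dimensions (same as summing and dividing by 4)
toy["score"] = toy[dims].mean(axis=1)

# Min-max scaling: the lowest score maps to 0 and the highest to 1
span = toy["score"].max() - toy["score"].min()
toy["color"] = (toy["score"] - toy["score"].min()) / span

print(toy[["score", "color"]])
```

Because the mean is unweighted, a dimension with a wide spread (such as Income) moves the score more than one with a narrow spread (such as Satisfaction).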

In [211]:
#Label for skill groups based on O*NET database
if interest == 'Skill':
    SkillGroup = pd.DataFrame(index = d4dimensions.index, columns = ['group'])
    SkillGroup.loc['Active Learning'] = 'Basic'
    SkillGroup.loc['Active Listening'] = 'Basic'
    SkillGroup.loc['Critical Thinking'] = 'Basic'
    SkillGroup.loc['Learning Strategies'] = 'Basic'
    SkillGroup.loc['Mathematics'] = 'Basic'
    SkillGroup.loc['Monitoring'] = 'Basic'
    SkillGroup.loc['Reading Comprehension'] = 'Basic'
    SkillGroup.loc['Science'] = 'Basic'
    SkillGroup.loc['Speaking'] = 'Basic'
    SkillGroup.loc['Writing'] = 'Basic'
    SkillGroup.loc['Coordination'] = 'Social'
    SkillGroup.loc['Instructing'] = 'Social'
    SkillGroup.loc['Negotiation'] = 'Social'
    SkillGroup.loc['Persuasion'] = 'Social'
    SkillGroup.loc['Service Orientation'] = 'Social'
    SkillGroup.loc['Social Perceptiveness'] = 'Social'
    SkillGroup.loc['Complex Problem Solving'] = 'Complex Problem Solving'
    SkillGroup.loc['Equipment Maintenance'] = 'Technical'
    SkillGroup.loc['Equipment Selection'] = 'Technical'
    SkillGroup.loc['Installation'] = 'Technical'
    SkillGroup.loc['Operation and Control'] = 'Technical'
    SkillGroup.loc['Operation Monitoring'] = 'Technical'
    SkillGroup.loc['Operations Analysis'] = 'Technical'
    SkillGroup.loc['Programming'] = 'Technical'
    SkillGroup.loc['Quality Control Analysis'] = 'Technical'
    SkillGroup.loc['Repairing'] = 'Technical'
    SkillGroup.loc['Technology Design'] = 'Technical'
    SkillGroup.loc['Troubleshooting'] = 'Technical'
    SkillGroup.loc['Management of Financial Resources'] = 'Resource Management'
    SkillGroup.loc['Management of Material Resources'] = 'Resource Management'
    SkillGroup.loc['Management of Personnel Resources'] = 'Resource Management'
    SkillGroup.loc['Time Management'] = 'Resource Management'
    SkillGroup.loc['Judgment and Decision Making'] = 'System'
    SkillGroup.loc['Systems Analysis'] = 'System'
    SkillGroup.loc['Systems Evaluation'] = 'System'

    #Map each skill group to a hex color; unmatched rows keep the placeholder "0"
    GroupColors = {'System': "#800080",               #purple
                   'Resource Management': "#0000FF",  #blue
                   'Technical': "#008000",            #green
                   'Complex Problem Solving': "#FFFF00",  #yellow
                   'Social': "#FFA500",               #orange
                   'Basic': "#FF0000"}                #red
    SkillGroup['Color'] = SkillGroup['group'].map(GroupColors).fillna("0")

set(SkillGroup['group'])

{'Basic',
 'Complex Problem Solving',
 'Resource Management',
 'Social',
 'System',
 'Technical'}

In [212]:
d4dimensions['group'] = SkillGroup['group']
d4dimensions['group_color'] = SkillGroup['Color']

In [213]:
pd.set_option('display.max_rows', 100)
d4dimensions

| index | Income | Satisfaction | Security | Learnability | score | color | group | group_color |
|---|---|---|---|---|---|---|---|---|
| Active Learning | 2.634182 | 0.504576 | 2.069776 | -2.341797 | 0.716684 | 0.811107 | Basic | #FF0000 |
| Active Listening | 2.341124 | 0.42888 | 2.202526 | -2.322009 | 0.66263 | 0.762498 | Basic | #FF0000 |
| Complex Problem Solving | 3.147225 | 0.470851 | 2.023038 | -2.354152 | 0.821741 | 0.905581 | Complex Problem Solving | #FFFF00 |
| Coordination | 2.074343 | 0.447771 | 2.38327 | -1.62998 | 0.818851 | 0.902983 | Social | #FFA500 |
| Critical Thinking | 3.247886 | 0.528635 | 2.325277 | -2.594317 | 0.87687 | 0.955158 | Basic | #FF0000 |
| Equipment Maintenance | -0.503275 | -0.063316 | -0.843287 | 0.680028 | -0.182463 | 0.002531 | Technical | #008000 |
| Equipment Selection | -0.473784 | -0.064737 | -0.767879 | 0.654828 | -0.162893 | 0.020129 | Technical | #008000 |
| Installation | -0.348337 | 0.0 | -0.848958 | 0.456187 | -0.185277 | 0.0 | Technical | #008000 |
| Instructing | 1.562355 | 0.37889 | 1.707986 | -1.577251 | 0.517995 | 0.632432 | Social | #FFA500 |
| Judgment and Decision Making | 3.373525 | 0.547922 | 2.346067 | -2.560573 | 0.926736 | 1.0 | System | #800080 |


In [214]:
from matplotlib.font_manager import FontProperties

#Set up font properties
fontP = FontProperties()
fontP.set_size('small')

In [215]:
d4dimensions.max(axis=0)

Income            3.37353
Satisfaction     0.547922
Security          2.38327
Learnability      0.79625
score            0.926736
color                   1
group           Technical
group_color       #FFFF00
dtype: object

In [216]:
d4dimensions.min(axis=0)

Income         -0.569528
Satisfaction   -0.081749
Security       -0.848958
Learnability    -2.59432
score          -0.185277
color                  0
group              Basic
group_color      #0000FF
dtype: object
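
The two outputs above mix the numeric dimensions with the string columns `group` and `group_color`, whose "max" and "min" are only alphabetical. One way to use the numeric extremes is to derive symmetric axis limits for a radar chart from them. The sketch below is an assumption about how that could be done, not a record of how the limits used later were chosen; the helper `symmetric_limits` is hypothetical and the toy values are the extremes above, rounded to two decimals.

```python
import numpy as np
import pandas as pd

def symmetric_limits(df, columns):
    """Return (-m, m), where m is the largest absolute value rounded up to an integer."""
    largest = df[columns].abs().to_numpy().max()
    m = int(np.ceil(largest))
    return -m, m

# Rounded column extremes taken from the max/min outputs above
toy = pd.DataFrame(
    {
        "Income": [3.37, -0.57],
        "Satisfaction": [0.55, -0.08],
        "Security": [2.38, -0.85],
        "Learnability": [0.80, -2.59],
        "group": ["System", "Technical"],  # string column, deliberately ignored
    }
)
dims = ["Income", "Satisfaction", "Security", "Learnability"]
lower, upper = symmetric_limits(toy, dims)
print(lower, upper)
```

Restricting the calculation to the four dimension columns avoids comparing strings with numbers.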

See the O*NET definitions of [Skills](https://www.onetonline.org/find/descriptor/browse/Skills/) and [Knowledge](https://www.onetonline.org/find/descriptor/browse/Knowledge/) for more information. Note that Learnability is awkward to interpret: it is defined above as the negative of the JobZone score, so its sign is reversed relative to the underlying measure.

In [217]:
#set color map
cmap = plt.get_cmap('RdYlGn')   #matplotlib.cm.get_cmap was removed in recent matplotlib
#set directory
Dir = OutputDir + interest + '/'

In [218]:
def radarplot(df, lower, upper):
    df = df.sort_values(by='score')
    for count in range(0,len(df)):

        #Adapted from Copyright (C) 2011  Nicolas P. Rougier

        # Data to be represented
        # ----------
        properties = df.columns[0:4]
        values = df.loc[df.index[count],:][0:4]
        # ----------

        # Choose some nice colors
        matplotlib.rc('axes', facecolor = 'white')

        # Make figure background the same colors as axes 
        fig = plt.figure(figsize=(8,6), facecolor='white')


        # Use a polar axes
        axes = plt.subplot(111, polar=True)

        # Set ticks to the number of properties (in radians)
        #t = np.arange(0,2*np.pi,2*np.pi/len(properties))
        t = np.arange(np.pi/4,2*np.pi,2*np.pi/len(properties))
        plt.xticks(t, [])

        # Set yticks from lower to upper in steps of 1
        #plt.yticks(np.linspace(0,10,11))
        #plt.yticks(np.linspace(0,4,9))
        plt.yticks(np.linspace(lower,upper,(upper-lower)+1))

        # Draw polygon representing values
        points = [(x,y) for x,y in zip(t,values)]
        points.append(points[0])
        points = np.array(points)
        codes = [path.Path.MOVETO,] + \
                [path.Path.LINETO,]*(len(values) -1) + \
                [ path.Path.CLOSEPOLY ]
        _path = path.Path(points, codes)
        _patch = patches.PathPatch(_path, fill=True, color=cmap(df.loc[df.index[count],'color']), linewidth=0, alpha=.7)
        axes.add_patch(_patch)
        _patch = patches.PathPatch(_path, fill=False, linewidth = 2)
        axes.add_patch(_patch)

        # Draw circles at value points
        plt.scatter(points[:,0],points[:,1], linewidth=2,
                    s=50, color='white', edgecolor='black', zorder=10)

        # Set axes limits
        #plt.ylim(0,10)
        #plt.ylim(0,4)
        plt.ylim(lower,upper)

        #add title
        plt.title(df.index[count] + ", score = "+ str(round(df.loc[df.index[count],'score'],2)) ,size=20)

        # Draw the property labels just outside the outer ring so they fit properly
        for i in range(len(properties)):
            angle_rad = i/float(len(properties))*2*np.pi + np.pi/4
            angle_deg = i/float(len(properties))*360 + 45
            ha = "right"
            if angle_rad < np.pi/2 or angle_rad > 3*np.pi/2: ha = "left"
            #plt.text(angle_rad, 10.75, properties[i], size=14,
            #plt.text(angle_rad, 4.75, properties[i], size=14,
            plt.text(angle_rad, upper + 0.25, properties[i], size=14,
                     horizontalalignment=ha, verticalalignment="center")

            # A variant on label orientation
            #    plt.text(angle_rad, 11, properties[i], size=14,
            #             rotation=angle_deg-90,
            #             horizontalalignment='center', verticalalignment="center")

        # Done: save the chart, then close the figure so open figures do not pile up
        plt.savefig(Dir + str(count+1).zfill(2) +'-radar-chart.png', facecolor='white')
        #plt.show()
        plt.close(fig)
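
The polygon construction inside `radarplot` can be isolated into a small sketch. It places one vertex per property at evenly spaced angles starting from π/4, repeats the first vertex, and closes the outline with `CLOSEPOLY`. The four values are made up for illustration, and no figure is drawn here.

```python
import numpy as np
from matplotlib.path import Path

values = [1.0, 0.5, 2.0, -1.0]  # made-up values, one per property
n = len(values)

# One angle per property, evenly spaced around the circle, offset by pi/4
angles = np.pi / 4 + 2 * np.pi * np.arange(n) / n

# (angle, radius) pairs; repeat the first point so the outline can be closed
points = np.column_stack([angles, values])
points = np.vstack([points, points[0]])

# MOVETO for the first vertex, LINETO for the rest, CLOSEPOLY for the repeated point
codes = [Path.MOVETO] + [Path.LINETO] * (n - 1) + [Path.CLOSEPOLY]
outline = Path(points, codes)

print(np.degrees(angles))
```

With four properties the vertices sit at 45, 135, 225 and 315 degrees, which is why the property labels are placed at the same angles.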

In [219]:
radarplot(d4dimensions, -4, 4); # for Skill
#radarplot(d4dimensions, -2, 2);

In [220]:
d = d4dimensions
d = d.reset_index()
#d = d.sort_values(by=interest)
d.to_csv(DataDir + 'score_' + interest + '.csv')
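
Writing a frame with `reset_index()` followed by `to_csv()` also writes the new row-number index, which comes back as an `Unnamed: 0` column when the file is read again. That is presumably why an `Unnamed: 0` column had to be dropped when loading the summary file earlier. The sketch below reproduces the effect in memory and shows one alternative, `index=False`. The two-row frame uses rounded scores from the table above, and the in-memory buffer stands in for a real file.

```python
import io
import pandas as pd

scores = pd.DataFrame(
    {"score": [0.93, -0.19]},
    index=pd.Index(["Judgment and Decision Making", "Installation"], name="index"),
)

# Same pattern as the cell above: the default index=True writes the row numbers too
buffer = io.StringIO()
scores.reset_index().to_csv(buffer)
buffer.seek(0)
with_extra = pd.read_csv(buffer)
print(list(with_extra.columns))

# Alternative: skip the row numbers on write and restore the index on read
buffer = io.StringIO()
scores.reset_index().to_csv(buffer, index=False)
buffer.seek(0)
clean = pd.read_csv(buffer, index_col="index")
print(list(clean.columns))
```

Either approach works; the second avoids a separate `drop` step in whichever notebook reads the file next.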