###### Name: Deepak Vadithala
###### Course: MSc Data Science
###### Project Name: MOOC Recommender System

##### Notes:
This notebook contains the analysis of the **Cosine Similarity** model. 
Multiple variables **(Role and Skill Scores)** are used to predict the course category. The **role and skill** scores are calculated separately and then combined using a weighting method.
Each skill has a weighting within its respective role, and the weighting is applied after calculating the **Cosine Similarity** score. The skill score is calculated from the similarity between the skills from LinkedIn and the course description from Coursera.


*Model Source Code Path: /mooc-recommender/Model/Cosine_Distance.py*

*Github Repo: https://github.com/iamdv/mooc-recommender*
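
To make the similarity measure described in the notes concrete, here is a minimal, dependency-free sketch of cosine similarity between two short texts. It is an illustration only and is not taken from `Cosine_Distance.py`: it uses raw term counts, whereas the project may use TF-IDF weights, and the helper names `tokenize` and `cosine_similarity` and the sample texts are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical skill list and course description, for illustration only.
skills = "python machine learning statistics"
course = "An introduction to machine learning with Python"
print(round(cosine_similarity(skills, course), 3))
```

A value near 1 means the two texts share most of their vocabulary, and 0 means they share none.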

In [1]:
# **************************** IMPORTANT ****************************
'''
This cell contains the configuration settings for the notebook.
You can run one role at a time to evaluate the performance of the model.
Change the variable names to run for multiple roles.

In this model:
1. Cosine distance is calculated between the skills and the course description
with a weight of 50%. Each skill has a weighted score based on the
popularity of the skill, derived from endorsements of the respective
skill by other LinkedIn connections.

2. Cosine distance is calculated between the role and the course name
with a weight of 50%.
'''

# *******************************************************************
# For each role, a list of matching category names is defined.
# Please don't change these variables.

label_DataScientist = ['Data Science','Data Analysis','Data Mining','Data Visualization']

label_SoftwareDevelopment = ['Software Development','Computer Science',
                           'Programming Languages', 'Algorithms and Data Structures', 
                           'Information Technology']


label_DatabaseAdministrator = ['Databases']

label_Cybersecurity = ['Cybersecurity']

label_FinancialAccountant = ['Finance', 'Accounting']

label_MachineLearning = ['Machine Learning', 'Deep Learning']

label_Musician = ['Music']

label_Dietitian = ['Nutrition & Wellness']


            
# *******************************************************************


# *******************************************************************
# Environment and Config Variables. Change these variables as per the requirement.

my_fpath_courses = "../Data/main_coursera.csv"

my_fpath_skills_DataScientist = "../Data/Cosine-Distance/Single-Variable/CosDist_DataScientist.csv"

my_fpath_skills_SoftwareDevelopment = "../Data/Cosine-Distance/Single-Variable/CosDist_SoftwareDevelopment.csv" 

my_fpath_skills_DatabaseAdministrator = "../Data/Cosine-Distance/Single-Variable/CosDist_DatabaseAdministrator.csv"

my_fpath_skills_Cybersecurity = "../Data/Cosine-Distance/Single-Variable/CosDist_Cybersecurity.csv"

my_fpath_skills_FinancialAccountant = "../Data/Cosine-Distance/Single-Variable/CosDist_FinancialAccountant.csv"

my_fpath_skills_MachineLearning = "../Data/Cosine-Distance/Single-Variable/CosDist_MachineLearning.csv"

my_fpath_skills_Musician = "../Data/Cosine-Distance/Single-Variable/CosDist_Musician.csv"

my_fpath_skills_Dietitian = "../Data/Cosine-Distance/Single-Variable/CosDist_Dietitian.csv"


# *******************************************************************


# *******************************************************************
# Weighting Variables. Change them as per the requirement.

my_role_weight = 0.5

my_skill_weight = 0.5

# *******************************************************************
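
The way the two weights enter the final score can be sketched as a small function. This is a sketch under stated assumptions: `final_score` is a hypothetical helper, and the check that the weights sum to 1 is an added safeguard that the notebook itself does not enforce. The example values are the Data Scientist role and skill scores of Course Id 303 from the `df_cosdist` preview, and 0.5 is the prediction threshold used in the per-role cells.

```python
def final_score(role_score, skill_score, role_weight=0.5, skill_weight=0.5):
    """Weighted sum of the role and skill scores for one course."""
    # Assumption: the two weights are meant to sum to 1.
    if abs(role_weight + skill_weight - 1.0) > 1e-9:
        raise ValueError("role_weight and skill_weight should sum to 1")
    return role_score * role_weight + skill_score * skill_weight


# Data Scientist scores for Course Id 303, copied from the df_cosdist preview.
score = final_score(role_score=0.744684, skill_score=0.307459)
print(round(score, 4), score >= 0.5)  # prints: 0.5261 True
```

With equal weights the final score is the plain average of the two scores, so a high role score can lift a course over the threshold even when its skill score is low.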


In [2]:
# Importing required modules/packages

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk, string


In [3]:
# Downloading the stopwords like i, me, and, is, the etc.

nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /Users/DV/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [4]:
# Loading courses and skills data from the CSV files

df_courses = pd.read_csv(my_fpath_courses)

df_DataScientist = pd.read_csv(my_fpath_skills_DataScientist)
df_DataScientist = df_DataScientist.drop(columns='Role')
df_DataScientist.columns = ['Course Id', 'DataScientist_Skill_Score', 'DataScientist_Role_Score', 'DataScientist_Keyword_Score']

df_SoftwareDevelopment = pd.read_csv(my_fpath_skills_SoftwareDevelopment)
df_SoftwareDevelopment = df_SoftwareDevelopment.drop(columns='Role')
df_SoftwareDevelopment.columns = ['Course Id','SoftwareDevelopment_Skill_Score', 'SoftwareDevelopment_Role_Score', 'SoftwareDevelopment_Keyword_Score']

df_DatabaseAdministrator = pd.read_csv(my_fpath_skills_DatabaseAdministrator)
df_DatabaseAdministrator = df_DatabaseAdministrator.drop(columns='Role')
df_DatabaseAdministrator.columns = ['Course Id','DatabaseAdministrator_Skill_Score', 'DatabaseAdministrator_Role_Score', 'DatabaseAdministrator_Keyword_Score']

df_Cybersecurity = pd.read_csv(my_fpath_skills_Cybersecurity)
df_Cybersecurity = df_Cybersecurity.drop(columns='Role')
df_Cybersecurity.columns = ['Course Id','Cybersecurity_Skill_Score', 'Cybersecurity_Role_Score', 'Cybersecurity_Keyword_Score']

df_FinancialAccountant = pd.read_csv(my_fpath_skills_FinancialAccountant)
df_FinancialAccountant = df_FinancialAccountant.drop(columns='Role')
df_FinancialAccountant.columns = ['Course Id','FinancialAccountant_Skill_Score', 'FinancialAccountant_Role_Score', 'FinancialAccountant_Keyword_Score']

df_MachineLearning = pd.read_csv(my_fpath_skills_MachineLearning)
df_MachineLearning = df_MachineLearning.drop(columns='Role')
df_MachineLearning.columns = ['Course Id','MachineLearning_Skill_Score', 'MachineLearning_Role_Score', 'MachineLearning_Keyword_Score']

df_Musician = pd.read_csv(my_fpath_skills_Musician)
df_Musician = df_Musician.drop(columns='Role')
df_Musician.columns = ['Course Id','Musician_Skill_Score', 'Musician_Role_Score', 'Musician_Keyword_Score']

df_Dietitian = pd.read_csv(my_fpath_skills_Dietitian)
df_Dietitian = df_Dietitian.drop(columns='Role')
df_Dietitian.columns = ['Course Id','Dietitian_Skill_Score', 'Dietitian_Role_Score','Dietitian_Keyword_Score']
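
The eight loading blocks above differ only in the role name, so the file paths and column names could be generated instead of typed out. The sketch below shows one possible way to do that. `ROLES`, `SKILLS_DIR`, `skills_path` and `score_columns` are hypothetical names introduced here, and the path pattern is inferred from the config cell.

```python
ROLES = [
    "DataScientist", "SoftwareDevelopment", "DatabaseAdministrator",
    "Cybersecurity", "FinancialAccountant", "MachineLearning",
    "Musician", "Dietitian",
]

# Directory taken from the my_fpath_skills_* variables in the config cell.
SKILLS_DIR = "../Data/Cosine-Distance/Single-Variable"


def skills_path(role):
    """Path of the per-role cosine distance CSV file."""
    return f"{SKILLS_DIR}/CosDist_{role}.csv"


def score_columns(role):
    """Column names assigned to the per-role dataframe after dropping 'Role'."""
    kinds = ("Skill", "Role", "Keyword")
    return ["Course Id"] + [f"{role}_{kind}_Score" for kind in kinds]


for role in ROLES:
    print(skills_path(role), score_columns(role))
```

Each path could then be passed to `pd.read_csv` and each column list assigned to `.columns`, keeping the per-role dataframes in a dictionary keyed by role name.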


In [5]:
# Merging the csv files

df_cosdist = df_DataScientist.merge(df_SoftwareDevelopment, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_DatabaseAdministrator, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_Cybersecurity, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_FinancialAccountant, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_MachineLearning, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_Musician, on = 'Course Id', how = 'outer')

df_cosdist = df_cosdist.merge(df_Dietitian, on = 'Course Id', how = 'outer')
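
For readers unfamiliar with `how = 'outer'`, the following dependency-free sketch mimics what an outer join on `'Course Id'` does, using plain dictionaries in place of dataframes. The `outer_merge` helper is hypothetical. The scores are copied from the `df_cosdist` preview, but leaving course 303 out of one table and course 306 out of the other is an assumption made purely to show the effect. pandas would fill the gaps with `NaN`, where this sketch uses `None`.

```python
def outer_merge(left, right):
    """Outer-join two {course_id: {column: value}} tables on the course id."""
    left_cols = {col for row in left.values() for col in row}
    right_cols = {col for row in right.values() for col in row}
    merged = {}
    # An outer join keeps every course id that appears in either table.
    for course_id in sorted(left.keys() | right.keys()):
        row = {col: None for col in sorted(left_cols | right_cols)}
        row.update(left.get(course_id, {}))
        row.update(right.get(course_id, {}))
        merged[course_id] = row
    return merged


# Hypothetical gaps: 306 is left out of the first table, 303 out of the second.
left = {
    303: {"DataScientist_Skill_Score": 0.307459},
    305: {"DataScientist_Skill_Score": 0.232071},
}
right = {
    305: {"SoftwareDevelopment_Skill_Score": 0.255782},
    306: {"SoftwareDevelopment_Skill_Score": 0.227007},
}

merged = outer_merge(left, right)
for course_id, row in merged.items():
    print(course_id, row)
```

Because every per-role file in this notebook covers the same 2213 courses, the outer joins here produce no missing values, but the join type matters if a file ever lacks a course.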



In [6]:
# Exploring data dimensionality, feature names, and feature types.

print(df_courses.shape,"\n")

print(df_cosdist.shape,"\n")

print(df_courses.columns, "\n")

print(df_cosdist.shape,"\n")

print(df_courses.describe(), "\n")

print(df_cosdist.describe(), "\n")


(2213, 19) 

(2213, 25) 

Index(['Unnamed: 0', 'Course Id', 'Course Name', 'Course Description', 'Slug',
       'Provider', 'Universities/Institutions', 'Parent Subject',
       'Child Subject', 'Category', 'Url', 'Length', 'Language',
       'Credential Name', 'Rating', 'Number of Ratings', 'Certificate',
       'Workload', 'Course Keywords'],
      dtype='object') 

(2213, 25) 

        Unnamed: 0    Course Id      Length       Rating  Number of Ratings  \
count  2213.000000  2213.000000  964.000000  2213.000000        2213.000000   
mean   1106.000000  4816.998192    6.063278     2.352785          10.321735   
std     638.982394  3033.878865    2.724669     2.129134         110.680382   
min       0.000000   303.000000    1.000000     0.000000           0.000000   
25%     553.000000  1829.000000    4.000000     0.000000           0.000000   
50%    1106.000000  4880.000000    6.000000     3.000000           1.000000   
75%    1659.000000  7329.000000    7.000000     4.428571       

In [7]:
# Quick check to see if the dataframe is showing the right results

df_cosdist.head(20)

Unnamed: 0,Course Id,DataScientist_Skill_Score,DataScientist_Role_Score,DataScientist_Keyword_Score,SoftwareDevelopment_Skill_Score,SoftwareDevelopment_Role_Score,SoftwareDevelopment_Keyword_Score,DatabaseAdministrator_Skill_Score,DatabaseAdministrator_Role_Score,DatabaseAdministrator_Keyword_Score,...,FinancialAccountant_Keyword_Score,MachineLearning_Skill_Score,MachineLearning_Role_Score,MachineLearning_Keyword_Score,Musician_Skill_Score,Musician_Role_Score,Musician_Keyword_Score,Dietitian_Skill_Score,Dietitian_Role_Score,Dietitian_Keyword_Score
0,303,0.307459,0.744684,0.54446,0.237837,0.603931,0.518541,0.488561,0.846402,0.67622,...,0.559414,0.37623,0.274192,0.645469,0.45683,0.272538,0.549407,0.516312,0.798153,0.525401
1,305,0.232071,0.535167,0.460375,0.255782,0.693831,0.613591,0.417917,0.537684,0.472373,...,0.60453,0.339911,0.294941,0.62591,0.332566,0.178531,0.467199,0.42734,0.285521,0.538356
2,306,0.323008,0.449385,0.219802,0.227007,0.576043,0.074077,0.391093,0.519654,0.075394,...,0.403093,0.437589,0.56671,0.26186,0.43187,0.445152,0.354556,0.54786,0.716982,0.302728
3,307,0.309164,0.41626,0.648603,0.228633,0.492073,0.519089,0.417219,0.412655,0.507598,...,0.72508,0.417827,0.651783,0.69776,0.436257,0.382234,0.527155,0.518842,0.484317,0.660312
4,308,0.300722,0.284999,0.602987,0.239965,0.59345,0.53051,0.413887,0.426953,0.509686,...,0.758461,0.414348,0.420242,0.713478,0.437167,0.070305,0.72038,0.503216,0.349389,0.780097
5,309,0.307471,0.52217,0.757452,0.248328,0.72414,0.667862,0.446119,0.680659,0.68943,...,0.874166,0.418602,0.545971,0.839266,0.435669,0.231572,0.770149,0.50905,0.49418,0.827834
6,316,0.334175,0.310164,0.355386,0.248898,0.610637,0.343334,0.448754,0.550098,0.318062,...,0.524332,0.431564,0.4208,0.538153,0.470734,0.153079,0.457915,0.542808,0.65811,0.550351
7,317,0.272255,0.375398,0.364093,0.236654,0.532408,0.405278,0.426597,0.434652,0.389137,...,0.599828,0.375735,0.655808,0.416526,0.375193,0.223273,0.496248,0.467149,0.56135,0.418296
8,318,0.281119,0.473992,0.571874,0.23311,0.530605,0.539747,0.431926,0.496032,0.509738,...,0.537845,0.361833,0.289234,0.566323,0.460934,0.380536,0.595765,0.495765,0.431085,0.626008
9,322,0.310068,0.642344,0.75333,0.241869,0.780723,0.766033,0.437863,0.679127,0.710625,...,0.870527,0.419656,0.570464,0.857066,0.456258,0.45393,0.841729,0.540446,0.69061,0.875866


In [8]:
# Joining two dataframes - Courses and the Cosine Similarity results - on the 'Course Id' variable.
# Inner join: keeps only the rows whose 'Course Id' appears in both tables. This is a set operation.

df_courses_score = df_courses.merge(df_cosdist, on ='Course Id', how='inner')

print(df_courses_score.shape,"\n")

(2213, 43) 



In [9]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: DATA SCIENTIST

y_actu_DataScientist         = ''
y_pred_DataScientist         = ''

df_courses_score['DataScientist_Final_Score'] = (df_courses_score['DataScientist_Role_Score'] * my_role_weight) + (df_courses_score['DataScientist_Skill_Score'] * my_skill_weight)

df_courses_score['DataScientist_Predict'] = (df_courses_score['DataScientist_Final_Score'] >= 0.5)

df_courses_score['DataScientist_Label'] = df_courses_score.Category.isin(label_DataScientist)

y_pred_DataScientist = pd.Series(df_courses_score['DataScientist_Predict'], name='Predicted')

y_actu_DataScientist = pd.Series(df_courses_score['DataScientist_Label'], name='Actual')

df_confusion_DataScientist = pd.crosstab(y_actu_DataScientist, y_pred_DataScientist , rownames=['Actual'], colnames=['Predicted'], margins=False)
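
The thresholding and cross-tabulation steps in this cell can be sketched without pandas as follows. This is an illustrative sketch, not the notebook's method: `predict` and `confusion_counts` are hypothetical helpers, and the scores and labels are made-up values. Counting the four cells directly always yields all four numbers, even when no course is predicted `True`.

```python
def predict(final_scores, threshold=0.5):
    """Flag a course as relevant when its final score reaches the threshold."""
    return [score >= threshold for score in final_scores]


def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for two equal-length sequences of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tn = fp = fn = tp = 0
    for is_relevant, is_predicted in zip(actual, predicted):
        if is_relevant and is_predicted:
            tp += 1
        elif is_relevant:
            fn += 1
        elif is_predicted:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical final scores and category labels for five courses.
scores = [0.62, 0.41, 0.55, 0.30, 0.48]
labels = [True, False, False, False, True]
print(confusion_counts(labels, predict(scores)))  # prints: (2, 1, 1, 1)
```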


In [10]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: SOFTWARE ENGINEER/DEVELOPER

y_actu_SoftwareDevelopment         = ''
y_pred_SoftwareDevelopment         = ''

df_courses_score['SoftwareDevelopment_Final_Score'] = (df_courses_score['SoftwareDevelopment_Role_Score'] * my_role_weight) + (df_courses_score['SoftwareDevelopment_Skill_Score'] * my_skill_weight)

df_courses_score['SoftwareDevelopment_Predict'] = (df_courses_score['SoftwareDevelopment_Final_Score'] >= 0.5)

df_courses_score['SoftwareDevelopment_Label'] = df_courses_score.Category.isin(label_SoftwareDevelopment)

y_pred_SoftwareDevelopment = pd.Series(df_courses_score['SoftwareDevelopment_Predict'], name='Predicted')

y_actu_SoftwareDevelopment = pd.Series(df_courses_score['SoftwareDevelopment_Label'], name='Actual')

df_confusion_SoftwareDevelopment = pd.crosstab(y_actu_SoftwareDevelopment, y_pred_SoftwareDevelopment , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [11]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: DATABASE DEVELOPER/ADMINISTRATOR

y_actu_DatabaseAdministrator         = ''
y_pred_DatabaseAdministrator         = ''

df_courses_score['DatabaseAdministrator_Final_Score'] = (df_courses_score['DatabaseAdministrator_Role_Score'] * my_role_weight) + (df_courses_score['DatabaseAdministrator_Skill_Score'] * my_skill_weight)

df_courses_score['DatabaseAdministrator_Predict'] = (df_courses_score['DatabaseAdministrator_Final_Score'] >= 0.5)

df_courses_score['DatabaseAdministrator_Label'] = df_courses_score.Category.isin(label_DatabaseAdministrator)

y_pred_DatabaseAdministrator = pd.Series(df_courses_score['DatabaseAdministrator_Predict'], name='Predicted')

y_actu_DatabaseAdministrator = pd.Series(df_courses_score['DatabaseAdministrator_Label'], name='Actual')

df_confusion_DatabaseAdministrator = pd.crosstab(y_actu_DatabaseAdministrator, y_pred_DatabaseAdministrator , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [12]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: CYBERSECURITY CONSULTANT

y_actu_Cybersecurity         = ''
y_pred_Cybersecurity         = ''

df_courses_score['Cybersecurity_Final_Score'] = (df_courses_score['Cybersecurity_Role_Score'] * my_role_weight) + (df_courses_score['Cybersecurity_Skill_Score'] * my_skill_weight)

df_courses_score['Cybersecurity_Predict'] = (df_courses_score['Cybersecurity_Final_Score'] >= 0.5)

df_courses_score['Cybersecurity_Label'] = df_courses_score.Category.isin(label_Cybersecurity)

y_pred_Cybersecurity = pd.Series(df_courses_score['Cybersecurity_Predict'], name='Predicted')

y_actu_Cybersecurity = pd.Series(df_courses_score['Cybersecurity_Label'], name='Actual')

df_confusion_Cybersecurity = pd.crosstab(y_actu_Cybersecurity, y_pred_Cybersecurity , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [13]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: FINANCIAL ACCOUNTANT

y_actu_FinancialAccountant         = ''
y_pred_FinancialAccountant         = ''

df_courses_score['FinancialAccountant_Final_Score'] = (df_courses_score['FinancialAccountant_Role_Score'] * my_role_weight) + (df_courses_score['FinancialAccountant_Skill_Score'] * my_skill_weight)

df_courses_score['FinancialAccountant_Predict'] = (df_courses_score['FinancialAccountant_Final_Score'] >= 0.5)

df_courses_score['FinancialAccountant_Label'] = df_courses_score.Category.isin(label_FinancialAccountant)

y_pred_FinancialAccountant = pd.Series(df_courses_score['FinancialAccountant_Predict'], name='Predicted')

y_actu_FinancialAccountant = pd.Series(df_courses_score['FinancialAccountant_Label'], name='Actual')

df_confusion_FinancialAccountant = pd.crosstab(y_actu_FinancialAccountant, y_pred_FinancialAccountant , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [14]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: MACHINE LEARNING ENGINEER

y_actu_MachineLearning         = ''
y_pred_MachineLearning         = ''

df_courses_score['MachineLearning_Final_Score'] = (df_courses_score['MachineLearning_Role_Score'] * my_role_weight) + (df_courses_score['MachineLearning_Skill_Score'] * my_skill_weight)

df_courses_score['MachineLearning_Predict'] = (df_courses_score['MachineLearning_Final_Score'] >= 0.5)

df_courses_score['MachineLearning_Label'] = df_courses_score.Category.isin(label_MachineLearning)

y_pred_MachineLearning = pd.Series(df_courses_score['MachineLearning_Predict'], name='Predicted')

y_actu_MachineLearning = pd.Series(df_courses_score['MachineLearning_Label'], name='Actual')

df_confusion_MachineLearning = pd.crosstab(y_actu_MachineLearning, y_pred_MachineLearning , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [15]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: MUSICIAN

y_actu_Musician         = ''
y_pred_Musician         = ''

df_courses_score['Musician_Final_Score'] = (df_courses_score['Musician_Role_Score'] * my_role_weight) + (df_courses_score['Musician_Skill_Score'] * my_skill_weight)

df_courses_score['Musician_Predict'] = (df_courses_score['Musician_Final_Score'] >= 0.5)

df_courses_score['Musician_Label'] = df_courses_score.Category.isin(label_Musician)

y_pred_Musician = pd.Series(df_courses_score['Musician_Predict'], name='Predicted')

y_actu_Musician = pd.Series(df_courses_score['Musician_Label'], name='Actual')

df_confusion_Musician = pd.crosstab(y_actu_Musician, y_pred_Musician , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [16]:
# Transforming and shaping the data to create the confusion matrix for the ROLE: NUTRITIONIST/DIETITIAN

y_actu_Dietitian         = ''
y_pred_Dietitian         = ''

df_courses_score['Dietitian_Final_Score'] = (df_courses_score['Dietitian_Role_Score'] * my_role_weight) + (df_courses_score['Dietitian_Skill_Score'] * my_skill_weight)

df_courses_score['Dietitian_Predict'] = (df_courses_score['Dietitian_Final_Score'] >= 0.5)

df_courses_score['Dietitian_Label'] = df_courses_score.Category.isin(label_Dietitian)

y_pred_Dietitian = pd.Series(df_courses_score['Dietitian_Predict'], name='Predicted')

y_actu_Dietitian = pd.Series(df_courses_score['Dietitian_Label'], name='Actual')

df_confusion_Dietitian = pd.crosstab(y_actu_Dietitian, y_pred_Dietitian , rownames=['Actual'], colnames=['Predicted'], margins=False)


In [17]:
df_confusion_DataScientist


Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1996,135
True,59,23


In [18]:
df_confusion_SoftwareDevelopment

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1919,173
True,98,23


In [19]:
df_confusion_DatabaseAdministrator

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1110,1092
True,3,8


In [20]:
df_confusion_Cybersecurity

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1861,322
True,8,22


In [21]:
df_confusion_FinancialAccountant

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1042,1068
True,20,83


In [22]:
df_confusion_MachineLearning

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1540,649
True,2,22


In [23]:
df_confusion_Musician

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,2159,17
True,35,2


In [24]:
df_confusion_Dietitian

Predicted,False,True
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
False,748,1439
True,4,22


In [25]:
# Performance summary for the ROLE: DATA SCIENTIST


try:
    tn_DataScientist = df_confusion_DataScientist.iloc[0][False]
except:
    tn_DataScientist = 0
    
try:
    tp_DataScientist =  df_confusion_DataScientist.iloc[1][True]
except:
    tp_DataScientist = 0

    
try:
    fn_DataScientist = df_confusion_DataScientist.iloc[1][False]
except:
    fn_DataScientist = 0
    
try:
    fp_DataScientist =  df_confusion_DataScientist.iloc[0][True]
except:
    fp_DataScientist = 0  
    
    
total_count_DataScientist = tn_DataScientist + tp_DataScientist + fn_DataScientist + fp_DataScientist

print('Data Scientist Accuracy Rate : ', '{0:.2f}'.format((tn_DataScientist + tp_DataScientist) / total_count_DataScientist * 100))

print('Data Scientist Misclassification Rate : ',  '{0:.2f}'.format((fn_DataScientist + fp_DataScientist) / total_count_DataScientist * 100))

print('Data Scientist True Positive Rate : ',  '{0:.2f}'.format(tp_DataScientist / (tp_DataScientist + fn_DataScientist) * 100))

print('Data Scientist False Positive Rate : ',  '{0:.2f}'.format(fp_DataScientist / (tn_DataScientist + fp_DataScientist) * 100))


Data Scientist Accuracy Rate :  91.23
Data Scientist Misclassification Rate :  8.77
Data Scientist True Positive Rate :  28.05
Data Scientist False Positive Rate :  6.34
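
The four rates printed above follow directly from the confusion matrix cells. As a worked check, the sketch below recomputes them from the Data Scientist matrix (TN = 1996, FP = 135, FN = 59, TP = 23). `rate` and `performance_summary` are hypothetical helpers, and returning `None` for a zero denominator is a design choice of this sketch rather than something the notebook does.

```python
def rate(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator * 100


def performance_summary(tn, fp, fn, tp):
    """Accuracy, misclassification, TPR and FPR (in percent) from the four cells."""
    total = tn + fp + fn + tp
    return {
        "accuracy": rate(tn + tp, total),
        "misclassification": rate(fn + fp, total),
        "true_positive_rate": rate(tp, tp + fn),
        "false_positive_rate": rate(fp, tn + fp),
    }


# Cell counts copied from the df_confusion_DataScientist output.
summary = performance_summary(tn=1996, fp=135, fn=59, tp=23)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The accuracy is (1996 + 23) / 2213 and the true positive rate is 23 / (23 + 59), which match the 91.23 and 28.05 reported for this role.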


In [26]:
# Performance summary for the ROLE: SOFTWARE ENGINEER/DEVELOPER


try:
    tn_SoftwareDevelopment = df_confusion_SoftwareDevelopment.iloc[0][False]
except:
    tn_SoftwareDevelopment = 0
    
try:
    tp_SoftwareDevelopment =  df_confusion_SoftwareDevelopment.iloc[1][True]
except:
    tp_SoftwareDevelopment = 0

    
try:
    fn_SoftwareDevelopment = df_confusion_SoftwareDevelopment.iloc[1][False]
except:
    fn_SoftwareDevelopment = 0
    
try:
    fp_SoftwareDevelopment =  df_confusion_SoftwareDevelopment.iloc[0][True]
except:
    fp_SoftwareDevelopment = 0  
    
    
total_count_SoftwareDevelopment = tn_SoftwareDevelopment + tp_SoftwareDevelopment + fn_SoftwareDevelopment + fp_SoftwareDevelopment

print('Software Engineer Accuracy Rate : ', '{0:.2f}'.format((tn_SoftwareDevelopment + tp_SoftwareDevelopment) / total_count_SoftwareDevelopment * 100))

print('Software Engineer Misclassification Rate : ',  '{0:.2f}'.format((fn_SoftwareDevelopment + fp_SoftwareDevelopment) / total_count_SoftwareDevelopment * 100))

print('Software Engineer True Positive Rate : ',  '{0:.2f}'.format(tp_SoftwareDevelopment / (tp_SoftwareDevelopment + fn_SoftwareDevelopment) * 100))

print('Software Engineer False Positive Rate : ',  '{0:.2f}'.format(fp_SoftwareDevelopment / (tn_SoftwareDevelopment + fp_SoftwareDevelopment) * 100))


Software Engineer Accuracy Rate :  87.75
Software Engineer Misclassification Rate :  12.25
Software Engineer True Positive Rate :  19.01
Software Engineer False Positive Rate :  8.27


In [27]:
# Performance summary for the ROLE: DATABASE DEVELOPER/ ADMINISTRATOR


try:
    tn_DatabaseAdministrator = df_confusion_DatabaseAdministrator.iloc[0][False]
except:
    tn_DatabaseAdministrator = 0
    
try:
    tp_DatabaseAdministrator =  df_confusion_DatabaseAdministrator.iloc[1][True]
except:
    tp_DatabaseAdministrator = 0

    
try:
    fn_DatabaseAdministrator = df_confusion_DatabaseAdministrator.iloc[1][False]
except:
    fn_DatabaseAdministrator = 0
    
try:
    fp_DatabaseAdministrator =  df_confusion_DatabaseAdministrator.iloc[0][True]
except:
    fp_DatabaseAdministrator = 0  
    
    
total_count_DatabaseAdministrator = tn_DatabaseAdministrator + tp_DatabaseAdministrator + fn_DatabaseAdministrator + fp_DatabaseAdministrator

print('Database Administrator Accuracy Rate : ', '{0:.2f}'.format((tn_DatabaseAdministrator + tp_DatabaseAdministrator) / total_count_DatabaseAdministrator * 100))

print('Database Administrator Misclassification Rate : ',  '{0:.2f}'.format((fn_DatabaseAdministrator + fp_DatabaseAdministrator) / total_count_DatabaseAdministrator * 100))

print('Database Administrator True Positive Rate : ',  '{0:.2f}'.format(tp_DatabaseAdministrator / (tp_DatabaseAdministrator + fn_DatabaseAdministrator) * 100))

print('Database Administrator False Positive Rate : ',  '{0:.2f}'.format(fp_DatabaseAdministrator / (tn_DatabaseAdministrator + fp_DatabaseAdministrator) * 100))


Database Administrator Accuracy Rate :  50.52
Database Administrator Misclassification Rate :  49.48
Database Administrator True Positive Rate :  72.73
Database Administrator False Positive Rate :  49.59


In [28]:
# Performance summary for the ROLE: CYBERSECURITY CONSULTANT


try:
    tn_Cybersecurity = df_confusion_Cybersecurity.iloc[0][False]
except:
    tn_Cybersecurity = 0
    
try:
    tp_Cybersecurity =  df_confusion_Cybersecurity.iloc[1][True]
except:
    tp_Cybersecurity = 0

    
try:
    fn_Cybersecurity = df_confusion_Cybersecurity.iloc[1][False]
except:
    fn_Cybersecurity = 0
    
try:
    fp_Cybersecurity =  df_confusion_Cybersecurity.iloc[0][True]
except:
    fp_Cybersecurity = 0  
    
    
total_count_Cybersecurity = tn_Cybersecurity + tp_Cybersecurity + fn_Cybersecurity + fp_Cybersecurity

print('Cybersecurity Consultant Accuracy Rate : ', '{0:.2f}'.format((tn_Cybersecurity + tp_Cybersecurity) / total_count_Cybersecurity * 100))

print('Cybersecurity Consultant Misclassification Rate : ',  '{0:.2f}'.format((fn_Cybersecurity + fp_Cybersecurity) / total_count_Cybersecurity * 100))

print('Cybersecurity Consultant True Positive Rate : ',  '{0:.2f}'.format(tp_Cybersecurity / (tp_Cybersecurity + fn_Cybersecurity) * 100))

print('Cybersecurity Consultant False Positive Rate : ',  '{0:.2f}'.format(fp_Cybersecurity / (tn_Cybersecurity + fp_Cybersecurity) * 100))


Cybersecurity Consultant Accuracy Rate :  85.09
Cybersecurity Consultant Misclassification Rate :  14.91
Cybersecurity Consultant True Positive Rate :  73.33
Cybersecurity Consultant False Positive Rate :  14.75


In [29]:
# Performance summary for the ROLE: FINANCIAL ACCOUNTANT


try:
    tn_FinancialAccountant = df_confusion_FinancialAccountant.iloc[0][False]
except:
    tn_FinancialAccountant = 0
    
try:
    tp_FinancialAccountant =  df_confusion_FinancialAccountant.iloc[1][True]
except:
    tp_FinancialAccountant = 0

    
try:
    fn_FinancialAccountant = df_confusion_FinancialAccountant.iloc[1][False]
except:
    fn_FinancialAccountant = 0
    
try:
    fp_FinancialAccountant =  df_confusion_FinancialAccountant.iloc[0][True]
except:
    fp_FinancialAccountant = 0  
    
    
total_count_FinancialAccountant = tn_FinancialAccountant + tp_FinancialAccountant + fn_FinancialAccountant + fp_FinancialAccountant

print('Financial Accountant Accuracy Rate : ', '{0:.2f}'.format((tn_FinancialAccountant + tp_FinancialAccountant) / total_count_FinancialAccountant * 100))

print('Financial Accountant Misclassification Rate : ',  '{0:.2f}'.format((fn_FinancialAccountant + fp_FinancialAccountant) / total_count_FinancialAccountant * 100))

print('Financial Accountant True Positive Rate : ',  '{0:.2f}'.format(tp_FinancialAccountant / (tp_FinancialAccountant + fn_FinancialAccountant) * 100))

print('Financial Accountant False Positive Rate : ',  '{0:.2f}'.format(fp_FinancialAccountant / (tn_FinancialAccountant + fp_FinancialAccountant) * 100))


Financial Accountant Accuracy Rate :  50.84
Financial Accountant Misclassification Rate :  49.16
Financial Accountant True Positive Rate :  80.58
Financial Accountant False Positive Rate :  50.62


In [30]:
# Performance summary for the ROLE: MACHINE LEARNING ENGINEER


try:
    tn_MachineLearning = df_confusion_MachineLearning.iloc[0][False]
except:
    tn_MachineLearning = 0
    
try:
    tp_MachineLearning =  df_confusion_MachineLearning.iloc[1][True]
except:
    tp_MachineLearning = 0

    
try:
    fn_MachineLearning = df_confusion_MachineLearning.iloc[1][False]
except:
    fn_MachineLearning = 0
    
try:
    fp_MachineLearning =  df_confusion_MachineLearning.iloc[0][True]
except:
    fp_MachineLearning = 0  
    
    
total_count_MachineLearning = tn_MachineLearning + tp_MachineLearning + fn_MachineLearning + fp_MachineLearning

print('Machine Learning Engineer Accuracy Rate : ', '{0:.2f}'.format((tn_MachineLearning + tp_MachineLearning) / total_count_MachineLearning * 100))

print('Machine Learning Engineer Misclassification Rate : ',  '{0:.2f}'.format((fn_MachineLearning + fp_MachineLearning) / total_count_MachineLearning * 100))

print('Machine Learning Engineer True Positive Rate : ',  '{0:.2f}'.format(tp_MachineLearning / (tp_MachineLearning + fn_MachineLearning) * 100))

print('Machine Learning Engineer False Positive Rate : ',  '{0:.2f}'.format(fp_MachineLearning / (tn_MachineLearning + fp_MachineLearning) * 100))


Machine Learning Engineer Accuracy Rate :  70.58
Machine Learning Engineer Misclassification Rate :  29.42
Machine Learning Engineer True Positive Rate :  91.67
Machine Learning Engineer False Positive Rate :  29.65


In [31]:
# Performance summary for the ROLE: MUSICIAN


try:
    tn_Musician = df_confusion_Musician.iloc[0][False]
except:
    tn_Musician = 0
    
try:
    tp_Musician =  df_confusion_Musician.iloc[1][True]
except:
    tp_Musician = 0

    
try:
    fn_Musician = df_confusion_Musician.iloc[1][False]
except:
    fn_Musician = 0
    
try:
    fp_Musician =  df_confusion_Musician.iloc[0][True]
except:
    fp_Musician = 0  
    
    
total_count_Musician = tn_Musician + tp_Musician + fn_Musician + fp_Musician

print('Musician Accuracy Rate : ', '{0:.2f}'.format((tn_Musician + tp_Musician) / total_count_Musician * 100))

print('Musician Misclassification Rate : ',  '{0:.2f}'.format((fn_Musician + fp_Musician) / total_count_Musician * 100))

print('Musician True Positive Rate : ',  '{0:.2f}'.format(tp_Musician / (tp_Musician + fn_Musician) * 100))

print('Musician False Positive Rate : ',  '{0:.2f}'.format(fp_Musician / (tn_Musician + fp_Musician) * 100))


Musician Accuracy Rate :  97.65
Musician Misclassification Rate :  2.35
Musician True Positive Rate :  5.41
Musician False Positive Rate :  0.78


In [32]:
# Performance summary for the ROLE: DIETITIAN


try:
    tn_Dietitian = df_confusion_Dietitian.iloc[0][False]
except:
    tn_Dietitian = 0
    
try:
    tp_Dietitian =  df_confusion_Dietitian.iloc[1][True]
except:
    tp_Dietitian = 0

    
try:
    fn_Dietitian = df_confusion_Dietitian.iloc[1][False]
except:
    fn_Dietitian = 0
    
try:
    fp_Dietitian =  df_confusion_Dietitian.iloc[0][True]
except:
    fp_Dietitian = 0  
    
    
total_count_Dietitian = tn_Dietitian + tp_Dietitian + fn_Dietitian + fp_Dietitian

print('Dietitian Accuracy Rate : ', '{0:.2f}'.format((tn_Dietitian + tp_Dietitian) / total_count_Dietitian * 100))

print('Dietitian Misclassification Rate : ',  '{0:.2f}'.format((fn_Dietitian + fp_Dietitian) / total_count_Dietitian * 100))

print('Dietitian True Positive Rate : ',  '{0:.2f}'.format(tp_Dietitian / (tp_Dietitian + fn_Dietitian) * 100))

print('Dietitian False Positive Rate : ',  '{0:.2f}'.format(fp_Dietitian / (tn_Dietitian + fp_Dietitian) * 100))


Dietitian Accuracy Rate :  34.79
Dietitian Misclassification Rate :  65.21
Dietitian True Positive Rate :  84.62
Dietitian False Positive Rate :  65.80


### End of the Notebook. Thank you!