<h1><center>INSIDE OUT - People Analytics for Talent Acquisition </center></h1>

There is no AI without IA. Although 71% of companies see people analytics as a high priority, organisations still struggle: [Deloitte](https://www2.deloitte.com/global/en/pages/human-capital/topics/human-capital-trends.html) reports that only 8% of the organisations surveyed believe they have usable data. 

Inside Out, powered by Watson Personality Insights, will enable companies to capture their interactions with candidates and transform them into data points. 
These metrics will be fed into algorithms that identify the talent best matching the specific company and team.

<img src="https://images.hrtechnologist.com/images/uploads/content_images/peopleanalyticshrnews_5915a0f280b22.jpg" alt="PI" style="width:1000px; height:500px">

<a id="top"></a>
<h3>Table of Contents</h3>

[1. Setup](#setup)<br>
[2. Acquiring the Data](#acquire)<br>
[3. Watson Personality Insights](#wpi)<br>
[4. Data Wrangling and Exploration](#eda)<br>
[5. IBM 1-3-9](#ibm139)<br>
[6. Team Match](#team)<br>
[7. Compare Candidates](#compare)<br>

**NOTE:** 
Reading the transcripts first is recommended; it makes the results easier to interpret.

<a id="setup"></a>
# 1. Setup

In [1]:
# The code was removed by Watson Studio for sharing.

In [2]:
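#!pip install --upgrade "watson-developer-cloud>=2.5.1"
from watson_developer_cloud import PersonalityInsightsV3

import json
import os      # os.listdir() in section 2
import types   # types.MethodType() when reading from Cloud Object Storage
from os.path import join

import pandas as pd
try:
    from pandas import json_normalize            # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize    # older pandas
from functools import reduce

import numpy as np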

[Back to Top](#top)

<a id="acquire"></a>
# 2. Acquiring the Data

## 2.1. Data Files: PDF to TXT

In [3]:
#!pip install pdfminer.six
#import pdfminer
#pdf2txt.py ycombinator-sam-altman.pdf > ycombinator-sam-altman.txt

In [8]:
#!git clone https://github.com/ssuleyma/InsideOut.git
path ='./InsideOut/Transcripts/'
files = os.listdir(path=path)
print(files)

['taskrabbit-stacy-brownphilpot.txt', 'instagram-kevin-systrom.txt', 'huffington-arianna-huffington.txt', 'facebook-mark-zuckerberg.txt', 'spotify-daniel-ek.txt', 'twitter-ev-williams.txt', 'lumilabs-marissa-mayer.txt']


[Back to Top](#top)

<a id="wpi"></a>
# 3. Watson Personality Insights [DO NOT RUN]

<img src="https://www.easydna.ca/wp-content/uploads/2018/03/the_big_five1.png" alt="Big Five" style="width: 400px; height: 400px;"/>

In [9]:
pi_version = "2017-10-13"
pi_api_key = "your apikey"
pi_url = "https://gateway.watsonplatform.net/personality-insights/api"

personality_insights = PersonalityInsightsV3(
    version = pi_version,
    iam_apikey = pi_api_key,
    url = pi_url
)

personality_insights.set_detailed_response(True)
personality_insights.set_default_headers({'x-watson-learning-opt-out': "true"})

In [10]:
profiles = []
for f in files:
    with open(join(path, f)) as profile_txt:
        profile = personality_insights.profile(profile_txt.read(),
                                               content_type='text/plain',
                                               accept='application/json',
                                               consumption_preferences=True,
                                               raw_scores=True).get_result()
    # One row per candidate; nested lists (needs, values, personality, ...) stay as list-valued cells
    row = json_normalize(profile)
    row['Name'] = f
    profiles.append(row)

# DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once
candidates_df = pd.concat(profiles, ignore_index=True, sort=False)

In [11]:
candidates_df.drop(columns=['processed_language','warnings'],inplace=True)

In [12]:
candidates_df['Name'] = candidates_df['Name'].apply(lambda s: s.split("-")[1].title() + " " + s.split("-")[2].split(".")[0].title())
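
The lambda above assumes every file name has exactly three hyphen-separated parts (company, first name, last name), so a fourth part would be silently dropped. A minimal sketch of a more tolerant parser, `name_from_filename` (a hypothetical helper, not part of the notebook), keeps every part after the company prefix:

```python
import os

def name_from_filename(filename):
    """Turn 'company-first-last.txt' into 'First Last'.

    Hypothetical helper: assumes the part before the first hyphen is the company
    and every later hyphen-separated part belongs to the person's name.
    """
    stem, _ = os.path.splitext(os.path.basename(filename))
    _, _, person = stem.partition('-')      # drop the company prefix
    return " ".join(part.title() for part in person.split('-'))

print(name_from_filename('taskrabbit-stacy-brownphilpot.txt'))  # Stacy Brownphilpot
print(name_from_filename('acme-mary-jane-doe.txt'))             # Mary Jane Doe
```

It could replace the lambda with `candidates_df['Name'] = candidates_df['Name'].apply(name_from_filename)`; for the seven transcripts used here both give the same names.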

In [13]:
candidates_df.head()

| | consumption_preferences | needs | personality | values | word_count | Name |
|---|---|---|---|---|---|---|
| 0 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 1828 | Stacy Brownphilpot |
| 1 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2732 | Kevin Systrom |
| 2 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2167 | Arianna Huffington |
| 3 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2517 | Mark Zuckerberg |
| 4 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 1854 | Daniel Ek |


In [14]:
project.save_data('candidates_df.csv',candidates_df.to_csv(index=False),overwrite=True)

{'asset_id': '36131aed-e240-4df3-a8e8-5d9963aafb07',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'candidates_df.csv',
 'message': 'File candidates_df.csv has been written successfully to the associated OS'}

[Back to Top](#top)

<a id ="eda"></a>
# 4. Data Wrangling and Exploration

## 4.1. Data Wrangling

In [15]:
# The code was removed by Watson Studio for sharing.

| | consumption_preferences | needs | personality | values | word_count | Name |
|---|---|---|---|---|---|---|
| 0 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 1828 | Stacy Brownphilpot |
| 1 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2732 | Kevin Systrom |
| 2 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2167 | Arianna Huffington |
| 3 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 2517 | Mark Zuckerberg |
| 4 | [{'consumption_preference_category_id': 'consu... | [{'category': 'needs', 'name': 'Challenge', 's... | [{'category': 'personality', 'children': [{'ca... | [{'category': 'values', 'name': 'Conservation'... | 1854 | Daniel Ek |


In [16]:
import ast

def _parse(cell):
    """
    Returns the list of dicts stored in a profile cell. After the CSV round-trip in 4.1 each
    cell holds the Python repr of a list, so it is parsed with ast.literal_eval (safer than eval).
    """
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

def insights_to_df(insight, candidates):
    """
    Returns a dataframe of needs, values or personality raw scores.
    
    insight - "needs", "values", "personality"
    candidates - dataframe of candidates
    """
    rows = []
    for r in candidates.index:
        # Long format (one row per trait) -> one wide row per candidate
        d = pd.DataFrame(_parse(candidates.loc[r, insight]))[['name', 'raw_score']]
        d = d.set_index('name').T.reset_index(drop=True)
        d['Name'] = candidates.loc[r, 'Name']
        rows.append(d)
    n_v_df = pd.concat(rows, ignore_index=True, sort=False)
    n_v_df.columns.name = ""
    return n_v_df

def consumption(candidates):
    """
    Returns a dataframe of positive consumption preferences of candidates.
    
    candidates - dataframe of candidates
    """
    rows = []
    for r in candidates.index:
        d = pd.DataFrame(_parse(candidates.loc[r, 'consumption_preferences'])).drop(columns=['consumption_preference_category_id'])
        # Keep preferences scored >= 0.5 and strip the 24-character 'consumption_preferences_' prefix
        d['consumption_preferences'] = d['consumption_preferences'].apply(
            lambda l: [p['consumption_preference_id'][24:] for p in l if p['score'] >= 0.5])
        d = d[~d['name'].isin(['Music Preferences', 'Purchasing Preferences'])]
        d = d.set_index('name').T.reset_index(drop=True)
        d['Name'] = candidates.loc[r, 'Name']
        rows.append(d)
    c_df = pd.concat(rows, ignore_index=True, sort=False)
    c_df.columns.name = ""
    return c_df

def personality_details(candidates):
    """
    Returns dataframes of facet raw scores for each of the Big 5 traits, in the order
    Openness, Conscientiousness, Extraversion, Agreeableness, Emotional range.
    
    candidates - dataframe of candidates
    """
    trait_names = ['Openness', 'Conscientiousness', 'Extraversion', 'Agreeableness', 'Emotional range']
    rows = {t: [] for t in trait_names}
    
    for r in candidates.index:
        # Read from the `candidates` argument rather than the global candidates_df
        for trait in _parse(candidates.loc[r, 'personality']):
            if trait['name'] not in rows:
                continue
            d = pd.DataFrame(trait['children'])[['name', 'raw_score']]
            d = d.set_index('name').T.reset_index(drop=True)
            d['Name'] = candidates.loc[r, 'Name']
            rows[trait['name']].append(d)
    
    result = []
    for t in trait_names:
        t_df = pd.concat(rows[t], ignore_index=True, sort=False)
        t_df.columns.name = ""
        result.append(t_df)
    return tuple(result)
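
To make the reshaping in `insights_to_df()` concrete, here is a toy illustration. The `needs_cell` string is a hypothetical, heavily shortened stand-in for one cell of the reloaded `needs` column (only the keys the notebook reads are kept). It shows the long-to-wide step: one row per need becomes one column per need.

```python
import ast
import pandas as pd

# Hypothetical, shortened stand-in for one 'needs' cell after the CSV round-trip
needs_cell = ("[{'category': 'needs', 'name': 'Challenge', 'raw_score': 0.73}, "
              "{'category': 'needs', 'name': 'Curiosity', 'raw_score': 0.83}]")

records = ast.literal_eval(needs_cell)                # parse the repr without eval()
d = pd.DataFrame(records)[['name', 'raw_score']]      # long format: one row per need
wide = d.set_index('name').T.reset_index(drop=True)   # wide format: one column per need
wide.columns.name = ""
wide['Name'] = 'Example Candidate'                    # hypothetical candidate name
print(wide)
```

Stacking one such row per candidate is what produces tables like `needs_df` below.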

In [17]:
needs_df = insights_to_df('needs',candidates_df)
values_df = insights_to_df('values',candidates_df)
consumption_df = consumption(candidates_df)
all_personality_df = insights_to_df('personality',candidates_df)
openness_df, conscientiousness_df, extraversion_df, agreeableness_df, emotional_range_df = personality_details(candidates_df)

## 4.2. Exploratory Data Analysis

In [18]:
needs_df

| | Challenge | Closeness | Curiosity | Excitement | Harmony | Ideal | Liberty | Love | Practicality | Self-expression | Stability | Structure | Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.732572 | 0.749537 | 0.833941 | 0.593045 | 0.771311 | 0.675777 | 0.711317 | 0.712801 | 0.699369 | 0.624175 | 0.721488 | 0.725894 | Stacy Brownphilpot |
| 1 | 0.687388 | 0.737678 | 0.827021 | 0.553379 | 0.758527 | 0.643312 | 0.694281 | 0.691013 | 0.697178 | 0.646145 | 0.707389 | 0.694753 | Kevin Systrom |
| 2 | 0.72721 | 0.766896 | 0.845312 | 0.61849 | 0.79915 | 0.667004 | 0.718663 | 0.745111 | 0.713351 | 0.651702 | 0.728604 | 0.714364 | Arianna Huffington |
| 3 | 0.729555 | 0.705474 | 0.834953 | 0.575228 | 0.74046 | 0.643764 | 0.698083 | 0.672824 | 0.70058 | 0.613063 | 0.69507 | 0.714984 | Mark Zuckerberg |
| 4 | 0.705916 | 0.734802 | 0.83349 | 0.580611 | 0.765428 | 0.656928 | 0.703968 | 0.696026 | 0.705324 | 0.623352 | 0.701928 | 0.709038 | Daniel Ek |
| 5 | 0.709902 | 0.723402 | 0.843319 | 0.544 | 0.748655 | 0.668819 | 0.695866 | 0.696311 | 0.699126 | 0.62731 | 0.690413 | 0.701614 | Ev Williams |
| 6 | 0.70768 | 0.726441 | 0.82906 | 0.54328 | 0.754978 | 0.649147 | 0.693459 | 0.679617 | 0.700173 | 0.625493 | 0.694059 | 0.708965 | Marissa Mayer |


In [19]:
values_df

| | Conservation | Openness to change | Hedonism | Self-enhancement | Self-transcendence | Name |
|---|---|---|---|---|---|---|
| 0 | 0.631154 | 0.777917 | 0.631708 | 0.639579 | 0.834642 | Stacy Brownphilpot |
| 1 | 0.575899 | 0.76661 | 0.633714 | 0.652945 | 0.829168 | Kevin Systrom |
| 2 | 0.636515 | 0.78405 | 0.651688 | 0.661706 | 0.839432 | Arianna Huffington |
| 3 | 0.59866 | 0.782782 | 0.609476 | 0.656638 | 0.829649 | Mark Zuckerberg |
| 4 | 0.586833 | 0.771245 | 0.639145 | 0.661032 | 0.830876 | Daniel Ek |
| 5 | 0.556791 | 0.775472 | 0.623224 | 0.655128 | 0.826246 | Ev Williams |
| 6 | 0.583031 | 0.78855 | 0.632841 | 0.657561 | 0.827496 | Marissa Mayer |


In [20]:
consumption_df

| | Health & Activity Preferences | Environmental Concern Preferences | Entrepreneurship Preferences | Movie Preferences | Reading Preferences | Volunteering Preferences | Name |
|---|---|---|---|---|---|---|---|
| 0 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_historical, movie_scie... | [read_frequency, books_non_fiction] | [volunteer] | Stacy Brownphilpot |
| 1 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_musical, movie_histori... | [read_frequency, books_non_fiction, books_auto... | [volunteer] | Kevin Systrom |
| 2 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_science_fiction, movie... | [read_frequency, books_non_fiction] | [volunteer] | Arianna Huffington |
| 3 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_historical, movie_scie... | [read_frequency, books_non_fiction, books_fina... | [volunteer] | Mark Zuckerberg |
| 4 | [outdoor] | [concerned_environment] | [] | [movie_adventure, movie_musical, movie_histori... | [read_frequency, books_non_fiction, books_auto... | [volunteer] | Daniel Ek |
| 5 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_historical, movie_scie... | [read_frequency, books_non_fiction, books_fina... | [volunteer] | Ev Williams |
| 6 | [outdoor] | [concerned_environment] | [start_business] | [movie_adventure, movie_musical, movie_histori... | [read_frequency, books_non_fiction, books_auto... | [volunteer] | Marissa Mayer |


In [21]:
all_personality_df

| | Openness | Conscientiousness | Extraversion | Agreeableness | Emotional range | Name |
|---|---|---|---|---|---|---|
| 0 | 0.772982 | 0.64935 | 0.502878 | 0.738743 | 0.497555 | Stacy Brownphilpot |
| 1 | 0.800396 | 0.627016 | 0.523315 | 0.721635 | 0.497048 | Kevin Systrom |
| 2 | 0.787819 | 0.664589 | 0.54084 | 0.735884 | 0.481716 | Arianna Huffington |
| 3 | 0.795236 | 0.660267 | 0.5125 | 0.69728 | 0.497587 | Mark Zuckerberg |
| 4 | 0.791975 | 0.574534 | 0.455835 | 0.704284 | 0.473716 | Daniel Ek |
| 5 | 0.818073 | 0.633957 | 0.528237 | 0.721584 | 0.506147 | Ev Williams |
| 6 | 0.805419 | 0.644014 | 0.52295 | 0.721594 | 0.486634 | Marissa Mayer |


In [22]:
openness_df

| | Adventurousness | Artistic interests | Emotionality | Imagination | Intellect | Authority-challenging | Name |
|---|---|---|---|---|---|---|---|
| 0 | 0.549272 | 0.677823 | 0.663399 | 0.684591 | 0.711465 | 0.58335 | Stacy Brownphilpot |
| 1 | 0.519717 | 0.72792 | 0.66937 | 0.709328 | 0.712264 | 0.628538 | Kevin Systrom |
| 2 | 0.542648 | 0.71261 | 0.692105 | 0.747324 | 0.727052 | 0.60127 | Arianna Huffington |
| 3 | 0.541532 | 0.69145 | 0.643814 | 0.691068 | 0.722693 | 0.623718 | Mark Zuckerberg |
| 4 | 0.529975 | 0.716827 | 0.678005 | 0.730441 | 0.736404 | 0.647658 | Daniel Ek |
| 5 | 0.536035 | 0.733576 | 0.670893 | 0.746914 | 0.741595 | 0.667499 | Ev Williams |
| 6 | 0.547431 | 0.701889 | 0.66169 | 0.712637 | 0.723099 | 0.634171 | Marissa Mayer |


In [23]:
conscientiousness_df

| | Achievement striving | Cautiousness | Dutifulness | Orderliness | Self-discipline | Self-efficacy | Name |
|---|---|---|---|---|---|---|---|
| 0 | 0.739307 | 0.608857 | 0.684512 | 0.454481 | 0.581252 | 0.762035 | Stacy Brownphilpot |
| 1 | 0.675763 | 0.615348 | 0.680129 | 0.440515 | 0.530816 | 0.697531 | Kevin Systrom |
| 2 | 0.750372 | 0.528937 | 0.675265 | 0.463056 | 0.580965 | 0.796325 | Arianna Huffington |
| 3 | 0.728193 | 0.613682 | 0.681686 | 0.44999 | 0.574116 | 0.761576 | Mark Zuckerberg |
| 4 | 0.686923 | 0.588714 | 0.682981 | 0.429911 | 0.521362 | 0.723693 | Daniel Ek |
| 5 | 0.699108 | 0.590725 | 0.683971 | 0.436252 | 0.530293 | 0.7222 | Ev Williams |
| 6 | 0.718306 | 0.616908 | 0.686392 | 0.452915 | 0.567074 | 0.750701 | Marissa Mayer |


In [24]:
extraversion_df

| | Activity level | Assertiveness | Cheerfulness | Excitement-seeking | Outgoing | Gregariousness | Name |
|---|---|---|---|---|---|---|---|
| 0 | 0.607066 | 0.715528 | 0.605097 | 0.545532 | 0.554002 | 0.419505 | Stacy Brownphilpot |
| 1 | 0.534139 | 0.633161 | 0.583899 | 0.517096 | 0.494154 | 0.375204 | Kevin Systrom |
| 2 | 0.617853 | 0.730923 | 0.610429 | 0.595363 | 0.579744 | 0.422092 | Arianna Huffington |
| 3 | 0.616067 | 0.702512 | 0.587455 | 0.548125 | 0.557755 | 0.428105 | Mark Zuckerberg |
| 4 | 0.538444 | 0.631072 | 0.571256 | 0.544202 | 0.492812 | 0.362288 | Daniel Ek |
| 5 | 0.570366 | 0.646632 | 0.58042 | 0.535335 | 0.514934 | 0.370557 | Ev Williams |
| 6 | 0.582477 | 0.683506 | 0.5846 | 0.536119 | 0.522487 | 0.388183 | Marissa Mayer |


In [25]:
agreeableness_df

| | Altruism | Cooperation | Modesty | Uncompromising | Sympathy | Trust | Name |
|---|---|---|---|---|---|---|---|
| 0 | 0.756708 | 0.639379 | 0.48767 | 0.716013 | 0.751417 | 0.603901 | Stacy Brownphilpot |
| 1 | 0.732964 | 0.65564 | 0.485216 | 0.697863 | 0.711541 | 0.598488 | Kevin Systrom |
| 2 | 0.782432 | 0.61658 | 0.44081 | 0.685819 | 0.735926 | 0.620319 | Arianna Huffington |
| 3 | 0.743622 | 0.646228 | 0.450045 | 0.674064 | 0.719749 | 0.639157 | Mark Zuckerberg |
| 4 | 0.744824 | 0.665624 | 0.48294 | 0.705629 | 0.73868 | 0.613193 | Daniel Ek |
| 5 | 0.752197 | 0.669407 | 0.45443 | 0.683956 | 0.753108 | 0.631402 | Ev Williams |
| 6 | 0.741977 | 0.657229 | 0.446347 | 0.708853 | 0.737199 | 0.621807 | Marissa Mayer |


In [26]:
emotional_range_df

| | Fiery | Prone to worry | Melancholy | Immoderation | Self-consciousness | Susceptible to stress | Name |
|---|---|---|---|---|---|---|---|
| 0 | 0.401697 | 0.494946 | 0.42986 | 0.500217 | 0.532594 | 0.408512 | Stacy Brownphilpot |
| 1 | 0.437068 | 0.532793 | 0.481805 | 0.518504 | 0.576679 | 0.476039 | Kevin Systrom |
| 2 | 0.45662 | 0.556489 | 0.444668 | 0.485235 | 0.532718 | 0.425963 | Arianna Huffington |
| 3 | 0.410326 | 0.532186 | 0.441051 | 0.498628 | 0.552278 | 0.429566 | Mark Zuckerberg |
| 4 | 0.422803 | 0.555647 | 0.493483 | 0.525057 | 0.608136 | 0.469477 | Daniel Ek |
| 5 | 0.444969 | 0.567656 | 0.477213 | 0.497748 | 0.595427 | 0.477262 | Ev Williams |
| 6 | 0.427564 | 0.509595 | 0.452251 | 0.51205 | 0.558704 | 0.415672 | Marissa Mayer |


[Back to Top](#top)

<a id="ibm139"></a>
# 5. IBM 1-3-9

IBM's 1-3-9 practices are used as the basis for determining qualified matches:

<img src="https://upload.wikimedia.org/wikipedia/commons/1/1d/IBM_values.png" alt="IBM 139" style="width: 700px; height: 300px;"/>

- *Dedication to every client's success* maps to the traits Curiosity, Practicality and Self-transcendence.
- *Innovation that matters, for our company and for the world* maps to the traits Self-enhancement, Imagination and Intellect.
- *Trust and personal responsibility in all relationships* maps to the traits Trust, Dutifulness and Cooperation.
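
The three value-to-trait mappings above can be kept in a single structure, so that the list scored by `company_match()` is derived from it rather than typed out separately. A sketch; `ibm_values_to_traits` is a hypothetical name. Note that the nine traits come from different tables in section 4: Curiosity and Practicality are needs, Self-transcendence and Self-enhancement are values, Imagination and Intellect are Openness facets, Dutifulness is a Conscientiousness facet, and Trust and Cooperation are Agreeableness facets.

```python
# Hypothetical mapping of each IBM value to the three traits listed above
ibm_values_to_traits = {
    "Dedication to every client's success": ['Curiosity', 'Practicality', 'Self-transcendence'],
    "Innovation that matters, for our company and for the world": ['Self-enhancement', 'Imagination', 'Intellect'],
    "Trust and personal responsibility in all relationships": ['Trust', 'Dutifulness', 'Cooperation'],
}

# Flatten in insertion order to get the nine columns company_match() scores
company_skills = [trait for traits in ibm_values_to_traits.values() for trait in traits]
print(company_skills)
```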

In [27]:
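def company_match(candidates):
    """
    Calculates how well one candidate matches the company values.
    
    candidates - one row of candidate scores (the function is applied with axis=1);
                 it must contain every trait in company_skills
    
    Returns a Euclidean-distance-based similarity, 1 / (1 + distance), between the
    candidate's nine trait scores and a target score of 0.85 for every trait.
    """
    
    company_skills = ['Curiosity', 'Practicality', 'Self-transcendence',   # Dedication to every client's success
                      'Self-enhancement', 'Imagination', 'Intellect',       # Innovation that matters
                      'Trust', 'Dutifulness', 'Cooperation']                # Trust and personal responsibility

    target = np.array([0.85]*len(company_skills))  # target raw score for every trait
    dist = np.linalg.norm(np.array(candidates[company_skills], dtype=float) - target)
    similarity = 1/(1+dist)
    
    return similarity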
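
A worked example of the similarity used by `company_match()`: the Euclidean distance d between the candidate's nine scores and the constant target vector (0.85 for each trait) is mapped to 1 / (1 + d), so a perfect match scores 1 and the score falls towards 0 as the distance grows. `similarity_to_target` is a hypothetical standalone helper that mirrors the formula.

```python
import numpy as np

def similarity_to_target(scores, target=0.85):
    """Hypothetical helper: 1 / (1 + Euclidean distance) to a constant target vector."""
    scores = np.asarray(scores, dtype=float)
    dist = np.linalg.norm(scores - target)
    return 1 / (1 + dist)

print(similarity_to_target([0.85] * 9))  # 1.0: every trait exactly on target
print(similarity_to_target([0.75] * 9))  # d = sqrt(9 * 0.1**2) = 0.3, so 1 / 1.3
```

Because the distance is symmetric, a trait scored 0.10 above the target lowers the similarity as much as one scored 0.10 below it. In the tables in section 4, every one of the nine traits scores below 0.85 for all candidates, so in practice the target acts as a ceiling.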

Based on candidates' interests, hiring managers can recommend IBM facilities and activities that candidates may enjoy:

In [28]:
def recommend_activities(activity):
    """
    Returns a sentence listing the IBM facilities and activities that match a
    candidate's positive consumption preferences.
    
    activity - one row of consumption_df (the function is applied with axis=1)
    """
    recommendation = "Candidate would like to know about {}."
    lst = []
    if 'gym_membership' in activity['Health & Activity Preferences']:
        lst.append('on-site gym')
    if 'eat_out' in activity['Health & Activity Preferences']:
        lst.append('cafeteria')
    if 'read_frequency' in activity['Reading Preferences']:
        lst.append('on-site library')
    if 'start_business' in activity['Entrepreneurship Preferences']:
        lst.append('Area 631')
    if 'volunteer' in activity['Volunteering Preferences']:
        lst.append('ECF')
    if len(activity['Movie Preferences']) >= 1:   # any positive movie preference
        lst.append('Amphitheatre movie days')
        
    return recommendation.format(", ".join(lst))
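
The `if` chain in `recommend_activities()` can also be written as a list of rules, which makes adding a facility a one-line change and lets the function handle a candidate with no matching preference (the version above would return "Candidate would like to know about ."). A sketch; `ACTIVITY_RULES`, `recommend_activities_v2` and the fallback sentence are hypothetical:

```python
# Hypothetical rules: (consumption-preference category, preference id, facility to mention)
ACTIVITY_RULES = [
    ('Health & Activity Preferences', 'gym_membership', 'on-site gym'),
    ('Health & Activity Preferences', 'eat_out', 'cafeteria'),
    ('Reading Preferences', 'read_frequency', 'on-site library'),
    ('Entrepreneurship Preferences', 'start_business', 'Area 631'),
    ('Volunteering Preferences', 'volunteer', 'ECF'),
]

def recommend_activities_v2(row):
    """Hypothetical rule-driven variant; `row` may be a pandas row or a plain dict."""
    facilities = [facility for category, pref, facility in ACTIVITY_RULES
                  if pref in row.get(category, [])]
    if row.get('Movie Preferences'):      # any positive movie preference
        facilities.append('Amphitheatre movie days')
    if not facilities:                    # hypothetical fallback sentence
        return "No activity recommendations for this candidate."
    return "Candidate would like to know about {}.".format(", ".join(facilities))
```

It would be applied the same way as the original, e.g. `consumption_df.apply(recommend_activities_v2, axis=1)`, and lists facilities in the same order.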

In [29]:
consumption_df['Recommend activities'] = consumption_df.apply(recommend_activities,axis = 1)

In [30]:
consumption_df[['Name','Recommend activities']]

| | Name | Recommend activities |
|---|---|---|
| 0 | Stacy Brownphilpot | Candidate would like to know about on-site lib... |
| 1 | Kevin Systrom | Candidate would like to know about on-site lib... |
| 2 | Arianna Huffington | Candidate would like to know about on-site lib... |
| 3 | Mark Zuckerberg | Candidate would like to know about on-site lib... |
| 4 | Daniel Ek | Candidate would like to know about on-site lib... |
| 5 | Ev Williams | Candidate would like to know about on-site lib... |
| 6 | Marissa Mayer | Candidate would like to know about on-site lib... |


[Back to Top](#top)

<a id="team"></a>
# 6. Team Match

The skills below were chosen based on an analysis of job descriptions.

***CHOOSE 5 ESSENTIAL SKILLS FOR YOUR TEAM:***
* Strong work values - Dependability, honesty, self-confidence and a positive attitude.
* Perseverance and motivation
* Enthusiasm
* Communication 
* Negotiation and persuasion
* Teamwork
* Leadership skills
* Confidence
* Self-management - Multitasking, adapting to changes, and flexibility.
* Detail oriented
* Ability to work under pressure
* Willingness to learn 
* Thinking skills - Problem solving, analytical skills, and decision making.
* Creativity

In [31]:
body = client_0f8d8b468baa4efa9c29025ebc262fe5.get_object(Bucket='insideout-donotdelete-pr-arlsgjghrogjm7',Key='Describe_Team.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

describe_team = pd.read_csv(body)
describe_team.head()

| | Essential Skills | Extraversion | Conscientiousness | Agreeableness | Openness | Emotional range | Achievement striving | Cautiousness | Dutifulness | Orderliness | ... | Modesty | Activity level | Assertiveness | Cheerfulness | Gregariousness | Susceptible to stress | Self-consciousness | Intellect | Imagination | Authority-challenging |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Strong work values |  | High | High |  | Low |  |  | High |  | ... |  |  |  | High |  |  | Low |  |  |  |
| 1 | Perseverance and motivation | High | High | High |  | Low | High |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | Enthusiasm | High | High | High |  |  |  |  |  |  | ... |  |  |  | High |  |  |  |  |  |  |
| 3 | Communication | High |  | High |  | High |  |  |  |  | ... |  |  |  |  | High |  |  |  |  |  |
| 4 | Negotiation and persuasion | High | High | High |  |  | High |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
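
Each Cloud Object Storage read in this notebook repeats the same lines and patches `__iter__` onto the response body so that `pandas.read_csv` accepts it. A sketch of an alternative helper, `read_csv_from_cos` (hypothetical), that reads the body into memory instead. It assumes the client behaves like the `ibm_boto3` client used above: `get_object(Bucket=..., Key=...)` returns a dict whose `'Body'` has a `.read()` method returning bytes.

```python
import io
import pandas as pd

def read_csv_from_cos(client, bucket, key):
    """Hypothetical helper: read a CSV object from Cloud Object Storage into a DataFrame.

    Assumes client.get_object(...)['Body'].read() returns the object's bytes.
    Buffering the bytes avoids patching __iter__ onto the streaming body.
    """
    body = client.get_object(Bucket=bucket, Key=key)['Body']
    return pd.read_csv(io.BytesIO(body.read()))
```

With the client from the removed setup cell, the call would look like `read_csv_from_cos(client_0f8d8b468baa4efa9c29025ebc262fe5, 'insideout-donotdelete-pr-arlsgjghrogjm7', 'Describe_Team.csv')`. Loading the whole object into memory is fine for small lookup tables like these.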


In [32]:
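def team_match(skills):
    """
    Scores every candidate against the chosen team skills.
    
    skills - list of skill names from the 'Essential Skills' column of describe_team
    
    For each skill, describe_team marks the relevant Big Five traits and facets as 'High'
    or 'Low'. A 'High' trait contributes its raw score x and a 'Low' trait contributes
    1 - x; the skill score is the mean over the marked traits, and 'Team match' is the
    mean of the skill scores.
    """
    description = describe_team.copy()
    description.replace({'High':1,'Low':-1},inplace=True)
    
    # One row per candidate with all Big Five traits and their facets
    dfs = [all_personality_df, openness_df, conscientiousness_df,extraversion_df,agreeableness_df,emotional_range_df]
    df = reduce(lambda left,right: pd.merge(left,right,on='Name'), dfs)
    
    l = ['Name']
    
    for skill in skills:
        skill_df = description.loc[description['Essential Skills'] == skill]
        # Keep only the traits marked for this skill
        skill_df = skill_df[[col for col in skill_df.columns if skill_df[col].notnull().values]]
        lows = skill_df[skill_df == -1].count().sum()
        traits = list(skill_df.iloc[:,1:].columns)
        
        # mean(x * w) + lows/len(traits) equals the mean of x ('High') and 1 - x ('Low')
        df[skill+' scores'] = ((df[traits]*skill_df[traits].values[0]).apply(np.mean,axis=1) + lows/len(traits))
        l.append(skill + ' scores')
        
    match_result = df.copy()[l]
    match_result['Team match'] = match_result[l[1:]].apply(np.mean,axis=1)
                                                                    
    return match_result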
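
The `+ lows/len(traits)` term in `team_match()` is what turns a 'Low' requirement into a reward for low scores. With weights w = +1 for 'High' and -1 for 'Low' over n traits, mean(x * w) + lows / n = (sum of x over 'High' traits + sum of (1 - x) over 'Low' traits) / n. A quick numerical check with hypothetical scores and weights:

```python
import numpy as np

# Hypothetical raw scores for three traits and the weights for one skill
x = np.array([0.70, 0.40, 0.55])
w = np.array([1, -1, 1])          # +1 = 'High', -1 = 'Low'

lows = np.sum(w == -1)
notebook_score = np.mean(x * w) + lows / len(w)        # formula used in team_match()
direct_score = np.mean(np.where(w == 1, x, 1 - x))     # 'High' rewards x, 'Low' rewards 1 - x

print(notebook_score, direct_score)
```

Both expressions give (0.70 + 0.60 + 0.55) / 3, so every skill score stays on the same 0 to 1 scale as the raw scores.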

In [33]:
# Example
skill_list = ['Teamwork','Ability to work under pressure','Thinking skills','Enthusiasm','Communication']

In [34]:
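def match(skills):
    """
    Combines the company match and the team match for every candidate.
    
    skills - list of essential team skills (names from describe_team)
    
    Returns a dataframe sorted by 'Overall match', the mean of 'Company match' and
    'Team match', with the individual skill scores alongside.
    """
    # The company traits are spread over the needs, values and facet dataframes
    c_dfs = [needs_df, values_df, openness_df, conscientiousness_df, extraversion_df, agreeableness_df, emotional_range_df]
    company_match_final = reduce(lambda left,right: pd.merge(left,right,on='Name'), c_dfs)
    
    company_match_final['Company match'] = company_match_final.apply(company_match,axis=1)
    company_match_final = company_match_final[['Name','Company match']]
    
    team_match_final = team_match(skills)
    
    match_final = pd.merge(company_match_final,team_match_final,on='Name')
    match_final['Overall match'] = (match_final['Company match'] + match_final['Team match'])/2
    match_final.sort_values(by = 'Overall match',ascending=False,inplace=True)
    
    # Column order: Name, Overall match, Company match, Team match, then one column per skill
    cols = match_final.columns.tolist()
    cols_ordered = cols[0:1] + cols[-1:] + cols[1:2] + cols[-2:-1] + cols[2:-2]
    
    return match_final[cols_ordered]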

In [35]:
results = match(skills = skill_list)
results = pd.merge(results,consumption_df[['Name','Recommend activities']],how='inner',on='Name')

In [36]:
results

| | Name | Overall match | Company match | Team match | Teamwork scores | Ability to work under pressure scores | Thinking skills scores | Enthusiasm scores | Communication scores | Recommend activities |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arianna Huffington | 0.647058 | 0.681862 | 0.612253 | 0.672351 | 0.557762 | 0.648086 | 0.637936 | 0.545133 | Candidate would like to know about on-site lib... |
| 1 | Ev Williams | 0.646184 | 0.695495 | 0.596873 | 0.674983 | 0.515628 | 0.646075 | 0.61605 | 0.531631 | Candidate would like to know about on-site lib... |
| 2 | Marissa Mayer | 0.646089 | 0.685034 | 0.607144 | 0.674279 | 0.554923 | 0.65839 | 0.61829 | 0.52984 | Candidate would like to know about on-site lib... |
| 3 | Mark Zuckerberg | 0.643008 | 0.682641 | 0.603374 | 0.667925 | 0.548987 | 0.651715 | 0.614375 | 0.533868 | Candidate would like to know about on-site lib... |
| 4 | Stacy Brownphilpot | 0.639307 | 0.66831 | 0.610305 | 0.675824 | 0.558395 | 0.653619 | 0.624017 | 0.53967 | Candidate would like to know about on-site lib... |
| 5 | Kevin Systrom | 0.634904 | 0.675013 | 0.594795 | 0.668097 | 0.519243 | 0.643367 | 0.613966 | 0.529301 | Candidate would like to know about on-site lib... |
| 6 | Daniel Ek | 0.633424 | 0.689685 | 0.577163 | 0.648147 | 0.526056 | 0.636103 | 0.576477 | 0.499031 | Candidate would like to know about on-site lib... |
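
As a check on the results table, Arianna Huffington's 'Team match' is the mean of her five skill scores, and her 'Overall match' is the mean of 'Company match' and 'Team match'. Recomputing from the displayed values, which are rounded to six decimals:

```python
# Values copied from Arianna Huffington's row of the results table
skill_scores = [0.672351, 0.557762, 0.648086, 0.637936, 0.545133]
company_match_score = 0.681862

team_match_score = sum(skill_scores) / len(skill_scores)            # Team match = mean of skill scores
overall_match_score = (company_match_score + team_match_score) / 2  # Overall = mean of the two matches

print(team_match_score, overall_match_score)
```

Both results agree with the table to within the rounding of the displayed inputs.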


In [37]:
project.save_data('cognos_one.csv',results.to_csv(index=False),overwrite=True)

{'asset_id': '0a6ede86-3755-4516-b575-a78494fa8e1d',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'cognos_one.csv',
 'message': 'File cognos_one.csv has been written successfully to the associated OS'}

In [38]:
project.save_data('match_results.json',results.to_json(orient='records'),overwrite=True)

{'asset_id': '7930b209-08d2-49aa-96f1-a9d7a01254e2',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'match_results.json',
 'message': 'File match_results.json has been written successfully to the associated OS'}

[Back to Top](#top)

<a id="compare"></a>
# 7. Compare Candidates

In [39]:
body = client_0f8d8b468baa4efa9c29025ebc262fe5.get_object(Bucket='insideout-donotdelete-pr-arlsgjghrogjm7',Key='Describe_Needs.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

describe_needs = pd.read_csv(body)
describe_needs.head()

| | Need | Description |
|---|---|---|
| 0 | Challenge | have an urge to achieve, to succeed, and to ta... |
| 1 | Closeness | relish being connected to family and setting u... |
| 2 | Curiosity | have a desire to discover, find out, and grow |
| 3 | Excitement | want to get out there and live life, have upbe... |
| 4 | Harmony | appreciate other people, their viewpoints and ... |


In [40]:
body = client_0f8d8b468baa4efa9c29025ebc262fe5.get_object(Bucket='insideout-donotdelete-pr-arlsgjghrogjm7',Key='Describe_Values.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )
describe_values = pd.read_csv(body)
describe_values.head()

| | Value | Description |
|---|---|---|
| 0 | Conservation | emphasize self-restriction, order, and resista... |
| 1 | Openness to change | emphasize independent action, thought, and fee... |
| 2 | Hedonism | seek pleasure and sensuous gratification |
| 3 | Self-enhancement | seek personal success |
| 4 | Self-transcendence | show concern for the welfare and interests of ... |


In [41]:
def compare_needs_and_values(candidate_one,candidate_two,trait):
    """
    Compares two candidates on either needs or values.
    
    Returns a tuple (summary, compare_n_v):
    summary - dictionary with the overall similarity score, the two most similar and the two
              most different traits with both candidates' scores, and the generated texts
    compare_n_v - dataframe with the same scores and texts, one row per candidate
    
    candidate_one - name of first candidate 
    candidate_two - name of second candidate
    trait - 'Need' or 'Value'
    """
    
    if trait == 'Need':
        df = needs_df.copy()
        description = describe_needs.copy()
    elif trait == 'Value':
        df = values_df.copy()
        description = describe_values.copy()
    else:
        raise ValueError("trait must be 'Need' or 'Value'")
        
    skills_one = df.loc[df['Name'] == candidate_one,df.columns != 'Name']
    skills_two = df.loc[df['Name'] == candidate_two,df.columns != 'Name']
    
    # Euclidean distance between the two score vectors, mapped to a similarity in (0, 1]
    dist = np.linalg.norm(np.array(skills_one) - np.array(skills_two))
    similarity = 1/(1+dist)
    
    scores_one = skills_one.T.squeeze()
    scores_two = skills_two.T.squeeze()
    scores = scores_one - scores_two
    scores_text = scores.apply(lambda n: 'more' if n > 0 else 'less')
    skills_text = scores_one.apply(lambda n: 'likely' if n >= 0.5 else 'less likely')
    desc = description.set_index(trait)['Description']
    
    # Two smallest and two largest absolute gaps. Each pair is kept in the description file's
    # row order and every text is looked up by trait name, so descriptions and scores line up.
    gaps = np.abs(scores)
    closest = set(gaps.sort_values().head(2).index)
    furthest = set(gaps.sort_values(ascending=False).head(2).index)
    similarities = [t for t in desc.index if t in closest]
    differences = [t for t in desc.index if t in furthest]
    
    similarities_text = "Both are {} to {}. Also they are {} to {}.".format(
        skills_text[similarities[0]], desc[similarities[0]],
        skills_text[similarities[1]], desc[similarities[1]])
    differences_text = "{} is {} likely to {}. Also he/she is {} likely than {} to {}.".format(
        candidate_one, scores_text[differences[0]], desc[differences[0]],
        scores_text[differences[1]], candidate_two, desc[differences[1]])
    
    candidate_one_similarity_scores = scores_one[similarities].to_dict()
    candidate_two_similarity_scores = scores_two[similarities].to_dict()
    candidate_one_difference_scores = scores_one[differences].to_dict()
    candidate_two_difference_scores = scores_two[differences].to_dict()
    
    # One column per highlighted trait holding [candidate_one score, candidate_two score]
    new_similarities = {key: [value, candidate_two_similarity_scores[key]] for key, value in candidate_one_similarity_scores.items()}
    new_differences = {key: [value, candidate_two_difference_scores[key]] for key, value in candidate_one_difference_scores.items()}
    compare_data = {'Name':[candidate_one,candidate_two],'Text {}'.format(trait): [similarities_text,differences_text],**new_similarities,**new_differences}
    
    compare_n_v = pd.DataFrame(compare_data)
    
    return {'similarity_score_{}'.format(trait): similarity, 
            'candidate_one_similarities_{}'.format(trait):candidate_one_similarity_scores,
            'candidate_two_similarities_{}'.format(trait):candidate_two_similarity_scores,
            'candidate_one_differences_{}'.format(trait):candidate_one_difference_scores,
            'candidate_two_differences_{}'.format(trait):candidate_two_difference_scores,
            'similarities_text_{}'.format(trait): similarities_text,
            'differences_text_{}'.format(trait): differences_text}, compare_n_v
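
`compare_needs_and_values()` picks the two traits with the smallest and the two with the largest absolute score gap, then pairs each with its description. A toy sketch of the same selection using `Series.nsmallest` and `Series.nlargest`; the scores and short descriptions are hypothetical. Looking descriptions up by trait label, rather than by position, keeps each text attached to the right trait.

```python
import pandas as pd

# Hypothetical scores for two candidates and a description lookup indexed by trait name
one = pd.Series({'Challenge': 0.70, 'Curiosity': 0.83, 'Excitement': 0.54, 'Structure': 0.71})
two = pd.Series({'Challenge': 0.73, 'Curiosity': 0.85, 'Excitement': 0.62, 'Structure': 0.71})
desc = pd.Series({'Challenge': 'take on challenges', 'Curiosity': 'discover and grow',
                  'Excitement': 'live life', 'Structure': 'plan things'})

gaps = (one - two).abs()
similar = gaps.nsmallest(2).index     # two traits with the smallest gap
different = gaps.nlargest(2).index    # two traits with the largest gap

for trait in different:
    direction = 'more' if one[trait] > two[trait] else 'less'
    print(f"{trait}: candidate one is {direction} likely to {desc[trait]}")
```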

In [138]:
v_js, compare_df_values = compare_needs_and_values('Marissa Mayer','Arianna Huffington','Value')
n_js, compare_df_needs = compare_needs_and_values('Marissa Mayer','Arianna Huffington','Need')

In [139]:
compare_df_values

| | Conservation | Hedonism | Name | Openness to change | Self-enhancement | Text Value |
|---|---|---|---|---|---|---|
| 0 | 0.583031 | 0.632841 | Marissa Mayer | 0.78855 | 0.657561 | Both are likely to emphasize independent actio... |
| 1 | 0.636515 | 0.651688 | Arianna Huffington | 0.78405 | 0.661706 | Marissa Mayer is less likely to emphasize self... |


In [140]:
compare_df_needs

| | Excitement | Love | Name | Practicality | Structure | Text Need |
|---|---|---|---|---|---|---|
| 0 | 0.54328 | 0.679617 | Marissa Mayer | 0.700173 | 0.708965 | Both are likely to have a desire to get the jo... |
| 1 | 0.61849 | 0.745111 | Arianna Huffington | 0.713351 | 0.714364 | Marissa Mayer is less likely to want to get ou... |


In [45]:
body = client_0f8d8b468baa4efa9c29025ebc262fe5.get_object(Bucket='insideout-donotdelete-pr-arlsgjghrogjm7',Key='Describe_Personalities_HL.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

describe_personalities_hl = pd.read_csv(body)
describe_personalities_hl.fillna(value='',inplace=True)
describe_personalities_hl.head()

|   | Personalities | Agreeableness High | Conscientiousness High | Extraversion High | Emotional range High | Openness High | Agreeableness Low | Conscientiousness Low | Extraversion Low | Emotional range Low | Openness Low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agreeableness High |  | cooperative | communicative, enthusiastic |  | tactful, idealistic |  |  |  | flexible, patient |  |
| 1 | Conscientiousness High | cooperative |  | purposeful, competitive | particular | perfectionistic, analytical |  |  | punctual | self-disciplined, logical, decisive, rational |  |
| 2 | Extraversion High |  | purposeful, competitive |  | wordy | expressive, witty | bossy |  |  | confident |  |
| 3 | Emotional range High |  | particular | wordy |  | excitable |  |  |  |  |  |
| 4 | Openness High | tactful, idealistic | perfectionistic, analytical | expressive, witty | excitable |  |  |  | self-examining | creative, intellectual, insightful |  |
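This lookup table pairs one trait level (row) with another (column) and stores adjectives that describe that combination; empty cells mean there is no descriptor, and in the rows shown the table appears symmetric. `compare_personality` below reads it by selecting the row for one of a candidate's trait levels and the columns for the candidate's other trait levels, then dropping empty strings. The sketch mimics that lookup on a few rows and columns copied from the output above; `adjectives_for` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# A few rows and columns copied from the describe_personalities_hl output above.
# Empty strings mean "no descriptor for this pair of trait levels".
table = pd.DataFrame(
    {
        "Personalities": ["Agreeableness High", "Conscientiousness High", "Openness High"],
        "Agreeableness High": ["", "cooperative", "tactful, idealistic"],
        "Conscientiousness High": ["cooperative", "", "perfectionistic, analytical"],
        "Extraversion High": ["communicative, enthusiastic", "purposeful, competitive", "expressive, witty"],
        "Emotional range Low": ["flexible, patient",
                                "self-disciplined, logical, decisive, rational",
                                "creative, intellectual, insightful"],
    }
)


def adjectives_for(table, trait_label, other_labels):
    """Hypothetical helper: non-empty descriptors for one trait level paired with others.

    trait_label  - row key, e.g. 'Conscientiousness High'
    other_labels - column keys for the candidate's remaining trait levels
    """
    row = table.loc[table["Personalities"] == trait_label, other_labels]
    if row.empty:
        return []
    # Same filtering idea as compare_personality's `traits[traits != ""]`.
    return [v for v in row.values[0] if v != ""]


print(adjectives_for(table, "Conscientiousness High",
                     ["Agreeableness High", "Extraversion High", "Emotional range Low"]))
# ['cooperative', 'purposeful, competitive', 'self-disciplined, logical, decisive, rational']
```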


In [57]:
def compare_personality(candidate_one,candidate_two):
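    """
    Compares two candidates' Big Five personality scores.

    candidate_one - name of first candidate
    candidate_two - name of second candidate

    Returns a tuple (d, compare_p):
      d         - dict with the similarity score, differing traits, similar traits,
                  per-trait scores for each candidate, and a descriptive text
      compare_p - two-row DataFrame (one row per candidate) with the scores,
                  the similarity score and the text
    """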
    trait='Personality'
    
    df = all_personality_df.copy()
    description_hl = describe_personalities_hl.copy()
    
    skills_one = df.loc[df['Name'] == candidate_one,df.columns != 'Name']
    skills_two = df.loc[df['Name'] == candidate_two,df.columns != 'Name']
    
    dist = np.linalg.norm(np.array(skills_one) - np.array(skills_two))
    similarity = 1/(1+dist)
    
    scores_one = skills_one.T.squeeze()
    scores_one_hl = scores_one.apply(lambda n: 'High' if n >= 0.5 else 'Low')
    cols_one = list(scores_one_hl.index + ' ' + scores_one_hl.values)
     
    scores_two = skills_two.T.squeeze()
    scores_two_hl = scores_two.apply(lambda n: 'High' if n >= 0.5 else 'Low')
    cols_two = list(scores_two_hl.index + ' ' + scores_two_hl.values)

    scores = scores_one_hl == scores_two_hl
    differences = list(scores[scores == False].index)
    
    d = {'similarity score': similarity,'differences': differences,'similarities':[]}

    if differences != []:
        differences_one = list(scores_one_hl[scores_one_hl.index.isin(differences)].index + ' ' + scores_one_hl[scores_one_hl.index.isin(differences)].values)
        differences_two = list(scores_two_hl[scores_two_hl.index.isin(differences)].index + ' ' + scores_two_hl[scores_two_hl.index.isin(differences)].values)
        candidate_one_traits = description_hl.copy().loc[description_hl['Personalities'].isin(differences_one),cols_one].values[0]
        candidate_two_traits = description_hl.copy().loc[description_hl['Personalities'].isin(differences_two),cols_two].values[0]
        candidate_text = "{} is {}."
        d['candidate_one_text'] = candidate_text.format(candidate_one,", ".join(candidate_one_traits[candidate_one_traits !=""]))
        d['candidate_two_text'] = candidate_text.format(candidate_two,", ".join(candidate_two_traits[candidate_two_traits !=""]))
        
        
    else:
        scores_similar = scores_one - scores_two
        scores_text = scores_similar.apply(lambda n: 'more' if n > 0 else 'less')
        similarities = list(np.abs(scores_similar).sort_values(ascending=False).head(2).index)
        similar_traits = list(scores_one_hl[similarities].index + ' ' +scores_one_hl[similarities].values)
        
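        # Keep cols_one order: list(set(...)) has no stable order, so the adjective picked below could vary between runs
        cols = [c for c in cols_one if c not in similar_traits]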
        
        similar_text_one = description_hl.copy().loc[description_hl['Personalities'].isin([similar_traits[0]]),cols].values[0]
        similar_text_one = similar_text_one[similar_text_one != ""]
        similar_text_two = description_hl.copy().loc[description_hl['Personalities'].isin([similar_traits[1]]),cols].values[0]
        similar_text_two = similar_text_two[similar_text_two != ""]
        
        candidate_text = "Both candidates have similar personalities, however {} is {} {} than {}. Also, he/she is {} {}.".format(candidate_one,
                                                                                                                                  scores_text[similarities[0]],
                                                                                                                                  similar_text_one[0], 
                                                                                                                                  candidate_two,
                                                                                                                                  scores_text[similarities[1]],
                                                                                                                                  similar_text_two[0])
        d['similarity_text'] = candidate_text
        d['similarities'] = similarities
        
    d['candidate_one_scores'] = scores_one.to_dict()
    d['candidate_two_scores'] = scores_two.to_dict()
    
    if 'similarity_text' in d.keys():
        text = {'Text': [d['similarity_text'],d['similarity_text']]}
    elif 'candidate_one_text' in d.keys():
        text = {'Text':[d['candidate_one_text'],d['candidate_two_text']]}
    
    vals = {key: [value] + [scores_two.to_dict()[key]] for key, value in scores_one.to_dict().items()}
    compare_data = {'Name':[candidate_one,candidate_two], **vals, 'Similarity score':[similarity,similarity],**text}
    compare_p = pd.DataFrame(compare_data)
    
    return d, compare_p

In [141]:
p_js, compare_df_personality = compare_personality('Marissa Mayer','Arianna Huffington')

In [142]:
compare_df_personality

|   | Agreeableness | Conscientiousness | Emotional range | Extraversion | Name | Openness | Similarity score | Text |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.721594 | 0.644014 | 0.486634 | 0.52295 | Marissa Mayer | 0.805419 | 0.965439 | Both candidates have similar personalities, ho... |
| 1 | 0.735884 | 0.664589 | 0.481716 | 0.54084 | Arianna Huffington | 0.787819 | 0.965439 | Both candidates have similar personalities, ho... |
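The Similarity score in this output is `1 / (1 + d)`, where `d` is the Euclidean distance between the two candidates' five personality scores, and each score is labelled High when it is at least 0.5. Because Mayer and Huffington share every High/Low label, `compare_personality` takes its "similar" branch and describes the two traits with the largest gaps. The following minimal sketch reproduces those steps from the rounded values shown above; the helper names are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Rounded scores copied from the compare_df_personality output above.
mayer = pd.Series({"Agreeableness": 0.721594, "Conscientiousness": 0.644014,
                   "Emotional range": 0.486634, "Extraversion": 0.52295, "Openness": 0.805419})
huffington = pd.Series({"Agreeableness": 0.735884, "Conscientiousness": 0.664589,
                        "Emotional range": 0.481716, "Extraversion": 0.54084, "Openness": 0.787819})


def similarity_score(a, b):
    """1 / (1 + Euclidean distance): 1.0 for identical profiles, approaching 0 as they diverge."""
    diff = a - b  # pandas aligns on trait name, not position
    return 1.0 / (1.0 + np.linalg.norm(diff.to_numpy()))


def high_low(scores, threshold=0.5):
    """Label each score 'High' (>= threshold) or 'Low', like compare_personality."""
    return scores.apply(lambda n: "High" if n >= threshold else "Low")


def top_differences(a, b, k=2):
    """The k traits with the largest absolute gap, and whether `a` scores more or less than `b`."""
    diff = a - b
    top = diff.abs().sort_values(ascending=False).head(k).index
    return [(trait, "more" if diff[trait] > 0 else "less") for trait in top]


print(round(similarity_score(mayer, huffington), 4))   # 0.9654
print((high_low(mayer) == high_low(huffington)).all())  # True
print(top_differences(mayer, huffington))
# [('Conscientiousness', 'less'), ('Extraversion', 'less')]
```

Using `1 / (1 + d)` keeps the score in (0, 1] without needing to know the maximum possible distance.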


In [143]:
dfs = [compare_df_personality,compare_df_needs,compare_df_values]
# Join the personality, needs and values comparisons into one row per candidate
compare_df = reduce(lambda left,right: pd.merge(left,right,on='Name'), dfs)

In [144]:
compare_df

|   | Agreeableness | Conscientiousness | Emotional range | Extraversion | Name | Openness | Similarity score | Text | Excitement | Love | Practicality | Structure | Text Need | Conservation | Hedonism | Openness to change | Self-enhancement | Text Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.721594 | 0.644014 | 0.486634 | 0.52295 | Marissa Mayer | 0.805419 | 0.965439 | Both candidates have similar personalities, ho... | 0.54328 | 0.679617 | 0.700173 | 0.708965 | Both are likely to have a desire to get the jo... | 0.583031 | 0.632841 | 0.78855 | 0.657561 | Both are likely to emphasize independent actio... |
| 1 | 0.735884 | 0.664589 | 0.481716 | 0.54084 | Arianna Huffington | 0.787819 | 0.965439 | Both candidates have similar personalities, ho... | 0.61849 | 0.745111 | 0.713351 | 0.714364 | Marissa Mayer is less likely to want to get ou... | 0.636515 | 0.651688 | 0.78405 | 0.661706 | Marissa Mayer is less likely to emphasize self... |
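`reduce` applies `pd.merge` pairwise, so the three comparison tables are joined as `merge(merge(personality, needs), values)` on `Name`. The default is an inner join, so a candidate missing from any table would be dropped, and because the three text columns have distinct names (`Text`, `Text Need`, `Text Value`) no `_x`/`_y` suffixes appear. A small sketch with a few values copied from the outputs above:

```python
from functools import reduce

import pandas as pd

# Tiny per-candidate tables; values copied from the outputs above.
personality = pd.DataFrame({"Name": ["Marissa Mayer", "Arianna Huffington"],
                            "Openness": [0.805419, 0.787819]})
needs = pd.DataFrame({"Name": ["Marissa Mayer", "Arianna Huffington"],
                      "Love": [0.679617, 0.745111]})
# Rows deliberately in the opposite order: merge matches on 'Name', not on position.
values = pd.DataFrame({"Name": ["Arianna Huffington", "Marissa Mayer"],
                       "Hedonism": [0.651688, 0.632841]})

# Equivalent to pd.merge(pd.merge(personality, needs, on="Name"), values, on="Name").
merged = reduce(lambda left, right: pd.merge(left, right, on="Name"), [personality, needs, values])
print(merged)

# If two tables shared a non-key column, pandas would suffix it with _x / _y.
clash = pd.merge(personality.assign(Text="a"), needs.assign(Text="b"), on="Name")
print(list(clash.columns))  # ['Name', 'Openness', 'Text_x', 'Love', 'Text_y']
```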


##### Saving datasets to Cloud Object Storage (COS)

In [145]:
project.save_data('compare_cognos_three.csv',compare_df.to_csv(index=False),overwrite=True)

{'asset_id': '4c318cb6-b8f4-49ea-bf04-a6a71fd4345e',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'compare_cognos_three.csv',
 'message': 'File compare_cognos_three.csv has been written successfully to the associated OS'}

In [74]:
project.save_data('compare_personalities.json',json.dumps(p_js),overwrite=True)

{'asset_id': '26714437-9516-480b-8d30-86d40fa28a85',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'compare_personalities.json',
 'message': 'File compare_personalities.json has been written successfully to the associated OS'}

In [75]:
project.save_data('compare_needs.json',json.dumps(n_js),overwrite=True)

{'asset_id': '1d637f17-b4f9-426f-b589-c7d81f8e5d0b',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'compare_needs.json',
 'message': 'File compare_needs.json has been written successfully to the associated OS'}

In [76]:
project.save_data('compare_values.json',json.dumps(v_js),overwrite=True)

{'asset_id': '05f20679-6536-42de-b36e-e96218dc9e3e',
 'bucket_name': 'insideout-donotdelete-pr-arlsgjghrogjm7',
 'file_name': 'compare_values.json',
 'message': 'File compare_values.json has been written successfully to the associated OS'}
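`project.save_data` comes from Watson Studio's `project_lib` and only works inside a Watson Studio project. Outside it, a rough local equivalent, assuming plain files on disk are acceptable, writes the same CSV and JSON payloads to a directory. `save_outputs` is a hypothetical helper, not part of any IBM library.

```python
import json
from pathlib import Path


def save_outputs(out_dir, frames, payloads):
    """Hypothetical local stand-in for project.save_data(..., overwrite=True).

    frames   - {file name: pandas DataFrame}, written as CSV without the index
    payloads - {file name: dict}, written as JSON
    Returns the file names written; existing files are overwritten.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in frames.items():
        (out / name).write_text(df.to_csv(index=False))
        written.append(name)
    for name, payload in payloads.items():
        # default=float converts NumPy scalars that json cannot serialise (e.g. np.float32)
        (out / name).write_text(json.dumps(payload, default=float))
        written.append(name)
    return written
```

In this notebook the call would look like `save_outputs('outputs', {'compare_cognos_three.csv': compare_df}, {'compare_personalities.json': p_js, 'compare_needs.json': n_js, 'compare_values.json': v_js})`.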

[Back to Top](#top)

# END 