## Use-Case Specific Feature Extraction on Cleaned Data

In [38]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [39]:
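# Load the cleaned call-level data and index it by call.
data = pd.read_csv('../data/processed/clean_data.csv')
data.set_index('call_id', inplace=True)
# Standardize every feature to a z-score (zero mean, unit sample standard deviation).
data = (data - data.mean()) / data.std()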
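
The standardization step can be checked on a small example. This is a minimal sketch on a hypothetical three-call frame (the call ids and values are made up for illustration); it relies on the fact that pandas `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy frame standing in for clean_data.csv.
toy = pd.DataFrame(
    {"call_id": [1, 2, 3], "O-A-overtalk-ratio": [1.0, 2.0, 3.0]}
).set_index("call_id")

# Same z-score transformation as in the cell above:
# subtract the column mean, divide by the column sample standard deviation.
z = (toy - toy.mean()) / toy.std()
print(z)
```

After this step a value of 0 means "exactly average" and a value of 1 means "one standard deviation above average", which is the scale the quantile functions in this notebook bin on.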

### Linear Feature Quantile
Features that affect the agent's score linearly are scored by the function 'linear_feature_quantile'. The further a call's feature value lies above the average behaviour, the higher the score assigned. A value at or below the mean scores 0.

In [40]:
def linear_feature_quantile(data, feature):
    '''
    Assign a linear score to each call for the given feature: the higher the
    standardized value, the higher the score. The following logic is used:
        - 0 is assigned if the standardized value is at most 0.
        - 1 is assigned if the standardized value is greater than 0 and at most 1.
        - 2 is assigned if the standardized value is greater than 1 and at most 2.
        - 3 is assigned if the standardized value is greater than 2.
    The column is overwritten in place and the DataFrame is returned.
    '''
    # Bin on a copy of the original values, not on the partially overwritten column.
    values = data[feature].copy()
    # .loc replaces .ix, which was removed in pandas 1.0.
    data.loc[values <= 0, feature] = 0
    data.loc[(values > 0) & (values <= 1), feature] = 1
    data.loc[(values > 1) & (values <= 2), feature] = 2
    data.loc[values > 2, feature] = 3
    return data
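
The linear binning rule can also be expressed as a single vectorized call. The following is a minimal sketch, not the notebook's implementation: it uses `pd.cut` with right-closed bins, and the helper name `linear_score` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def linear_score(values: pd.Series) -> pd.Series:
    """Map standardized values to scores 0-3 (hypothetical helper).

    Bins are right-closed: (-inf, 0] -> 0, (0, 1] -> 1, (1, 2] -> 2, (2, inf) -> 3.
    """
    bins = [-np.inf, 0, 1, 2, np.inf]
    # labels=False returns the integer position of the bin each value falls in.
    return pd.cut(values, bins=bins, labels=False)

# Made-up standardized values covering every bin and every boundary.
sample = pd.Series([-0.5, 0.0, 0.3, 1.0, 1.5, 2.0, 2.7])
linear_scores = linear_score(sample)
print(linear_scores.tolist())
```

Because the bins are computed in one pass from the original values, there is no risk of an already assigned score being re-binned by a later step.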
    
    

### Abnormal Feature Quantile
Some features do not affect the agent's score linearly, but indicate anomalous behaviour when their value strays from the mean in either direction. These are scored by the function 'abnormal_feature_quantile'.
The further the value deviates from the mean, on the positive or the negative side, the higher the abnormal feature score.

In [41]:
def abnormal_feature_quantile(data, feature):
    '''
    Assign an abnormality score to each call for the given feature: the further
    the standardized value lies from 0 in either direction, the higher the score.
    The following logic is used:
        - 0 is assigned if the absolute standardized value is at most 1.
        - 1 is assigned if the absolute standardized value is greater than 1 and at most 2.
        - 2 is assigned if the absolute standardized value is greater than 2.
    The column is overwritten in place and the DataFrame is returned.
    '''
    # Bin on the original absolute values, not on the partially overwritten column.
    abs_values = data[feature].abs()
    # <= 1 (rather than < 1) so that values exactly at +/-1 are also scored.
    # .loc replaces .ix, which was removed in pandas 1.0.
    data.loc[abs_values <= 1, feature] = 0
    data.loc[(abs_values > 1) & (abs_values <= 2), feature] = 1
    data.loc[abs_values > 2, feature] = 2
    return data
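
The two-sided rule can be sketched in the same vectorized way by binning the absolute value. This is an illustrative sketch only: the helper name `abnormal_score` and the sample values are hypothetical, and it assumes that a value exactly one standard deviation from the mean counts as normal (score 0).

```python
import numpy as np
import pandas as pd

def abnormal_score(values: pd.Series) -> pd.Series:
    """Map standardized values to scores 0-2 by distance from the mean (hypothetical helper).

    Bins on |z| are right-closed: [0, 1] -> 0, (1, 2] -> 1, (2, inf) -> 2.
    """
    # |z| is never negative, so -inf as the lowest edge simply makes 0 fall in the first bin.
    bins = [-np.inf, 1, 2, np.inf]
    return pd.cut(values.abs(), bins=bins, labels=False)

# Made-up standardized values on both sides of the mean, including the boundaries.
sample = pd.Series([-2.5, -1.5, -1.0, 0.0, 0.4, 1.0, 1.2, 2.0, 3.1])
abnormal_scores = abnormal_score(sample)
print(abnormal_scores.tolist())
```

Note that -2.5 and 3.1 receive the same score: only the size of the deviation matters, not its sign.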

We will demonstrate our product's capabilities and range using the following use cases:

1) Agent Score
2) Borrower's Credit Risk Score
3) Information Score

### 1. Agent Score 
This module assigns an agent score based on features that suggest the agent performed poorly during a call. The score allows our customers to assess their agents over time, and allows the agents to work on their weak points using the analytics behind it. The higher the agent score, the more weaknesses the agent showed during the call.

In [45]:
def agent_score(data):
    '''
    The following features are selected to gauge the agent performance as 'Agent Score' : 
        1. Linear Quantile features : 
            a. O-A-overtalk-incidents (L)
            b. O-A-overtalk-ratio (L)
            c. TTR-silence-ratio (L)
            d. TTR-silence-incidents (L)
            e. TST-A-voice-dynamism-std-dev-score (L)
               (listed here, but absent from linear_quantile_feature_list below, so it is not scored)
        2. Anomaly Quantile features : 
            a. TTR-A-to-C-talk-rate-ratio (N)
            b. TTR-A-median-streak (N) 
            c. TTR-A-talk-ratio (N)
            d. TST-A-intra-call-change-in-pitch (N)
            e. TST-A-intra-call-change-in-relative-voice-volume-energy (N)
            f. TST-A-relative-voice-volume-energy (N)
            g. S-A-sentiment (N)
            h. TTR-A-talk-rate (Bad if too fast or too slow)  (N)

    '''
    linear_quantile_feature_list = ['O-A-overtalk-incidents','O-A-overtalk-ratio','TTR-silence-ratio',
                                    'TTR-silence-incidents']
    abnormal_quantile_feature_list = ['TTR-A-to-C-talk-rate-ratio','TTR-A-median-streak','TTR-A-talk-ratio',
                                    'TST-A-intra-call-change-in-pitch','TST-A-intra-call-change-in-relative-voice-volume-energy',
                                    'TST-A-relative-voice-volume-energy','S-A-sentiment','TTR-A-talk-rate']
    total_agent_score_columns = linear_quantile_feature_list + abnormal_quantile_feature_list
    # Work on a copy so the in-place scoring below does not write into a slice of the caller's frame.
    data = data[total_agent_score_columns].copy()
    for linear_feature in linear_quantile_feature_list:
        data = linear_feature_quantile(data, linear_feature)
    for abnormal_feature in abnormal_quantile_feature_list:
        data = abnormal_feature_quantile(data, abnormal_feature)
    return data

In [46]:
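# Replace the standardized features with their per-feature scores.
data = agent_score(data)
# The agent score of a call is the sum of its per-feature scores.
data['agent_score'] = data.sum(axis=1)
# Flag calls whose agent score is 10 or more and save them.
bad_agents = data[data['agent_score']>=10]
bad_agents.to_csv('../data/bad_agents.csv')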

In [47]:
bad_agents

| call_id | O-A-overtalk-incidents | O-A-overtalk-ratio | TTR-silence-ratio | TTR-silence-incidents | TTR-A-to-C-talk-rate-ratio | TTR-A-median-streak | TTR-A-talk-ratio | TST-A-intra-call-change-in-pitch | TST-A-intra-call-change-in-relative-voice-volume-energy | TST-A-relative-voice-volume-energy | S-A-sentiment | TTR-A-talk-rate | agent_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | 3.0 | 2.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 12.0 |
| 2600 | 3.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 11.0 |
| 2723 | 2.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 10.0 |
| 2987 | 3.0 | 0.0 | 2.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 12.0 |
| 3387 | 3.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 11.0 |
| 3417 | 3.0 | 1.0 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 11.0 |
| 3541 | 1.0 | 2.0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 10.0 |
| 4519 | 1.0 | 2.0 | 2.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 10.0 |
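
To put the cut-off of 10 in context: with 4 linear features scored 0 to 3 and 8 abnormal features scored 0 to 2, the highest possible agent score is 28. The sketch below works through that arithmetic and the sum-then-threshold step on a small frame. The call ids, the per-feature scores and the toy threshold of 4 are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Upper bound of the agent score for the feature lists used in agent_score():
# each linear feature contributes at most 3, each abnormal feature at most 2.
n_linear, n_abnormal = 4, 8
max_agent_score = 3 * n_linear + 2 * n_abnormal

# Hypothetical per-call feature scores (already binned) for two calls.
scores = pd.DataFrame(
    {
        "O-A-overtalk-incidents": [3, 1],
        "TTR-silence-ratio": [3, 0],
        "S-A-sentiment": [1, 0],
    },
    index=pd.Index([101, 102], name="call_id"),
)

# Sum the per-feature scores row by row, then keep calls at or above the threshold.
feature_columns = list(scores.columns)
scores["agent_score"] = scores[feature_columns].sum(axis=1)
TOY_THRESHOLD = 4  # illustrative only; the notebook uses 10 over 12 features
flagged = scores[scores["agent_score"] >= TOY_THRESHOLD]
print(max_agent_score)
print(flagged)
```

Summing over an explicit list of feature columns keeps the result stable if the cell is re-run after 'agent_score' has already been added to the frame.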


### 2. Borrower's Credit Risk Score
