## Use-Case Specific Analytics

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

### Linear Feature Quantile
Features that affect the overall score linearly are scored by the function 'linear_feature_quantile'. The further a feature lies above the average behaviour, the higher the score assigned. Values at or below the mean receive a score of 0.

In [2]:
def linear_feature_quantile(data, feature):
    '''
    Assigns a linear score to each call for the given feature: the higher the value, the higher the score.
    The feature is expected to be standardized already, so its values are in units of standard deviations.
    The following logic is used:
        - 0 is assigned if the value is at or below the mean.
        - 1 is assigned if the value is more than 0 and up to 1 standard deviation above the mean.
        - 2 is assigned if the value is more than 1 and up to 2 standard deviations above the mean.
        - 3 is assigned if the value is more than 2 standard deviations above the mean.
    '''
    # Build every mask from the original values before any of them is overwritten.
    values = data[feature].copy()
    data.loc[values <= 0, feature] = 0
    data.loc[(values > 0) & (values <= 1), feature] = 1
    data.loc[(values > 1) & (values <= 2), feature] = 2
    data.loc[values > 2, feature] = 3
    return data
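
The binning rule described above can be illustrated on a handful of standardized values. This is a minimal sketch only: `linear_score` is a hypothetical helper introduced for this example (it is not part of the notebook), it uses `np.select` instead of in-place assignment, and it assumes the input contains no missing values.

```python
import numpy as np
import pandas as pd

def linear_score(z):
    """Map standardized values to scores 0..3 (hypothetical helper for illustration)."""
    # Conditions are checked in order and the first match wins,
    # so each threshold only needs its upper bound.
    conditions = [z <= 0, z <= 1, z <= 2]
    return pd.Series(np.select(conditions, [0, 1, 2], default=3), index=z.index)

# Toy z-scores for six hypothetical call_ids.
z = pd.Series([-1.3, 0.0, 0.4, 1.0, 1.7, 2.5], index=[101, 102, 103, 104, 105, 106])
linear_scores = linear_score(z)
print(linear_scores.tolist())  # [0, 0, 1, 1, 2, 3]
```

Values at or below the mean collapse to 0, so only positive deviations contribute to the final score.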
    
    

### Abnormal Feature Quantile
Some features do not affect the overall score linearly, but indicate anomalous behaviour when their value strays from the mean in either direction. These are scored by the function 'abnormal_feature_quantile'.
The further the value deviates from the mean, on the positive or the negative side, the higher the abnormal feature score.

In [3]:
def abnormal_feature_quantile(data, feature):
    '''
    Assigns an abnormality score to each call for the given feature: the further the value is from
    the mean in either direction, the higher the score. The feature is expected to be standardized already.
    The following logic is used:
      - 0 is assigned if the value is within 1 standard deviation of the mean.
      - 1 is assigned if the value is more than 1 and up to 2 standard deviations away from the mean.
      - 2 is assigned if the value is more than 2 standard deviations away from the mean.
    '''
    # Build every mask from the original absolute values before any of them is overwritten.
    # Using <= 1 for the first bin ensures values exactly 1 standard deviation away are scored too.
    deviation = data[feature].abs()
    data.loc[deviation <= 1, feature] = 0
    data.loc[(deviation > 1) & (deviation <= 2), feature] = 1
    data.loc[deviation > 2, feature] = 2
    return data
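
The two-sided rule can be checked on a few standardized values in the same way. This is a sketch under stated assumptions: `abnormal_score` is a hypothetical helper, not notebook code, and values lying exactly one standard deviation from the mean are assumed to score 0.

```python
import numpy as np
import pandas as pd

def abnormal_score(z):
    """Map standardized values to scores 0..2 by absolute deviation (hypothetical helper)."""
    deviation = z.abs()
    # First match wins: within 1 std -> 0, within 2 std -> 1, otherwise 2.
    conditions = [deviation <= 1, deviation <= 2]
    return pd.Series(np.select(conditions, [0, 1], default=2), index=z.index)

# Toy z-scores on both sides of the mean.
z = pd.Series([-2.4, -1.5, -1.0, 0.2, 1.0, 1.1, 3.0])
abnormal_scores = abnormal_score(z)
print(abnormal_scores.tolist())  # [2, 1, 0, 0, 0, 1, 2]
```

Unlike the linear rule, a value of -2.4 scores as high as a value of 3.0, because only the distance from the mean matters.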

We will demonstrate our product's capabilities and range using the following use cases:


1) Agent Score


2) Borrower's Credit Risk Score


### 1. Agent Score 
In this module we assign an agent score based on features which suggest that the agent performed poorly during a call. The score allows our customers to assess their agents over time, and allows agents to work on their weak points using the analytics behind the score. The higher the agent score, the more weaknesses the agent showed during the call.
The following features are used to gauge agent performance in the 'Agent Score' : <br>
__Linear Quantile features__ : <br>
The higher the value of these features, the higher the agent score.
1. O-A-overtalk-incidents (L)
   - Overtalking by the agent is a bad trait.
2. O-A-overtalk-ratio (L)
   - Overtalking by the agent is a bad trait.
3. TTR-silence-ratio (L)
   - The agent allowed long stretches of silence, which indicates poor conversation skills.
4. TTR-silence-incidents (L)
   - The agent allowed many silent moments, which indicates poor conversation skills.
5. TST-A-voice-dynamism-std-dev-score (L)
   - Low dynamism indicates a lack of voice modulation, which suggests poor communication skills. Note that this feature does not appear in the feature lists of the 'agent_score' function below.

__Abnormal Quantile features__ : <br>
If the value of these features for a particular call_id strays abnormally far from the mean, the feature score increases.
1. TTR-A-to-C-talk-rate-ratio (N)
   - Indicates the agent is talking abnormally fast or abnormally slow compared to the borrower.
2. TTR-A-median-streak (N) 
   - If the streak is abnormally high or low, it indicates poor speaking skills.
3. TTR-A-talk-ratio (N)
   - Indicates the agent is talking abnormally more or less than the borrower.
4. TST-A-intra-call-change-in-pitch (N)
   - If there is an extreme change in pitch, it indicates stressed behaviour.
5. TST-A-intra-call-change-in-relative-voice-volume-energy (N)
   - If there is an extreme change in voice volume during the call, it indicates
     that the voice became too soft or too loud.
6. TST-A-relative-voice-volume-energy (N)
   - If the voice volume is extreme, it indicates a voice that is too soft 
     or too loud.
7. S-A-sentiment (N)
   - If the sentiment is extreme, it indicates poor use
     of language by the agent.
8. TTR-A-talk-rate (N)
   - Indicates the agent is talking abnormally fast or abnormally slow.
   

In [49]:
def agent_score(data):
    '''
    Scores every feature used for the agent score and sums the scores per call.
    Expects a standardized DataFrame indexed by call_id.
    '''
    linear_quantile_feature_list = ['O-A-overtalk-incidents', 'O-A-overtalk-ratio', 'TTR-silence-ratio',
                                    'TTR-silence-incidents']
    abnormal_quantile_feature_list = ['TTR-A-to-C-talk-rate-ratio', 'TTR-A-median-streak',
                                      'TST-A-intra-call-change-in-pitch', 'TTR-A-talk-ratio',
                                      'TST-A-intra-call-change-in-relative-voice-volume-energy',
                                      'TST-A-relative-voice-volume-energy', 'S-A-sentiment', 'TTR-A-talk-rate']
    total_agent_score_columns = linear_quantile_feature_list + abnormal_quantile_feature_list
    # Work on a copy so the caller's DataFrame is not modified and pandas does not warn about chained assignment.
    data = data[total_agent_score_columns].copy()
    for linear_feature in linear_quantile_feature_list:
        data = linear_feature_quantile(data, linear_feature)
    for abnormal_feature in abnormal_quantile_feature_list:
        data = abnormal_feature_quantile(data, abnormal_feature)
    # Sum the per-feature scores of every call_id to get a final agent score for each call.
    data['agent_score'] = data.sum(axis=1)
    return data

### Reading and Standardizing the cleaned data
The cleaned data obtained from the preprocessing step is read and standardized, so that every feature is expressed in standard deviations from its mean, which is the unit the quantile functions above expect.

In [5]:
data = pd.read_csv('../data/processed/clean_data.csv')
data.set_index('call_id',inplace=True)
data = (data - data.mean()) / data.std()
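
To make the standardization step concrete, here is a small worked example on invented numbers. The column names `feature_a` and `feature_b` and the values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Three hypothetical calls with two made-up features.
toy = pd.DataFrame(
    {'feature_a': [10.0, 20.0, 30.0], 'feature_b': [2.0, 4.0, 6.0]},
    index=pd.Index([1, 2, 3], name='call_id'),
)

# Column-wise z-score: subtract the column mean, divide by the column's
# sample standard deviation (pandas uses ddof=1 by default).
z = (toy - toy.mean()) / toy.std()

print(z['feature_a'].tolist())  # [-1.0, 0.0, 1.0]
print(z['feature_b'].tolist())  # [-1.0, 0.0, 1.0]
```

After this step both columns are on the same scale, so a threshold such as "more than 2 standard deviations" means the same thing for every feature. A column with zero variance would produce NaN values here, since its standard deviation is 0.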

### Identifying the underperforming agents
A threshold score of 10 is used to identify agents that did not perform up to the mark: every call with an agent score above 10 is flagged.

In [6]:
data = agent_score(data)
bad_agents = data[data['agent_score']>10]
bad_agents.to_csv('../data/bad_agents.csv')
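
As a rough sanity check on the scale of the threshold, the arithmetic can be spelled out. The feature counts come from the lists in 'agent_score' and the per-feature scores of call 2023 are copied from the 'bad_agents' output. The variable names are introduced here for illustration only.

```python
import pandas as pd

# Each linear feature contributes at most 3 and each abnormal feature at most 2.
# agent_score uses 4 linear and 8 abnormal features.
max_agent_score = 4 * 3 + 8 * 2   # 28

# Per-feature scores of call 2023, in the column order of the bad_agents output.
call_2023 = pd.Series([3, 2, 3, 3, 0, 0, 0, 0, 1, 0, 0, 0])
total = int(call_2023.sum())      # 12
flagged = total > 10              # True, so the call is written to bad_agents.csv

print(max_agent_score, total, flagged)
```

A threshold of 10 therefore sits a little above a third of the maximum attainable score of 28.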

In [7]:
bad_agents

| call_id | O-A-overtalk-incidents | O-A-overtalk-ratio | TTR-silence-ratio | TTR-silence-incidents | TTR-A-to-C-talk-rate-ratio | TTR-A-median-streak | TTR-A-talk-ratio | TST-A-intra-call-change-in-pitch | TST-A-intra-call-change-in-relative-voice-volume-energy | TST-A-relative-voice-volume-energy | S-A-sentiment | TTR-A-talk-rate | agent_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | 3.0 | 2.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 12.0 |
| 2600 | 3.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 11.0 |
| 2723 | 2.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 11.0 |
| 2987 | 3.0 | 0.0 | 2.0 | 2.0 | 2.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 13.0 |
| 3387 | 3.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 11.0 |
| 3417 | 3.0 | 1.0 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 11.0 |
| 3439 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 2.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 12.0 |


### 2. Borrower's Credit Risk Score
The primary goal of our customers is to recover their principal, along with the interest due, from borrowers. They also need to gauge the default risk, that is, the probability that a borrower will default. The "Borrower's Credit Risk Score" is intended to help the Risk Analytics team estimate the default risk associated with a particular borrower, so that they can refrain from lending to high-risk borrowers in future. The score is constructed by analyzing properties of the call recordings made by the agent from a behaviour analytics perspective.

The following features are used to gauge the borrower's default risk in the 'Borrower's Credit Risk Score' :<br> 

__Linear Quantile features__ : <br>
The higher the value of these features, the higher the default risk score. Features marked (-1) are multiplied by -1 before scoring, so that low values produce high scores.
1. TTR-C-talk-ratio
   - If the borrower talks a lot more than the agent, it suggests he is trying to justify his inability to pay.
2. TTR-silence-incidents
   - If there are many silent incidents, it suggests that the borrower is thinking of reasons and excuses.
3. TTR-C-intra-call-change-in-talk-rate
   - A higher speech rate towards the second half of the call indicates an increase in the borrower's stress level.
4. TTR-C-average-streak (-1)
   - Since a high risk borrower won't be able to make complete and comprehensive sentences, a low average streak 
     indicates high risk.
5. TST-C-intra-call-change-in-pitch 
   - Exposure to high stress leads to an increase in the speaker's pitch. Hence if the borrower's pitch rises
     during the call, it indicates he is stressed, and therefore a higher default risk.
6. S-C-sentiment (-1)
   - Negative overall sentiment indicates higher risk.
7. S-C-intra-call-change-in-sentiment            
   - A shift in sentiment towards the negative end indicates higher risk.

__Abnormal Quantile features__ : <br>
If the value of these features for a particular call_id strays abnormally far from the mean, the feature score increases.
1. TST-C-relative-voice-volume-energy
   - If the energy is too low or too high, it indicates high risk because the borrower is either low
     in confidence or too arrogant.
2. TST-C-intra-call-change-in-relative-voice-volume-energy
   - If the change in energy during the call is too low or too high, it indicates high risk because the borrower
     is either low in confidence or too arrogant.


In [50]:
def borrower_credit_risk_score(data):
    '''
    Scores every feature used for the borrower's credit risk score and sums the scores per call.
    Expects a standardized DataFrame indexed by call_id.
    '''
    linear_quantile_feature_list = ['TTR-C-talk-ratio', 'TTR-silence-incidents', 'TTR-C-intra-call-change-in-talk-rate',
                                    'TTR-C-average-streak', 'TST-C-intra-call-change-in-pitch', 'S-C-sentiment',
                                    'S-C-intra-call-change-in-sentiment']
    abnormal_quantile_feature_list = ['TST-C-relative-voice-volume-energy',
                                      'TST-C-intra-call-change-in-relative-voice-volume-energy']
    total_risk_score_columns = linear_quantile_feature_list + abnormal_quantile_feature_list
    # Work on a copy so the caller's DataFrame is not modified and pandas does not warn about chained assignment.
    data = data[total_risk_score_columns].copy()
    # For these two features a low value signals high risk, so the sign is flipped before scoring.
    data['TTR-C-average-streak'] = data['TTR-C-average-streak'] * -1
    data['S-C-sentiment'] = data['S-C-sentiment'] * -1
    for linear_feature in linear_quantile_feature_list:
        data = linear_feature_quantile(data, linear_feature)
    for abnormal_feature in abnormal_quantile_feature_list:
        data = abnormal_feature_quantile(data, abnormal_feature)
    # Sum the per-feature scores to get a final credit risk score for that call_id's borrower.
    data['borrower_credit_risk_score'] = data.sum(axis=1)
    return data
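
The effect of multiplying 'TTR-C-average-streak' and 'S-C-sentiment' by -1 can be seen on a few invented z-scores. This is a minimal sketch: the values are hypothetical, and the scoring is re-expressed with `np.select` rather than calling the notebook's function.

```python
import numpy as np
import pandas as pd

# Hypothetical standardized average-streak values for four calls.
z_streak = pd.Series([-2.5, -0.5, 0.0, 1.5], index=[11, 12, 13, 14])

# Flip the sign so that an unusually LOW streak becomes a large positive value.
flipped = -z_streak

# Same linear rule as before: <=0 -> 0, (0,1] -> 1, (1,2] -> 2, >2 -> 3.
streak_scores = pd.Series(
    np.select([flipped <= 0, flipped <= 1, flipped <= 2], [0, 1, 2], default=3),
    index=flipped.index,
)
print(streak_scores.tolist())  # [3, 1, 0, 0]
```

The call whose streak is 2.5 standard deviations below the mean receives the maximum score of 3, while calls with an average or long streak contribute nothing to the risk score.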

### Reading and Standardizing the cleaned data
The cleaned data obtained from the preprocessing step is read and standardized again, because the previous use case overwrote `data` with the agent score columns.

In [51]:
data = pd.read_csv('../data/processed/clean_data.csv')
data.set_index('call_id',inplace=True)
data = (data - data.mean()) / data.std()

### Identifying the borrowers with high credit risk
A threshold score of 8 is used to identify borrowers deemed to have high credit risk based on behaviour analytics: every call with a score of 8 or more is flagged.

In [52]:
data = borrower_credit_risk_score(data)
high_risk_borrowers = data[data['borrower_credit_risk_score'] >= 8]
# Save the high-risk borrowers (not bad_agents) to the high-risk output file.
high_risk_borrowers.to_csv('../data/high_risk_borrowers.csv')

In [54]:
high_risk_borrowers

| call_id | TTR-C-talk-ratio | TTR-silence-incidents | TTR-C-intra-call-change-in-talk-rate | TTR-C-average-streak | TST-C-intra-call-change-in-pitch | S-C-sentiment | S-C-intra-call-change-in-sentiment | TST-C-relative-voice-volume-energy | TST-C-intra-call-change-in-relative-voice-volume-energy | borrower_credit_risk_score |
|---|---|---|---|---|---|---|---|---|---|---|
| 2022 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 1.0 | 2.0 | 0.0 | 2.0 | 10.0 |
| 3170 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 | 0.0 | 1.0 | 1.0 | 8.0 |
| 3532 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 | 2.0 | 8.0 |
| 3647 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1.0 | 0.0 | 2.0 | 8.0 |
| 3677 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | 8.0 |
| 3855 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 2.0 | 3.0 | 0.0 | 0.0 | 8.0 |
| 4017 | 1.0 | 0.0 | 3.0 | 0.0 | 1.0 | 0.0 | 2.0 | 2.0 | 0.0 | 9.0 |
| 4071 | 1.0 | 2.0 | 0.0 | 1.0 | 0.0 | 2.0 | 3.0 | 0.0 | 0.0 | 9.0 |
| 4163 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 3.0 | 1.0 | 1.0 | 1.0 | 9.0 |
| 4222 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 3.0 | 2.0 | 1.0 | 0.0 | 9.0 |
