# CRISP-DM Analysis for Business Problem: Inactivity Prediction with Transactional Data

This notebook is a companion to the Medium article (link below) that applies CRISP-DM, a proven and tested Data Science methodology, to understand, analyze and communicate a business problem.

CRISP-DM comprises 6 steps:

Section 1: Business Understanding

Section 2: Data Understanding

Section 3: Data Preparation

Section 4: Data Modeling

Section 5: Evaluate the Results

Section 6: Deployment

Medium Article:
https://medium.com/@fernandocarliniguimaraes/innactivity-prediction-using-machine-learning-on-transacional-data-642ef7c84674

# Section 1: Business Understanding

The broader business context is laid out in the companion Medium article.
A brief summary of the business understanding follows:
a Brazilian Credit Union wishes to preemptively predict mobile phone app inactivity in a six-month window.

The business value of such an endeavor lies in:
- (1) expanding the use cases of a dataset (data enrichment may lead to revenue growth);
- (2) deterring potential customer churn (avoiding revenue loss);
- (3) early detection of customer friction (ensuring user satisfaction).

The business questions that arise from this objective are:

### Question 1: What aspects of a transactional dataset can be used to understand channel inactivity in a six-month window?

### Question 2: Are mono-product-family users more likely to have channel inactivity in a six-month window?

### Question 3: Can transactional data alone safely predict channel inactivity in a six-month window?
    


# Section 2: Data Understanding

### Credit Union's Transaction Dataset overview

The Credit Union has several client channels. For this project we are looking at only one of them: the mobile phone app. It has roughly 4 million users and an average of 40–45 million transactions per month; about 40% of these are financial transactions (like paying a bill) and 60% are non-financial (like looking up a bill receipt). Our main goal for the project is avoiding financial transaction inactivity, so we focused only on those.

All of these transactions are stored in a main database that is ingested daily into an AWS Data Lake. That was the interface I used to query the data and extract it for the project.

The transactional database holds A LOT of information, but for this project the most vital fields were:

- Time and date the transaction happened;
- The transaction code;
- The product family the transaction is part of (example: investment application and investment cashout are two different transactions of the same product family);¹
- The Credit Union Member who requested the transaction;
- The Credit Union the Member is linked to;
- The status of the transaction: did it complete, or was it canceled?
- The channel through which the transaction was requested.

¹ There are 8 main product families: Channels (managing your self-service channel), Checking account (wire transfers), Payments (government taxes or company payment slips), Bills (water, phone, etc.), Credit (loans), Cards (credit and debit), PIX (Brazil's own instant payment financial product) and Investments (long-term deposits, market shares).

For this project I filtered the channel to be only the Mobile App. I also chose 5 medium-sized Credit Unions from our system (we have over 140) so as to have a good amount of data without making the processing time too long. Finally, I fixed a six-month period of data to analyze.

### IMPORTANT OBSERVATION:
This dataset is quite clean because it comes from a tightly managed information system. When we apply the filters described above, like the channel filter and the completed-status filter, we filter out basically anything that could get in our way. The heaviest data wrangling needed is grouping the transaction codes into product families, and even that is quite easy to accomplish.
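The filtering and code-to-family grouping were done in SQL on Athena (see the pseudo queries linked below). As a rough Python illustration of the same two steps only: the record field names, the channel and status values, and the transaction codes in `CODE_TO_FAMILY` are hypothetical placeholders, since the real ones are masked.

```python
from collections import Counter

# Hypothetical mapping from transaction codes to product families;
# the real codes are masked in the anonymized pseudo queries.
CODE_TO_FAMILY = {
    "T001": "PIX",
    "T002": "PIX",
    "T010": "BILLS",
    "T020": "INVESTMENTS",  # e.g. investment application
    "T021": "INVESTMENTS",  # e.g. investment cashout, same family
}


def family_counts(transactions):
    """Count completed mobile-app transactions per product family.

    Each transaction is assumed to be a dict with hypothetical
    'channel', 'status' and 'code' keys.
    """
    counts = Counter()
    for tx in transactions:
        # Keep only completed transactions made through the mobile app.
        if tx["channel"] != "MOBILE_APP" or tx["status"] != "COMPLETED":
            continue
        counts[CODE_TO_FAMILY.get(tx["code"], "OTHER")] += 1
    return counts


sample = [
    {"channel": "MOBILE_APP", "status": "COMPLETED", "code": "T020"},
    {"channel": "MOBILE_APP", "status": "COMPLETED", "code": "T021"},
    {"channel": "MOBILE_APP", "status": "CANCELED", "code": "T001"},
    {"channel": "INTERNET_BANKING", "status": "COMPLETED", "code": "T010"},
]
print(family_counts(sample))  # Counter({'INVESTMENTS': 2})
```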

### Exploratory Analysis of the Transactional Database

I have written a second article that showcases the method I used for both the exploratory analysis and the model selection and development. Please check it out, especially the <b>Data understanding — What data do we have / need? Is it clean?</b> section, for further insight.

Link:
https://medium.com/@fernandocarliniguimaraes/innactivity-prediction-using-machine-learning-on-transacional-data-642ef7c84674

### Disclaimer about Compliance and Confidentiality

Due to company compliance I had to do all of the data wrangling and manipulation on our AWS Data Lake server, using Redash on top of the AWS Athena SQL engine. Data was only available for extraction after anonymization. I've included in the repository SQL pseudo-queries that mask the sensitive information (like dataset and column names) while showing how the data manipulation was done.

GitHub Repo for this project: https://github.com/nandodsg/Innactivity-Prediction-with-Transactional-Data

##### Queries used during exploratory analysis:
1. pseudo query - exploratory analysis dataset (anonymous).sql :
https://github.com/nandodsg/Innactivity-Prediction-with-Transactional-Data/blob/main/pseudo%20query%20-%20exploratory%20analysis%20dataset%20(anonymous).sql
2. pseudo query - exporatory analysis - churn flags (anonymous).sql : 
https://github.com/nandodsg/Innactivity-Prediction-with-Transactional-Data/blob/main/pseudo%20query%20-%20exporatory%20analysis%20-%20churn%20flags%20(anonymous).sql

# Section 3: Data Preparation

There are many possible approaches to modeling this specific business problem. One way is to think of this sixth-month inactivity as a kind of “churn” that we want to predict from a series of features (predictors). With this approach we can choose a classifier model for the problem.

In this framing, the dataset must be modeled per individual, not per transaction (a transaction-level layout would suit a time series model, though).

We need one individual per row, with all the features laid out in separate columns. Based on the exploratory analysis, I want to build the dataframe from the following blocks:

- Account Number ID
- Credit Union Number ID
- Sixth-month inactivity flag (our future dependent variable)
- A depth counter (number of transactions) by month and by product family
- An amplitude counter (number of different product families used) by month
- Total depth counter by month
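The reshaping itself was done in SQL (see the model dataset pseudo query below). A minimal Python sketch of the same idea, assuming the input is already filtered to `(account, month, family)` tuples; the column names mimic those in the model dataset, and `FLG_<YYYYMM>` is taken as 1 = active, 0 = inactive, matching the convention used later (inactivity is `FLG_202206 = 0`). The helper `active_in_first_five_months` is a hypothetical reading of the extraction rule. Eight families times six months of depth, plus six amplitude, six total and six flag columns gives 66 columns; with the two ID columns that is 68, consistent with the `df.shape` reported later.

```python
from collections import defaultdict

# The 8 product families listed in the data dictionary above.
FAMILIES = ["CHANNELS", "CHECKING", "PAYMENTS", "BILLS",
            "CREDIT", "CARDS", "PIX", "INVESTMENTS"]
MONTHS = ["202201", "202202", "202203", "202204", "202205", "202206"]


def build_member_rows(transactions):
    """Turn (account, month, family) records into one feature dict per member.

    DEEP_<FAMILY>_<YYYYMM>: depth, AMP_<YYYYMM>: amplitude,
    NUM_TRANSACTIONS_<YYYYMM>: total depth, FLG_<YYYYMM>: 1 if active.
    """
    # counts[account][month][family] -> number of transactions
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for account, month, family in transactions:
        counts[account][month][family] += 1

    rows = {}
    for account, by_month in counts.items():
        row = {}
        for month in MONTHS:
            families = by_month.get(month, {})  # .get does not create entries
            for family in FAMILIES:
                row[f"DEEP_{family}_{month}"] = families.get(family, 0)
            total = sum(families.values())
            row[f"AMP_{month}"] = len(families)  # distinct families used
            row[f"NUM_TRANSACTIONS_{month}"] = total
            row[f"FLG_{month}"] = int(total > 0)
        rows[account] = row
    return rows


def active_in_first_five_months(row):
    """Hypothetical extraction rule: active in every month from January to May 2022."""
    return all(row[f"FLG_{month}"] == 1 for month in MONTHS[:5])


sample = [
    ("ACC1", "202201", "PIX"), ("ACC1", "202201", "BILLS"),
    ("ACC1", "202202", "PIX"), ("ACC1", "202203", "PIX"),
    ("ACC1", "202204", "CARDS"), ("ACC1", "202205", "PIX"),
]
rows = build_member_rows(sample)
print(rows["ACC1"]["AMP_202201"], rows["ACC1"]["FLG_202206"])  # 2 0
print(active_in_first_five_months(rows["ACC1"]))  # True
```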

I extracted the data from the other 4 credit unions I had previously selected, this time bringing in every member who met one simple rule: they had to be active in the first five months of 2022. This extraction gave me a 91,848-row dataset, each row representing a unique individual.


##### Query used to generate model dataset
1. pseudo query - model dataset.SQL : 
https://github.com/nandodsg/Innactivity-Prediction-with-Transactional-Data/blob/main/pseudo%20query%20-%20model%20dataset.SQL


# Section 4: Data Modeling

The following section details the development of 10 classifier models (2 algorithms, each combined with 5 resampling set-ups including a no-resampling baseline) aimed at supporting the analysis of the three business questions.

The second article also showcases the method I used for the model selection and development. Please check it out, especially the <b>Modeling — What modeling techniques should we apply?</b> and <b>Evaluation — Which model best meets the business objectives?</b> sections, for further insight.

Link: https://medium.com/@fernandocarliniguimaraes/innactivity-prediction-using-machine-learning-on-transacional-data-642ef7c84674

In [1]:
# utils module with helper functions for model development, testing and evaluation
import utils as u
%matplotlib inline

In [2]:
df = u.pd.read_csv('./Model Data Set (pseudo).csv',sep=';')
df.head()

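| | CREDIT_UNION_ID | ACCOUNT_NUM | FLG_202201 | FLG_202202 | FLG_202203 | FLG_202204 | FLG_202205 | FLG_202206 | DEEP_CHANNELS_202201 | DEEP_CHANNELS_202202 | ... | AMP_202203 | AMP_202204 | AMP_202205 | AMP_202206 | NUM_TRANSACTIONS_202201 | NUM_TRANSACTIONS_202202 | NUM_TRANSACTIONS_202203 | NUM_TRANSACTIONS_202204 | NUM_TRANSACTIONS_202205 | NUM_TRANSACTIONS_202206 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | `ZWZZ!W` | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 3 | 3 | 2 | 4 | 68 | 86 | 130 | 100 | 68 | 112 |
| 1 | A | `&WXYY&` | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 1 | 2 | 2 | 1 | 14 | 8 | 4 | 24 | 12 | 10 |
| 2 | A | `Y%@YZ&` | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 2 | 2 | 1 | 2 | 12 | 6 | 10 | 12 | 12 | 20 |
| 3 | A | `!W%&#!` | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 3 | 3 | 2 | 2 | 38 | 24 | 36 | 54 | 72 | 48 |
| 4 | A | `%##AXY` | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 3 | 2 | 3 | 2 | 24 | 12 | 12 | 8 | 6 | 20 |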


In [3]:
df.shape

(91848, 68)

In [4]:
# We make sure to create a copy of the data before we start altering it, so the original data we loaded stays unchanged.
# A deep copy (the pandas default) is used because a shallow copy (deep=False) shares its underlying data with df.
data = df.copy()

# Preparing Global Variables

In [5]:
#Declare independent variables (X) and dependent variable (y)

# To avoid writing them out every time, we save the names of the model's predictors (features) in a list.
independent_variables=[#PIX
            'DEEP_PIX_202201',
            'DEEP_PIX_202202',
            'DEEP_PIX_202203',
            'DEEP_PIX_202204',
            'DEEP_PIX_202205',
            #BILLS
            'DEEP_BILLS_202201',
            'DEEP_BILLS_202202',
            'DEEP_BILLS_202203',
            'DEEP_BILLS_202204',
            'DEEP_BILLS_202205',
            #CARDS
            'DEEP_CARDS_202201',
            'DEEP_CARDS_202202',
            'DEEP_CARDS_202203',
            'DEEP_CARDS_202204',
            'DEEP_CARDS_202205',
            #CHECKING
            'DEEP_CHECKING_202201',
            'DEEP_CHECKING_202202',
            'DEEP_CHECKING_202203',
            'DEEP_CHECKING_202204',
            'DEEP_CHECKING_202205',
            #CREDIT
            'DEEP_CREDIT_202201',
            'DEEP_CREDIT_202202',
            'DEEP_CREDIT_202203',
            'DEEP_CREDIT_202204',
            'DEEP_CREDIT_202205',
            #INVESTMENTS
            'DEEP_INVESTMENTS_202201',
            'DEEP_INVESTMENTS_202202',
            'DEEP_INVESTMENTS_202203',
            'DEEP_INVESTMENTS_202204',
            'DEEP_INVESTMENTS_202205',
            #PAYMENTS
            'DEEP_PAYMENTS_202201',
            'DEEP_PAYMENTS_202202',
            'DEEP_PAYMENTS_202203',
            'DEEP_PAYMENTS_202204',
            'DEEP_PAYMENTS_202205',
            #AMPLITUDE
            'AMP_202201',
            'AMP_202202',
            'AMP_202203',
            'AMP_202204',
            'AMP_202205'
           ]

X = data[independent_variables]
y = data['FLG_202206']
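The hand-written list above follows a regular pattern, so it could also be generated. A minimal sketch, assuming the same seven families and five months; as in the list above, `DEEP_CHANNELS` and every June 2022 column stay out, which also keeps the target month out of the predictors.

```python
FAMILIES = ["PIX", "BILLS", "CARDS", "CHECKING", "CREDIT", "INVESTMENTS", "PAYMENTS"]
FEATURE_MONTHS = ["202201", "202202", "202203", "202204", "202205"]

# Same names and order as the hand-written list: depth per family and month,
# followed by amplitude per month.
independent_variables = (
    [f"DEEP_{family}_{month}" for family in FAMILIES for month in FEATURE_MONTHS]
    + [f"AMP_{month}" for month in FEATURE_MONTHS]
)
print(len(independent_variables))  # 40
```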

In [None]:
#set shared model, scaler and splitter variables
random_state = 26
test_size = 0.15
verbose = 'off'
#set model names
models = ['Random Forest',
          'Logistic Regression',
         ]
#set resampling method names
resamplers = [
              'Baseline',
              'Random Over Sampling',
              'SMOTE',
              'Near Miss KNN',
              'Random Under Sampling',
             ]

# Handling class imbalance

We know from our exploratory analysis that this dataset will be heavily imbalanced, with sixth-month churn as the minority class (represented as inactivity in that month, i.e. FLG_202206 = 0).

The problem with classifiers and class imbalance is that the classifier will more easily predict the majority class, simply because most cases belong to it. For that reason, model performance metrics have to be carefully selected. Precision, recall and F1 will be used as the main metrics for evaluating performance. In our specific case, we are most interested in these metrics for the minority class (0).
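To see why accuracy alone misleads here, take the test-set class counts that appear in the classification reports later in this notebook (423 inactive, 13,355 active) and a trivial "model" that labels every member as active:

```python
support_inactive = 423   # class 0 (FLG_202206 = 0) in the test set
support_active = 13355   # class 1 in the test set

# A trivial classifier that predicts class 1 (active) for everyone.
total = support_inactive + support_active
accuracy = support_active / total        # every active member is "correct"
recall_inactive = 0 / support_inactive   # no inactive member is ever flagged

print(f"accuracy = {accuracy:.4f}")             # accuracy = 0.9693
print(f"recall (class 0) = {recall_inactive}")  # recall (class 0) = 0.0
```

Almost 97% accuracy with zero usefulness for the business question is exactly why the minority-class precision, recall and F1 are tracked instead.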

So in this study we will contrast two widely used classification models, Logistic Regression and RandomForestClassifier, both in their scikit-learn implementations. Tree ensembles are supposedly better at handling imbalance. A common technique for getting better results is resampling, so we will also contrast the metrics of baseline models with those of resampled models (Random Over Sampling, SMOTE, Near Miss and Random Under Sampling).
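The resampling itself happens inside `utils` (not shown), so the sketch below is not the notebook's implementation. It illustrates only the two simplest resamplers in plain Python; SMOTE (which synthesizes new minority rows by interpolating between a minority row and one of its nearest minority neighbors) and Near Miss (which keeps the majority rows closest to minority rows) are not reproduced. The imbalanced-learn library offers `RandomOverSampler`, `RandomUnderSampler`, `SMOTE` and `NearMiss`; whether `utils` uses it is an assumption. One design point holds either way: resample only the training split. Every classification report in this notebook shows the same test supports (423 / 13,355), consistent with the test set being left untouched.

```python
import random


def _indices_by_class(y):
    """Group row indices by class label."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)
    return groups


def random_under_sample(X, y, seed=26):
    """Randomly drop rows from larger classes until all match the smallest."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_min = min(len(ix) for ix in groups.values())
    keep = sorted(i for ix in groups.values() for i in rng.sample(ix, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, seed=26):
    """Duplicate random rows (with replacement) of smaller classes until all match the largest."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_max = max(len(ix) for ix in groups.values())
    keep = []
    for ix in groups.values():
        keep.extend(ix)
        keep.extend(rng.choices(ix, k=n_max - len(ix)))
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy training split: 2 inactive (0) vs 8 active (1) members.
X_train = [[i] for i in range(10)]
y_train = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

X_under, y_under = random_under_sample(X_train, y_train)
X_over, y_over = random_over_sample(X_train, y_train)
print(y_under.count(0), y_under.count(1))  # 2 2
print(y_over.count(0), y_over.count(1))    # 8 8
```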


Reference:

https://medium.com/grabngoinfo/four-oversampling-and-under-sampling-methods-for-imbalanced-classification-using-python-7304aedf9037

https://towardsdatascience.com/a-look-at-precision-recall-and-f1-score-36b5fd0dd3ec

https://www.analyticsvidhya.com/blog/2020/10/overcoming-class-imbalance-using-smote-techniques/

## Processing and evaluating models

In [77]:
#Instantiate empty list to hold models for coefficient evaluation 
model_prediction = []

model_scores_table = u.pd.DataFrame()
model_scores_table['Scores'] = ['Model','Model Name','Resampler Name','TN','FP','FN','TP','Precision 0','Precision 1','Recall 0','Recall 1','F1-Score 0','F1-Score 1','Support 0','Support 1']

for model_name in models:
    for resampler_name in resamplers:
        model,cr,cm,precision,recall,fbeta_score,support = u.model_predict(model_name,resampler_name,X,y,random_state,test_size,verbose)
        models_scores_append = [model,model_name,resampler_name,cm[0][0],cm[0][1],cm[1][0],cm[1][1],precision[0],precision[1],recall[0],recall[1],fbeta_score[0],fbeta_score[1],support[0],support[1]]
        
        model_scores_table[model_name+' '+resampler_name] = models_scores_append
        model_prediction.append((model,model_name,resampler_name))

# Create Model Coefficient Table
Model_Coef_Table = u.Model_Coef_Table(model_prediction, X)


--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

 Random Forest with Baseline  Classification Report:
              precision    recall  f1-score   support

           0       0.08      0.01      0.02       423
           1       0.97      1.00      0.98     13355

    accuracy                           0.97     13778
   macro avg       0.53      0.50      0.50     13778
weighted avg       0.94      0.97      0.95     13778

[[    5   418]
 [   54 13301]]

Total processing time: --- 11.438204526901245 seconds ---

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

 Random Forest with Random Over Sampling  Classification Report:
              precision    recall  f1-score   support

           0       0.09      0.02      0.03       423
           1       0.97     

In [76]:
from importlib import reload  # Python 3.4+
import utils as u

u = reload(u)

# Section 5: Evaluate the Results

This section is split into separate analyses for each business question.
Each one comprises a brief analysis, an evaluation and a conclusion.

### Question 2: Are mono-product-family users more likely to have channel inactivity in a six-month window?

### Question 3: Can transactional data alone safely predict channel inactivity in a six-month window?

## Question 1: What aspects of a transactional dataset can be used to understand channel inactivity in a six-month window?

The objective behind this question is to understand which predictors from our transactional dataset have the highest impact on model performance.

We will:

(1) check the model scores to identify the best models;

(2) verify which features had the greatest impact on our best models;

(3) re-evaluate our models using only the best predictors to check whether performance improves.

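##### Quick primer on classifier evaluation scores

How to read the confusion matrix and the classification report measures. The quadrants of the matrix (rows are actual classes, columns are predicted classes, with class 1 as the positive class) are:

| | Predicted 0 | Predicted 1 |
|---|---|---|
| Actual 0 | True Negative | False Positive |
| Actual 1 | False Negative | True Positive |

Since our class of interest is 0 (inactive), the True Negatives are the inactive members the model correctly flagged.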

Precision
Measure of how many of the positive predictions made are correct (true positives).
Formula: TP/(TP+FP)

Recall
Measure of how many of the actual positive cases the classifier correctly predicted.
It is sometimes also referred to as Sensitivity.
Formula: TP/(TP+FN)

F1-Score
Harmonic mean of precision and recall.
Formula: 2 × Precision × Recall/(Precision + Recall)

Accuracy
Measure of the number of correct predictions over all predictions.
Formula: (TP+TN)/(TP+TN+FP+FN)

For class 0 the roles swap: Precision 0 = TN/(TN+FN) and Recall 0 = TN/(TN+FP).
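As a check on these formulas, the sketch below recomputes the per-class scores from the confusion matrix printed earlier for the Random Forest with Baseline model (`[[5, 418], [54, 13301]]`). The class 0 values reproduce the Precision 0 (0.084746), Recall 0 (0.01182) and F1-Score 0 (0.020747) reported for that model in the scores table below.

```python
def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a 2x2 confusion matrix laid out
    as [[TN, FP], [FN, TP]] (rows = actual, columns = predicted, 1 = positive)."""
    (tn, fp), (fn, tp) = cm

    def f1(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    # Class 0: its hits are the True Negatives, so the quadrants swap roles.
    p0, r0 = tn / (tn + fn), tn / (tn + fp)
    # Class 1: the textbook formulas.
    p1, r1 = tp / (tp + fp), tp / (tp + fn)
    return {0: (p0, r0, f1(p0, r0)), 1: (p1, r1, f1(p1, r1))}


# Confusion matrix printed for the Random Forest with Baseline model.
metrics = per_class_metrics([[5, 418], [54, 13301]])
for cls, (p, r, f) in metrics.items():
    print(f"class {cls}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
# class 0: precision=0.0847 recall=0.0118 f1=0.0207
# class 1: precision=0.9695 recall=0.9960 f1=0.9826
```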

In [99]:
# Let's transpose the Model Scores Table to get a better look
model_scores_table_T = model_scores_table.T
# Now let's promote the first row as header and drop the index
model_scores_table_T = model_scores_table_T.rename(columns=model_scores_table_T.iloc[0]).drop(model_scores_table_T.index[0]).reset_index(drop=True)
# Let's drop the scores related to predicting the majority class (1)
model_scores_table_T = model_scores_table_T.drop(columns=['Precision 1','Recall 1','F1-Score 1'])
model_scores_table_T

| | Model | Model Name | Resampler Name | TN | FP | FN | TP | Precision 0 | Recall 0 | F1-Score 0 | Support 0 | Support 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `(DecisionTreeClassifier(max_features='sqrt', r...` | Random Forest | Baseline | 5 | 418 | 54 | 13301 | 0.084746 | 0.01182 | 0.020747 | 423 | 13355 |
| 1 | `(DecisionTreeClassifier(max_features='sqrt', r...` | Random Forest | Random Over Sampling | 9 | 414 | 89 | 13266 | 0.091837 | 0.021277 | 0.034549 | 423 | 13355 |
| 2 | `(DecisionTreeClassifier(max_features='sqrt', r...` | Random Forest | SMOTE | 60 | 363 | 391 | 12964 | 0.133038 | 0.141844 | 0.1373 | 423 | 13355 |
| 3 | `(DecisionTreeClassifier(max_features='sqrt', r...` | Random Forest | Near Miss KNN | 5 | 418 | 54 | 13301 | 0.084746 | 0.01182 | 0.020747 | 423 | 13355 |
| 4 | `(DecisionTreeClassifier(max_features='sqrt', r...` | Random Forest | Random Under Sampling | 327 | 96 | 3337 | 10018 | 0.089247 | 0.77305 | 0.16002 | 423 | 13355 |
| 5 | `LogisticRegression(class_weight='balanced', ra...` | Logistic Regression | Baseline | 356 | 67 | 4139 | 9216 | 0.079199 | 0.841608 | 0.144774 | 423 | 13355 |
| 6 | `LogisticRegression(class_weight='balanced', ra...` | Logistic Regression | Random Over Sampling | 357 | 66 | 4101 | 9254 | 0.080081 | 0.843972 | 0.146281 | 423 | 13355 |
| 7 | `LogisticRegression(class_weight='balanced', ra...` | Logistic Regression | SMOTE | 356 | 67 | 3960 | 9395 | 0.082484 | 0.841608 | 0.150243 | 423 | 13355 |
| 8 | `LogisticRegression(class_weight='balanced', ra...` | Logistic Regression | Near Miss KNN | 356 | 67 | 4139 | 9216 | 0.079199 | 0.841608 | 0.144774 | 423 | 13355 |
| 9 | `LogisticRegression(class_weight='balanced', ra...` | Logistic Regression | Random Under Sampling | 349 | 74 | 4290 | 9065 | 0.075232 | 0.825059 | 0.13789 | 423 | 13355 |


In [79]:
print('Maximum: ')
print(model_scores_table_T.drop(columns=['Model','Model Name','Resampler Name']).max(axis = 0))
print('\nMinimum: ')
print(model_scores_table_T.drop(columns=['Model','Model Name','Resampler Name']).min(axis = 0))

Maximum: 
TN                  357
FP                  418
FN                 4290
TP                13301
Precision 0    0.133038
Precision 1    0.992919
Recall 0       0.843972
Recall 1       0.995957
F1-Score 0      0.16002
F1-Score 1     0.982566
Support 0           423
Support 1         13355
dtype: object

Minimum: 
TN                    5
FP                   66
FN                   54
TP                 9065
Precision 0    0.075232
Precision 1    0.969531
Recall 0        0.01182
Recall 1       0.678772
F1-Score 0     0.020747
F1-Score 1     0.805993
Support 0           423
Support 1         13355
dtype: object


## Classification Score Analysis

All models had precision scores for the minority class (Precision 0) ranging from 0.075 to 0.133, recall from 0.012 to 0.844 and F1-score from 0.021 to 0.160. This is a rather distressing signal that, overall, none of the models did a remarkable job at predicting inactivity. They were simply bad at it.

One of the most interesting differences between models can be seen in the confusion matrices' True Negatives, False Positives, False Negatives and True Positives.

The best-case scenario shows that the models did an interesting job of correctly predicting 357 (see the maximum True Negatives) of the 423 inactivity targets (see Support 0) in the test set. That score was achieved by the Logistic Regression with Random Over Sampling model, which, not surprisingly, also had the highest Recall 0 (0.844).

This means the model is more willing to predict the minority cases. A few models made very few negative predictions overall: the Random Forest with Baseline and with Near Miss KNN scored lowest and practically didn't even try to predict the minority cases (only 59 of 13,778 predictions, about 0.4%, were for the minority class).

For this step of the process we will choose the models with the most minority-class predictions. Their recall scores are high, but their F1-scores and precision are low. Further studies could look into fine-tuning these models to reduce the False Negatives, for example by using different class weights, penalties and solvers.
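Both algorithms here already use `class_weight='balanced'` (see the model printouts later in this notebook). As a starting point for experimenting with different class weights, the sketch reproduces the rule scikit-learn documents for `'balanced'`, `n_samples / (n_classes * count_of_class)`, and builds a hypothetical manual variant. The class counts are the test-set supports, used purely for illustration; the real weights come from the (unreported) training counts.

```python
def balanced_class_weights(class_counts):
    """Weights following scikit-learn's class_weight='balanced' rule:
    n_samples / (n_classes * n_samples_in_class)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in class_counts.items()}


# Illustration only: test-set supports, not the training counts.
weights = balanced_class_weights({0: 423, 1: 13355})
print({cls: round(w, 3) for cls, w in weights.items()})  # {0: 16.286, 1: 0.516}

# Hypothetical variant that boosts the minority class further; scikit-learn
# classifiers also accept a {class: weight} dict as class_weight.
boosted = {0: weights[0] * 2, 1: weights[1]}
```

With `'balanced'`, each class contributes the same total weight (weight × count), so the choice of a custom dict is really a choice of how far to tilt that balance toward the inactive class.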

To continue this analysis, I will choose, for each algorithm, a model with a high True Negative count and, among those, the lowest False Negative count: the Random Forest with Random Under Sampling and the Logistic Regression with SMOTE. We will look at their coefficients to get a feel for which features matter (for the Random Forest, these values are feature importances rather than regression coefficients).

In [17]:
# Check the Model Coefficient Table
Model_Coef_Table

| | Features | CoefRandom ForestBaseline | CoefRandom ForestRandom Over Sampling | CoefRandom ForestSMOTE | CoefRandom ForestNear Miss KNN | CoefRandom ForestRandom Under Sampling | CoefLogistic RegressionBaseline | CoefLogistic RegressionRandom Over Sampling | CoefLogistic RegressionSMOTE | CoefLogistic RegressionNear Miss KNN | CoefLogistic RegressionRandom Under Sampling |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DEEP_PIX_202201 | 0.05941 | 0.059683 | 0.047388 | 0.05941 | 0.055802 | -0.108326 | -0.094238 | -0.134778 | -0.108326 | -0.195564 |
| 1 | DEEP_PIX_202202 | 0.057143 | 0.057625 | 0.045647 | 0.057143 | 0.054875 | -0.244472 | -0.257599 | -0.288997 | -0.244472 | -0.196759 |
| 2 | DEEP_PIX_202203 | 0.065239 | 0.06635 | 0.050336 | 0.065239 | 0.060874 | -0.296879 | -0.29082 | -0.303673 | -0.296879 | -0.315867 |
| 3 | DEEP_PIX_202204 | 0.076184 | 0.071977 | 0.056631 | 0.076184 | 0.073048 | -0.206332 | -0.210498 | -0.226016 | -0.206332 | -0.202132 |
| 4 | DEEP_PIX_202205 | 0.147647 | 0.151696 | 0.123374 | 0.147647 | 0.150976 | 2.584145 | 2.573109 | 3.268187 | 2.584145 | 2.595444 |
| 5 | DEEP_BILLS_202201 | 0.018865 | 0.019921 | 0.017487 | 0.018865 | 0.018165 | -0.149018 | -0.159326 | -0.14537 | -0.149018 | -0.257109 |
| 6 | DEEP_BILLS_202202 | 0.017776 | 0.017891 | 0.015607 | 0.017776 | 0.018118 | -0.071881 | -0.065141 | -0.046134 | -0.071881 | -0.0628 |
| 7 | DEEP_BILLS_202203 | 0.018481 | 0.018387 | 0.016546 | 0.018481 | 0.01726 | -0.103426 | -0.109691 | -0.1523 | -0.103426 | -0.115335 |
| 8 | DEEP_BILLS_202204 | 0.020911 | 0.02064 | 0.023597 | 0.020911 | 0.019468 | 0.126906 | 0.135689 | 0.136148 | 0.126906 | 0.126297 |
| 9 | DEEP_BILLS_202205 | 0.037868 | 0.03901 | 0.040325 | 0.037868 | 0.042316 | 0.45189 | 0.462249 | 0.669784 | 0.45189 | 0.483864 |
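The table shows only the first 10 of the 40 features, and the notebook does not show how the top 10 predictors used in the next cell were picked. One plausible way to rank them (an assumption, not necessarily the author's procedure) is to sort features by the absolute value of their Logistic Regression + SMOTE coefficients; the sketch uses the values visible above. Assuming these come from scikit-learn's `coef_` for a 0/1 target, positive values push predictions toward class 1 (active), so the large positive `DEEP_PIX_202205` weight ties more PIX activity in May to staying active in June. Comparing magnitudes is only fair if the features share a scale (e.g. after standardization), and whether `utils` scales them is not shown. The Random Forest columns are already non-negative importances and can be sorted directly.

```python
# Logistic Regression + SMOTE coefficients for the rows shown in the table above.
coefs = {
    "DEEP_PIX_202201": -0.134778,
    "DEEP_PIX_202202": -0.288997,
    "DEEP_PIX_202203": -0.303673,
    "DEEP_PIX_202204": -0.226016,
    "DEEP_PIX_202205": 3.268187,
    "DEEP_BILLS_202201": -0.14537,
    "DEEP_BILLS_202202": -0.046134,
    "DEEP_BILLS_202203": -0.1523,
    "DEEP_BILLS_202204": 0.136148,
    "DEEP_BILLS_202205": 0.669784,
}


def top_k_by_magnitude(weights, k):
    """Return the k feature names with the largest absolute weight."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)[:k]


print(top_k_by_magnitude(coefs, 3))
# ['DEEP_PIX_202205', 'DEEP_BILLS_202205', 'DEEP_PIX_202203']
```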


In [11]:
#Declare independent variables (X) and dependent variable (y)

# Based on our prior analysis, I've decided to test the performance of the models with fewer predictors.
# So we will drop most of them and keep only the top 10.

# To avoid writing them out every time, we save the names of the model's predictors (features) in a list.
independent_variables=[# top 10 predictors
            'DEEP_PIX_202205',
            'DEEP_BILLS_202204',
            'DEEP_BILLS_202205',
            'DEEP_CARDS_202205',
            'DEEP_CHECKING_202205',
            'DEEP_PAYMENTS_202205',
            'AMP_202202',
            'AMP_202203',
            'AMP_202204',
            'AMP_202205'
           ]

X1 = data[independent_variables]
y1 = data['FLG_202206']

In [12]:
#set shared variables
random_state = 26
test_size = 0.15
verbose = 'off'
#set model names
models = ['Random Forest',
          'Logistic Regression',
         ]
#set resampling names
resamplers = [
              'Baseline',
              'Random Over Sampling',
              'SMOTE',
              'Near Miss KNN',
              'Random Under Sampling',
             ]

#Instantiate empty list to hold the revisited models
model_prediction_revisited = []

for model_name in models:
    for resampler_name in resamplers:
        model,cr,cm,precision,recall,fbeta_score,support = u.model_predict(model_name,resampler_name,X1,y1,random_state,test_size,verbose)
        model_prediction_revisited.append((model,cr,cm,precision,recall,fbeta_score,support,model_name,resampler_name))
        



--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

 Random Forest with Baseline  Classification Report:
              precision    recall  f1-score   support

           0       0.11      0.38      0.17       423
           1       0.98      0.90      0.94     13355

    accuracy                           0.88     13778
   macro avg       0.54      0.64      0.55     13778
weighted avg       0.95      0.88      0.91     13778

[[  162   261]
 [ 1337 12018]]

Total processing time: --- 4.432544708251953 seconds ---

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

 Random Forest with Random Over Sampling  Classification Report:
              precision    recall  f1-score   support

           0       0.10      0.41      0.16       423
           1       0.98      

In [13]:
# Create Model Coefficient Table
Model_Coef_Table = u.Model_Coef_Table(model_prediction_revisited,X1)

RandomForestClassifier(class_weight='balanced', random_state=26)
RandomForestClassifier(class_weight='balanced', random_state=26)
RandomForestClassifier(class_weight='balanced', random_state=26)
RandomForestClassifier(class_weight='balanced', random_state=26)
RandomForestClassifier(class_weight='balanced', random_state=26)
LogisticRegression(class_weight='balanced', random_state=26)
LogisticRegression(class_weight='balanced', random_state=26)
LogisticRegression(class_weight='balanced', random_state=26)
LogisticRegression(class_weight='balanced', random_state=26)
LogisticRegression(class_weight='balanced', random_state=26)


# Evaluation

Though the exploratory analysis indicated the possibility of finding a correlation between transaction patterns and inactivity, the two classifiers and 4 resampling techniques used did not perform well on this highly imbalanced dataset.

The models just didn't perform well, unfortunately. But hey, this is a scientific approach: knowing that something doesn't work is also a valid result, since it clears false leads from your line of sight.

All Logistic Regression models had precision scores ranging from 0.08 to 0.09, recall from 0.83 to 0.85 and F1-score at exactly 0.15. The main differences are visible in the confusion matrices, with slight variations in the true/false positive/negative predictions. The Random Forest with Random Under Sampling had similar measures: precision at 0.09, recall at 0.77 and F1-score at 0.16.
Example of Classification Report and Confusion Matrix for the Logistic Regression with Baseline model:

The models actually did an interesting job of predicting 325+ of the 423 inactivity targets in the test set (you can see this in the confusion matrix's top-left quadrant: 358 in the example above). That is why the recall (or sensitivity) is high.

This means the model is more willing to predict the minority cases (the Random Forest Baseline practically didn't even try to predict them: in its report it classified only 15 members as negatives, and 14 of them were false; see the screenshot below).

# Conclusion

Unfortunately, this project doesn't seem to provide strong evidence towards answering the business questions posed, either positively or negatively.

Our exploratory analysis shows there is a potential correlation to be explored between inactivity, depth (especially PIX) and amplitude. But the classifier models, at least in their present configuration, haven't produced promising results.

### Recommendations on future studies

1. Study the use of time series prediction techniques as a substitute for classifiers.
2. Using the accumulated transactional variation over the 5 months prior to the 6th-month inactivity prediction may yield better results than using the absolute number of transactions per month as features.
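One possible reading of recommendation 2, sketched under that assumption: turn each member's monthly counts into month-over-month deltas plus their running sum (the change relative to the first month). The input below is the `NUM_TRANSACTIONS_202201` to `NUM_TRANSACTIONS_202205` values of the first member shown by `df.head()`.

```python
from itertools import accumulate


def variation_features(monthly_counts):
    """Month-over-month deltas and their running (accumulated) sum.

    The accumulated value at position i is the change from the first
    month to month i + 2.
    """
    deltas = [curr - prev for prev, curr in zip(monthly_counts, monthly_counts[1:])]
    return deltas, list(accumulate(deltas))


# NUM_TRANSACTIONS_202201..202205 for the first member in df.head().
deltas, accumulated = variation_features([68, 86, 130, 100, 68])
print(deltas)       # [18, 44, -30, -32]
print(accumulated)  # [18, 62, 32, 0]
```

Such relative features would be insensitive to how heavy a user a member is, which is the point of the recommendation: a drop from 130 to 68 transactions may say more about upcoming inactivity than the 68 itself.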