# AD&D Churn Analysis And Modeling


To:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Magnimind](https://magnimindacademy.com/)

From: Matt Curcio, matt.curcio.us@gmail.com

Date: 2022-12-27

Re:&nbsp;&nbsp;&nbsp;&nbsp; Churn Analysis from 10/5/2022 to 11/5/2022

---

## PCA Biplot of Churn Data Using sklearn.decomposition & Plotly

- Using `sklearn.decomposition.PCA` together with Plotly does not provide the same level of information that was easily gathered from the [Erdogant_PCA_Library](https://erdogant.github.io/pca/pages/html/index.html). 

- I recommend that the [Erdogant_PCA_Library](https://erdogant.github.io/pca/pages/html/index.html) be used **over** this scikit-learn package.


### Review of `scikit-learn PCA`

This notebook is my own test to quickly produce and review the biplots of two Python libraries:
1. [sklearn.decomposition from scikit-learn PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
   
2. [Erdogant-PCA](https://erdogant.github.io/pca/pages/html/index.html)
   
The technical notes for `sklearn.decomposition` state that `PCA` uses the LAPACK implementation of the full SVD and can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.\* 

\* [sklearn.decomposition from scikit-learn PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
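
A minimal sketch of the two solvers mentioned above, assuming only NumPy and scikit-learn and using made-up random data rather than the churn file. `svd_solver="full"` selects the LAPACK path and `svd_solver="arpack"` the ARPACK truncated SVD; on the same data the explained variances should agree to numerical precision, while the signs of the component vectors are not guaranteed to match.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 500 rows, 6 correlated numeric features
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))

# LAPACK full SVD
pca_full = PCA(n_components=2, svd_solver="full").fit(X)
# ARPACK truncated SVD (n_components must be < min(n_samples, n_features))
pca_arpack = PCA(n_components=2, svd_solver="arpack", random_state=0).fit(X)

print(pca_full.explained_variance_ratio_)
print(pca_arpack.explained_variance_ratio_)
```

For a dense table of this size the full solver is the simple choice; ARPACK mainly pays off when only a few components of a large matrix are needed.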

While this is fine, in my opinion it does not make retrieving and graphing the loadings easy:

1. First, list the features to be investigated.
   
2. Instantiate the estimator, then fit and transform the data, which is common practice for most scikit-learn modules.
   
3. Then calculate the loadings and graph each one individually on the figure. 
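
The three steps above can be sketched as one helper. The function name `pca_loadings` and the demo frame are hypothetical, and the sketch assumes every selected column is numeric; the loading formula (each eigenvector scaled by the square root of its explained variance) is the same one used in this notebook's plotting cell.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def pca_loadings(df, features, n_components=2):
    """Hypothetical helper: select features, fit/transform, compute loadings."""
    # Step 1: restrict the frame to the features under investigation
    X = df[features].to_numpy(dtype=float)
    # Step 2: instantiate the estimator, then fit it and project the data
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)
    # Step 3: scale each eigenvector by the square root of its eigenvalue
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    cols = [f"PC{i + 1}" for i in range(n_components)]
    return scores, pd.DataFrame(loadings, index=features, columns=cols)


# Usage on a made-up frame (not the churn data)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
scores, load = pca_loadings(demo, ["a", "b", "c"])
print(load.round(3))
```

Returning the loadings as a labeled DataFrame keeps each vector tied to its feature name, which is what the plotting loop needs.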
   
The benefit of this process was that I was able to use Plotly to view all the data points along with the loading vectors. Plotly lets researchers zoom in and out to examine the relationships between the loadings. 

### Results: 

- Only three loading vectors appeared significant in length and direction:
  1. T_D_Min   
  2. T_E_Min   
  3. T_N_Min 
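
The "significant in length" judgment can be made numeric by ranking each feature's loading vector by its Euclidean length in the PC1-PC2 plane. This is a sketch: `rank_loading_lengths` is a hypothetical helper, the demo loadings are invented (they are not the churn results), and any cutoff for "significant" remains a judgment call.

```python
import numpy as np
import pandas as pd


def rank_loading_lengths(loadings, features):
    """Sort features by the length of their (PC1, PC2) loading vector."""
    loadings = np.asarray(loadings, dtype=float)
    lengths = np.linalg.norm(loadings[:, :2], axis=1)
    return pd.Series(lengths, index=features).sort_values(ascending=False)


# Invented loadings for four hypothetical features
demo = np.array([[3.0, 4.0], [0.1, 0.2], [1.0, 0.0], [0.0, 0.5]])
ranked = rank_loading_lengths(demo, ["f1", "f2", "f3", "f4"])
print(ranked)
```

Passing the `loadings` array and `features` list from the plotting cell would give the same ranking for the churn features.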


Future work: graphing the value of each loading next to its vector is one task I plan to undertake.
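
One possible approach, sketched under the assumption that a text label is enough: build a "name (x, y)" string per feature from the loadings array. The helper `loading_labels` and the example values and names are hypothetical.

```python
import numpy as np


def loading_labels(loadings, features, decimals=2):
    """Build 'feature (x, y)' strings from the first two loading columns."""
    loadings = np.asarray(loadings, dtype=float)
    return [
        f"{name} ({x:.{decimals}f}, {y:.{decimals}f})"
        for name, (x, y) in zip(features, loadings[:, :2])
    ]


# Invented values for two hypothetical features
labels = loading_labels([[1.234, -0.5], [0.0, 2.0]], ["feature_a", "feature_b"])
print(labels)  # ['feature_a (1.23, -0.50)', 'feature_b (0.00, 2.00)']
```

In the plotting loop, passing `text=labels[i]` instead of `text=feature` to `fig.add_annotation` would place these strings at the vector tips.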


### Introduction

Principal Component Analysis (PCA) is an unsupervised dimensionality-reduction technique. It finds orthogonal directions that successively maximize the variance in a high-dimensional dataset and ranks them by the variance they capture, in order to:

1. determine where multicollinearity exists between the features/variables
   
2. reduce the number of feature dimensions when it is large, for simplification
   
3. remove noise from a signal and/or data compression
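
The variance ranking described above can be checked against an eigendecomposition of the sample covariance matrix. A sketch on invented data with one nearly collinear pair of columns: the sorted eigenvalues should match `explained_variance_` from scikit-learn, and the near-zero smallest eigenvalue is the multicollinearity signal from item 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented data: column 1 is almost exactly 2 x column 0; column 2 is independent
rng = np.random.default_rng(42)
base = rng.normal(size=(300, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(300, 1)),
               rng.normal(size=(300, 1))])

# Eigendecomposition of the sample covariance (n - 1 denominator, as PCA uses)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # rank variances, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pca = PCA().fit(X)
print(np.round(eigvals, 4))
print(np.round(pca.explained_variance_, 4))
print(np.round(pca.explained_variance_ratio_, 4))
```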

```python
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
```

```python
# Cleaned data
path = '../data/processed/'
fileName = "mcc_clean_churn.csv"

# Data pre-processed in the ADD_Initial_Data_Analysis notebook
df = pd.read_csv(path + fileName, header=0)

# Convert 3 vars to categorical
df['Int_Plan'] = pd.Categorical(df['Int_Plan'])
df['VM_Plan'] = pd.Categorical(df['VM_Plan'])
df['Churned'] = pd.Categorical(df['Churned'])
```

```python
features = ['Act_Len', 'Int_Plan', 'VM_Plan', 'Num_VM',
            'T_D_Min', 'T_D_Calls', 'T_D_Charge',
            'T_E_Min', 'T_E_Calls', 'T_E_Charge',
            'T_N_Min', 'T_N_Calls', 'T_N_Charge',
            'T_I_Min', 'T_I_Calls', 'T_I_Charge',
            'Num_Srv_Calls']
X = df[features]

# Project the data onto the first two principal components
pca = PCA(n_components=2)
components = pca.fit_transform(X)

# Loadings: eigenvectors scaled by the square root of their eigenvalues
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Scatter of the PC scores, colored by churn status
fig = px.scatter(components, x=0, y=1,
                 title='PCA Biplot of PC1 vs PC2',
                 opacity=0.1, color=df['Churned'])
fig.update_layout(xaxis_title="Principal Component 1",
                  yaxis_title="Principal Component 2")

# Draw each loading vector from the origin and label it with its feature name
for i, feature in enumerate(features):
    fig.add_shape(
        type='line',
        x0=0, y0=0,
        x1=loadings[i, 0],
        y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0],
        y=loadings[i, 1],
        ax=0, ay=0,
        xanchor="center",
        yanchor="bottom",
        text=feature,
    )
fig.show()
```
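
`StandardScaler` is imported in the first cell but the PCA above is fit on unscaled data. PCA is sensitive to feature scale, so columns with larger variance tend to dominate the first components. The sketch below, on an invented three-column frame (the column names are hypothetical), compares a raw fit with a `StandardScaler` + `PCA` pipeline; whether scaling changes the churn biplot is not tested here.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented frame: one column on a much larger scale than the others
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    "minutes": rng.normal(180, 50, size=400),        # variance ~2500
    "calls": rng.normal(100, 20, size=400),          # variance ~400
    "service_calls": rng.poisson(1.5, size=400).astype(float),
})

# Raw fit: PC1 is driven almost entirely by the large-variance column
raw = PCA(n_components=2).fit(demo)
# Standardized fit: every column enters with unit variance
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(demo)

print(raw.explained_variance_ratio_)
print(scaled.named_steps["pca"].explained_variance_ratio_)
```

Because `make_pipeline` names steps after their lowercased class names, the fitted PCA is reached through `named_steps["pca"]`.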