# Install Library

[ppscore](https://github.com/8080labs/ppscore) is a library, released in April 2020, that can be used for:
- Finding data patterns
- Feature selection
- Detecting data leakage

Let's check how it works.

In [None]:
!pip install ppscore

# Import Library

In [None]:
import numpy as np 
import pandas as pd 
import ppscore as pps
import seaborn as sns
import matplotlib.pyplot as plt
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
PATH = '/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv'
df = pd.read_csv(PATH)
df.shape

In [None]:
list(df.columns)

In [None]:
df.dtypes

# Get PPS Scores

In [None]:
%%time
pps.score(df, 'InternetService', 'Churn')

In [None]:
%%time
pps.score(df, 'PaymentMethod', 'Churn')

# Get a PPS matrix

In [None]:
%%time
df_matrix = pps.matrix(df)

The predictive power score (PPS) can be computed directly between categorical and numerical features, which is not possible with a Pearson correlation matrix.
It can also capture nonlinear relationships between features.
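
To make the nonlinear point concrete, here is a minimal sketch on synthetic data where `y = x**2`. Pearson correlation is essentially zero, while a PPS-style score is clearly positive. The helper `toy_predictive_score` is a hypothetical stand-in written for this illustration: it uses a binned-median predictor evaluated in-sample, and it is not how ppscore computes its score internally.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def toy_predictive_score(x, y, n_bins=10):
    """Illustrative PPS-like score: 1 - MAE(model) / MAE(naive median baseline).

    The "model" predicts the median of y inside equal-width bins of x.
    This is an assumption made for the sketch, not ppscore's implementation.
    """
    # Naive baseline: always predict the overall median of y
    baseline_mae = np.mean(np.abs(y - np.median(y)))

    # Assign each x to one of n_bins equal-width bins (indices 0 .. n_bins - 1)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

    # Predict the per-bin median of y
    preds = np.empty_like(y, dtype=float)
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            preds[mask] = np.median(y[mask])

    model_mae = np.mean(np.abs(y - preds))
    return float(max(0.0, 1.0 - model_mae / baseline_mae))

x = np.linspace(-1.0, 1.0, 201)  # symmetric around zero
y = x ** 2                       # purely nonlinear dependence on x

r = pearson(x, y)
score = toy_predictive_score(x, y)
print(f"Pearson r: {r:.3f}, toy predictive score: {score:.3f}")
```

Because `x` is symmetric around zero, the linear correlation cancels out even though `y` is fully determined by `x`.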
 
From the following PPS matrix, we find:

- In the "Churn" row, "tenure", "MonthlyCharges" and "TotalCharges" have relatively high PPS, which is intuitively reasonable.
- In the "MonthlyCharges" row, features such as "InternetService", "OnlineSecurity" and "StreamingTV", each of which adds to a customer's bill, are strongly related to "MonthlyCharges". These features also have relatively high PPS, which suggests that they may carry similar information.
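
The second observation can also be checked programmatically instead of by eye. The sketch below lists the feature pairs whose score exceeds a threshold in a square PPS matrix (rows as targets, columns as features). The matrix values, the helper name `strong_pairs` and the threshold of 0.5 are made up for illustration; they are not results from the Telco data.

```python
import pandas as pd

# Hypothetical square PPS matrix: rows = target, columns = feature (values are invented)
features = ["MonthlyCharges", "InternetService", "StreamingTV", "gender"]
toy_matrix = pd.DataFrame(
    [
        [1.0, 0.7, 0.4, 0.0],
        [0.6, 1.0, 0.3, 0.0],
        [0.3, 0.3, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ],
    index=features,
    columns=features,
)

def strong_pairs(matrix, threshold=0.5):
    """Return (target, feature, ppscore) rows at or above threshold, excluding the diagonal."""
    pairs = (
        matrix.rename_axis(index="target")
        .reset_index()
        .melt(id_vars="target", var_name="feature", value_name="ppscore")
    )
    pairs = pairs[pairs["target"] != pairs["feature"]]   # drop self-scores
    pairs = pairs[pairs["ppscore"] >= threshold]         # keep strong relations only
    return pairs.sort_values("ppscore", ascending=False).reset_index(drop=True)

result = strong_pairs(toy_matrix, threshold=0.5)
print(result)
```

Pairs that show up in both directions are candidates for carrying overlapping information, so one of them might be dropped during feature selection.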

In [None]:
plt.figure(figsize=(18,18))
sns.heatmap(df_matrix, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
plt.show()
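
Depending on the installed ppscore version, `pps.matrix` may return a long-format DataFrame (one row per feature/target pair, with columns such as `x`, `y` and `ppscore`) instead of a square matrix, in which case the heatmap call above would fail. The sketch below, which assumes those column names, pivots such a frame into square form and leaves an already square frame untouched. `to_square` is a hypothetical helper and the toy values are invented.

```python
import pandas as pd

def to_square(matrix_df):
    """Pivot a long-format PPS frame (columns x, y, ppscore) into rows = target, columns = feature.

    If the expected columns are missing, the frame is assumed to be square already.
    """
    if {"x", "y", "ppscore"}.issubset(matrix_df.columns):
        return matrix_df.pivot(index="y", columns="x", values="ppscore")
    return matrix_df

# Toy long-format frame with invented scores
toy_long = pd.DataFrame(
    {
        "x": ["tenure", "tenure", "Churn", "Churn"],
        "y": ["tenure", "Churn", "tenure", "Churn"],
        "ppscore": [1.0, 0.2, 0.1, 1.0],
    }
)

square = to_square(toy_long)
print(square)
```

In the notebook this would be used as `sns.heatmap(to_square(df_matrix), ...)`, so the plot works whichever format `pps.matrix` returned.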