This notebook focuses on data visualization. The dataset contains bank customer records, with attributes such as Age, Gender, and CreditScore that can help predict churn. Our goal is to uncover hidden patterns in customer churn.

First, we need to import the libraries.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import warnings
warnings.filterwarnings("ignore", category=UserWarning) 

In [None]:
# Display the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Let's read the dataset into a pandas DataFrame.

In [None]:
df = pd.read_csv('../input/predicting-churn-for-bank-customers/Churn_Modelling.csv')

Let's do some basic checks of the dataset.

In [None]:
df.head()
print('------------ Dataset Info -------------')
df.info()
print('\n This dataset has {0} rows and {1} columns'.format(df.shape[0],df.shape[1]))
print('\n The Columns in this Dataset are')
list(df.columns.values)

### Missing Values
Fortunately, this dataset doesn't have any missing values.

In [None]:
df.isnull().sum()

Columns such as RowNumber, CustomerId, and Surname won't help us find churn patterns, because their values are mostly unique to each customer.

In [None]:
df.drop(columns=['RowNumber', 'CustomerId', 'Surname'], inplace = True)
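# .copy() gives an independent DataFrame; plain `df_1 = df` would just be a second name for df
df_1 = df.copy()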

In [None]:
df.head()
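One quick way to back up the claim that identifier-like columns carry no churn signal is to compute the share of distinct values per column: a ratio near 1.0 means almost every row has its own value. The sketch below is only an illustration; the helper name `unique_ratio`, the 0.9 cut-off, and the toy rows are all assumptions, not values from `Churn_Modelling.csv`.

```python
import pandas as pd

def unique_ratio(frame: pd.DataFrame) -> pd.Series:
    """Share of distinct values per column (1.0 means every row is unique)."""
    return frame.nunique() / len(frame)

# Made-up rows with the same column names as the bank dataset
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [101, 102, 103, 104],
    "Surname": ["Smith", "Jones", "Brown", "Taylor"],
    "Geography": ["France", "Spain", "France", "Spain"],
})

ratios = unique_ratio(toy)
# Columns where (almost) every value is distinct behave like identifiers
id_like = ratios[ratios > 0.9].index.tolist()
print(id_like)  # ['RowNumber', 'CustomerId', 'Surname']
```

On the real data the same helper could be run on the freshly loaded CSV, before the identifier columns are dropped.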

We can see that columns like NumOfProducts, HasCrCard, and IsActiveMember are stored as numbers, but they are actually categorical variables (as is the target, Exited).
Let's cast them to the `object` dtype so they are treated as categorical.

In [None]:
df[['NumOfProducts','HasCrCard','IsActiveMember','Exited']]= df[['NumOfProducts','HasCrCard','IsActiveMember','Exited']].astype('object')

In [None]:
df.dtypes
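Numeric columns that really behave like categories can also be spotted automatically by flagging columns with only a handful of distinct values. Below is a minimal sketch; the helper `low_cardinality_numeric`, the threshold of 5 distinct values, and the toy rows are illustrative assumptions. It casts the flagged columns to pandas' `category` dtype, an alternative to the `object` cast used above; both are excluded by `select_dtypes(include="number")`.

```python
import pandas as pd

def low_cardinality_numeric(frame: pd.DataFrame, max_unique: int = 5) -> list:
    """Return numeric columns with at most `max_unique` distinct values."""
    numeric = frame.select_dtypes(include="number")
    return [col for col in numeric.columns if numeric[col].nunique() <= max_unique]

# Made-up rows; CreditScore has many distinct values, the others only a few
toy = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850, 645],
    "NumOfProducts": [1, 1, 3, 2, 1, 2],
    "HasCrCard": [1, 0, 1, 0, 1, 1],
    "Exited": [1, 0, 1, 0, 0, 1],
})

flagged = low_cardinality_numeric(toy)
for col in flagged:
    toy[col] = toy[col].astype("category")  # replace each column with a categorical version

print(flagged)  # ['NumOfProducts', 'HasCrCard', 'Exited']
print(toy.select_dtypes(include="number").columns.tolist())  # ['CreditScore']
```

The threshold is a judgment call: a column such as Tenure (0 to 10) would need a higher limit to be flagged, and whether it should be treated as categorical is a modelling decision.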

In [None]:
num_col = df.select_dtypes(include=[np.number]).columns.values
# np.object was removed in NumPy 1.24; the plain string 'object' works in every version
cat_col = df.select_dtypes(include=['object']).columns.values
print('The Numerical variables are ')
list(num_col)
print('The Categorical variables are ')
list(cat_col)
print('There are {0} numerical columns and {1} categorical columns'.format(len(num_col), len(cat_col)))

In [None]:
df.describe().T

## Data Visualization
Why visualize the data?

1. To understand trends and patterns in the data.
2. To analyze frequencies and other characteristics of the data.
3. To learn the distribution of each variable.
4. To visualize relationships that may exist between variables.

### Data Distribution Plots
Data distribution plots show how the data is distributed: its central tendency, skewness, and any significant outliers.
Let's look at some distribution plots.
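The same properties can also be summarized numerically. The sketch below reports the median as a measure of central tendency and flags outliers with the common 1.5 * IQR rule; the rule is a convention chosen here for illustration (the plots themselves do not apply it), and the ages are made up.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Made-up ages; 92 sits far above the rest
ages = pd.Series([25, 28, 30, 31, 33, 35, 36, 38, 40, 92])
print(ages.median())                # 34.0, the central tendency
print(iqr_outliers(ages).tolist())  # [92]
```

In this notebook the helper could be applied column by column, for example `iqr_outliers(df['Age'])`.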

#### Histogram for Numerical Variables

In [None]:
num_col

In [None]:
plt.figure(figsize = (15,10))
for idx, col in enumerate(num_col):
    plt.subplot(3, 2, idx + 1)
    _ = sns.histplot(df, x=col, hue='Exited', multiple="dodge")

Observations from the above histograms
  1. Apart from the large number of customers with a zero balance, balances look roughly symmetric. Credit score appears slightly left-skewed.
  2. Estimated salary is spread almost evenly across its range, so its histogram is nearly flat, and its shape looks much the same for customers who exited and those who stayed. This variable may not be very helpful in predicting churn.
  3. Most of our customers are between 28 and 40 years old.
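Reading skewness off a histogram can be cross-checked with the sample skewness statistic: negative values indicate a longer left tail, positive values a longer right tail, and values near zero a symmetric shape. A small sketch on made-up numbers follows; on the notebook's data, `df[num_col].skew()` would compute the same statistic for every numerical column.

```python
import pandas as pd

# Made-up values: most scores are high, a few low ones stretch the left tail
scores = pd.Series([400, 520, 610, 650, 680, 700, 710, 720, 730, 740])
# Evenly spaced values, like a uniformly distributed variable
flat = pd.Series(range(0, 100, 10))

print(round(scores.skew(), 2))  # negative, so left-skewed
print(round(flat.skew(), 2))    # about 0, so symmetric
```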

In [None]:
_= sns.displot(df, x="CreditScore", col="Exited", multiple="dodge")

In [None]:
_= sns.displot(df, x="Age", col="Gender",hue = 'Exited', multiple="dodge")

Observation from the above plot: female customers aged 50 to 60 are more likely to churn.
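A claim like this can be quantified with churn rates per group: bucket ages into bands with `pd.cut`, then take the mean of the 0/1 target within each (Gender, AgeBand) cell. The sketch uses made-up rows, and the decade-wide bands are an assumption. In this notebook `Exited` was cast to `object`, so it would need `df['Exited'].astype(int)` before averaging.

```python
import pandas as pd

# Made-up rows with the same column names as the bank dataset
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Age":    [35, 55, 58, 33, 52, 57],
    "Exited": [0, 1, 1, 0, 0, 1],
})

# Bucket ages into decades; right=False makes the bins [50, 60) and so on
toy["AgeBand"] = pd.cut(toy["Age"], bins=[20, 30, 40, 50, 60, 70], right=False)

# The mean of a 0/1 target is the churn rate of each (Gender, AgeBand) cell;
# observed=True keeps only the bands that actually occur
churn_rate = toy.groupby(["Gender", "AgeBand"], observed=True)["Exited"].mean()
print(churn_rate)
```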

In [None]:
# displot creates its own figure, so size it with height= instead of plt.figure()
_ = sns.displot(df, x="Tenure", y="Balance", hue="Exited", kind="kde", height=8)

In [None]:
#pair plot for numerical columns only
sns.pairplot(df[num_col])

#### Joint Plot for Two Numerical Variables

In [None]:
_ = sns.jointplot(data = df, x='Balance', y = 'EstimatedSalary')

In [None]:
g = sns.jointplot(data = df, x='CreditScore', y = 'Age')
g.plot_joint(sns.histplot)
#g.plot_marginals(sns.boxplot)

#### Count Plot for Categorical Variables

In [None]:
fig, ax= plt.subplots(2, 3, figsize=(20,12))
_=sns.countplot(x='Gender', data = df,hue='Exited', ax= ax[0][0])
_=sns.countplot(x='Geography', data = df,hue='Exited', ax= ax[0][1])
_=sns.countplot(x='NumOfProducts', data = df,hue='Exited', ax= ax[0][2])
_=sns.countplot(x='HasCrCard', data = df,hue='Exited', ax= ax[1][0])
_=sns.countplot(x='IsActiveMember', data = df,hue='Exited', ax= ax[1][1])
_=sns.countplot(x='Exited', data = df, ax= ax[1][2])

Observations from the above plots
   1. Female customers leave the bank more often than male customers.
   2. Germany has a higher churn ratio than Spain and France, and France has the lowest churn ratio.
   3. Customers with only one product churn more than those with two, and customers with more than two products are the most likely to leave.
   4. Inactive members leave the bank more often than active members.
   5. The dataset is imbalanced: roughly 2,000 customers exited and roughly 8,000 stayed.
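Count plots show absolute numbers, while observations about churn *ratios* are easier to read from a row-normalized crosstab, which gives the share of stayers (0) and leavers (1) within each category. `value_counts(normalize=True)` shows the overall class balance in the same way. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows with the same column names as the bank dataset
toy = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany",
                  "Spain", "Spain", "France", "Germany"],
    "Exited":    [0, 0, 1, 1, 0, 1, 0, 0],
})

# Each row sums to 1: the fraction of customers in that country who stayed (0) or left (1)
rates = pd.crosstab(toy["Geography"], toy["Exited"], normalize="index")
print(rates)

# Overall share of each class of the target
balance = toy["Exited"].value_counts(normalize=True)
print(balance)
```

Replacing `Geography` with `Gender`, `NumOfProducts`, or `IsActiveMember` gives the rates behind the other observations.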

Let's split the customers by geography, keeping only those with a non-zero balance.

In [None]:
df_fr= df[(df['Geography'] == 'France') & (df['Balance'] != 0)]
df_sp= df[(df['Geography'] == 'Spain') & (df['Balance'] != 0)]
df_Gr= df[(df['Geography'] == 'Germany') & (df['Balance'] != 0)]
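The three hand-written filters can also be produced in one pass with `groupby`, giving a dictionary keyed by country; `by_country['Germany']` would then play the role of `df_Gr`. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; France only has zero balances here
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France", "Germany"],
    "Balance":   [0.0, 8000.0, 1500.0, 0.0, 2200.0],
})

# Keep customers with a non-zero balance, then split by country
nonzero = toy[toy["Balance"] > 0]
by_country = {country: group for country, group in nonzero.groupby("Geography")}

print(sorted(by_country))          # ['Germany', 'Spain']
print(len(by_country["Germany"]))  # 2
```

One difference to keep in mind: a country with no matching rows (France in this toy data) is simply absent from the dictionary, whereas the explicit filters would return an empty DataFrame.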

#### Box Plot

In [None]:
plt.figure(figsize=(15,8))
plt.subplot(1,3,1)
_= sns.boxplot(x= 'Gender', y= 'Age', data = df_Gr, hue = 'Exited')
plt.title('For Germany')
plt.subplot(1,3,2)
_= sns.boxplot(x= 'Gender', y= 'Age', data = df_fr, hue = 'Exited')
plt.title('For France')
plt.subplot(1,3,3)
_= sns.boxplot(x= 'Gender', y= 'Age', data = df_sp, hue = 'Exited')
plt.title('For Spain')

#### Bar Plot
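Bar plots are a classic. Each bar shows an estimate of central tendency (by default in seaborn, the mean) of a numerical variable for each category on the x-axis, and the error bar shows the uncertainty of that estimate.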

In [None]:
plt.figure(figsize=(10,5))
_= sns.barplot(x= 'Tenure', y= 'Balance', data = df, hue = 'Exited')
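By default seaborn draws each bar at the group mean and adds a 95% bootstrap confidence interval as the error bar. The sketch below reproduces that idea by hand for a single group using a percentile bootstrap; the resample count, seed, and balances are arbitrary assumptions, and seaborn's own implementation differs in detail.

```python
import numpy as np

def bootstrap_ci(values, n_boot=2000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples with replacement and record the mean of each
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - ci) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

# Made-up balances for one (Tenure, Exited) group
balances = [0.0, 80000.0, 120000.0, 0.0, 140000.0, 110000.0]
low, high = bootstrap_ci(balances)
print(np.mean(balances), low, high)  # bar height and the ends of its error bar
```

With only a handful of rows per group the interval is wide, which is one reason error bars in the plots above can overlap heavily.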

In [None]:
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
_= sns.barplot(x= 'HasCrCard', y= 'EstimatedSalary', data = df, hue = 'Exited')
plt.subplot(1,2,2)
_= sns.barplot(x= 'HasCrCard', y= 'CreditScore', data = df, hue = 'Exited')

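The above plots show no conclusive pattern: the average estimated salary and credit score look similar regardless of credit card ownership or churn. Estimated salary does not appear to be related to the other variables examined here.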
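"No relationship" can be checked numerically with the Pearson correlation between each numerical column and the 0/1 target (for a binary variable this is the point-biserial correlation). The sketch uses made-up rows in which salary is deliberately unrelated to churn and age is strongly related. In this notebook `Exited` would first need `astype(int)`, because it was cast to `object`.

```python
import pandas as pd

# Made-up rows: leavers and stayers have the same average salary but different ages
toy = pd.DataFrame({
    "EstimatedSalary": [60000, 90000, 120000, 90000, 60000, 120000],
    "Age":             [30, 55, 35, 28, 60, 52],
    "Exited":          [0, 1, 0, 0, 1, 1],
})

target = toy["Exited"].astype(int)
# Pearson correlation of each feature column with the target
corr = toy[["EstimatedSalary", "Age"]].corrwith(target)
print(corr.round(2))
```

A correlation near zero only rules out a *linear* relationship; a non-linear pattern could still exist.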

#### Categorical Scatter Plot (Strip Plot)

In [None]:
plt.figure(figsize=(10,7))
_=sns.stripplot(x='NumOfProducts', y='CreditScore', data=df, hue='Exited', dodge=True, palette='viridis')

In [None]:
plt.figure(figsize=(10,7))
sns.swarmplot(x='Gender', y='CreditScore', data=df, hue='Exited', dodge=True, palette='viridis')

In [None]:
# catplot is figure-level and creates its own figure, so plt.figure() is not needed
_ = sns.catplot(x="Tenure", y="CreditScore", hue="Exited",
                col="IsActiveMember", aspect=.7,
                kind="swarm", data=df)

In [None]:
sns.catplot(x="Geography", y="Balance", hue="Exited",
            col="NumOfProducts", aspect=.7,
            kind="bar", data=df)

In [None]:
sns.catplot(x="Gender", y="Balance", hue="Exited",
            col="HasCrCard", aspect=.7,
            kind="bar", data=df)

In [None]:
# seaborn colors hue levels via palette=; a cmap= keyword is ignored by the scatter
_ = sns.relplot(x='Age', y='Balance', hue='Exited', kind='scatter', data=df, palette='winter')

In [None]:
_ = sns.relplot(x='CreditScore',
                y='Balance',
                hue='Exited',
                kind='scatter',
                data=df,
                palette='winter')

In [None]:
_ = sns.relplot(x='CreditScore',
                y='EstimatedSalary',
                hue='Exited',
                kind='scatter',
                data=df,
                palette='winter')

#### Plots for finding the Structure in Data

We are going to draw the following three plots:

1. Parallel Coordinates
2. Radviz
3. Andrews Curves

In [None]:
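# These plots need numeric features, so drop the text columns.
# Reassign rather than drop in place, so the change can never leak into df.
df_1 = df_1.drop(columns=['Geography', 'Gender'])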

In [None]:
_ = pd.plotting.parallel_coordinates(df_1, 'Exited', colormap='winter')
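`parallel_coordinates` draws every column on one shared y-axis, so on raw values the large-scale columns (Balance, EstimatedSalary) dominate small-scale ones such as Tenure. Scaling each feature to [0, 1] beforehand is a common remedy. The helper below is a hypothetical sketch of that idea, not part of the original analysis; the toy rows are made up.

```python
import pandas as pd

def min_max_scale(frame: pd.DataFrame, exclude: str) -> pd.DataFrame:
    """Scale every column except `exclude` to [0, 1].

    A constant column would divide by zero and become NaN.
    """
    features = frame.drop(columns=[exclude])
    scaled = (features - features.min()) / (features.max() - features.min())
    scaled[exclude] = frame[exclude]  # keep the class column unchanged
    return scaled

# Made-up rows: Balance is about 10,000 times larger than Tenure
toy = pd.DataFrame({
    "Tenure":  [1, 5, 9, 3],
    "Balance": [0.0, 60000.0, 120000.0, 90000.0],
    "Exited":  [0, 1, 0, 1],
})

scaled = min_max_scale(toy, exclude="Exited")
print(scaled)
```

In the notebook this could be used as `pd.plotting.parallel_coordinates(min_max_scale(df_1, 'Exited'), 'Exited', colormap='winter')`.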

In [None]:
_ = pd.plotting.radviz(df_1, class_column='Exited', colormap='winter')
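RadViz places one anchor per feature, evenly spaced on the unit circle, and positions each row at the average of the anchor points weighted by the row's min-max-scaled feature values, so a row is pulled toward the features in which it is large. The NumPy sketch below follows that standard definition as we understand it; it is not pandas' source code, and the function name `radviz_coords` is hypothetical.

```python
import numpy as np

def radviz_coords(X) -> np.ndarray:
    """Map each row of X (n_rows x n_features) to a 2-D RadViz position."""
    X = np.asarray(X, dtype=float)
    # Min-max scale each feature (column) to [0, 1], guarding constant columns
    span = X.max(axis=0) - X.min(axis=0)
    scaled = (X - X.min(axis=0)) / np.where(span == 0, 1, span)
    # One anchor per feature, evenly spaced on the unit circle
    m = X.shape[1]
    angles = 2 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (m, 2)
    # Weighted average of the anchors; rows whose scaled values are all 0 stay at the centre
    weights = scaled.sum(axis=1, keepdims=True)
    return (scaled @ anchors) / np.where(weights == 0, 1, weights)

X = np.array([[10.0, 0.0], [0.0, 5.0], [10.0, 5.0]])
print(radviz_coords(X).round(3))
```

The third row is large in both features, so it is pulled equally toward both anchors and lands in the middle; this is why very different rows can overlap in a RadViz plot.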

In [None]:
_ = pd.plotting.andrews_curves(df_1, 'Exited', colormap='winter')
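Andrews curves turn each row $x = (x_1, x_2, x_3, \dots)$ into the function $f_x(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots$, drawn for $t$ between $-\pi$ and $\pi$, so rows with similar values produce similar curves. A small sketch that evaluates this formula for one row (the function name `andrews_curve` is hypothetical):

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews function of one observation x at angles t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2))
    for i, coef in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2                   # 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if i % 2 == 1 else np.cos   # sin for x2, x4, ...; cos for x3, x5, ...
        result += coef * trig(harmonic * t)
    return result

t = np.linspace(-np.pi, np.pi, 5)
# x = (sqrt(2), 1, 2) gives f(t) = 1 + sin t + 2 cos t
curve = andrews_curve([np.sqrt(2), 1.0, 2.0], t)
print(curve.round(3))
```

Because every feature enters as a coefficient, features with large raw values (such as Balance) dominate the curve shape unless the data is scaled first.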

To be Continued...