<a href="https://colab.research.google.com/github/jojoconverteo/Evaneos/blob/main/Evaneos_Data_visualisation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://www.offremedia.com/sites/default/files/vignette/article/converteo-logo.png)

# **Classification course, part 2: Data visualisation**


-------------------------------

Context

As the telecommunications industry grows rapidly, service providers focus on expanding their subscriber base. In such a competitive market, retaining existing customers has become a major challenge, since acquiring a new customer is widely reported to cost far more than retaining an existing one. Telecom companies therefore use advanced analytics to understand customer behavior and to predict whether a given customer will leave.

Content
This dataset contains customer-level information for a telecom company. For each customer, it records various attributes related to the services they use.

Inspiration
Some questions the data could help answer:

- Which variables contribute to customer churn?
- Which customers are most likely to churn?
- What actions can be taken to stop them from leaving?

-------------------------------

**Data dictionary:**

Churn (target):
-  1 if customer cancelled service, 0 if not

AccountWeeks : 
- number of weeks customer has had active account


DataPlan : 
- 1 if customer has data plan, 0 if not

DataUsage : 
 - gigabytes of monthly data usage


CustServCalls : 
- number of calls into customer service


DayMins :
- average daytime minutes per month


DayCalls : 
- average number of daytime calls


MonthlyCharge :
- average monthly bill


OverageFee :
- largest overage fee in last 12 months

ContractRenewal :
- 1 if customer recently renewed contract, 0 if not


RoamMins : 
- average number of roaming minutes ([What is roaming?](https://www.sfrbusiness.fr/room/communications-unifiees/roaming-c-est-quoi.html), article in French)

In [None]:
#@title


!pip install plotly --upgrade
# pandas_profiling has been renamed ydata-profiling; ProfileReport keeps the same name
!pip install -U ydata-profiling

from ydata_profiling import ProfileReport
from google.colab import drive
import os 
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import typing
from typing import List
import numpy as np


import warnings
warnings.filterwarnings('ignore')

def func_create_noise(df_train_data: pd.DataFrame, columns_cat_2_category: List[str]) -> pd.DataFrame:
  """
  Shuffle the rows and recode binary 0/1 columns as 'Yes'/'No' labels.


  Parameters:
  ----------------------------
    df_train_data: pd.DataFrame
    Input dataframe

    columns_cat_2_category: List[str]
    Names of the 0/1 columns to recode


  Return:
  -----------------------------
    df_train_data_suffle: pd.DataFrame
    Shuffled dataframe with the recoded columns

  """
  dict_create_noise_columns_cat = {1 : 'Yes', 0 : 'No'}
  # sample(frac=1) returns a shuffled copy, so the input dataframe is not modified
  df_train_data_suffle = df_train_data.sample(frac=1)

  # Use the function argument here, not the global list defined later in the cell
  for col in columns_cat_2_category:
    df_train_data_suffle[col] = df_train_data_suffle[col].apply(lambda x: dict_create_noise_columns_cat[x])

  return df_train_data_suffle




drive.mount('/content/drive')
# Path to the CSV on the mounted Google Drive (plain string, no formatting needed)
str_path_to_file = "/content/drive/My Drive/Cours Data/Classification/Data/telecom_churn.csv"

df_train_data_suffle = pd.read_csv(str_path_to_file, encoding='ascii')

columns_cat_2_category=['Churn', 'ContractRenewal', 'DataPlan']
df_train_data_suffle = func_create_noise(df_train_data_suffle, columns_cat_2_category)
df_train_data_suffle = df_train_data_suffle.sample(frac=1)
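
The recoding step in `func_create_noise` turns the 0/1 flags into 'Yes'/'No' labels so that pandas treats them as categorical columns rather than numbers. Here is a minimal sketch of that idea on a toy frame. `df_toy` and `flag_labels` are hypothetical names, the values are made up for illustration, and `Series.map` is used in place of `apply` with a lambda.

```python
import pandas as pd

# Toy stand-in for the churn data (values are illustrative only).
df_toy = pd.DataFrame({
    "Churn": [0, 1, 0],
    "DataPlan": [1, 1, 0],
    "DayMins": [120.5, 300.2, 87.0],
})

flag_labels = {1: "Yes", 0: "No"}

# Map each 0/1 flag column to its label; numeric columns are left untouched.
for col in ["Churn", "DataPlan"]:
    df_toy[col] = df_toy[col].map(flag_labels)

print(df_toy.dtypes)
```

Unlike the dictionary lookup inside `apply`, `map` yields a missing value for any entry that is not in the dictionary instead of raising an error, which may or may not be what you want.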

In [None]:
#@title

df_train_data_suffle_cat = df_train_data_suffle.select_dtypes(include=object)
df_train_data_suffle_num = df_train_data_suffle.select_dtypes(exclude=object)

## Data visualisation
----------------------------------



In [None]:
#@title

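# rename_axis + reset_index(name=...) give the same column names on old and recent pandas
df_churn = df_train_data_suffle_cat.Churn.value_counts().rename_axis('Churn').reset_index(name='count')
fig = px.bar(df_churn, x='Churn', y='count')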
fig.show() 

In [None]:
#@title

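# rename_axis + reset_index(name=...) give the same column names on old and recent pandas
df_ContractRenewal = df_train_data_suffle_cat.ContractRenewal.value_counts().rename_axis('ContractRenewal').reset_index(name='count')
fig = px.bar(df_ContractRenewal, x='ContractRenewal', y='count')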
fig.show()

In [None]:
#@title

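# rename_axis + reset_index(name=...) give the same column names on old and recent pandas
df_DataPlan = df_train_data_suffle_cat.DataPlan.value_counts().rename_axis('DataPlan').reset_index(name='count')
fig = px.bar(df_DataPlan, x='DataPlan', y='count')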
fig.show()

In [None]:
#@title

df_train_data_suffle_cat['Color'] = df_train_data_suffle_cat.Churn.apply(lambda x : 1 if x == 'Yes' else 0)

fig = px.parallel_categories(df_train_data_suffle_cat, color="Color", dimensions=['DataPlan', 'Churn', 'ContractRenewal'], color_continuous_scale=px.colors.sequential.Inferno)

fig.show()

df_train_data_suffle_cat.drop('Color', axis=1, inplace=True)

In [None]:
#@title

df_crosstab_dataplan_X_Churn = pd.crosstab(df_train_data_suffle_cat.Churn, df_train_data_suffle_cat.DataPlan, normalize=True)

fig = px.imshow(df_crosstab_dataplan_X_Churn)
fig.show()

In [None]:
#@title

# Same heatmap as above, this time crossing Churn with ContractRenewal
df_crosstab_contractrenewal_X_Churn = pd.crosstab(df_train_data_suffle_cat.Churn, df_train_data_suffle_cat.ContractRenewal, normalize=True)

fig = px.imshow(df_crosstab_contractrenewal_X_Churn)
fig.show()
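
With `normalize=True`, each cell of the crosstab is a share of all customers, so a small group looks pale on the heatmap whatever its churn rate. One possible complement, sketched here on made-up data, is to normalize per row so that each row shows the churn rate within a group. `df_toy` and `churn_rate_by_renewal` are hypothetical names.

```python
import pandas as pd

# Toy data: 4 customers who renewed, 2 who did not (illustrative values only).
df_toy = pd.DataFrame({
    "ContractRenewal": ["Yes", "Yes", "Yes", "Yes", "No", "No"],
    "Churn":           ["No",  "No",  "No",  "Yes", "Yes", "No"],
})

# normalize="index" makes each row sum to 1:
# the share of churners and non-churners within each ContractRenewal group.
churn_rate_by_renewal = pd.crosstab(
    df_toy["ContractRenewal"], df_toy["Churn"], normalize="index"
)
print(churn_rate_by_renewal)
```

The resulting table can be passed to `px.imshow` in the same way as the tables above.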

In [None]:
#@title

list_num_columns = df_train_data_suffle_num.columns

for col in list_num_columns:
  fig = px.histogram(df_train_data_suffle, x=col, color="Churn", marginal="box",
                    hover_data=df_train_data_suffle.columns)
  fig.show()

In [None]:
#@title

corr = df_train_data_suffle_num.corr()
fig = px.imshow(corr)
fig.show()
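
A correlation heatmap is easy to scan but hard to rank by eye. As a possible complement, the sketch below lists each pair of variables once, sorted by absolute correlation. It runs on synthetic data in which `MonthlyCharge` is built to depend on `DayMins`. The column names are borrowed from the dataset, but the values and the relationship are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
day_mins = rng.normal(180, 50, size=200)

# Synthetic frame: MonthlyCharge is constructed from DayMins plus noise.
df_toy = pd.DataFrame({
    "DayMins": day_mins,
    "MonthlyCharge": 0.2 * day_mins + rng.normal(0, 2, size=200),
    "CustServCalls": rng.integers(0, 6, size=200),
})

corr = df_toy.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=np.abs, ascending=False)
)
print(pairs)
```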

In [None]:
#@title

fig = px.density_contour(df_train_data_suffle, x="DayMins", y="MonthlyCharge", color="Churn", facet_col="Churn")
fig.show()

In [None]:
#@title

fig = px.density_contour(df_train_data_suffle, x="MonthlyCharge", y="CustServCalls", color="Churn", facet_col="Churn")
fig.show()

In [None]:
#@title

profile = ProfileReport(df_train_data_suffle, title='Churn analysis')
profile.to_notebook_iframe()

### PCA (principal component analysis):



In [None]:
#@title

from sklearn.decomposition import PCA

pca = PCA()
components = pca.fit_transform(df_train_data_suffle_num)

labels = {
    str(i): f"PC {i+1} ({var:.1f}%)"
    for i, var in enumerate(pca.explained_variance_ratio_ * 100)
}

fig = px.scatter_matrix(
    components,
    labels=labels,
    dimensions=range(4),
    color=df_train_data_suffle["Churn"]
)
fig.update_traces(diagonal_visible=False)
fig.show()



In [None]:
#@title

pca = PCA(n_components=2)
components = pca.fit_transform(df_train_data_suffle_num)

loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

fig = px.scatter(components, x=0, y=1, color=df_train_data_suffle['Churn'])

features = df_train_data_suffle_num.columns.to_list()

for i, feature in enumerate(features):
    fig.add_shape(
        type='line',
        x0=0, y0=0,
        x1=loadings[i, 0],
        y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0],
        y=loadings[i, 1],
        ax=0, ay=0,
        xanchor="center",
        yanchor="bottom",
        text=feature,
    )
fig.show()
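
PCA is sensitive to the scale of the variables: a column measured in hundreds of minutes carries far more variance than a column counting a handful of calls, and it can dominate the first component. The sketch below, on synthetic data, compares PCA on raw and on standardized features. Standardizing with `StandardScaler` is a common option, not something the cells above do, and the two toy features are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent toy features on very different scales
# (think of minutes per month versus number of calls).
X = np.column_stack([
    rng.normal(180, 50, size=300),
    rng.normal(2, 1, size=300),
])

pca_raw = PCA(n_components=2).fit(X)
pca_scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# On raw data the large-scale feature takes almost all the variance;
# after standardization the split is much more even.
print(pca_raw.explained_variance_ratio_)
print(pca_scaled.explained_variance_ratio_)

# Loadings computed the same way as in the biplot above.
loadings = pca_scaled.components_.T * np.sqrt(pca_scaled.explained_variance_)
print(loadings)
```

Whether to standardize depends on the question: with raw data the components mostly reflect the variables with the largest units, while with standardized data every variable contributes on an equal footing.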