# Telecom Churn Group Project

This notebook is split into the following sections:

<b>* Common Function</b>
    * This section holds the common functions used throughout the notebook
<b> * Basic Data Analysis and Null Value imputation </b>
    * Drop Columns with more than 50% NULL Values
    * Handle columns with a small share of NULL Values
    * Drop columns that carry no information
    * Create Dummy Variables for the categorical variables.
    * Filter and keep high-value customer records
    * Derive Columns based on basic column analysis
    * Create a new column that flags Churn/No-Churn customers and drop the 9th month related columns
<b> * EDA </b>
    * EDA for Month 6 and 7 Together
    * EDA for Month 8
    * Derived Columns by combining 6th, 7th and 8th columns 
    * How the features vary from the Good Period to the Decision Period
    * What is the average variation from 6th+7th Month to 8th Month
    * EDA for the derived columns
<b> * Data Modeling </b>
    * Data Normalization 
    * Basic Logistic Regression Fit to Check the accuracy
    * As churn cases are few, the basic logistic regression can be tuned with GridSearch and K-Fold
    * Dimensionality reduction using PCA
    * Logistic regression fitted again on the PCA data with GridSearch and K-Fold to check the prediction
    * Ridge Regression to find the important features impacting churn
    * Tree Model as another way to derive the important features
    * If EDA shows a non-linear (for example polynomial) relation between variables, a kernel SVM can be used for prediction

<b> * Final Model Selection </b>
    * Final Model for Prediction
    * Final Model for important Feature selection 
<b> * Summary </b>
    * Project Analysis and Summary

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('telecom_churn_data.csv')

# Common Function

In [None]:
# Return the column names grouped by month (6, 7, 8, 9), plus the month-independent ones
def returnColumnsByMonth(df):
    column_Month_6 = []
    column_Month_7 = []
    column_Month_8 = []
    column_Month_9 = []
    column_Common = []
    for eachColumn in df.columns:
        # A column belongs to a month if its name contains "_<month>" or the month-name prefix
        if "_6" in eachColumn or "jun_" in eachColumn:
            column_Month_6.append(eachColumn)
        elif "_7" in eachColumn or "jul_" in eachColumn:
            column_Month_7.append(eachColumn)
        elif "_8" in eachColumn or "aug_" in eachColumn:
            column_Month_8.append(eachColumn)
        elif "_9" in eachColumn or "sep_" in eachColumn:
            column_Month_9.append(eachColumn)
        else:
            column_Common.append(eachColumn)
    return column_Month_6, column_Month_7, column_Month_8, column_Month_9, column_Common

# Return the columns whose null percentage is above the limit ('Upper')
# or strictly between 0 and the limit ('Lower')
def getColumnsBasedOnNullPercent(df, nullPercentLimit, limitType = 'Upper'):
    nullPercent = round((df.isnull().sum()/len(df.index))* 100, 2)
    if limitType == 'Upper':
        columnsMask = nullPercent > nullPercentLimit
    elif limitType == 'Lower':
        columnsMask = (nullPercent < nullPercentLimit) & (nullPercent > 0)
    else:
        # Fail early instead of hitting an undefined variable further down
        raise ValueError("limitType must be 'Upper' or 'Lower'")
    return np.array(df.columns[columnsMask.to_numpy()])
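
The substring checks in `returnColumnsByMonth` match `_6` anywhere in a name, so a hypothetical column such as `arpu_60` would land in the month 6 bucket. A stricter variant, sketched here on a toy list of names, anchors the match to a `_<month>` suffix or a month-name prefix. The function name `split_columns_by_month` and the sample column names are illustrative assumptions, not part of the notebook.

```python
import re

# Month number -> month-name prefix, as used by the notebook's helper.
MONTH_PREFIX = {6: "jun_", 7: "jul_", 8: "aug_", 9: "sep_"}

def split_columns_by_month(columns):
    """Group column names by month using an anchored suffix/prefix match."""
    buckets = {month: [] for month in MONTH_PREFIX}
    buckets["common"] = []
    for name in columns:
        for month, prefix in MONTH_PREFIX.items():
            # "_6" must end the name, or the name must start with "jun_", etc.
            if re.search(rf"_{month}$", name) or name.startswith(prefix):
                buckets[month].append(name)
                break
        else:
            # No month matched: treat the column as month-independent.
            buckets["common"].append(name)
    return buckets

# Hypothetical column names, for illustration only.
sample_columns = ["arpu_6", "arpu_7", "jun_vbc_3g", "aon", "arpu_60"]
month_buckets = split_columns_by_month(sample_columns)
print(month_buckets)
```

Whether the stricter match is needed depends on the actual column names; if every monthly column already ends in `_6` to `_9`, both versions give the same split.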
    

# Basic Data Analysis and Null Value Imputation

In [None]:
df.head()

In [None]:
df.columns

####  * Get Columns Monthwise & Basic Understanding of Columns

In [None]:
column_Month_6, column_Month_7, column_Month_8, column_Month_9, column_Common = returnColumnsByMonth(df)

print("Month 6 Columns Count ==> {}".format(len(column_Month_6)))
print("Month 7 Columns Count ==> {}".format(len(column_Month_7)))
print("Month 8 Columns Count ==> {}".format(len(column_Month_8)))
print("Month 9 Columns Count ==> {}".format(len(column_Month_9)))
print("Common Columns Count ==> {}".format(len(column_Common)))

In [None]:
# All months have the same kind of columns, so look at one month's columns as a sample
print ("\nMonth based Columns:\n \t\t==> {}".format(np.array(column_Month_6)))
print ("\nCommon Columns:\n \t\t==> {}".format(np.array(column_Common)))

#### * Derive Column Total_Recharge_Amount from 6th and 7th Month total_rech_amt

In [None]:
df['Total_Recharge_Amount'] = df['total_rech_amt_6'] + df['total_rech_amt_7']

# Get the 70th percentile of "Total Recharge Amount" to set the recharge cut-off for high value customers
print(df['Total_Recharge_Amount'].describe(percentiles = [0.7]))
# quantile() avoids relying on the position of the 70% row in the describe() output
print("\n70th percentile of Total Recharge Amount for the first 2 months is {}".format(df['Total_Recharge_Amount'].quantile(0.7)))

#### * Filter High Value Customer from main data frame

In [None]:
# 737 is the cut-off taken from the 70th percentile output above
df = df[df['Total_Recharge_Amount'] > 737].reset_index(drop=True)
print("\nTotal High Value Customer Count ==> {}".format(df.shape[0]))
df.drop(columns=['Total_Recharge_Amount'], inplace=True)
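
The filter above hard-codes the cut-off as 737. A minimal sketch of the same step with the threshold computed from the data is shown here on a toy frame; the frame `toy` and its values are made up for illustration, and only the column names come from the notebook.

```python
import pandas as pd

# Toy data standing in for the notebook's df; the values are invented.
toy = pd.DataFrame({
    "total_rech_amt_6": [100, 500, 0, 900, 250],
    "total_rech_amt_7": [200, 400, 50, 800, 250],
})

# Combined recharge amount for months 6 and 7.
total_recharge = toy["total_rech_amt_6"] + toy["total_rech_amt_7"]

# 70th percentile computed directly, so the cut-off follows the data.
threshold = total_recharge.quantile(0.7)

# Keep customers strictly above the cut-off, mirroring the `> 737` filter.
high_value = toy[total_recharge > threshold].reset_index(drop=True)
print(threshold, len(high_value))
```

Computing the threshold this way keeps the filter valid if the input file changes, at the cost of no longer showing the concrete number in the code.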

#### * Null Value Checking and Drop High Null Value Columns

In [None]:
# Get the columns whose null percentage exceeds the limit
nullPercentageLimit = 50
columns_More_Than_50_PercentNull = getColumnsBasedOnNullPercent(df,nullPercentageLimit)
# Drop columns with more than 50% null values
df = df.loc[:, ~df.columns.isin(columns_More_Than_50_PercentNull)]

print("\nColumn List Dropped with More than 50% of Null Value:==>\n {}\n".format(columns_More_Than_50_PercentNull))
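
The null percentage used by `getColumnsBasedOnNullPercent` can also be written as the mean of the boolean null mask. The sketch below shows both the 'Upper' and 'Lower' selections on a toy frame; the frame and its column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with known null shares per column.
toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% null
    "some_missing":   [1.0, np.nan, 3.0, 4.0],        # 25% null
    "complete":       [1.0, 2.0, 3.0, 4.0],           # 0% null
})

# The mean of the boolean null mask is the null fraction per column.
null_percent = toy.isnull().mean() * 100

# Columns above the 50% limit (the 'Upper' case of the helper).
high_null_columns = null_percent[null_percent > 50].index.tolist()

# Columns with some nulls but below the limit (the 'Lower' case).
low_null_columns = null_percent[(null_percent > 0) & (null_percent < 50)].index.tolist()

# Drop the high-null columns, as the cell above does for df.
toy_reduced = toy.drop(columns=high_null_columns)
print(high_null_columns, low_null_columns)
```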

#### * Check Categorical Variables and Single-Valued Variables

In [None]:
# Columns holding a single distinct value carry no information for modelling
singleCategoryColumns = df.columns[df.nunique() == 1]
for eachSingleCategory in singleCategoryColumns:
    print("{}: {}".format(eachSingleCategory, df[eachSingleCategory].unique()))
print("\n<=== Drop Single Category Columns, other than last_date_of_month_6/7/8/9, as they will be used to derive columns ===>\n")
# Keep the month-end date columns for later feature derivation
dateColumnsToKeep = ['last_date_of_month_6', 'last_date_of_month_7', 'last_date_of_month_8', 'last_date_of_month_9']
singleCategoryColumns = np.array([x for x in singleCategoryColumns if x not in dateColumnsToKeep])
df = df.loc[:, ~df.columns.isin(singleCategoryColumns)]
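
One detail of the single-value check: `nunique()` ignores NaN by default, so a column that holds one value plus missing entries is also reported as single-valued. The sketch below contrasts the default with `dropna=False` on a toy frame; the frame and its column names are assumptions for illustration, and which behaviour is preferable depends on whether missingness in such a column is meaningful.

```python
import numpy as np
import pandas as pd

# Toy frame: one truly constant column, one constant-or-missing, one varying.
toy = pd.DataFrame({
    "constant":          [0.0, 0.0, 0.0],
    "constant_with_nan": [0.0, np.nan, 0.0],
    "varying":           [1.0, 2.0, 3.0],
})

# Default behaviour: NaN is ignored when counting distinct values.
single_valued = toy.columns[toy.nunique() == 1].tolist()

# Counting NaN as its own value keeps partially missing columns out of the list.
strictly_constant = toy.columns[toy.nunique(dropna=False) == 1].tolist()

print(single_valued, strictly_constant)
```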

#### * Analyze Columns with Less than 50% Null Values

In [None]:
columns_Less_Than_50_PercentNull = getColumnsBasedOnNullPercent(df,nullPercentageLimit, limitType='Lower')
df_temp = df.loc[:, columns_Less_Than_50_PercentNull]
round(df_temp.isnull().sum()/len(df_temp.index) * 100,2)

#### * As the Null % is very low, let's see if the Null Values can be imputed with some value

In [None]:
column_Month_6, column_Month_7, column_Month_8, column_Month_9, column_Common = returnColumnsByMonth(df_temp)

print("Month 6 Columns Count ==> {}".format(len(column_Month_6)))
print("Month 7 Columns Count ==> {}".format(len(column_Month_7)))
print("Month 8 Columns Count ==> {}".format(len(column_Month_8)))
print("Month 9 Columns Count ==> {}".format(len(column_Month_9)))
print("Common Columns Count ==> {}".format(len(column_Common)))
print("==> All months have the same columns with a low % of Null Values")
print(np.array(column_Month_6))
df_temp.loc[:, column_Month_6].head()