
# Introduction
**Data source:** https://www.kaggle.com/blastchar/telco-customer-churn

**Content:**

Each row represents a customer, and each column contains a customer attribute described in the column metadata.



**The data set includes information about:**

- Customers who left within the last month (the column is called Churn)
- Services that each customer has signed up for: phone, multiple lines, internet, online backup, device protection, tech support, and streaming TV and movies
- Customer account information: how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers: gender, age range, and whether they have partners and dependents
  

# **We want to answer questions such as:**
**What kind of customers left within the last month?**

**Why did they leave the company within the last month?**


## Data Wrangling
**Let's read the data and build some intuition about it.**


In [None]:
# Import the libraries used in this notebook.
# DataFrames:
import pandas as pd
# Arrays:
import numpy as np
# Visualization:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


In [None]:
telco = pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')

In [None]:
telco.head()

In [None]:
telco.info()

In [None]:
# Blank strings in TotalCharges stand for missing values; replace them with NaN.
telco['TotalCharges'] = telco['TotalCharges'].replace(" ", np.nan)

In [None]:
# Number of missing values in each column:
telco.isna().sum()

In [None]:
telco.dropna(axis=0,inplace=True)

In [None]:
# Convert TotalCharges to float and check the resulting dtype:
telco['TotalCharges'] = telco['TotalCharges'].astype(float)
telco['TotalCharges'].dtype

**After dropping the rows with a blank TotalCharges, the data no longer contain missing values.**
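
An alternative to replacing blanks and then casting is `pd.to_numeric` with `errors='coerce'`, which turns every unparseable entry into NaN in one step. The snippet below is a minimal sketch on a hypothetical toy column (the values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy column that mimics TotalCharges stored as text,
# with a blank string standing in for a missing value.
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"]})

# errors="coerce" converts entries that cannot be parsed into NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Count the missing values, then drop the affected rows.
n_missing = int(toy["TotalCharges"].isna().sum())
clean = toy.dropna(subset=["TotalCharges"])

print(n_missing, clean["TotalCharges"].dtype)
```

Passing `subset=` to `dropna` limits the row removal to the column that was just converted, which avoids dropping rows because of unrelated columns.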




In [None]:
# Summary statistics for the numerical columns:
telco.describe()

In [None]:
telco.nunique()

In [None]:
telco.PaymentMethod.unique()

In [None]:
telco.MultipleLines.unique()

# What we can say about the data so far:
- **There are 2 genders among the customers; some customers have dependents, others do not, and some have a partner.**

- **Customers who have phone service can also sign up for multiple lines.**

- **The company also provides 4 payment methods:**

  1) Electronic check
  2) Mailed check
  3) Bank transfer (automatic)
  4) Credit card (automatic)


# Exploratory Data Analysis
**Now let's explore the data.**

In [None]:
telco.head()

# What percentage of the customers who left belongs to each gender?




In [None]:
sns.countplot(x='Churn', data=telco)
plt.title('Number of customers who left vs. customers who stayed')
plt.show()

In [None]:
telco.Churn.value_counts()

In [None]:
sns.countplot(x='gender', hue='Churn', data=telco)
plt.title('Gender of customers who left vs. customers who stayed')
plt.show()

In [None]:
female_left = telco.query('gender =="Female" and Churn == "Yes"')

In [None]:
female_left.shape

In [None]:
male_left = telco.query('gender == "Male" and Churn == "Yes"')

In [None]:
male_left.shape

In [None]:
print('Share of churned customers who are female: {:.1f}%'.format(female_left.shape[0] / (telco.Churn == "Yes").sum() * 100))

In [None]:
print('Share of churned customers who are male: {:.1f}%'.format(male_left.shape[0] / (telco.Churn == "Yes").sum() * 100))

**We can say that gender has almost no effect on churn.**
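
Two different percentages are easy to mix up here: the share of churners who belong to a group, and the churn rate inside that group. The sketch below computes both on a hypothetical toy frame (the rows are invented; in the notebook the same expressions would run on `telco`):

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook's `telco` frame.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Male", "Male"],
    "Churn":  ["Yes",    "No",     "Yes",  "No",   "No",   "No"],
})

churned = toy["Churn"].eq("Yes")

# Share of churners that belong to each gender
# (churners of that gender / all churners).
share_of_churners = toy.loc[churned, "gender"].value_counts(normalize=True) * 100

# Churn rate within each gender
# (churners of that gender / all customers of that gender).
churn_rate = churned.groupby(toy["gender"]).mean() * 100

print(share_of_churners)
print(churn_rate)
```

In this toy example both genders contribute the same number of churners, yet the churn rate differs because the groups have different sizes. When the groups are about equally large, as with gender in this dataset, the two views lead to the same conclusion.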

# Let's look at other kinds of customers

In [None]:
sns.countplot(x='SeniorCitizen', data=telco)
plt.title('Number of customers who are senior citizens')
plt.show()

In [None]:
telco.SeniorCitizen.value_counts()

In [None]:
sns.countplot(x='SeniorCitizen', hue='Churn', data=telco)
plt.title('Senior citizens who left vs. those who stayed')
plt.show()

In [None]:
senior_left = telco.query('SeniorCitizen == 1 and Churn == "Yes"')

In [None]:
senior_left.shape

In [None]:
print('The percentage of senior citizens who left is {:.1f}%'.format(senior_left.shape[0] / (telco.SeniorCitizen == 1).sum() * 100))

# There is a problem!
**41.7% of senior citizens left the company, which suggests that the company's services may not suit them well.**

In [None]:
sns.countplot(x='Partner', data=telco)
plt.title('Number of customers who have a partner')
plt.show()

In [None]:
telco.Partner.value_counts()

In [None]:
sns.countplot(x='Partner', hue='Churn', data=telco)
plt.show()

In [None]:
partner_left = telco.query("Partner == 'Yes' and Churn == 'Yes'")

In [None]:
partner_left.shape

In [None]:
telco.query("Partner == 'No' and Churn == 'Yes'").shape

In [None]:
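print('{:.1f}% of customers who have a partner left the company'.format(partner_left.shape[0] / (telco.Partner == 'Yes').sum() * 100))
print("{:.1f}% of customers who don't have a partner left the company".format(telco.query("Partner == 'No' and Churn == 'Yes'").shape[0] / (telco.Partner == 'No').sum() * 100))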

# There is another problem: 33% of customers who don't have a partner left the company
**which suggests that the company's services may not suit them well.**

# Finally, let's compare customers who have dependents with those who do not

In [None]:
telco.Dependents.value_counts()

In [None]:
sns.countplot(x='Dependents', data=telco)
plt.title('Number of customers with dependents vs. without dependents')
plt.show()

**Let's see who left the company.**

In [None]:
telco.query("Dependents == 'Yes' and Churn == 'Yes'").shape

In [None]:
telco.query("Dependents == 'No' and Churn == 'Yes'").shape


In [None]:
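print("{:.1f}% of customers who have dependents left the company,\n and {:.1f}% of customers who have no dependents left"
     .format(telco.query("Dependents == 'Yes' and Churn == 'Yes'").shape[0] / (telco.Dependents == 'Yes').sum() * 100,
             telco.query("Dependents == 'No' and Churn == 'Yes'").shape[0] / (telco.Dependents == 'No').sum() * 100))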

**As expected, the churn rate of customers without dependents (31%) is higher than that of customers with dependents (15.4%), possibly because these customers want to save money.**


**So far we have identified the following problems:**

- 41.7% of senior citizens left the company.
- 33% of customers who don't have a partner left the company.
- 31% of customers without dependents left the company.
















# It's time to find the reasons for these problems

In [None]:
telco.head()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x=telco.tenure, order=telco.tenure.value_counts().index)
plt.title('Number of months the customer has stayed with the company')
plt.show()

In [None]:
print('{:.1f}% of customers have a tenure of only one month'.format((telco.query("tenure == 1").shape[0]/telco.shape[0])*100))

In [None]:
print('In general, {:.1f}% of customers have a tenure of 5 months or less'.format((telco.query("tenure in [1, 2, 3, 4, 5]").shape[0]/telco.shape[0])*100))
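
The share of customers with a short tenure is not the same quantity as the share of short-tenure customers who actually left. The sketch below separates the two on a hypothetical toy frame (invented rows; the 5-month threshold follows the cell above):

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook's `telco` frame.
toy = pd.DataFrame({
    "tenure": [1, 1, 2, 5, 12, 30, 60, 72],
    "Churn":  ["Yes", "No", "Yes", "No", "Yes", "No", "No", "No"],
})

early = toy["tenure"] <= 5       # customers with 5 months of tenure or less
churned = toy["Churn"] == "Yes"  # customers who left

# Share of all customers that have a short tenure.
share_early = early.mean() * 100

# Churn rate among short-tenure customers.
churn_rate_early = churned[early].mean() * 100

# Share of all churners that had a short tenure.
share_of_churn_early = early[churned].mean() * 100

print(share_early, churn_rate_early, share_of_churn_early)
```

The mean of a boolean Series is the fraction of `True` values, so each line is a ratio of two counts without any hard-coded numbers.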

**Select numerical features**

In [None]:
num_col = telco[['tenure','MonthlyCharges','TotalCharges']]

In [None]:
num_col.head()

# Visualization

In [None]:
telco.head()

In [None]:
telco.shape

**What is the structure of your dataset?**

The data have 7032 rows and 21 columns.



**What is/are the main feature(s) of interest in your dataset?**


* gender
* SeniorCitizen
* Partner
* Dependents
* tenure
* PhoneService
* MultipleLines
* InternetService
* OnlineSecurity
* OnlineBackup
* DeviceProtection
* TechSupport
* StreamingTV
* StreamingMovies
* Contract
* PaperlessBilling
* PaymentMethod
* MonthlyCharges
* TotalCharges
* Churn





**What features in the dataset do you think will help support your investigation into your features of interest?**

So far, I feel that all of the features will help.

# Univariate Exploration 

In [None]:
df = telco.drop('customerID',axis = 1)
df.head()

**Profiling our clients**

In [None]:
def pie(x):
    """Draw a pie chart of the value counts of column `x` in `df`."""
    sorted_counts = df[x].value_counts()
    plt.pie(sorted_counts, labels=sorted_counts.index, startangle=90,
            counterclock=False, autopct='%.1f%%', shadow=True)
    plt.axis('square')
    plt.show()

In [None]:
plt.figure(figsize=(5,5))
plt.title('The percentage of each gender in the company')
pie('gender')
plt.show()

**The company has approximately equal numbers of male and female clients.**

In [None]:
plt.figure(figsize=(5,5))
plt.title('The percentage of senior citizens in the company')
pie('SeniorCitizen')
plt.show()

**16.2% of the company's clients are senior citizens and the rest are not.**

In [None]:
plt.figure(figsize=(5,5))
plt.title("Percentage of clients who have a partner vs. those who don't")
pie('Partner')
plt.show()

**51.7% of clients don't have a partner**

In [None]:
plt.figure(figsize=(5,5))
plt.title('The percentage of clients with dependents vs. clients without dependents')
pie('Dependents')
plt.show()

**We can see that 70.2% of clients have no dependents.**

In [None]:
num_col.head()

In [None]:
plt.figure(figsize =(5,5))
plt.hist(num_col.tenure)
plt.title('Tenure distribution')
plt.xlabel('Tenure')
plt.ylabel('Count')
plt.show()

**This histogram is hard to read, so I will take the base-10 logarithm of tenure and plot the distribution again.**

In [None]:
plt.figure(figsize=(5,5))
plt.hist(np.log10(num_col.tenure)) # base-10 logarithm of the tenure feature
plt.title('Tenure distribution after taking the logarithm')
plt.xlabel('log10(Tenure)')
plt.ylabel('Clients number')
plt.show()

**On the log scale, the bars grow with tenure, so most clients fall into the longer-tenure bins.**
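
Equal-width bins on a log axis cover ever wider ranges of months, which is worth keeping in mind when reading a log-scale histogram. The sketch below uses hypothetical, evenly spread tenure values (not the real data) to show how the same values look flat with linear bins and rising with log bins:

```python
import numpy as np

# Hypothetical tenure values: exactly one customer per month from 1 to 72.
tenure = np.arange(1, 73)

# Four equal-width bins on the raw scale.
counts_linear, edges_linear = np.histogram(tenure, bins=4)

# Four equal-width bins on the log10 scale.
counts_log, edges_log = np.histogram(np.log10(tenure), bins=4)

# Convert the log-scale bin edges back to months to see how wide each bin is.
edges_log_in_months = 10 ** edges_log

print(counts_linear)        # counts per linear bin
print(counts_log)           # counts per log bin
print(edges_log_in_months)  # log bin edges expressed in months
```

Because the last log bin spans far more months than the first, a rising shape on the log scale can partly reflect bin width rather than a real concentration of long-tenure clients.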

In [None]:
plt.figure(figsize=(20,6))
plt.subplot(1,2,1)
plt.hist(num_col.TotalCharges)
plt.ylim(0,3000)
plt.title('TotalCharges distribution')
plt.xlabel('TotalCharges')
plt.ylabel('Clients number')

plt.subplot(1,2,2)
plt.hist(num_col.MonthlyCharges)
plt.ylim(0,3000)
plt.title('MonthlyCharges distribution')
plt.xlabel('MonthlyCharges')
plt.show()

**For Total Charges:**

- Most customers have total charges between 0 and 2000; beyond that, the number of clients drops sharply.

**For Monthly Charges:**

- The number of customers varies across the range, with a peak between 20 and 25.

In [None]:
plt.figure(figsize=(12,6))

for i, col in enumerate(num_col.columns):
    plt.subplot(1, 3, i + 1)
    sns.boxplot(x=num_col[col])
plt.show()

**There are no outliers**
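
A boxplot with default settings draws its whiskers at 1.5 times the interquartile range (IQR), so the same rule can be checked numerically. The helper below is a minimal sketch; `iqr_outliers` is a hypothetical name introduced here, and the sample values are invented:

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]

# Hypothetical sample with one extreme value.
with_outlier = pd.Series([20, 25, 30, 35, 40, 45, 50, 500])
# Hypothetical sample without extreme values.
without_outlier = pd.Series([20, 25, 30, 35, 40, 45, 50])

found = iqr_outliers(with_outlier)
none_found = iqr_outliers(without_outlier)

print(found.tolist(), none_found.tolist())
```

In the notebook, a call such as `iqr_outliers(num_col['tenure'])` returning an empty Series would agree with a boxplot that shows no points beyond the whiskers.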

**Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?**

- For the tenure feature, the raw distribution was hard to interpret, so I took the logarithm to see the distribution more clearly.


# Bivariate Exploration

In [None]:
service = telco.iloc[:,6:-4]

In [None]:
service.head()

In [None]:
plt.figure(figsize=(25,20))
for i, feature in enumerate(service.columns):
  plt.subplot(3,4,i+1)
  sns.countplot(x=feature, hue='Churn', data=telco)
  plt.ylim(0,5000)
  plt.title('Clients who left vs. stayed, by {}'.format(feature))
plt.show()

**The previous plots show how churn relates to each service: for every service option, they count the clients who left and the clients who stayed.**
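
Raw counts can hide differences in group size, so it may help to put a row-normalized table next to the count plots. The sketch below uses `pd.crosstab` with `normalize='index'` on a hypothetical toy frame (invented rows; `InternetService` is just one example column):

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook's `telco` frame.
toy = pd.DataFrame({
    "InternetService": ["Fiber optic", "Fiber optic", "Fiber optic",
                        "DSL", "DSL", "No"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No"],
})

# normalize="index" divides each row by its row total, so every row
# shows the percentage of that group who left and who stayed.
rates = pd.crosstab(toy["InternetService"], toy["Churn"], normalize="index") * 100

print(rates.round(1))
```

Each row of `rates` sums to 100, which makes groups of very different sizes directly comparable.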

In [None]:
plt.figure(figsize=(25,20))
for i, feature in enumerate(service.columns):
  plt.subplot(3,4,i+1)
  sns.countplot(x=feature, hue='SeniorCitizen', data=telco)
  plt.ylim(0,6000)
  plt.title('Senior citizens vs. other clients, by {}'.format(feature))
plt.show()

**The previous plots compare how many senior citizens and how many other clients use each service.**

In [None]:
plt.figure(figsize= (25,20))
for i, feature in enumerate(service.columns):
  plt.subplot(3,4,i+1)
  sns.countplot(x=feature, hue='Dependents', data=telco)
  plt.ylim(0,5000)
  plt.title('Clients with vs. without dependents, by {}'.format(feature))
plt.show()

**The previous plots compare how many clients with dependents and how many clients without dependents use each service.**

In [None]:
plt.figure(figsize=(25,20))
for i, feature in enumerate(service.columns):
    plt.subplot(3,4,i+1)
    sns.countplot(x=feature, hue='Partner', data=telco)
    plt.ylim(0,5000)
    plt.title('Clients with vs. without a partner, by {}'.format(feature))
plt.show()

**The previous plots compare how many clients with a partner and how many clients without a partner use each service.**

In [None]:
sns.pairplot(num_col)
plt.show()

**Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?**

- There is a clear positive correlation between TotalCharges and both tenure and MonthlyCharges.

# Multivariate Exploration 

In [None]:
sns.heatmap(num_col.corr(),annot = True,linewidths=2)
plt.title('Correlation between the numerical variables in the dataset')
plt.show()

**Now we can see how much each feature is correlated with the others.**
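
`DataFrame.corr` uses the Pearson coefficient by default, and that is what the heatmap annotates. The sketch below recomputes one coefficient by hand on hypothetical toy values (invented numbers; building `TotalCharges` as `tenure * MonthlyCharges` is an assumption made only for this illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the notebook's `num_col` frame.
toy = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 48, 72],
    "MonthlyCharges": [70.0, 20.0, 95.0, 55.0, 80.0, 25.0],
})
# Assumption for illustration: total charges are tenure times monthly charges.
toy["TotalCharges"] = toy["tenure"] * toy["MonthlyCharges"]

# Correlation matrix as used by the heatmap (Pearson by default).
corr = toy.corr()

# Pearson coefficient computed by hand: covariance divided by the
# product of the standard deviations (the 1/n factors cancel).
x = toy["tenure"].to_numpy(dtype=float)
y = toy["TotalCharges"].to_numpy(dtype=float)
dx, dy = x - x.mean(), y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(corr.round(2))
print(round(manual, 4))
```

The hand-computed value matches the corresponding entry of `corr`, and the matrix is symmetric with ones on the diagonal.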

# Conclusions

**In the end, there are some problems the company faces:**

- 41.7% of senior citizens left the company.
- 33% of customers who don't have a partner left the company.
- 31% of customers without dependents left the company.
- 15.4% of customers with dependents left the company.
- 8.7% of customers have been with the company for only one month.
- 19.3% of customers have been with the company for 5 months or less.
- A large number of customers using fiber optic have left the company.
- Many clients who do not use online security leave the company.
- Many clients who do not use technical support have left the company.
- Customers on month-to-month contracts are the ones who leave the most.