**Loading Dataset**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
# Load the Telco customer churn CSV into a DataFrame
df = pd.read_csv('/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df

**Cleaning and Preparation of the Dataset**

**customerID**: A unique identifier for each customer in the dataset. It distinguishes individual customers and carries no information that is useful for churn analysis.

**gender**: Indicates the gender of the customer (Male/Female). It is used to analyze if there is any correlation between gender and churn behavior.

**SeniorCitizen**: A binary indicator (0 for non-senior and 1 for senior) that shows whether the customer is a senior citizen or not. Senior citizens may have different usage patterns and service needs.

**Partner**: Indicates whether the customer has a partner (Yes/No). This could be useful to understand the demographic and how having a partner might influence their service usage or loyalty.

**Dependents**: Indicates whether the customer has dependents (Yes/No). This column can help to understand if customers with dependents have different churn patterns compared to those without dependents.

**tenure**: Represents the number of months a customer has been with the service provider. It is an important variable to analyze customer loyalty and the likelihood of churn.

**PhoneService**: Indicates whether the customer has a phone service (Yes/No). This helps in understanding if having or not having a phone service is related to churn.

**MultipleLines**: Shows if the customer has multiple phone lines (Yes/No/No phone service). This can be used to see if customers with multiple lines are more or less likely to churn.

**InternetService**: Describes the type of internet service the customer has (DSL, Fiber optic, No). This is crucial in understanding which type of internet service has higher churn rates.

**OnlineSecurity**: Indicates whether the customer has an online security add-on service (Yes/No/No internet service). It can be analyzed to see if customers with additional security services are less likely to churn.

**OnlineBackup**: Shows if the customer has an online backup add-on service (Yes/No/No internet service). Like other additional services, it helps in determining the impact on churn.

**DeviceProtection**: Indicates whether the customer has device protection service (Yes/No/No internet service). This column helps to see if customers who opt for device protection are more loyal.

**TechSupport**: Shows if the customer has technical support (Yes/No/No internet service). It is useful to evaluate if technical support availability influences churn behavior.

**StreamingTV**: Indicates whether the customer has a streaming TV service (Yes/No/No internet service). This can help analyze if entertainment services like streaming TV affect customer retention.

**StreamingMovies**: Shows if the customer has a streaming movies service (Yes/No/No internet service). Similar to streaming TV, this column helps assess the impact of entertainment services on churn.

**Contract**: The contract term of the customer (Month-to-month, One year, Two year). This is one of the most critical variables, as it directly affects customer commitment and churn rates.

**PaperlessBilling**: Indicates whether the customer has paperless billing (Yes/No). It can be analyzed to see if customers who opt for paperless billing are more likely to churn.

**PaymentMethod**: Describes the payment method used by the customer (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)). This helps in understanding if certain payment methods are associated with higher churn.

**MonthlyCharges**: The amount charged to the customer monthly. Higher or lower charges can be a significant factor influencing churn.

**TotalCharges**: The total amount charged to the customer over their tenure. It can help to understand the customer’s total spending and its relation to churn.

**Churn**: A binary variable indicating whether the customer has left the service (Yes/No). It is the target variable we aim to predict or analyze against the other features to understand why customers are churning.

In [None]:
df.columns

In [None]:
df.isnull().sum()

In [None]:
df.describe()
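
One thing worth checking at this point: in commonly distributed copies of this CSV, `TotalCharges` tends to be read as text because a few entries are blank strings. If that holds here, `df.isnull().sum()` would report no missing values for it and `df.describe()` would leave the column out. The sketch below shows one way to convert it, using a small made-up frame (`sample`) and a hypothetical helper name (`clean_total_charges`) rather than the real data.

```python
import pandas as pd

# Hypothetical miniature frame standing in for df; the blank string mimics
# the kind of value that keeps TotalCharges from being parsed as a number.
sample = pd.DataFrame({
    "tenure": [0, 12, 24],
    "MonthlyCharges": [20.0, 50.0, 80.0],
    "TotalCharges": [" ", "600.5", "1920"],
})

def clean_total_charges(frame):
    """Return a copy with TotalCharges converted to float (unparseable values become NaN)."""
    out = frame.copy()
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    return out

cleaned = clean_total_charges(sample)
print(cleaned["TotalCharges"].isnull().sum())  # how many values failed to parse
print(cleaned.dtypes)
```

On the real data this would be called as `df = clean_total_charges(df)`, after which the rows with missing `TotalCharges` can be inspected before deciding whether to drop or fill them.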

**EDA and Visualization**

**1. Is there a correlation between monthly charges, tenure, and churn?**

In [None]:
sns.pairplot(df, vars=['MonthlyCharges', 'tenure'], hue='Churn')
plt.show()
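
The pair plot gives a visual impression. A numeric check could complement it by encoding churn as a 0/1 flag and computing its Pearson correlation with each numeric column. This is a minimal sketch on made-up rows (`toy`); the helper name `churn_correlations` is an assumption, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows that reuse the notebook's column names.
toy = pd.DataFrame({
    "tenure": [1, 2, 5, 30, 48, 60, 72, 3],
    "MonthlyCharges": [70.0, 85.5, 90.0, 55.0, 60.0, 25.0, 20.0, 99.0],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
})

def churn_correlations(frame, numeric_cols=("tenure", "MonthlyCharges")):
    """Pearson correlation of each numeric column with a 0/1 churn flag."""
    churn_flag = (frame["Churn"] == "Yes").astype(int)
    return frame[list(numeric_cols)].corrwith(churn_flag)

correlations = churn_correlations(toy)
print(correlations)
```

A negative value for `tenure` would indicate that longer-standing customers churn less, and a positive value for `MonthlyCharges` that higher bills go with more churn. The toy numbers only illustrate the mechanics.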


**2. What is the churn rate among senior citizens versus non-senior citizens?**

In [None]:
churn_by_senior = df.groupby('SeniorCitizen')['Churn'].value_counts(normalize=True).unstack() * 100
print(churn_by_senior)


**3. What is the distribution of churn across different contract types?**

In [None]:
sns.countplot(data=df, x='Contract', hue='Churn')
plt.title('Churn Distribution Across Contract Types')
plt.xlabel('Contract Type')
plt.ylabel('Number of Customers')
plt.legend(title='Churn')
plt.show()
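
Raw counts can hide differences in rate when the groups differ in size, so it may help to look at row-normalised percentages alongside the count plot. The sketch below uses `pd.crosstab` with `normalize="index"` on made-up rows; `churn_rate_by` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook would pass df instead.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

def churn_rate_by(frame, column):
    """Percentage of churned and retained customers within each category of `column`."""
    return pd.crosstab(frame[column], frame["Churn"], normalize="index") * 100

rates = churn_rate_by(toy, "Contract")
print(rates.round(1))
```

The same helper could be reused for other categorical columns, for example `churn_rate_by(df, "InternetService")` or `churn_rate_by(df, "PaymentMethod")`.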


**4. What is the distribution of tenure for customers who churned and did not churn?**

In [None]:
sns.histplot(data=df, x='tenure', hue='Churn', kde=True, bins=30)
plt.title('Distribution of Tenure by Churn Status')
plt.xlabel('Tenure (Months)')
plt.ylabel('Number of Customers')
plt.show()
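
To put numbers on the histogram, tenure can be grouped into bands and the churn rate computed per band. The band edges below (0, 12, 24, 48, 72 months) are an arbitrary choice for illustration, and the data is a made-up sample rather than the real `df`.

```python
import pandas as pd

# Hypothetical toy rows that reuse the notebook's column names.
toy = pd.DataFrame({
    "tenure": [1, 3, 8, 14, 20, 30, 50, 70],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

def churn_rate_by_tenure_band(frame, edges=(0, 12, 24, 48, 72)):
    """Churn rate (percent) within each tenure band defined by `edges`."""
    # include_lowest=True keeps customers with tenure equal to the first edge
    bands = pd.cut(frame["tenure"], bins=list(edges), include_lowest=True)
    churned = frame["Churn"].eq("Yes")
    return churned.groupby(bands, observed=False).mean() * 100

band_rates = churn_rate_by_tenure_band(toy)
print(band_rates.round(1))
```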


**5. How does internet service type impact churn?**

In [None]:
sns.countplot(data=df, x='InternetService', hue='Churn')
plt.title('Churn Distribution by Internet Service Type')
plt.xlabel('Internet Service Type')
plt.ylabel('Number of Customers')
plt.legend(title='Churn')
plt.show()


**6. What is the impact of additional services on churn?**

In [None]:
additional_services = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport']

for service in additional_services:
    plt.figure(figsize=(6, 4))
    sns.countplot(data=df, x=service, hue='Churn')
    plt.title(f'Churn Distribution by {service}')
    plt.xlabel(service)
    plt.ylabel('Number of Customers')
    plt.legend(title='Churn')
    plt.show()
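
The separate plots above can be hard to compare side by side. One option is to reshape the add-on columns into long form and compute a single table of churn rates per service and status. This is a sketch under the assumption that the columns hold the Yes/No/No internet service values described earlier; the data and the helper name `addon_churn_table` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows with two of the add-on columns.
toy = pd.DataFrame({
    "OnlineSecurity": ["No", "Yes", "No", "No internet service"],
    "TechSupport": ["No", "Yes", "Yes", "No internet service"],
    "Churn": ["Yes", "No", "Yes", "No"],
})

def addon_churn_table(frame, services):
    """Churn rate (percent) for every service/status combination, one row per service."""
    # Reshape so each row is one (customer, service) pair with its status
    long_df = frame.melt(id_vars="Churn", value_vars=list(services),
                         var_name="service", value_name="status")
    churned = long_df["Churn"].eq("Yes")
    return churned.groupby([long_df["service"], long_df["status"]]).mean().unstack() * 100

table = addon_churn_table(toy, ["OnlineSecurity", "TechSupport"])
print(table.round(1))
```

With the notebook's data this would be called as `addon_churn_table(df, additional_services)`.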


**7. Which payment methods are most associated with churn?**

In [None]:
churn_by_payment = df.groupby('PaymentMethod')['Churn'].value_counts(normalize=True).unstack() * 100
print(churn_by_payment)


**8. What is the count of churned and non-churned customers?**

In [None]:
# Count plot for Churn
plt.figure(figsize=(10, 6))
sns.countplot(x='Churn', data=df, palette='pastel')
plt.title('Count of Churn and Non-Churn')
plt.xlabel('Churn')
plt.ylabel('Count')
plt.show()
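
Alongside the bar chart, the same information can be printed as counts and percentages, which makes any class imbalance explicit. A minimal sketch on a made-up series; `class_balance` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy target column; the notebook would pass df["Churn"].
toy_churn = pd.Series(["No", "No", "No", "Yes"], name="Churn")

def class_balance(churn):
    """Counts and percentage share of each churn label."""
    counts = churn.value_counts()
    share = churn.value_counts(normalize=True) * 100
    return pd.DataFrame({"count": counts, "percent": share})

balance = class_balance(toy_churn)
print(balance)
```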


**9. What is the relationship between tenure and churn?**

In [None]:
churned = df[df['Churn'] == 'Yes']
not_churned = df[df['Churn'] == 'No']

# Plot side-by-side tenure histograms for churned and retained customers
plt.figure(figsize=(10, 6))
plt.hist([churned['tenure'], not_churned['tenure']], bins=10, color=['red', 'blue'], label=['Yes', 'No'])
plt.title('Tenure by Churn')
plt.xlabel('Tenure (Months)')
plt.ylabel('Number of Customers')
plt.legend(title='Churn')
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Label each bar with its customer count
for rect in plt.gca().patches:
    height = rect.get_height()
    plt.gca().text(rect.get_x() + rect.get_width() / 2, height, f'{int(height)}', ha='center', va='bottom')
plt.show()

**Conclusion**

Based on the analysis and visualizations of the telecom churn dataset, we can conclude the following:

**Contract Types and Churn**: Customers with month-to-month contracts have a significantly higher churn rate compared to those with longer-term contracts. This suggests that encouraging longer contracts may reduce churn, although the association alone does not show that contract length causes retention.

**Tenure and Churn**: Customers with shorter tenure are more likely to churn, indicating that newer customers are at a higher risk of leaving. Focused retention strategies for these customers can be beneficial.

**Impact of Monthly Charges**: Higher monthly charges are associated with increased churn. Offering value-added services or discounts to customers with higher monthly charges could help in retaining them.

**Demographic Insights**: Senior citizens have a higher churn rate than non-senior customers. Customers without dependents also appear to churn more, though that breakdown is not computed in this notebook. Tailoring offers and services to these demographics may improve retention.

**Additional Services and Churn**: Customers who do not subscribe to additional services like Online Security, Device Protection, and Tech Support tend to churn more. Bundling these services with the main plan might help in reducing churn.

Overall, the analysis suggests that targeting customers with shorter tenure, higher monthly charges, and those on month-to-month contracts with personalized offers and value-added services can be effective strategies to reduce churn and enhance customer retention.