## Project Title:
Churn Prediction Project 

## Project Description: 
This project builds a churn prediction model for a telecom company. Imagine that we work at a telecom company that offers phone and internet services, and we have a problem: some of our customers are churning. They are no longer using our services and are moving to a different provider. We would like to prevent that, so we develop a system that identifies these customers and offers them an incentive to stay: we target them with promotional messages and give them a discount. We also want to understand why the model thinks a customer will churn, so we need to be able to interpret the model’s predictions.
 
We have collected a dataset where we’ve recorded some information about our customers: what type of services they used, how much they paid, and how long they stayed with us. We also know who canceled their contracts and stopped using our services (churned). We will use this information as the target variable in the machine learning model and predict it using all other available information.

The project plan is as follows: 
- First, we download the dataset and do some initial preparation: rename columns and change values inside columns to be consistent throughout the entire dataset.
- Then we split the data into train, validation, and test so we can validate our models.
- As part of the initial data analysis, we look at feature importance to identify which features matter most for predicting churn.
- We transform categorical variables into numeric variables so we can use them in the model.
- Finally, we train a logistic regression model.

## Dataset Description
- URL: https://www.kaggle.com/blastchar/telco-customer-churn

- Column description
    - CustomerID: the ID of the customer
    - Gender: male/female
    - SeniorCitizen: whether the customer is a senior citizen (0/1)
    - Partner: whether they live with a partner (yes/no)
    - Dependents: whether they have dependents (yes/no)
    - Tenure: number of months since the start of the contract
    - PhoneService: whether they have phone service (yes/no)
    - MultipleLines: whether they have multiple phone lines (yes/no/no phone service)
    - InternetService: the type of internet service (no/DSL/fiber optic)
    - OnlineSecurity: if online security is enabled (yes/no/no internet)
    - OnlineBackup: if online backup service is enabled (yes/no/no internet)
    - DeviceProtection: if the device protection service is enabled (yes/no/no internet)
    - TechSupport: if the customer has tech support (yes/no/no internet)
    - StreamingTV: if the TV streaming service is enabled (yes/no/no internet)
    - StreamingMovies: if the movie streaming service is enabled (yes/no/no internet)
    - Contract: the type of contract (monthly/yearly/two years)
    - PaperlessBilling: if the billing is paperless (yes/no)
    - PaymentMethod: payment method (electronic check, mailed check, bank transfer, credit card)
    - MonthlyCharges: the amount charged monthly (numeric)
    - TotalCharges: the total amount charged (numeric)
    - Churn: if the client has canceled the contract (yes/no)

## Environment Configuration
- Installing Virtual Env
    - pip install pipenv 

- Installing Packages
    - pipenv install jupyter notebook pandas pyarrow numpy matplotlib seaborn scikit-learn

- Starting Virtual Env
    - pipenv shell 

- Starting Notebook
    - jupyter-notebook 

- Stopping Notebook 
    - Ctrl+C

- Deactivating Virtual Env
    - exit

## Importing Libraries

In [None]:
## librarie(s) for loading and preprocessing 


## librarie(s) for visualization 


## library for building a validation framework


## library for feature engineering 


## library for ml algorithms


## library for ml metrics 



## Loading And Data Overview

In [None]:
## load dataset

## create a copy of the dataset


In [None]:
## view the first five rows 


In [None]:
## last five rows 


In [None]:
## check for the total rows and columns 


In [None]:
## check for the brief column summary 


In [None]:
## check for missing values 


In [None]:
## lets check for duplicates 


In [None]:
## check for uniqueness in each column


## Data Preprocessing 
- Normalizing the column names 
- Replacing empty strings with NaN and filling the missing values 
- Deleting the customer ID column 
- Changing the data types of the columns 
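
The steps above can be sketched on a tiny, made-up frame. This is only an illustration under assumptions: the three sample rows are invented, a single-space string stands in for the blank `TotalCharges` entries, and `pd.to_numeric(..., errors="coerce")` is used as one possible way to turn blanks into NaN in a single step.

```python
import pandas as pd

# Tiny made-up frame that mimics the raw layout (values are illustrative only).
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "gender": ["Female", "Male", "Male"],
    "TotalCharges": ["29.85", " ", "108.15"],
    "Churn": ["No", "Yes", "No"],
})

# 1. Normalize the column names.
df.columns = df.columns.str.lower()

# 2. Blank strings cannot be parsed as numbers, so errors="coerce" turns them
#    into NaN; the missing values are then filled with the column mean.
df["totalcharges"] = pd.to_numeric(df["totalcharges"], errors="coerce")
df["totalcharges"] = df["totalcharges"].fillna(df["totalcharges"].mean())

# 3. Delete the identifier column; it carries no predictive signal.
df = df.drop(columns=["customerid"])

# 4. Change data types: text column -> category, target -> 0/1 integers.
df["gender"] = df["gender"].astype("category")
df["churn"] = (df["churn"] == "Yes").astype(int)

print(df.dtypes)
```

In the real notebook the same calls would run on the full dataset, with step 4 applied to every text column rather than just `gender`.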

In [None]:
## lets convert the column names to lower case


In [None]:
## preview the columns


In [None]:
## replace empty strings in the totalcharges column with NaN


In [None]:
## fill in the missing values in the totalcharges column with mean


In [None]:
## delete the customer id column 


In [None]:
##del df['customerid']

In [None]:
## display the first five rows using the transpose


In [None]:
## lets change the datatype of 'object' columns to category datatypes.


In [None]:
## lets convert the target column, where yes = 1 and no = 0


In [None]:
## lets preview the churn column 


## Exploratory Data Analysis
- Target Variable Analysis 
- Outlier analysis 

In [None]:
## lets display the distribution of the target column (churn)

In [None]:
## compute the total counts of each category in the target column

## Building a Validation Framework
- Let’s split the DataFrame such that
    - 20% of data goes to validation.
    - 20% goes to test.
    - The remaining 60% goes to train.
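
One way to get a 60/20/20 split is to call scikit-learn's `train_test_split` twice. The sketch below uses a made-up 100-row frame and an arbitrary `random_state`; both are assumptions for illustration. The second call uses `test_size=0.25` because 25% of the remaining 80% equals 20% of the full dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up frame with 100 rows so the split sizes are easy to read.
df = pd.DataFrame({"tenure": np.arange(100), "churn": np.arange(100) % 2})

# First hold out 20% of all rows as the test set.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

# Then take 25% of the remaining 80% as validation (0.25 * 0.8 = 0.2 of the total).
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

print(len(df_train), len(df_val), len(df_test))  # 60 20 20
```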

In [None]:
## split the dataset into training, validation, and test sets




## print the output of the train, validation, and test data sample


In [None]:
## select the target column from the dataframe and convert them in matrix format or numpy array


In [None]:
## delete the target column from the rest of the dataframe 


## Baseline Training of Logistic Regression Model
- To build a baseline model, we use only the numerical features to train a simple ML algorithm.

In [None]:
## select only numerical features 

In [None]:
## convert the numerical features into numpy array

In [None]:
## instantiate a logistic regression algorithm 

## fit the training data to the algorithm 


In [None]:
## generate the validation predictions 


In [None]:
## preview the validation predictions


- The predictions of the model are a two-column matrix. 
- The first column contains the probability that the target is zero (the client won’t churn). 
- The second column contains the opposite probability (the target is one, and the client will churn).
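
To make the two-column layout concrete, here is a minimal sketch on toy data. The single feature and the six samples are invented for illustration and are not taken from the dataset. The column order follows `model.classes_`, so with a 0/1 target the second column is the churn probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one numeric feature (think "tenure"); target 1 means churn.
X = np.array([[1.0], [2.0], [3.0], [20.0], [30.0], [40.0]])
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression()
model.fit(X, y)

proba = model.predict_proba(X)  # shape: (n_samples, 2)
p_no_churn = proba[:, 0]        # probability that the target is 0
p_churn = proba[:, 1]           # probability that the target is 1

print(model.classes_)           # column order of predict_proba
print(proba.sum(axis=1))        # each row sums to 1
```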

In [None]:
## lets select the data in the second column


- This output (probabilities) is often called soft predictions. 
- These tell us the probability of churning as a number between zero and one. It’s up to us to decide how to interpret this number and how to use it.
- To make the actual decision about whether to send a promotional letter to our customers, using the probability alone is not enough. 
- We need hard predictions — binary values of True (churn, so send the mail) or False (not churn, so don’t send the mail).
- To get the binary predictions, we take the probabilities and cut them above a certain threshold.
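
The thresholding step can be sketched with plain NumPy. The probabilities and labels below are made up, and the `>=` comparison is one possible convention for where the cut falls. Accuracy is then the share of hard predictions that match the true labels.

```python
import numpy as np

# Made-up validation labels and soft predictions, for illustration only.
y_val = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0.10, 0.80, 0.45, 0.60, 0.95])

# Hard predictions: True means "churn, so send the mail".
churn = y_pred >= 0.5

# Accuracy: fraction of hard predictions that agree with the true labels.
accuracy = (churn == y_val).mean()

print(churn)     # [False  True False  True  True]
print(accuracy)  # 0.6
```

Moving the threshold trades one kind of error for the other: a lower value flags more customers as churning, a higher value flags fewer.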

In [None]:
## lets set the prediction threshold to 0.5


In [None]:
## display the output

In [None]:
## lets compute the accuracy using the accuracy_score metric 


## display the output


## Feature Engineering 
- transforming all categorical variables to numeric features.

In [None]:
## lets select the categorical, integer and float datatype features 


In [None]:
## lets convert the dataframe to a dictionary format


In [None]:
## lets instantiate the DictVectorizer class


## lets train the vectorizer with the train data


In [None]:
## lets transform the training data


## lets transform the validation data


In [None]:
## lets instantiate a new Logistic regression alg


## lets fit the alg with the training data


In [None]:
## lets generate the validation prediction 


In [None]:
## lets generate the churn predictions using a threshold of 0.5


In [None]:
## lets compute the accuracy of the validation predictions using the accuracy_score metric


## lets print the output


## Saving Model 

In [None]:
## lets import the pickle library


In [None]:
## specify where to save the file

    ## save the model
   

## Loading The Model

In [None]:
## lets load the saved model


In [None]:
## a sample customer


In [None]:
## lets create a function to make a single prediction 


In [None]:
## lets call the function to make the prediction 


In [None]:
## output the value of the prediction 


In [None]:
## lets make the prediction by setting the threshold and returning a verdict
## 'verdict: Churn' , 'verdict: Not Churn'
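
The saving, loading, and single-prediction steps could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the toy training records, the file name `churn-model.bin`, the choice to pickle the vectorizer together with the model, and the helper name `predict_single`.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (made up) so the example is self-contained.
train_dicts = [
    {"contract": "month-to-month", "tenure": 1},
    {"contract": "month-to-month", "tenure": 2},
    {"contract": "two_year", "tenure": 60},
    {"contract": "two_year", "tenure": 70},
]
y_train = [1, 1, 0, 0]

dv = DictVectorizer(sparse=False)
model = LogisticRegression().fit(dv.fit_transform(train_dicts), y_train)

# Save the vectorizer and the model together: both are needed at prediction time.
model_path = Path(tempfile.gettempdir()) / "churn-model.bin"  # hypothetical file name
with open(model_path, "wb") as f_out:
    pickle.dump((dv, model), f_out)

# Load them back.
with open(model_path, "rb") as f_in:
    dv_loaded, model_loaded = pickle.load(f_in)


def predict_single(customer, dv, model, threshold=0.5):
    """Return the churn probability and a verdict for one customer dict."""
    X = dv.transform([customer])
    probability = model.predict_proba(X)[0, 1]
    verdict = "Churn" if probability >= threshold else "Not Churn"
    return probability, verdict


customer = {"contract": "month-to-month", "tenure": 1}  # a sample customer
probability, verdict = predict_single(customer, dv_loaded, model_loaded)
print(f"probability: {probability:.3f}, verdict: {verdict}")
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.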
