# Telco Churn Final Report

## Imports

```python
# Imports used in this project
import os
import warnings

# Silence library warnings so the report output stays readable
warnings.filterwarnings("ignore")

# Tabular data
import numpy as np
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical tests
import scipy.stats as stats

# Scikit-learn: splitting, modeling, and evaluation
# (plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    classification_report,
    confusion_matrix,
)

# Local modules: database credentials, data acquisition, and data preparation
import env
import acquire
import prepare
```

## Acquire

* Data acquired from the Codeup DB Server
* It contained 7043 rows and 25 columns before cleaning
* The data was acquired on 25 APR 2023
* Each row represents a unique Telco customer account
* Each column represents a feature of those accounts


```python
# Acquire the raw (uncleaned) Telco data from the Codeup database
df = acquire.get_telco_data()
```
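The internals of the `acquire` module are not shown in this report. A common pattern for a helper like `acquire.get_telco_data` is to query the database once and cache the result as a local CSV. The sketch below illustrates that pattern under that assumption; `read_with_cache` and the file name `telco.csv` are hypothetical, not the module's actual API.

```python
import os

import pandas as pd


def read_with_cache(csv_path, loader):
    """Hypothetical helper: read csv_path if it exists, else call loader() and cache the result."""
    if os.path.isfile(csv_path):
        # Cache hit: skip the database round trip
        return pd.read_csv(csv_path)
    # Cache miss: pull the data (e.g. from the database) and save it for next time
    df = loader()
    # index=False keeps the row index out of the file
    df.to_csv(csv_path, index=False)
    return df


# Hypothetical usage:
# df = read_with_cache("telco.csv", acquire.get_telco_data)
```

Writing with `index=False` matters here: saving a DataFrame together with its index and reading it back with `pd.read_csv` adds a column named like `Unnamed: 0` on every round trip.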


## Prepare

* During the cleaning process the following changes were made:
    - The total_charges column was converted from a string to a float
    - Null values in total_charges were replaced with 0, because they represented new accounts that had not yet been charged
    - Duplicate columns were dropped: payment_type_id, internet_service_type_id, contract_type_id, and Unnamed: 0
    - Dummy columns were then created for use in the model, and the original columns were removed
        - These columns were: gender, partner, dependents, tech_support, streaming_tv, streaming_movies, paperless_billing, churn, contract_type, internet_service_type, and payment_type
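As a rough illustration, the cleaning steps above could look like the following pandas sketch. It is not the project's `prepare.prep_telco`: `prep_telco_sketch` is a hypothetical name, and it assumes the missing `total_charges` entries are blank strings that `pd.to_numeric(..., errors="coerce")` turns into `NaN` before they are filled with 0.

```python
import pandas as pd

# Columns the report lists as one-hot encoded
DUMMY_COLS = ["gender", "partner", "dependents", "tech_support", "streaming_tv",
              "streaming_movies", "paperless_billing", "churn", "contract_type",
              "internet_service_type", "payment_type"]
# Duplicate columns the report lists as dropped
DROP_COLS = ["payment_type_id", "internet_service_type_id", "contract_type_id", "Unnamed: 0"]


def prep_telco_sketch(df):
    """Hypothetical re-creation of the listed cleaning steps (not the real prepare.prep_telco)."""
    # Drop duplicate id/index columns; ignore any that are not present
    df = df.drop(columns=DROP_COLS, errors="ignore")
    # Assumption: blanks in total_charges mark new accounts; coerce them to NaN, then 0.0
    df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce").fillna(0.0)
    # One-hot encode, dropping the first (alphabetical) level of each column
    to_encode = [c for c in DUMMY_COLS if c in df.columns]
    dummies = pd.get_dummies(df[to_encode], drop_first=True, dtype=int)
    # Replace the original text columns with their dummy columns
    return pd.concat([df.drop(columns=to_encode), dummies], axis=1)
```

`drop_first=True` keeps k-1 dummies for a column with k levels, which is consistent with names such as `churn_Yes` and `contract_type_One year` in the output below.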

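```python
# Clean the acquired data with the project's prepare module and keep the result
df = prepare.prep_telco(df)
# Display the prepared DataFrame
df
```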

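| Unnamed: 0.1 | Unnamed: 0 | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | multiple_lines | online_security | ... | streaming_movies_Yes | paperless_billing_Yes | churn_Yes | contract_type_One year | contract_type_Two year | internet_service_type_Fiber optic | internet_service_type_None | payment_type_Credit card (automatic) | payment_type_Electronic check | payment_type_Mailed check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0002-ORFBO | Female | 0 | Yes | Yes | 9 | Yes | No | No | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | Yes | No | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 0004-TLHLJ | Male | 0 | No | No | 4 | Yes | No | No | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 3 | 0011-IGKFF | Male | 1 | Yes | No | 13 | Yes | No | No | ... | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | 4 | 0013-EXCHZ | Female | 1 | Yes | No | 3 | Yes | No | No | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7038 | 7038 | 9987-LUTYD | Female | 0 | No | No | 13 | Yes | No | Yes | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7039 | 7039 | 9992-RRAMN | Male | 0 | Yes | No | 22 | Yes | Yes | No | ... | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 7040 | 7040 | 9992-UJOEL | Male | 0 | No | No | 2 | Yes | No | No | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7041 | 7041 | 9993-LHIEB | Male | 0 | Yes | Yes | 67 | Yes | No | Yes | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |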


## Data Dictionary

| Feature | Key | Data Type | Definition |
|---|---|---|---|
| customer_id | Unique ID | object | identifier for each individual customer's account |
| gender | Male/Female | object | Whether the client is a female or a male |
| senior_citizen | 1 = Yes, 0 = No | int64 | Whether the client is a senior citizen or not |
| partner | Yes or No | object | Whether the client has a partner or not |
| dependents | Yes or No | object | Whether the client has dependents or not |
| tenure | Months | int64 | Number of months the customer has stayed with the company |
| phone_service | Yes or No | object | Whether the client has phone service or not |
| multiple_lines | No phone service <br>No<br>Yes | object | Whether the client has multiple lines or not |
| online_security | No internet service<br>No<br>Yes | object | Whether the client has online security or not |
| online_backup | No internet service<br>No<br>Yes | object | Whether the client has online backup or not |
| device_protection | No internet service<br>No<br>Yes | object | Whether the client has device protection or not |
| tech_support | No internet service<br>No<br>Yes | object | Whether the client has tech support or not |
| streaming_tv | No internet service<br>No<br>Yes | object | Whether the client has streaming TV or not |
| streaming_movies | No internet service<br>No<br>Yes | object | Whether the client has streaming movies or not |
| paperless_billing | Yes or No | object | Whether the client has paperless billing or not |
| monthly_charges | in USD | float64 | The amount charged to the customer monthly |
| total_charges | in USD | object (float64 after preparation) | The total amount charged to the customer |
| churn | Yes or No | object | Whether the client has churned or not |
| contract_type | Month-to-month<br>One year<br>Two year | object | Indicates the customer's current contract type |
| internet_service_type | DSL<br>Fiber optic<br>None | object | The type of internet service the customer subscribes to, if any |
| payment_type | Electronic check<br>Mailed check<br>Bank transfer (automatic)<br>Credit card (automatic) | object | The customer's payment method |
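Because the keys in this dictionary are meant to match the data exactly, a quick check can flag values the dictionary does not document. A minimal sketch, assuming a hand-copied subset of the keys above; `EXPECTED_VALUES` and `undocumented_values` are hypothetical names.

```python
import pandas as pd

# A hand-copied subset of the keys documented in the data dictionary
EXPECTED_VALUES = {
    "partner": {"Yes", "No"},
    "multiple_lines": {"No phone service", "No", "Yes"},
    "tech_support": {"No internet service", "No", "Yes"},
}


def undocumented_values(df: pd.DataFrame, expected=EXPECTED_VALUES) -> dict:
    """Return {column: values present in df but missing from the dictionary}."""
    problems = {}
    for col, allowed in expected.items():
        # Skip columns that were dropped or encoded away
        if col in df.columns:
            extra = set(df[col].dropna().unique()) - allowed
            if extra:
                problems[col] = extra
    return problems
```

An empty result means every checked column only contains documented values.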


## Explore

* Here you will explore your data then highlight 4 questions that you asked of the data and how those questions influenced your analysis
* Remember to split your data before exploring how different variables relate to one another
* Each question should be stated directly 
* Each question should be supported by a visualization
* Each question should be answered in natural language
* Two questions must be supported by a statistical test, but you may choose to support more than two
* See the following example, and read the comments in the next cell
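A minimal sketch of the split, assuming a 60/20/20 train/validate/test ratio (the report does not state its ratio) and stratifying on the `churn_Yes` dummy so each split keeps the same churn rate. `split_data` and `seed=123` are illustrative choices, not the project's code.

```python
from sklearn.model_selection import train_test_split


def split_data(df, target="churn_Yes", seed=123):
    """Hypothetical helper: ~60/20/20 train/validate/test split, stratified on the target."""
    # First carve off 20% of all rows for test
    train_validate, test = train_test_split(
        df, test_size=0.2, random_state=seed, stratify=df[target])
    # Then take 25% of the remaining 80% (20% overall) for validate
    train, validate = train_test_split(
        train_validate, test_size=0.25, random_state=seed,
        stratify=train_validate[target])
    return train, validate, test


# Hypothetical usage: explore only on train; keep validate and test for modeling
# train, validate, test = split_data(df)
```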

**The following empty code block** is here to represent the countless questions, visualizations, and statistical tests 
that did not make your final report. Data scientists often create a myriad of questions, visualizations
and statistical tests that do not make it into the final notebook. This is okay and expected. Remember 
that shotgun approaches to your data such as using pair plots to look at the relationships of each feature 
are a great way to explore your data, but they have no place in your final report. 
**Your final report is about showing and supporting your findings, not showing the work you did to get there!**

### Four Questions
* Do any of the demographic features provide meaningful predictive value for churn?
* Are there any account features that indicate higher churn rates?
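Both questions come down to comparing churn rates across the levels of a feature. One way to tabulate that on the training split, assuming the `churn_Yes` dummy created during preparation; `churn_rate_by` is a hypothetical helper name.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, feature: str, target: str = "churn_Yes") -> pd.DataFrame:
    """Churn rate and customer count for each level of a categorical feature, highest rate first."""
    # The mean of a 0/1 churn column is the churn rate for that group
    summary = df.groupby(feature)[target].agg(churn_rate="mean", customers="count")
    return summary.sort_values("churn_rate", ascending=False)


# Hypothetical usage on the training split:
# churn_rate_by(train, "partner")         # a demographic feature
# churn_rate_by(train, "multiple_lines")  # an account feature
```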


## You may use this as a template for how to ask and answer each question:

### 1) Question about the data
* Ask a question about the data for which you got a meaningful result
* Finding that there is no connection can also be a meaningful result

### 2) Visualization of the data answering the question

* Visualizations should be accompanied by take-aways telling the reader exactly what you want them to get from the chart
* You can include these as bullet points under the chart
* Use your chart title to provide the main take-away from each visualization
* Each visualization should answer one, and only one, of the explore questions
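A sketch of a chart that follows these guidelines, assuming matplotlib and the `churn_Yes` target: one feature per chart, the title carries the take-away, and a dashed line marks the overall churn rate for comparison. `plot_churn_rate` is a hypothetical helper, and the title in the usage line is a placeholder for the actual finding.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_churn_rate(df: pd.DataFrame, feature: str, title: str, target: str = "churn_Yes"):
    """Hypothetical helper: bar chart of churn rate per level of one feature."""
    # Churn rate per level, highest first
    rates = df.groupby(feature)[target].mean().sort_values(ascending=False)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(rates.index.astype(str)), rates.values)
    # Overall churn rate as a reference line
    ax.axhline(df[target].mean(), color="gray", linestyle="--", label="overall churn rate")
    # The title states the take-away, not just the variable names
    ax.set_title(title)
    ax.set_xlabel(feature)
    ax.set_ylabel("churn rate")
    ax.legend()
    return ax


# Hypothetical usage:
# plot_churn_rate(train, "partner", "<one-sentence take-away>")
```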

### 3) Statistical test
* Be sure you are using the correct statistical test for the type of variables you are testing
* Be sure that you are not violating any of the assumptions for the statistical test you are choosing
* Your notebook should run and produce the results of the test you are using (This may be done through imports)
* Include an introduction to the kind of test you are doing
* Include the Ho and Ha for the test
* Include the alpha you are using
* Include the readout of the p-value for the test
* Interpret the results of the test in natural language ("I reject the null hypothesis" on its own is not sufficient)
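For two categorical variables, such as a demographic feature and churn, the chi-square test of independence is the usual choice. A sketch with `scipy.stats.chi2_contingency`, assuming alpha = 0.05 and the `churn_Yes` target, with H0: the feature and churn are independent, and Ha: they are dependent. The helper name is hypothetical; it warns when any expected cell count is below 5, the usual rule of thumb for the test's approximation.

```python
import pandas as pd
from scipy import stats


def chi2_churn_test(df, feature, target="churn_Yes", alpha=0.05):
    """Hypothetical helper: chi-square test of independence between a feature and churn.

    H0: the feature and churn are independent.
    Ha: the feature and churn are dependent.
    Returns (reject_h0, p_value).
    """
    # Observed counts: one row per feature level, one column per churn value
    observed = pd.crosstab(df[feature], df[target])
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    # Rule of thumb: the chi-square approximation needs expected counts of at least 5
    if (expected < 5).any():
        print("Warning: some expected counts are below 5; the result may be unreliable")
    print(f"{feature}: chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4g}, alpha = {alpha}")
    return p < alpha, p
```

A p-value below alpha supports the claim that churn rates differ across the feature's levels; it does not say how large the difference is, so the chart is still needed to interpret it.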

### 4) Answer to the question
* Answer the question you posed of the data by referring to the chart and statistical test (if you used one)
* If the question relates to drivers, explain why the feature in question would/wouldn't make a good driver

## Exploration Summary
* After your explore section, before you start modeling, provide a summary of your findings in Explore
* Include a summary of your take-aways
* Include a summary of the features you examined, whether or not each one will go into modeling, and why
* It is important to note which features will be going into your model so the reader knows what features you are using to model on

## Modeling

### Introduction
* Explain how you will be evaluating your models
* Include the evaluation metric you will be using and why you have chosen it
* Create a baseline and briefly explain how it was calculated 

```python
# If you use code to generate your baseline, run the code and generate the output here
```

Printout should read: <br>
Baseline: "number" "evaluation metric"
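If accuracy is the evaluation metric (an assumption; `accuracy_score` is among the imports), the usual baseline is the accuracy of always predicting the most common class in train. A minimal sketch; `baseline_accuracy` is a hypothetical helper name.

```python
import pandas as pd


def baseline_accuracy(y: pd.Series) -> float:
    """Accuracy of always predicting the most common class in y."""
    # mode() returns the most frequent value(s); take the first on a tie
    most_common = y.mode()[0]
    return float((y == most_common).mean())


# Hypothetical usage on the training target:
# print(f"Baseline: {baseline_accuracy(train['churn_Yes']):.2%} accuracy")
```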

### Best 3 Models
* Show the three best model results obtained using your selected features to predict the target variable
* Typically students will show the top models they are able to generate for three different model types
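One way to produce the train/validate printout described below for three model types, assuming accuracy as the metric. The model types and hyperparameters (tree depth 4, 15 neighbors, default logistic regression) are illustrative placeholders rather than tuned results, and `evaluate_models` is a hypothetical helper. The `X_*` inputs must be numeric feature matrices, such as the dummy and numeric columns chosen in the Exploration Summary.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(X_train, y_train, X_validate, y_validate):
    """Hypothetical helper: fit one model per type and print train/validate accuracy."""
    # Illustrative, untuned hyperparameters
    models = {
        "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=123),
        "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=15),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score returns mean accuracy
        train_acc = model.score(X_train, y_train)
        validate_acc = model.score(X_validate, y_validate)
        print(f"{name}\naccuracy on train: {train_acc:.2%}\naccuracy on validate: {validate_acc:.2%}\n")
        results[name] = (model, train_acc, validate_acc)
    return results
```

K-nearest neighbors is distance-based, so scaling numeric columns such as `tenure` and `monthly_charges` (for example with a scaler fit on train only) usually matters for it more than for the other two models. A large gap between train and validate accuracy is a sign of overfitting.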

## You may use this as a template for how to introduce your models:

### Model Type

```python
# Code that runs the best model of that model type goes here
# (this may be imported from a module)
```

Printout of model code should read: <br>
"Model Type" <br>
"evaluation metric" on train: "evaluation result" <br>
"evaluation metric" on validate: "evaluation result"

### Test Model
* Choose the best of the three models as your final model and explain why you chose it
* Explain that you will now run your final model on test data to gauge how it will perform on unseen data

```python
# Code that runs the best overall model on test data (this may be imported from a module)
```

Printout of model code should read: <br>
"Model Type" <br>
"evaluation metric" on Test: "evaluation result" <br>

### Modeling Wrap 
* Give a final interpretation of how the model's test score compares to the baseline and whether you would recommend this model for production

## Conclusion

### Summary
* Summarize your findings and answer the questions you brought up in explore 
* Summarize how the drivers you discovered did or did not lead to a successful model

### Recommendations
* Recommendations are actions the stakeholder should take based on your insights

### Next Steps
* Next Steps are what you, as a Data Scientist, would do if provided more time to work on the project

**Where there is code in your report there should also be code comments telling the reader what each code block is doing. This is true for any and all code blocks even if you are using a function to import code from a module.**
<br>
<br>
**Your Notebook should contain adequate markdown that documents your thought process, decision making, and navigation through the pipeline. As a Data Scientist, your job does not end with making data discoveries. It includes effectively communicating those discoveries as well. This means documentation is a critical part of your job.**