# Churn Modeling with Four Models
**By Jason "Scott" Person**
**Applied Machine Learning 1 @ Newman University**

## About this Data Set
**This data is from [the Churn-Modelling data set from Kaggle](https://www.kaggle.com/shubh0799/churn-modelling).**<br/>
**Number of Records:** 10,000<br/>
**Number of original fields:** 14 (including a supplied index)<br/>
**Fields include:**
- `RowNumber` - a supplied index
- `CustomerId` - unique ID number for each customer
- `Surname` - customer last name
- `CreditScore` - customer credit score
- `Geography` - the country in which the customer resides
- `Gender` - Male or Female
- `Age` - customer's age, as an integer
- `Tenure` - number of years as a customer, as an integer
- `Balance` - customer's total bank balance
- `NumOfProducts` - the number of banking products a customer participates in
- `HasCrCard` - binary 0 or 1 indicating whether the customer has a bank credit card
- `IsActiveMember` - binary 0 or 1 indicating whether the customer has been active within a recent, unspecified time period
- `EstimatedSalary` - the customer's estimated salary
- `Exited` - binary 0 or 1 indicating whether the customer has left the bank and closed all accounts

In [0]:
# Basic Imports for Data Science
import numpy as np
import pandas as pd

# Machine Learning Prerequisites
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Four Model Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

# 1. Read and Review Data

The data come from `churn_cleaned.csv`, a previously cleaned version of the Kaggle churn data described above.
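The cleaning steps behind `churn_cleaned.csv` are not shown in this notebook. As a hedged sketch only, one common approach for this schema is to drop the identifier columns (`RowNumber`, `CustomerId`, `Surname`), which identify customers rather than describe them, and to one-hot encode the text columns (`Geography`, `Gender`) so every feature is numeric. The three-row frame below is made up, uses only a subset of the columns, and is not the Kaggle data; the helper name `clean_churn` is hypothetical.

```python
import pandas as pd

def clean_churn(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop identifiers, one-hot encode categoricals."""
    # Identifier columns do not generalize to new customers, so drop them
    cleaned = raw.drop(columns=['RowNumber', 'CustomerId', 'Surname'])
    # Turn text categories into 0/1 indicator columns; drop_first avoids a redundant column
    return pd.get_dummies(cleaned, columns=['Geography', 'Gender'], drop_first=True, dtype=int)

# Tiny made-up sample using a subset of the Kaggle column layout
raw = pd.DataFrame({
    'RowNumber': [1, 2, 3],
    'CustomerId': [101, 102, 103],
    'Surname': ['A', 'B', 'C'],
    'CreditScore': [600, 700, 650],
    'Geography': ['France', 'Spain', 'Germany'],
    'Gender': ['Female', 'Male', 'Female'],
    'Age': [40, 35, 50],
    'Exited': [1, 0, 0],
})

cleaned = clean_churn(raw)
print(list(cleaned.columns))
```

With `drop_first=True`, the first category of each column (here `France` and `Female`) becomes the implicit reference level.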

In [0]:
# Read data and show sample
df = pd.read_csv('churn_cleaned.csv')
df.head(10)

In [0]:
# Show basic data info
df.info()
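
Before comparing models by accuracy, it helps to check how balanced the `Exited` target is: on an imbalanced target, a model can score high accuracy simply by predicting the majority class. A minimal sketch, run here on a made-up label Series rather than the real file; in the notebook the equivalent call would be `df['Exited'].value_counts(normalize=True)`.

```python
import pandas as pd

# Made-up labels standing in for df['Exited']: 2 churners out of 10 customers
exited = pd.Series([0, 0, 1, 0, 0, 0, 1, 0, 0, 0], name='Exited')

# Fraction of customers in each class
class_share = exited.value_counts(normalize=True)
print(class_share)

# Accuracy achieved by always predicting the most common class
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.0%}")
```

Any model's test accuracy is only informative relative to this majority-class figure.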

# 2. Split Data into Training and Test Sets

In [0]:
# Build the feature DataFrame by dropping the target column
features = df.drop('Exited', axis=1)

# Extract the target column as a Series of labels
labels = df['Exited']
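
A cheap sanity check at this step is to confirm that the target did not end up among the features and that features and labels line up row for row. A minimal sketch on a made-up three-row frame; in the notebook, `df`, `features` and `labels` are the real objects.

```python
import pandas as pd

# Made-up cleaned frame standing in for df
df = pd.DataFrame({'CreditScore': [600, 700, 650], 'Age': [40, 35, 50], 'Exited': [1, 0, 0]})

features = df.drop('Exited', axis=1)  # returns a new DataFrame; df itself is unchanged
labels = df['Exited']

# The target must not leak into the feature matrix
assert 'Exited' not in features.columns
# Features and labels must share the same rows in the same order
assert len(features) == len(labels)
assert features.index.equals(labels.index)
print(features.shape, labels.shape)
```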

In [0]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

# Print the sizes of each set with percentages of total size
total_size = features.shape[0]
print(f"Training set size: {X_train.shape[0]} ({(X_train.shape[0] / total_size) * 100:.2f}%)")
print(f"Test set size: {X_test.shape[0]} ({(X_test.shape[0] / total_size) * 100:.2f}%)")
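
Churn targets are usually imbalanced, so a purely random 80/20 split can leave the training and test sets with slightly different churn rates. Passing `stratify=labels` to `train_test_split` keeps the class proportions the same in both sets. This is a suggested variation, not what the cell above does; the synthetic labels below (20% positives) are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 1,000 rows, 20% of them labelled as churned
rng = np.random.default_rng(42)
features = rng.normal(size=(1000, 3))
labels = np.array([1] * 200 + [0] * 800)

# stratify=labels preserves the 20% churn rate in both splits
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

print(f"Train churn rate: {y_train.mean():.3f}")
print(f"Test churn rate:  {y_test.mean():.3f}")
```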

# 3. Train Models

In [0]:
# Note: some of the output below may not render in VS Code (and possibly other editors); it is only status output from model training.

# Define the list of models to compare
models = [
    LogisticRegression(max_iter=10000),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    GradientBoostingClassifier()
]

# Train the models using the training features and labels
for model in models:
    model.fit(X_train, y_train)
    # Report trained model
    print(f"Trained and ready: {model}")
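
The `max_iter=10000` setting above likely reflects logistic regression needing many iterations to converge, which often happens when features sit on very different scales (for example `Balance` next to 0/1 flags). A common alternative, sketched here as an assumption rather than the notebook's approach, is to standardize features inside a `Pipeline` so the scaler is fit on training data only. The sketch also fixes `random_state` on the tree-based models so their scores repeat between runs. The data are synthetic from `make_classification`, with one column inflated to mimic a large-valued field.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; column 0 is scaled up to mimic a large-valued column
X, y = make_classification(n_samples=1000, n_features=6, weights=[0.8], random_state=42)
X[:, 0] *= 50_000
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = [
    # Scaling happens inside the pipeline, so the scaler only ever sees training data
    make_pipeline(StandardScaler(), LogisticRegression()),
    # Fixed random_state values make the tree-based scores repeatable run to run
    DecisionTreeClassifier(random_state=42),
    RandomForestClassifier(random_state=42),
    GradientBoostingClassifier(random_state=42),
]

for model in models:
    model.fit(X_train, y_train)
    print(f"{type(model).__name__}: {model.score(X_test, y_test):.3f}")
```

Tree-based models are insensitive to feature scale, so only the logistic regression is wrapped with a scaler.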

# 4. Test Models

In [0]:
# Evaluate each trained model on the held-out test set

for model in models:
    # Predict labels for the test set from its features alone
    y_pred = model.predict(X_test)

    # Score the predictions against the true test labels (accuracy, as a percentage)
    score = round(accuracy_score(y_test, y_pred) * 100, 4)

    # Report the model and its score
    print(model)
    print(f'  {score}\n')
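
Accuracy alone can flatter models on an imbalanced target like `Exited`. One way to put the scores above in context, offered as a suggested extension rather than part of the original analysis, is to compare against a `DummyClassifier` that always predicts the majority class, and to report recall and F1 for the positive (churned) class. Synthetic data stand in for the churn file.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 80% class 0, 20% class 1)
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = [
    DummyClassifier(strategy='most_frequent'),  # baseline: always predicts the majority class
    RandomForestClassifier(random_state=42),
]

results = {}
for model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    name = type(model).__name__
    results[name] = {
        'accuracy': accuracy_score(y_test, y_pred),
        # Recall and F1 for class 1 measure how well the churners themselves are caught
        'recall': recall_score(y_test, y_pred, zero_division=0),
        'f1': f1_score(y_test, y_pred, zero_division=0),
    }
    s = results[name]
    print(f"{name}: accuracy={s['accuracy']:.3f} recall={s['recall']:.3f} f1={s['f1']:.3f}")
```

The dummy baseline reaches roughly the majority-class share in accuracy while catching no churners at all, which is why recall and F1 are worth reporting next to accuracy.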