# Let's Get This Snowball Rolling

Now it's your turn to evaluate how a logistic regression model performs. This time, you'll do so while helping a fintech startup grow its user base more quickly. By applying the code you learned to customer data, you'll see how machine learning can accelerate a fintech firm's growth.

## Instructions

1. Read in the dataset about the current customers of the startup.

2. Split the data into X and y and then into testing and training sets.

3. Fit a logistic regression classifier.

4. Create the predicted values for the testing and the training data.

5. Print a confusion matrix for the training data.

6. Print a confusion matrix for the testing data.

7. Print the training classification report.

8. Print the testing classification report.

9. Answer the following question: How does the model performance compare between the training data and the testing data?


## Resources:

The following links point to the scikit-learn modules used in this activity:

[Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)

[train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

[classification_report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)

[confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)


In [2]:
# Import the required modules
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report


## Step 1: Read in the dataset about the current customers of the startup.

In [6]:
# Read the usage_stats.csv file from the Resources folder into a Pandas DataFrame
customer_df = pd.read_csv(Path("../Resources/usage_stats.csv"))

# Review the DataFrame
display(customer_df.tail())


| | Usage Stats | Referral History | Customer Rank | target |
|---|---|---|---|---|
| 1205 | 1.97554 | -2.200099 | 0.345623 | 1 |
| 1206 | 2.093416 | -1.592133 | -1.300825 | 0 |
| 1207 | 2.010334 | -1.758225 | -1.173162 | 0 |
| 1208 | 4.451947 | -0.502815 | -2.35502 | 0 |
| 1209 | 2.141445 | -1.993869 | -0.946396 | 0 |


## Step 2: Split the data into X and y and then into testing and training sets.

In [24]:
# Split the data into X (features) and y (target)

# The y variable should focus on the target column
y = customer_df["target"]

# The X variable should include all features except the target
X = customer_df.loc[:,"Usage Stats":"Customer Rank"]


In [35]:
# Split into testing and training sets using train_test_split
# No test_size is passed, so the default 25% of rows is held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
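
The split above leaves `test_size` unset, so scikit-learn falls back to its default of 0.25. The sketch below reproduces the resulting row counts by hand. It assumes the 1,210 rows implied by the last index shown in Step 1 (1209) and that the test share is rounded up; the function name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test share.

    Assumption: the test count is rounded up and the training set
    receives the remaining rows.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(1210)
print(n_train, n_test)
```

These counts can be checked against the support totals in the classification reports in Steps 7 and 8.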


## Step 3: Fit a logistic regression classifier.

In [36]:
# Declare a logistic regression model.
# Apply a random_state of 9 to the model
logistic_regression_model = LogisticRegression(random_state=9)

# Fit and save the logistic regression model using the training data
lr_model = logistic_regression_model.fit(X_train, y_train)


## Step 4: Create the predicted values for the testing and the training data.

In [37]:
# Generate training predictions
training_predictions = logistic_regression_model.predict(X_train)

# Generate testing predictions
testing_predictions = logistic_regression_model.predict(X_test)
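
For a binary problem, `predict` returns class 1 when the model's estimated probability exceeds 0.5, which is the same as the linear score being positive. The sketch below illustrates that rule in plain Python. The coefficients, intercept, and function names are made up for illustration; they are not the values learned by `logistic_regression_model`.

```python
import math


def sigmoid(score):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_label(features, coefficients, intercept):
    """Return (label, probability) for one row using a 0.5 threshold."""
    score = intercept + sum(w * x for w, x in zip(coefficients, features))
    probability = sigmoid(score)
    label = 1 if probability > 0.5 else 0
    return label, probability


# Hypothetical weights, chosen only to make the example concrete.
coefficients = [0.5, -1.0, 2.0]
intercept = -1.0

# Feature values taken from row 1205 of the Step 1 output.
row = [1.97554, -2.200099, 0.345623]
label, probability = predict_label(row, coefficients, intercept)
print(label, round(probability, 3))
```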


## Step 5: Print a confusion matrix for the training data.

In [38]:
# Import the confusion_matrix function from scikit-learn
from sklearn.metrics import confusion_matrix

# Create and save the confusion matrix for the training data
training_matrix = confusion_matrix(y_train, training_predictions)

# Print the confusion matrix for the training data
print(training_matrix)


[[808   7]
 [ 19  73]]
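
Each cell of the matrix feeds the metrics reported in the later steps. scikit-learn places true labels on rows and predicted labels on columns, so the training matrix reads as 808 true negatives, 7 false positives, 19 false negatives, and 73 true positives. A minimal sketch of the arithmetic for class 1, with hypothetical variable names:

```python
# Training confusion matrix copied from the output above.
training_matrix = [[808, 7],
                   [19, 73]]

# Rows are true labels, columns are predicted labels.
(tn, fp), (fn, tp) = training_matrix

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)   # share of predicted 1s that are correct
recall = tp / (tp + fn)      # share of actual 1s that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.97 precision=0.91 recall=0.79 f1=0.85
```

These values match the class 1 row and the accuracy row of the training classification report in Step 7.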


## Step 6: Print a confusion matrix for the testing data.

In [39]:
# Create and save the confusion matrix for the testing data
test_matrix = confusion_matrix(y_test, testing_predictions)

# Print the confusion matrix for the testing data
print(test_matrix)


[[274   0]
 [  2  27]]


## Step 7: Print the training classification report.

In [40]:
# Create and save the training classification report
training_report = classification_report(y_train, training_predictions)

# Print the training classification report
print(training_report)


              precision    recall  f1-score   support

           0       0.98      0.99      0.98       815
           1       0.91      0.79      0.85        92

    accuracy                           0.97       907
   macro avg       0.94      0.89      0.92       907
weighted avg       0.97      0.97      0.97       907



## Step 8: Print the testing classification report.

In [41]:
# Create and save the testing classification report
testing_report = classification_report(y_test, testing_predictions)

# Print the testing classification report
print(testing_report)

              precision    recall  f1-score   support

           0       0.99      1.00      1.00       274
           1       1.00      0.93      0.96        29

    accuracy                           0.99       303
   macro avg       1.00      0.97      0.98       303
weighted avg       0.99      0.99      0.99       303
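
The `macro avg` and `weighted avg` rows differ only in how they combine the per-class scores: the macro average treats both classes equally, while the weighted average weights each class by its support. The sketch below recomputes the testing recall averages from the Step 6 confusion matrix; the variable names are hypothetical.

```python
# Testing confusion matrix copied from the Step 6 output.
test_matrix = [[274, 0],
               [2, 27]]

# Support is the number of true records per class (the row sums).
support = [sum(row) for row in test_matrix]

# Recall per class: diagonal count divided by the row total.
recall = [test_matrix[i][i] / support[i] for i in range(len(test_matrix))]

macro_recall = sum(recall) / len(recall)
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(f"macro={macro_recall:.2f} weighted={weighted_recall:.2f}")
# prints: macro=0.97 weighted=0.99
```

With only 29 class 1 records, the macro average is more sensitive to the two missed positives than the weighted average is.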



## Step 9: Answer the following question

**Question:** How does the model performance compare between the training data and the testing data?

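**Sample Answer:** The model performs about equally well on both datasets, which suggests it is not overfitting. Accuracy is 0.97 on the training data and 0.99 on the testing data. The largest difference is recall for class 1, which is 0.79 on the training data and 0.93 on the testing data, although the testing set contains only 29 class 1 records.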