# Which Model Predicts Megaline Phone Plan Preference Most Accurately?

## Project Description

The following project uses pre-processed data from the Megaline mobile phone company describing its customers' monthly behavior (number of calls made, data used, etc.). The data comes from customers who have already switched to one of Megaline's new phone plans (Smart or Ultra) and will be used to train several classification models. Each model will be fitted to predict which of the new plans should be advertised to customers who have not yet switched to Smart or Ultra. The purpose of this project is to tune each model for accuracy by iterating through different combinations of hyperparameters, and to provide Megaline with the most accurate and time-efficient model for targeted advertising. The accuracy threshold for an acceptable model is 0.75.

## Import Necessary Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

## Open Dataset

In [2]:
url = 'https://raw.githubusercontent.com/pvnkd0v3/megaline_model_training_tt_project/main/users_behavior.csv'
megaline = pd.read_csv(url)

## View Data

In [3]:
megaline

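|      | calls | minutes | messages | mb_used  | is_ultra |
|------|-------|---------|----------|----------|----------|
| 0    | 40.0  | 311.90  | 83.0     | 19915.42 | 0        |
| 1    | 85.0  | 516.75  | 56.0     | 22696.96 | 0        |
| 2    | 77.0  | 467.66  | 86.0     | 21060.45 | 0        |
| 3    | 106.0 | 745.53  | 81.0     | 8437.39  | 1        |
| 4    | 66.0  | 418.74  | 1.0      | 14502.75 | 0        |
| ...  | ...   | ...     | ...      | ...      | ...      |
| 3209 | 122.0 | 910.98  | 20.0     | 35124.90 | 1        |
| 3210 | 25.0  | 190.36  | 0.0      | 3275.61  | 0        |
| 3211 | 97.0  | 634.44  | 70.0     | 13974.06 | 0        |
| 3212 | 64.0  | 462.32  | 90.0     | 31239.78 | 0        |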


The `megaline` DataFrame contains 3214 observations, one per Megaline customer. Each observation records a single customer's monthly behavior: the number of calls made (`calls`), total call duration in minutes (`minutes`), number of text messages sent (`messages`), data used in MB (`mb_used`), and which of Megaline's new plans the customer is subscribed to (`is_ultra`: 0 for Smart, 1 for Ultra). **Note:** This dataset was processed beforehand in preparation for this project.
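
Before comparing models against the 0.75 threshold, it helps to know how common each plan is, since a model that always predicts the most frequent plan already scores roughly that plan's share as its accuracy. The sketch below uses pandas `value_counts(normalize=True)` and scikit-learn's `DummyClassifier` on the nine preview rows shown above, pasted in so it runs offline; on the real data, the full `megaline` DataFrame would take the place of `sample`. The 7 in 9 share of Smart plans in these rows is only a property of the preview, not of the full dataset.

```python
import io

import pandas as pd
from sklearn.dummy import DummyClassifier

# The nine preview rows from the table above (not the full dataset).
preview_csv = """calls,minutes,messages,mb_used,is_ultra
40.0,311.90,83.0,19915.42,0
85.0,516.75,56.0,22696.96,0
77.0,467.66,86.0,21060.45,0
106.0,745.53,81.0,8437.39,1
66.0,418.74,1.0,14502.75,0
122.0,910.98,20.0,35124.90,1
25.0,190.36,0.0,3275.61,0
97.0,634.44,70.0,13974.06,0
64.0,462.32,90.0,31239.78,0
"""
sample = pd.read_csv(io.StringIO(preview_csv))

# Share of each plan (0 = Smart, 1 = Ultra).
class_share = sample["is_ultra"].value_counts(normalize=True)
print(class_share)

# Accuracy of a baseline that always predicts the most frequent plan.
X = sample.drop("is_ultra", axis=1)
y = sample["is_ultra"]
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_accuracy = dummy.score(X, y)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

A trained model is only useful if its validation accuracy clearly beats this baseline computed on the full data.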

## Define Model Features and Targets

In [4]:
features = megaline.drop('is_ultra', axis=1) #User behavior that will be used to predict target (new phone plan)
target = megaline['is_ultra'] #New phone plans to be predicted using features (user behavior)

## Create Training and Validation Datasets

In [5]:
features_train, features_valid, target_train, target_valid = train_test_split(features, 
                                                                              target, 
                                                                              test_size=0.25, 
                                                                              random_state=12345)
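
With `test_size=0.25`, scikit-learn's `train_test_split` rounds the test portion up, so 3214 rows split into 2410 training rows and 804 validation rows; 804 is the denominator behind every validation accuracy reported below. The sketch below checks those sizes on a hypothetical random stand-in DataFrame with the same row count (the values are not Megaline data), and also shows the optional `stratify` argument, which keeps the Smart/Ultra ratio equal in both parts. Stratification was not used in this project; it is shown only as a possible variant.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same number of rows as the Megaline data.
rng = np.random.default_rng(12345)
demo = pd.DataFrame({
    "calls": rng.integers(0, 150, size=3214).astype(float),
    "is_ultra": rng.integers(0, 2, size=3214),
})
X = demo.drop("is_ultra", axis=1)
y = demo["is_ultra"]

# Same settings as the notebook: 25% of rows go to the validation set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=12345
)
print(len(X_train), len(X_valid))  # 2410 804

# Variant: stratify=y keeps the class proportions the same in both parts.
X_train_s, X_valid_s, y_train_s, y_valid_s = train_test_split(
    X, y, test_size=0.25, random_state=12345, stratify=y
)
print(round(y_train_s.mean(), 3), round(y_valid_s.mean(), 3))
```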

## Create and Test Models

### Decision Tree Model

In [6]:
best_depth = 0 #Placeholder for the best depth found so far
best_score = 0 #Placeholder for the best validation accuracy found so far

for depth in range(1, 51): #Create a DecisionTreeClassifier for each depth from 1 to 50
    model = DecisionTreeClassifier(max_depth=depth, random_state=12345)
    model.fit(features_train, target_train) #Fit current model on the training features and target

    score = model.score(features_valid, target_valid) #Accuracy of current model on the validation dataset
    if score > best_score: #Keep the depth and accuracy of the most accurate model so far
        best_score = score
        best_depth = depth

print(f'Best depth: {best_depth}. Accuracy: {best_score}.') #Report the best depth and its validation accuracy

Best depth: 7. Accuracy: 0.7898009950248757.


Searching depths from 1 to 50 (a range capped for the sake of time efficiency), the Decision Tree model was most accurate at a depth of 7, which returned an accuracy of approximately 0.790 on the validation dataset.
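
The search loops in this section share one pattern: fit one model per hyperparameter combination on the training set, score it on the validation set, and keep the best. A minimal sketch of that pattern as a reusable function, using scikit-learn's `ParameterGrid` to enumerate combinations, is below. `search_on_validation` is a hypothetical helper name, and the data come from `make_classification` rather than the Megaline file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.tree import DecisionTreeClassifier


def search_on_validation(model_cls, param_grid, X_train, y_train, X_valid, y_valid, **fixed):
    """Fit one model per hyperparameter combination and return the combination
    with the highest validation accuracy (the first one found wins ties)."""
    best_params, best_score = None, -1.0
    for params in ParameterGrid(param_grid):
        model = model_cls(**params, **fixed)
        model.fit(X_train, y_train)
        score = model.score(X_valid, y_valid)  # accuracy for classifiers
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic stand-in for the four Megaline feature columns.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=12345)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=12345
)

best_params, best_score = search_on_validation(
    DecisionTreeClassifier, {"max_depth": list(range(1, 51))},
    X_train, y_train, X_valid, y_valid, random_state=12345,
)
print(best_params, round(best_score, 3))
```

The same helper would cover the Random Forest search with `{"n_estimators": list(range(1, 51, 10)), "max_depth": list(range(1, 51))}`. `ParameterGrid` visits combinations in sorted-key order, which differs from the nested loops used here, so exact ties could resolve to a different combination.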

### Random Forest Model

In [7]:
best_est = 0 #Placeholder for the best number of estimators found so far
best_depth = 0 #Placeholder for the best depth found so far
best_score = 0 #Placeholder for the best validation accuracy found so far

for est in range(1, 51, 10): #Create RandomForestClassifier models for each number of estimators (1, 11, 21, 31, 41)
    for depth in range(1, 51): #and each depth from 1 to 50
        model = RandomForestClassifier(n_estimators=est,
                                       max_depth=depth,
                                       random_state=12345)
        model.fit(features_train, target_train) #Fit current model on the training features and target

        score = model.score(features_valid, target_valid) #Accuracy of current model on the validation dataset
        if score > best_score: #Keep the estimators, depth, and accuracy of the most accurate model so far
            best_score = score
            best_est = est
            best_depth = depth

print(f'Best # estimators: {best_est}. Best depth: {best_depth}. Accuracy: {best_score}.') #Report the best combination and its validation accuracy

Best # estimators: 21. Best depth: 9. Accuracy: 0.7922885572139303.


Searching estimator counts of 1, 11, 21, 31, and 41 (`range(1, 51, 10)`) combined with depths from 1 to 50 (both ranges capped for the sake of time efficiency), the Random Forest model was most accurate with 21 estimators and a depth of 9, which returned an accuracy of approximately 0.792 on the validation dataset. Both models exceed the project's 0.75 accuracy threshold, and the Random Forest model is slightly more accurate than the Decision Tree model.
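
The cost of the Random Forest search grows with the product of the two ranges. Because `range` excludes its upper bound, `range(1, 51, 10)` yields five estimator counts, so the nested loops fit 250 forests, compared with 50 trees for the Decision Tree search and 5 models for the Logistic Regression search. A quick check of that arithmetic:

```python
# range() excludes its upper bound, so range(1, 51, 10) stops at 41.
estimator_values = list(range(1, 51, 10))
depth_values = list(range(1, 51))
print(estimator_values)  # [1, 11, 21, 31, 41]

# One forest is fitted per (n_estimators, max_depth) combination.
n_fits = len(estimator_values) * len(depth_values)
print(n_fits)  # 250

# Total number of individual decision trees grown across the whole search.
total_trees = sum(estimator_values) * len(depth_values)
print(total_trees)  # 5250
```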

### Logistic Regression Model

In [8]:
solvers = ['liblinear', 'lbfgs', 'newton-cg', 'sag', 'saga'] #List of solvers for LogisticRegression model
best_score = 0 #Accuracy score placeholder value
best_solver = None #Model solver placeholder value

for solver in solvers: #Loop that will create LogisticRegression models using different solvers
    model = LogisticRegression(solver=solver, random_state=12345)
    model.fit(features_train, target_train) #Fit current model with features and target

    score = model.score(features_valid, target_valid)
    if score > best_score: #Replaces solver and accuracy score placeholders with respective info from the most recent optimal model
        best_score = score
        best_solver = solver

print(f'Best solver: {best_solver}. Accuracy: {best_score}.') #Print statement results in optimal model and score from loop

Best solver: newton-cg. Accuracy: 0.7599502487562189.

The Logistic Regression model was most accurate when using the 'newton-cg' solver. On the validation dataset it returned an accuracy of approximately 0.760, making it the least accurate of the three models for the given training data and hyperparameter ranges.
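
The Logistic Regression models here were fitted on unscaled features, where `mb_used` runs into the tens of thousands while `messages` stays below about a hundred. The scikit-learn documentation notes that the 'sag' and 'saga' solvers converge quickly only on features with approximately the same scale. A possible next step, not part of this project's analysis, is to standardize the features inside a pipeline before fitting. The sketch below does that on synthetic `make_classification` data with one column rescaled to be much larger than the others; whether scaling changes the 0.760 result on the Megaline data is untested here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with columns on very different scales
# (the last column is ~1000x larger, loosely like mb_used vs. messages).
X, y = make_classification(n_samples=800, n_features=4, n_informative=3,
                           n_redundant=0, random_state=12345)
X = X * [10, 100, 10, 10_000]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=12345
)

results = {}
for solver in ["liblinear", "lbfgs", "newton-cg", "sag", "saga"]:
    # The scaler is fitted on the training data only, as part of the pipeline.
    pipe = make_pipeline(StandardScaler(),
                         LogisticRegression(solver=solver, random_state=12345))
    pipe.fit(X_train, y_train)
    results[solver] = pipe.score(X_valid, y_valid)

for solver, score in results.items():
    print(f"{solver}: {score:.3f}")
```

Once the features share a scale, the solvers are optimizing the same regularized objective and should land on very similar accuracies, which makes the choice of solver mostly a matter of speed.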

## Conclusion

Three classification models (DecisionTreeClassifier, RandomForestClassifier, and LogisticRegression) were evaluated on a validation dataset for how accurately they predict which Megaline phone plan a customer will choose based on monthly phone usage. Each model was also tested with different hyperparameter values to find its best configuration: the DecisionTreeClassifier with depths from 1 to 50, the RandomForestClassifier with depths from 1 to 50 and numbers of estimators from 1 to 41 in increments of 10, and LogisticRegression with different solvers.

Against the accuracy threshold of 0.75, the RandomForestClassifier returned the highest accuracy score, approximately 0.792, with a depth of 9 and 21 estimators. It was followed by the DecisionTreeClassifier with a depth of 7 (approximately 0.790) and the LogisticRegression model using the newton-cg solver (approximately 0.760). Widening the hyperparameter ranges could raise the models' accuracy at the cost of longer run times, but balancing time efficiency and accuracy, the RandomForestClassifier with a depth of 9 and 21 estimators appears to be the model Megaline should use to decide which of its new phone plans to advertise to customers who have not yet switched.
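
To put the gaps between the models in concrete terms, the reported validation accuracies can be converted back into counts of correctly classified customers. With `test_size=0.25` on 3214 rows, `train_test_split` places 804 rows in the validation set, so accuracy is simply correct predictions divided by 804. A short calculation using the accuracies exactly as printed in the cells above:

```python
import math

# Validation-set size: train_test_split rounds 0.25 * 3214 = 803.5 up to 804.
n_valid = math.ceil(0.25 * 3214)

# Accuracies copied from the printed outputs above.
reported = {
    "DecisionTreeClassifier": 0.7898009950248757,
    "RandomForestClassifier": 0.7922885572139303,
    "LogisticRegression": 0.7599502487562189,
}

# accuracy = correct / n_valid, so multiplying back recovers the count.
correct = {name: round(acc * n_valid) for name, acc in reported.items()}
for name, count in correct.items():
    print(f"{name}: {count} of {n_valid} correct")
```

Read this way, the Random Forest and Decision Tree results differ by two validation customers (637 versus 635 of 804), so the ordering between those two models may not hold for a different random split.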