
# Sample Regression Routines
This is a simple guide to a basic regression routine. The steps are:

1. Clean the data set.
   - Preferably, no NaN values remain.
   - All empty entries are filled in.
   - All categorical features are encoded as numerical values (make sure the encoding makes sense).
2. Separate the $X$ features from the $y$ labels.
3. Split the data into train and test sets.
4. Scale the data appropriately.
   - Standardize the features so each has zero mean and unit variance.
5. Import the model, for example:
   - Ridge
   - Linear regression
   - Lasso
   - Variants with built-in cross-validation (e.g. `RidgeCV`, `LassoCV`)
6. Create (instantiate) the model.
7. Fit the model, optionally with cross-validation.
8. Analyze the scores and tune the hyperparameters.
   - This includes reporting the mean squared error and other error metrics.
9. Repeat steps 6-8 until the scores are satisfactory.
10. Save the model.
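
Step 1 can be sketched with pandas as follows. This is a minimal sketch: the column names (`size`, `color`) and the toy values are hypothetical, and mean imputation and one-hot encoding are one reasonable choice each, not the only ones. One-hot encoding avoids implying an order between unordered categories, which is one common way an encoding can fail to "make sense".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a numeric feature with a missing value,
# a categorical feature, and the label column "y_data".
df = pd.DataFrame({
    "size": [1.0, 2.0, np.nan, 4.0],
    "color": ["red", "blue", "red", "green"],
    "y_data": [10.0, 20.0, 30.0, 40.0],
})

# Fill the missing numeric entry with the column mean (one simple choice).
df["size"] = df["size"].fillna(df["size"].mean())

# One-hot encode the categorical column; drop_first drops the redundant
# first category ("blue"), leaving color_green and color_red.
df = pd.get_dummies(df, columns=["color"], drop_first=True)

print(df.isna().sum().sum())  # 0 means no NaN values remain
print(df)
```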
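
In step 4, `StandardScaler` standardizes each feature $j$. It uses the mean $\mu_j$ and standard deviation $\sigma_j$ estimated on the training set only, and reuses those same values on the test set:

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
$$

Here $x_{ij}$ is the value of feature $j$ for sample $i$. For example, with $\mu_j = 50$ and $\sigma_j = 10$, a value of $70$ becomes $z = (70 - 50)/10 = 2$.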
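
The scikit-learn documentation gives the objectives for the penalized models in step 5, shown below. Here $w$ is the coefficient vector and $n$ is the number of samples. Ordinary linear regression minimizes only the squared-error term, and $\alpha \ge 0$ sets the regularization strength:

$$
\text{Ridge:}\quad \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A larger $\alpha$ shrinks the coefficients more strongly. The $L_1$ penalty in Lasso can also set some coefficients exactly to zero.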



**WARNING** The code below will not run as-is; it is only a sample guide. It uses `cross_validate` because it reports several error metrics across multiple folds, which gives a more rigorous estimate of the error than a single train/test split.
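
The sketch below shows what `cross_validate` returns, as a minimal runnable illustration. It uses synthetic data from `make_regression` in place of the real data set, which is an assumption for the demo. The sample sizes and `alpha=1.0` are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic data stands in for the real (cleaned, scaled) data set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

scores = cross_validate(
    Ridge(alpha=1.0), X, y,
    scoring=["neg_mean_absolute_error", "neg_mean_squared_error"],
    cv=5,
)

# scores is a dict of arrays with one value per fold, keyed by
# fit_time, score_time, test_neg_mean_absolute_error, test_neg_mean_squared_error.
print(sorted(scores))

# The error scorers are negated (scikit-learn treats higher as better),
# so flip the sign and average across folds to get the usual MAE and MSE.
mae = -scores["test_neg_mean_absolute_error"].mean()
mse = -scores["test_neg_mean_squared_error"].mean()
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}")
```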

##########################################
# Importing important packages
##########################################

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

##########################################
# 1. Assume the data set has already been cleaned.
#    Load it in this part.
##########################################

df = pd.read_csv("cool_data.csv")

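# Make it a habit to inspect the first few entries
# (print is needed because only a cell's last expression is displayed)
print(df.head())

# Make it a habit to inspect the overall information (column types, non-null counts)
df.info()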

##########################################
# 2. Separate X and y data
##########################################
X = df.drop('y_data', axis=1)  # every column except the label
y = df['y_data']               # the label column

##########################################
# 3. Split data into train and test sets
##########################################

# First import the train_test_split package
from sklearn.model_selection import train_test_split

# Split the train and test sets. Pass the X features and y labels.
# test_size is the fraction of the data held out for testing (0.3 = 30%)
# random_state fixes the random seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

##########################################
# 4. Standardize the features (zero mean, unit variance)
##########################################

# Import the StandardScaler package
from sklearn.preprocessing import StandardScaler

# Declare the scaler
scaler = StandardScaler()

# Compute the mean and standard deviation of the training features only,
# so no information from the test set leaks in, and store them in the scaler
scaler.fit(X_train)

# Transform the train and test features according to the precomputed fitting parameters
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

##########################################
# 5. Initialize or import the model
##########################################

# Import ridge regression (linear regression with an L2 penalty)
# Lasso (L1 penalty) or ElasticNet (L1 + L2) can be imported the same way if needed
from sklearn.linear_model import Ridge

##########################################
# 6. Creating the model
##########################################
# alpha is the regularization strength, not a learning rate:
# larger values shrink the coefficients more strongly toward zero
model = Ridge(alpha=100)

##########################################
# 7. Fit the model
##########################################

# To score the model with cross-validation, import cross_validate
from sklearn.model_selection import cross_validate

# cross_validate fits a fresh copy of the model on each training fold and
# scores it on the matching validation fold; these scores guide the tuning.
# Inputs: the model initialized above, the scaled X_train, and y_train.
# scoring lists the error metrics to report.
# cv=5 runs 5-fold cross-validation.
scores = cross_validate(model, X_train, y_train,
                        scoring=['neg_mean_absolute_error', 'neg_mean_squared_error', 'max_error'],
                        cv=5)

##########################################
# 8. Analyze the scores and tune the hyperparameters
##########################################

# With cross_validate (or cross_val_score), the per-fold scores are in `scores`.
# The error scores are negated (scikit-learn scorers treat higher as better),
# so multiply them by -1 to read them as ordinary errors.
print(pd.DataFrame(scores).mean())

# Without cross-validation, first create the predictions, then compare them
# with the true labels, e.g.:
#   from sklearn.metrics import mean_squared_error
#   mean_squared_error(y_test, y_test_pred)

##########################################
# 9. Repeat steps 6-8 until the scores are satisfactory
##########################################

# Once the hyperparameters are chosen, retrain the final model on the full
# training set and evaluate it once on the held-out test set.
# This test-set error is the final reported score.
from sklearn.metrics import mean_squared_error

final_model = Ridge(alpha=1)  # e.g. the alpha chosen during tuning
final_model.fit(X_train, y_train)
y_final_test_pred = final_model.predict(X_test)
print(mean_squared_error(y_test, y_final_test_pred))

##########################################
# 10. Save the model!
##########################################

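# Import joblib's dump and load to save and reload the model
from joblib import dump, load

# Use dump to save the model
dump(final_model, 'sales_model.joblib')

# Use load to import the model
loaded_model = load('sales_model.joblib')

# Run the imported model on new data. New feature rows must first be
# transformed with the same fitted scaler; the scaled X_test is used here.
loaded_model.predict(X_test)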
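
Steps 5-9 can also be combined. This sketch swaps in `GridSearchCV` over a `Pipeline`, a different technique from the manual repeat-and-compare loop above. In the sample code, the scaler is fit on all of `X_train` before cross-validation, so each validation fold is scaled with statistics that include it. Putting the scaler inside the pipeline refits it on each training fold instead. The synthetic data and the `alpha` grid are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned data set.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# The scaler is refit on each CV training fold, so no validation fold
# leaks into the mean/std used for scaling.
pipe = make_pipeline(StandardScaler(), Ridge())

# make_pipeline names each step after its lowercased class, hence "ridge__alpha".
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)  # refit=True (default) retrains the best pipeline on all of X_train

print("Best alpha:", search.best_params_["ridge__alpha"])
print("Best CV MSE:", -search.best_score_)

# Final, one-time evaluation on the held-out test set.
final_model = search.best_estimator_
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print("Test MSE:", test_mse)
```

Because the best pipeline carries its own fitted scaler, `final_model.predict` accepts unscaled features directly.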
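
In the sample code only the Ridge model is dumped, so the fitted scaler is not saved. New data would then have to be scaled by hand before calling `predict`. One option is to save a pipeline that holds both the scaler and the model, sketched here on synthetic data. The file name is hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Save the scaler and the model together, so new raw data is scaled
# exactly as during training once the model is reloaded.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "regression_model.joblib")  # hypothetical file name
dump(model, path)
loaded_model = load(path)

# The reloaded pipeline makes the same predictions as the original.
same = np.allclose(model.predict(X[:5]), loaded_model.predict(X[:5]))
print(same)
```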