# Machine Learning Algorithms - Predict Solutions

Complete the following functions using the Machine Learning techniques you have covered in the training notebooks.

## Pre-processing

### Import Data

In [62]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
% matplotlib inline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support as score

df = pd.read_csv('data.csv').drop('Unnamed: 0', axis=1)

### Pre-process Data

In [63]:
# Regression labels
y_r = df['target_return']

# Classification labels
y_c = df['target_return'].apply(lambda x: 1 if x > 0 else 0)

# Features
X = df.drop(['Date', 'company', 'target_return'], axis=1)

In [64]:
# Standardize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_standardize = pd.DataFrame(X_scaled,columns=X.columns)

In [65]:
# Regression train/test split
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_standardize, y_r, test_size=0.3, random_state=101)

# Classification train/test split
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_standardize, y_c, test_size=0.3, random_state=101)

## Function 1

Write a function to return the intercept as a float (rounded to the nearest 3 integers) of a linear regression model

* Given the training features (X_train) and labels (y_train)

In [66]:
def lin_reg_intercept(X_train, y_train):
    
    "Returns intercept (float) of linear regression model"
    
    # Your code here
    lm = LinearRegression()
    lm.fit(X_train, y_train)
    
    return round(lm.intercept_, 3)

In [67]:
lin_reg_intercept(X_train_r, y_train_r)

0.027

## Function 2

Write a function to return the number of coefficients greater than 0 in a lasso model (as an integer)

* Given the training features (X_train) and labels (y_train)
* For a specific value of the regularisation parameter (alpha)

In [68]:
def lasso_predictors(X_train, y_train, alpha):
    
    "Returns number (integer) of coefficients in lasso model that are greater than 0"
    
    # Your code here
    
    # Create instance for the model object
    lasso = Lasso(alpha)
    
    # Fit the models
    lasso.fit(X_train, y_train)
    
    # Compute the coefficience
    lasso_coef = lasso.coef_
    newList = []
    
    for i in lasso_coef:
        if i > 0:
            newList.append(i)
            
    return len(newList)
    

In [69]:
lasso_predictors(X_train_r, y_train_r, 0.005)

2

## Function 3

Write a function to return the mean squared error as a float (rounded to the nearest 3 integers) of a linear regression model 

* Given the training features (X_train) training labels (y_train), testing features (X_test) and testing labels (y_test)

In [70]:
def lnr_mse(X_train, y_train, X_test, y_test):
    
    "Returns the MSE (float) of a linear regression model"
    
    
    # Your code here
    
    # Create an instance for the model
    lm = LinearRegression()
    
    # Fit the modles
    lm.fit(X_train_r, y_train_r)
    
    yPred = lm.predict(X_test_r)
    mse = mean_squared_error(y_test_r, yPred)
    return round(mse, 3)
    

In [71]:
lnr_mse(X_train_r, y_train_r, X_test_r, y_test_r)

0.032

## Function 4

Write a function to return the mean absolute error as a float (rounded to the nearest 3 integers) of a ridge regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)
* For a specific value of the regularisation parameter (alpha)

In [72]:
def ridge_mae(X_train, y_train, X_test, y_test, alpha):
    
    "Returns the MAE (float) of the ridge regression model"
    
    # Your code here
    
    # Create an instance for the model
    ridge = Ridge(alpha)
    
    # Fit the model
    ridge.fit(X_train, y_train)
    yPred = ridge.predict(X_test)
    
    mae = mean_absolute_error(y_test, yPred)
    
    return round(mae, 3)

In [73]:
ridge_mae(X_train_r, y_train_r, X_test_r, y_test_r, 1)

0.096

## Function 5

Write a function to return the root mean squared error as a float (rounded to the nearest 3 integers) of a linear regression model

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [74]:
def lnr_rmse(X_train, y_train, X_test, y_test):
    
    "Returns the root mean squared error (float) of a linear regression model"
    
    # Your code here
    
    # Create an instance for the model
    lm = LinearRegression()
    
    # Fit the model
    lm.fit(X_train_c, y_train_c)
    yPred = lm.predict(X_test_c)
    
    mse = mean_squared_error(y_test_c, yPred)
    rmse = np.sqrt(mse)
    return round(rmse, 3)

In [75]:
lnr_rmse(X_train_c, y_train_c, X_test_c, y_test_c)

1.106

## Function 6

Write a function to return the highest coefficient in a logistic regression model as a float (rounded to the nearest 3 integers)

* Given the training features (X_train) and labels (y_train)

In [76]:
def highest_coef(X_train, y_train):
    
    "Returns the highest coefficient in a logistic regression model as a float (rounded to the nearest 3 integers)"
    
    # Your code here
    
    # Create an instace for the model
    lm = LogisticRegression()
    
    # Fit the model
    lm.fit(X_train, y_train)
    
    # Compute coeffience
    log_coef = max(lm.coef_)
    
    return round(max(log_coef), 3)

In [77]:
highest_coef(X_train_c, y_train_c)



0.977

## Function 7

Write a function to return the number of true positives (as an integer) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [78]:
def log_reg_tp(X_train, y_train, X_test, y_test):
    
    "Returns the number (integer) of true positives for a logistic regression model"
    
    # Your code here
    
    # Create an instance for the model
    lm = LogisticRegression()
    
    # Fit the model
    lm.fit(X_train, y_train)
    yPred = lm.predict(X_test)
    
    TP = confusion_matrix(y_test, yPred)
    for i in TP:
            return TP[0][0]
    
    

In [79]:
log_reg_tp(X_train_c, y_train_c, X_test_c, y_test_c)



16

## Function 8

Write a function to return the precision as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [88]:
def lgr_precision(X_train, y_train, X_test, y_test):
    
    "Returns the precision (float) for a logistic regression model"
    
    # Your code here
    
    # Create an instance for the model
    lm = LogisticRegression()
    
    # Fit the model
    lm.fit(X_train, y_train)
    
    yPred = lm.predict(X_test)
    precision = score(y_test_c, yPred, average='weighted') [0]
    
    return round(precision, 3)

In [89]:
lgr_precision(X_train_c, y_train_c, X_test_c, y_test_c)



0.608

## Function 9

Write a function to return the f1-score as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [117]:
def lgr_f1_score(X_train, y_train, X_test, y_test):
    
    "Returns the f1-score (float) for the logistic regression model"
     
    # Your code here
    lm = LogisticRegression()
    
    lm.fit(X_train, y_train)
    
    yPred = lm.predict(X_test)
    
    f1_score = score(y_test,yPred,average='weighted')[2]
    
    return round(f1_score, 3)

In [118]:
lgr_f1_score(X_train_c, y_train_c, X_test_c, y_test_c)



0.577

## Function 10

Write a function to return a specific metric (precision, recall or f1-score) as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [119]:
def lgr_metric_output(X_train, y_train, X_test, y_test, metric):
    
    "Returns the chosen metric (float) for the logistic regression model"
    
    # Your code here
    
    lm = LogisticRegression()
    
    lm.fit(X_train, y_train)
    
    yPred = lm.predict(X_test)
    
    my_dictionary = {'Precision':0,'Recall':1,'F1_score':2}
    
    return round(score(y_test,yPred,average='weighted')[my_dictionary[metric]],3)
    

In [120]:
lgr_metric_output(X_train_c, y_train_c, X_test_c, y_test_c, 'F1_score')



0.577