# Steps in a Machine Learning Workflow
- Import the necessary libraries.
- Load the dataset.
- Separate X (the explanatory features/variables, a DataFrame) from y (the target variable, a Series).
- If there are any categorical variables, convert them to a numeric data type; otherwise skip this step.
- Split the dataset into training and test samples using ***from sklearn.model_selection import train_test_split***.
- Create an instance of the ML model.
- Train the model on the training dataset.
- Run the model on the test sample of X to get the predicted y values.
- Compare the predicted y values with the test y values to evaluate the model.
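
The steps above can be sketched end to end on a small synthetic dataset. Everything in this block is an assumption made for illustration (the column names `spend`, `visits`, `region`, `revenue`, the generated values, and the 25% test size); it is not the advertising dataset used in the rest of this notebook. It also shows one possible way to handle the categorical step, using `pd.get_dummies`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
n = 120
demo = pd.DataFrame({
    "spend": rng.uniform(0, 100, n),
    "visits": rng.uniform(0, 50, n),
    "region": rng.choice(["north", "south"], n),
})
demo["revenue"] = (
    3.0
    + 0.5 * demo["spend"]
    + 0.2 * demo["visits"]
    + 2.0 * (demo["region"] == "south")
    + rng.normal(0, 1.0, n)
)

# X is a DataFrame of features, y is a Series holding the target.
X_demo = demo.drop(columns="revenue")
y_demo = demo["revenue"]

# Convert the categorical column into a numeric indicator column.
X_demo = pd.get_dummies(X_demo, columns=["region"], drop_first=True, dtype=float)

# Split, create the model instance, train, predict, evaluate.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
demo_model = LinearRegression()
demo_model.fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

print("MAE:", round(mean_absolute_error(y_te, y_hat), 2))
print("R2:", round(r2_score(y_te, y_hat), 2))
```

With `drop_first=True`, the two-level `region` column becomes a single indicator column, which avoids feeding the linear model two perfectly redundant columns.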

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("Advertising Budget and Sales.csv")
df.head()

,Unnamed: 0,TV Ad Budget ($),Radio Ad Budget ($),Newspaper Ad Budget ($),Sales ($)
0,1,230.1,37.8,69.2,22.1
1,2,44.5,39.3,45.1,10.4
2,3,17.2,45.9,69.3,9.3
3,4,151.5,41.3,58.5,18.5
4,5,180.8,10.8,58.4,12.9


In [4]:
df.drop(columns = "Unnamed: 0",inplace = True)
df.head()

,TV Ad Budget ($),Radio Ad Budget ($),Newspaper Ad Budget ($),Sales ($)
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9
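
An `Unnamed: 0` column usually appears when a CSV was saved together with its index. As an alternative to dropping it afterwards, `pd.read_csv` can be told to treat the first column as the index. The sketch below uses a tiny made-up CSV string (`csv_text` is hypothetical, not the real file) so that it runs on its own; whether the real file has this layout is an assumption.

```python
import io

import pandas as pd

# Hypothetical CSV text whose first column is an unnamed, saved index.
csv_text = ",TV,Sales\n1,230.1,22.1\n2,44.5,10.4\n"

# Default read: the unnamed first column becomes a regular column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0: the first column is used as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```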


In [7]:
df.rename(
    columns={
        "TV Ad Budget ($)": "TV",
        "Radio Ad Budget ($)": "Radio",
        "Newspaper Ad Budget ($)": "Newspaper",
        "Sales ($)": "Sales",
    },
    inplace=True,
)
df.head()

,TV,Radio,Newspaper,Sales
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9


In [None]:
# y should be one-dimensional (a Series); X must be two-dimensional (a DataFrame)

In [9]:
X = df.drop("Sales",axis = 1) #independent/explanatory/feature
X.head()

,TV,Radio,Newspaper
0,230.1,37.8,69.2
1,44.5,39.3,45.1
2,17.2,45.9,69.3
3,151.5,41.3,58.5
4,180.8,10.8,58.4


In [10]:
y = df["Sales"] # dependent/explained/target
y.head()

0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: Sales, dtype: float64
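
The difference between a two-dimensional X and a one-dimensional y comes down to how the columns are selected. A minimal sketch, using a three-row toy frame (`toy`, `X_toy`, and `y_toy` are names introduced here for illustration only):

```python
import pandas as pd

toy = pd.DataFrame({"TV": [230.1, 44.5, 17.2], "Sales": [22.1, 10.4, 9.3]})

X_toy = toy[["TV"]]   # double brackets -> DataFrame, two-dimensional
y_toy = toy["Sales"]  # single brackets -> Series, one-dimensional

print(type(X_toy).__name__, X_toy.shape)
print(type(y_toy).__name__, y_toy.shape)
```

Even with a single feature, selecting it with double brackets keeps X two-dimensional, which is the shape scikit-learn estimators expect for the feature matrix.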

In [12]:
from sklearn.model_selection import train_test_split
help(train_test_split)

Help on function train_test_split in module sklearn.model_selection._split:

train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
    Split arrays or matrices into random train and test subsets.

    Quick utility that wraps input validation,
    ``next(ShuffleSplit().split(X, y))``, and application to input data
    into a single call for splitting (and optionally subsampling) data into a
    one-liner.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    *arrays : sequence of indexables with same length / shape[0]
        Allowed inputs are lists, numpy arrays, scipy-sparse
        matrices or pandas dataframes.

    test_size : float or int, default=None
        If float, should be between 0.0 and 1.0 and represent the proportion
        of the dataset to include in the test split. If int, represents the
        absolute number of test samples. If None, the value is set to the
        com

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [14]:
len(X)

200

In [16]:
len(X_test),len(X_train)

(66, 134)
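
The 66/134 split follows from `test_size=0.33` applied to 200 rows. The sketch below reproduces the sizes on a stand-in array of 200 row labels (`rows` is a hypothetical substitute for the real data) and also illustrates what `random_state` does: the same seed gives the same split, while a different seed is expected to give a different one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(200)  # stand-in for 200 row labels

train_a, test_a = train_test_split(rows, test_size=0.33, random_state=42)
train_b, test_b = train_test_split(rows, test_size=0.33, random_state=42)
train_c, test_c = train_test_split(rows, test_size=0.33, random_state=7)

print(len(train_a), len(test_a))       # sizes of the two subsets
print(np.array_equal(test_a, test_b))  # same seed
print(np.array_equal(test_a, test_c))  # different seed
```

Fixing `random_state` is what makes the evaluation numbers reproducible from one run of the notebook to the next.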

In [17]:
X_train, X_test, y_train, y_test

(        TV  Radio  Newspaper
 42   293.6   27.7        1.8
 189   18.7   12.1       23.4
 90   134.3    4.9        9.3
 136   25.6   39.0        9.3
 51   100.4    9.6        3.6
 ..     ...    ...        ...
 106   25.0   11.0       29.7
 14   204.1   32.9       46.0
 92   217.7   33.5       59.0
 179  165.6   10.0       17.6
 102  280.2   10.1       21.4
 
 [134 rows x 3 columns],
         TV  Radio  Newspaper
 95   163.3   31.6       52.9
 15   195.4   47.7       52.9
 30   292.9   28.3       43.2
 158   11.7   36.9       45.2
 128  220.3   49.0        3.2
 ..     ...    ...        ...
 97   184.9   21.0       22.0
 31   112.9   17.4       38.6
 12    23.8   35.1       65.9
 35   290.7    4.1        8.5
 119   19.4   16.0       22.3
 
 [66 rows x 3 columns],
 42     20.7
 189     6.7
 90     11.2
 136     9.5
 51     10.7
        ... 
 106     7.2
 14     19.0
 92     19.4
 179    12.6
 102    14.8
 Name: Sales, Length: 134, dtype: float64,
 95     16.9
 15     22.4
 30     21.4
 1

In [18]:
from sklearn.linear_model import LinearRegression  # LinearRegression is a class, so we need to create an instance of it

### Initialize the model

In [19]:
model_1 = LinearRegression() 

### Fit the model to your data

In [20]:
model_1.fit(X_train,y_train) # X_train and y_train are training data and corresponding target values
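
After `fit`, a `LinearRegression` instance exposes the learned parameters as `intercept_` and `coef_`. The following is a minimal sketch on noise-free synthetic data (the data, the true coefficients 4, 2, and -1, and the names `X_syn`, `y_syn`, `syn_model` are all assumptions for illustration), showing that fitting recovers the values used to generate the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_syn = rng.uniform(0, 10, size=(50, 2))

# Target built from known parameters, with no noise added.
y_syn = 4.0 + 2.0 * X_syn[:, 0] - 1.0 * X_syn[:, 1]

syn_model = LinearRegression().fit(X_syn, y_syn)

print(syn_model.intercept_)  # learned intercept
print(syn_model.coef_)       # one learned coefficient per feature
```

On real data such as the advertising budget, the same two attributes can be read from the fitted model to see how strongly each feature is associated with the target.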

### Make predictions

In [23]:
y_predicted = model_1.predict(X_test)
y_predicted

array([16.58673085, 21.18622524, 21.66752973, 10.81086512, 22.25210881,
       13.31459455, 21.23875284,  7.38400509, 13.43971113, 15.19445383,
        9.01548612,  6.56945204, 14.4156926 ,  8.93560138,  9.56335776,
       12.10760805,  8.86091137, 16.25163621, 10.31036304, 18.83571624,
       19.81058732, 13.67550716, 12.45182294, 21.58072583,  7.67409148,
        5.67090757, 20.95448184, 11.89301758,  9.13043149,  8.49435255,
       12.32217788,  9.99097553, 21.71995241, 12.64869606, 18.25348116,
       20.17390876, 14.20864218, 21.02816483, 10.91608737,  4.42671034,
        9.59359543, 12.53133363, 10.14637196,  8.1294087 , 13.32973122,
        5.27563699,  9.30534511, 14.15272317,  8.75979349, 11.67053724,
       15.66273733, 11.75350353, 13.21744723, 11.06273296,  6.41769181,
        9.84865789,  9.45756213, 24.32601732,  7.68903682, 12.30794356,
       17.57952015, 15.27952025, 11.45659815, 11.12311877, 16.60003773,
        6.90611478])
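
For a linear regression, each prediction is the intercept plus the dot product of the feature row with the coefficients. A small sketch with made-up numbers (`X_small`, `y_small`, and `small_model` are hypothetical) checks that computing this by hand matches `predict`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up features and target, for illustration only.
X_small = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0], [5.0, 2.5]])
y_small = np.array([3.1, 3.9, 6.2, 8.8, 10.1])

small_model = LinearRegression().fit(X_small, y_small)

by_predict = small_model.predict(X_small)
by_hand = X_small @ small_model.coef_ + small_model.intercept_

print(np.allclose(by_predict, by_hand))
```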

# Model Evaluation/testing

In [47]:
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score,root_mean_squared_error

#### Mean absolute error (MAE) is the average of the absolute differences between the actual and predicted y values

In [49]:
err_metric_1 = round(mean_absolute_error(y_test, y_predicted), 2)
err_metric_1

1.49
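
To make the definition concrete, the sketch below computes MAE by hand with NumPy and compares it with `mean_absolute_error`. The `actual` values are the first four sales figures from the table above; the `predicted` values are made up for illustration and are not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = np.array([22.1, 10.4, 9.3, 18.5])
predicted = np.array([20.1, 11.4, 9.3, 17.0])  # made-up predictions

# Average of the absolute differences: (2.0 + 1.0 + 0.0 + 1.5) / 4
mae_by_hand = np.mean(np.abs(actual - predicted))
mae_sklearn = mean_absolute_error(actual, predicted)

print(mae_by_hand, mae_sklearn)
```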

#### Root mean squared error (RMSE) is the square root of the mean squared error (MSE); it measures the standard deviation of the residuals (error terms)

In [50]:
err_metric_2 = round(root_mean_squared_error(y_test, y_predicted), 2)
err_metric_2  # RMSE weights large errors more heavily than MAE, so it is more sensitive to outliers

1.93
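
The difference between MAE and RMSE shows up when the errors are unevenly spread. In the sketch below (all numbers are made up), both sets of predictions have the same MAE, but the set with one large error has a higher RMSE. RMSE is computed here as the square root of `mean_squared_error` so that the example does not depend on a recent scikit-learn version.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = np.array([10.0, 12.0, 14.0, 16.0])
even_errors = actual + np.array([2.0, -2.0, 2.0, -2.0])  # every error is 2
one_big_error = actual + np.array([0.0, 0.0, 0.0, 8.0])  # a single error of 8

for name, pred in [("even", even_errors), ("one big", one_big_error)]:
    mae = mean_absolute_error(actual, pred)
    rmse = np.sqrt(mean_squared_error(actual, pred))
    print(name, mae, rmse)
```

Squaring the residuals before averaging is what makes the single error of 8 dominate the second RMSE.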

#### R2 score: also known as the coefficient of determination, it represents the proportion of the variance in the dependent variable that is explained by the linear regression model

In [None]:
err_metric_3 = round(r2_score(y_test,y_predicted),2)
err_metric_3
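
As a check on the definition, R2 can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are made up for illustration and are not taken from the model above.

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 9.5])  # made-up predictions

ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation around the mean
r2_by_hand = 1 - ss_res / ss_tot

print(r2_by_hand, r2_score(actual, predicted))
```

A value close to 1 means the predictions account for most of the variation in the target; a model that always predicts the mean of `actual` would score 0.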