<a href="https://colab.research.google.com/github/geonextgis/Mastering-Machine-Learning-and-GEE-for-Earth-Science/blob/main/04_Machine_Learning_Algorithms/03_Multiple_Linear_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Multiple Linear Regression**
Multiple linear regression is a statistical method for predictive modeling and data analysis. It extends simple linear regression, which models the relationship between a dependent variable (the response) and a single independent variable (the predictor), to cases with two or more predictors.
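
For reference, the model can be written as below. The notation is introduced here and is not taken from the notebook: $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients of the $p$ predictors, and $\varepsilon$ is the error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$

With two predictors ($p = 2$), as in the data generated in this notebook, the fitted equation describes a plane in three dimensions.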

<center><img src="https://miro.medium.com/v2/resize:fit:1400/0*pJsp76_deJvdDean" width="60%"></center>

<center><img src="https://aegis4048.github.io/images/featured_images/multiple_linear_regression_and_visualization.png" width="60%"></center>

## **Import Required Libraries**

In [1]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [23]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import plotly.express as px
import plotly.graph_objects as go

import warnings
warnings.filterwarnings("ignore")

## **Generate Data for Regression**

In [4]:
# Generate synthetic data for regression
# X = independent features
# y = dependent feature (target)
X, y = make_regression(n_samples=100,
                       n_features=2,
                       n_informative=2,
                       n_targets=1,
                       noise=50)
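
Because the call above sets no `random_state`, each run produces different data. As a minimal sketch, assuming a fixed seed is acceptable, the variant below also passes `coef=True` so that `make_regression` returns the ground-truth coefficients used to build the target. The names `X_demo`, `y_demo`, `true_coef` and the seed `42` are illustrative choices, not part of the notebook.

```python
from sklearn.datasets import make_regression

# random_state=42 is an arbitrary seed chosen for this sketch
X_demo, y_demo, true_coef = make_regression(n_samples=100,
                                            n_features=2,
                                            n_informative=2,
                                            n_targets=1,
                                            noise=50,
                                            coef=True,
                                            random_state=42)

print(X_demo.shape, y_demo.shape)
print("Ground-truth coefficients:", true_coef)
```

Comparing `true_coef` with the coefficients of a fitted model gives a rough sense of how much the noise distorts the estimates.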

In [10]:
# Create a dataframe
data_dict = {"feature1": X[:, 0], "feature2": X[:, 1], "target":y}
df = pd.DataFrame(data=data_dict)
print(df.shape)
df.head()

(100, 3)


| | feature1 | feature2 | target |
|---|---|---|---|
| 0 | 0.372361 | -0.963456 | 18.63181 |
| 1 | 0.189188 | 1.496411 | 107.140201 |
| 2 | 0.538128 | 0.010721 | 90.762397 |
| 3 | -0.34346 | -0.952194 | 49.038528 |
| 4 | -0.667832 | 2.133493 | 106.942583 |


## **Plot the Data**

In [15]:
# Plot a 3-dimensional scatter plot
fig = px.scatter_3d(data_frame=df, x="feature1", y="feature2",
                    z="target", width=600, height=600)
fig.update_traces(marker={'size': 4})
fig.show()

## **Train Test Split**

In [17]:
X_train, X_test, y_train, y_test = train_test_split(df.drop("target", axis=1),
                                                    df["target"],
                                                    test_size=0.3,
                                                    random_state=0)
X_train.shape, X_test.shape

((70, 2), (30, 2))

## **Train a Linear Regression Model**

In [19]:
# Instantiate a LinearRegression object
lr = LinearRegression()

# Fit the model to the training data
lr.fit(X_train, y_train)
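
As a rough illustration of what fitting does, the sketch below solves the same ordinary least squares problem directly with `np.linalg.lstsq` and compares the result with `LinearRegression`. It uses its own seeded data (`X_demo`, `y_demo`, and `random_state=0` are assumptions made for this example), so the numbers will differ from the notebook's.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Seeded demo data, independent of the notebook's X and y
X_demo, y_demo = make_regression(n_samples=100, n_features=2,
                                 noise=50, random_state=0)

# Prepend a column of ones so the first solved parameter is the intercept
A = np.column_stack((np.ones(len(X_demo)), X_demo))

# Least-squares solution of A @ beta ~= y_demo
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

lr_demo = LinearRegression().fit(X_demo, y_demo)

print("lstsq   intercept, coefficients:", beta[0], beta[1:])
print("sklearn intercept, coefficients:", lr_demo.intercept_, lr_demo.coef_)
```

The two sets of parameters should agree up to floating-point precision.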

In [20]:
# Print the coefficients
print("Coefficients:", lr.coef_)

Coefficients: [19.57314123 18.20392878]


In [21]:
# Print the intercept value
print("Intercept:", lr.intercept_)

Intercept: 7.958513589526762
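
The coefficients and intercept fully determine the model's predictions: each prediction is the intercept plus the dot product of the feature row with the coefficient vector. The sketch below checks this against `predict` on separately generated demo data (the `_demo` names and the seed are illustrative assumptions).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Seeded demo data, independent of the notebook's variables
X_demo, y_demo = make_regression(n_samples=100, n_features=2,
                                 noise=50, random_state=0)
lr_demo = LinearRegression().fit(X_demo, y_demo)

# y_hat = intercept + coef[0] * feature1 + coef[1] * feature2
manual_pred = lr_demo.intercept_ + X_demo @ lr_demo.coef_

# Compare the manual computation with scikit-learn's predict
print(np.allclose(manual_pred, lr_demo.predict(X_demo)))
```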


In [22]:
# Predict the target values for the test data
y_pred = lr.predict(X_test)
y_pred

array([ 16.49329693,   7.64997778,  18.68653128,   3.63250901,
        30.57832075,  -7.30496918,  -9.40785405, -10.51122727,
        -8.17123682, -13.88409678,  -1.03985522,   8.88560224,
        63.70202891, -25.71757987,  17.93083009,  27.80199636,
         8.36386671,  24.84158829,  32.71898928,  -1.83903719,
       -15.85060437,  52.95694829, -16.09773792,  28.92166936,
        22.26604079, -16.32220197,  20.64265895,  38.29870547,
       -18.38466618,  -8.4903337 ])

## **Accuracy Assessment**

In [24]:
print("Mean Absolute Error (MAE):", mean_absolute_error(y_test, y_pred))
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))

Mean Absolute Error (MAE): 41.66725079437291
Mean Squared Error (MSE): 2753.101344149744
R2 Score: 0.13484580575844196
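
The low R2 score is plausibly a consequence of `noise=50` being large relative to the signal carried by the two features. To make the three metrics concrete, the sketch below computes them by hand with NumPy on a small made-up example and compares them with scikit-learn; the arrays `y_true` and `y_hat` are invented for illustration and are unrelated to the notebook's data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Small made-up arrays purely for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_hat

mae = np.mean(np.abs(residuals))   # average absolute error
mse = np.mean(residuals ** 2)      # average squared error
rmse = np.sqrt(mse)                # same units as the target

# R2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae, "| sklearn:", mean_absolute_error(y_true, y_hat))
print("MSE:", mse, "| sklearn:", mean_squared_error(y_true, y_hat))
print("RMSE:", rmse)
print("R2:", r2, "| sklearn:", r2_score(y_true, y_hat))
```

RMSE is included because it is expressed in the same units as the target, which often makes it easier to interpret than MSE.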


## **Plot the Regression Plane**

In [25]:
# Check the minimum value of each column
df.min()

feature1     -2.079585
feature2     -2.768087
target     -118.579170
dtype: float64

In [26]:
# Check the maximum value of each column
df.max()

feature1      2.162938
feature2      2.133493
target      145.509742
dtype: float64

In [29]:
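# Make a mesh grid covering the range of both features
# Note: this `y` overwrites the target array returned by make_regression
x = np.linspace(start=-3, stop=3, num=10)
y = np.linspace(start=-3, stop=3, num=10)
xGrid, yGrid = np.meshgrid(x, y)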

In [34]:
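# Combine the x and y coordinate grids into an (n_points, 2) array
final = np.column_stack((xGrid.ravel(), yGrid.ravel()))

# Predict the z value at every grid point and reshape back to the grid
# (scikit-learn may warn about missing feature names, since the model was fitted on a DataFrame)
final_z = lr.predict(final).reshape(xGrid.shape)
z = final_z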
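
The grid orientation matters once the two axes differ: `go.Surface` reads `z[i][j]` as the height at `(x[j], y[i])`. The sketch below uses axes of different lengths and a hypothetical plane (`intercept` and `coef` are made-up values standing in for a fitted model) to show that `np.meshgrid(x_axis, y_axis)` produces that layout.

```python
import numpy as np

# Hypothetical plane parameters, standing in for a fitted model
intercept = 1.0
coef = np.array([2.0, -3.0])

x_axis = np.linspace(-3, 3, 4)   # feature1 values (4 points)
y_axis = np.linspace(-3, 3, 5)   # feature2 values (5 points)

# Both grids have shape (len(y_axis), len(x_axis)) = (5, 4)
x_grid, y_grid = np.meshgrid(x_axis, y_axis)

# One row per grid point: column 0 is feature1, column 1 is feature2
points = np.column_stack((x_grid.ravel(), y_grid.ravel()))

# Evaluate the plane and reshape back to the grid layout
z_grid = (intercept + points @ coef).reshape(x_grid.shape)

# z_grid[i, j] is the plane height at (x_axis[j], y_axis[i])
i, j = 3, 1
expected = intercept + coef[0] * x_axis[j] + coef[1] * y_axis[i]
print(z_grid.shape)
print(np.isclose(z_grid[i, j], expected))
```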

In [39]:
fig = px.scatter_3d(data_frame=df, x="feature1", y="feature2",
                    z="target", width=600, height=600)
fig.update_traces(marker={'size': 4})
fig.add_trace(go.Surface(x=x, y=y, z=z))
fig.show()