# Multivariate linear regression

Multivariate linear regression is a statistical method for modeling the relationship between several independent variables (predictors) and a single dependent variable.
Strictly speaking, this setting is usually called *multiple* linear regression; "multivariate" more often refers to models with several dependent variables.
In contrast to simple linear regression, which involves only one independent variable, this approach considers all predictors simultaneously.
The goal is to find the linear equation that best fits the observed data, typically in the least-squares sense, so that the dependent variable can be predicted or explained from the values of the independent variables.
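As a minimal sketch of what "best fits" means here: the model is $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$, and ordinary least squares chooses the coefficients that minimize the sum of squared residuals. The example uses synthetic data (the true coefficients are made up for illustration), prepends a column of ones for the intercept $\beta_0$, and solves the problem with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 200 samples, 3 predictors
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_beta = np.array([1.5, -2.0, 0.5])  # made-up slopes
true_intercept = 4.0                    # made-up intercept
y = true_intercept + X @ true_beta + rng.normal(scale=0.1, size=n_samples)

# Design matrix with a leading column of ones for the intercept beta_0
X_design = np.column_stack([np.ones(n_samples), X])

# Least-squares solution: minimizes ||y - X_design @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("intercept:", beta_hat[0])
print("slopes:", beta_hat[1:])
```

With little noise, the recovered intercept and slopes land close to the made-up true values.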

First, we import NumPy and pandas, then load the advertising data from a CSV file.

```python
import numpy as np
import pandas as pd
```

```python
# Location of the advertising data set (CSV)
CSV_PATH = "https://gitlab.com/oasci/courses/pitt/biosc1540-2024s/-/raw/main/biosc1540/files/csv/advertising-data.csv"

df = pd.read_csv(CSV_PATH)
```

For multivariate regression, we collect all of the independent variables (features) in a DataFrame called `df_features` and the dependent variable (label), `Product_Sold`, in a Series called `df_labels`.

```python
label_column = "Product_Sold"

# Dependent variable (label): a pandas Series
df_labels = df[label_column]
# Independent variables (features): every remaining column
df_features = df.drop(columns=[label_column])
```

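Next, we convert both to NumPy arrays and reshape the labels into a column vector of shape `(n_samples, 1)`. scikit-learn also accepts a 1-D target, but with this 2-D target the fitted `coef_` has shape `(1, n_features)` and `intercept_` is a one-element array, which is why the printed results below are wrapped in extra brackets.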

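```python
# Labels as a column vector of shape (n_samples, 1)
labels = df_labels.to_numpy().reshape(-1, 1)
# Feature matrix of shape (n_samples, n_features)
features = df_features.to_numpy()
```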
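To illustrate how the label shape carries through to the fitted attributes, here is a small sketch on a hypothetical in-memory DataFrame (the column names `x1`, `x2` and `y` are made up). It fits `LinearRegression` once with a 1-D target and once with the reshaped `(n_samples, 1)` target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2 exactly
df_toy = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, 0.0, 2.0, 1.0, 3.0],
})
df_toy["y"] = 1.0 + 2.0 * df_toy["x1"] + 3.0 * df_toy["x2"]

features_toy = df_toy.drop(columns=["y"]).to_numpy()  # shape (5, 2)
y_1d = df_toy["y"].to_numpy()                          # shape (5,)
y_2d = y_1d.reshape(-1, 1)                             # shape (5, 1)

reg_1d = LinearRegression().fit(features_toy, y_1d)
reg_2d = LinearRegression().fit(features_toy, y_2d)

# 1-D target: coef_ is 1-D and intercept_ is a scalar
print(reg_1d.coef_.shape, reg_1d.intercept_)
# 2-D target: coef_ is (n_targets, n_features), intercept_ has length n_targets
print(reg_2d.coef_.shape, reg_2d.intercept_)
```

Both fits recover the same coefficients; only the array shapes differ.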

## Ordinary least squares

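```python
from sklearn.linear_model import LinearRegression

# Ordinary least-squares fit on all features
reg = LinearRegression()
reg.fit(X=features, y=labels)
print(reg.coef_)       # one coefficient per feature
print(reg.intercept_)  # fitted intercept
print(reg.score(X=features, y=labels))  # R^2 on the training data
```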

```text
[[1.97147823 2.79786525 1.59446751 2.43283307 1.40693022 3.91183385]]
[36.65524744]
0.9401750192922066
```
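`score` returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. Because it is evaluated here on the same data used for fitting, the value of about 0.94 describes in-sample fit rather than performance on unseen data. A minimal sketch on synthetic data (all values made up) that computes $R^2$ by hand and checks it against `score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): 100 samples, 4 predictors
X = rng.normal(size=(100, 4))
y = 2.0 + X @ np.array([1.0, 0.5, -1.0, 3.0]) + rng.normal(size=100)

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, reg.score(X, y))  # the two values agree
```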


## Ridge
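Ridge regression adds an L2 penalty to the least-squares objective: scikit-learn's `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$ and, with the default `fit_intercept=True`, leaves the intercept unpenalized. As a sketch on synthetic data (values made up), the closed-form solution $w = (X_c^\top X_c + \alpha I)^{-1} X_c^\top y_c$ on centered data $X_c$, $y_c$ can be compared against `Ridge`:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Synthetic data (hypothetical): 50 samples, 3 predictors
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=50)
alpha = 10.0

# Center X and y so that the intercept is not penalized
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# Closed-form ridge solution: solve (Xc^T Xc + alpha*I) w = Xc^T yc
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w  # recover the intercept from the means

ridge = Ridge(alpha=alpha).fit(X, y)
print(w, b)
print(ridge.coef_, ridge.intercept_)
```

Setting `alpha` to zero in the closed form reduces it to ordinary least squares on the centered data.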

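```python
from sklearn.linear_model import Ridge

# Ridge regression: least squares plus an L2 penalty of strength alpha
reg = Ridge(alpha=1.0)
reg.fit(X=features, y=labels)
print(reg.coef_)       # one (shrunken) coefficient per feature
print(reg.intercept_)  # fitted intercept (not penalized)
print(reg.score(X=features, y=labels))  # R^2 on the training data
```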

```text
[[1.97147817 2.79786515 1.59446745 2.43283298 1.40693016 3.9118337 ]]
[36.65551041]
0.9401750192922053
```
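The Ridge slope coefficients agree with the ordinary least-squares ones to roughly seven significant figures, and $R^2$ is unchanged to many decimal places. This suggests that `alpha=1.0` is small relative to the scale of $X^\top X$ for this dataset, so the penalty barely changes the fit. A sketch on synthetic data (not the advertising data) showing how the coefficient norm shrinks as `alpha` grows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)

# Synthetic data (hypothetical): 100 samples, 5 predictors
X = rng.normal(size=(100, 5))
y = 3.0 + X @ np.array([1.0, -2.0, 0.5, 1.5, -0.5]) + rng.normal(size=100)

# Unpenalized baseline
ols_norm = np.linalg.norm(LinearRegression().fit(X, y).coef_)
print(f"OLS: ||w|| = {ols_norm:.4f}")

# Larger alpha -> stronger L2 penalty -> smaller coefficient norm
norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:>8}: ||w|| = {norms[-1]:.4f}")
```

In practice, `alpha` is usually chosen by cross-validation rather than fixed by hand.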
