# Multivariate linear

Multivariate linear regression is a statistical method used for modeling the relationship between multiple independent variables and a single dependent variable.
In contrast to [univariate linear regression](../univariate-linear), which involves only one independent variable, multivariate linear regression considers several simultaneously.
The goal is to create a linear equation that best fits the observed data, allowing for predictions or explanations of the dependent variable based on the values of the independent variables.

As always, let's load our CSV file.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
CSV_PATH = "https://gitlab.com/oasci/courses/pitt/biosc1540-2024s/-/raw/main/biosc1540/files/csv/advertising-data.csv"

df = pd.read_csv(CSV_PATH)

In the case of multivariate regression, we collect all of our independent variables in one dataframe called `df_features` and our dependent variable in `df_targets`.

In [3]:
target_column = "Product_Sold"

df_features = df.drop(columns=[target_column], inplace=False)
df_targets = df[target_column]

Now we need to convert the dataframe to NumPy arrays and reshape the targets.

In [4]:
targets = df_targets.to_numpy().reshape(-1, 1)
features = df_features.to_numpy()
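As a small illustration of what `reshape(-1, 1)` does, here is a sketch with a made-up three-element array standing in for the real targets (the values are invented, not taken from the CSV). Fitting scikit-learn with a two-dimensional target is also why later cells index `model.intercept_[0]` and `model.coef_[0]`.

```python
import numpy as np

# Hypothetical target values standing in for df_targets.to_numpy()
targets_1d = np.array([10.0, 20.0, 30.0])
print(targets_1d.shape)  # one-dimensional: (3,)

# -1 lets NumPy infer the number of rows; 1 requests a single column
targets_2d = targets_1d.reshape(-1, 1)
print(targets_2d.shape)  # column vector: (3, 1)
```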

## Linear

Fitting a multivariate model uses the same procedure as the univariate case; the only difference is that `features` now has one column per independent variable.

In [5]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X=features, y=targets)
print(f"Intercept: {model.intercept_[0]:.3f}")
print(f"Coefficients: {model.coef_[0]}")

Intercept: 36.655
Coefficients: [1.97147823 2.79786525 1.59446751 2.43283307 1.40693022 3.91183385]


The intercept (`model.intercept_`) is the constant term in the linear equation.
The intercept represents the estimated product sales when all advertising spending is zero.

The coefficients (`model.coef_`) represent the weights assigned to each advertising channel (TV, Billboards, Google Ads, Social Media, Influencer Marketing, Affiliate Marketing) in predicting product sales.
These coefficients indicate the estimated change in product sales for a one-unit increase in each respective advertising channel, holding other variables constant.
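A bare array of coefficients is hard to read, so it can help to pair each one with its column name. The sketch below does this on a small synthetic dataset; the column names are borrowed from the text, but the numbers and the names `demo_features`, `demo_targets`, `demo_model`, and `coef_table` are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the advertising data (values are invented)
demo_features = pd.DataFrame(
    rng.uniform(0, 100, size=(50, 3)),
    columns=["TV", "Billboards", "Social_Media"],
)
# Noise-free targets built from known weights, so the fit should recover them
demo_targets = (
    2.0 * demo_features["TV"]
    + 3.0 * demo_features["Billboards"]
    + 1.5 * demo_features["Social_Media"]
    + 10.0
).to_numpy().reshape(-1, 1)

demo_model = LinearRegression().fit(demo_features.to_numpy(), demo_targets)

# Label each coefficient with the column it belongs to
coef_table = pd.Series(demo_model.coef_[0], index=demo_features.columns)
print(coef_table.sort_values(ascending=False))
```

With the real data, the same pattern would use `df_features.columns` as the index.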

Awesome!
We already performed a linear fit for `Social_Media` and found the coefficient to be `2.418`.
Let's check to see what it is now.

In [6]:
social_media_index = df_features.columns.get_loc("Social_Media")
print(model.coef_[0][social_media_index])

2.432833068526262


Wait a second, that is slightly different than before.
Which one is more accurate?

Well, let's see what the $R^2$ score of this new model is and compare it against our univariate score of `0.154`.

In [7]:
print(model.score(X=features, y=targets))

0.9401750192922066


Wow!
That is much better.
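For reference, the `score` method of `LinearRegression` returns the coefficient of determination, $R^2 = 1 - SS_\text{res} / SS_\text{tot}$. The sketch below recomputes it by hand on synthetic data (the arrays and the `demo_model` name are made up for illustration) and compares it with scikit-learn's value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: three features, known weights, some noise
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_pred = demo_model.predict(X_demo)

# Residual sum of squares and total sum of squares
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(demo_model.score(X_demo, y_demo))
```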




To help the company, you can use these coefficients to explain how each advertising channel contributes to product sales.
For example, higher coefficients suggest a stronger positive impact on sales.
Adjusting ad budgets based on these coefficients could optimize their advertising strategy for higher sales.
Remember that correlation does not imply causation, so further analysis and experimentation may be needed to validate the findings.

## Polynomial


[`PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) will convert our individual features into all polynomial combinations up to a chosen degree.
For example, if we had two features $a$ and $b$, `PolynomialFeatures` with `degree=2` would return the features $1$, $a$, $b$, $a^2$, $ab$, and $b^2$, where the constant $1$ is a bias column included by default.

Let's proceed with our example data.

In [8]:
from sklearn.preprocessing import PolynomialFeatures

poly_degree = 2

# poly_featurizer is a transformer object that expands the features
poly_featurizer = PolynomialFeatures(degree=poly_degree)

features_poly = poly_featurizer.fit_transform(features)

print(f"n features: {features.shape[1]}")
print(f"n poly features: {features_poly.shape[1]}")

n features: 6
n poly features: 28


Okay, we went from 6 to 28 features.
Curious as to what they are?

In [9]:
feature_names = poly_featurizer.get_feature_names_out()
print(feature_names)

['1' 'x0' 'x1' 'x2' 'x3' 'x4' 'x5' 'x0^2' 'x0 x1' 'x0 x2' 'x0 x3' 'x0 x4'
 'x0 x5' 'x1^2' 'x1 x2' 'x1 x3' 'x1 x4' 'x1 x5' 'x2^2' 'x2 x3' 'x2 x4'
 'x2 x5' 'x3^2' 'x3 x4' 'x3 x5' 'x4^2' 'x4 x5' 'x5^2']


1. Original features: $x_0, \; x_1, \; x_2, \; x_3, \; x_4, \; x_5$
2. Squared features: $x_0^2, \; x_1^2, \; x_2^2, \; x_3^2, \; x_4^2, \; x_5^2$
3. Product features: $x_0x_1, \; x_0x_2, \; x_0x_3, \; x_0x_4, \; x_0x_5, \; x_1x_2, \; x_1x_3, \; x_1x_4, \; x_1x_5, \; x_2x_3, \; x_2x_4, \; x_2x_5, \; x_3x_4, \; x_3x_5, \; x_4x_5$
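The count of 28 can be verified by adding up these groups together with the constant `1` column: $1 + 6 + 6 + 15 = 28$. More generally, the number of polynomial terms of degree at most $d$ in $n$ features is $\binom{n + d}{d}$. A short sketch of that arithmetic (variable names are illustrative):

```python
from math import comb

n_features = 6
degree = 2

n_bias = 1                        # the constant '1' column
n_original = n_features           # six original features
n_squared = n_features            # six squared features
n_products = comb(n_features, 2)  # unordered pairs of distinct features

total = n_bias + n_original + n_squared + n_products
print(total)  # 28

# General formula for all terms up to the given degree
print(comb(n_features + degree, degree))  # 28
```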

We have already lost a lot of interpretability, as it is difficult to discern each feature's individual contributions.
Let's proceed anyway.

What is that `1` in there?
That is the bias term: a constant column of ones whose coefficient plays the role of the intercept.
Because the intercept is already represented in the features, we do not want our `LinearRegression` to fit another one, so we have to use `fit_intercept=False`.

In [10]:
model_poly = LinearRegression(fit_intercept=False)

# Fit the model with the polynomial features
model_poly.fit(features_poly, targets)
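An alternative to `fit_intercept=False` is to drop the bias column with `include_bias=False` and let `LinearRegression` fit the intercept itself. The sketch below, on synthetic data with made-up names, suggests the two setups give the same fit: the coefficient on the `1` column in the first matches the intercept in the second.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data with two features and a mild interaction
X_demo = rng.normal(size=(200, 2))
y_demo = (
    1.0
    + 2.0 * X_demo[:, 0]
    - 1.0 * X_demo[:, 1]
    + 0.5 * X_demo[:, 0] * X_demo[:, 1]
    + rng.normal(scale=0.1, size=200)
)

# Option A: keep the bias column, do not fit a separate intercept
X_a = PolynomialFeatures(degree=2).fit_transform(X_demo)
model_a = LinearRegression(fit_intercept=False).fit(X_a, y_demo)

# Option B: drop the bias column, let the model fit the intercept
X_b = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)
model_b = LinearRegression().fit(X_b, y_demo)

print(model_a.coef_[0], model_b.intercept_)  # bias coefficient vs. intercept
print(model_a.coef_[1:])
print(model_b.coef_)
```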

In [11]:
for feature_name, coeff in zip(feature_names, model_poly.coef_[0]):
    print(f"{feature_name}: {coeff:.3e}")

1: 2.703e+01
x0: 2.161e+00
x1: 3.447e+00
x2: 1.158e+00
x3: 2.262e+00
x4: 1.089e+00
x5: 4.033e+00
x0^2: -3.170e-04
x0 x1: -3.674e-04
x0 x2: 3.040e-04
x0 x3: -6.913e-05
x0 x4: 4.965e-05
x0 x5: 3.813e-04
x1^2: 1.981e-04
x1 x2: -4.774e-04
x1 x3: -3.110e-04
x1 x4: -1.083e-04
x1 x5: -4.007e-04
x2^2: 2.815e-04
x2 x3: 2.485e-04
x2 x4: 2.522e-04
x2 x5: -2.722e-05
x3^2: 7.594e-05
x3 x4: 3.029e-04
x3 x5: -1.274e-05
x4^2: -4.412e-05
x4 x5: 2.018e-04
x5^2: -1.886e-04


Let's compare our model score to the standard linear regression score of `0.94018`.

In [12]:
score_poly = model_poly.score(X=features_poly, y=targets)
print(f"Poly score: {score_poly:.5f}")

Poly score: 0.94243


We see that our polynomial model has a slightly better $R^2$ score, but the improvement is not substantial.
In exchange, we no longer have an easily interpretable model to recommend to the business.
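One caveat when reading these scores: both were computed on the same data used for fitting, and with ordinary least squares the training $R^2$ cannot go down when extra features are added. A held-out split is one way to check whether the extra terms help on unseen data. The sketch below uses synthetic data and illustrative names, not the advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic, truly linear data with six features (weights are invented)
X_demo = rng.uniform(0, 10, size=(120, 6))
weights = np.array([2.0, 2.8, 1.6, 2.4, 1.4, 3.9])
y_demo = X_demo @ weights + 30.0 + rng.normal(scale=5.0, size=120)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Plain linear model
linear_demo = LinearRegression().fit(X_train, y_train)

# Degree-2 polynomial model; fit the expansion on the training split only
featurizer = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = featurizer.fit_transform(X_train)
X_test_poly = featurizer.transform(X_test)
poly_demo = LinearRegression().fit(X_train_poly, y_train)

linear_train = linear_demo.score(X_train, y_train)
poly_train = poly_demo.score(X_train_poly, y_train)
print(f"Linear  train: {linear_train:.4f}  test: {linear_demo.score(X_test, y_test):.4f}")
print(f"Poly    train: {poly_train:.4f}  test: {poly_demo.score(X_test_poly, y_test):.4f}")
```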

## Ridge

Ridge regression, also known as Tikhonov regularization or $L2$ regularization, is a linear regression technique that introduces a regularization term to the linear regression objective function.
The primary goal of ridge regression is to address the issue of multicollinearity in linear regression models.

### Objective function


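Standard linear regression (ordinary least squares) minimizes the sum of squared residuals, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
In ridge regression, a regularization term is added to this objective function.
The new objective function becomes: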

$$
\text{minimize} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
$$

Here:

-   $n$ is the number of observations.
-   $p$ is the number of predictors (features).
-   $y_i$ is the actual value of the dependent variable for the $i$-th observation.
-   $\hat{y}_i$ is the predicted value of the dependent variable for the $i$-th observation.
-   $\alpha$ is the regularization parameter, controlling the strength of regularization.
-   $\beta_j$ is the coefficient of the $j$-th predictor in the linear regression model.
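When no intercept is fitted, minimizing this objective has a well-known closed-form solution, $\beta = (X^\top X + \alpha I)^{-1} X^\top y$. The sketch below computes it with NumPy on synthetic data and compares it against scikit-learn's `Ridge` with `fit_intercept=False`; the data and variable names are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data: 50 observations, 3 predictors
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
alpha = 1.0

# Closed form: solve (X^T X + alpha * I) beta = X^T y
n_predictors = X_demo.shape[1]
beta_closed = np.linalg.solve(
    X_demo.T @ X_demo + alpha * np.eye(n_predictors),
    X_demo.T @ y_demo,
)

# Same objective via scikit-learn, without an intercept
ridge_demo = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo)

print(beta_closed)
print(ridge_demo.coef_)
```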


### Limitations


**Not Suitable for Feature Selection**

One of the primary limitations of ridge regression lies in its inability to perform feature selection.
Unlike some other regularization techniques, ridge regression retains all features in the model, making it less suitable for scenarios where variable selection is a critical requirement.

**Interpretability Challenges**

While ridge regression addresses multicollinearity, it does so at the expense of straightforward interpretability.
The regularization term introduces a level of complexity to coefficient interpretation, as the coefficients are shrunk towards zero.
This departure from the clear interpretability of standard linear regression should be acknowledged.

**Dependency on Scaling**

The performance of ridge regression is influenced by the scale of the variables.
Rescaling or standardizing the features is often necessary to ensure effective regularization.
Failure to do so can result in unequal penalization of coefficients, impacting the model's stability.
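To illustrate the scale dependence, the sketch below fits ridge on synthetic data twice: once as is, and once with the first feature multiplied by 1000 (as if its units had changed). Without scaling, the predictions differ; with a `StandardScaler` in front of the model, they should agree. All names and values here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Same data, but the first feature is expressed in different units
X_rescaled = X_demo.copy()
X_rescaled[:, 0] *= 1000.0

alpha = 10.0

# Ridge on raw features: the penalty treats the two versions differently
raw_a = Ridge(alpha=alpha).fit(X_demo, y_demo).predict(X_demo)
raw_b = Ridge(alpha=alpha).fit(X_rescaled, y_demo).predict(X_rescaled)

# Ridge after standardization: the change of units is removed first
scaled_a = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X_demo, y_demo).predict(X_demo)
scaled_b = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X_rescaled, y_demo).predict(X_rescaled)

print("Max difference without scaling:", np.max(np.abs(raw_a - raw_b)))
print("Max difference with scaling:   ", np.max(np.abs(scaled_a - scaled_b)))
```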

**No Sparsity in Coefficients**

Unlike some regularization methods, such as LASSO (L1 regularization), ridge regression does not lead to exact zero coefficients.
It shrinks coefficients towards zero but retains all features in the model.
If sparsity is a critical consideration, alternative regularization methods may be more appropriate.
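The difference in sparsity can be seen on a small synthetic problem where only the first two of six features matter. In this sketch (data, `alpha` values, and names are assumptions chosen for illustration), LASSO sets the irrelevant coefficients to exactly zero while ridge keeps small non-zero values.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Six features, but only the first two influence the target
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

print("Ridge coefficients:", ridge_demo.coef_)
print("LASSO coefficients:", lasso_demo.coef_)
print("Exact zeros (ridge):", np.count_nonzero(ridge_demo.coef_ == 0))
print("Exact zeros (LASSO):", np.count_nonzero(lasso_demo.coef_ == 0))
```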

**Model Complexity**

Ridge regression introduces a level of model complexity, with the choice of the regularization parameter ($\alpha$) playing a pivotal role.
Selecting an optimal $\alpha$ value often involves techniques like cross-validation, adding an additional layer of complexity to model tuning.
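scikit-learn provides `RidgeCV` for this kind of search. The sketch below tries a grid of candidate $\alpha$ values with 5-fold cross-validation on synthetic data; the grid and the data are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

X_demo = rng.normal(size=(150, 6))
y_demo = X_demo @ np.array([2.0, 2.8, 1.6, 2.4, 1.4, 3.9]) + rng.normal(scale=1.0, size=150)

# Candidate regularization strengths on a logarithmic grid
alphas = np.logspace(-3, 3, 13)

ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_demo, y_demo)

print("Selected alpha:", ridge_cv.alpha_)
print("Coefficients:", ridge_cv.coef_)
```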

**Assumption of Linearity**

Similar to standard linear regression, ridge regression assumes a linear relationship between the independent and dependent variables.
If the true relationship is significantly nonlinear, ridge regression may not capture the underlying patterns accurately.

**Limited Handling of Multicollinearity**

While ridge regression is effective in mitigating multicollinearity, it may not completely eliminate the issue.
In cases of severe multicollinearity, additional techniques or data preprocessing methods may be necessary.

**Sensitivity to Outliers**

The penalty can dampen how strongly outliers pull on the coefficients compared to standard linear regression, but ridge regression still minimizes squared errors, so extreme values can still influence the fit and the resulting coefficients.
Practitioners should exercise caution in the presence of outliers.

**Not a Panacea**

It is crucial to recognize that ridge regression is not a universal solution.
Its effectiveness depends on the specific characteristics of the data and the objectives of the analysis.
Careful consideration of alternative regularization techniques may be warranted in certain scenarios.

In [13]:
from sklearn.linear_model import Ridge

model_ridge = Ridge(alpha=1.0)
model_ridge.fit(X=features, y=targets)

print(f"Intercept: {model_ridge.intercept_[0]:.3f}")
print(f"Coefficients: {model_ridge.coef_[0]}")

Intercept: 36.656
Coefficients: [1.97147817 2.79786515 1.59446745 2.43283298 1.40693016 3.9118337 ]


In [14]:
print(model_ridge.score(X=features, y=targets))

0.9401750192922053
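With `alpha=1.0`, the ridge coefficients and score above are almost identical to the unregularized fit, which suggests the penalty is small relative to the squared-error term for this data. To see shrinkage take effect, one option is to sweep $\alpha$ over several orders of magnitude. The sketch below does so on synthetic data (invented values and names) and prints the coefficient norm for each $\alpha$.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic features on a scale of 0 to 100 (values are invented)
X_demo = rng.uniform(0, 100, size=(300, 6))
weights = np.array([2.0, 2.8, 1.6, 2.4, 1.4, 3.9])
y_demo = X_demo @ weights + 35.0 + rng.normal(scale=20.0, size=300)

alphas = [1.0, 1e3, 1e5, 1e7]
coef_norms = []
for alpha in alphas:
    ridge_demo = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # The Euclidean norm of the coefficients shrinks as alpha grows
    coef_norms.append(np.linalg.norm(ridge_demo.coef_))
    print(f"alpha={alpha:>10.0f}  ||coef||={coef_norms[-1]:.4f}")
```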
