| [$\leftarrow$ 3. Polynomial Regression ](n3_polynomial_regression.ipynb) | [5. Comparison and Conclusion $\rightarrow$](n5_comparison_conclusion.ipynb) |
| :-----------------------------------------------------------------: | :---------------------------------------------------------------: |

<hr>

### 4. **Linear and Polynomial Regression**

#### 4.1. **Getting Started**

In this notebook, the best polynomial model is combined with the best linear model. As before, we compare the results obtained with the encoded species against the results obtained with no species information at all.

In [10]:
# Append the path to useful directories
import sys
sys.path.append('../my_functions')

# Packages needed
from download_dataset import download_dataset
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Download and extract the dataset
fishcatch = download_dataset(data_file='fishcatch', extension='.tar.xz')

# Quick peek at the data
df = pd.read_csv(fishcatch)
df.head(3).style.background_gradient(cmap='viridis')

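|   | Species | Weight | Length1 | Length2 | Length3 | Height | Width |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | Bream | 242.0 | 23.2 | 25.4 | 30.0 | 11.52 | 4.02 |
| 1 | Bream | 290.0 | 24.0 | 26.3 | 31.2 | 12.48 | 4.3056 |
| 2 | Bream | 340.0 | 23.9 | 26.5 | 31.1 | 12.3778 | 4.6961 |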


We transform `Weight` into polynomial features of degree n = 5, since that degree gave the best polynomial model in the previous notebook.

In [11]:
# Transforming weight
X_weight = df['Weight']

# Drop weight
df.drop('Weight', axis=1, inplace=True)

# PolynomialFeatures (preprocessing)
from sklearn.preprocessing import PolynomialFeatures

# Transforming the feature to the desired degree
poly = PolynomialFeatures(degree=5)
X_weight = poly.fit_transform(X_weight.values.reshape(-1, 1))

# Quick peek at the transformed feature
X_weight = pd.DataFrame(X_weight, columns=[f'Weight^{i}' for i in range(X_weight.shape[1])])
X_weight.head(3).style.background_gradient(cmap='viridis')

|   | Weight^0 | Weight^1 | Weight^2 | Weight^3 | Weight^4 | Weight^5 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 1.0 | 242.0 | 58564.0 | 14172488.0 | 3429742096.0 | 829997587232.0 |
| 1 | 1.0 | 290.0 | 84100.0 | 24389000.0 | 7072810000.0 | 2051114900000.0 |
| 2 | 1.0 | 340.0 | 115600.0 | 39304000.0 | 13363360000.0 | 4543542400000.0 |
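
For a single input column, a degree-5 polynomial expansion amounts to raising each value to the powers 0 through 5. The sketch below reproduces the three rows shown above with plain NumPy, as a sanity check on what the transformer computes. The array name `weights` and the use of `np.vander` are illustrative choices and are not part of the original notebook.

```python
import numpy as np

# First three Weight values from the dataset preview
weights = np.array([242.0, 290.0, 340.0])

# Columns are weights**0, weights**1, ..., weights**5
powers = np.vander(weights, N=6, increasing=True)

print(powers[0])
```

The printed row should match the first row of the table, with the constant column `Weight^0` first and `Weight^5` last. Note how quickly the magnitudes grow from one column to the next.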


#### 4.2. **Dropping the 'Species' variable**

In [12]:
# Including only the numeric columns
df_no_species = df.select_dtypes(include=['int64', 'float64'])
df_no_species.head(3).style.background_gradient(cmap='viridis')

|   | Length1 | Length2 | Length3 | Height | Width |
| :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 23.2 | 25.4 | 30.0 | 11.52 | 4.02 |
| 1 | 24.0 | 26.3 | 31.2 | 12.48 | 4.3056 |
| 2 | 23.9 | 26.5 | 31.1 | 12.3778 | 4.6961 |
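
Selecting by dtype removes `Species` because it is the only non-numeric column left in the frame. A minimal sketch on a toy frame (the name `toy` is hypothetical, with a few values copied from the preview) suggests that `include='number'` picks the same columns as listing `int64` and `float64` explicitly, and that dropping the column by name is equivalent when it is the only text column.

```python
import pandas as pd

# Toy frame with one text column and two numeric columns
toy = pd.DataFrame({
    'Species': ['Bream', 'Bream', 'Bream'],
    'Length1': [23.2, 24.0, 23.9],
    'Height': [11.52, 12.48, 12.3778],
})

# Three ways to keep only the numeric features
by_dtype = toy.select_dtypes(include=['int64', 'float64'])
by_number = toy.select_dtypes(include='number')
by_name = toy.drop(columns='Species')

print(list(by_dtype.columns))
```

Dropping by name is more explicit about intent, while selecting by dtype also removes any other text column that might be present.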


In [13]:
# Combining the transformed weight with the other features
df_no_species = pd.concat([df_no_species, X_weight], axis=1)
df_no_species.head(3).style.background_gradient(cmap='viridis')

|   | Length1 | Length2 | Length3 | Height | Width | Weight^0 | Weight^1 | Weight^2 | Weight^3 | Weight^4 | Weight^5 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 23.2 | 25.4 | 30.0 | 11.52 | 4.02 | 1.0 | 242.0 | 58564.0 | 14172488.0 | 3429742096.0 | 829997587232.0 |
| 1 | 24.0 | 26.3 | 31.2 | 12.48 | 4.3056 | 1.0 | 290.0 | 84100.0 | 24389000.0 | 7072810000.0 | 2051114900000.0 |
| 2 | 23.9 | 26.5 | 31.1 | 12.3778 | 4.6961 | 1.0 | 340.0 | 115600.0 | 39304000.0 | 13363360000.0 | 4543542400000.0 |
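
`pd.concat(..., axis=1)` pairs rows by index label, not by position. Here both frames carry the same default index, so the rows line up. The sketch below, on toy frames with hypothetical names `lengths` and `weights`, illustrates the aligned case and what would happen if one side had been filtered without resetting its index.

```python
import pandas as pd

lengths = pd.DataFrame({'Length1': [23.2, 24.0, 23.9]})
weights = pd.DataFrame({'Weight^1': [242.0, 290.0, 340.0]})

# Same default RangeIndex on both sides: rows pair up one-to-one
combined = pd.concat([lengths, weights], axis=1)
print(lengths.index.equals(weights.index))

# Filtering one side leaves a gap in its index, so the missing
# label is filled with NaN in the combined frame
filtered = lengths[lengths['Length1'] > 23.5]
misaligned = pd.concat([filtered, weights], axis=1)
print(misaligned.isna().sum())
```

A quick `index.equals` check before concatenating is a cheap way to confirm that no NaN values will be introduced.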


<hr>

