Species: This column represents the species of the fish. It is a categorical variable that categorizes each fish into one of seven species. The species may include names like "Perch," "Bream," "Roach," "Pike," "Smelt," "Parkki," and "Whitefish." This column is the target variable for the polynomial regression analysis, where we aim to predict the fish's weight based on its other attributes.

Weight: This column represents the weight of the fish. It is a numerical variable that is typically measured in grams. The weight is the dependent variable we want to predict using polynomial regression.

Length1: This column represents the first measurement of the fish's length. It is a numerical variable, typically measured in centimetres.

Length2: This column represents the second measurement of the fish's length. It is another numerical variable, typically measured in centimetres.

Length3: This column represents the third measurement of the fish's length. Similar to the previous two columns, it is a numerical variable, usually measured in centimetres.

Height: This column represents the height of the fish. It is a numerical variable, typically measured in centimetres.

Width: This column represents the width of the fish. Like the other numerical variables, it is also typically measured in centimetres.

In [270]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [271]:
data = pd.read_csv("../Fish.csv")


In [272]:
## Display Data
print(data.head())

  Species  Weight  Length1  Length2  Length3   Height   Width
0   Bream   242.0     23.2     25.4     30.0  11.5200  4.0200
1   Bream   290.0     24.0     26.3     31.2  12.4800  4.3056
2   Bream   340.0     23.9     26.5     31.1  12.3778  4.6961
3   Bream   363.0     26.3     29.0     33.5  12.7300  4.4555
4   Bream   430.0     26.5     29.0     34.0  12.4440  5.1340


In [273]:
## Check null values
print(data.isnull().sum())

Species    0
Weight     0
Length1    0
Length2    0
Length3    0
Height     0
Width      0
dtype: int64


In [274]:
## Checking different type of Species
print(data['Species'].unique())

['Bream' 'Roach' 'Whitefish' 'Parkki' 'Perch' 'Pike' 'Smelt']


In [275]:
# Convert Species to numerical values
data['Species'] = data['Species'].map({'Bream': 0, 'Roach': 1, 'Whitefish': 2, 'Parkki': 3, 'Perch': 4, 'Pike': 5, 'Smelt': 6})

In [276]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler

In [277]:
scaler = StandardScaler()  # Scale numerical features
numerical_cols = ["Length1", "Length2","Length3", "Height", "Width"]  # Adjust based on your data
data[numerical_cols] = scaler.fit_transform(data[numerical_cols])


In [278]:
## Train test split
x = data.drop('Species', axis=1)
y = data['Species']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [279]:
print(" X train shape: ", x_train.shape)
print(" Y train shape: ", y_train.shape)

 X train shape:  (127, 6)
 Y train shape:  (127,)


In [280]:
## Reshape y_train and y_test
y_train = y_train.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)

In [281]:
print(" Y train shape: ", y_train.shape)
print(" Y test shape: ", y_test.shape)

 Y train shape:  (127, 1)
 Y test shape:  (32, 1)


# Linear Regression

In [282]:
from sklearn.linear_model import LinearRegression

In [283]:
regression = LinearRegression()

In [284]:
model = regression.fit(x_train.values, y_train)

In [285]:
y_pred = model.predict(x_test.values)

In [286]:
# Check the accuracy of the model
from sklearn.metrics import r2_score
print("R2 Score: ", r2_score(y_test, y_pred))

R2 Score:  0.9001937135474993


In [287]:
import pickle

model_pickle_path = '../fish_pred_model.pkl'

model_pickle = open(model_pickle_path, 'wb')
pickle.dump(model, model_pickle)
model_pickle.close()