# Correction of test number 3

This test is about predicting the weight of different fish species from their dimensions.

The different columns of the dataset are:
- *Species*: the species of the fish
- *Weight*: the weight of the fish in grams (g)
- *Length1*: the vertical length in cm
- *Length2*: the diagonal length in cm
- *Length3*: the cross length in cm
- *Height*: the height in cm
- *Width*: the diagonal width in cm

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 16)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('Fish.csv')
df.head()
df['Species'].value_counts()

Perch        56
Bream        35
Roach        20
Pike         17
Smelt        14
Parkki       11
Whitefish     6
Name: Species, dtype: int64

## Question 1

Find the three coefficients of the linear regression that predicts the Weight of bream, a particular fish species, from its Length1, Height and Width (3 factors).

In this question, do not use a training/testing split; use all the bream in the dataset (35 fish).

Round your results to two (2) decimal places.

1. Length1 coefficient
2. Height coefficient
3. Width coefficient
4. The expected weight of a fish that has the following dimensions:
Length1 = 25, Width = 4.5 and Height = 12.5

In [2]:
Bream=df.loc[df['Species']=='Bream']

X_Length1=np.array(Bream['Length1']).reshape(-1,1)
X_Height=np.array(Bream['Height']).reshape(-1,1)
X_Width=np.array(Bream['Width']).reshape(-1,1)
Y=np.array(Bream['Weight']).reshape(-1,1)
X=np.concatenate((X_Length1,X_Height,X_Width),axis=1)

regr = LinearRegression().fit(X, Y)

# The estimated equation coefficients
print('Estimated equation is: (',round(regr.coef_[0,0],2),') Length1 + (',round(regr.coef_[0,1],2),') Height + (',round(regr.coef_[0,2],2),') Width + (',round(regr.intercept_[0],2),')')

X_Pred=np.array([25,12.5,4.5]).reshape(1,-1)
Y_Pred=regr.predict(X_Pred)
round(Y_Pred[0][0],2)

Estimated equation is: ( 12.81 ) Length1 + ( 63.13 ) Height + ( 51.63 ) Width + ( -1009.02 )


332.6
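
As a sanity check, the prediction can be recomputed by hand from the rounded coefficients printed above. This is only a sketch: because it uses the rounded values rather than the full-precision ones stored in `regr`, the result differs slightly from the model output. Note that the values must follow the column order used to build `X` (Length1, Height, Width), which is not the order in which the question lists them.

```python
# Rounded coefficients copied from the estimated equation printed above.
coef = {"Length1": 12.81, "Height": 63.13, "Width": 51.63}
intercept = -1009.02

# Dimensions of the fish from item 4 of the question.
fish = {"Length1": 25, "Height": 12.5, "Width": 4.5}

# Linear model: Weight = intercept + sum(coefficient * dimension)
weight = intercept + sum(coef[name] * fish[name] for name in coef)
print(round(weight, 2))
```

The small gap between this value and 332.6 comes from rounding the coefficients to two decimal places before multiplying.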

## Question 2

Find the Pearson correlation coefficient that links the Height of Perch, a particular fish species, to their Weight.

Round your result to three (3) decimal places.

1. The coefficient
2. Is there a good relationship between those two factors? If yes, write 'Yes'. If no, write 'No'.

Be careful: the answer is case sensitive. The best method is to copy and paste the word between the quote marks, without the quote marks themselves.

In [3]:
Perch=df.loc[df['Species']=='Perch']
Perch.head()

| | Species | Weight | Length1 | Length2 | Length3 | Height | Width |
|---|---|---|---|---|---|---|---|
| 72 | Perch | 5.9 | 7.5 | 8.4 | 8.8 | 2.112 | 1.408 |
| 73 | Perch | 32.0 | 12.5 | 13.7 | 14.7 | 3.528 | 1.9992 |
| 74 | Perch | 40.0 | 13.8 | 15.0 | 16.0 | 3.824 | 2.432 |
| 75 | Perch | 51.5 | 15.0 | 16.2 | 17.2 | 4.5924 | 2.6316 |
| 76 | Perch | 70.0 | 15.7 | 17.4 | 18.5 | 4.588 | 2.9415 |


In [4]:
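# Correlate the two numeric columns directly; calling corr() on the whole
# frame fails on recent pandas versions because 'Species' is not numeric.
print(round(Perch['Height'].corr(Perch['Weight']),3))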

0.968


In [5]:
# Drop the text column first so that corr() only sees numeric data
Perch.drop(columns='Species').corr()

| | Weight | Length1 | Length2 | Length3 | Height | Width |
|---|---|---|---|---|---|---|
| Weight | 1.0 | 0.958361 | 0.958656 | 0.959506 | 0.968441 | 0.963943 |
| Length1 | 0.958361 | 1.0 | 0.999713 | 0.999427 | 0.98542 | 0.974447 |
| Length2 | 0.958656 | 0.999713 | 1.0 | 0.999779 | 0.985584 | 0.974617 |
| Length3 | 0.959506 | 0.999427 | 0.999779 | 1.0 | 0.985909 | 0.975131 |
| Height | 0.968441 | 0.98542 | 0.985584 | 0.985909 | 1.0 | 0.982943 |
| Width | 0.963943 | 0.974447 | 0.974617 | 0.975131 | 0.982943 | 1.0 |
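
To make explicit what the Pearson coefficient measures, here is a minimal sketch that computes it from its definition (covariance divided by the product of the standard deviations). The helper `pearson_r` is an illustrative name, not a pandas function, and the data are only the five Perch rows displayed by `Perch.head()`, so the value it prints is not the 0.968 obtained on all 56 Perch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Height and Weight of the five Perch rows shown by Perch.head()
height = [2.112, 3.528, 3.824, 4.5924, 4.588]
weight = [5.9, 32.0, 40.0, 51.5, 70.0]

r_sample = pearson_r(height, weight)
print(round(r_sample, 3))
```

The coefficient always lies between -1 and 1; a value close to 1, such as the one found for Height and Weight, indicates a strong positive linear relationship.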


## Question 3

This question involves using the function train_test_split.

For Perch, a particular fish species, find the coefficient of determination $R^2$ that you obtain when using the pair (Length3, Height) to predict their Weight.

Create a training and testing dataset, by having 75% of the data in the training set and 25% of it in the testing set, and by choosing the random seed of 15, using the following command:

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=15)`

1. The coefficient of determination $R^2$, computed on the testing set
2. Does the regression model fit the observed data well? If yes, write 'Yes'. If no, write 'No'.

In [6]:
from sklearn.metrics import r2_score

X_Length3=np.array(Perch['Length3']).reshape(-1,1)
X_Height=np.array(Perch['Height']).reshape(-1,1)
Y=np.array(Perch['Weight']).reshape(-1,1)
X=np.concatenate((X_Length3,X_Height),axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=15)

regr = LinearRegression().fit(X_train, y_train)

regr_Weight_pred = regr.predict(X_test)

round(r2_score(y_test, regr_Weight_pred),2)

0.95
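
The coefficient of determination is not the same quantity as the Pearson coefficient of Question 2. It compares the squared prediction errors with the spread of the observed values around their mean: $R^2 = 1 - SS_{res} / SS_{tot}$, which is the definition `r2_score` uses. The sketch below applies that formula to a few made-up numbers; the function name `r2` and the sample values are hypothetical and are not taken from the fish dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: errors of the model
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: errors of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted weights, for illustration only
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

score = r2(y_true, y_pred)
print(round(score, 3))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean weight. Because the score in this question is computed on the testing set, it measures how well the model generalizes to fish it has not seen.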

## Bonus question

Add a 'Volume' column to your data, defined as:

$\textrm{Volume} = \textrm{Length2} \times \textrm{Width} \times \textrm{Height}$

This new column gives you an indication of the volume of the fish.

For each fish species of the initial dataset, fit a linear regression of the Weight on the Volume.

Which fish is the densest? (The one with the largest coefficient between Volume and Weight.)

In [7]:
df['Volume']=df['Length2']*df['Width']*df['Height']
Species_name=df.Species.unique().tolist()
Species_name

['Bream', 'Roach', 'Whitefish', 'Parkki', 'Perch', 'Pike', 'Smelt']

In [8]:
data=[]
for FishName in Species_name:
    Fish=df.loc[df['Species']==FishName]

    X_Volume=np.array(Fish['Volume']).reshape(-1,1)
    Y=np.array(Fish['Weight']).reshape(-1,1)

    regr_Fish = LinearRegression().fit(X_Volume, Y)

    # The estimated equation coefficients
    print('Estimated equation for '+FishName+' is: Weight = (',round(regr_Fish.coef_[0,0],2),') Volume + (',round(regr_Fish.intercept_[0],2),')')

    data.append([FishName,round(regr_Fish.coef_[0,0],2)])
    
Final_df = pd.DataFrame(data, columns = ['Species', 'Density']) 
Final_df[Final_df['Density']==Final_df['Density'].max()]

Estimated equation for Bream is: Weight = ( 0.19 ) Volume + ( 76.99 )
Estimated equation for Roach is: Weight = ( 0.25 ) Volume + ( 2.55 )
Estimated equation for Whitefish is: Weight = ( 0.3 ) Volume + ( -32.12 )
Estimated equation for Parkki is: Weight = ( 0.23 ) Volume + ( 5.71 )
Estimated equation for Perch is: Weight = ( 0.26 ) Volume + ( 15.67 )
Estimated equation for Pike is: Weight = ( 0.36 ) Volume + ( 1.21 )
Estimated equation for Smelt is: Weight = ( 0.2 ) Volume + ( 3.74 )


| | Species | Density |
|---|---|---|
| 5 | Pike | 0.36 |


In [12]:
data=[]
for FishName in Species_name:
    Fish=df.loc[df['Species']==FishName]

    X_Volume=np.array(Fish['Volume']).reshape(-1,1)
    Y=np.array(Fish['Weight']).reshape(-1,1)

    regr_Fish = LinearRegression(fit_intercept=False).fit(X_Volume, Y)

    # The estimated equation coefficients
    print('Estimated equation for '+FishName+' is: Weight = (',round(regr_Fish.coef_[0,0],2),') Volume')

    data.append([FishName,round(regr_Fish.coef_[0,0],2)])
    
Final_df = pd.DataFrame(data, columns = ['Species', 'Density']) 
Final_df[Final_df['Density']==Final_df['Density'].max()]

Estimated equation for Bream is: Weight = ( 0.21 ) Volume
Estimated equation for Roach is: Weight = ( 0.25 ) Volume
Estimated equation for Whitefish is: Weight = ( 0.29 ) Volume
Estimated equation for Parkki is: Weight = ( 0.24 ) Volume
Estimated equation for Perch is: Weight = ( 0.26 ) Volume
Estimated equation for Pike is: Weight = ( 0.36 ) Volume
Estimated equation for Smelt is: Weight = ( 0.28 ) Volume


| | Species | Density |
|---|---|---|
| 5 | Pike | 0.36 |
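
The two cells above give different coefficients for some species (Smelt goes from 0.2 to 0.28) because they fit different models: the first estimates Weight = a Volume + b, while the second sets `fit_intercept=False` and forces b = 0. The sketch below shows the closed-form least-squares slope of each model. The three data points are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical volumes and weights, chosen only to illustrate the formulas
volume = [100.0, 200.0, 400.0]
weight = [35.0, 50.0, 110.0]

n = len(volume)
mean_v = sum(volume) / n
mean_w = sum(weight) / n

# Model with an intercept: Weight = slope * Volume + intercept
s_vw = sum((v - mean_v) * (w - mean_w) for v, w in zip(volume, weight))
s_vv = sum((v - mean_v) ** 2 for v in volume)
slope_with_intercept = s_vw / s_vv
intercept = mean_w - slope_with_intercept * mean_v

# Model through the origin: Weight = slope * Volume
slope_through_origin = (
    sum(v * w for v, w in zip(volume, weight)) / sum(v * v for v in volume)
)

print(round(slope_with_intercept, 3), round(intercept, 3))
print(round(slope_through_origin, 3))
```

When the fitted intercept is far from zero relative to the weights, as it appears to be for Smelt, the two slopes can differ noticeably. Since a fish of zero volume should weigh nothing, the fit through the origin is arguably the closer match to the idea of a density.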


| Species | Weight | Length1 | Length2 | Length3 | Height | Width | Volume | coefficient1 |
|---|---|---|---|---|---|---|---|---|
| Bream | 617.828571 | 30.305714 | 33.108571 | 38.354286 | 15.183211 | 5.427614 | 2848.26935 | 0.216914 |
| Parkki | 154.818182 | 18.727273 | 20.345455 | 22.790909 | 8.962427 | 3.220736 | 640.884646 | 0.241569 |
| Perch | 382.239286 | 25.735714 | 27.892857 | 29.571429 | 7.86187 | 4.745723 | 1432.076654 | 0.266913 |
| Pike | 718.705882 | 42.476471 | 45.482353 | 48.717647 | 7.713771 | 5.086382 | 2021.005503 | 0.355618 |
| Roach | 152.05 | 20.645 | 22.275 | 24.97 | 6.694795 | 3.65785 | 594.870203 | 0.255602 |
| Smelt | 11.178571 | 11.257143 | 11.921429 | 13.035714 | 2.209371 | 1.340093 | 37.691077 | 0.296584 |
| Whitefish | 531.0 | 28.8 | 31.316667 | 34.316667 | 10.027167 | 5.47305 | 1869.565691 | 0.284023 |
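
The cell that produced this last table is not shown. Judging from the numbers alone, the `coefficient1` column appears to be the mean Weight divided by the mean Volume of each species; that reading is an assumption, and the sketch below simply checks it against the values copied from the table.

```python
# (mean Weight, mean Volume) per species, copied from the table above
means = {
    "Bream": (617.828571, 2848.26935),
    "Parkki": (154.818182, 640.884646),
    "Perch": (382.239286, 1432.076654),
    "Pike": (718.705882, 2021.005503),
    "Roach": (152.05, 594.870203),
    "Smelt": (11.178571, 37.691077),
    "Whitefish": (531.0, 1869.565691),
}

# Ratio of the means, to compare with the coefficient1 column
ratios = {species: w / v for species, (w, v) in means.items()}
for species, ratio in ratios.items():
    print(species, round(ratio, 6))

# Species with the largest ratio
densest = max(ratios, key=ratios.get)
print(densest)
```

This ratio of means is a third way of estimating a density, distinct from the two regression slopes, which explains why its values do not match them exactly. All three approaches point to the same species as the densest.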
