### <b> Classification Models </b>

* The primary goal of the project was to use a time-series approach to classify new entries of a given plant according to biometric and environmental features. However, because the original dataset contains few data points and biometric measurements were taken infrequently, I consider that a solid first approach is to treat this as a supervised classification problem: predicting the development stage of a lettuce according to the BBCH scale.

* Using classic ML models will also help us establish a good baseline for prediction and error-related metrics before trying DL models.

* Proposed Models: Decision Trees, Random Forests, or Gradient Boosting
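As a rough illustration of how the three proposed model families could be compared as baselines, the sketch below runs 5-fold cross-validation with scikit-learn. It uses synthetic stand-in data, not the lettuce dataset; the feature matrix, the labels, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the lettuce dataset): 200 rows, 5 features, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-0.5, 0.5])  # labels 0, 1, 2

# Candidate baselines with mostly default hyperparameters (assumed, not tuned)
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single split because, with few data points, a single hold-out score can vary a lot from one split to another.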

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [2]:
lettuce_df = pd.read_csv("../data/enc_biometric_data.csv")

In [3]:
lettuce_df

Unnamed: 0,Date,Number,Line,Sample,CODE,No leaves,Diameter,Perpendicular,Height,Thickness 1,...,Thickness 3,Thickness 4,Thickness 5,Max. Temp.,Min. Temp.,Mean. Temp.,Max. Hum.,Min. Hum.,Mean. Hum.,BBCH
0,2024-09-07,1,1,1,39,10,14.5,10.4,9.50,0.00,...,0.00,0.00,0.0,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19
1,2024-09-07,2,1,1,32,10,10.9,9.7,9.80,0.00,...,0.00,0.00,0.0,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19
2,2024-09-07,3,1,1,40,13,15.8,13.9,9.40,0.00,...,0.00,0.00,0.0,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19
3,2024-09-07,4,1,1,31,10,12.1,7.2,9.50,0.00,...,0.00,0.00,0.0,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19
4,2024-09-07,5,1,1,37,11,14.2,10.5,12.50,0.00,...,0.00,0.00,0.0,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
355,2024-10-03,41,6,0,17,11,26.8,24.5,10.15,0.40,...,0.27,0.31,0.0,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19
356,2024-10-03,42,6,0,24,7,30.1,22.6,11.85,0.32,...,0.33,0.37,0.0,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19
357,2024-10-03,43,6,0,16,11,22.8,20.5,10.35,0.40,...,0.39,0.32,0.0,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19
358,2024-10-03,44,6,0,28,10,30.5,26.2,11.55,0.38,...,0.33,0.37,0.0,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19
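The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its row index and then read back without declaring it. I am assuming that is what happened here; the sketch below reproduces the behaviour on a toy frame (illustrative values only) and shows how `index_col=0` avoids carrying the old index as a feature.

```python
import io

import pandas as pd

# Toy frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({"Diameter": [14.5, 10.9], "BBCH": [19, 19]})

# Saving with the default index=True writes the row index as an unnamed first column
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading it back without index_col produces an "Unnamed: 0" column
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))  # ['Unnamed: 0', 'Diameter', 'BBCH']

# Passing index_col=0 restores it as the index instead of a data column
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)
print(list(clean.columns))  # ['Diameter', 'BBCH']
```

If the column really is a leftover index, leaving it in the feature set would let a model key on row order instead of on the measurements.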


##### <b> Separate "Date" column into Day, Month, Year columns </b>

In [4]:
lettuce_df["Date"] = pd.to_datetime(lettuce_df["Date"])

In [5]:
lettuce_df['Year'] = lettuce_df["Date"].dt.year
lettuce_df['Month'] = lettuce_df["Date"].dt.month
lettuce_df['Day'] = lettuce_df["Date"].dt.day
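A quick check that may be useful after this step: in the rows displayed in this notebook every date falls in 2024, so `Year` could turn out to be constant and carry no information for a classifier. The sketch below, on a toy frame with assumed dates, lists constant columns and also derives a hypothetical `DaysSinceStart` column as one possible alternative encoding of time.

```python
import pandas as pd

# Toy dates standing in for the real "Date" column (illustrative only)
toy = pd.DataFrame({"Date": ["2024-09-07", "2024-09-07", "2024-10-03"]})
toy["Date"] = pd.to_datetime(toy["Date"])

# Same decomposition as in the cell above
toy["Year"] = toy["Date"].dt.year
toy["Month"] = toy["Date"].dt.month
toy["Day"] = toy["Date"].dt.day

# Hypothetical extra feature: days elapsed since the first measurement
toy["DaysSinceStart"] = (toy["Date"] - toy["Date"].min()).dt.days

# Columns with a single distinct value cannot help separate classes
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_cols)  # ['Year']
print(toy["DaysSinceStart"].tolist())  # [0, 0, 26]
```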

In [6]:
lettuce_df

Unnamed: 0,Date,Number,Line,Sample,CODE,No leaves,Diameter,Perpendicular,Height,Thickness 1,...,Max. Temp.,Min. Temp.,Mean. Temp.,Max. Hum.,Min. Hum.,Mean. Hum.,BBCH,Year,Month,Day
0,2024-09-07,1,1,1,39,10,14.5,10.4,9.50,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
1,2024-09-07,2,1,1,32,10,10.9,9.7,9.80,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
2,2024-09-07,3,1,1,40,13,15.8,13.9,9.40,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
3,2024-09-07,4,1,1,31,10,12.1,7.2,9.50,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
4,2024-09-07,5,1,1,37,11,14.2,10.5,12.50,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
355,2024-10-03,41,6,0,17,11,26.8,24.5,10.15,0.40,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
356,2024-10-03,42,6,0,24,7,30.1,22.6,11.85,0.32,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
357,2024-10-03,43,6,0,16,11,22.8,20.5,10.35,0.40,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
358,2024-10-03,44,6,0,28,10,30.5,26.2,11.55,0.38,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3


In [10]:
# Drop date column
lettuce_df.drop(columns="Date", inplace=True)

In [11]:
lettuce_df

Unnamed: 0,Number,Line,Sample,CODE,No leaves,Diameter,Perpendicular,Height,Thickness 1,Thickness 2,...,Max. Temp.,Min. Temp.,Mean. Temp.,Max. Hum.,Min. Hum.,Mean. Hum.,BBCH,Year,Month,Day
0,1,1,1,39,10,14.5,10.4,9.50,0.00,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
1,2,1,1,32,10,10.9,9.7,9.80,0.00,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
2,3,1,1,40,13,15.8,13.9,9.40,0.00,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
3,4,1,1,31,10,12.1,7.2,9.50,0.00,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
4,5,1,1,37,11,14.2,10.5,12.50,0.00,0.00,...,28.500000,15.400000,19.395833,79.50000,44.100000,64.704167,19,2024,9,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
355,41,6,0,17,11,26.8,24.5,10.15,0.40,0.32,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
356,42,6,0,24,7,30.1,22.6,11.85,0.32,0.26,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
357,43,6,0,16,11,22.8,20.5,10.35,0.40,0.35,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3
358,44,6,0,28,10,30.5,26.2,11.55,0.38,0.32,...,21.758629,18.687616,19.879193,87.61628,81.984047,86.349222,19,2024,10,3


In [12]:
# Pop and reinsert the BBCH column so it ends up as the last column
bbch = lettuce_df.pop("BBCH")
lettuce_df["BBCH"] = bbch
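The reason for moving the target to the end is that features and target can then be separated by position. A minimal sketch on a toy frame (column values are illustrative, and the names `X` and `y` are my own choice):

```python
import pandas as pd

# Toy frame with the target in the middle (illustrative values only)
toy = pd.DataFrame({"Diameter": [14.5, 10.9], "BBCH": [19, 19], "Day": [7, 7]})

# pop removes the column and returns it; assigning it back appends it at the end
bbch = toy.pop("BBCH")
toy["BBCH"] = bbch
print(list(toy.columns))  # ['Diameter', 'Day', 'BBCH']

# With the target last, features and target can be sliced by position
X = toy.iloc[:, :-1]
y = toy.iloc[:, -1]
```

Selecting by name (`toy.drop(columns="BBCH")` and `toy["BBCH"]`) would work regardless of column order; the positional form is just a convenience once the target is last.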

#### <b> Train, Test split </b>