### <b> Constructing Dataframe for Biometric Data - 2nd Iteration </b>

#### <b> [PLANNING] </b>

The goal of this <b> 2nd iteration </b> is to construct a dataset with 45 entries, corresponding to the 45 lettuce plants in the study. <p></p>

For each entry/plant, the <b> input </b> and <b> output/target </b> variables will be: <p></p>

<b> [INPUT VARS] </b>

* <b> Accumulated Temperature (Acc. T) </b>: The accumulated temperature the plant was exposed to during the period of the experiment, expressed in degrees Celsius (°C)
* <b> Average Humidity (Avg. Humidity) </b>: The average humidity the plant was exposed to inside the greenhouse as captured by the sensors during the period of the experiment, expressed in percentage (%).
* <b> Accumulated Radiation (Acc. R) </b>: The total amount of PAR (photosynthetically active radiation) the plant was exposed to during the period of the experiment, expressed in micromoles per second ($\mu$mol/s)
* <b> Accumulated Irrigation (Acc. I) </b>: The total amount of aqueous solution the plant was irrigated with during the period of the experiment, expressed in milliliters (mL)
* <b> Accumulated Nitrates (Acc. N) </b>: The total amount of nitrates the plant was fed during the period of the experiment (calculated as a fraction of the aqueous solution), expressed in milliliters (mL)


<b> [OUTPUT/TARGET VARS *(1)] </b>


* <b> Diameter </b>: The diameter of a chosen leaf of the plant at the time of harvest, expressed in centimeters (cm)
* <b> Perpendicular </b>: The length of the line perpendicular to the diameter of the chosen leaf at the time of harvest, expressed in centimeters (cm)
* <b> Weight </b>: The weight of the plant at the time of harvest, expressed in kilograms (kg)
* <b> Height </b>: The height of the plant at the time of harvest, expressed in centimeters (cm)
* <b> Thickness </b>: Leaf thickness at the time of harvest, expressed in centimeters (cm)
* <b> Number of leaves (N leaves) </b>: The number of leaves the plant presents at the time of harvest

*(1): The target variables will be predicted one at a time
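
One way to read the note above: the five inputs form a shared feature matrix, and each target column is paired with it in turn. The following is a minimal sketch on a made-up two-row frame. The column names are shorthand taken from the lists above, `iter_targets` is a hypothetical helper, and the values are synthetic placeholders rather than measurements from the study.

```python
import pandas as pd

INPUT_VARS = ["Acc. T", "Avg. Humidity", "Acc. R", "Acc. I", "Acc. N"]
TARGET_VARS = ["Diameter", "Perpendicular", "Weight", "Height", "Thickness", "N leaves"]


def iter_targets(df):
    """Yield (target_name, X, y): the same inputs paired with one target at a time."""
    X = df[INPUT_VARS]
    for target in TARGET_VARS:
        yield target, X, df[target]


# Synthetic placeholder rows (NOT study data), only to show the shapes involved
all_columns = INPUT_VARS + TARGET_VARS
toy = pd.DataFrame(
    [[1.0] * len(all_columns), [2.0] * len(all_columns)],
    columns=all_columns,
)

pairs = list(iter_targets(toy))
for name, X, y in pairs:
    print(name, X.shape, y.shape)
```

Keeping the inputs fixed and swapping only the target makes it straightforward to compare how well each biometric variable can be predicted from the same environmental data.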




* T_base = 6ºC (GDD)
* PAR radiation threshold for the plant?
* Weight gained per unit of water consumed (output var -> future work; interesting for understanding the plant's efficiency)
* sol A: 6 mmol nitrogen/L
* sol B: 13 mmol nitrogen/L
* sol C: 17 mmol nitrogen/L
* average of diameter + perpendicular as an output var
* leaf thickness
* nutritional analyses (final nitrogen values per group)
* BBCH scale
* leaf-only fresh weight (Excel values in grams)
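
The `T_base = 6ºC (GDD)` note suggests that accumulated temperature may be computed as growing degree days. The sketch below assumes the common daily-mean formulation, $GDD = \sum_d \max(0, T_{mean,d} - T_{base})$. The notebook does not state which variant is used, `growing_degree_days` is a hypothetical helper, and the temperatures are made-up values.

```python
import pandas as pd

T_BASE = 6.0  # °C, taken from the note above


def growing_degree_days(daily_mean_temp, t_base=T_BASE):
    """Sum daily mean temperature above t_base; days below the base contribute 0."""
    return (daily_mean_temp - t_base).clip(lower=0).sum()


# Made-up daily mean temperatures (°C), not data from the experiment
daily_mean = pd.Series([20.0, 18.5, 5.0, 22.0])

acc_t = growing_degree_days(daily_mean)
print(acc_t)  # 14.0 + 12.5 + 0.0 + 16.0 = 42.5
```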


In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

#### <b> Load previous iteration of final dataset (without encoding) </b>

In [3]:
biometry = pd.read_csv("../../data/final_biometric_data.csv")

In [4]:
biometry.rename(columns={"Unnamed: 0":"Date"}, inplace=True)

#### <b> Biometry at Harvest </b>

* Create auxiliary dataframe for biometry at the time of harvest <b> (harvest date: 2024-10-03) </b>

In [5]:
harvest_biometry = biometry.loc[biometry["Date"] == "2024-10-03"]

In [6]:
harvest_biometry.columns

Index(['Date', 'Number', 'Line', 'Sample', 'CODE', 'No leaves', 'Diameter',
       'Perpendicular', 'Height', 'Max. Temp.', 'Min. Temp.', 'Mean. Temp.',
       'Max. Hum.', 'Min. Hum.', 'Mean. Hum.', 'Combined Temperature Average',
       'Combined Temperature Std Dev', 'Combined Humidity Average',
       'Combined Humidity Std Dev', 'BBCH', 'Average Leaf Thickness'],
      dtype='object')

In [7]:
# Remove unnecessary columns
# Reassign instead of using inplace=True: harvest_biometry is a filtered
# slice of biometry, and modifying it in place raises SettingWithCopyWarning
harvest_biometry = harvest_biometry.drop(columns=[
    'Max. Temp.', 'Min. Temp.', 'Mean. Temp.',
    'Max. Hum.', 'Min. Hum.', 'Mean. Hum.',
    'Combined Temperature Average', 'Combined Temperature Std Dev',
    'Combined Humidity Average', 'Combined Humidity Std Dev',
])
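
Filtering a DataFrame with a boolean mask and then modifying the result in place is a common trigger for pandas' `SettingWithCopyWarning`. The sketch below shows one pattern that avoids it on a toy frame standing in for `biometry`. The values are placeholders, and `errors="ignore"` is used only so the example tolerates a column that is absent from the toy data.

```python
import pandas as pd

# Toy frame standing in for `biometry` (placeholder values)
df = pd.DataFrame({
    "Date": ["2024-09-26", "2024-10-03", "2024-10-03"],
    "Sample": ["RNGRA1", "RNGRA1", "RNGRA2"],
    "Max. Temp.": [30.1, 28.4, 28.4],
    "Height": [10.0, 12.5, 11.0],
})

# .copy() makes the subset an independent DataFrame
harvest = df.loc[df["Date"] == "2024-10-03"].copy()

# A non-in-place drop returns a new frame; errors="ignore" skips missing columns
harvest = harvest.drop(columns=["Max. Temp.", "Min. Temp."], errors="ignore")

print(harvest.columns.tolist())
```

The original frame keeps all of its columns, so later cells that still need the temperature data are unaffected.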


#### <b> Accumulated Irrigation </b>

In [10]:
irrigation = pd.read_excel("../../raw_data/irrigation.xlsx")

In [11]:
irrigation

| Unnamed: 0 | Date | Sample | Quantity (mL) |
|---|---|---|---|
| 0 | 2024-08-24 | rngra1 | 50.0 |
| 1 | 2024-08-24 | rngra2 | 50.0 |
| 2 | 2024-08-24 | rngra3 | 50.0 |
| 3 | 2024-08-24 | rngra4 | 50.0 |
| 4 | 2024-08-24 | rngra5 | 50.0 |
| ... | ... | ... | ... |
| 985 | 2024-10-01 | rwgrc1 | 7.5 |
| 986 | 2024-10-01 | rwgrc2 | 14.5 |
| 987 | 2024-10-01 | rwgrc3 | 5.0 |
| 988 | 2024-10-01 | rwgrc4 | 12.0 |


In [12]:
# Convert Sample column values to uppercase to match the biometry dataset
irrigation["Sample"] = irrigation["Sample"].str.upper()

In [13]:
# Group by Sample and sum the daily irrigation to get the total per plant
accumulated_irrigation = irrigation.groupby('Sample')['Quantity (mL)'].sum()
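
The per-sample totals still have to be attached to the harvest table. A minimal sketch of one way to do that on toy data, assuming `Sample` is the shared key (it appears in both datasets). The column name `Acc. I (mL)` is a hypothetical choice, and the quantities are placeholders.

```python
import pandas as pd

# Toy irrigation log (placeholder values) with lowercase sample codes
irrigation = pd.DataFrame({
    "Date": ["2024-08-24", "2024-08-25", "2024-08-24"],
    "Sample": ["rngra1", "rngra1", "rngra2"],
    "Quantity (mL)": [50.0, 30.0, 50.0],
})
irrigation["Sample"] = irrigation["Sample"].str.upper()

# Sum per plant, then turn the Series into a two-column frame for merging
accumulated = (
    irrigation.groupby("Sample")["Quantity (mL)"].sum()
    .rename("Acc. I (mL)")
    .reset_index()
)

# Toy harvest table; RNGRA3 deliberately has no irrigation rows
harvest = pd.DataFrame({"Sample": ["RNGRA1", "RNGRA2", "RNGRA3"]})

# A left merge keeps every plant; missing totals surface as NaN
merged = harvest.merge(accumulated, on="Sample", how="left")
print(merged)
```

Using a left merge rather than an inner one makes a sample code mismatch visible as a missing value instead of silently dropping the plant.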

#### <b> Accumulated Nitrates </b>
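
One possible way to derive this variable is to multiply each plant's accumulated irrigation volume by the nitrogen concentration of the solution it received (6, 13 and 17 mmol/L for solutions A, B and C, per the planning notes). This is only a sketch under assumptions: it yields millimoles of nitrogen rather than the mL stated in the planning section, `accumulated_nitrogen_mmol` is a hypothetical helper, and the plant-to-solution assignment below is invented for illustration.

```python
import pandas as pd

# Nitrogen concentration per solution (mmol N / L), from the planning notes
N_CONC_MMOL_PER_L = {"A": 6.0, "B": 13.0, "C": 17.0}


def accumulated_nitrogen_mmol(acc_irrigation_ml, solution):
    """Accumulated volume (mL converted to L) times each plant's solution concentration."""
    conc = solution.map(N_CONC_MMOL_PER_L)
    return acc_irrigation_ml / 1000.0 * conc


# Placeholder totals and a HYPOTHETICAL plant-to-solution assignment
acc_irrigation = pd.Series({"P1": 1000.0, "P2": 500.0, "P3": 2000.0})
solution = pd.Series({"P1": "A", "P2": "B", "P3": "C"})

acc_n = accumulated_nitrogen_mmol(acc_irrigation, solution)
print(acc_n)
```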