### <b> Constructing Dataframe for Biometric Data - 2nd Iteration </b>

#### <b> [PLANNING] </b>

The goal of this <b> 2nd iteration </b> is to construct a dataset with 45 entries, corresponding to the 45 lettuce plants in the study. <p></p>

For each entry/plant, the <b> input </b> and <b> output/target </b> variables will be: <p></p>

<b> [INPUT VARS] </b>

* <b> Growing Degree Days </b>: Measure of heat accumulation used to predict plant development, expressed in degree-days (°C·day).
* <b> Average Humidity (Avg. Humidity) </b>: The average humidity the plant was exposed to inside the greenhouse as captured by the sensors during the period of the experiment, expressed in percentage (%).
* <b> Accumulated Radiation (Acc. R) </b>: The total amount of PAR (photosynthetically active radiation) the plant was exposed to during the period of the experiment, expressed in micromoles per second ($\mu$mol/s).
* <b> Accumulated Irrigation (Acc. I) </b>: The total amount of the aqueous solution the plant was irrigated with during the period of the experiment, expressed in milliliters (mL).
* <b> Accumulated Nitrates (Acc. N) </b>: The total amount of nitrates the plant was fed during the period of the experiment (calculated from the volume of aqueous solution and its nitrate concentration), expressed in millimoles (mmol).
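
As a rough illustration of the Growing Degree Days input, the sketch below uses the common averaging formula $GDD = \sum_d \max\left(0, \frac{T_{max,d} + T_{min,d}}{2} - T_{base}\right)$ with the base temperature of 6 °C mentioned in this notebook's notes. The notebook does not state which GDD variant is used, so the formula choice, the function name `growing_degree_days`, and the sample temperatures are all assumptions.

```python
import pandas as pd

T_BASE = 6.0  # base temperature in °C, taken from the planning notes


def growing_degree_days(t_max: pd.Series, t_min: pd.Series, t_base: float = T_BASE) -> float:
    """Sum of daily (mean temperature - t_base), with negative days clipped to 0."""
    daily_mean = (t_max + t_min) / 2
    return float((daily_mean - t_base).clip(lower=0).sum())


# Hypothetical daily temperatures (°C), for illustration only
t_max = pd.Series([24.0, 26.0, 8.0])
t_min = pd.Series([14.0, 16.0, 2.0])
gdd = growing_degree_days(t_max, t_min)
print(gdd)
```

The third day has a mean below the base temperature, so it contributes nothing to the total instead of subtracting from it.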


<b> [OUTPUT/TARGET VARS *(1)] </b>


* <b> Diameter </b>: The diameter of a chosen leaf of the plant at the time of harvest, expressed in centimeters (cm)
* <b> Perpendicular </b>: The length of the line perpendicular to the chosen diameter of the chosen leaf at the time of harvest, expressed in centimeters (cm)
* <b> Weight </b>: The weight of the plant at the time of harvest, expressed in kilograms (kg)
* <b> Height </b>: The height of the plant at the time of harvest, expressed in centimeters (cm)
* <b> Thickness </b>: Leaf thickness at the time of harvest, expressed in centimeters (cm)
* <b> Number of leaves (N leaves) </b>: The number of leaves the plant presents at the time of harvest

*(1): The target variables will be predicted one at a time.




* T_base = 6 ºC (GDD)
* PAR radiation threshold for the plant?
* Weight generated through water consumption (output var -> future work / interesting for understanding plant efficiency)
* sol. A: 6 mmol nitrogen/L
* sol. B: 13 mmol nitrogen/L
* sol. C: 17 mmol nitrogen/L
* avg of diameter + perpendicular as output var
* leaf thickness
* nutritional analyses (final nitrogen values per group)
* BBCH scale
* leaf-only fresh weight (Excel values in grams)
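
The note about averaging diameter and perpendicular into a single output variable could be sketched as below. The measurements are made up and the column name `Mean Leaf Size` is hypothetical; the notebook does not define how this combined variable would be named or used.

```python
import pandas as pd

# Hypothetical leaf measurements in cm, for illustration only
leaves = pd.DataFrame({
    "Diameter": [20.0, 18.5],
    "Perpendicular": [16.0, 17.5],
})

# Row-wise mean of the two leaf dimensions
leaves["Mean Leaf Size"] = leaves[["Diameter", "Perpendicular"]].mean(axis=1)
print(leaves)
```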


In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

#### <b> Load previous iteration of final dataset (without encoding) </b>

In [3]:
biometry = pd.read_csv("../../data/final_biometric_data.csv")

In [4]:
biometry.rename(columns={"Unnamed: 0":"Date"}, inplace=True)

#### <b> Biometry at Harvest </b>

* Create auxiliary dataframe for biometry at the time of harvest <b> (harvest date: 2024-10-03) </b>

In [5]:
harvest_biometry = biometry.loc[biometry["Date"] == "2024-10-03"]
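
A minimal sketch of the same harvest-date filter on a toy frame, with two additions that are assumptions rather than part of the original notebook: `.copy()`, so that later in-place edits act on an independent DataFrame, and a uniqueness check on the sample codes. With the real data, the expected row count would be the 45 plants stated in the planning section.

```python
import pandas as pd

HARVEST_DATE = "2024-10-03"

# Toy stand-in for the biometry table (values are made up)
toy_biometry = pd.DataFrame({
    "Date": ["2024-09-26", "2024-10-03", "2024-10-03"],
    "Sample": ["RNGRA1", "RNGRA1", "RNGRA2"],
    "Height": [10.0, 12.5, 11.0],
})

# Boolean mask + .copy() gives an independent frame holding only the harvest rows
toy_harvest = toy_biometry.loc[toy_biometry["Date"] == HARVEST_DATE].copy()

# One row per plant is expected at harvest
assert toy_harvest["Sample"].is_unique
print(len(toy_harvest))
```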

In [6]:
harvest_biometry.columns

Index(['Date', 'Number', 'Line', 'Sample', 'CODE', 'No leaves', 'Diameter',
       'Perpendicular', 'Height', 'Max. Temp.', 'Min. Temp.', 'Mean. Temp.',
       'Max. Hum.', 'Min. Hum.', 'Mean. Hum.', 'Combined Temperature Average',
       'Combined Temperature Std Dev', 'Combined Humidity Average',
       'Combined Humidity Std Dev', 'BBCH', 'Average Leaf Thickness'],
      dtype='object')

In [7]:
# Remove unnecessary columns; reassigning (rather than inplace=True on a slice)
# avoids pandas' SettingWithCopyWarning
harvest_biometry = harvest_biometry.drop(columns=[
    'Max. Temp.', 'Min. Temp.', 'Mean. Temp.',
    'Max. Hum.', 'Min. Hum.', 'Mean. Hum.',
    'Combined Temperature Average', 'Combined Temperature Std Dev',
    'Combined Humidity Average', 'Combined Humidity Std Dev',
])


#### <b> Accumulated Irrigation </b>

In [10]:
irrigation = pd.read_excel("../../raw_data/irrigation.xlsx")

In [11]:
irrigation

Unnamed: 0,Date,Sample,Quantity (mL)
0,2024-08-24,rngra1,50.0
1,2024-08-24,rngra2,50.0
2,2024-08-24,rngra3,50.0
3,2024-08-24,rngra4,50.0
4,2024-08-24,rngra5,50.0
...,...,...,...
985,2024-10-01,rwgrc1,7.5
986,2024-10-01,rwgrc2,14.5
987,2024-10-01,rwgrc3,5.0
988,2024-10-01,rwgrc4,12.0


In [12]:
# Convert Sample column values to Upper case to match biometry dataset
irrigation["Sample"] = irrigation["Sample"].str.upper()

In [13]:
# Group By Sample and add up daily irrigation
accumulated_irrigation = irrigation.groupby('Sample')['Quantity (mL)'].sum()

<b> Accumulated Irrigation column to add to dataset </b>

In [14]:
accumulated_irrigation

Sample
RNGRA1    1172.25
RNGRA2    1127.25
RNGRA3    1141.00
RNGRA4     966.25
RNGRA5     989.50
RNGRB1     970.25
RNGRB2    1331.00
RNGRB3    1114.75
RNGRB4    1005.50
RNGRB5    1240.00
RNGRC1    1017.75
RNGRC2    1021.25
RNGRC3    1061.50
RNGRC4    1110.00
RNGRC5    1173.75
RNROA1     657.50
RNROA2     856.25
RNROA3     870.50
RNROA4     884.25
RNROA5     800.25
RNROB1     719.25
RNROB2     901.25
RNROB3     928.00
RNROB4    1066.25
RNROB5     787.25
RNROC1     967.25
RNROC2     875.00
RNROC3     820.25
RNROC4     976.75
RNROC5     846.25
RWGRA1    1083.00
RWGRA2    1128.00
RWGRA3     990.50
RWGRA4    1063.00
RWGRA5    1083.50
RWGRB1    1059.50
RWGRB2     800.50
RWGRB3    1062.50
RWGRB4    1105.50
RWGRB5    1131.50
RWGRC1    1084.00
RWGRC2     877.50
RWGRC3    1045.50
RWGRC4    1058.00
RWGRC5     968.00
Name: Quantity (mL), dtype: float64
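
One possible way to attach this Series to the per-plant table is a left join on the sample identifier. The sketch below assumes the biometry `Sample` column holds the same upper-case codes as the irrigation index. The toy `plants` frame, the code `RNGRA9`, and the column name `Acc. I (mL)` are illustrative; the two irrigation totals are copied from the output above.

```python
import pandas as pd

# Two totals copied from the accumulated irrigation output
acc_irrigation = pd.Series(
    {"RNGRA1": 1172.25, "RNGRA2": 1127.25}, name="Acc. I (mL)"
)

# Toy per-plant frame; RNGRA9 is a made-up code with no irrigation record
plants = pd.DataFrame({"Sample": ["RNGRA1", "RNGRA2", "RNGRA9"]})

# Left join keeps every plant; samples without a match get NaN
plants = plants.join(acc_irrigation, on="Sample")

# Surface any plant that has no irrigation total
missing = plants.loc[plants["Acc. I (mL)"].isna(), "Sample"].tolist()
print(missing)
```

Checking for unmatched samples right after the join is a cheap way to catch casing or spelling mismatches between the two sources.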

#### <b> Accumulated Nitrates </b>

Each plant sample belongs to group A, B or C, depending on which letter appears in its <i> Sample </i> name.
E.g.:

* RNGRA5 belongs to group A
* RNGRB1 belongs to group B
* RNGRC1  belongs to group C

Depending on the group the plant belongs to, it was irrigated with an aqueous solution with a different concentration of nitrates.

* Group A solution's concentration: 6 mmol nitrates/L
* Group B solution's concentration: 13 mmol nitrates/L
* Group C solution's concentration: 17 mmol nitrates/L

The total amount of nitrates fed to each plant up to harvest is calculated from the plant's daily irrigation volume and the concentration of the solution for its group (A, B or C):

* $\text{Nitrates (mmol)} = C\ (\text{mmol/L}) \times V\ (\text{L})$ (daily quantity for each sample)

This daily quantity is then summed over all days, giving the total amount of nitrates for each plant.
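
As a worked example of the formula: a 50 mL irrigation of a group A plant contributes $6 \times 0.050 = 0.3$ mmol. The toy frame below (made-up second-day volume, hypothetical column names) shows the same per-day product followed by the per-sample sum.

```python
import pandas as pd

# Toy daily irrigation log; volumes in mL
log = pd.DataFrame({
    "Sample": ["RNGRA1", "RNGRA1", "RNGRC1"],
    "Quantity (mL)": [50.0, 25.0, 50.0],
    "Concentration (mmol/L)": [6.0, 6.0, 17.0],
})

# Nitrates (mmol) = C (mmol/L) * V (L), computed per irrigation event
log["Nitrates (mmol)"] = log["Concentration (mmol/L)"] * log["Quantity (mL)"] / 1000

# Total per plant over all days
totals = log.groupby("Sample")["Nitrates (mmol)"].sum()
print(totals)
```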

In [15]:
irrigation

Unnamed: 0,Date,Sample,Quantity (mL)
0,2024-08-24,RNGRA1,50.0
1,2024-08-24,RNGRA2,50.0
2,2024-08-24,RNGRA3,50.0
3,2024-08-24,RNGRA4,50.0
4,2024-08-24,RNGRA5,50.0
...,...,...,...
985,2024-10-01,RWGRC1,7.5
986,2024-10-01,RWGRC2,14.5
987,2024-10-01,RWGRC3,5.0
988,2024-10-01,RWGRC4,12.0


In [16]:
irrigation["Quantity (L)"] = irrigation["Quantity (mL)"] / 1000

In [17]:
irrigation

Unnamed: 0,Date,Sample,Quantity (mL),Quantity (L)
0,2024-08-24,RNGRA1,50.0,0.0500
1,2024-08-24,RNGRA2,50.0,0.0500
2,2024-08-24,RNGRA3,50.0,0.0500
3,2024-08-24,RNGRA4,50.0,0.0500
4,2024-08-24,RNGRA5,50.0,0.0500
...,...,...,...,...
985,2024-10-01,RWGRC1,7.5,0.0075
986,2024-10-01,RWGRC2,14.5,0.0145
987,2024-10-01,RWGRC3,5.0,0.0050
988,2024-10-01,RWGRC4,12.0,0.0120


In [32]:
# Create new column Concentration and add the values defined above
irrigation.loc[irrigation["Sample"].str.contains('A'), 'Concentration'] = 6
irrigation.loc[irrigation["Sample"].str.contains('B'), 'Concentration'] = 13
irrigation.loc[irrigation["Sample"].str.contains('C'), 'Concentration'] = 17
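
`str.contains` works here because none of the sample prefixes (`RNGR`, `RNRO`, `RWGR`) contains an A, B or C. A slightly more defensive alternative, sketched below as an assumption rather than the notebook's method, reads the group letter from its position just before the trailing plant number and maps it to a concentration. The name `CONCENTRATION_MMOL_PER_L` is hypothetical.

```python
import pandas as pd

# Nitrate concentration (mmol/L) per treatment group, from the list above
CONCENTRATION_MMOL_PER_L = {"A": 6.0, "B": 13.0, "C": 17.0}

samples = pd.Series(["RNGRA5", "RNROB1", "RWGRC4"])

# Capture the single letter that precedes the trailing digits
group = samples.str.extract(r"([A-Z])\d+$", expand=False)
concentration = group.map(CONCENTRATION_MMOL_PER_L)

# Any sample with an unknown group letter would show up as NaN and fail here
assert concentration.notna().all()
print(concentration.tolist())
```

With this approach, a sample code carrying an unexpected letter fails loudly instead of silently keeping a missing concentration.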

In [38]:
irrigation

Unnamed: 0,Date,Sample,Quantity (mL),Quantity (L),Concentration
0,2024-08-24,RNGRA1,50.0,0.0500,6.0
1,2024-08-24,RNGRA2,50.0,0.0500,6.0
2,2024-08-24,RNGRA3,50.0,0.0500,6.0
3,2024-08-24,RNGRA4,50.0,0.0500,6.0
4,2024-08-24,RNGRA5,50.0,0.0500,6.0
...,...,...,...,...,...
985,2024-10-01,RWGRC1,7.5,0.0075,17.0
986,2024-10-01,RWGRC2,14.5,0.0145,17.0
987,2024-10-01,RWGRC3,5.0,0.0050,17.0
988,2024-10-01,RWGRC4,12.0,0.0120,17.0


In [39]:
irrigation["Quantity (milimoles)"] = irrigation["Quantity (L)"] * irrigation["Concentration"]

In [40]:
irrigation

Unnamed: 0,Date,Sample,Quantity (mL),Quantity (L),Concentration,Quantity (milimoles)
0,2024-08-24,RNGRA1,50.0,0.0500,6.0,0.3000
1,2024-08-24,RNGRA2,50.0,0.0500,6.0,0.3000
2,2024-08-24,RNGRA3,50.0,0.0500,6.0,0.3000
3,2024-08-24,RNGRA4,50.0,0.0500,6.0,0.3000
4,2024-08-24,RNGRA5,50.0,0.0500,6.0,0.3000
...,...,...,...,...,...,...
985,2024-10-01,RWGRC1,7.5,0.0075,17.0,0.1275
986,2024-10-01,RWGRC2,14.5,0.0145,17.0,0.2465
987,2024-10-01,RWGRC3,5.0,0.0050,17.0,0.0850
988,2024-10-01,RWGRC4,12.0,0.0120,17.0,0.2040


In [41]:
nitrates = irrigation.groupby("Sample")["Quantity (milimoles)"].sum()

<b> Nitrates quantity column to add to dataset </b>

In [42]:
nitrates

Sample
RNGRA1     7.03350
RNGRA2     6.76350
RNGRA3     6.84600
RNGRA4     5.79750
RNGRA5     5.93700
RNGRB1    12.61325
RNGRB2    17.30300
RNGRB3    14.49175
RNGRB4    13.07150
RNGRB5    16.12000
RNGRC1    17.30175
RNGRC2    17.36125
RNGRC3    18.04550
RNGRC4    18.87000
RNGRC5    19.95375
RNROA1     3.94500
RNROA2     5.13750
RNROA3     5.22300
RNROA4     5.30550
RNROA5     4.80150
RNROB1     9.35025
RNROB2    11.71625
RNROB3    12.06400
RNROB4    13.86125
RNROB5    10.23425
RNROC1    16.44325
RNROC2    14.87500
RNROC3    13.94425
RNROC4    16.60475
RNROC5    14.38625
RWGRA1     6.49800
RWGRA2     6.76800
RWGRA3     5.94300
RWGRA4     6.37800
RWGRA5     6.50100
RWGRB1    13.77350
RWGRB2    10.40650
RWGRB3    13.81250
RWGRB4    14.37150
RWGRB5    14.70950
RWGRC1    18.42800
RWGRC2    14.91750
RWGRC3    17.77350
RWGRC4    17.98600
RWGRC5    16.45600
Name: Quantity (milimoles), dtype: float64
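
Because every plant receives a single, fixed concentration, its nitrate total should equal that concentration times its accumulated irrigation volume. The sketch below checks this for three samples, using values copied from the two outputs in this notebook; the variable names are illustrative.

```python
import math

# sample: (accumulated irrigation in mL, concentration in mmol/L, reported nitrates in mmol)
checks = {
    "RNGRA1": (1172.25, 6.0, 7.03350),
    "RNGRB2": (1331.00, 13.0, 17.30300),
    "RWGRC5": (968.00, 17.0, 16.45600),
}

results = {}
for sample, (volume_ml, conc, reported) in checks.items():
    expected = conc * volume_ml / 1000  # mmol = (mmol/L) * L
    results[sample] = math.isclose(expected, reported, abs_tol=1e-6)

print(results)
```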