# NYUS.2.1.malus training

The model was trained with AutoGluon 1.1.0 on Python 3.10.14. The same training data can also be used to build a predictor with the latest AutoGluon release on any Python version that release supports, although the resulting model will not necessarily be identical to the one produced here.
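Before reproducing the model, it can help to confirm whether the current environment matches the training versions. The following is a minimal sketch using the standard-library `importlib.metadata`. The helper names are illustrative, and it assumes AutoGluon was installed as the `autogluon` meta-package, so an install of only `autogluon.tabular` would be reported as missing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions used to train NYUS2_1_malus (stated in the text above)
TRAINED_AUTOGLUON = "1.1.0"
TRAINED_PYTHON = "3.10.14"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def current_python() -> str:
    """Running Python version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def matches_training_env() -> bool:
    """True only if both AutoGluon and Python match the training versions exactly."""
    return (installed_version("autogluon") == TRAINED_AUTOGLUON
            and current_python() == TRAINED_PYTHON)


print("Python:", current_python(), "(trained with", TRAINED_PYTHON + ")")
print("AutoGluon:", installed_version("autogluon"), "(trained with", TRAINED_AUTOGLUON + ")")
print("Exact training environment:", matches_training_env())
```

A mismatch does not prevent training. It only means the resulting predictor may differ from the original `NYUS2_1_malus` model.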

## Model training
The goal of this step is to train a model named 'NYUS2_1_malus' and save it in a folder of the same name.

In [1]:
# Uncomment the next line to install the AutoGluon version used for training
#!pip install autogluon==1.1.0
import autogluon
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
from autogluon.tabular import TabularDataset, TabularPredictor

In [3]:
df=pd.read_csv('All_LT50_data_grape_apple.csv', sep=",", header=0)

In [4]:
# Drop columns that are not used as model features
df_training = df.drop(['Date','Location','photoperiod.Daylength','DP'],axis=1)

In [6]:
# Shuffle the training rows (fixed random_state for reproducibility)
df_training = df_training.sample(frac=1, random_state=25)
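`sample(frac=1, random_state=25)` returns every row exactly once, in a random but reproducible order. The small pandas sketch below checks both properties. It runs on hypothetical toy data, not the LT50 dataset. The final `reset_index` step is optional and not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for df_training (toy values, not the real LT50 data)
toy = pd.DataFrame({"LT50": [-21.3, -24.2, -18.3, -4.4, -45.0],
                    "feature": [1, 2, 3, 4, 5]})

# frac=1 draws all rows without replacement, i.e. a random permutation;
# a fixed random_state makes the permutation reproducible across runs
shuffled_a = toy.sample(frac=1, random_state=25)
shuffled_b = toy.sample(frac=1, random_state=25)

# Same permutation both times
assert shuffled_a.index.equals(shuffled_b.index)
# No rows lost or duplicated: sorting by the original index recovers the input
assert shuffled_a.sort_index().equals(toy)

# The original index labels travel with the rows; reset them if positional
# labels 0..n-1 are wanted downstream
shuffled = shuffled_a.reset_index(drop=True)
print(shuffled)
```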

In [7]:
df_training.shape

(11949, 171)

In [8]:
# Check the label column (LT50)
LT50_column = 'LT50'
print("Summary of LT50 variable: \n", df_training[LT50_column].describe())

Summary of LT50 variable: 
 count    11949.000000
mean       -21.337650
std          5.700328
min        -45.000000
25%        -24.200000
50%        -21.912941
75%        -18.331500
max         -4.366667
Name: LT50, dtype: float64


In [9]:
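# Training with AutoGluon
predictor_LT50 = TabularPredictor(label=LT50_column, path="NYUS2_1_malus").fit(
    df_training,
    presets='best_quality',  # enables bagging and stacking (auto_stack=True in the log)
    num_bag_folds=10,        # 10-fold bagging for each model
    num_stack_levels=4,      # up to 4 stacking layers above the base models
)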

Presets specified: ['best_quality']
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=4, num_bag_folds=10, num_bag_sets=1
Dynamic stacking is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
Detecting stacked overfitting by sub-fitting AutoGluon on the input data. That is, copies of AutoGluon will be sub-fit on subset(s) of the data. Then, the holdout validation data is used to detect stacked overfitting.
Sub-fit(s) time limit is: 3600 seconds.
Starting holdout-based sub-fit for dynamic stacking. Context path is: NYUS2_1_malus\ds_sub_fit\sub_fit_ho.
Running the sub-fit in a ray process to avoid memory leakage.
Spend 928 seconds for the sub-fit(s) during dynamic stacking.
Time left for full fit of AutoGluon: 2672 seconds.
[...]
	-1.371	 = Validation score   (-root_mean_squared_error)
	273.24s	 = Training   runtime
	0.41s	 = Validation runtime
Fitting model: XGBoost_BAG_L4 ... Training model for up to 124.9s of the 368.54s of remaining time.
	Fitting 10 child models (S1F1 - S1F10) | Fitting with SequentialLocalFoldFittingStrategy
	-1.3666	 = Validation score   (-root_mean_squared_error)
	25.49s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: NeuralNetTorch_BAG_L4 ... Training model for up to 99.14s of the 342.78s of remaining time.
	Fitting 10 child models (S1F1 - S1F10) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, stopping training early. (Stopping on epoch 6)
	Ran out of time, stopping training early. (Stopping on epoch 6)
	Ran out of time, stopping training early. (Stopping on epoch 7)
	Ran out of time, stopping training early. (Stopping on epoch 7)
	[...]
	Ran out of time, stopping training early. (Stopping on epoch 11)
	Ran out of time, stopping training early. (Stopping on epoch 12)
	Ran out of time, stopping training early. (Stopping on epoch 15)
	[...]
	Ran out of time, stopping training early. (Stopping on epoch 23)
	-1.3809	 = Validation score   (-root_mean_squared_error)
	145.97s	 = Training   runtime
	0.42s	 = Validation runtime
Fitting model: WeightedEnsemble_L6 ... Training model for up to 360.0s of the 5.47s of remaining time.
	Ensemble Weights: {'LightGBMXT_BAG_L1': 0.292, 'CatBoost_BAG_L1': 0.264, 'ExtraTreesMSE_BAG_L2': 0.111, 'LightGBM_BAG_L1': 0.083, 'RandomForestMSE_BAG_L2': 0.083, 'NeuralNetFastAI_BAG_L2': 0.069, 'NeuralNetTorch_BAG_L2': 0.056, 'CatBoost_BAG_L2': 0.042}
	-1.2959	 = Validation score   (-root_mean_squared_error)
	0.48s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 2691.04s ... Best model: "WeightedEnsemble_L6"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("NYUS.2")
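The validation scores in the log are reported as `-root_mean_squared_error`. AutoGluon negates error metrics so that a higher score is always better. A score of `-1.2959` therefore corresponds to an RMSE of roughly 1.30 in the units of `LT50`. Because bagging is used (`num_bag_folds = 10`), a bagged model's validation score is computed from out-of-fold predictions: every row is predicted by a child model that did not see it during training.

The plain-Python sketch below illustrates both ideas. A trivial mean predictor stands in for a real model, and the function names are hypothetical. It is not AutoGluon's implementation.

```python
import math
import random
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split row indices 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oof_score(y: Sequence[float], k: int = 10) -> float:
    """Out-of-fold score of a toy 'model' that predicts the mean of its training folds.

    Each row is predicted by a model fit without that row. The result is the
    negated RMSE, so higher is better (the convention used in the log).
    """
    oof = [0.0] * len(y)
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        mean = sum(train) / len(train)
        for i in fold:
            oof[i] = mean
    return -rmse(y, oof)


# Hypothetical LT50-like values, for illustration only
toy_y = [-21.3, -24.2, -18.3, -21.9, -4.4, -45.0, -20.0, -22.5, -19.1, -23.0]
print("out-of-fold score (-RMSE):", oof_score(toy_y, k=10))
```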


In [10]:
# Name of the best model (highest validation score) found during training
# (get_model_best() is deprecated in AutoGluon 1.x in favour of the model_best property)
predictor_LT50.model_best



'WeightedEnsemble_L3'
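The best model is a weighted ensemble. The weights printed for `WeightedEnsemble_L6` in the training log sum to 1.0, and the ensemble's prediction is the correspondingly weighted average of its members' predictions. AutoGluon obtains these weights with greedy ensemble selection in the style of Caruana et al.: members are added one at a time, with replacement, each time choosing the model that gives the lowest error. Each weight is the fraction of times a model was picked.

The sketch below is a simplified illustration on toy data. The model names `model_a`, `model_b` and `model_c` and their predictions are hypothetical. Only `LOGGED_WEIGHTS` is copied from the log.

```python
import math
from typing import Dict, List, Sequence

# Weights of WeightedEnsemble_L6 as printed in the training log above
LOGGED_WEIGHTS = {
    "LightGBMXT_BAG_L1": 0.292, "CatBoost_BAG_L1": 0.264,
    "ExtraTreesMSE_BAG_L2": 0.111, "LightGBM_BAG_L1": 0.083,
    "RandomForestMSE_BAG_L2": 0.083, "NeuralNetFastAI_BAG_L2": 0.069,
    "NeuralNetTorch_BAG_L2": 0.056, "CatBoost_BAG_L2": 0.042,
}


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of member predictions; members absent from `weights` get weight 0."""
    n = len(next(iter(preds.values())))
    return [sum(w * preds[m][i] for m, w in weights.items()) for i in range(n)]


def greedy_ensemble_selection(
    preds: Dict[str, List[float]], y: Sequence[float], n_iter: int = 24
) -> Dict[str, float]:
    """Caruana-style forward selection with replacement (simplified sketch).

    At each step, add the member that gives the lowest RMSE for the running
    average of all members selected so far. Weights are selection counts / n_iter.
    """
    counts = {m: 0 for m in preds}
    running_sum = [0.0] * len(y)  # sum of predictions of the members chosen so far
    for step in range(1, n_iter + 1):
        best_member, best_err = None, math.inf
        for m, p in preds.items():
            avg = [(s + pi) / step for s, pi in zip(running_sum, p)]
            err = rmse(y, avg)
            if err < best_err:
                best_member, best_err = m, err
        counts[best_member] += 1
        running_sum = [s + pi for s, pi in zip(running_sum, preds[best_member])]
    # Models that were never selected get weight 0 and are dropped
    return {m: c / n_iter for m, c in counts.items() if c > 0}


# Hypothetical toy example: model_a is biased high, model_b is biased low,
# model_c is far off. Averaging model_a and model_b cancels their biases.
y_toy = [-20.0, -22.0, -18.0, -25.0]
toy_preds = {
    "model_a": [-19.0, -21.0, -17.0, -24.0],
    "model_b": [-21.0, -23.0, -19.0, -26.0],
    "model_c": [-15.0, -17.0, -13.0, -20.0],
}
weights = greedy_ensemble_selection(toy_preds, y_toy)
print(weights)
print("ensemble RMSE:", rmse(y_toy, blend(toy_preds, weights)))
print("sum of logged weights:", round(sum(LOGGED_WEIGHTS.values()), 3))
```

In this toy case, the two oppositely biased models end up sharing the weight equally and `model_c` is never selected. Similarly, only eight of the trained models received non-zero weight in the log. AutoGluon's own implementation differs in details that this sketch does not attempt to reproduce.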