# Run the Training step
This notebook provides step-by-step instructions on how to install the training module for tile-based classification and execute a training run to evaluate its performance.

> Note: Before proceeding, make sure to select the correct kernel. In the top-right corner of the notebook, choose the Jupyter kernel named `Bash`.

## Set up the environment

In [1]:
export WORKSPACE=/workspace/machine-learning-process
export RUNTIME=${WORKSPACE}/runs
mkdir -p ${RUNTIME}
cd ${RUNTIME}
printenv | grep RUNTIME
pwd

XDG_RUNTIME_DIR=/workspace/.local
RUNTIME=/workspace/machine-learning-process/runs
/workspace/machine-learning-process/runs


## Create a hatch environment

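The hatch environment provides a dedicated Python environment in which the dependencies of the `make-ml-model` step are installed. The cell below removes any existing environments with `hatch env prune` and then creates the `default` environment.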

In [2]:
cd ${WORKSPACE}/training/make-ml-model
hatch env prune
hatch env create default

Waiting on shared resource
Creating environment: default
Installing project in development mode
Checking dependencies
Syncing dependencies


## Run the make-ml-model application 

First, display the command-line help:

In [3]:
hatch run default:tile-based-training --help

2025-05-12 14:46:46.361114: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-12 14:46:46.369642: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-12 14:46:46.423756: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-12 14:46:46.469253: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747061206.519975    1783 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747061206.53
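
Most of the lines above are TensorFlow start-up messages rather than output from the tool itself. As an optional sketch, the variable below reduces that noise before the tool is run. `TF_CPP_MIN_LOG_LEVEL` is a standard TensorFlow logging variable that this notebook does not otherwise set, so the chosen level is an assumption, and some early CUDA registration messages may still be printed.

```shell
# Optional (assumption: not part of the original notebook).
# 0 = show everything, 1 = hide INFO, 2 = hide INFO and WARNING,
# 3 = hide INFO, WARNING and ERROR.
export TF_CPP_MIN_LOG_LEVEL=2

# The log itself names TF_ENABLE_ONEDNN_OPTS=0 as the switch for oneDNN
# custom operations; it is left untouched here.
printenv | grep '^TF_CPP_MIN_LOG_LEVEL='
```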

In the cell below, the user can check the `MLFLOW_TRACKING_URI`, which was defined as an environment variable during deployment of the code-server.

In [4]:
echo ${MLFLOW_TRACKING_URI} 

http://my-mlflow:5000
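
Printing the variable shows where the tool will send its tracking data, but not whether a server is listening there. The sketch below checks both. `check_mlflow` is a hypothetical helper, and it assumes that `curl` is installed and that the tracking server exposes a `/health` endpoint, as a server started with `mlflow server` normally does.

```shell
# Sketch: verify that MLFLOW_TRACKING_URI is set and that the server answers.
# check_mlflow is a hypothetical helper, not part of the training module.
check_mlflow() {
    if [ -z "${MLFLOW_TRACKING_URI:-}" ]; then
        echo "MLFLOW_TRACKING_URI is not set" >&2
        return 1
    fi
    # Assumption: the tracking server answers on /health.
    if curl --silent --fail --max-time 5 "${MLFLOW_TRACKING_URI}/health" > /dev/null; then
        echo "MLflow is reachable at ${MLFLOW_TRACKING_URI}"
    else
        echo "MLflow did not respond at ${MLFLOW_TRACKING_URI}" >&2
        return 1
    fi
}

check_mlflow || echo "Fix the MLflow connection before starting a training run."
```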


Now, run the `tile-based-training` command-line tool with the following parameters:

- stac_reference: https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json
- BATCH_SIZE: 2
- CLASSES: 10
- DECAY: 0.1
- EPOCHS: 5
- EPSILON: 0.000001
- LEARNING_RATE: 0.0001
- LOSS: categorical_crossentropy
- MEMENTUM: 0.95
- OPTIMIZER: Adam
- REGULARIZER: None
- SAMPLES_PER_CLASS: 10

Make sure your MLflow server is running.

In [5]:
hatch run default:tile-based-training \
    --stac_reference https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json \
    --BATCH_SIZE 2 \
    --CLASSES 10 \
    --DECAY 0.1 \
    --EPOCHS 5 \
    --EPSILON 0.000001 \
    --LEARNING_RATE 0.0001 \
    --LOSS categorical_crossentropy \
    --MEMENTUM 0.95 \
    --OPTIMIZER Adam \
    --REGULARIZER None \
    --SAMPLES_PER_CLASS 10


2025-05-12 14:46:54.435264: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-12 14:46:54.436021: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-12 14:46:54.439686: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-12 14:46:54.450275: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747061214.468311    1845 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747061214.47
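
Because the same values appear both in the parameter list and on the command line, one way to keep them in sync is to hold the values most likely to change in shell variables and assemble the command from them. The sketch below only prints the command. The variable names and the `print_training_cmd` helper are hypothetical and not part of the `tile-based-training` tool; the values are the ones used in the training cell above.

```shell
# Hypothetical variables holding the values used in the training cell.
STAC_REFERENCE="https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json"
BATCH_SIZE=2
EPOCHS=5
LEARNING_RATE=0.0001
SAMPLES_PER_CLASS=10

# Print the full command on one line so the values can be reviewed before a run.
print_training_cmd() {
    echo "hatch run default:tile-based-training" \
        "--stac_reference ${STAC_REFERENCE}" \
        "--BATCH_SIZE ${BATCH_SIZE}" \
        "--CLASSES 10" \
        "--DECAY 0.1" \
        "--EPOCHS ${EPOCHS}" \
        "--EPSILON 0.000001" \
        "--LEARNING_RATE ${LEARNING_RATE}" \
        "--LOSS categorical_crossentropy" \
        "--MEMENTUM 0.95" \
        "--OPTIMIZER Adam" \
        "--REGULARIZER None" \
        "--SAMPLES_PER_CLASS ${SAMPLES_PER_CLASS}"
}

print_training_cmd
```

Once the printed command looks right, it can be copied into a cell and executed as usual.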

List the outputs:

In [6]:
tree ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output 

/workspace/machine-learning-process/training/make-ml-model/src/tile_based_training/output
├── data_ingestion
│   └── splitted_data.json
├── prepare_base_model
│   └── base_model.keras
└── training
    └── trained_model.keras

4 directories, 3 files
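
The listing above can also be checked automatically, which is convenient when the run is repeated with other parameters. The sketch below tests for the three files shown in the tree output. `check_outputs` is a hypothetical helper, and the fallback to the current directory when `WORKSPACE` is unset is an assumption made only so the snippet runs anywhere.

```shell
# Sketch: confirm that the expected artifacts exist under an output directory.
# check_outputs is a hypothetical helper; the file list mirrors the tree listing.
check_outputs() {
    output_dir="$1"
    missing=0
    for rel in \
        data_ingestion/splitted_data.json \
        prepare_base_model/base_model.keras \
        training/trained_model.keras
    do
        if [ -f "${output_dir}/${rel}" ]; then
            echo "found:   ${rel}"
        else
            echo "missing: ${rel}" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is absent.
    return "${missing}"
}

check_outputs "${WORKSPACE:-.}/training/make-ml-model/src/tile_based_training/output" \
    || echo "Some training outputs are missing."
```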


The user may train several tile-based classifiers using the `tile-based-training` module. The model's weights are among the artifacts tracked through MLflow. The next step is to retrieve the best model from the MLflow artifact registry, according to the chosen evaluation metric, and convert it to the ONNX format. This step is explained in ["Export the Best Model to ONNX Format"](./ExtractModel.ipynb). Finally, the converted model can be integrated into the inference application package.

> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models.

## Clean-up 

In [None]:
rm -fr ${RUNTIME}/envs ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output 
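
If `RUNTIME` or `WORKSPACE` is unset, for example in a fresh kernel, the paths in the cell above expand to locations directly under the filesystem root. The sketch below is a more defensive variant that removes nothing unless both values are known. `cleanup_training_outputs` is a hypothetical helper, not part of the notebook.

```shell
# Sketch: refuse to delete anything unless both locations are known.
# cleanup_training_outputs is a hypothetical helper.
cleanup_training_outputs() {
    runtime_dir="$1"
    workspace_dir="$2"
    if [ -z "${runtime_dir}" ] || [ -z "${workspace_dir}" ]; then
        echo "RUNTIME and WORKSPACE must both be set; nothing removed" >&2
        return 1
    fi
    # Same two paths as the clean-up cell, quoted to survive spaces.
    rm -fr "${runtime_dir}/envs" \
        "${workspace_dir}/training/make-ml-model/src/tile_based_training/output"
}

cleanup_training_outputs "${RUNTIME:-}" "${WORKSPACE:-}" || true
```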