# Starting a training round for a task
In this notebook, we will trigger training for a task in the project and monitor the progress of the training job. We will use the project created in notebook [004](004_create_pipeline_project_from_dataset.ipynb). If you have not run that notebook yet, run it first to make sure the project exists and is ready for training.

In [None]:
# As usual we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env()

geti = Geti(server_config=geti_server_configuration)

### Selecting a project for training
As before, let's list all projects in the workspace and select the one that we want to train.

In [None]:
from geti_sdk.rest_clients import ProjectClient

project_client = ProjectClient(session=geti.session, workspace_id=geti.workspace_id)
projects = project_client.list_projects()

We will use the `COCO multitask animal demo` that we created in notebook [004](004_create_pipeline_project_from_dataset.ipynb). 

In [None]:
PROJECT_NAME = "COCO multitask animal demo"

project = project_client.get_project(project_name=PROJECT_NAME)

## Preparing to start training

#### Setting up the TrainingClient
To start and monitor training jobs on the platform, a `TrainingClient` needs to be created for the project:

In [None]:
from geti_sdk.rest_clients import TrainingClient

training_client = TrainingClient(
    session=geti.session, workspace_id=geti.workspace_id, project=project
)

#### Selecting a task to train
The first thing to do is to select the task that we want to train. Let's go with the `detection` task in our project, which is the first trainable task in the pipeline. We will print a summary of the task to make sure we pick the right one.

In [None]:
task = project.get_trainable_tasks()[0]
print(task.summary)

#### Listing the available algorithms
Now, let's list the available algorithms for this task. The `training_client` can be used for this:

In [None]:
available_algorithms = training_client.get_algorithms_for_task(task=task)
print(available_algorithms.summary)

Let's go with the algorithm that Geti uses by default for object detection. It is the `ATSS` algorithm, which is a larger and more accurate model than the `SSD` one. Because of its size it is also slower, but let's say we care most about accuracy for now.

We can get the default algorithms from the list of available algorithms. The cell below shows how it is done, and retrieves the default Detection algorithm.

In [None]:
algorithm = available_algorithms.get_default_for_task_type(task.type)

print(f"Default algorithm for `{task.type}` task: `{algorithm.name}`.\n")
print(algorithm.overview)

## Checking platform status
Before we start a new training round, it is a good idea to check the platform status to make sure the project is not already running another job. If it is, submitting a new job might not start training as expected, depending on which job is already running. The `training_client` can also be used to check the project status:

In [None]:
status = training_client.get_status()
print(status.summary)
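
If the project turns out to be busy, one option is to wait until it is idle before submitting the new job. The block below is a minimal sketch of that idea, not part of the SDK: `wait_until_idle` and its `is_busy` predicate are hypothetical names, and how to decide "busy" from the status object is left to the caller because it depends on the fields the status exposes. A stand-in status source is used so the sketch runs without a server.

```python
import time
from typing import Any, Callable


def wait_until_idle(
    get_status: Callable[[], Any],
    is_busy: Callable[[Any], bool],
    poll_interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_status` until `is_busy` returns False or `timeout` elapses.

    Returns True if the project became idle, False if the timeout was reached.
    """
    waited = 0.0
    while is_busy(get_status()):
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
    return True


# Stand-in for a real status source: busy for the first two polls, then idle.
_fake_statuses = iter([True, True, False])
idle = wait_until_idle(
    get_status=lambda: next(_fake_statuses),
    is_busy=lambda busy: busy,
    poll_interval=0.0,
    sleep=lambda seconds: None,
)
print(idle)
```

With the client from this notebook, `training_client.get_status` could be passed as `get_status`, together with a predicate that inspects the returned status.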

## Starting the training
At this point we can start the training, using the `training_client.train_task()` method. The method takes additional optional parameters such as `train_from_scratch` and `enable_pot_optimization`, but we will leave these at their default values (`False`) for now. The `train_task()` method returns a `Job` object that we can use to monitor the training progress.

In [None]:
job = training_client.train_task(
    algorithm=algorithm,
    task=task,
)

### Monitoring the training process
Using the `training_client` and the training `job` we just started, we can monitor the training progress on the platform. The `training_client.monitor_job()` method monitors the status of a job and updates it every 15 seconds. Program execution is halted until the job has completed, whether it finished successfully, was cancelled, or failed.

> **NOTE**: Because training the task will take quite a bit of time, you may want to interrupt the monitoring at some point. This can be done by selecting the cell in which the monitoring is running and pressing the 'Interrupt the kernel' (solid square) button at the top of the page, or by navigating to the 'Kernel' menu in the top menu bar and selecting 'Interrupt the kernel' there. This will not cancel the job on the platform, it will just abort the job progress monitoring in the notebook.

In [None]:
training_client.monitor_job(job);
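
To make the monitoring behaviour more concrete, here is a rough sketch of a polling loop of this kind. It is not the SDK's implementation: `FakeJob`, its `refresh()` method, `poll_job`, and the state names in `TERMINAL_STATES` are all hypothetical and exist only for illustration. The sketch also shows why interrupting the kernel stops only the local loop and leaves the job on the platform untouched.

```python
import time

# Assumed set of terminal states; the platform's actual state names may differ.
TERMINAL_STATES = {"finished", "failed", "cancelled"}


class FakeJob:
    """Hypothetical stand-in for a platform job that replays scripted states."""

    def __init__(self, scripted_states):
        self.state = "pending"
        self._scripted = iter(scripted_states)

    def refresh(self):
        # A real client would query the server here.
        self.state = next(self._scripted, self.state)


def poll_job(job, interval=15.0, sleep=time.sleep):
    """Refresh `job` every `interval` seconds until it reaches a terminal state."""
    try:
        while job.state not in TERMINAL_STATES:
            sleep(interval)
            job.refresh()
            print(f"Job state: {job.state}")
    except KeyboardInterrupt:
        # Only the local loop stops; no cancel request is sent anywhere.
        print("Monitoring interrupted; the job itself keeps running.")
    return job.state


final_state = poll_job(FakeJob(["running", "running", "finished"]), interval=0.0)
```

The loop blocks the calling cell until a terminal state is reached, which is the same trade-off described in the note above.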

## Getting the model resulting from the training job
Once the training has finished successfully, we can set up a `ModelClient` for the project and use it to get the model that was trained in this particular job.

In [None]:
from geti_sdk.rest_clients import ModelClient

model_client = ModelClient(
    session=geti.session, workspace_id=geti.workspace_id, project=project
)

To get the model information, simply pass the job to the `model_client.get_model_for_job()` method. Note that this will not download the actual model weights. Instead, it will return a `Model` object that holds all metadata for the model, such as the score it achieved on the test dataset, its creation date, the algorithm that it implements, etc.

Requesting the model while the training job is still running will result in a `ValueError`. In that case, please be patient and try again when the job is completed.

In [None]:
model = model_client.get_model_for_job(job)

if model is not None:
    print(model.overview)
else:
    print(
        f"Job '{job.name}' completed with status '{job.status.state}: "
        f"{job.status.message}', but did not result in a trained model. Most likely "
        "the model training has failed, you could try restarting the training to "
        "see if the problem persists."
    )