# ML cube Platform SDK - Task and Model creation

In this notebook, you will see how to create a Task and how to add a model to start monitoring its performance.

**Requirements**:

1. API key of a user with the `COMPANY_ADMIN` or `PROJECT_ADMIN` role
2. ID of the project

**User Input**

You will need to provide values for some variables and names for the notebook to run correctly. Whenever you see the comment `# TO COMPLETE`, fill in the empty string (or placeholder value) accordingly.

**Imports**

In [None]:
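import logging
logging.basicConfig(level=logging.INFO, force=True)  # show INFO messages (default level is WARNING)
logger = logging.getLogger("platform_tutorial")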

In [None]:
from ml3_platform_sdk.client import ML3PlatformClient
from ml3_platform_sdk import enums as ml3_enums
from ml3_platform_sdk import models as ml3_models

**Define entity IDs**

Specify the IDs of the entities required to run this notebook.

In [None]:
# TO COMPLETE
project_id = ""

If you don't remember the project ID, you can list the projects in your company once the client has been instantiated (see **Instantiate the Client** below):
```py
projects = ml3_client.get_projects()
logger.info(f'Projects inside the company are: {projects}')
```

**Instantiate the Client**

To interact with ML cube Platform, you need to instantiate the client; you will then use its methods to perform requests.
Insert the API key we provided you.

In [None]:
API_KEY = ""  # TO COMPLETE
URL = "https://api.platform.mlcube.com"
ml3_client = ML3PlatformClient(api_key=API_KEY, url=URL)
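Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you export the key in an environment variable named `ML3_API_KEY` (a hypothetical name chosen for this example, not one the SDK reads on its own), is to load it with the standard library:

```python
import os


def load_api_key(env_var: str = "ML3_API_KEY") -> str:
    """Return the API key stored in env_var.

    ML3_API_KEY is a hypothetical variable name used only in this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return api_key
```

You would then pass `load_api_key()` as the `api_key` argument of `ML3PlatformClient` instead of the literal string.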

**Create Task**

To monitor your models, you need to add them to a `Task`.

A `Task` represents an AI problem, such as regression or binary classification, over a dataset.

In [None]:
# TO COMPLETE
task_id = ml3_client.create_task(
    project_id=project_id,
    name="my-task",
    tags=['first-task'],
    task_type=ml3_enums.TaskType.REGRESSION,
    data_structure=ml3_enums.DataStructure.TABULAR,
    optional_target=False,
    cost_info=ml3_models.RegressionTaskCostInfo(
        currency=ml3_enums.Currency.EURO,
        overestimation_cost=30.0,
        underestimation_cost=20.0,
    )
)
logger.info(f"Created task with id {task_id}")

**Data schema**

The data schema defines the structure of the data used in this task.
It contains the features, the target and a set of mandatory metadata columns required by the ML cube Platform.
Each sample must have a `timestamp` and an `identifier`: the timestamp is used to sort your data chronologically, and the identifier is used to share information about the data without transferring the data themselves.
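As an illustration of these two metadata columns, the sketch below prepares a few hypothetical samples, gives each a unique `sample_id`, converts its event time to a float timestamp and sorts the rows chronologically. It assumes that identifiers must be unique and that the `timestamp` column holds Unix epoch seconds; the notebook only states that it is a `FLOAT`.

```python
from datetime import datetime, timezone

# Hypothetical raw samples: (sample_id, event time, feature_0, target)
raw_samples = [
    ("s-002", datetime(2024, 1, 2, tzinfo=timezone.utc), 0.7, 1.9),
    ("s-001", datetime(2024, 1, 1, tzinfo=timezone.utc), 0.3, 1.1),
]


def to_rows(samples):
    """Build one dict per sample and return the rows sorted by timestamp.

    Assumes `timestamp` holds Unix epoch seconds as a float.
    """
    ids = [sample_id for sample_id, _, _, _ in samples]
    if len(ids) != len(set(ids)):
        raise ValueError("sample_id values must be unique")
    rows = [
        {"sample_id": sid, "timestamp": ts.timestamp(), "feature_0": x, "target": y}
        for sid, ts, x, y in samples
    ]
    return sorted(rows, key=lambda row: row["timestamp"])


rows = to_rows(raw_samples)
print([row["sample_id"] for row in rows])  # ['s-001', 's-002']
```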

The data schema is specified by the class `DataSchema`, which can be found in the `models` package of our SDK.
The following cell shows an example of a `DataSchema`. As you may notice, the model predictions are not included: they are added automatically when a model is created within the task.

In [None]:
# TO COMPLETE

data_schema = ml3_models.DataSchema(
    columns=[
        # METADATA - SAMPLE ID
        ml3_models.ColumnInfo(
            name='sample_id',
            data_type=ml3_enums.DataType.STRING,
            role=ml3_enums.ColumnRole.ID,
            is_nullable=False
        ),
        # METADATA - TIMESTAMP
        ml3_models.ColumnInfo(
            name='timestamp',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.TIME_ID,
            is_nullable=False
        ),
        # FEATURE
        ml3_models.ColumnInfo(
            name='feature_0',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.INPUT,
            is_nullable=False
        ),
        # TARGET
        ml3_models.ColumnInfo(
            name='target',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.TARGET,
            is_nullable=False
        )
    ]
)

ml3_client.add_data_schema(task_id=task_id, data_schema=data_schema)
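Before uploading, it can help to check locally that each file contains the columns you declared in the schema. The helper `missing_columns` below is a hypothetical pre-flight check built only on the standard `csv` module. Which metadata columns each separately sent file must repeat (here `sample_id` and `timestamp` for the inputs file) is an assumption to verify against the platform documentation.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, expected: Iterable[str]) -> List[str]:
    """Return the expected column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    return sorted(set(expected) - {name.strip() for name in header})


# Hypothetical split: the inputs file carries the metadata plus the features
INPUT_COLUMNS = {"sample_id", "timestamp", "feature_0"}

# In-memory example; for a real file use open("path/to/file.csv", newline="")
inputs_csv = io.StringIO("sample_id,feature_0\ns-001,0.3\n")
print(missing_columns(inputs_csv, INPUT_COLUMNS))  # ['timestamp']
```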

**Historical data**

Now that you have added the data schema to your Task, you can upload data.
There are two classes of data: *historical* and *production*.
Historical data are the data you had before the model was put in production, while production data are those coming from the production environment.
The reference data for the model are selected from the historical data by specifying a time range.
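Conceptually, selecting the reference amounts to keeping the historical samples whose timestamp falls inside the chosen range. The sketch below illustrates this on hypothetical in-memory rows; it is not how the platform stores or queries data, and treating both bounds as inclusive is an assumption.

```python
from typing import Dict, List


def select_reference(rows: List[Dict], from_timestamp: float, to_timestamp: float) -> List[Dict]:
    """Keep the rows whose timestamp lies in [from_timestamp, to_timestamp].

    Inclusive bounds are an assumption made for this sketch.
    """
    if from_timestamp > to_timestamp:
        raise ValueError("from_timestamp must not be greater than to_timestamp")
    return [row for row in rows if from_timestamp <= row["timestamp"] <= to_timestamp]


# Hypothetical historical samples
historical = [
    {"sample_id": "s-001", "timestamp": 100.0},
    {"sample_id": "s-002", "timestamp": 200.0},
    {"sample_id": "s-003", "timestamp": 300.0},
]
reference = select_reference(historical, 100.0, 200.0)
print([row["sample_id"] for row in reference])  # ['s-001', 's-002']
```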

Since this is the first time you send data to ML cube Platform, a few points are worth explaining:

- data are composed of features, targets and predictions. You send each category separately, since data can come from multiple sources;
- sending data is one of the operations that run a pipeline inside the ML cube Platform. In this case the pipeline consists only of the data step, which reads the data, validates them and, if the storing policy is MLCUBE, stores them in the ML cube Platform's Secure Storage;
- the pipeline is identified by a `job_id`, and you can follow its execution status through the client (for instance with `show_jobs()`). Additionally, you can wait for the job to complete by calling the method `wait_job_completion(job_id)`.
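In general terms, waiting for a job is a polling loop with a timeout. The sketch below shows that pattern with a hypothetical `get_status` callable and made-up status strings (`running`, `completed`, `error`); it is not the implementation of `wait_job_completion`, which you should keep using in the notebook.

```python
import time
from typing import Callable


def wait_until_done(get_status: Callable[[], str], poll_seconds: float = 5.0,
                    timeout_seconds: float = 600.0) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires.

    get_status and the status strings are hypothetical placeholders, not SDK values.
    """
    terminal = {"completed", "error"}
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Simulated job that finishes on the third poll
statuses = iter(["running", "running", "completed"])
print(wait_until_done(lambda: next(statuses), poll_seconds=0.0))  # completed
```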

In the cell below, we send the features using a `LocalDataSource`, since the file is stored locally, and the target using a `GCSDataSource`, since it is stored on the cloud.
To use remote data sources, you first need to add your credentials to the ML cube Platform; you then reference them in the `DataSource` object through its `credentials_id`.

In [None]:
# TO COMPLETE
inputs_data_source = ml3_models.LocalDataSource(
    file_path="path/to/file.csv",
    file_type=ml3_enums.FileType.CSV,
    is_folder=False,
    folder_type=None
)
target_data_source = ml3_models.GCSDataSource(
    object_path="gs://path/to/file.csv",
    credentials_id='gcp_credentials_id',
    file_type=ml3_enums.FileType.CSV,
    is_folder=False,
    folder_type=None
)

logger.info('API - Add historical data')
job_id = ml3_client.add_historical_data(
    task_id=task_id,
    inputs=ml3_models.TabularData(source=inputs_data_source),
    target=ml3_models.TabularData(source=target_data_source)
)
logger.info(f'Job created, id {job_id}')

ml3_client.show_jobs()

logger.info('Waiting for job completion')
ml3_client.wait_job_completion(job_id=job_id)
logger.info('Job completed')

**Create Model**

After the task is created, you can add AI models to it.
A model is uniquely identified by the pair `name` and `version`.
The version identifies a specific trained instance of the model: whenever you retrain your model, you will update its version on the ML cube Platform.
The `metric_name` field indicates the error or performance metric that the ML cube Platform uses to display the model's performance; the same metric is also included in the retraining report.
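RMSE, the metric selected in the cell below, is the square root of the mean squared difference between targets and predictions. The platform computes the metric itself; this plain-Python version only clarifies what the number measures.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # ≈ 1.1547, i.e. sqrt(4 / 3)
```

Because errors are squared before averaging, a single large miss (here 2.0 on the third sample) dominates the result.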

In [None]:
# TO COMPLETE
model_id = ml3_client.create_model(
    task_id=task_id,
    name="model-name",
    version="v0.0.1",
    metric_name=ml3_enums.ModelMetricName.RMSE,
    preferred_suggestion_type=ml3_enums.SuggestionType.SAMPLE_WEIGHTS,
    with_probabilistic_output=False,
)
logger.info(f"Created model with id {model_id}")

**Model reference**

In the previous cell you created the model, but it still lacks its training dataset, which in the ML cube Platform is called *reference*.
You can add the model's reference data by specifying a time range: the ML cube Platform will automatically select the reference data from the previously uploaded historical data.
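`from_timestamp` and `to_timestamp` must be in the same unit as the `timestamp` column of your data. Assuming that column stores Unix epoch seconds (the notebook only says it is a `FLOAT`), you can derive the bounds from UTC calendar dates instead of typing raw numbers:

```python
from datetime import datetime, timezone


def to_epoch_seconds(year: int, month: int, day: int) -> float:
    """Return midnight UTC of the given date as Unix epoch seconds."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical reference period: calendar year 2023 (UTC)
from_timestamp = to_epoch_seconds(2023, 1, 1)
to_timestamp = to_epoch_seconds(2024, 1, 1)
print(from_timestamp, to_timestamp)  # 1672531200.0 1704067200.0
```

You would then pass these values to `set_model_reference` in place of the `0.` placeholders.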

In [None]:
# TO COMPLETE
logger.info('API - Add model reference')
job_id = ml3_client.set_model_reference(
    model_id=model_id,
    from_timestamp=0.,  # start of the reference time range
    to_timestamp=0.,  # end of the reference time range
)
logger.info(f'Job created, id {job_id}')

ml3_client.show_jobs()

logger.info('Waiting for job completion')
ml3_client.wait_job_completion(job_id=job_id)
logger.info('Job completed')

**Congratulations!**

In this notebook, you learned how to create a task, add a model to it, upload historical data and set the model's reference data on the ML cube Platform.