# ML cube Platform SDK - Task and Model creation

In this notebook, you will see how to create a Task and add models to it so that you can start monitoring them.

**Requirements**:

1. API Key of a User with roles `COMPANY_ADMIN` or `PROJECT_ADMIN`
2. Id of the project

**User Input**

To run this notebook correctly, you need to complete some variables and names.
Whenever you see the comment `# TO COMPLETE`, fill in the empty strings or replace the placeholder values in that cell.

**Imports**

**Define entity ids**

Specify here the ids of the entities that are required to work on this notebook.

In [None]:
# TO COMPLETE
project_id = ""

If you don't remember the id of the project, you can get the list of projects:
```py
projects: List[Project] = client.get_projects()
logger.info(f'Projects inside the company are: {projects}')
```

In [None]:
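import logging

# Without this, INFO-level messages are filtered out by the default WARNING level
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("platform_tutorial")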

In [None]:
from ml3_platform_sdk.client import ML3PlatformClient
from ml3_platform_sdk import enums as ml3_enums
from ml3_platform_sdk import models as ml3_models

**Instantiate the Client**

To interact with ML cube Platform, you need to instantiate the client once.
You will then use its methods to perform requests.
Please insert the API key we provided you to instantiate the client.

In [None]:
API_KEY = ""  # TO COMPLETE
URL = "https://api.platform.mlcube.com"
ml3_client = ML3PlatformClient(api_key=API_KEY, url=URL)

**Create Task**

To monitor your models, you need to add them to a `Task`. A Task represents an AI problem, such as regression or classification, over a dataset.

In [None]:
# TO COMPLETE
task_id = ml3_client.create_task(
    project_id=project_id,
    name="my-task",
    tags=['first-task'],
    task_type=ml3_enums.TaskType.REGRESSION,
    data_structure=ml3_enums.DataStructure.TABULAR,
    optional_target=False,
    cost_info=ml3_models.RegressionTaskCostInfo(
        currency=ml3_enums.Currency.EURO,
        overestimation_cost=30.0,
        underestimation_cost=20.0,
    )
)
logger.info(f"Created task with id {task_id}")

**Data schema**

The data schema describes the data used by your models in this task.
It contains features, targets and a set of mandatory metadata that ML cube Platform requires to work correctly.
Each sample must have an associated `timestamp` and `identifier`: the timestamp is used to sort your data, and the identifier is used to share information about data without transferring the data itself.

The data schema is specified by the class `DataSchema`, which you can find in the `models` package of our SDK.
The following cell shows an example of a `DataSchema`. As you can see, the model's predictions are not mentioned: they are added automatically when you create a model for the task.

In [None]:
# TO COMPLETE

data_schema = ml3_models.DataSchema(
    columns=[
        # METADATA - SAMPLE ID
        ml3_models.ColumnInfo(
            name='sample_id',
            data_type=ml3_enums.DataType.STRING,
            role=ml3_enums.ColumnRole.ID,
            is_nullable=False
        ),
        # METADATA - TIMESTAMP
        ml3_models.ColumnInfo(
            name='timestamp',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.TIME_ID,
            is_nullable=False
        ),
        # FEATURE
        ml3_models.ColumnInfo(
            name='feature_0',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.INPUT,
            is_nullable=False
        ),
        # TARGET
        ml3_models.ColumnInfo(
            name='target',
            data_type=ml3_enums.DataType.FLOAT,
            role=ml3_enums.ColumnRole.TARGET,
            is_nullable=False
        )
    ]
)

ml3_client.add_data_schema(task_id=task_id, data_schema=data_schema)
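
To make the schema more concrete, the sketch below mirrors it in plain Python and checks whether a single row has every column, with a compatible type and no missing values. It does not use the SDK: the `EXPECTED_COLUMNS` dictionary and the `find_schema_violations` helper are hypothetical names introduced only for illustration, and the checks ML cube Platform actually performs may differ.

```python
# Plain-Python mirror of the DataSchema above: column name -> (type, nullable).
# This is an illustrative assumption, not how the platform stores the schema.
EXPECTED_COLUMNS = {
    "sample_id": (str, False),
    "timestamp": (float, False),
    "feature_0": (float, False),
    "target": (float, False),
}


def find_schema_violations(row):
    """Return a list of human-readable problems found in one row (a dict)."""
    problems = []
    for name, (expected_type, nullable) in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column '{name}'")
            continue
        value = row[name]
        if value is None:
            if not nullable:
                problems.append(f"column '{name}' must not be null")
        elif not isinstance(value, expected_type):
            problems.append(
                f"column '{name}' should be {expected_type.__name__}"
            )
    return problems


good_row = {"sample_id": "s-0", "timestamp": 1700000000.0,
            "feature_0": 0.5, "target": 1.2}
bad_row = {"sample_id": "s-1", "timestamp": None, "feature_0": "high"}

print(find_schema_violations(good_row))
print(find_schema_violations(bad_row))
```

Running a check like this locally before uploading can surface obvious problems, such as a missing identifier, earlier than a failed job would.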

**Historical data**

Now that you have inserted the data schema for your Task, you can upload data.
There are two classes of data: *historical* and *production*.
Historical data is the data you had before the model was in production, while production data comes from the production environment.
Model reference data is selected from the historical data by specifying a time range.

This is the first time you send data to ML cube Platform, so there are a few things to explain:

- data are composed of features, targets and predictions. You send each category separately since data can come from multiple sources;
- sending data belongs to the category of operations that run a pipeline inside ML cube Platform. In this case, the pipeline is composed only of the data step, which reads the data, validates them and then, if the storing policy is MLCUBE, stores them inside ML cube Platform's Secure Storage;
- the pipeline is identified by a `job_id`, and you can follow its execution status by asking the client for information about it.
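
Following a job's status usually comes down to polling until the job reaches a final state. The sketch below shows that pattern in isolation. It is not the SDK's implementation: `fetch_status` is a hypothetical callable standing in for whatever client call returns the job status, and the status strings are assumptions.

```python
import time

# Hypothetical status values; the platform's real job states may differ.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_job(fetch_status, job_id, poll_seconds=5.0,
                 timeout_seconds=600.0, sleep=time.sleep):
    """Call fetch_status(job_id) repeatedly until the job finishes.

    Returns the terminal status, or raises TimeoutError if the job is
    still running after roughly timeout_seconds of waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"job {job_id} still '{status}' after {waited:.0f}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a fake status source, so the sketch runs without the platform
fake_statuses = iter(["running", "running", "completed"])
final_status = wait_for_job(
    lambda job_id: next(fake_statuses), "job-1", sleep=lambda seconds: None
)
print(final_status)
```

In this notebook you do not need to write such a loop yourself, because the client's `wait_job_completion` method blocks until the job is done.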

In the cell below, we send the features using a `LocalDataSource` because we have the file locally, and we use a `GCSDataSource` for the target because we have it in the cloud.
To use remote data sources, you need to add credentials on ML cube Platform and then specify them in the `DataSource` object.

In [None]:
# TO COMPLETE
inputs_data_source = ml3_models.LocalDataSource(
    data_structure=ml3_enums.DataStructure.TABULAR,
    file_path="path/to/file.csv",
    file_type=ml3_enums.FileType.CSV,
    is_folder=False,
    folder_type=None
)
target_data_source = ml3_models.GCSDataSource(
    data_structure=ml3_enums.DataStructure.TABULAR,
    object_path="gs://path/to/file.csv",
    credentials_id='gcp_credentials_id',
    file_type=ml3_enums.FileType.CSV,
    is_folder=False,
    folder_type=None
)

logger.info('API - Add historical data')
job_id = ml3_client.add_historical_data(
    task_id=task_id,
    inputs=ml3_models.TabularData(source=inputs_data_source),
    target=ml3_models.TabularData(source=target_data_source)
)
logger.info(f'Job created, id {job_id}')

ml3_client.show_jobs()

logger.info('Waiting job completion')
ml3_client.wait_job_completion(job_id=job_id)
logger.info('Job completed')
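
The cell above assumes that the features file and the target file already exist. As a rough sketch, the snippet below splits a handful of samples into two CSV documents using only the standard library. The assumption that both files repeat `sample_id` and `timestamp` so that rows can be matched is ours and is not confirmed by the SDK; the sample values and the `to_csv` helper are made up for illustration.

```python
import csv
import io

# Column names follow the data schema defined earlier in this notebook.
# Repeating the id and timestamp columns in both files is an assumption.
FEATURE_COLUMNS = ["sample_id", "timestamp", "feature_0"]
TARGET_COLUMNS = ["sample_id", "timestamp", "target"]


def to_csv(rows, columns):
    """Serialize the selected columns of each row into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row[column] for column in columns})
    return buffer.getvalue()


# Made-up samples; timestamps are Unix epoch seconds stored as floats
samples = [
    {"sample_id": "s-0", "timestamp": 1700000000.0,
     "feature_0": 0.5, "target": 1.2},
    {"sample_id": "s-1", "timestamp": 1700000060.0,
     "feature_0": 0.7, "target": 1.9},
]

features_csv = to_csv(samples, FEATURE_COLUMNS)
target_csv = to_csv(samples, TARGET_COLUMNS)
print(features_csv)
print(target_csv)
```

Writing `features_csv` to a local path and `target_csv` to a bucket would then give the two files that the data sources in the cell point to.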

**Create Model**

After the task is created, you can add AI models to it.
A model is uniquely identified by the pair `name` and `version`.
The version identifies a specific trained instance of the model: whenever you retrain your model, you update its version on ML cube Platform.
The field `metric_name` is the error or performance metric that ML cube Platform uses to show you the model's statistics, including in the retraining report.

In [None]:
# TO COMPLETE
model_id = ml3_client.create_model(
    task_id=task_id,
    name="model-name",
    version="v0.2.1",
    metric_name=ml3_enums.ModelMetricName.RMSE,
    preferred_suggestion_type=ml3_enums.SuggestionType.SAMPLE_WEIGHTS
)
logger.info(f"Created model with id {model_id}")

**Model reference**

In the previous cell you created the model, but it is not complete yet because it is missing the training dataset, which in ML cube Platform is called the *reference*.
Here you add the model's reference data by specifying a time range: ML cube Platform automatically selects the reference data from all the previously uploaded data.
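
The time range is expressed with the same kind of values stored in the `timestamp` column. As a hedged sketch, assuming that column holds Unix epoch seconds as floats (the schema only says `FLOAT`, so this is an assumption), the snippet below converts calendar dates into range bounds and shows which samples such a range would cover. The `to_epoch_seconds` and `select_reference` helpers are hypothetical, and treating both bounds as inclusive is also an assumption.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to Unix epoch seconds (float)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return moment.timestamp()


def select_reference(samples, from_timestamp, to_timestamp):
    """Keep samples whose timestamp lies in the range (bounds inclusive)."""
    return [
        sample for sample in samples
        if from_timestamp <= sample["timestamp"] <= to_timestamp
    ]


utc = timezone.utc
reference_start = to_epoch_seconds(datetime(2023, 1, 1, tzinfo=utc))
reference_end = to_epoch_seconds(datetime(2023, 7, 1, tzinfo=utc))

# Made-up samples spread before, inside and after the range
samples = [
    {"sample_id": "s-0",
     "timestamp": to_epoch_seconds(datetime(2022, 12, 31, tzinfo=utc))},
    {"sample_id": "s-1",
     "timestamp": to_epoch_seconds(datetime(2023, 3, 15, tzinfo=utc))},
    {"sample_id": "s-2",
     "timestamp": to_epoch_seconds(datetime(2023, 8, 1, tzinfo=utc))},
]

reference_ids = [
    sample["sample_id"]
    for sample in select_reference(samples, reference_start, reference_end)
]
print(reference_start, reference_end, reference_ids)
```

Values computed this way could then be passed as `from_timestamp` and `to_timestamp` in the next cell.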

In [None]:
# TO COMPLETE
logger.info('API - Add model reference')
job_id = ml3_client.set_model_reference(
    model_id=model_id,
    from_timestamp=0.,
    to_timestamp=0.,
)
logger.info(f'Job created, id {job_id}')

ml3_client.show_jobs()

logger.info('Waiting job completion')
ml3_client.wait_job_completion(job_id=job_id)
logger.info('Job completed')

**Congratulations!**

In this notebook, you learned how to create a task, add a model to it, and upload both historical and reference data to ML cube Platform.