# Part 2: Deploy a simple ML model - Further step configuration

## Import libraries

In [None]:
import craft_ai_sdk
import dotenv
import os
import pandas as pd
from sklearn import datasets

## Load environment variables

In [None]:
dotenv.load_dotenv()

SDK_TOKEN = os.environ["CRAFT_AI_ACCESS_TOKEN"]
ENVIRONMENT_URL = os.environ["CRAFT_AI_ENVIRONMENT_URL"]
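If one of these variables is missing, `os.environ[...]` raises a bare `KeyError`. As a minimal sketch, the check below reports every missing variable at once. The helper name `find_missing_variables` is hypothetical and is not part of the SDK or of `dotenv`.

```python
import os

# The two variable names used in this notebook.
REQUIRED_VARIABLES = ("CRAFT_AI_ACCESS_TOKEN", "CRAFT_AI_ENVIRONMENT_URL")


def find_missing_variables(names, environ=os.environ):
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


missing = find_missing_variables(REQUIRED_VARIABLES)
if missing:
    # Printing rather than raising keeps the notebook cell non-blocking.
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `dotenv.load_dotenv()` makes it easier to spot an incomplete `.env` file.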

## SDK instantiation

In [None]:
sdk = craft_ai_sdk.CraftAiSdk(sdk_token=SDK_TOKEN, environment_url=ENVIRONMENT_URL)

## Clean up the previous part

We can start by cleaning up the objects we created in the hello world use case.

To do so, we can simply use the `delete_pipeline` and `delete_step` functions of the SDK.

/!\ The order in which you call these functions matters, since you can't delete a step that is still used in a pipeline.

<u>Tip</u>: you can also call the `delete_step` function directly with the argument `force_dependents_deletion` set to `True`. It will delete everything linked to the step as well.

In [None]:
sdk.delete_pipeline(pipeline_name="part-1-hello-world")
sdk.delete_step(step_name="part-1-hello-world")
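Instead of the two calls above, the tip can be sketched as a single call. The wrapper `delete_step_with_dependents` is a hypothetical helper introduced here for illustration; only `delete_step` and its `force_dependents_deletion` argument come from the text above.

```python
def delete_step_with_dependents(sdk, step_name):
    """Delete a step together with everything linked to it (e.g. its pipeline)."""
    # `force_dependents_deletion=True` removes the dependents first,
    # so the call order no longer matters.
    return sdk.delete_step(step_name=step_name, force_dependents_deletion=True)
```

It would be called as `delete_step_with_dependents(sdk, "part-1-hello-world")`, in place of the `delete_pipeline` and `delete_step` pair.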

## Upload dataset to Data Store

This use case uses the famous Iris dataset.

With the Craft.AI platform, your environment comes with computational resources and file storage. We call this file storage the **data store**.

You can upload and download files and organize them using the SDK.

We will start by uploading this dataset to the data store using the `upload_data_store_object` function of the SDK.

You have to pass two arguments:
- `filepath_or_buffer`: path of the file to be uploaded, or a file-like object
- `object_path_in_datastore`: destination path of the file in the data store

You can find further information in the SDK documentation.

In [None]:
iris = datasets.load_iris(as_frame=True)
iris_df = pd.concat([iris.data, iris.target], axis=1)
iris_df.to_parquet("iris.parquet")

sdk.upload_data_store_object(
    filepath_or_buffer="iris.parquet",
    object_path_in_datastore="get_started/dataset/iris.parquet",
)

os.remove("iris.parquet")
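Since `filepath_or_buffer` is described as also accepting a file-like object, the temporary file could in principle be avoided. The sketch below assumes the SDK accepts an in-memory binary buffer (`io.BytesIO`) and that a parquet engine such as `pyarrow` is installed for `to_parquet`. The helper name `upload_dataframe_as_parquet` is hypothetical.

```python
import io


def upload_dataframe_as_parquet(sdk, dataframe, object_path_in_datastore):
    """Serialize a DataFrame to parquet in memory and upload it to the data store."""
    buffer = io.BytesIO()
    dataframe.to_parquet(buffer)  # pandas can write parquet to a binary buffer
    buffer.seek(0)  # rewind so the upload reads from the start
    sdk.upload_data_store_object(
        filepath_or_buffer=buffer,
        object_path_in_datastore=object_path_in_datastore,
    )
```

With the objects from the cell above, the call would be `upload_dataframe_as_parquet(sdk, iris_df, "get_started/dataset/iris.parquet")`, with no local file to remove afterwards.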

We can also list all the objects stored in the data store using the `list_data_store_objects` function.

In [None]:
sdk.list_data_store_objects()

We can also get information about a specific object with the `get_data_store_object_information` function.

In [None]:
sdk.get_data_store_object_information("get_started/dataset/iris.parquet")

## Step creation with the SDK

### Create a step

Now, it's time to create the **step** embedding our code. 

We will follow the same process as in the *Hello_world* section, but this time we will use more advanced options.

The `container_config` argument accepts many settings to configure our step. Here we will focus on two of them:
- `included_folders`: sometimes you may not need to include all the files of your project repository in your step. You can then specify the files and folder(s) to include, which prevents the step from accessing all the code available in the repository. Here we only want to include the `src` folder.
- `requirements_path`: for our `requirements.txt` file to be taken into account, and therefore for all the necessary libraries to be installed in the step, we add the path of this file to the `container_config`.

You can find further information and configuration settings in the SDK documentation.

The `included_folders` and `requirements_path` values can also be **specified by default** directly in your project, on the platform (in the Settings of your project). If you specify those arguments when creating your step, they take precedence over the values set at the project level.
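The precedence rule can be pictured with plain dictionaries. This is only an illustration of the behaviour described above, not the platform's actual implementation; the function `resolve_container_config` and the example project defaults are hypothetical.

```python
def resolve_container_config(step_config, project_defaults):
    """Merge two configs: step-level values win, project defaults fill the gaps."""
    # Later entries override earlier ones, so step_config comes last.
    return {**project_defaults, **step_config}


# Hypothetical project-level defaults, as they might be set in the project Settings.
project_defaults = {"requirements_path": "requirements.txt", "included_folders": ["/"]}
# Step-level configuration overriding only one of the two values.
step_config = {"included_folders": ["/src"]}

print(resolve_container_config(step_config, project_defaults))
```

Here `included_folders` comes from the step, while `requirements_path` falls back to the project default.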


In [None]:
sdk.create_step(
    step_name="part-2-iristrain",
    function_path="src/part-2-irisModel.py",
    function_name="TrainIris",
    description="This function creates a classifier model for iris",
    container_config={
        "requirements_path": "requirements.txt",
        "included_folders": ["/src"],
    },
)

## Create a pipeline

In [None]:
sdk.create_pipeline(
    pipeline_name="part-2-iristrain",
    step_name="part-2-iristrain",
)

## Run your pipeline

In [None]:
sdk.run_pipeline(pipeline_name="part-2-iristrain")

## Check model creation

We can check the creation of the model by investigating the data store.

Using the `get_data_store_object_information` function introduced earlier, we can easily verify that the model has been created and uploaded successfully.

In [None]:
sdk.get_data_store_object_information("get_started/models/iris_knn_model.joblib")

Finally, to clean up the data store, we use the `delete_data_store_object` function.

In [None]:
sdk.delete_data_store_object("get_started/models/iris_knn_model.joblib")
sdk.delete_data_store_object("get_started/dataset/iris.parquet")