<a href="https://colab.research.google.com/github/vectice/vectice-examples/blob/master/Quick_references/Experiment_Quick_Reference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install Vectice

In [None]:
!pip install --quiet "vectice[github,gitlab,bitbucket]==2.2.3"

In [None]:
!pip show vectice

### Vectice Configuration

In [None]:
## Import the required packages
from vectice import Vectice
from vectice import Experiment
from vectice.api.json import ModelType
from vectice.api.json import JobType
from vectice.api.json import JobArtifactType
from vectice.api.json import ModelVersionStatus
from vectice.api.json import VersionStrategy
from vectice.api.json import FileMetadata
import logging
import os
logging.basicConfig(level=logging.INFO)

# Specify the API endpoint for Vectice.
# You can set your API endpoint here in the notebook, but we recommend adding it to a .env file instead.
os.environ['VECTICE_API_ENDPOINT'] = "Your API Endpoint"

# To use the Vectice Python library, you first need to authenticate your account using an API token.
# You can generate an API token in the Vectice UI: open "My Profile" (under your profile picture)
# and go to the "API Tokens" section.
# You can set your API token here in the notebook, but we recommend adding it to a .env file instead.
os.environ['VECTICE_API_TOKEN'] = "Your API Token"

# Project id: ID is a placeholder, replace it with the id of your own project
project_id = ID
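
The .env approach recommended in the comments above can be sketched as follows. This is a minimal illustration that assumes a file of plain `KEY=VALUE` lines; `load_env_file` is a hypothetical helper written for this example and is not part of Vectice. The `python-dotenv` package provides a more complete parser if you prefer a ready-made one.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper for illustration: blank lines and lines starting
    with '#' are skipped, and surrounding quotes are removed from values.
    """
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ[key] = value
        loaded[key] = value
    return loaded


# Demo with a temporary file so the example does not touch a real .env file.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        'VECTICE_API_ENDPOINT="Your API Endpoint"\n'
        'VECTICE_API_TOKEN="Your API Token"\n'
    )
    loaded = load_env_file(env_path)
```

Keeping the endpoint and token in a file that is excluded from version control avoids committing credentials together with the notebook.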

### Create your experiment

Experiment is a high-level API dedicated to capturing your Job and its Runs. The Experiment class comes with features that make Vectice easier to use when managing a single Job and all its entities.

The Experiment can also be used with the context manager syntax (Python's `with` keyword). In this case, the end of the run is managed automatically.
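
To illustrate what "managed automatically" means for a context manager in general, here is a small sketch. `DemoRun` is a hypothetical stand-in and not the Vectice `Experiment` class; in particular, marking the run as failed when an exception occurs is an assumption made for this example.

```python
class DemoRun:
    """Hypothetical stand-in for a run; this is not the Vectice Experiment class."""

    def __init__(self, job):
        self.job = job
        self.status = "created"

    def start(self):
        self.status = "started"

    def complete(self):
        self.status = "completed"

    def __enter__(self):
        # Entering the with block starts the run.
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block ends the run, so no explicit complete() call is needed.
        # Assumption: an exception marks the run as failed instead of completed.
        if exc_type is None:
            self.complete()
        else:
            self.status = "failed"
        return False  # never swallow the exception


with DemoRun(job="New Experiment") as run:
    status_inside = run.status  # the run is started inside the block

status_after = run.status  # the run has been ended when the block exits
```

The benefit of this pattern is that the run is closed even if you forget the final call or the block exits early.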

When an Experiment is initialized, it:

- Creates an instance of Vectice to create Artifacts and an instance of the Client to make API calls.

- Gets or creates the job linked to the experiment.

- Manages the creation of Runs for this job.

- Manages the input and output artifacts. By default, every artifact used before starting the experiment is considered as an _input_ of the job, and every artifact added after starting the experiment is considered as an _output_.

- Besides the CRUD naming conventions, Experiment uses a few specific naming conventions to keep its usage as simple as possible.

    - add methods:
        - add_dataset_version() adds the defined dataset version to the experiment

        - add_model_version() adds the defined model version to the experiment

    - log methods:
        - log_hyper_parameters() adds a set of hyperparameters to the specified model of the run

        - log_metrics() adds a set of metrics to the specified model of the run. The metrics are timestamped key/value pairs
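
As an illustration of what "timestamped key/value pairs" can look like, the sketch below turns a metrics dictionary into timestamped records. The record field names (`key`, `value`, `timestamp`) and the helper `timestamp_metrics` are assumptions made for this example; they do not describe how Vectice stores metrics internally.

```python
from datetime import datetime, timezone


def timestamp_metrics(metrics, when=None):
    """Turn a {name: value} dict into a list of timestamped key/value records.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    when = when or datetime.now(timezone.utc)
    return [
        {"key": key, "value": value, "timestamp": when.isoformat()}
        for key, value in metrics.items()
    ]


# A fixed timestamp keeps the example reproducible.
records = timestamp_metrics(
    {"key": 10, "key2": 100},
    when=datetime(2022, 1, 1, tzinfo=timezone.utc),
)
```

Because each value carries its own timestamp, logging the same metric several times yields a history rather than a single overwritten value.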

When creating an experiment, we need to specify:

1) A job (mandatory): Job names are unique within each project

2) A project id or name: Only the project id is needed, but if you want to use the project name, the workspace name or id is also required.

3) A job type (optional)


Job names and job types are useful to group and search runs in the Vectice UI.

In [None]:
# auto_code=True tracks the git changes to your code automatically every time you execute a run (see below).
# Only the project id is needed, but if you want to use the project name, the workspace name or id is also required.
experiment = Experiment(job="New Experiment", project=project_id, job_type=JobType.PREPARATION, auto_code=True)

**You can see all the created assets and artifacts in the Vectice app. You can find them under "My Recent Activity" on the home page, by typing their names in the global search, or by opening the corresponding asset/artifact section of your project.**

### Create a dataset and a model to use

In [None]:
## Creating datasets and models to be used to test model and dataset versions
dataset = experiment.vectice.create_dataset(name="Dataset")
model = experiment.vectice.create_model(name="Model")

You can see in the Vectice app that a new dataset called **Dataset** and a model called **Model** have been created in your project.

### Dataset versions

Experiment enables you to add, get, list, update and delete dataset versions.

#### Add dataset version

1- Using auto-versioning (the default): The Vectice library automatically detects whether the dataset you are using has changed. If it detects changes, it generates a new version of your dataset automatically; otherwise it uses the last version.

2- Turning off auto-versioning (version_strategy=VersionStrategy.MANUAL): In this case you can add resources and declare your files' metadata when adding a dataset version to a dataset.

- If you specify a version name:
    - If the version doesn't exist in the dataset: A new dataset version will be created with the given name and used in the experiment.
    - If the version exists: The existing version will be used in the experiment.

- If you don't specify a version name: A new version will be created and used in the experiment.
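
The auto-versioning rule (new version on change, last version otherwise) can be pictured with the sketch below. How the Vectice library actually detects changes is not described here, so comparing a SHA-256 fingerprint of the content is an assumption, and `resolve_version` and the "Version N" names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def resolve_version(versions, content):
    """Reuse the last version if the content is unchanged, else append a new one.

    `versions` is a list of {"name", "fingerprint"} dicts, oldest first,
    and is modified in place. Hypothetical helper for illustration.
    """
    digest = fingerprint(content)
    if versions and versions[-1]["fingerprint"] == digest:
        return versions[-1]  # no change detected: reuse the last version
    new_version = {"name": f"Version {len(versions) + 1}", "fingerprint": digest}
    versions.append(new_version)
    return new_version


versions = []
first = resolve_version(versions, b"id,price\n1,100\n")
unchanged = resolve_version(versions, b"id,price\n1,100\n")
changed = resolve_version(versions, b"id,price\n1,100\n2,250\n")
```

In this sketch the second call returns the same version as the first, and only the third call, which sees different content, creates a new one.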

##### Using auto-versioning

In [None]:
#The Vectice library automatically detects if there have been changes to the dataset you are using.
#If it detects changes, it will generate a new version of your dataset automatically. 

experiment.add_dataset_version(dataset="Dataset", version_strategy=VersionStrategy.AUTOMATIC)
experiment.start()
experiment.complete()

You can see in the Vectice app that a new run has been created and a dataset version of the dataset **Dataset** is attached to the run.

##### Turning off auto-versioning

###### Add a dataset version with resources

- You can turn off auto-versioning and version manually.
- In this case, you can add resources.
- To add resources to a dataset, the dataset must have a connection linked to it so that resources can be added from the storage.
- You can see how to create a connection in Vectice [here](https://doc.vectice.com/connections/index.html).

In [None]:
## We create a dataset with a connection
dataset_with_connection = experiment.vectice.create_dataset(name="dataset_with_connection", connection="connection name")

You can see in the Vectice app that a new dataset called **dataset_with_connection** has been created in your project.

In [None]:
## Adding a dataset version with resources
## Resources: here, we use a file stored in GCS
## You can use your own files
resources = ["gs://vectice_tutorial/kc_house_data_cleaned.csv"]
experiment.add_dataset_version(dataset="dataset_with_connection", version_name="My_version", description="new desc",version_strategy=VersionStrategy.MANUAL,
                        is_starred=True, properties={"key":"value"}, resources=resources)
experiment.start()
experiment.complete()

You can see in the Vectice app that a new run has been created and a new dataset version called **My_version** of the dataset **dataset_with_connection** has been created and attached to the run.

###### Add a dataset version and declare files metadata

- You can turn off auto-versioning and version manually.
- In this case, you can declare your files' metadata manually.
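
If the files you want to declare exist locally, their names and sizes can be read from disk instead of being typed by hand. The sketch below is an illustration only: `describe_files` is a hypothetical helper, and treating the size as a number of bytes is an assumption.

```python
import tempfile
from pathlib import Path


def describe_files(paths):
    """Return the name and size in bytes of each local file.

    Hypothetical helper for illustration; it is not part of Vectice.
    """
    return [
        {"name": Path(path).name, "size": Path(path).stat().st_size}
        for path in paths
    ]


# Demo with a temporary 100-byte file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "File.txt"
    sample.write_bytes(b"x" * 100)
    described = describe_files([sample])

# Each entry could then feed a declaration such as
# FileMetadata(name=entry["name"], size=entry["size"]).
```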

In [None]:
experiment.add_dataset_version(dataset="Dataset", version_name="new_version", description="new desc",version_strategy=VersionStrategy.MANUAL,
                        is_starred=True, properties={"key":"value"}, metadata=[FileMetadata(name="File.txt", size=100)])
experiment.start()
experiment.complete()

You can see in the Vectice app that a new run has been created and a new dataset version called **new_version** of the dataset **Dataset** has been created and attached to the run.

#### List dataset versions

In [None]:
## You can get the list of your dataset versions by providing the dataset name or id
experiment.list_dataset_versions(dataset="Dataset")

#### Get dataset versions

In [None]:
## You can get a dataset version by providing only the version id.
## If you specify the version name instead, the dataset name or id is also required.
version = experiment.get_dataset_version(version="new_version", dataset="Dataset")
version

#### Update dataset version

In [None]:
## You can update a dataset version's information by providing only the dataset version id.
## If you specify the version name instead, the dataset name or id is also required.
experiment.update_dataset_version(version="new_version", dataset="Dataset", name="new_version2", description="New_desc", is_starred=True,
                           properties={"key": "new_value"})

You can see in the Vectice app that the dataset version has been updated.

#### Delete dataset version

In [None]:
## You can delete a dataset version by providing only the version id.
## If you specify the version name instead, the dataset name or id is also required.
experiment.delete_dataset_version(version="new_version2", dataset="Dataset")

You can see in the Vectice app that the dataset version has been deleted.

### Model versions

Experiment enables you to add, get, list, update and delete model versions.

In [None]:
## We initialize a new experiment for the model versions
# Only the project id is needed, but if you want to use the project name, the workspace name or id is also required.
experiment = Experiment(job="experiment 2", project=project_id, job_type=JobType.TRAINING)

#### Add model version

We can create a model version and declare its name, description, metrics, hyperparameters, status, the algorithm used, whether the version is starred (not starred by default), and attachments.

1- You can use the model name or id.

2- If you use the model name:

- If the model already exists in the project: It will be used in the experiment.

- If the model doesn't exist in the project: A new model will be created with the given name and used in the experiment.

3- If you specify a version name:
  - If the version doesn't exist in the model: A new model version will be created with the given name and used in the experiment.
  - If the version exists: The existing version will be used in the experiment.

4- If you don't specify a version name: A new version will be created and used in the experiment.
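
The four rules above amount to a "get or create" lookup, first on the model and then on the version. The sketch below models that decision with a plain dictionary. `get_or_create_version` is a hypothetical helper, and the "Version N" default name is an assumption for this example, not the naming used by Vectice.

```python
def get_or_create_version(models, model_name, version_name=None):
    """Return (version_name, created) following the get-or-create rules.

    `models` maps a model name to its list of version names and is modified
    in place. Hypothetical helper for illustration.
    """
    # Rule 2: an unknown model name creates the model.
    versions = models.setdefault(model_name, [])
    # Rule 3: an existing version name is reused as-is.
    if version_name is not None and version_name in versions:
        return version_name, False
    # Rule 4: without a name, pick the first unused default name (assumption).
    if version_name is None:
        number = len(versions) + 1
        while f"Version {number}" in versions:
            number += 1
        version_name = f"Version {number}"
    versions.append(version_name)
    return version_name, True


models = {"Model": ["Version 1"]}
reused = get_or_create_version(models, "Model", "Version 1")
named = get_or_create_version(models, "Model", "baseline")
unnamed = get_or_create_version(models, "Model")
new_model = get_or_create_version(models, "Other model")
```

The second element of each result tells you whether something new was created, which is the distinction the rules above draw.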

In [None]:
## Hyperparameters
hyper_parameters = {"key": 10, "key2": "value"}
## Metrics
metrics = {"key": 10, "key2": 100}
experiment.start()

experiment.add_model_version(model="Model", algorithm="algo", status=ModelVersionStatus.STAGING,
                          hyper_parameters=hyper_parameters, metrics=metrics)
## You can also add attachments if you want

experiment.complete()

You can see in the Vectice app that a new version of the model **Model** has been created and attached to the created run.

##### Log metrics

You can also log metrics to a new or existing model version

In [None]:
## You can log metrics to a model version by providing only the version id
## If you specify the version name, the model name or id is required also.
experiment.log_metrics(model_version="version 1", metrics={"key3": 1, "key4":2}, model="Model")

You can see in the Vectice app that the metrics have been successfully added to the version "version 1" of the model **Model**

##### Log hyperparameters

You can log hyper parameters to a new or existing model version

In [None]:
## You can log hyper parameters to a model version by providing only the version id
## If you specify the version name, the model name or id is required also.
experiment.log_hyper_parameters(model_version="version 1", hyper_parameters={"key3": "1", "key4":2}, model="Model")

You can see in the Vectice app that the hyper parameters have been successfully added to the version "version 1" of the model Model

#### Add model version attachments

You can also add attachments to a new or existing model version

In [None]:
## You can add attachments to a model version by providing only the version id.
## If you specify the version name instead, the model name or id is also required.
## Note: `attachment` is not defined in this notebook; set it to your own file before running this cell.
experiment.add_model_version_attachment(file=[attachment], model_version="version 1", model="Model")

You can see in the Vectice app that the attachments have been successfully added to the version "version 1" of the model Model

#### Get model version

In [None]:
## You can get a model version by providing only the version id
## If you specify the version name, the model name or id is required also.
experiment.get_model_version(version="version 1", model="Model")

#### List model versions

In [None]:
## You can get the list of model versions by providing the model name or id.
experiment.list_model_versions(model="Model")

#### List model version attachments, metrics and hyperparameters

- You can get the list of a model version's attachments, metrics, and hyperparameters by providing only the version id.

- If you specify the version name instead, the model name or id is also required.
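
The reason a version name needs its parent while an id does not can be shown with a small lookup sketch: an id identifies one version on its own, whereas the same name can exist under several parents. `find_version` is a hypothetical helper, and representing ids as integers and names as strings is an assumption made for this example.

```python
def find_version(store, version, model=None):
    """Look up a version by id (int) or by name (str) plus its parent model.

    `store` is a list of {"id", "name", "model"} dicts.
    Hypothetical helper for illustration.
    """
    if isinstance(version, int):
        matches = [item for item in store if item["id"] == version]
    else:
        if model is None:
            raise ValueError("a version name is only unique within its model")
        matches = [
            item for item in store
            if item["name"] == version and item["model"] == model
        ]
    if not matches:
        raise LookupError(f"no version matches {version!r}")
    return matches[0]


store = [
    {"id": 1, "name": "version 1", "model": "Model"},
    {"id": 2, "name": "version 1", "model": "Other model"},
]

by_id = find_version(store, 2)
by_name = find_version(store, "version 1", model="Model")

try:
    find_version(store, "version 1")
    missing_parent_error = None
except ValueError as error:
    missing_parent_error = str(error)
```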

In [None]:
experiment.list_model_version_attachments(model_version="version 1", model="Model")

In [None]:
experiment.list_metrics(model_version="version 1", model="Model")

In [None]:
experiment.list_hyper_parameters(model_version="version 1", model="Model")

#### List model versions dataframe

You can also retrieve all the model versions you created in previous runs as a dataframe, for offline analysis and to understand in more detail what drives the models' performance.

In [None]:
## You can get a dataframe of a model's versions by providing the model name or id
experiment.list_model_versions_dataframe(model="Model")

#### Update model version

You can update a model version's name, description, metrics, hyperparameters, status, algorithm, starred flag, and attachments.

In [None]:
## You can update a model version's information by providing only the model version id.
## If you specify the version name instead, the model name or id is also required.

## Updated Hyperparameters
new_hyper_parameters = {"key": 100, "key2": "value 2"}
## Updated metrics
new_metrics = {"key": 100, "key2": 1000}

experiment.update_model_version(version="version 1", model="Model", name="new version", description="new desc", status=ModelVersionStatus.PRODUCTION,
                         algorithm="new algo", is_starred=True, metrics=new_metrics, hyper_parameters=new_hyper_parameters)

#### Delete model version attachments, metrics and hyper parameters

- You can delete a model version's attachments, metrics, and hyperparameters by providing only the version id.
- If you specify the version name instead, the model name or id is also required.

In [None]:
## Note: `attachment` is not defined in this notebook; set it before running this cell.
experiment.delete_model_version_attachment(file=attachment, model_version="new version", model="Model")

In [None]:
experiment.delete_metrics(model_version="new version", model="Model", metrics=new_metrics)

In [None]:
experiment.delete_hyper_parameters(model_version="new version", model="Model", keys=["key"])

#### Delete model version

In [None]:
## You can delete a model version by providing only the version id.
## If you specify the version name instead, the model name or id is also required.
experiment.delete_model_version(version="new version", model="Model")

### Code versions

#### Create code version

Vectice enables you to track your source code by creating code versions. This can be done automatically or manually.

##### Creating a code version automatically


If you work in a local environment with Git installed (or in JupyterLab, etc.), code tracking can be automated by setting auto_code=True when creating the Experiment, as done at the beginning of this notebook.

##### Creating a code version manually

You can manually create a code version for code hosted on GitHub, GitLab, or Bitbucket by using **experiment.add_code_version_uri()**.

- For this, you need to specify a git_uri composed of the GitHub (or GitLab or Bitbucket) address followed by /project/repository, and an entrypoint composed of the folder/file path in the repository.
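
To make the expected shape of these two arguments concrete, the sketch below splits a git_uri into host, project and repository, and an entrypoint into folder and file. Both helpers are hypothetical and only illustrate the structure described above; they are not how Vectice parses these values.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_git_uri(git_uri):
    """Split a repository URL into host, project and repository."""
    parsed = urlparse(git_uri)
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not parsed.netloc or len(segments) < 2:
        raise ValueError(f"expected host/project/repository, got {git_uri!r}")
    repository = segments[1]
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return {"host": parsed.netloc, "project": segments[0], "repository": repository}


def split_entrypoint(entrypoint):
    """Split an entrypoint into its folder and file parts."""
    path = PurePosixPath(entrypoint)
    return {"folder": str(path.parent), "file": path.name}


uri_parts = split_git_uri("https://github.com/vectice/vectice-examples")
entry_parts = split_entrypoint("Quick_References/Experiment_Quick_Reference.ipynb")
```

Checking both values this way before creating a code version can catch a missing repository name or a mistyped path early.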

In [None]:
## Example with GitHub
## You can do the same for your files in GitLab or Bitbucket
experiment.add_code_version_uri(git_uri="https://github.com/vectice/vectice-examples", entrypoint="Quick_References/Experiment_Quick_Reference.ipynb")
experiment.start()
experiment.complete()

You can see in the Vectice application that a new run has been created and the code version has been attached to this run.

#### List code versions

You can also get the list of the code versions created in your project by using **list_code_versions()**

In [None]:
experiment.vectice.list_code_versions()

#### Get code version

You can get a code version by using **get_code_version()**

In [None]:
experiment.get_code_version(code_version="Version 1")

#### Delete code version

You can delete a code version by using **delete_code_version()**

In [None]:
experiment.delete_code_version(code_version="Version 1")

### Run

- By default, every artifact added to the experiment before start() is considered an _input_ of the job, and every artifact added after start() is considered an _output_.
- You can still specify inputs manually when you start your experiment, and outputs when you complete it.
- The inputs can be code, dataset and model versions; the outputs can be dataset and model versions.
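
The default rule and the manual override can be summarized with the sketch below. `ArtifactTracker` is a hypothetical stand-in that uses plain strings; it only illustrates the classification logic and is not the Vectice implementation.

```python
class ArtifactTracker:
    """Hypothetical stand-in that classifies artifacts as inputs or outputs."""

    def __init__(self):
        self.started = False
        self.inputs = []
        self.outputs = []

    def start(self):
        self.started = True

    def add(self, artifact, artifact_type=None):
        # Default rule: before start() means input, after start() means output.
        if artifact_type is None:
            artifact_type = "OUTPUT" if self.started else "INPUT"
        if artifact_type not in ("INPUT", "OUTPUT"):
            raise ValueError(f"unknown artifact type: {artifact_type!r}")
        target = self.inputs if artifact_type == "INPUT" else self.outputs
        target.append(artifact)


tracker = ArtifactTracker()
tracker.add("dataset version")                         # before start: input
tracker.add("model version", artifact_type="OUTPUT")   # explicit override
tracker.start()
tracker.add("second model version")                    # after start: output
```

An explicit type always wins over the timing-based default, which is what lets you declare every artifact before starting the run.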


In [None]:
## We create a new experiment
experiment = Experiment(job="Experiment 3", project=project_id, job_type=JobType.TRAINING, auto_code=True)

#### Start() and complete()



By default, every artifact added to the experiment before start() is considered an _input_ of the job, and every artifact added after start() is considered an _output_.

In [None]:
## We add a dataset version and a code version to the experiment before starting it
experiment.add_dataset_version(dataset="Dataset", version_strategy=VersionStrategy.AUTOMATIC)
experiment.add_code_version_uri(git_uri="https://github.com/vectice/vectice-examples", entrypoint="Quick_References/Experiment_Quick_Reference.ipynb")

In [None]:
## We start the experiment
experiment.start()

You can see in the Vectice app that a new run has been created with "Started" status and it has the added dataset and code versions attached as inputs.

In [None]:
## We add a model version to the experiment
hyper_parameters = {"key": 10, "key2": "value"}
## Metrics
metrics = {"key": 10, "key2": 100}
experiment.add_model_version(model="Model", algorithm="algorithm", status=ModelVersionStatus.STAGING,
                          hyper_parameters=hyper_parameters, metrics=metrics)

In [None]:
## We complete the experiment
experiment.complete()

You can see in the application that the run status has been updated to "Completed" and the added model version is attached as an output to the run.

You can also create all your artifacts before starting the experiment and specify manually whether you want to use them as inputs or outputs.

In [None]:
## We add a dataset and code versions to the experiment and declare that we want to attach them as inputs of our experiment run
experiment.add_dataset_version(dataset="Dataset", version_strategy=VersionStrategy.AUTOMATIC, artifact_type=JobArtifactType.INPUT)
experiment.add_code_version_uri(git_uri="https://github.com/vectice/vectice-examples",
                                entrypoint="Quick_References/Experiment_Quick_Reference.ipynb", artifact_type=JobArtifactType.INPUT)

In [None]:
## We add a model version to the experiment and declare that we want to use it as output of our experiment run
experiment.add_model_version(model="Model", algorithm="algorithm", status=ModelVersionStatus.STAGING,
                          hyper_parameters=hyper_parameters, metrics=metrics, artifact_type=JobArtifactType.OUTPUT)

In [None]:
## We can start the experiment run
experiment.start()
## Do something...
## Complete the experiment
experiment.complete()

We can see in the Vectice app that a new run has been created and that it uses the specified inputs and outputs.