## Creating a project from an existing dataset
In this notebook we'll use the `sc-api-tools` package to create a project from an existing dataset, and upload images and annotations to it.

### Setting up the connection to the platform
First, we set up the connection to the cluster. This is done by instantiating a client with the hostname (or IP address) and login details for the cluster.

In [1]:
from sc_api_tools import SCRESTClient

client = SCRESTClient(
    host='https://10.55.252.37/',
    username='admin@sc-project.intel.com',
    password='@SCAdmin'
)

Authenticating on host https://10.55.252.37/...
Authentication successful. Cookie received.
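
The cell above hard-codes the login details in the notebook. As a minimal sketch of one alternative, the connection settings could be read from environment variables instead. The variable names `SC_HOST`, `SC_USERNAME` and `SC_PASSWORD` and the helper `load_credentials` are hypothetical and not part of `sc-api-tools`; only the `host`, `username` and `password` arguments are taken from the cell above.

```python
import os


def load_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    required = ("SC_HOST", "SC_USERNAME", "SC_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Keys match the keyword arguments passed to SCRESTClient above
    return {
        "host": env["SC_HOST"],
        "username": env["SC_USERNAME"],
        "password": env["SC_PASSWORD"],
    }


# Possible usage, assuming the three variables are set:
# client = SCRESTClient(**load_credentials())
```

This keeps the password out of the saved notebook, at the cost of one extra setup step before starting Jupyter.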


### Getting the COCO dataset
In the next cell, we get the path to the MS COCO dataset. 

If you already have the COCO dataset on your machine, please specify the `dataset_path` to point to the folder containing the dataset. 

If you don't have the dataset yet, the `get_coco_dataset` method will attempt to download it. Even though it only downloads the 2017 validation subset, this is still a ~1 GB download, so it may take some time depending on your internet connection. 

Of course the data will only be downloaded once; if you have downloaded the dataset previously, the method should detect it and return the path to the data.

In [2]:
from sc_api_tools.demos import get_coco_dataset

COCO_PATH = get_coco_dataset(dataset_path=None)

COCO dataset (subset: val2017) found at path c:\users\ljcornel\pycharmprojects\frameworks.ai.interactive-ai-workflow.sonoma-creek-api-tools\data
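
The "download only once" behaviour described above boils down to checking whether the data is already on disk before fetching it. The sketch below illustrates that kind of check. It is not the implementation used by `get_coco_dataset`: the helper name `coco_val2017_present` and the folder layout (`images/val2017` plus `annotations/instances_val2017.json`) are assumptions made for illustration.

```python
from pathlib import Path


def coco_val2017_present(dataset_path):
    """Return True if the assumed val2017 layout exists under dataset_path."""
    root = Path(dataset_path)
    image_dir = root / "images" / "val2017"  # assumed image location
    annotation_file = root / "annotations" / "instances_val2017.json"  # assumed
    # Require at least one image, not just an empty folder
    has_images = image_dir.is_dir() and any(image_dir.glob("*.jpg"))
    return has_images and annotation_file.is_file()
```

A caller would only start the download when this check returns `False`.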


### Reading the dataset
Next, we need to load the COCO dataset using Datumaro. The `sc-api-tools` package provides the `DatumAnnotationReader` class to do so. It can read datasets in all formats supported by Datumaro.

In [3]:
from sc_api_tools.annotation_readers import DatumAnnotationReader

annotation_reader = DatumAnnotationReader(
    base_data_folder=COCO_PATH,
    annotation_format='coco'
)

Datumaro dataset consisting of 5000 items in coco format was loaded from c:\users\ljcornel\pycharmprojects\frameworks.ai.interactive-ai-workflow.sonoma-creek-api-tools\data
Datumaro dataset was created in 13.3 seconds


### Selecting the labels
The MS COCO dataset contains 80 different classes, and while we could create a project including all of them, for this demo we'll select only a couple of them. This is done using the `filter_dataset` method of the annotation reader.

In [4]:
annotation_reader.filter_dataset(labels=['dog', 'cat', 'horse'], criterion='OR')

After filtering, dataset with labels ['dog', 'cat', 'horse'] contains 473 items.
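
To make the `'OR'` criterion concrete: an item is kept if it carries at least one of the requested labels. The toy example below illustrates that rule on plain dictionaries. It is only a sketch of the selection logic, not the code that `filter_dataset` runs, and the function `filter_items_or` and the item structure are made up for this illustration.

```python
def filter_items_or(items, labels):
    """Keep items that have at least one of the requested labels."""
    wanted = set(labels)
    return [item for item in items if wanted & set(item["labels"])]


# Toy dataset: each item lists the labels that occur in it
items = [
    {"id": 1, "labels": ["dog", "person"]},
    {"id": 2, "labels": ["car"]},
    {"id": 3, "labels": ["cat", "horse"]},
]

kept = filter_items_or(items, ["dog", "cat", "horse"])
print([item["id"] for item in kept])  # [1, 3]
```

Item 2 is dropped because it contains none of the three labels, while items 1 and 3 each match at least one.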


### Creating the project
Now that we have a selection of data we'd like to upload, we get to create the project. The COCO dataset is best suited for detection or segmentation type projects. 

To create the project, we'll use the `create_single_task_project_from_dataset` method of the `client` that we set up previously. This will not only create the project, but also upload the media and annotations from our dataset. 

The project name and type can be set via their respective input parameters, `project_name` and `project_type`. The number of images that are uploaded and the number that are annotated can be controlled as well, via `number_of_images_to_upload` and `number_of_images_to_annotate`. Finally, if `enable_auto_train` is set to `True`, the project will start training right after all annotations have been uploaded (provided that sufficient images have been annotated to trigger auto-training).

In [5]:
project = client.create_single_task_project_from_dataset(
    project_name='COCO demo project',
    project_type='detection',
    path_to_images=COCO_PATH,
    annotation_reader=annotation_reader,
    number_of_images_to_upload=100,
    number_of_images_to_annotate=90,
    enable_auto_train=False
)

Project created successfully.
Starting image upload...
Uploading... 100 images uploaded successfully.
Upload complete. Uploaded 100 new images in 21.3 seconds.
Annotations have been converted to boxes
Dataset is prepared for detection task.
Starting image annotation upload...
Upload complete. Uploaded 90 new image annotations
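
The log line "Annotations have been converted to boxes" refers to preparing the annotations for a detection task. How the package does this internally is not shown here. As a rough sketch of the idea, a COCO-style polygon (a flat list `[x1, y1, x2, y2, ...]`) can be reduced to its enclosing box in COCO's `[x, y, width, height]` convention. The function name `polygon_to_bbox` is hypothetical.

```python
def polygon_to_bbox(polygon):
    """Return the axis-aligned box [x, y, width, height] around a flat polygon."""
    xs = polygon[0::2]  # every even index is an x coordinate
    ys = polygon[1::2]  # every odd index is a y coordinate
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# A rectangle given as a four-point polygon
print(polygon_to_bbox([10.0, 20.0, 50.0, 20.0, 50.0, 80.0, 10.0, 80.0]))
# [10.0, 20.0, 40.0, 60.0]
```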


That's it! A new project named `COCO demo project` should now appear in your workspace. To check its properties, we can print an overview of it in the cell below.

In [None]:
print(project.overview)

Project: COCO demo project
  Task 1: Detection task
    Labels: ['dog', 'cat', 'horse', 'No Object']

