# Creating a project from an existing dataset
In notebook [002 create project from dataset](002_create_project_from_dataset.ipynb), we saw how to create a single-task project from an existing dataset and upload images and annotations to it.

In this notebook, we will build on that and create a pipeline project with two chained tasks instead. We will derive the annotations for both tasks from the same dataset, by grouping some of its classes into new labels.

In [None]:
# As usual, we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env()

geti = Geti(server_config=geti_server_configuration)

### Getting the COCO dataset
As in notebook [002](002_create_project_from_dataset.ipynb), we will use the MS COCO dataset. The cell below retrieves the path to it.

In [None]:
from geti_sdk.demos import get_coco_dataset

COCO_PATH = get_coco_dataset(dataset_path=None)

### Reading the dataset
As before, we will create an annotation reader to read the dataset. However, since we will be annotating two different tasks this time, we need two annotation readers. Each reader provides the annotations for one of the tasks in the pipeline, with its own set of labels.

In [None]:
from geti_sdk.annotation_readers import DatumAnnotationReader

annotation_reader_task_1 = DatumAnnotationReader(
    base_data_folder=COCO_PATH, annotation_format="coco"
)
annotation_reader_task_2 = DatumAnnotationReader(
    base_data_folder=COCO_PATH, annotation_format="coco"
)

### Selecting labels and project type
As before, we will use a subset of the COCO dataset for simplicity. We will create a task-chain project of type `detection_to_classification`, which requires preparing the annotation readers in a specific way.

We will use a single label, 'animal', for the detection task. The classification task will use the labels 'domestic' and 'wild', to try to discriminate between these two groups of animals.

First, we have to specify which animals we consider 'domestic' and which 'wild'. This is done in the cell below.

In [None]:
domestic_animals = ["dog", "cat", "horse"]
wild_animals = ["elephant", "giraffe"]
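Because each animal should end up in exactly one classification group, it can be worth checking that the two lists do not overlap before preparing the readers. The helper below, `build_class_to_group`, is a hypothetical pure-Python sketch (not part of `geti_sdk`): it inverts the group lists into a per-class lookup and raises an error if a class is assigned to more than one group.

```python
from typing import Dict, List


def build_class_to_group(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each fine-grained class name to its group name.

    Hypothetical helper for illustration only; raises ValueError if a
    class is assigned to more than one group.
    """
    class_to_group: Dict[str, str] = {}
    for group_name, class_names in groups.items():
        for class_name in class_names:
            if class_name in class_to_group:
                raise ValueError(
                    f"'{class_name}' is in both '{class_to_group[class_name]}' "
                    f"and '{group_name}'"
                )
            class_to_group[class_name] = group_name
    return class_to_group


domestic_animals = ["dog", "cat", "horse"]
wild_animals = ["elephant", "giraffe"]

class_to_group = build_class_to_group(
    {"domestic": domestic_animals, "wild": wild_animals}
)
print(class_to_group)
# {'dog': 'domestic', 'cat': 'domestic', 'horse': 'domestic', 'elephant': 'wild', 'giraffe': 'wild'}
```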

#### Preparing the detection annotation reader
Now that we know the labels of interest, we can filter the dataset for the detection annotation reader and group the classes into a single 'animal' label.

In [None]:
all_labels = domestic_animals + wild_animals
annotation_reader_task_1.filter_dataset(labels=all_labels, criterion="OR")
annotation_reader_task_1.group_labels(labels_to_group=all_labels, group_name="animal")
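To make the two calls above more concrete, here is a toy, pure-Python model of what we understand them to do: with `criterion="OR"`, an image is kept if it contains at least one of the requested labels, and grouping then renames every listed class to the group name. The in-memory `dataset` (image name mapped to the class names of its objects) and the functions `filter_or` and `group` are hypothetical illustrations, not the `DatumAnnotationReader` implementation. In particular, we assume that objects whose class is not in the filter list are dropped from kept images; check the SDK documentation for how it actually treats them.

```python
from typing import Dict, List

# Toy stand-in for a dataset: image name -> class names of the objects in it
dataset: Dict[str, List[str]] = {
    "img_1.jpg": ["dog", "person"],
    "img_2.jpg": ["car"],
    "img_3.jpg": ["giraffe", "elephant"],
}


def filter_or(data: Dict[str, List[str]], labels: List[str]) -> Dict[str, List[str]]:
    """Keep images with at least one wanted label, and only those objects (assumed)."""
    wanted = set(labels)
    filtered: Dict[str, List[str]] = {}
    for image, classes in data.items():
        kept = [c for c in classes if c in wanted]
        if kept:  # "OR": a single matching object is enough to keep the image
            filtered[image] = kept
    return filtered


def group(
    data: Dict[str, List[str]], labels_to_group: List[str], group_name: str
) -> Dict[str, List[str]]:
    """Rename every object whose class is in labels_to_group to group_name."""
    members = set(labels_to_group)
    return {
        image: [group_name if c in members else c for c in classes]
        for image, classes in data.items()
    }


all_labels = ["dog", "cat", "horse", "elephant", "giraffe"]
task_1 = group(filter_or(dataset, all_labels), all_labels, "animal")
print(task_1)  # {'img_1.jpg': ['animal'], 'img_3.jpg': ['animal', 'animal']}
```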

#### Preparing the classification annotation reader
For the classification task, we apply the same filter, but group the classes differently, into the 'domestic' and 'wild' labels.

In [None]:
annotation_reader_task_2.filter_dataset(labels=all_labels, criterion="OR")
annotation_reader_task_2.group_labels(
    labels_to_group=domestic_animals, group_name="domestic"
)
annotation_reader_task_2.group_labels(labels_to_group=wild_animals, group_name="wild")
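After these calls, both readers describe the same objects, but with different vocabularies: the first labels every object 'animal', while the second labels it 'domestic' or 'wild'. The pure-Python sketch below pairs the two views for a few objects, as a mental model of what each task in the detection-to-classification chain is meant to learn. `labels_per_task` and `object_classes` are hypothetical names for illustration; how the SDK merges the two readers' annotations during upload is not modeled here.

```python
from typing import List, Tuple

domestic_animals = ["dog", "cat", "horse"]
wild_animals = ["elephant", "giraffe"]


def labels_per_task(class_name: str) -> Tuple[str, str]:
    """Return (detection label, classification label) for one COCO class."""
    if class_name in domestic_animals:
        return "animal", "domestic"
    if class_name in wild_animals:
        return "animal", "wild"
    raise KeyError(f"'{class_name}' is not one of the selected animals")


# Hypothetical objects found in one image, listed by their original COCO class
object_classes: List[str] = ["dog", "giraffe", "cat"]

for class_name in object_classes:
    detection_label, classification_label = labels_per_task(class_name)
    print(f"{class_name}: task 1 -> {detection_label}, task 2 -> {classification_label}")
```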

## Creating the project
Now that we have chosen the project type and prepared the annotation readers, we can use the `Geti` instance to create the project and upload the images and annotations.

The `Geti` class provides a convenience method, `geti.create_task_chain_project_from_dataset`, that creates the project and uploads the images and annotations in one go. It is very similar to the method we used to create a project in notebook [002](002_create_project_from_dataset.ipynb), but instead of an `annotation_reader` parameter it takes a `label_source_per_task` parameter. This parameter expects a list of label sources, with one entry per task. For each task, the source can be an annotation reader, a list of label names, or a list of dictionaries specifying label properties, as we used at the end of notebook [001](001_create_project.ipynb).

Passing a list of label names or label properties can be useful if you do not have annotations available for one of the tasks in the pipeline, but you do know which labels that task should have and plan to annotate it through the platform.
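The three accepted forms for each entry of `label_source_per_task` can be summarized in a small sketch. Here, `FakeReader` is a hypothetical stand-in for an annotation reader (we assume a reader can report its label names through a `get_all_label_names()` method), and `label_names_for_task` is a hypothetical helper that reduces any of the three forms to a plain list of label names. The `"name"` key in the dictionaries is also an assumption; see notebook 001 for the label properties the platform actually expects.

```python
from typing import Dict, List, Sequence, Union


class FakeReader:
    """Hypothetical stand-in for an annotation reader (not the geti_sdk class)."""

    def __init__(self, label_names: List[str]) -> None:
        self._label_names = label_names

    def get_all_label_names(self) -> List[str]:
        return list(self._label_names)


LabelSource = Union[FakeReader, Sequence[str], Sequence[Dict[str, str]]]


def label_names_for_task(source: LabelSource) -> List[str]:
    """Reduce one entry of label_source_per_task to its label names."""
    if hasattr(source, "get_all_label_names"):
        return source.get_all_label_names()  # an annotation reader
    names: List[str] = []
    for item in source:
        # Either a plain label name or a dict of label properties (assumed "name" key)
        names.append(item["name"] if isinstance(item, dict) else item)
    return names


# The detection task gets its labels from a reader; the classification task has
# no annotations yet, so its labels are given as property dicts instead.
label_source_per_task = [
    FakeReader(["animal"]),
    [{"name": "domestic"}, {"name": "wild"}],
]
print([label_names_for_task(src) for src in label_source_per_task])
# [['animal'], ['domestic', 'wild']]
```

A plain list such as `["domestic", "wild"]` would reduce to the same label names for the second task.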

In [None]:
PROJECT_NAME = "COCO multitask animal demo"
PROJECT_TYPE = "detection_to_classification"

project = geti.create_task_chain_project_from_dataset(
    project_name=PROJECT_NAME,
    project_type=PROJECT_TYPE,
    path_to_images=COCO_PATH,
    label_source_per_task=[annotation_reader_task_1, annotation_reader_task_2],
    number_of_images_to_upload=100,
    number_of_images_to_annotate=90,
    enable_auto_train=True,
)
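The two count parameters interact. Reading their names literally, the call above uploads 100 images and annotates 90 of them, which would leave 10 unannotated images in the project that you could, for example, annotate in the platform later. The sketch below shows that split on a list of hypothetical file names; `split_upload` is a hypothetical helper, and taking the first N images is an assumption for illustration, not necessarily how the SDK selects them.

```python
from typing import List, Tuple


def split_upload(
    image_paths: List[str], n_upload: int, n_annotate: int
) -> Tuple[List[str], List[str]]:
    """Return (images uploaded with annotations, images uploaded without).

    Hypothetical helper: selecting the first N images is an assumption.
    """
    if n_annotate > n_upload:
        raise ValueError("cannot annotate more images than are uploaded")
    uploaded = image_paths[:n_upload]
    return uploaded[:n_annotate], uploaded[n_annotate:]


images = [f"image_{i:03d}.jpg" for i in range(250)]
annotated, unannotated = split_upload(images, n_upload=100, n_annotate=90)
print(len(annotated), len(unannotated))  # 90 10
```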

That's it! The project has been created and should now have started training the detection task. Let's have a look at the project summary, even though there should not be any surprises there at this point.

In [None]:
print(project.summary)