# FiftyOne Dataset Preparation
---



# Dataset

Training uses the Cityscapes dataset from [Cityscapes](https://www.cityscapes-dataset.com). CamVid is an optional alternative, but for this one I will use Cityscapes.

![city-scape-sample](https://www.cityscapes-dataset.com/wordpress/wp-content/uploads/2015/07/zuerich00.png)

## Download

Please follow the instructions to [download](https://www.cityscapes-dataset.com/login/) the dataset (a registered account is required).

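Since we're using FiftyOne, the following archives are relevant. Only the first one is required by FiftyOne itself, but the export script in this guide reads the `gt_fine` field, so download `gtFine_trainvaltest.zip` as well:
- leftImg8bit_trainvaltest.zip
- gtFine_trainvaltest.zip (optional for FiftyOne, needed for this guide)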
- gtCoarse.zip (optional)
- gtBbox_cityPersons_trainval.zip (optional)

Put the archives in `data/datasets/cityscapes/raw/`, so that the layout under `data/` looks like this:
```
datasets/
  - cityscapes
    - raw/
      - leftImg8bit_trainvaltest.zip
      - gtFine_trainvaltest.zip
    - cityscape_fo_image-segmentation/
    - cityscape_fo_coco-detection/
```

In [None]:
# Run the following command to create directories
!mkdir -p data/datasets/cityscapes/raw/

# Paths used by the rest of the notebook
DATASET_DIR = "data/datasets/cityscapes/"
FO_SEGMENTATION_DIR = "data/datasets/cityscapes/cityscape_fo_image-segmentation"
FO_COCO_DIR = "data/datasets/cityscapes/cityscape_fo_coco-detection"

In [None]:
# Log in and save the session cookie (replace `username` and `password` with your Cityscapes credentials)
!wget --keep-session-cookies --save-cookies=cookies.txt --post-data 'username=username&password=password&submit=Login' https://www.cityscapes-dataset.com/login/

# leftImg8bit_trainvaltest.zip (URL quoted so the shell does not interpret `?`)
!wget --load-cookies cookies.txt --content-disposition 'https://www.cityscapes-dataset.com/file-handling/?packageID=3' -P data/datasets/cityscapes/raw/

# gtFine_trainvaltest.zip
!wget --load-cookies cookies.txt --content-disposition 'https://www.cityscapes-dataset.com/file-handling/?packageID=1' -P data/datasets/cityscapes/raw/
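
If the login step fails, the download may produce an HTML page rather than a zip archive, so it is worth checking the files before moving on. Below is a minimal sketch using only the standard library. The helper name `check_archives` and the choice of which archives count as required are my own assumptions for illustration, not part of FiftyOne.

```python
import zipfile
from pathlib import Path

# Archive names from the list above; treating both as required is an assumption
REQUIRED = ["leftImg8bit_trainvaltest.zip", "gtFine_trainvaltest.zip"]


def check_archives(raw_dir, names=REQUIRED):
    """Return a dict mapping each archive name to 'ok', 'missing', or 'not a zip'."""
    status = {}
    for name in names:
        path = Path(raw_dir) / name
        if not path.is_file():
            status[name] = "missing"
        elif not zipfile.is_zipfile(path):
            # e.g. an HTML login page saved in place of the archive
            status[name] = "not a zip"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_archives("data/datasets/cityscapes/raw/").items():
        print(f"{name}: {state}")
```

Anything reported as `missing` or `not a zip` should be downloaded again before loading the dataset.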

## FiftyOne Preparation


---

We'll use [FiftyOne](https://voxel51.com/fiftyone/) to generate the dataset in different formats.

In [None]:
# install fiftyone
!pip install fiftyone --no-cache-dir
!pip install --upgrade opencv-python opencv-python-headless --no-cache-dir

### Generate dataset for training and validation

The following script exports the dataset in FiftyOne's `ImageSegmentationDirectory` and `COCODetectionDataset` formats. Please read the full [documentation](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html) for details.
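
The export script passes `mask_targets` to tell FiftyOne which pixel value to write for each class. As far as I understand, masks are rendered onto a zero-filled array, so with the 0-based mapping used in this guide, unlabeled pixels and `road` would both come out as 0. If that matters for your training code, one option is to shift the mapping by one and keep 0 for background. The sketch below is plain Python; the names `class_map_bg` and `name_to_id` are hypothetical and not used elsewhere in this guide.

```python
classes = ["road", "sidewalk", "building", "wall", "fence", "pole", "traffic_light", "traffic_sign",
           "vegetation", "terrain", "sky", "person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"]

# 0-based mapping, as used in the export script: pixel value -> class name
class_map = dict(zip(range(19), classes))

# Alternative (an assumption, not the author's setup): reserve 0 for background
class_map_bg = {i + 1: name for i, name in enumerate(classes)}

# Reverse lookup, handy when reading the masks back during training
name_to_id = {name: i for i, name in class_map_bg.items()}

print(class_map[0], class_map_bg[1], name_to_id["bicycle"])  # road road 19
```

Whichever mapping is used, the training code has to agree with it, since the mask PNGs only store the integer values.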

In [None]:
import os
import fiftyone as fo
from fiftyone import ViewField as F
import fiftyone.zoo as foz

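# Classes to keep when filtering and exporting
classes = ["road", "sidewalk", "building", "wall", "fence", "pole", "traffic_light", "traffic_sign",
           "vegetation", "terrain", "sky", "person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"]
# Map mask pixel values 0-18 to class names (passed to the export as mask_targets)
class_map = dict(zip(range(19), classes))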


for split in ["train", "validation", "test"]:
    dataset = foz.load_zoo_dataset(
        "cityscapes",
        split=split,
        source_dir=os.path.join(DATASET_DIR, "raw"),
        dataset_dir=os.path.join(DATASET_DIR, "fiftyone_cityscape"),
    )

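    # Keep only samples with at least one polyline from the selected classes.
    # The test split is exported unfiltered.
    match = F("label").is_in(classes)
    if split != "test":
        matching_view = dataset.match(
            F("gt_fine.polylines").filter(match).length() > 0
        )
    else:
        matching_view = dataset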

    # Generate ImageSegmentationDirectory format
    matching_view.export(
        dataset_type=fo.types.ImageSegmentationDirectory,
        export_dir=FO_SEGMENTATION_DIR,
        data_path=f"data_{split}/",
        labels_path=f"labels_{split}/",
        label_field="gt_fine",
        export_media="symlink",
        mask_targets=class_map)

    # Generate COCODetectionDataset format
    matching_view.export(
        export_dir=FO_COCO_DIR,
        dataset_type=fo.types.COCODetectionDataset,
        labels_path=f"labels/{split}.json",
        label_field="gt_fine",
        export_media="symlink",
        classes=classes,
    )
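
After the export finishes, a quick look at the COCO label files can confirm that the classes actually received annotations. The sketch below relies only on the standard COCO JSON layout (`images`, `annotations`, `categories`). The helper name `count_coco_annotations` is hypothetical, and the path in the commented usage line assumes the `FO_COCO_DIR` layout produced by the script above.

```python
import json
from collections import Counter


def count_coco_annotations(labels_path):
    """Return (number of images, Counter of annotations per category name)."""
    with open(labels_path) as f:
        coco = json.load(f)
    # COCO annotations refer to categories by numeric id
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    counts = Counter(
        id_to_name.get(a["category_id"], "unknown")
        for a in coco.get("annotations", [])
    )
    return len(coco.get("images", [])), counts


# Example usage (assumes the export above has been run):
# num_images, counts = count_coco_annotations(os.path.join(FO_COCO_DIR, "labels/train.json"))
# print(num_images, counts.most_common(5))
```

A class with zero annotations in the train split is worth investigating, for example by comparing the names in `classes` against the label strings stored in the `gt_fine` field.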


### Preview dataset

In [None]:
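# `dataset` is the last split loaded by the loop above (test)
session = fo.launch_app(dataset)
# Show a random sample of 100 images in the app
session.view = dataset.take(100)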