# Datasets

Every dataset has a class that reads its annotations. Each dataset class must extend our custom `BaseDataset` class.

The `BaseDataset` class has the following methods:

* `generate_dataset_statistics`: generates a summary of the dataset (e.g. number of images, number of annotations, etc.)
* `save_dataset_statistics`: saves the summary to a `json` file
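
To make the subclassing contract concrete, here is a minimal sketch of how a dataset class might extend a base class that exposes these two methods. The `BaseDataset` below is a hypothetical stand-in written only for illustration, not the repository's implementation. The attribute names (`images`, `annotations`, `statistics`), the contents of the summary, the fallback file name, and the `TinyDataset` class are all assumptions.

``` python
import json
from pathlib import Path


class BaseDataset:
    """Hypothetical stand-in for the repository's BaseDataset (illustration only)."""

    name = "base"  # assumed: a short dataset identifier

    def __init__(self):
        self.images = []
        self.annotations = []
        self.statistics = {}

    def generate_dataset_statistics(self):
        # Assumed summary content: simple counts of images and annotations.
        self.statistics = {
            "num_images": len(self.images),
            "num_annotations": len(self.annotations),
        }
        return self.statistics

    def save_dataset_statistics(self, save_path, dataset_name=None, file_name=None):
        # Assumed fallbacks when the optional arguments are None.
        dataset_name = dataset_name or self.name
        file_name = file_name or f"{dataset_name}_stats.json"
        out_dir = Path(save_path) / dataset_name
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / file_name
        out_file.write_text(json.dumps(self.statistics, indent=2), encoding="utf-8")
        return out_file


class TinyDataset(BaseDataset):
    """Hypothetical dataset that reads COCO-style annotations from a dict."""

    name = "tiny"

    def __init__(self, annotation_dict):
        super().__init__()
        self.images = annotation_dict["images"]
        self.annotations = annotation_dict["annotations"]


D = TinyDataset(
    {
        "images": [{"id": 1}, {"id": 2}],
        "annotations": [{"id": 10, "image_id": 1}],
    }
)
stats = D.generate_dataset_statistics()
```

In this sketch the subclass only has to populate the annotation containers; the statistics and saving logic are inherited from the base class.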




## **0. Toy example**
- get dataset instance
- create index
- generate statistics
- save statistics


``` python

# 0. Toy example

from bases.example import Example

D = Example()

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name=None,
    file_name=None,
)

```

## **1. COCO dataset**

``` python
## 1. COCO dataset

from bases.coco import COCO
from pathlib import Path

coco_year = 2017
subset = f"instances_train{coco_year}"
annotation_file = Path(f"./data/coco/{coco_year}/annotations/{subset}.json")
D = COCO(annotation_file=str(annotation_file))

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name="coco",
    file_name=f"{subset}_stats.json",
)
```

```
loading annotations into memory...
Done (t=6.79s)
creating index...
index created!
[INFO] Generating dataset statistics for the COCO...
description: COCO 2017 Dataset
url: http://cocodataset.org
version: 1.0
year: 2017
contributor: COCO Consortium
date_created: 2017/09/01
[INFO] Dataset statistics generated for the COCO.
[INFO] saving ...: instances_train2017_stats.json
[INFO] dataset statistics saved to: summaries/coco/instances_train2017_stats.json
```
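
Once the summary has been written, it can be read back with the standard `json` module. The following is a small sketch, assuming the file is a JSON object at the path reported in the log above. The `load_summary` helper is hypothetical, and the keys stored in the file are not documented here, so the example only lists them.

``` python
import json
from pathlib import Path


def load_summary(path):
    """Read a saved statistics file and return its parsed JSON content."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# Path taken from the log output; adjust it to your own save_path.
summary_path = Path("summaries/coco/instances_train2017_stats.json")
if summary_path.exists():
    summary = load_summary(summary_path)
    # Assumes the summary is a JSON object; prints its top-level keys.
    print(list(summary))
```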

The loaded categories can be inspected through `D.dataset["categories"]` (output truncated):

``` python
D.dataset["categories"]
```

```
[{'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
 {'supercategory': 'vehicle', 'id': 3, 'name': 'car'},
 {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'},
 {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'},
 {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'},
 {'supercategory': 'vehicle', 'id': 7, 'name': 'train'},
 {'supercategory': 'vehicle', 'id': 8, 'name': 'truck'},
 {'supercategory': 'vehicle', 'id': 9, 'name': 'boat'},
 {'supercategory': 'outdoor', 'id': 10, 'name': 'traffic light'},
 {'supercategory': 'outdoor', 'id': 11, 'name': 'fire hydrant'},
 {'supercategory': 'outdoor', 'id': 13, 'name': 'stop sign'},
 {'supercategory': 'outdoor', 'id': 14, 'name': 'parking meter'},
 {'supercategory': 'outdoor', 'id': 15, 'name': 'bench'},
 {'supercategory': 'animal', 'id': 16, 'name': 'bird'},
 {'supercategory': 'animal', 'id': 17, 'name': 'cat'},
 {'supercategory': 'animal', 'id': 18, 'name': 'dog'},
```
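
Note that the category ids are not contiguous: the listing above jumps from 11 to 13. A common way to handle this is to build lookup tables from the category list. The sketch below uses a small hand-copied subset of the entries shown above rather than the real `D.dataset["categories"]`, and the variable names `id_to_name` and `id_to_index` are only illustrative.

``` python
# Hand-copied subset of the category entries shown above (illustration only).
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
]

# Map each category id to its human-readable name.
id_to_name = {cat["id"]: cat["name"] for cat in categories}

# Map the sparse category ids to contiguous indices 0..N-1,
# which is often what a training pipeline expects as class labels.
id_to_index = {cat_id: idx for idx, cat_id in enumerate(sorted(id_to_name))}

print(id_to_name[13])
print(id_to_index)
```

With the real dataset, the same two comprehensions could be applied directly to `D.dataset["categories"]`.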
