# Datasets

Every dataset has its own class that reads that dataset's annotations. Each dataset class must extend our custom `BaseDataset` class.

The `BaseDataset` class has the following methods:

* `generate_dataset_statistics`: generates a summary of the dataset (e.g. number of images, number of annotations, etc.)
* `save_dataset_statistics`: saves the summary to a `json` file
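
To make the contract above more concrete, here is a minimal sketch of how a base class with these two methods might look. It is not the actual `BaseDataset` implementation. `SketchBaseDataset`, `TinyDataset`, `load_annotations`, the statistic keys, and the fallback names used when `dataset_name` or `file_name` is `None` are all assumptions made for illustration. The sketch also assumes COCO-style annotations (`images`, `annotations`, `categories`) and the `<save_path>/<dataset_name>/<file_name>` output layout.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from collections import Counter
from pathlib import Path


class SketchBaseDataset(ABC):
    """Hypothetical stand-in for `BaseDataset`; the real class may differ."""

    def __init__(self):
        self.stats = None

    @abstractmethod
    def load_annotations(self):
        """Return a COCO-style dict with 'images', 'annotations', 'categories'."""

    def generate_dataset_statistics(self):
        """Build a small summary dict and keep it on the instance."""
        data = self.load_annotations()
        names = {c["id"]: c["name"] for c in data["categories"]}
        per_category = Counter(names[a["category_id"]] for a in data["annotations"])
        self.stats = {
            "num_images": len(data["images"]),
            "num_annotations": len(data["annotations"]),
            "annotations_per_category": dict(per_category),
        }
        return self.stats

    def save_dataset_statistics(self, save_path, dataset_name=None, file_name=None):
        """Write the summary to <save_path>/<dataset_name>/<file_name> as JSON."""
        if self.stats is None:
            raise RuntimeError("call generate_dataset_statistics() first")
        # Assumed fallbacks for None; the real defaults are not documented here.
        dataset_name = dataset_name or type(self).__name__.lower()
        file_name = file_name or f"{dataset_name}_stats.json"
        out_dir = Path(save_path) / dataset_name
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / file_name
        out_file.write_text(json.dumps(self.stats, indent=2), encoding="utf-8")
        return out_file


class TinyDataset(SketchBaseDataset):
    """A dataset class extends the base and only supplies the annotations."""

    def load_annotations(self):
        return {
            "images": [{"id": 1}, {"id": 2}],
            "annotations": [
                {"id": 1, "image_id": 1, "category_id": 1},
                {"id": 2, "image_id": 1, "category_id": 2},
                {"id": 3, "image_id": 2, "category_id": 1},
            ],
            "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
        }


dataset = TinyDataset()
stats = dataset.generate_dataset_statistics()

# Save into a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    out_file = dataset.save_dataset_statistics(tmp, dataset_name="tiny")
    saved = json.loads(out_file.read_text(encoding="utf-8"))
    relative = out_file.relative_to(tmp)
```

Keeping the statistics logic in the base class means a new dataset only has to provide its annotations, which is one plausible reason for requiring every dataset class to extend a common base.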




## **0. Toy example**
- get dataset instance
- create index
- generate statistics
- save statistics


``` python

# 0. Toy example

from bases.example import Example

D = Example()

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name=None,
    file_name=None,
)

```

## **1. COCO dataset**

``` python
# 1. COCO dataset

from pathlib import Path

from bases.coco import COCO

coco_year = 2017
subset = f"instances_train{coco_year}"
annotation_file = Path(f"./data/coco/{coco_year}/annotations/{subset}.json")
D = COCO(annotation_file=str(annotation_file))

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name="coco",
    file_name=f"{subset}_stats.json",
)
```

## **2. SkyData dataset**


```python
# 2. SkyData

from pathlib import Path

from bases.skydata import SkyData

subset = "train_DET"
annotation_file = Path(f"./data/skydata/annotations/{subset}.json")
D = SkyData(annotation_file=str(annotation_file))

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name="skydata",
    file_name=f"{subset}_stats.json",
)
```


## **3. VisDrone dataset**


```python
# 3. VisDrone

import subprocess
import sys
from pathlib import Path

from bases.visdrone import VisDrone

converted_path = Path("./data/visdrone/converted_annotations")
annotation_file = converted_path / "visdrone_converted_to_coco_format.json"

# if annotation_file does not exist, convert the dataset
if not annotation_file.exists():
    # TODO: this could be improved, e.g. the script could take arguments
    print("[INFO] Converting visdrone dataset to COCO format...")
    try:
        # check=True raises CalledProcessError if the script exits with a
        # non-zero code, so a failed conversion reaches the except branch
        subprocess.run(
            [sys.executable, "./scripts/conver_visdrone_to_coco_format.py"],
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        print("[ERROR] Could not convert dataset to COCO format.")
        sys.exit(1)
    print("[INFO] Finished converting dataset to COCO format.")

D = VisDrone(annotation_file=str(annotation_file))

# generate and load stats
D.generate_dataset_statistics()

# save the stats
D.save_dataset_statistics(
    save_path="./summaries",
    dataset_name="visdrone",
    file_name=f"{annotation_file.stem}_stats.json",
)
```

Output of the VisDrone example:

    loading annotations into memory...
    Done (t=0.84s)
    creating index...
    index created!
    [INFO] Generating dataset statistics for the VisDrone...
    [INFO] Dataset statistics generated for the VisDrone.
    [INFO] saving ...: visdrone_converted_to_coco_format_stats.json
    [INFO] dataset statistics saved to: summaries/visdrone/visdrone_converted_to_coco_format_stats.json
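
Once a summary has been saved, it can be read back with the standard `json` module. The sketch below assumes the `<save_path>/<dataset_name>/<file_name>` layout suggested by the log line above. `load_summary` is a hypothetical helper, not part of the repository, and the keys inside the summary are not documented here, so the sketch only returns the parsed dictionary.

```python
import json
from pathlib import Path


def load_summary(save_path, dataset_name, file_name):
    """Read a saved statistics file back into a dict (hypothetical helper)."""
    summary_file = Path(save_path) / dataset_name / file_name
    if not summary_file.exists():
        raise FileNotFoundError(f"no summary found at {summary_file}")
    with summary_file.open(encoding="utf-8") as f:
        return json.load(f)


# Example call, matching the path printed in the VisDrone log:
# summary = load_summary(
#     "./summaries", "visdrone", "visdrone_converted_to_coco_format_stats.json"
# )
# print(sorted(summary))  # top-level keys of the saved summary
```

Raising `FileNotFoundError` with the full path makes it easier to spot a mismatch between the `dataset_name` or `file_name` used when saving and the one used when loading.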


## **4. Visdrone dataset**