### Merge Multiple Datasets for Classification
Datumaro supports merging multiple datasets into a single dataset.

In this notebook, we import the 'mnist' and 'mnist_csv' datasets.
Both hold the same MNIST data, but the former is stored as pickle and the latter as CSV.
We then merge them and export the result as a single dataset.

In [None]:
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

import datumaro as dm
from datumaro.components.operations import compute_image_statistics

We import the two sample MNIST datasets separately.
Note that both use the same data format.

In [None]:
dataset_mnist = dm.Dataset.import_from("../tests/assets/mnist_dataset", format="mnist")
dataset_mnist_csv = dm.Dataset.import_from("../tests/assets/mnist_csv_dataset", format="mnist_csv")

Because the data formats are the same, we can call 'Dataset.from_extractors' to merge them into one dataset.
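
To build intuition for what such a merge does, here is a minimal plain-Python sketch. It is not Datumaro's implementation. It assumes that items are identified by their (id, subset) pair, that an item present in both sources is kept once, and that conflicting labels are treated as an error. The function `merge_by_key` and the sample tuples are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (item id, subset name)


def merge_by_key(*sources: Iterable[Tuple[str, str, int]]) -> Dict[Key, int]:
    """Merge (id, subset, label) records, keeping one entry per (id, subset)."""
    merged: Dict[Key, int] = {}
    for source in sources:
        for item_id, subset, label in source:
            key = (item_id, subset)
            # The same item may appear in several sources, but only if they agree.
            if key in merged and merged[key] != label:
                raise ValueError(f"conflicting labels for {key}")
            merged[key] = label
    return merged


# Hypothetical records standing in for two small classification datasets.
first = [("0", "train", 5), ("1", "train", 0), ("0", "test", 7)]
second = [("0", "test", 7), ("1", "test", 2), ("2", "test", 1)]

merged = merge_by_key(first, second)
print(len(merged))
```

In this sketch, the item `("0", "test")` occurs in both sources with the same label, so it contributes a single entry to the result.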

In [None]:
dataset = dm.Dataset.from_extractors(dataset_mnist, dataset_mnist_csv)

print("statistics for the merged mnist dataset")
compute_image_statistics(dataset)

statistics for the merged mnist dataset


{'dataset': {'images count': 5,
  'unique images count': 1,
  'repeated images count': 1,
  'repeated images': [[('0', 'test'),
    ('0', 'train'),
    ('1', 'test'),
    ('1', 'train'),
    ('2', 'test')]]},
 'subsets': {'train': {'images count': 2,
   'image mean': [0.9999999999999964, 0.9999999999999964, 0.9999999999999964],
   'image std': [0.0, 0.0, 0.0]},
  'test': {'images count': 3,
   'image mean': [0.9999999999999964, 0.9999999999999964, 0.9999999999999964],
   'image std': [0.0, 0.0, 0.0]}}}
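
The statistics can be sanity-checked with a few lines of plain Python. The dictionary below copies only the count fields and the repeated-image group from the output above. The variable names are illustrative, and in a live session you would use the value returned by `compute_image_statistics` instead.

```python
# Count fields copied from the statistics output shown above.
stats = {
    "dataset": {
        "images count": 5,
        "unique images count": 1,
        "repeated images count": 1,
        "repeated images": [
            [("0", "test"), ("0", "train"), ("1", "test"), ("1", "train"), ("2", "test")]
        ],
    },
    "subsets": {
        "train": {"images count": 2},
        "test": {"images count": 3},
    },
}

# The per-subset counts should add up to the dataset-level count.
per_subset = {name: info["images count"] for name, info in stats["subsets"].items()}
total = sum(per_subset.values())
counts_agree = total == stats["dataset"]["images count"]

# Size of each group of repeated images, given as (id, subset) pairs.
group_sizes = [len(group) for group in stats["dataset"]["repeated images"]]

print(per_subset, total, counts_agree, group_sizes)
```

The single repeated group contains all five items, which suggests that the sample images in these test assets have identical pixel content. This is consistent with the zero standard deviation reported for both subsets.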

We can then export the merged dataset in any supported format; here we use 'cifar'.

In [None]:
dataset.export("merged_dataset", "cifar")
!ls merged_dataset

['batches.meta', 'test', 'train']
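
Outside a notebook, where the `!ls` shell escape is not available, the same check can be done with the standard library. The following is a minimal sketch. `list_export` is a hypothetical helper, and the demo runs on a throwaway directory that mimics the listing above rather than on a real export.

```python
import tempfile
from pathlib import Path


def list_export(export_dir):
    """Return the sorted names of the entries directly inside export_dir."""
    return sorted(p.name for p in Path(export_dir).iterdir())


# Demo on a temporary directory populated with the names from the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("batches.meta", "test", "train"):
        (Path(tmp) / name).touch()
    demo_listing = list_export(tmp)

print(demo_listing)
```

After a real export, calling `list_export("merged_dataset")` would list the files that the export wrote.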