diff --git a/docs/source/2x_user_guide.md b/docs/source/2x_user_guide.md
index 941e80d6a39..7a00594308d 100644
--- a/docs/source/2x_user_guide.md
+++ b/docs/source/2x_user_guide.md
@@ -67,7 +67,7 @@ This part provides the advanced topics that help user dive deep into Intel® Neu
 Add New Adaptor
-Distillation for Quantization
+Distillation
 SmoothQuant
 Weight-Only Quantization
 Layer-Wise Quantization
diff --git a/docs/source/dataset.md b/docs/source/dataset.md
deleted file mode 100644
index 0695d78a3ac..00000000000
--- a/docs/source/dataset.md
+++ /dev/null
@@ -1,165 +0,0 @@
-Dataset
-=======
-
-1. [Introduction](#introduction)
-
-2. [Supported Framework Dataset Matrix](#supported-framework-dataset-matrix)
-
-3. [Get start with Dataset API](#get-start-with-dataset-api)
-
-4. [Examples](#examples)
-
-## Introduction
-
-To adapt to its internal dataloader API, Intel® Neural Compressor implements some built-in datasets.
-
-A dataset is a container which holds all data that can be used by the dataloader, and have the ability to be fetched by index or created as an iterator. One can implement a specific dataset by inheriting from the Dataset class by implementing `__iter__` method or `__getitem__` method, while implementing `__getitem__` method, `__len__` method is recommended.
-
-Users can use Neural Compressor built-in dataset objects as well as register their own datasets.
-
-## Supported Framework Dataset Matrix
-
-#### TensorFlow
-
-| Dataset | Parameters | Comments | Usage |
-| :------ | :------ | :------ | :------ |
-| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| ImageRecord(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/validation-000-of-100
root/validation-001-of-100
...
root/validation-099-of-100
The file name needs to follow this pattern: '* - * -of- *' | **In yaml file:**
dataset:
   ImageRecord:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageRecord'] (root=root, transform=transform, filter=None) |
-| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) |
-| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
dataset will read name and label of each image from image_list file, if user set image_list to None, it will read from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) |
-| COCORecord(root, num_cores, transform, filter) | **root** (str): Root directory of dataset
**num_cores** (int, default=28):The number of input Datasets to interleave from in parallel
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Root is a full path to tfrecord file, which contains the file name.
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORecord:
     root: /path/to/tfrecord
     num_cores: 28
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORecord'] (root, num_cores=28, transform=transform, filter=None) |
-| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use default label map |
-| COCONpy(root, npy_dir, anno_dir) | **root** (str): Root directory of dataset
**npy_dir** (str, default='val2017'): npy file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory | Please arrange data in this way:
/root/npy_dir/1.jpg.npy
/root/npy_dir/2.jpg.npy
...
/root/npy_dir/n.jpg.npy
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCONpy:
     root: /path/to/root
     npy_dir: /path/to/npy
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCONpy'] (root, npy_dir, anno_dir)
If anno_dir is not set, the dataset will use default label map |
-| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple):shape of total samples, the first dimension should be the sample count of the dataset. support create multi shape tensors, use list of tuples for each tuple in the list, will create a such size tensor.
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True):whether to return 0 as label
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) |
-| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple):create single or multi input tensors list represent the sample shape of the dataset, eg and image size should be represented as (224, 224, 3), tuple contains multiple list and represent multi input tensors.
**label_shape** (list or tuple):create single or multi label tensors list represent the sample shape of the label, eg and label size should be represented as (1,), tuple contains multiple list and represent multi label tensors. In yaml usage, it offers (1,) as the default value.
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) |
-| style_transfer(content_folder, style_folder, crop_ratio, resize_shape, image_format, transform, filter) | **content_folder** (str):Root directory of content images
**style_folder** (str):Root directory of style images
**crop_ratio** (float, default=0.1):cropped ratio to each side
**resize_shape** (tuple, default=(256, 256)):target size of image
**image_format** (str, default='jpg'): target image format
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Dataset used for style transfer task. This Dataset is to construct a dataset from two specific image holders representing content image folder and style image folder. | **In yaml file:**
dataset:
   style_transfer:
     content_folder: /path/to/content_folder
     style_folder: /path/to/style_folder
     crop_ratio: 0.1
     resize_shape: [256, 256]
     image_format: 'jpg'
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['style_transfer'] (content_folder, style_folder, crop_ratio, resize_shape, image_format, transform=transform, filter=None) |
-| TFRecordDataset(root, transform, filter) | **root** (str): filename of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions |Root is a full path to tfrecord file, which contains the file name. | **In yaml file:**
dataset:
   TFRecordDataset:
     root: /path/to/tfrecord
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['TFRecordDataset'] (root, transform=transform) |
-| bert(root, label_file, task, transform, filter) | **root** (str): path of dataset
**label_file** (str): path of label file
**task** (str, default='squad'): task type of model
**model_type** (str, default='bert'): model type, support 'bert'.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset supports tfrecord data, please refer to [Guide](../examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md) to create tfrecord file first. | **In yaml file:**
dataset:
   bert:
     root: /path/to/root
     label_file: /path/to/label_file
     task: squad
     model_type: bert
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (root, label_file, transform=transform) |
-| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple):create single or multi sparse tensors, tuple represent the sample shape of the dataset, eg and image size should be represented as (224, 224, 3), tuple contains multiple list and represent multi input tensors.
**label_shape** (list or tuple):create single or multi label tensors list represent the sample shape of the label, eg and label size should be represented as (1,), tuple contains multiple list and represent multi label tensors. In yaml usage, it offers (1,) as the default value.
**sparse_ratio** (float, default=0.5): the ratio of sparsity, support [0, 1].
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) |
-
-#### PyTorch
-
-| Dataset | Parameters | Comments | Usage |
-| :------ | :------ | :------ | :------ |
-| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) |
-| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) |
-| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
dataset will read name and label of each image from image_list file, if user set image_list to None, it will read from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) |
-| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size>1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use default label map |
-| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple):shape of total samples, the first dimension should be the sample count of the dataset. support create multi shape tensors, use list of tuples for each tuple in the list, will create a such size tensor.
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True):whether to return 0 as label
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) |
-| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple):create single or multi input tensors list represent the sample shape of the dataset, eg and image size should be represented as (224, 224, 3), tuple contains multiple list and represent multi input tensors.
**label_shape** (list or tuple):create single or multi label tensors list represent the sample shape of the label, eg and label size should be represented as (1,), tuple contains multiple list and represent multi label tensors. In yaml usage, it offers (1,) as the default value.
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) |
-| bert(dataset, task, model_type, transform, filter) | **dataset** (list): list of data
**task** (str): the task of the model, support "classifier", "squad"
**model_type** (str, default='bert'): model type, support 'distilbert', 'bert', 'xlnet', 'xlm'
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This Dataset is to construct from the Bert TensorDataset and not a full implementation from yaml config. The original repo link is: https://github.com/huggingface/transformers. When you want use this Dataset, you should add it before you initialize your DataLoader. | **In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (dataset, task, model_type, transform=transform, filter=None)
YAML configuration is not supported yet |
-| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple):create single or multi sparse tensors, tuple represent the sample shape of the dataset, eg and image size should be represented as (224, 224, 3), tuple contains multiple list and represent multi input tensors.
**label_shape** (list or tuple):create single or multi label tensors list represent the sample shape of the label, eg and label size should be represented as (1,), tuple contains multiple list and represent multi label tensors. In yaml usage, it offers (1,) as the default value.
**sparse_ratio** (float, default=0.5): the ratio of sparsity, support [0, 1].
**low** (list or float, default=-128.):low out the tensor value range from[0, 1] to [0, low] or [low, 0] if low < 0, if float, will implement all tensors with same low value.
**high** (list or float, default=127.):high the tensor value by add all tensor element value high. If list, length of list should be same with shape list
**dtype** (list or str, default='float32'):support multi tensor dtype setting. If list, length of list should be same with shape list, if str, all tensors will use same dtype. dtype support 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -#### MXNet - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
dataset will read name and label of each image from image_list file, if user set image_list to None, it will read from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset uses the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the total samples; the first dimension should be the sample count of the dataset. Multiple tensors are supported: pass a list of tuples, and one tensor of the given size is created for each tuple.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True):whether to return 0 as label
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates one or more input tensors. A list gives the sample shape of the dataset, e.g. an image size is written as (224, 224, 3); a tuple containing multiple lists creates multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list gives the sample shape of the label, e.g. a label size is written as (1,); a tuple containing multiple lists creates multiple label tensors. In yaml usage, the default value is (1,).
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates one or more sparse tensors. A tuple gives the sample shape of the dataset, e.g. an image size is written as (224, 224, 3); a tuple containing multiple lists creates multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list gives the sample shape of the label, e.g. a label size is written as (1,); a tuple containing multiple lists creates multiple label tensors. In yaml usage, the default value is (1,).
**sparse_ratio** (float, default=0.5): the ratio of sparsity, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -#### ONNXRT - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
dataset will read name and label of each image from image_list file, if user set image_list to None, it will read from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset uses the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the total samples; the first dimension should be the sample count of the dataset. Multiple tensors are supported: pass a list of tuples, and one tensor of the given size is created for each tuple.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True):whether to return 0 as label
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates one or more input tensors. A list gives the sample shape of the dataset, e.g. an image size is written as (224, 224, 3); a tuple containing multiple lists creates multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list gives the sample shape of the label, e.g. a label size is written as (1,); a tuple containing multiple lists creates multiple label tensors. In yaml usage, the default value is (1,).
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | -| GLUE(data_dir, model_name_or_path, max_seq_length, do_lower_case, task, model_type, dynamic_length, evaluate, transform, filter) | **data_dir** (str): The input data dir
**model_name_or_path** (str): Path to pre-trained student model or shortcut name.
**max_seq_length** (int, default=128): The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
**do_lower_case** (bool, default=True): Whether or not to lowercase the input.
**task** (str, default='mrpc'): The name of the task to fine-tune. Choices include mrpc, qqp, qnli, rte, sts-b, cola, mnli, wnli.
**model_type** (str, default='bert'): model type; supported values are 'distilbert', 'bert', 'mobilebert', 'roberta'.
**dynamic_length** (bool, default=False): Whether to use dynamic sequence length instead of a fixed one.
**evaluate** (bool, default=True): Whether to do evaluation or training.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Refer to [this example](/examples/onnxrt/language_translation/bert) on how to prepare the dataset | **In yaml file:**
dataset:
   bert:
     data_dir: /path/to/data/
     model_name_or_path: bert-base-uncased
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (data_dir='/path/to/data/', model_name_or_path='bert-base-uncased', max_seq_length=128, task='mrpc', model_type='bert', dynamic_length=True, transform=None, filter=None) | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates one or more sparse tensors. A tuple gives the sample shape of the dataset, e.g. an image size is written as (224, 224, 3); a tuple containing multiple lists creates multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list gives the sample shape of the label, e.g. a label size is written as (1,); a tuple containing multiple lists creates multiple label tensors. In yaml usage, the default value is (1,).
**sparse_ratio** (float, default=0.5): the ratio of sparsity, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): offsets the tensor values by adding high to every element. If a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported values are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): dummy dataset does not need transform. If transform is not None, it will ignore it.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is to construct a dataset from a specific shape, the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -## Get start with Dataset API - -### Config dataloader in a yaml file - -```yaml -quantization: - approach: post_training_static_quant - calibration: - dataloader: - dataset: - COCORaw: - root: /path/to/calibration/dataset - filter: - LabelBalance: - size: 1 - transform: - Resize: - size: 300 - -evaluation: - accuracy: - metric: - ... - dataloader: - batch_size: 16 - dataset: - COCORaw: - root: /path/to/evaluation/dataset - transform: - Resize: - size: 300 - performance: - dataloader: - batch_size: 16 - dataset: - dummy_v2: - input_shape: [224, 224, 3] -``` - -## User-specific dataset - -Users can register their own datasets as follows: - -```python -class Dataset(object): - def __init__(self, args): - # init code here - - def __getitem__(self, idx): - # use idx to get data and label - return data, label - - def __len__(self): - return len - -``` - -After defining the dataset class, pass it to the quantizer: - -```python -from neural_compressor.experimental import Quantization, common - -quantizer = Quantization(yaml_file) -quantizer.calib_dataloader = common.DataLoader( - dataset -) # user can pass more optional args to dataloader such as batch_size and collate_fn -quantizer.model = graph -quantizer.eval_func = eval_func -q_model = quantizer.fit() -``` - -## Examples - -- Refer to this [example](https://github.com/intel/neural-compressor/tree/v1.14.2/examples/onnxrt/object_detection/onnx_model_zoo/DUC/quantization/ptq) to learn how to define a customised dataset. - -- Refer to this [HelloWorld example](/examples/helloworld/tf_example6) to learn how to configure a built-in dataset. 
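
The user-specific dataset contract described above (a class exposing `__getitem__` and `__len__`) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `SquaresDataset` and `iter_batches` are hypothetical names, and `iter_batches` is only a stand-in for what a dataloader does when it reads samples by index and groups them into batches.

```python
class SquaresDataset:
    """Hypothetical map-style dataset: sample i is (data=i, label=i * i)."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, idx):
        # Raise IndexError for out-of-range access, as sequences do.
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        return self.size


def iter_batches(dataset, batch_size):
    """Yield (data, labels) lists; a stand-in for a dataloader, not the real one."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        samples = [dataset[i] for i in range(start, stop)]
        data, labels = zip(*samples)
        yield list(data), list(labels)


batches = list(iter_batches(SquaresDataset(5), batch_size=2))
```

Implementing `__len__` alongside `__getitem__` is what lets the batching loop know where the dataset ends, which is why the documentation recommends defining both.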
diff --git a/docs/source/distillation_quantization.md b/docs/source/distillation_quantization.md deleted file mode 100644 index bf1894f5c86..00000000000 --- a/docs/source/distillation_quantization.md +++ /dev/null @@ -1,88 +0,0 @@ -Distillation for Quantization -============ - -1. [Introduction](#introduction) - - -2. [Distillation for Quantization Support Matrix](#distillation-for-quantization-support-matrix) - - -3. [Get Started with Distillation for Quantization API](#get-started-with-api) - - -4. [Examples](#examples) - - - -### Introduction - -Distillation and quantization are both promising methods to reduce the computational and memory footprint that huge transformer-based networks require. Quantization refers to a process of reducing the bit precision for both activations and weights. Distillation method transfers knowledge from a heavy teacher model to a light one (student) and it could be used as a performance-booster in lower-bits quantizations. Quantization-aware training recovers accuracy degradation from representation loss in the retraining process and typically provides better performance compared to post-training quantization. -Intel provides a quantization-aware training (QAT) method that incorporates a novel layer-by-layer knowledge distillation step for INT8 quantization pipelines. - - - -### Distillation for Quantization Support Matrix - -|**Algorithm** |**PyTorch** |**TensorFlow (Deprecated)** | -|---------------------------------|:--------:|:---------:| -|Distillation for Quantization |✔ |✖ | - - - -### Get Started with Distillation for Quantization API - -User can pass the customized training/evaluation functions to `Distillation` for quantization tasks. In this case, distillation process can be done by pre-defined hooks in Neural Compressor. Users could place those hooks inside the quantization training function. 
- -Neural Compressor defines several hooks for user pass - -``` -on_train_begin() : Hook executed before training begins -on_after_compute_loss(input, student_output, student_loss) : Hook executed after each batch inference of student model -on_epoch_end() : Hook executed at each epoch end -``` - -Following section illustrates how to use hooks in user pass-in training function: - -```python -def training_func_for_nc(model): - compression_manager.on_train_begin() - for epoch in range(epochs): - compression_manager.on_epoch_begin(epoch) - for i, batch in enumerate(dataloader): - compression_manager.on_step_begin(i) - ...... - output = model(batch) - loss = output.loss - loss = compression_manager.on_after_compute_loss(batch, output, loss) - loss.backward() - compression_manager.on_before_optimizer_step() - optimizer.step() - compression_manager.on_step_end() - compression_manager.on_epoch_end() - compression_manager.on_train_end() -... -``` - -In this case, the launcher code is like the following: - -```python -from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig -from neural_compressor import QuantizationAwareTrainingConfig -from neural_compressor.training import prepare_compression - -combs = [] -distillation_criterion = KnowledgeDistillationLossConfig() -d_conf = DistillationConfig(teacher_model=teacher_model, criterion=distillation_criterion) -combs.append(d_conf) -q_conf = QuantizationAwareTrainingConfig() -combs.append(q_conf) -compression_manager = prepare_compression(model, combs) -model = compression_manager.model - -model = training_func_for_nc(model) -eval_func(model) -``` - -### Examples - -For examples of distillation for quantization, please refer to [distillation-for-quantization examples](../../examples/pytorch/nlp/huggingface_models/text-classification/optimization_pipeline/distillation_for_quantization/fx/README.md) diff --git a/docs/source/pythonic_style.md b/docs/source/pythonic_style.md deleted file mode 100644 
index d036e9775d5..00000000000
--- a/docs/source/pythonic_style.md
+++ /dev/null
@@ -1,146 +0,0 @@
-Pythonic Style Access for Configurations
-====
-
-1. [Introduction](#introduction)
-2. [Supported Feature Matrix](#supported-feature-matrix)
-3. [Get Started with Pythonic API for Configurations](#get-started-with-pythonic-api-for-configurations)
-
-## Introduction
-To meet the variety of needs arising from various circumstances, INC now provides a
-pythonic style access - Pythonic API - for same purpose of either user or framework configurations.
-
-The Pythonic API for Configuration allows users to specify configurations
-directly in their python codes without referring to
-a separate YAML file. While we support both simultaneously,
-the Pythonic API for Configurations has several advantages over YAML files,
-which one can tell from usages in the context below. Hence, we recommend
-users to use the Pythonic API for Configurations moving forward.
-
-## Supported Feature Matrix
-
-### Pythonic API for User Configurations
-| Optimization Techniques | Pythonic API |
-|-------------------------|:------------:|
-| Quantization | ✔ |
-| Pruning | ✔ |
-| Distillation | ✔ |
-| NAS | ✔ |
-### Pythonic API for Framework Configurations
-
-| Framework | Pythonic API |
-|------------|:------------:|
-| TensorFlow | ✔ |
-| PyTorch | ✔ |
-| ONNX | ✔ |
-| MXNet | ✔ |
-
-## Get Started with Pythonic API for Configurations
-
-### Pythonic API for User Configurations
-Now, let's go through the Pythonic API for Configurations in the order of
-sections similar as in user YAML files.
-
-#### Quantization
-
-To specify quantization configurations, users can use the following
-Pythonic API step by step.
-
-* First, load the ***config*** module
-```python
-from neural_compressor import config
-```
-* Next, assign values to the attributes of *config.quantization* to use specific configurations, and pass the config to *Quantization* API.
-```python
-config.quantization.inputs = ["image"]  # list of str
-config.quantization.outputs = ["out"]  # list of str
-config.quantization.backend = "onnxrt_integerops"  # support tensorflow, tensorflow_itex, pytorch, pytorch_ipex, pytorch_fx, onnxrt_qlinearops, onnxrt_integerops, onnxrt_qdq, onnxrt_qoperator, mxnet
-config.quantization.approach = "post_training_dynamic_quant"  # support post_training_static_quant, post_training_dynamic_quant, quant_aware_training
-config.quantization.device = "cpu"  # support cpu, gpu
-config.quantization.op_type_dict = {"Conv": {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}}  # dict
-config.quantization.strategy = "mse"  # support basic, mse, bayesian, random, exhaustive
-config.quantization.objective = "accuracy"  # support performance, accuracy, modelsize, footprint
-config.quantization.timeout = 100  # int, default is 0
-config.quantization.accuracy_criterion.relative = 0.5  # float, default is 0.01
-config.quantization.reduce_range = (
-    False  # bool. default value depends on hardware, True if cpu supports VNNI instruction, otherwise is False
-)
-config.quantization.use_bf16 = False  # bool
-from neural_compressor.experimental import Quantization
-
-quantizer = Quantization(config)
-```
-
-#### Distillation
-To specify distillation configurations, users can assign values to
-the corresponding attributes.
-```python
-from neural_compressor import config
-
-config.distillation.optimizer = {"SGD": {"learning_rate": 0.0001}}
-
-from neural_compressor.experimental import Distillation
-
-distiller = Distillation(config)
-```
-#### Pruning
-To specify pruning configurations, users can assign values to the corresponding attributes.
-```python
-from neural_compressor import config
-
-config.pruning.weight_compression.initial_sparsity = 0.0
-config.pruning.weight_compression.target_sparsity = 0.9
-config.pruning.weight_compression.max_sparsity_ratio_per_layer = 0.98
-config.pruning.weight_compression.prune_type = "basic_magnitude"
-config.pruning.weight_compression.start_epoch = 0
-config.pruning.weight_compression.end_epoch = 3
-config.pruning.weight_compression.start_step = 0
-config.pruning.weight_compression.end_step = 0
-config.pruning.weight_compression.update_frequency = 1.0
-config.pruning.weight_compression.update_frequency_on_step = 1
-config.pruning.weight_compression.prune_domain = "global"
-config.pruning.weight_compression.pattern = "tile_pattern_1x1"
-
-from neural_compressor.experimental import Pruning
-
-prune = Pruning(config)
-```
-#### NAS
-To specify nas configurations, users can assign values to the
-corresponding attributes.
-
-```python
-from neural_compressor import config
-
-config.nas.approach = "dynas"
-from neural_compressor.experimental import NAS
-
-nas = NAS(config)
-```
-
-
-#### Benchmark
-To specify benchmark configurations, users can assign values to the
-corresponding attributes.
-```python
-from neural_compressor import config
-
-config.benchmark.warmup = 10
-config.benchmark.iteration = 10
-config.benchmark.cores_per_instance = 10
-config.benchmark.num_of_instance = 10
-config.benchmark.inter_num_of_threads = 10
-config.benchmark.intra_num_of_threads = 10
-
-from neural_compressor.experimental import Benchmark
-
-benchmark = Benchmark(config)
-```
-### Pythonic API for Framework Configurations
-Now, let's go through the Pythonic API for Configurations in setting up similar framework
-capabilities as in YAML files. Users can specify a framework's (eg. ONNX Runtime) capability by
-assigning values to corresponding attributes.
-
-```python
-config.onnxruntime.precisions = ["int8", "uint8"]
-config.onnxruntime.graph_optimization_level = "DISABLE_ALL"  # only onnxruntime has graph_optimization_level attribute
-```