[Docs] Translate data transform docs. (#737)
* [Docs] Translate data transform tutorial and migration docs.

* Update according to comments

* Update image link
mzr1996 committed Dec 7, 2022
1 parent fe26c65 commit 6a3028c
Showing 6 changed files with 340 additions and 30 deletions.
155 changes: 154 additions & 1 deletion docs/en/advanced_tutorials/data_transform.md
@@ -1,3 +1,156 @@
# Data transform

Coming soon. Please refer to [Chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/data_transform.html).
In the OpenMMLab repositories, dataset construction and data preparation are decoupled from each other.
Usually, the dataset construction only parses the dataset and records the basic information of each sample,
while the data preparation is performed by a series of data transforms, such as data loading, preprocessing,
and formatting based on the basic information of the samples.

## To use Data Transforms

In MMEngine, we use various callable data transform classes to perform data manipulation. These data
transform classes accept several configuration parameters at instantiation and are then applied to the
input data dictionary by calling them. All data transforms accept a dictionary as input and output the
processed data as a dictionary. A simple example is shown below:

```{note}
MMEngine does not implement data transforms itself. You can find the base data transform class
and many other data transforms in MMCV. Therefore, you need to install MMCV before working through this
tutorial; see the {external+mmcv:doc}`MMCV installation guide <get_started/installation>`.
```

```python
>>> import numpy as np
>>> from mmcv.transforms import Resize
>>>
>>> transform = Resize(scale=(224, 224))
>>> data_dict = {'img': np.random.rand(256, 256, 3)}
>>> data_dict = transform(data_dict)
>>> print(data_dict['img'].shape)
(224, 224, 3)
```

## To use in Config Files

In config files, we can compose multiple data transforms into a list, called a data pipeline. The data
pipeline is an argument of the dataset.

Usually, a data pipeline consists of the following parts:

1. Data loading: use [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) to load image files.
2. Label loading: use [`LoadAnnotations`](mmcv.transforms.LoadAnnotations) to load bbox, semantic segmentation and keypoint annotations.
3. Data processing and augmentation: for example, [`RandomResize`](mmcv.transforms.RandomResize).
4. Data formatting: we use different data transforms for different tasks, and the data transform for a
   specific task is implemented in the corresponding repository. For example, the data formatting transform
   for image classification is `PackClsInputs`, which is implemented in MMClassification.

Here, taking the classification task as an example, we show a typical data pipeline in the figure below. For
each sample, the basic information stored in the dataset is a dictionary, shown on the far left of the
figure. Every blue block represents a data transform, and each data transform adds new fields (marked in green) or updates existing fields (marked in orange) in the data dictionary.

<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/206081993-d5351151-466c-4b13-bf6d-9441c0c896c8.png" width="90%" style="background-color: white;padding: 10px;"/>
</div>

To use the above data pipeline in a config file, use the following settings:

```python
test_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=256, keep_ratio=True),
            dict(type='CenterCrop', crop_size=224),
            dict(type='PackClsInputs'),
        ]
    )
)
```

## Common Data Transforms

According to their functionality, the data transform classes can be divided into data loading, data
pre-processing & augmentation, and data formatting.

### Data Loading

To support loading large-scale datasets, we usually do not load all the dense data during dataset construction,
but only record their file paths. Therefore, the actual data needs to be loaded in the data pipeline.

| Data Transforms | Functionality |
| :------------------------------------------------------: | :-----------------------------------------------------------------------------------: |
| [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) | Load images according to the path. |
| [`LoadAnnotations`](mmcv.transforms.LoadAnnotations)     | Load and format annotation information, including bbox, segmentation map and others.   |
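
As a minimal sketch of the loading step (not part of the original tutorial), the snippet below runs `LoadImageFromFile` on a results dictionary that only contains a path; `demo.jpg` is a placeholder path and the image is assumed to exist there.

```python
from mmcv.transforms import LoadImageFromFile

# The dataset only records the image path; the transform reads the pixels.
load = LoadImageFromFile()
results = load({'img_path': 'demo.jpg'})  # placeholder path, assumed to exist
print(results['img'].shape, results['img_shape'])
```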

### Data Pre-processing & Augmentation

Data transforms for pre-processing and augmentation usually manipulate the image and annotation data, for
example by cropping, padding and resizing.

| Data Transforms | Functionality |
| :--------------------------------------------------------: | :------------------------------------------------------------: |
| [`Pad`](mmcv.transforms.Pad) | Pad the margin of images. |
| [`CenterCrop`](mmcv.transforms.CenterCrop) | Crop the image and keep the center part. |
| [`Normalize`](mmcv.transforms.Normalize) | Normalize the image pixels. |
| [`Resize`](mmcv.transforms.Resize) | Resize images to the specified scale or ratio. |
| [`RandomResize`](mmcv.transforms.RandomResize) | Resize images to a random scale in the specified range. |
| [`RandomChoiceResize`](mmcv.transforms.RandomChoiceResize) | Resize images to a random scale from several specified scales. |
| [`RandomGrayscale`](mmcv.transforms.RandomGrayscale) | Randomly grayscale images. |
| [`RandomFlip`](mmcv.transforms.RandomFlip) | Randomly flip images. |
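
As a rough illustration (assuming these transforms only need the `'img'` field here), several of them can be chained by hand; in real configs they are composed into a pipeline instead.

```python
import numpy as np
from mmcv.transforms import CenterCrop, RandomFlip, Resize

# A random image stands in for data produced by the loading step.
results = {'img': (np.random.rand(256, 320, 3) * 255).astype(np.uint8)}

# Each transform consumes and returns the same results dictionary.
for t in [Resize(scale=(256, 256)), CenterCrop(crop_size=224), RandomFlip(prob=0.5)]:
    results = t(results)

print(results['img'].shape)  # (224, 224, 3), possibly flipped
```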

### Data Formatting

Data formatting transforms convert the data into a specified type.

| Data Transforms | Functionality |
| :----------------------------------------------: | :---------------------------------------------------: |
| [`ToTensor`](mmcv.transforms.ToTensor)           | Convert the data in the specified fields to `torch.Tensor`. |
| [`ImageToTensor`](mmcv.transforms.ImageToTensor) | Convert images to `torch.Tensor` in PyTorch format. |
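
A small sketch of the formatting step, assuming a results dictionary that already holds an HWC image and a label:

```python
import numpy as np
from mmcv.transforms import ImageToTensor, ToTensor

results = {
    'img': (np.random.rand(224, 224, 3) * 255).astype(np.uint8),
    'gt_label': np.array(3),
}

results = ToTensor(keys=['gt_label'])(results)   # label ndarray -> torch.Tensor
results = ImageToTensor(keys=['img'])(results)   # HWC ndarray -> CHW torch.Tensor
print(results['img'].shape)  # torch.Size([3, 224, 224])
```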

## Custom Data Transform Classes

To implement a new data transform class, the class needs to inherit `BaseTransform` and implement the
`transform` method. Here, we use a simple flip transform (`MyFlip`) as an example:

```python
import random
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS

@TRANSFORMS.register_module()
class MyFlip(BaseTransform):
    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```

Then, we can instantiate a `MyFlip` object and use it to process our data dictionary.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```

Or, use it in the data pipeline by modifying our config file:

```python
pipeline = [
...
dict(type='MyFlip', direction='horizontal'),
...
]
```

Please note that to use the class in our config file, we need to make sure the `MyFlip` class is imported
at runtime.
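
One common way to ensure this is the `custom_imports` field of a config file; in the sketch below, `my_flip` is a hypothetical module (e.g. `my_flip.py` on the `PYTHONPATH`) that defines and registers `MyFlip`.

```python
# Hypothetical module that contains the registered MyFlip class.
custom_imports = dict(imports=['my_flip'], allow_failed_imports=False)

pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```
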
3 changes: 1 addition & 2 deletions docs/en/conf.py
@@ -45,7 +45,6 @@
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.autosectionlabel',
'sphinx_markdown_tables',
'myst_parser',
'sphinx_copybutton',
'sphinx.ext.autodoc.typehints',
@@ -58,7 +57,7 @@
'python': ('https://docs.python.org/3', None),
'numpy': ('https://numpy.org/doc/stable', None),
'torch': ('https://pytorch.org/docs/stable/', None),
'mmcv': ('https://mmcv.readthedocs.io/en/dev-2.x/', None),
'mmcv': ('https://mmcv.readthedocs.io/en/2.x/', None),
}

# Add any paths that contain templates here, relative to this directory.
161 changes: 160 additions & 1 deletion docs/en/migration/transform.md
@@ -1,3 +1,162 @@
# Migrate Data Transform to OpenMMLab 2.0

Coming soon. Please refer to [Chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/migration/transform.html).
## Introduction

Following the data transform interface convention of TorchVision, all data transform classes need to
implement the `__call__` method. In the OpenMMLab 1.0 convention, we additionally require the input and
output of the `__call__` method to be a dictionary.

In OpenMMLab 2.0, to make the data transform classes more extensible, we use the `transform` method instead of
the `__call__` method to implement data transformation, and all data transform classes should inherit the
[`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class. You can still use these data
transform classes by calling them.
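
To make the convention concrete, here is a minimal toy transform (not an actual MMCV class): `BaseTransform.__call__` delegates to `transform`, so the object is still applied by calling it, just as in OpenMMLab 1.0.

```python
from mmcv.transforms import BaseTransform


class AddPrefix(BaseTransform):
    """Toy transform used only to illustrate the new interface."""

    def __init__(self, prefix: str):
        super().__init__()
        self.prefix = prefix

    def transform(self, results: dict) -> dict:
        results['img_path'] = self.prefix + results['img_path']
        return results


t = AddPrefix(prefix='data/')
print(t({'img_path': 'demo.jpg'}))  # {'img_path': 'data/demo.jpg'}
```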

A tutorial on implementing a data transform class can be found in [Data Transform](../advanced_tutorials/data_transform.md).

In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document,
we compare the functionality, usage and implementation of the original data transform classes (in [MMClassification v0.23.2](https://github.com/open-mmlab/mmclassification/tree/v0.23.2) and [MMDetection v2.25.1](https://github.com/open-mmlab/mmdetection/tree/v2.25.1)) with the new data transform classes (in [MMCV v2.0.0rc1](https://github.com/open-mmlab/mmcv/tree/2.x)).

## Functionality Differences

<table class="colwidths-auto docutils align-default">
<thead>
<tr>
<th></th>
<th>MMClassification (original)</th>
<th>MMDetection (original)</th>
<th>MMCV (new)</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>LoadImageFromFile</code></td>
<td>Join the 'img_prefix' and 'img_info.filename' fields to get the image path and load the image.</td>
<td>Join the 'img_prefix' and 'img_info.filename' fields to get the image path and load the image. Support
specifying the order of channels.</td>
<td>Load images from 'img_path'. Support ignoring failed loading and specifying the decode backend.</td>
</tr>
<tr>
<td><code>LoadAnnotations</code></td>
<td>Not available.</td>
<td>Load bbox, label, mask (including polygon masks) and semantic segmentation annotations. Support converting the bbox coordinate system.</td>
<td>Load bbox, label, mask (not including polygon masks) and semantic segmentation annotations.</td>
</tr>
<tr>
<td><code>Pad</code></td>
<td>Pad all images in the "img_fields" field.</td>
<td>Pad all images in the "img_fields" field. Support padding to a size divisible by a given value.</td>
<td>Pad the image in the "img" field. Support padding to a size divisible by a given value.</td>
</tr>
<tr>
<td><code>CenterCrop</code></td>
<td>Crop all images in the "img_fields" field. Support EfficientNet-style cropping.</td>
<td>Not available.</td>
<td>Crop the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, and the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image.</td>
</tr>
<tr>
<td><code>Normalize</code></td>
<td>Normalize the image.</td>
<td>No differences.</td>
<td>No differences, but we recommend using the <a href="../tutorials/model.html#datapreprocessor">data preprocessor</a> to normalize the image (see the config sketch after this table).</td>
</tr>
<tr>
<td><code>Resize</code></td>
<td>Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge.</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, a single scale in <code>img_scale</code>, and <code>multiscale_mode="value"</code>.</td>
<td>Resize the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, and the keypoints in the "gt_keypoints" field. Support specifying the ratio of the new scale to the original scale and support resizing proportionally.</td>
</tr>
<tr>
<td><code>RandomResize</code></td>
<td>Not available</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, two scales in <code>img_scale</code>, and <code>multiscale_mode="range"</code>; or with <code>ratio_range</code> not None.
<pre>Resize(
    img_scale=[(640, 480), (960, 720)],
    multiscale_mode="range",
)</pre>
</td>
<td>Have the same resize function as <code>Resize</code>. Support sampling the scale from a scale range or scale ratio range.
<pre>RandomResize(scale=[(640, 480), (960, 720)])</pre>
</td>
</tr>
<tr>
<td><code>RandomChoiceResize</code></td>
<td>Not available</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, multiple scales in <code>img_scale</code>, and <code>multiscale_mode="value"</code>.
<pre>Resize(
    img_scale=[(640, 480), (960, 720)],
    multiscale_mode="value",
)</pre>
</td>
<td>Have the same resize function as <code>Resize</code>. Support randomly choosing the scale from multiple scales or multiple scale ratios.
<pre>RandomChoiceResize(scales=[(640, 480), (960, 720)])</pre>
</td>
</tr>
<tr>
<td><code>RandomGrayscale</code></td>
<td>Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale.</td>
<td>Not available</td>
<td>Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale.</td>
</tr>
<tr>
<td><code>RandomFlip</code></td>
<td>Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically.</td>
<td>Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields". Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
<td>Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
</tr>
<tr>
<td><code>MultiScaleFlipAug</code></td>
<td>Not available</td>
<td>Used for test-time augmentation.</td>
<td>Use <code><a href="https://mmcv.readthedocs.io/en/2.x/api/generated/mmcv.transforms.TestTimeAug.html">TestTimeAug</a></code></td>
</tr>
<tr>
<td><code>ToTensor</code></td>
<td>Convert the values in the specified fields to <code>torch.Tensor</code>.</td>
<td>No differences</td>
<td>No differences</td>
</tr>
<tr>
<td><code>ImageToTensor</code></td>
<td>Convert the values in the specified fields to <code>torch.Tensor</code> and transpose the channels to CHW.</td>
<td>No differences.</td>
<td>No differences.</td>
</tr>
</tbody>
</table>
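
For example, the `Normalize` row above recommends moving normalization out of the data pipeline. A rough before/after config sketch, assuming MMEngine's `ImgDataPreprocessor` and placeholder ImageNet mean/std values:

```python
# OpenMMLab 1.0 style: Normalize is a step of the data pipeline.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Normalize',
         mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375],
         to_rgb=True),
]

# OpenMMLab 2.0 style: drop Normalize from the pipeline and let the model's
# data preprocessor normalize batched inputs instead.
model = dict(
    data_preprocessor=dict(
        type='ImgDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
    ),
)
```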

## Implementation Differences

Taking `RandomFlip` as an example, the new version [`RandomFlip`](mmcv.transforms.RandomFlip) in MMCV inherits `BaseTransform` and moves the
functionality implementation from the `__call__` method to the `transform` method. In addition, the randomness-related code
is placed in separate methods, and these methods need to be wrapped with the `cache_randomness` decorator.

- MMDetection (original version)

```python
class RandomFlip:
    def __call__(self, results):
        """Randomly flip images."""
        ...
        # Randomly choose the flip direction
        cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
        ...
        return results
```

- MMCV (new version)

```python
class RandomFlip(BaseTransform):
    def transform(self, results):
        """Randomly flip images."""
        ...
        cur_dir = self._random_direction()
        ...
        return results

    @cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction."""
        ...
        return np.random.choice(direction_list, p=flip_ratio_list)
```
