[Feature] Loading objects from different backends and dumping objects to different backends #1330

Merged
merged 50 commits into from
Oct 23, 2021
Changes from 43 commits

Commits
b2a2257
[Feature] Choose storage backend by the prefix of filepath
zhouzaida Sep 9, 2021
073f73e
refactor FileClient and add unittest
zhouzaida Sep 10, 2021
dfb9fc4
support loading from different backends
zhouzaida Sep 11, 2021
48cfdad
polish docstring
zhouzaida Sep 21, 2021
c2c9fc0
fix unittet
zhouzaida Sep 21, 2021
d641a8c
rename attribute str_like_obj to is_str_like_obj
zhouzaida Sep 22, 2021
68f0ab6
add infer_client method
zhouzaida Sep 23, 2021
31caf8e
add check_exist method
zhouzaida Sep 23, 2021
7e7a80f
rename var client to file_client
zhouzaida Sep 24, 2021
aa8274b
polish docstring
zhouzaida Sep 26, 2021
bb4712d
add join_paths method
zhouzaida Sep 27, 2021
2409531
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Sep 27, 2021
d4b6d96
remove join_paths and add _format_path
zhouzaida Sep 28, 2021
824cff3
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Oct 3, 2021
767f7fb
enhance unittest
zhouzaida Oct 3, 2021
b930678
refactor unittest
zhouzaida Oct 3, 2021
1752698
singleton pattern
zhouzaida Oct 4, 2021
fb9567c
fix test_clientio.py
zhouzaida Oct 4, 2021
00505f8
deprecate CephBackend
zhouzaida Oct 4, 2021
225d3a6
enhance docstring
zhouzaida Oct 6, 2021
22644da
refactor unittest for petrel
zhouzaida Oct 6, 2021
058b7e8
refactor unittest for disk backend
zhouzaida Oct 6, 2021
1692678
update io.md
zhouzaida Oct 6, 2021
01b9807
add concat_paths method
zhouzaida Oct 6, 2021
fed5a39
improve docstring
zhouzaida Oct 8, 2021
4959687
improve docstring
zhouzaida Oct 8, 2021
aea920a
add isdir and copyfile for file backend
zhouzaida Oct 10, 2021
6412103
delete copyfile and add get_local_path
zhouzaida Oct 11, 2021
c557ca3
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Oct 12, 2021
eeda74c
remove isdir method of petrel
zhouzaida Oct 12, 2021
ad52428
fix typo
zhouzaida Oct 12, 2021
941a884
add comment and polish docstring
zhouzaida Oct 13, 2021
198a465
polish docstring
zhouzaida Oct 14, 2021
e0d6a83
rename _path_mapping to _map_path
zhouzaida Oct 15, 2021
ae0cdd3
polish docstring and fix typo
zhouzaida Oct 15, 2021
a2e0162
refactor get_local_path
zhouzaida Oct 16, 2021
50ba26f
add list_dir_or_file for FileClient
zhouzaida Oct 17, 2021
4ad3bf5
add list_dir_or_file for PetrelBackend
zhouzaida Oct 18, 2021
df207d1
fix windows ci
zhouzaida Oct 18, 2021
d29a88d
Add return docstring
zhouzaida Oct 19, 2021
f18a779
polish docstring
zhouzaida Oct 19, 2021
b6eb5d1
fix typo
zhouzaida Oct 19, 2021
150d504
fix typo
zhouzaida Oct 19, 2021
208ff82
deprecate the conversion from Path to str
zhouzaida Oct 20, 2021
9ecfc12
add docs for loading checkpoints with FileClient
zhouzaida Oct 22, 2021
38559f1
refactor map_path
zhouzaida Oct 22, 2021
ea32388
add _ensure_methods to ensure methods have been implemented
zhouzaida Oct 22, 2021
a8cc11d
fix list_dir_or_file
zhouzaida Oct 22, 2021
e66fe61
rename _ensure_method_implemented to has_method
zhouzaida Oct 23, 2021
6987038
fix conflict
zhouzaida Oct 23, 2021
50 changes: 48 additions & 2 deletions docs/understand_mmcv/io.md
@@ -2,11 +2,17 @@

This module provides two universal APIs to load and dump files of different formats.

```{note}
Since v1.3.16, the IO module supports loading data from and dumping data to different backends. More details are in PR [#1330](https://github.com/open-mmlab/mmcv/pull/1330).
```

### Load and dump data

`mmcv` provides a universal API for loading and dumping data. Currently
supported formats are JSON, YAML, and pickle.

#### Load from disk or dump to disk

```python
import mmcv

@@ -29,6 +35,20 @@ with open('test.yaml', 'w') as f:
    mmcv.dump(data, f, file_format='yaml')
```

#### Load from other backends or dump to other backends

```python
import mmcv

# load data from files with the s3:// prefix
data = mmcv.load('s3://bucket-name/test.json')
data = mmcv.load('s3://bucket-name/test.yaml')
data = mmcv.load('s3://bucket-name/test.pkl')

# dump data to a remote file (the format is inferred from the file extension)
mmcv.dump(data, 's3://bucket-name/out.pkl')
```
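The `s3://` paths above work because the storage backend is chosen from the path prefix (the first commit of this PR is titled "Choose storage backend by the prefix of filepath"). The following is a minimal stdlib sketch of that kind of dispatch. `PREFIX_TO_BACKEND` and `infer_backend` are hypothetical names, not mmcv's API, and the backend names in the table are placeholders; mmcv's real prefix mapping may differ.

```python
import re
from typing import Dict

# Hypothetical prefix -> backend-name table; mmcv's own mapping may differ.
PREFIX_TO_BACKEND: Dict[str, str] = {
    's3': 'remote-object-store',
    'http': 'http',
    'https': 'http',
}

# A URI scheme followed by '://', e.g. 's3://' or 'https://'.
_SCHEME = re.compile(r'([A-Za-z][A-Za-z0-9+.-]*)://')


def infer_backend(path: str) -> str:
    """Return the backend name for ``path`` based on its ``prefix://`` part.

    Paths without a prefix (e.g. 'test.json' or '/data/a.txt') use the
    local disk backend; an unrecognised prefix raises ValueError.
    """
    match = _SCHEME.match(path)
    if match is None:
        return 'disk'
    prefix = match.group(1).lower()
    if prefix not in PREFIX_TO_BACKEND:
        raise ValueError(f'no backend registered for prefix {prefix!r}')
    return PREFIX_TO_BACKEND[prefix]


print(infer_backend('s3://bucket-name/test.json'))  # remote-object-store
print(infer_backend('test.json'))                   # disk
```

Raising on an unknown prefix instead of silently falling back to disk is a choice made for this sketch; it surfaces typos such as `s4://` early.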

The API is also easy to extend to more file formats:
write a file handler that inherits from `BaseFileHandler`
and register it with one or more file formats.
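The registry idea can be sketched with the standard library alone. Everything below is hypothetical: `BaseHandler`, `register`, `handler_for`, and the bytes-based method names are not mmcv's `BaseFileHandler` interface. The sketch only illustrates registering one handler under several formats and picking it by file extension. In this sketch handlers work on bytes, so they stay independent of whichever storage backend produced those bytes.

```python
import json
import pickle
import posixpath
from typing import Any, Callable, Dict, Type


class BaseHandler:
    """Hypothetical base class: converts between raw bytes and Python objects."""

    def load_from_bytes(self, data: bytes) -> Any:
        raise NotImplementedError

    def dump_to_bytes(self, obj: Any) -> bytes:
        raise NotImplementedError


# Maps a format name such as 'json' or 'pkl' to a handler instance.
_HANDLERS: Dict[str, BaseHandler] = {}


def register(*formats: str) -> Callable[[Type[BaseHandler]], Type[BaseHandler]]:
    """Class decorator: register one handler instance under every given format."""
    def wrap(cls: Type[BaseHandler]) -> Type[BaseHandler]:
        handler = cls()
        for fmt in formats:
            _HANDLERS[fmt] = handler
        return cls
    return wrap


@register('json')
class JsonHandler(BaseHandler):
    def load_from_bytes(self, data: bytes) -> Any:
        return json.loads(data.decode('utf-8'))

    def dump_to_bytes(self, obj: Any) -> bytes:
        return json.dumps(obj).encode('utf-8')


@register('pkl', 'pickle')
class PickleHandler(BaseHandler):
    def load_from_bytes(self, data: bytes) -> Any:
        return pickle.loads(data)

    def dump_to_bytes(self, obj: Any) -> bytes:
        return pickle.dumps(obj)


def handler_for(path: str) -> BaseHandler:
    """Choose a handler from the extension, e.g. 's3://bucket-name/out.pkl' -> 'pkl'."""
    fmt = posixpath.splitext(path)[1].lstrip('.').lower()
    if fmt not in _HANDLERS:
        raise TypeError(f'unsupported file format: {fmt!r}')
    return _HANDLERS[fmt]


blob = handler_for('s3://bucket-name/out.pkl').dump_to_bytes({'a': 1})
print(handler_for('s3://bucket-name/out.pkl').load_from_bytes(blob))  # {'a': 1}
```

Registering a single instance under both `'pkl'` and `'pickle'` mirrors the "one or more file formats" wording above.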
@@ -92,7 +112,9 @@ d
e
```

#### Load from disk

Use `list_from_file` to load the list from `a.txt`.

```python
>>> mmcv.list_from_file('a.txt')
@@ -113,11 +135,35 @@ For example `b.txt` is a text file with 3 lines.
3 panda
```

Then use `dict_from_file` to load the dict from `b.txt`.

```python
>>> mmcv.dict_from_file('b.txt')
{'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
>>> mmcv.dict_from_file('b.txt', key_type=int)
{1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}
```
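The outputs above follow a simple rule: the first column of each line is the key, and the remaining columns become a single string or, if there are several, a list. Below is a minimal sketch of that rule applied to text that has already been read. `parse_dict_text` is a hypothetical helper, not part of mmcv, and skipping blank lines is an assumption of the sketch.

```python
from typing import Any, Callable, Dict, List, Union


def parse_dict_text(text: str,
                    key_type: Callable[[str], Any] = str
                    ) -> Dict[Any, Union[str, List[str]]]:
    """Hypothetical helper: turn whitespace-separated lines into a dict.

    The first column is converted with ``key_type`` and used as the key.
    One remaining column is stored as a string; several are stored as a list.
    """
    mapping: Dict[Any, Union[str, List[str]]] = {}
    for line in text.splitlines():
        items = line.split()
        if not items:
            continue  # assumption: blank lines are skipped
        if len(items) < 2:
            raise ValueError(f'expected a key and at least one value: {line!r}')
        mapping[key_type(items[0])] = items[1:] if len(items) > 2 else items[1]
    return mapping


b_txt = '1 cat\n2 dog cow\n3 panda\n'
print(parse_dict_text(b_txt))                # {'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
print(parse_dict_text(b_txt, key_type=int))  # {1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}
```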

#### Load from other backends

Use `list_from_file` to load the list from `s3://bucket-name/a.txt`.

```python
>>> mmcv.list_from_file('s3://bucket-name/a.txt')
['a', 'b', 'c', 'd', 'e']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', offset=2)
['c', 'd', 'e']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', max_num=2)
['a', 'b']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', prefix='/mnt/')
['/mnt/a', '/mnt/b', '/mnt/c', '/mnt/d', '/mnt/e']
```
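The `offset`, `max_num`, and `prefix` arguments shown above behave like slicing and string concatenation on the file's lines. The sketch below reproduces the outputs above from the contents of `a.txt`. `parse_list_text` is a hypothetical helper, not part of mmcv; treating `max_num <= 0` as "no limit" and applying `offset` before `max_num` are assumptions of the sketch.

```python
from typing import List


def parse_list_text(text: str, prefix: str = '', offset: int = 0,
                    max_num: int = 0) -> List[str]:
    """Hypothetical helper mirroring the documented list_from_file options.

    offset:  number of leading lines to skip.
    max_num: keep at most this many lines after skipping; <= 0 means no limit.
    prefix:  string prepended to every returned line.
    """
    lines = text.splitlines()[offset:]
    if max_num > 0:
        lines = lines[:max_num]
    return [prefix + line for line in lines]


a_txt = 'a\nb\nc\nd\ne\n'
print(parse_list_text(a_txt, offset=2))        # ['c', 'd', 'e']
print(parse_list_text(a_txt, max_num=2))       # ['a', 'b']
print(parse_list_text(a_txt, prefix='/mnt/'))  # ['/mnt/a', '/mnt/b', '/mnt/c', '/mnt/d', '/mnt/e']
```

Because the helper only sees text, the same logic applies whether the lines were read from local disk or fetched from a remote backend.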

Use `dict_from_file` to load the dict from `s3://bucket-name/b.txt`.

```python
>>> mmcv.dict_from_file('s3://bucket-name/b.txt')
{'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
>>> mmcv.dict_from_file('s3://bucket-name/b.txt', key_type=int)
{1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}
```
51 changes: 48 additions & 3 deletions docs_zh_CN/understand_mmcv/io.md
@@ -2,10 +2,16 @@

The file input/output module provides two universal APIs for loading and dumping files of different formats.

```{note}
Since v1.3.16, the IO module supports loading data from different backends and dumping data to different backends. For more details, see PR [#1330](https://github.com/open-mmlab/mmcv/pull/1330).
```

### Load and dump data

`mmcv` provides a universal API for loading and dumping data. Currently supported formats are JSON, YAML, and pickle.

#### Load data from disk or dump data to disk

```python
import mmcv

@@ -28,6 +34,20 @@ with open('test.yaml', 'w') as f:
    mmcv.dump(data, f, file_format='yaml')
```

#### Load from other backends or dump to other backends

```python
import mmcv

# load data from s3 files
data = mmcv.load('s3://bucket-name/test.json')
data = mmcv.load('s3://bucket-name/test.yaml')
data = mmcv.load('s3://bucket-name/test.pkl')

# dump data to an s3 file (the format is inferred from the file extension)
mmcv.dump(data, 's3://bucket-name/out.pkl')
```

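We provide an easily extensible way to support more file formats: create a file handler class that inherits from `BaseFileHandler`
and register it with `mmcv`. The handler class needs to override at least three methods.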

@@ -49,7 +69,7 @@ class TxtHandler1(mmcv.BaseFileHandler):
return str(obj)
```

Take `PickleHandler` as an example:

```python
import pickle
@@ -87,8 +107,9 @@ c
d
e
```

#### Load from disk

Use `list_from_file` to read `a.txt`.

```python
>>> mmcv.list_from_file('a.txt')
@@ -109,11 +130,35 @@ e
3 panda
```

Use `dict_from_file` to read `b.txt`.

```python
>>> mmcv.dict_from_file('b.txt')
{'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
>>> mmcv.dict_from_file('b.txt', key_type=int)
{1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}
```

#### Load from other backends

Use `list_from_file` to read `s3://bucket-name/a.txt`.

```python
>>> mmcv.list_from_file('s3://bucket-name/a.txt')
['a', 'b', 'c', 'd', 'e']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', offset=2)
['c', 'd', 'e']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', max_num=2)
['a', 'b']
>>> mmcv.list_from_file('s3://bucket-name/a.txt', prefix='/mnt/')
['/mnt/a', '/mnt/b', '/mnt/c', '/mnt/d', '/mnt/e']
```

Use `dict_from_file` to read `s3://bucket-name/b.txt`.

```python
>>> mmcv.dict_from_file('s3://bucket-name/b.txt')
{'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
>>> mmcv.dict_from_file('s3://bucket-name/b.txt', key_type=int)
{1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}
```