[Refactor] Refactor fileio but without breaking bc #533

Merged: 19 commits merged into open-mmlab:main from refactor-fileio on Sep 26, 2022

@zhouzaida zhouzaida commented Sep 14, 2022

Motivation

In this PR, we refactor the fileio module to make it more user-friendly and to provide more frequently used functions. Note that this PR does not break backward compatibility: the previous usage will keep working until the release of MMEngine v1.0.

  • Previous Case
>>> from mmengine.fileio import FileClient
>>> client = FileClient()
>>> client.get_text('s3://path/of/your/file')
'hello world'
  • New Case
>>> import mmengine.fileio as fileio
>>> fileio.get_text('s3://path/of/your/file')
'hello world'

Modification

  1. Move the file backends from mmengine/fileio/file_client.py to mmengine/fileio/backends, but still import them in the former to avoid BC-breaking. Also rename HardDiskBackend to LocalBackend because the former name is not accurate enough. To avoid BC-breaking, HardDiskBackend is still defined in mmengine/fileio/file_client.py; it inherits from LocalBackend without any modifications. Each file contains one backend.

    • mmengine/fileio/backends/base.py
    • mmengine/fileio/backends/local_backend.py (LocalBackend)
    • mmengine/fileio/backends/petrel_backend.py (PetrelBackend)
    • mmengine/fileio/backends/memcached_backend.py (MemcachedBackend)
    • mmengine/fileio/backends/lmdb_backend.py (LmdbBackend)
    • mmengine/fileio/backends/http_backend.py (HTTPBackend)
  2. Provide more interfaces for File Backends

    • get
    • get_text
    • put
    • put_text
    • exists
    • isdir
    • isfile
    • join_path
    • get_local_path
    • copyfile (new method)
    • copytree (new method)
    • copyfile_from_local (new method)
    • copytree_from_local (new method)
    • copyfile_to_local (new method)
    • copytree_to_local (new method)
    • remove
    • rmtree (new method)
    • copy_if_symlink_fails (new method)
    • list_dir_of_file

    Note: Only LocalBackend and PetrelBackend implement all of the above interfaces.

  3. Functions or methods involving FileClient have also been modified

    • mmengine/hooks/checkpoint_hook.py

      Add a new argument backend_args to the constructor of CheckpointHook. backend_args is a dictionary that may contain a backend key selecting the backend to use; the rest of its entries are passed to the __init__ method of that backend.
      The difference between the fields of file_client_args and backend_args is that file_client_args accepts the prefix field, while backend_args does not.

      # passing `file_client_args` prints a deprecation warning but still works
      hook = CheckpointHook(xxx, file_client_args={'backend': 'disk'})
      # passing `backend_args` works without a warning
      # the value of `backend` should be `local`, which maps to `LocalBackend`, rather than `disk`
      hook = CheckpointHook(xxx, backend_args={'backend': 'local'})
    • mmengine/hooks/logger_hook.py

      Same as above.

    • mmengine/runner/checkpoint.py

    • mmengine/runner/runner.py
      Add a new argument backend_args to the save_checkpoint method of Runner.
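
The `backend_args` convention from item 3 and the compatibility subclass from item 1 can be illustrated with a minimal, self-contained sketch. It does not import MMEngine: `BACKENDS`, `build_backend`, and the toy class bodies are hypothetical stand-ins, and only the names `LocalBackend` and `HardDiskBackend` come from this PR. The sketch assumes that the `backend` key selects a class and that every other key is forwarded to `__init__`.

```python
class LocalBackend:
    """Toy stand-in for a local-filesystem backend (not MMEngine's class)."""

    def __init__(self, **kwargs):
        # Keep whatever options were forwarded so the caller can inspect them.
        self.options = kwargs


class HardDiskBackend(LocalBackend):
    """Old name kept for compatibility; adds nothing to LocalBackend."""


# Hypothetical name -> class table; 'disk' is kept only as the legacy name.
BACKENDS = {'local': LocalBackend, 'disk': HardDiskBackend}


def build_backend(backend_args=None):
    """Build a backend from a dict with an optional 'backend' key."""
    # Copy first so the caller's dict is not mutated by pop().
    args = dict(backend_args or {})
    # Assumption: fall back to the local backend when no name is given.
    name = args.pop('backend', 'local')
    if name not in BACKENDS:
        raise ValueError(f'unknown backend: {name!r}')
    # Everything except 'backend' goes to the backend's __init__.
    return BACKENDS[name](**args)
```

Because `HardDiskBackend` only subclasses `LocalBackend`, code that checks `isinstance(backend, LocalBackend)` keeps working for objects created under the old name.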

BC-breaking

There is no BC-breaking until the release of MMEngine v1.0.0 (the current version is v0.1.0).
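
One common way to keep an old argument working while steering users to the new one is a small resolver that warns on the old name. The sketch below is an illustration only: `resolve_backend_args` is a hypothetical helper, not a function from this PR, and treating "both arguments set" as an error is an assumption.

```python
import warnings


def resolve_backend_args(file_client_args=None, backend_args=None):
    """Return the arguments to use, warning when the old name is passed."""
    if file_client_args is not None and backend_args is not None:
        # Assumption: mixing the old and the new argument is an error.
        raise ValueError(
            'set either file_client_args or backend_args, not both')
    if file_client_args is not None:
        # The old argument still works but announces its deprecation.
        warnings.warn(
            'file_client_args is deprecated; use backend_args instead',
            DeprecationWarning,
            stacklevel=2)
        return file_client_args
    return backend_args
```

A hook written this way accepts existing configs unchanged, and the warning gives downstream repos time to migrate before the old argument is dropped.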

Use cases

There are two ways to call a method of a file backend:

  • Initialize a file backend with get_file_backend and call its methods.
  • Directly call unified I/O functions, which will call get_file_backend
    first and then call the corresponding backend method.
>>> # Initialize a file backend and call its methods
>>> import mmengine.fileio as fileio
>>> backend = fileio.get_file_backend(backend_args={'backend': 'petrel'})
>>> backend.get('s3://path/of/your/file')
b'hello world'
>>> # use the global backend instance
>>> backend1 = fileio.get_file_backend(backend_args={'backend': 'petrel'}, enable_singleton=True)
>>> backend is backend1
True

>>> # Directly call unified I/O functions and use the global backend instance by default
>>> fileio.get('s3://path/of/your/file')
b'hello world'
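
The relationship between the two calling styles can be modelled with a toy, in-memory version. Nothing here is MMEngine code: `MemoryBackend`, `_BACKENDS`, `_INSTANCES`, and the simplified signatures are hypothetical, and the caching key is an assumption. The sketch only shows the idea that a unified function first obtains a shared backend and then delegates to its method.

```python
class MemoryBackend:
    """Toy in-memory backend standing in for a real storage backend."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def get(self, filepath):
        # Return the raw bytes stored under this path.
        return self.files[filepath]

    def get_text(self, filepath, encoding='utf-8'):
        return self.get(filepath).decode(encoding)


_BACKENDS = {'memory': MemoryBackend}  # hypothetical registry
_INSTANCES = {}  # cache consulted only when enable_singleton=True


def get_file_backend(backend_args=None, enable_singleton=False):
    """Build a backend, or reuse a cached one when enable_singleton is set."""
    args = dict(backend_args or {})
    name = args.pop('backend', 'memory')
    if not enable_singleton:
        return _BACKENDS[name](**args)
    # Assumption: cache by backend name plus the repr of remaining options.
    key = (name, repr(sorted(args.items())))
    if key not in _INSTANCES:
        _INSTANCES[key] = _BACKENDS[name](**args)
    return _INSTANCES[key]


def get_text(filepath, backend_args=None):
    """Unified function: look up the shared backend, then delegate."""
    backend = get_file_backend(backend_args, enable_singleton=True)
    return backend.get_text(filepath)
```

In this model, repeated calls to the unified function with the same arguments reuse one backend object instead of constructing a new one per call.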

Validate the modification in downstream repos

Validate that both the old usage and the new usage work as expected in MMDetection.

Read training data from Petrel

  • Old version (Tested)
file_client_args = dict(
    backend='petrel',
    path_mapping=dict({
        './data/': 's3://openmmlab/datasets/detection/',
        'data/': 's3://openmmlab/datasets/detection/'})
)
  • New Version

The new version requires mmcv and mmdet to accept the backend_args argument, so it has not been tested.

backend_args = dict(
    backend='petrel',
    path_mapping=dict({
        './data/': 's3://openmmlab/datasets/detection/',
        'data/': 's3://openmmlab/datasets/detection/'})
)
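
To make the effect of `path_mapping` concrete, here is a rough sketch of how such a mapping could redirect local dataset paths to remote ones. `map_path` is a hypothetical helper, and rewriting only the first matching prefix is an assumption of this sketch; the actual backend may apply the mapping differently.

```python
def map_path(filepath, path_mapping=None):
    """Rewrite the first matching prefix of filepath, if any."""
    for prefix, target in (path_mapping or {}).items():
        if filepath.startswith(prefix):
            # Swap the local prefix for its remote counterpart.
            return target + filepath[len(prefix):]
    # No prefix matched: leave the path untouched.
    return filepath


# Same mapping as in the configs above.
path_mapping = {
    './data/': 's3://openmmlab/datasets/detection/',
    'data/': 's3://openmmlab/datasets/detection/',
}
```

With this mapping, both `./data/coco/a.jpg` and `data/coco/a.jpg` resolve to the same object under `s3://openmmlab/datasets/detection/`, which is why the configs list both spellings of the local prefix.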

Load or resume checkpoints from Petrel (Tested)

load_from = 'myself:s3://zhouzaida/mmdet/xxxx.ckpt'

Save checkpoints to Petrel

  • Old version (Tested)
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=2, out_dir='myself:s3://zhouzaida/mmdet/', file_client_args={'backend': 'petrel'}),
)
  • New version (Tested)
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=2, out_dir='myself:s3://zhouzaida/mmdet/', backend_args={'backend': 'petrel'}),
)

Save logs to Petrel

  • Old version (Tested)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50, out_dir='myself:s3://zhouzaida/mmdet/', file_client_args={'backend': 'petrel'}),
)
  • New version (Tested)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50, out_dir='myself:s3://zhouzaida/mmdet/', backend_args={'backend': 'petrel'}),
)

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@zhouzaida zhouzaida marked this pull request as draft September 14, 2022 15:33

@C1rN09 C1rN09 left a comment

Basically good.
Will the xxxBackend code in file_client.py be removed in the future, since it has been moved to the backends directory?

@zhouzaida

Basically good. Will the xxxBackend code in file_client.py be removed in the future, since it has been moved to the backends directory?

They have already been moved in the newest commit.

@codecov

codecov bot commented Sep 16, 2022

Codecov Report

Base: 77.68% // Head: 77.95% // Increases project coverage by +0.27% 🎉

Coverage data is based on head (5582bd7) compared to base (ca0364b).
Patch coverage: 86.57% of modified lines in pull request are covered.

❗ Current head 5582bd7 differs from pull request most recent head c6d2234. Consider uploading reports for the commit c6d2234 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
+ Coverage   77.68%   77.95%   +0.27%     
==========================================
  Files         116      125       +9     
  Lines        8612     8983     +371     
  Branches     1778     1840      +62     
==========================================
+ Hits         6690     7003     +313     
- Misses       1623     1664      +41     
- Partials      299      316      +17     
| Flag | Coverage Δ |
|---|---|
| unittests | 77.95% <86.57%> (+0.27%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| mmengine/runner/runner.py | 84.01% <23.52%> (-1.72%) ⬇️ |
| mmengine/runner/checkpoint.py | 44.63% <35.71%> (+0.63%) ⬆️ |
| mmengine/hooks/checkpoint_hook.py | 87.11% <60.00%> (-2.73%) ⬇️ |
| mmengine/model/utils.py | 63.46% <77.27%> (+3.70%) ⬆️ |
| mmengine/fileio/backends/memcached_backend.py | 79.16% <79.16%> (ø) |
| mmengine/fileio/handlers/registry_utils.py | 80.00% <80.00%> (ø) |
| mmengine/fileio/parse.py | 90.47% <80.00%> (-9.53%) ⬇️ |
| mmengine/fileio/io.py | 82.31% <80.80%> (-3.63%) ⬇️ |
| mmengine/fileio/backends/base.py | 84.61% <84.61%> (ø) |
| mmengine/fileio/backends/lmdb_backend.py | 93.10% <93.10%> (ø) |

... and 10 more


@ZwwWayne ZwwWayne added this to the 0.2.0 milestone Sep 17, 2022
@zhouzaida zhouzaida marked this pull request as ready for review September 18, 2022 17:18
C1rN09 previously approved these changes Sep 22, 2022
mmengine/fileio/backends/lmdb_backend.py (resolved)
mmengine/fileio/io.py (resolved)
mmengine/runner/runner.py (outdated, resolved)
@ZwwWayne ZwwWayne merged commit ed84dfd into open-mmlab:main Sep 26, 2022
@zhouzaida zhouzaida deleted the refactor-fileio branch September 26, 2022 09:05
@C1rN09 C1rN09 mentioned this pull request Oct 13, 2022
MeowZheng pushed a commit to open-mmlab/mmsegmentation that referenced this pull request Feb 1, 2023
## Motivation

Use the new fileio from mmengine
open-mmlab/mmengine#533

## Modification

1. Use `mmengine.fileio` to replace FileClient in mmseg/datasets
2. Use `mmengine.fileio` to replace FileClient in mmseg/datasets/transforms
3. Use `mmengine.fileio` to replace FileClient in mmseg/visualization

## BC-breaking (Optional)

We modified all the dataset configurations, so please use the latest config files.