[Feature] Upload checkpoints and logs to ceph #1375

Merged
merged 103 commits into from
Oct 24, 2021
Changes from 102 commits
Commits
b2a2257
[Feature] Choose storage backend by the prefix of filepath
zhouzaida Sep 9, 2021
073f73e
refactor FileClient and add unittest
zhouzaida Sep 10, 2021
dfb9fc4
support loading from different backends
zhouzaida Sep 11, 2021
48cfdad
polish docstring
zhouzaida Sep 21, 2021
c2c9fc0
fix unittest
zhouzaida Sep 21, 2021
d641a8c
rename attribute str_like_obj to is_str_like_obj
zhouzaida Sep 22, 2021
bb45ee5
[Docs] Upload checkpoint to petrel oss
zhouzaida Sep 22, 2021
68f0ab6
add infer_client method
zhouzaida Sep 23, 2021
2f56b3c
merge load-from-backend
zhouzaida Sep 23, 2021
3fe48b8
Support uploading checkpoint to petrel oss
zhouzaida Sep 23, 2021
31caf8e
add check_exist method
zhouzaida Sep 23, 2021
d202465
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Sep 23, 2021
f75511e
refactor CheckpointHook
zhouzaida Sep 23, 2021
8461a37
support uploading logs to ceph
zhouzaida Sep 24, 2021
7e7a80f
rename var client to file_client
zhouzaida Sep 24, 2021
44829ca
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Sep 24, 2021
aa8274b
polish docstring
zhouzaida Sep 26, 2021
fd70556
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Sep 26, 2021
6af56b1
enhance load_from_ceph
zhouzaida Sep 26, 2021
9600d18
refactor load_from_ceph
zhouzaida Sep 26, 2021
6983002
refactor TextLoggerHook
zhouzaida Sep 26, 2021
e91c93e
change the meaning of out_dir argument
zhouzaida Sep 27, 2021
5940864
fix test_checkpoint_hook.py
zhouzaida Sep 27, 2021
bb4712d
add join_paths method
zhouzaida Sep 27, 2021
2409531
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Sep 27, 2021
c1ae3ef
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Sep 27, 2021
d4b6d96
remove join_paths and add _format_path
zhouzaida Sep 28, 2021
824cff3
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Oct 3, 2021
767f7fb
enhance unittest
zhouzaida Oct 3, 2021
a929e90
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 3, 2021
b930678
refactor unittest
zhouzaida Oct 3, 2021
6704a15
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 3, 2021
8020802
add a unittest for EvalHook when file backend is petrel
zhouzaida Oct 4, 2021
1752698
singleton pattern
zhouzaida Oct 4, 2021
fb9567c
fix test_clientio.py
zhouzaida Oct 4, 2021
00505f8
deprecate CephBackend
zhouzaida Oct 4, 2021
5fdcedc
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 4, 2021
9d0b6ec
add warning in load_from_ceph
zhouzaida Oct 4, 2021
6980bf3
fix type of out_suffix
zhouzaida Oct 5, 2021
225d3a6
enhance docstring
zhouzaida Oct 6, 2021
22644da
refactor unittest for petrel
zhouzaida Oct 6, 2021
058b7e8
refactor unittest for disk backend
zhouzaida Oct 6, 2021
1692678
update io.md
zhouzaida Oct 6, 2021
01b9807
add concat_paths method
zhouzaida Oct 6, 2021
9491bac
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 6, 2021
b44bfc6
fix CI
zhouzaida Oct 6, 2021
016c879
mock check_exist
zhouzaida Oct 7, 2021
fed5a39
improve docstring
zhouzaida Oct 8, 2021
4959687
improve docstring
zhouzaida Oct 8, 2021
fb0e21f
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 8, 2021
9a0c535
improve docstring
zhouzaida Oct 8, 2021
e0dcad9
improve docstring
zhouzaida Oct 8, 2021
aea920a
add isdir and copyfile for file backend
zhouzaida Oct 10, 2021
4eda86f
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 10, 2021
6412103
delete copyfile and add get_local_path
zhouzaida Oct 11, 2021
c557ca3
Merge branch 'master' of https://github.com/open-mmlab/mmcv into load…
zhouzaida Oct 12, 2021
eeda74c
remove isdir method of petrel
zhouzaida Oct 12, 2021
ad52428
fix typo
zhouzaida Oct 12, 2021
b846cdb
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 12, 2021
54597f4
rename check_exists to exists
zhouzaida Oct 12, 2021
097bae5
refactor code and polish docstring
zhouzaida Oct 12, 2021
9f78448
fix windows ci
zhouzaida Oct 12, 2021
941a884
add comment and polish docstring
zhouzaida Oct 13, 2021
3105687
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 13, 2021
198a465
polish docstring
zhouzaida Oct 14, 2021
cda02b7
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 14, 2021
7ad8ab7
polish docstring
zhouzaida Oct 14, 2021
e0d6a83
rename _path_mapping to _map_path
zhouzaida Oct 15, 2021
59aa354
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 15, 2021
ae0cdd3
polish docstring and fix typo
zhouzaida Oct 15, 2021
bd3b322
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 15, 2021
a2e0162
refactor get_local_path
zhouzaida Oct 16, 2021
bfc5ecc
fix conflict
zhouzaida Oct 16, 2021
50ba26f
add list_dir_or_file for FileClient
zhouzaida Oct 17, 2021
4ad3bf5
add list_dir_or_file for PetrelBackend
zhouzaida Oct 18, 2021
df207d1
fix windows ci
zhouzaida Oct 18, 2021
d29a88d
Add return docstring
zhouzaida Oct 19, 2021
f18a779
polish docstring
zhouzaida Oct 19, 2021
7b7a380
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 19, 2021
669cfee
fix typo
zhouzaida Oct 19, 2021
b6eb5d1
fix typo
zhouzaida Oct 19, 2021
0c8fbc3
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 19, 2021
150d504
fix typo
zhouzaida Oct 19, 2021
b371c83
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 19, 2021
23ba993
fix error when mocking PetrelBackend
zhouzaida Oct 20, 2021
208ff82
deprecate the conversion from Path to str
zhouzaida Oct 20, 2021
9ecfc12
add docs for loading checkpoints with FileClient
zhouzaida Oct 22, 2021
947d549
rename keep_log to keep_local
zhouzaida Oct 22, 2021
e921770
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 22, 2021
38559f1
refactor map_path
zhouzaida Oct 22, 2021
e75685c
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 22, 2021
ea32388
add _ensure_methods to ensure methods have been implemented
zhouzaida Oct 22, 2021
f413480
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 22, 2021
a8cc11d
fix list_dir_or_file
zhouzaida Oct 22, 2021
7ef1652
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 22, 2021
e66fe61
rename _ensure_method_implemented to has_method
zhouzaida Oct 23, 2021
196a56d
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 23, 2021
6987038
fix conflict
zhouzaida Oct 23, 2021
aa5611c
Merge branch 'load-from-backend' into upload-ckpt-to-ceph
zhouzaida Oct 23, 2021
100bf55
fix conflict
zhouzaida Oct 23, 2021
a13d078
refactor
zhouzaida Oct 24, 2021
26c8127
polish information
zhouzaida Oct 24, 2021
39a58a3
format information
zhouzaida Oct 24, 2021
73 changes: 54 additions & 19 deletions mmcv/fileio/file_client.py
@@ -11,6 +11,7 @@
from typing import Iterable, Iterator, Optional, Tuple, Union
from urllib.request import urlopen

import mmcv
from mmcv.utils.misc import has_method
from mmcv.utils.path import is_filepath

@@ -23,6 +24,17 @@ class BaseStorageBackend(metaclass=ABCMeta):
as texts.
"""

# a flag to indicate whether the backend can create a symlink for a file
_allow_symlink = False

@property
def name(self):
return self.__class__.__name__

@property
def allow_symlink(self):
return self._allow_symlink

@abstractmethod
def get(self, filepath):
pass
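
Note: the hunk above adds a class-level `_allow_symlink` flag that is exposed through a read-only property. The following is a minimal sketch of that pattern, not the mmcv code itself. The class names `StorageBackendSketch`, `LocalDiskSketch`, `ObjectStoreSketch` and the helper `link_or_copy_plan` are hypothetical and exist only for illustration.

```python
from abc import ABCMeta, abstractmethod


class StorageBackendSketch(metaclass=ABCMeta):
    """Hypothetical stand-in for a storage backend base class."""

    # Class-level capability flag; subclasses override it when the
    # underlying storage can create symlinks.
    _allow_symlink = False

    @property
    def name(self):
        # Report the concrete class name of the backend.
        return self.__class__.__name__

    @property
    def allow_symlink(self):
        # Read-only view of the class-level flag.
        return self._allow_symlink

    @abstractmethod
    def get(self, filepath):
        """Return the raw bytes stored at ``filepath``."""


class LocalDiskSketch(StorageBackendSketch):
    _allow_symlink = True  # a local filesystem can usually create symlinks

    def get(self, filepath):
        with open(filepath, 'rb') as f:
            return f.read()


class ObjectStoreSketch(StorageBackendSketch):
    # Inherits _allow_symlink = False from the base class.

    def __init__(self):
        self._objects = {}

    def get(self, filepath):
        return self._objects[filepath]


def link_or_copy_plan(backend):
    """Hypothetical caller: pick a strategy based on the capability flag."""
    return 'symlink' if backend.allow_symlink else 'copy'
```

Callers can then branch on `backend.allow_symlink` instead of checking the backend type directly.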
@@ -41,8 +53,8 @@ class CephBackend(BaseStorageBackend):
will be replaced by ``dst``. Default: None.

.. warning::
:class:`CephBackend` will be deprecated, please use
:class:`PetrelBackend` instead
:class:`mmcv.fileio.file_client.CephBackend` will be deprecated,
please use :class:`mmcv.fileio.file_client.PetrelBackend` instead.
"""

def __init__(self, path_mapping=None):
@@ -266,8 +278,8 @@ def isfile(self, filepath: Union[str, Path]) -> bool:
filepath = self._format_path(filepath)
return self._client.contains(filepath)

def concat_paths(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
def join_path(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
"""Concatenate all file paths.

Args:
@@ -377,7 +389,7 @@ def _list_dir_or_file(dir_path, list_dir, list_file, suffix,
# is a directory, because `self.isdir` relies on
# `self._client.list`
if path.endswith('/'): # a directory path
next_dir_path = self.concat_paths(dir_path, path)
next_dir_path = self.join_path(dir_path, path)
if list_dir:
# get the relative path and exclude the last
# character '/'
@@ -388,7 +400,7 @@ def _list_dir_or_file(dir_path, list_dir, list_file, suffix,
list_file, suffix,
recursive)
else: # a file path
absolute_path = self.concat_paths(dir_path, path)
absolute_path = self.join_path(dir_path, path)
rel_path = absolute_path[len(root):]
if (suffix is None
or rel_path.endswith(suffix)) and list_file:
@@ -491,6 +503,8 @@ def get_text(self, filepath, encoding=None):
class HardDiskBackend(BaseStorageBackend):
"""Raw hard disks storage backend."""

_allow_symlink = True

def get(self, filepath: Union[str, Path]) -> bytes:
"""Read data from a given ``filepath`` with 'rb' mode.

@@ -524,10 +538,15 @@ def get_text(self,
def put(self, obj: bytes, filepath: Union[str, Path]) -> None:
"""Write data to a given ``filepath`` with 'wb' mode.

Note:
``put`` will create a directory if the directory of ``filepath``
does not exist.

Args:
obj (bytes): Data to be written.
filepath (str or Path): Path to write data.
"""
mmcv.mkdir_or_exist(osp.dirname(filepath))
with open(filepath, 'wb') as f:
f.write(obj)

@@ -537,12 +556,17 @@ def put_text(self,
encoding: str = 'utf-8') -> None:
"""Write data to a given ``filepath`` with 'w' mode.

Note:
``put_text`` will create a directory if the directory of
``filepath`` does not exist.

Args:
obj (str): Data to be written.
filepath (str or Path): Path to write data.
encoding (str): The encoding format used to open the ``filepath``.
Default: 'utf-8'.
"""
mmcv.mkdir_or_exist(osp.dirname(filepath))
with open(filepath, 'w', encoding=encoding) as f:
f.write(obj)
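
Note: the two hunks above make `put` and `put_text` create the parent directory before writing. The sketch below shows the same idea using only the standard library. It assumes `os.makedirs(..., exist_ok=True)` as a stand-in for `mmcv.mkdir_or_exist`; the function names `put_bytes`, `put_text_sketch` and `demo_roundtrip` are hypothetical.

```python
import os
import os.path as osp
import tempfile


def put_bytes(obj: bytes, filepath: str) -> None:
    """Write ``obj`` to ``filepath``, creating parent directories first."""
    dir_name = osp.dirname(filepath)
    # osp.dirname returns '' for a bare filename and os.makedirs('') raises,
    # so only create the directory when there is one to create.
    if dir_name:
        os.makedirs(dir_name, exist_ok=True)
    with open(filepath, 'wb') as f:
        f.write(obj)


def put_text_sketch(obj: str, filepath: str, encoding: str = 'utf-8') -> None:
    """Text counterpart of ``put_bytes`` with an explicit encoding."""
    dir_name = osp.dirname(filepath)
    if dir_name:
        os.makedirs(dir_name, exist_ok=True)
    with open(filepath, 'w', encoding=encoding) as f:
        f.write(obj)


def demo_roundtrip():
    """Write into nested directories that do not exist yet, then read back."""
    with tempfile.TemporaryDirectory() as root:
        bin_path = osp.join(root, 'work_dir', 'run1', 'latest.bin')
        txt_path = osp.join(root, 'work_dir', 'logs', 'train.log')
        put_bytes(b'\x00\x01', bin_path)
        put_text_sketch('epoch 1 done', txt_path)
        with open(bin_path, 'rb') as f:
            data = f.read()
        with open(txt_path, encoding='utf-8') as f:
            text = f.read()
    return data, text
```

With this behavior, a caller no longer has to create `work_dir` before the first write.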

@@ -579,7 +603,7 @@ def isdir(self, filepath: Union[str, Path]) -> bool:
return osp.isdir(filepath)

def isfile(self, filepath: Union[str, Path]) -> bool:
"""Check a ``filepath`` whether it is a file.
"""Check whether a file path is a file.

Args:
filepath (str or Path): Path to be checked whether it is a file.
@@ -590,8 +614,8 @@ def isfile(self, filepath: Union[str, Path]) -> bool:
"""
return osp.isfile(filepath)

def concat_paths(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
def join_path(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
"""Concatenate all file paths.

Join one or more filepath components intelligently. The return value
@@ -714,7 +738,7 @@ class FileClient:
Note that it can also register other backend accessors with a given name,
prefixes, and backend class. In addition, we use the singleton pattern to
avoid repeated object creation. If the arguments are the same, the same
object is returned.
object will be returned.

Args:
backend (str, optional): The storage backend type. Options are "disk",
@@ -788,18 +812,21 @@ def __new__(cls, backend=None, prefix=None, **kwargs):
_instance = super().__new__(cls)
if backend is not None:
_instance.client = cls._backends[backend](**kwargs)
_instance.backend_name = backend
else:
_instance.client = cls._prefix_to_backends[prefix](**kwargs)
# infer the backend name according to the prefix
for backend_name, backend_cls in cls._backends.items():
if isinstance(_instance.client, backend_cls):
_instance.backend_name = backend_name
break

cls._instances[arg_key] = _instance

return _instance

@property
def name(self):
return self.client.name

@property
def allow_symlink(self):
return self.client.allow_symlink
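
Note: the `FileClient` docstring describes a singleton keyed on the constructor arguments, and the `__new__` hunk above stores each instance in `cls._instances[arg_key]`. A minimal sketch of that caching idea follows. The class `CachedClientSketch` and the way its key is built are assumptions made for illustration and may differ from how mmcv builds `arg_key`.

```python
class CachedClientSketch:
    """Hypothetical client that returns one instance per argument set."""

    # Maps an argument key to the instance created for those arguments.
    _instances = {}

    def __new__(cls, backend='disk', **kwargs):
        # Build a string key from the arguments. Sorting the keyword
        # arguments makes the key independent of the order they were passed.
        arg_key = backend + ''.join(
            f':{key}:{value}' for key, value in sorted(kwargs.items()))
        if arg_key in cls._instances:
            return cls._instances[arg_key]

        instance = super().__new__(cls)
        instance.backend = backend
        instance.options = dict(kwargs)
        cls._instances[arg_key] = instance
        return instance


first = CachedClientSketch('disk')
second = CachedClientSketch('disk')
other = CachedClientSketch('petrel', enable_mc=True)
```

Here `first` and `second` are the same object, while `other` is a separate instance because its arguments differ.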

@staticmethod
def parse_uri_prefix(uri: Union[str, Path]) -> Optional[str]:
"""Parse the prefix of a uri.
@@ -980,6 +1007,10 @@ def get_text(self, filepath: Union[str, Path], encoding='utf-8') -> str:
def put(self, obj: bytes, filepath: Union[str, Path]) -> None:
"""Write data to a given ``filepath`` with 'wb' mode.

Note:
``put`` should create a directory if the directory of ``filepath``
does not exist.

Args:
obj (bytes): Data to be written.
filepath (str or Path): Path to write data.
@@ -989,6 +1020,10 @@ def put(self, obj: bytes, filepath: Union[str, Path]) -> None:
def put_text(self, obj: str, filepath: Union[str, Path]) -> None:
"""Write data to a given ``filepath`` with 'w' mode.

Note:
``put_text`` should create a directory if the directory of
``filepath`` does not exist.

Args:
obj (str): Data to be written.
filepath (str or Path): Path to write data.
@@ -1041,8 +1076,8 @@ def isfile(self, filepath: Union[str, Path]) -> bool:
"""
return self.client.isfile(filepath)

def concat_paths(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
def join_path(self, filepath: Union[str, Path],
*filepaths: Union[str, Path]) -> str:
"""Concatenate all file paths.

Join one or more filepath components intelligently. The return value
@@ -1054,7 +1089,7 @@ def concat_paths(self, filepath: Union[str, Path],
Returns:
str: The result of concatenation.
"""
return self.client.concat_paths(filepath, *filepaths)
return self.client.join_path(filepath, *filepaths)
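
Note: `join_path` is delegated to the backend because a local filesystem and an object store join paths differently. The sketch below illustrates that difference under stated assumptions. `join_local` and `join_object_store` are hypothetical helpers, and the slash handling in `join_object_store` is one plausible choice, not necessarily what `PetrelBackend` does.

```python
import os.path as osp


def join_local(filepath, *filepaths):
    """Join with the host OS separator, as a disk backend might."""
    return osp.join(str(filepath), *map(str, filepaths))


def join_object_store(filepath, *filepaths):
    """Join with '/' regardless of the host OS, as an s3-style store needs."""
    components = [str(filepath), *map(str, filepaths)]
    # Remember whether the caller marked the result as a directory.
    keep_trailing = components[-1].endswith('/')
    head = components[0].rstrip('/')
    rest = [c.strip('/') for c in components[1:]]
    joined = '/'.join([head, *[c for c in rest if c]])
    return joined + '/' if keep_trailing else joined
```

For example, `join_object_store('s3://bucket/dir/', 'sub/', 'a.pth')` avoids the doubled slash that naive string concatenation would produce.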

@contextmanager
def get_local_path(self, filepath: Union[str, Path]) -> Iterable[str]:
1 change: 0 additions & 1 deletion mmcv/fileio/handlers/base.py
@@ -7,7 +7,6 @@ class BaseFileHandler(metaclass=ABCMeta):
# str-like object or bytes-like object. Pickle only processes bytes-like
# objects but json only processes str-like object. If it is str-like
# object, `StringIO` will be used to process the buffer.

str_like = True

@abstractmethod
53 changes: 39 additions & 14 deletions mmcv/runner/checkpoint.py
@@ -323,28 +323,43 @@ def load_from_pavi(filename, map_location=None):


@CheckpointLoader.register_scheme(prefixes='s3://')
def load_from_ceph(filename, map_location=None, backend='ceph'):
def load_from_ceph(filename, map_location=None, backend='petrel'):
"""Load a checkpoint through a file path prefixed with s3. In a distributed
setting, this function downloads the checkpoint on all ranks to different
temporary directories.

Args:
filename (str): checkpoint file path with s3 prefix
map_location (str, optional): Same as :func:`torch.load`.
backend (str): The storage backend type. Options are "disk", "ceph",
"memcached" and "lmdb". Default: 'ceph'
backend (str, optional): The storage backend type. Options are 'ceph',
'petrel'. Default: 'petrel'.

.. warning::
:class:`mmcv.fileio.file_client.CephBackend` will be deprecated,
please use :class:`mmcv.fileio.file_client.PetrelBackend` instead.

Returns:
dict or OrderedDict: The loaded checkpoint.
"""

allowed_backends = ['ceph']
allowed_backends = ['ceph', 'petrel']
if backend not in allowed_backends:
raise ValueError(f'Load from Backend {backend} is not supported.')

fileclient = FileClient(backend=backend)
buffer = io.BytesIO(fileclient.get(filename))
checkpoint = torch.load(buffer, map_location=map_location)
if backend == 'ceph':
warnings.warn(
'CephBackend will be deprecated, please use PetrelBackend instead')

# CephBackend and PetrelBackend have the same prefix 's3://' and the latter
# will be chosen as default. If PetrelBackend cannot be instantiated
# successfully, the CephBackend will be chosen.
try:
file_client = FileClient(backend=backend)
except ImportError:
allowed_backends.remove(backend)
file_client = FileClient(backend=allowed_backends[0])

with io.BytesIO(file_client.get(filename)) as buffer:
checkpoint = torch.load(buffer, map_location=map_location)
return checkpoint
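
Note: the new `load_from_ceph` body tries the requested backend first and falls back to the other one when its client raises `ImportError`, then deserializes from an in-memory buffer. Below is a self-contained sketch of that control flow. It assumes `pickle` as a stand-in for `torch.load`, and `make_preferred_client`, `FallbackClient` and `load_with_fallback` are hypothetical names.

```python
import io
import pickle


def make_preferred_client():
    # Hypothetical factory that simulates a backend whose SDK is missing.
    raise ImportError('preferred SDK is not installed')


class FallbackClient:
    """Hypothetical client that serves bytes from an in-memory dict."""

    def __init__(self, store):
        self._store = store

    def get(self, filename):
        return self._store[filename]


def load_with_fallback(filename, factories):
    """Try each client factory in order, skipping those that fail to import."""
    last_error = None
    for factory in factories:
        try:
            client = factory()
        except ImportError as err:
            last_error = err
            continue
        # Wrap the downloaded bytes in a buffer and deserialize from it.
        with io.BytesIO(client.get(filename)) as buffer:
            return pickle.load(buffer)  # stand-in for torch.load(buffer, ...)
    raise ImportError('no storage backend could be instantiated') from last_error


store = {'s3://bucket/ckpt.pkl': pickle.dumps({'epoch': 3})}
loaded = load_with_fallback(
    's3://bucket/ckpt.pkl',
    [make_preferred_client, lambda: FallbackClient(store)])
```

Only `ImportError` is caught here, so other failures such as a missing key still surface to the caller.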


@@ -506,7 +521,6 @@ def load_checkpoint(model,
pair of the regular expression operations. Default: strip
the prefix 'module.' by [(r'^module\\.', '')].


Returns:
dict or OrderedDict: The loaded checkpoint.
"""
@@ -616,7 +630,11 @@ def get_state_dict(module, destination=None, prefix='', keep_vars=False):
return destination


def save_checkpoint(model, filename, optimizer=None, meta=None):
def save_checkpoint(model,
filename,
optimizer=None,
meta=None,
file_client_args=None):
"""Save checkpoint to file.

The checkpoint will have 3 fields: ``meta``, ``state_dict`` and
@@ -627,6 +645,10 @@ def save_checkpoint(model, filename, optimizer=None, meta=None):
filename (str): Checkpoint filename.
optimizer (:obj:`Optimizer`, optional): Optimizer to be saved.
meta (dict, optional): Metadata to be saved in checkpoint.
file_client_args (dict, optional): Arguments to instantiate a
FileClient. See :class:`mmcv.fileio.FileClient` for details.
Default: None.
`New in version 1.3.16.`
"""
if meta is None:
meta = {}
@@ -654,6 +676,10 @@ def save_checkpoint(model, filename, optimizer=None, meta=None):
checkpoint['optimizer'][name] = optim.state_dict()

if filename.startswith('pavi://'):
if file_client_args is not None:
raise ValueError(
'file_client_args should be "None" if filename starts with '
f'"pavi://", but got {file_client_args}')
try:
from pavi import modelcloud
from pavi import exception
@@ -674,8 +700,7 @@ def save_checkpoint(model, filename, optimizer=None, meta=None):
f.flush()
model.create_file(checkpoint_file, name=model_name)
else:
mmcv.mkdir_or_exist(osp.dirname(filename))
# immediately flush buffer
with open(filename, 'wb') as f:
file_client = FileClient.infer_client(file_client_args, filename)
with io.BytesIO() as f:
torch.save(checkpoint, f)
f.flush()
file_client.put(f.getvalue(), filename)
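
Note: the final hunk replaces a direct `open(filename, 'wb')` with serialization into an in-memory buffer whose bytes are handed to `file_client.put`. This lets the same code path write to a local disk or to remote storage. The sketch below shows the idea with the standard library only. It assumes `pickle.dump` as a stand-in for `torch.save`, and `InMemoryClient` and `save_via_buffer` are hypothetical names.

```python
import io
import pickle


class InMemoryClient:
    """Hypothetical client exposing the ``put(obj, filepath)`` shape."""

    def __init__(self):
        self.objects = {}

    def put(self, obj: bytes, filepath: str) -> None:
        self.objects[filepath] = obj


def save_via_buffer(checkpoint, filename, client):
    """Serialize into memory, then let the client decide where bytes go."""
    with io.BytesIO() as f:
        pickle.dump(checkpoint, f)  # stand-in for torch.save(checkpoint, f)
        client.put(f.getvalue(), filename)


client = InMemoryClient()
checkpoint = {'meta': {'epoch': 1}, 'state_dict': {'weight': [0.1, 0.2]}}
save_via_buffer(checkpoint, 's3://bucket/work_dir/epoch_1.pth', client)
restored = pickle.loads(client.objects['s3://bucket/work_dir/epoch_1.pth'])
```

One trade-off of this approach is that the whole serialized checkpoint is held in memory before it is written.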