File access V2 API #151
Merged
14 commits
1599ff6 Draft v2 API (local, zip) (kuenishi)
dfebf29 Draft v2 API (hdfs) (kuenishi)
8f15cef Add tiny S3 example (kuenishi)
eb36e87 Fix tiny bugs (kuenishi)
c81faed URL opener (kuenishi)
97e5252 Add recreate-on-fork wrapper (kuenishi)
004b3f5 Add basic document on V2 API (kuenishi)
f17a455 Add boto3 to dependency (kuenishi)
b55d480 Fix flake8 (kuenishi)
dc9bca7 Replace recreate_on_fork() with lazify() to support lazy open (kuenishi)
d186c9b Add patch spec check (kuenishi)
b9519f1 More S3 API impl (kuenishi)
f63543b FS.list() has trailing slash (kuenishi)
972363d Add a bit more document (kuenishi)
@@ -12,6 +12,8 @@ Welcome to PFIO's documentation!

   design
   reference
   v2


Indices and tables
==================
@@ -0,0 +1,75 @@
.. module:: pfio.v2

V2 API
======

.. note:: This API is still in an experimental phase.


The PFIO v2 API tries to resolve the impedance mismatch between
local filesystems, NFS, and object storage systems, with much
simpler and cleaner code.

It removes several top-level functions that seemed less
important. They turned out to introduce more complexity than
originally intended, because they require a global context. Thus,
functions that depend on the global context, such as ``open()``
and ``set_root()``, have been removed in the v2 API.

Instead, the v2 API provides only two top-level functions that enable
direct resource access with a full URL: ``open_url()`` and
``from_url()``. The former opens a file and returns a FileObject. The
latter creates an ``fs.FS`` object that enables resource access under
the URL. The new class ``fs.FS`` is close to the handler object
in the version 1 API. ``fs.FS`` is intended to be as compatible as
possible; however, it has several differences.
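
As a rough illustration of how URL-based dispatch of this kind might work, the sketch below parses a URL with the standard library and picks a backend name by scheme. It does not import PFIO. The function ``backend_for_url`` and the ``BACKENDS`` table are hypothetical names invented for this example, and the real ``from_url()`` may resolve URLs differently.

```python
from urllib.parse import urlparse

# Hypothetical scheme table for this sketch only; not PFIO's actual mapping.
BACKENDS = {"": "Local", "file": "Local", "hdfs": "Hdfs", "s3": "S3"}


def backend_for_url(url):
    """Return (backend name, network location, path) for a URL."""
    parsed = urlparse(url)
    try:
        backend = BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError("unsupported scheme: " + parsed.scheme)
    return backend, parsed.netloc, parsed.path
```

With this table, a plain path such as ``/tmp/data`` has an empty scheme and falls through to the local backend, while ``s3://bucket/prefix/data`` yields the bucket as the network location and ``/prefix/data`` as the path.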

One notable difference is that it has a virtual concept of a current
working directory, and thus provides a ``subfs()`` method. ``subfs()``
behaves like ``chroot(1)`` or ``os.chdir()``, but instead of
changing the current working directory of the process, it
returns a *new* ``fs.FS`` object that has a different working
directory. All resource access through the object automatically
prepends the working directory.
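
The idea of a virtual working directory can be sketched with a toy class built on the standard library. ``MiniLocalFS`` and ``demo`` are hypothetical names for this illustration; this is not PFIO's ``Local`` class, and it assumes only the behavior described above (a new object per ``subfs()`` call, with the working directory prepended on access).

```python
import os
import tempfile


class MiniLocalFS:
    """Toy stand-in for an fs.FS-like object; not PFIO's Local class."""

    def __init__(self, cwd):
        self.cwd = cwd

    def subfs(self, rel_path):
        # Return a new object; neither self nor the process cwd changes.
        return MiniLocalFS(os.path.join(self.cwd, rel_path))

    def open(self, path, mode="r"):
        # Every access is resolved against the virtual working directory.
        return open(os.path.join(self.cwd, path), mode)


def demo():
    """Write through a sub-filesystem and read back through the root."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "data"))
        fs = MiniLocalFS(root)
        sub = fs.subfs("data")
        with sub.open("a.txt", "w") as fp:
            fp.write("hello")
        with fs.open(os.path.join("data", "a.txt")) as fp:
            return fp.read(), fs.cwd == root
```

Because ``subfs()`` returns a fresh object, the parent keeps its own working directory and the two can be used side by side.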

The V2 API no longer provides lazy resource initialization. Instead,
it provides a simple wrapper, ``lazify()``, which recreates the ``fs.FS``
object every time the object experiences ``fork(2)``. ``Hdfs`` and
``Zip`` can be wrapped with it to become fork-tolerant objects.
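
One common way to build such a wrapper is to remember the process ID at creation time and rebuild the wrapped object when the ID changes. The sketch below shows that technique under this assumption; ``ForkAwareProxy`` and ``demo`` are hypothetical names, and the actual ``lazify()`` implementation may detect forks differently. The demo simulates a fork by overwriting the stored PID instead of calling ``os.fork()``.

```python
import os


class ForkAwareProxy:
    """Recreate the wrapped object when the process ID changes.

    Hypothetical illustration only; PFIO's lazify() may differ.
    """

    def __init__(self, factory):
        self._factory = factory
        self._obj = None
        self._pid = None

    def _get(self):
        pid = os.getpid()
        if self._obj is None or self._pid != pid:
            # First use, or we are in a forked child: build a fresh object.
            self._obj = self._factory()
            self._pid = pid
        return self._obj

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._get(), name)


def demo():
    """Count how many times the factory runs across a simulated fork."""
    calls = []

    def factory():
        calls.append(1)
        return [len(calls)]  # any object works; a list keeps the demo small

    proxy = ForkAwareProxy(factory)
    proxy.count(1)   # first access creates the object lazily
    proxy.count(1)   # same process: the object is reused
    proxy._pid = -1  # simulate fork(2) by faking a PID change
    proxy.count(1)   # the object is recreated
    return len(calls)
```

The object is also created lazily on first access, which matches the commit message about supporting lazy open.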


Reference
---------

.. autofunction:: open_url
.. autofunction:: from_url
.. autofunction:: lazify


.. autoclass:: pfio.v2.fs.FS
   :members:

Local file system
~~~~~~~~~~~~~~~~~

.. autoclass:: Local
   :members:

HDFS (Hadoop File System)
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: Hdfs
   :members:

S3 (AWS S3)
~~~~~~~~~~~

.. autoclass:: S3
   :members:

Zip Archive
~~~~~~~~~~~

.. autoclass:: Zip
   :members:
@@ -1 +1,7 @@
import warnings

from pfio.chainer_extensions.snapshot import load_snapshot  # NOQA

warnings.warn("Chainer extensions are deprecated and "
              "will be removed. Please use 'pfio' instead.",
              DeprecationWarning)
@@ -0,0 +1,63 @@
import os
import random
import string
from zipfile import ZipFile


class ZipForTest:
    def __init__(self, destfile, data=None):
        if data is None:
            self.data = dict(
                file=b"foo",
                dir=dict(
                    f=b"bar"
                )
            )
        else:
            self.data = data

        self._make_zip(destfile)
        self.destfile = destfile

    def content(self, path):
        d = self.data

        for node in path.split(os.path.sep):
            d = d.get(node)
            if not isinstance(d, dict):
                return d

    def _make_zip(self, destfile):
        with ZipFile(destfile, "w") as z:
            stack = []
            self._write_zip_contents(z, stack, self.data)

    def _write_zip_contents(self, z, stack, data):
        for k in data:
            if isinstance(data[k], dict):
                self._write_zip_contents(z, stack+[k], data[k])
            else:
                path = os.path.join(*stack, k)
                with z.open(path, 'w') as fp:
                    fp.write(data[k])


def make_zip(zipfilename, root_dir, base_dir):
    pwd = os.getcwd()
    with ZipFile(zipfilename, "w") as f:
        try:
            os.chdir(root_dir)
            for root, dirs, filenames in os.walk(base_dir):
                for _dir in dirs:
                    path = os.path.normpath(os.path.join(root, _dir))
                    f.write(path)
                for _file in filenames:
                    path = os.path.normpath(os.path.join(root, _file))
                    f.write(path)
        finally:
            os.chdir(pwd)


def make_random_str(n):
    return ''.join([random.choice(string.ascii_letters + string.digits)
                    for i in range(n)])
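
To make the nested-dict convention used by ``ZipForTest`` concrete, here is a self-contained sketch of the same idea that writes to an in-memory buffer. ``write_tree`` and ``build_zip`` are hypothetical helpers for this example and are not part of the PR. The sketch joins member names with ``/``, the separator used inside zip archives, rather than with ``os.path.join``.

```python
import io
from zipfile import ZipFile


def write_tree(zf, tree, prefix=""):
    """Recursively write a nested dict of bytes into an open ZipFile."""
    for name, value in tree.items():
        path = prefix + name
        if isinstance(value, dict):
            # A dict represents a directory: descend with a longer prefix.
            write_tree(zf, value, path + "/")
        else:
            # Anything else is treated as file content (bytes).
            zf.writestr(path, value)


def build_zip(tree):
    """Return the bytes of a zip archive holding the given tree."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        write_tree(zf, tree)
    return buf.getvalue()
```

Passing the default fixture shape, ``{"file": b"foo", "dir": {"f": b"bar"}}``, produces an archive with the members ``file`` and ``dir/f``.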
@@ -0,0 +1,36 @@
'''
fs.FS> interface
implementations:
open_fs(URI, container=None|zip) => Local/HDFS/S3/Zip, etc
- Local
  - subfs() -> Local
  - open_zip() -> Zip
  - open() -> FileObject
- HDFS
  - subfs() -> HDFS
  - open_zip() -> Zip
  - open() -> FileObject
- Zip
  - subfs() -> Zip
  - open_zip() -> Zip
  - open() -> FileObject
- S3 (TBD)
- GS (TBD)

An example of globally switching backend file systems::

   from pfio.v2 import local as pfio

Or::

   from pfio.v2 import Hdfs
   pfio = Hdfs()

'''
from .fs import from_url, lazify, open_url  # NOQA
from .hdfs import Hdfs, HdfsFileStat  # NOQA
from .local import Local, LocalFileStat  # NOQA
from .s3 import S3  # NOQA
from .zip import Zip, ZipFileStat  # NOQA

local = Local()
By using this name, are we not going to support containers other than zip?
Do we have any other container format in scope? I would like to be as specific as we need.