-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HdfsFileSystem.stat to return a compatible object to os.stat #114
Conversation
chainerio/filesystems/hdfs.py
Outdated
None, # st_ino | ||
None, # st_dev | ||
None, # st_nlink | ||
info['owner'], # st_uid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the info['owner']
and info['group']
are strings instead of uids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they are! In contrast what os.stat
returns are int (uid/gid). It's fortunate (?) that os.stat_result
also accepts strings, but if a user expects the normal behavior (having int in st_uid and st_gid attributes), he or she will suffer.
In [4]: os.stat('pytest.ini')
Out[4]: os.stat_result(st_mode=33204, st_ino=27665508, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=1000, st_size=51, st_atime=1585040937, st_mtime=1583388582, st_ctime=1583388582)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I did not know os.stat_result
also accepts strings, I thought it just a wrapper of struct stat
. There is actually a way to convert the username to uid, and I used that when writing the hdfs fuse client. However, we still have the problem when the user or group does not exist on the local machine.
https://stackoverflow.com/questions/826082/python-finding-uid-gid-for-a-given-username-groupname-for-os-chown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests in case we are missing some fields?
chainerio/filesystems/hdfs.py
Outdated
stat.kind = info['kind'] | ||
stat.mode = info['permissions'] | ||
stat.size = info['size'] | ||
stat.owner = info['owner'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave some comments to mention that the stat.owner
and stat.group
can be string in hdfs case?
https://github.com/chainer/chainerio/blob/faf6d8bb5ad10526436ede13375646f842e7890c/chainerio/io.py#L134
can be a good place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for point it out! I have written docstrings of FileStats
s to clarify these things.
8e37e3b
to
5fdaca6
Compare
Sorry for the late update! Summary of the change is:
|
5fdaca6
to
61d3086
Compare
Can you resolve the conflict? |
61d3086
to
64eee71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for you contribution, I left some concerns.
stat = handler.stat(test_file_name) | ||
self.assertIsInstance(stat, pfio.containers.zip.ZipFileStat) | ||
self.assertTrue(stat.filename.endswith(test_file_name)) | ||
self.assertEqual(stat.size, 22) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we retrieve the data from the zip_info
so that these tests are more robust to changes?
self.assertIsInstance(stat, pfio.containers.zip.ZipFileStat) | ||
self.assertTrue(stat.filename.endswith(test_file_name)) | ||
self.assertEqual(stat.size, 22) | ||
self.assertEqual(stat.mode, 0o100664) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
stat = handler.stat(test_dir_name) | ||
self.assertIsInstance(stat, pfio.containers.zip.ZipFileStat) | ||
self.assertTrue(stat.filename.endswith(test_dir_name)) | ||
self.assertEqual(stat.size, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
self.assertIsInstance(stat, pfio.containers.zip.ZipFileStat) | ||
self.assertTrue(stat.filename.endswith(test_dir_name)) | ||
self.assertEqual(stat.size, 0) | ||
self.assertEqual(stat.mode, 0o40775) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
self.assertEqual(stat.mode, 0o100664) | ||
self.assertFalse(stat.isdir()) | ||
self.assertTrue(stat.last_modified) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also test the rest of ZipFileStat
stat = handler.stat(test_dir_name) | ||
self.assertIsInstance(stat, HdfsFileStat) | ||
self.assertTrue(stat.filename.endswith(test_dir_name)) | ||
self.assertEqual(stat.size, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
self.assertTrue(stat.filename.endswith(test_dir_name)) | ||
self.assertEqual(stat.size, 0) | ||
self.assertTrue(stat.isdir()) | ||
self.assertEqual(stat.mode, 0o40755) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
stat = handler.stat(test_file_name) | ||
self.assertIsInstance(stat, PosixFileStat) | ||
self.assertTrue(stat.filename.endswith(test_file_name)) | ||
self.assertEqual(stat.size, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
||
stat = handler.stat(test_dir_name) | ||
self.assertIsInstance(stat, PosixFileStat) | ||
self.assertTrue(stat.filename.endswith(test_dir_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
self.group = info['group'] | ||
self.last_modified = info['last_modified'] | ||
self.last_accessed = info['last_accessed'] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be useful to add other fields from the hdfs.info
, like "replication" and "block_size"
707a96d
to
da82886
Compare
I have fixed the review points!
Also, it seems I forgot to push some modifications that I made before the review. Now they are already available.
What I have checked is:
I haven't been able to check document generation yet. |
da82886
to
eaeb20c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits in docstring
pfio/io.py
Outdated
|
||
`stat` of filesystem/container handlers return an object of | ||
subclass of `FileStat`. | ||
In addition to the common attribute that the `FileStat` abstract provides, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attributes
pfio/io.py
Outdated
In addition to the common attribute that the `FileStat` abstract provides, | ||
each `FileStat` subclass implements some additional attributes depending on | ||
what information the corresponding filesystem or container can handle. | ||
The common attributes behave almost the same in spite of filesystem or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost
is kind of ambiguous
pfio/io.py
Outdated
size = None | ||
|
||
def isdir(self): | ||
"""Distinguish whether the target is directory, based on the permission flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about return
Return whether the target is directory
/test |
Successfully created a job for commit eaeb20c: |
I have updated docstrings! (fixed the review points, also applied grammarly) |
/test |
Successfully created a job for commit e8469f6: |
@belltailjp Can you please fix the style? |
e8469f6
to
07dc703
Compare
Oops! Sorry I have fixed it right now. |
/test |
Successfully created a job for commit 07dc703: |
Done Sphinx fix! 🙇♂️ |
/test |
Successfully created a job for commit cdbd40a: |
I am using ChainerIO 0.1.1.
Currently
HdfsFileSystem.stat
returns a tiny dict such as{'size': 5, 'kind': 'file'}
,while
PosixFileSystem.stat
, which is equivalent toos.stat
, returns aos.stat_result
object.Users who try to transparently handle different type of filesystems by using ChainerIO filesystem handler suffer from this interface difference.
In this PR, 2 independent problems are handled.
stat
for bothxxxxFileSystem
return the same type of object (and I usedos.stat_result
)stat
returns poor information as mentioned above, so I useinfo
instead. At leastsize
andkind
are included in theinfo
too, so I guess it's a safe modification, unless there are some system dependent differences.There are a few issues remained.
st_ino
,st_dev
,st_nlink
are left as blank. I think it's affordable.owner
,group
are not id (int) but string in this PR. If there is an efficient way of conversion from string to uid/gid, I'd like to implement it.st_ctime
is not available in HDFS. I set the same value asst_mtime
It's just a PoC PR so I'm not confident the approach I implement is a good direction (specifically, using
os.stat_result
as a common return type). I'd like to hear your opinion.