Fuse #53
|
@mrocklin, you shouldn't need to make the directory, so it's doubly weird that it doesn't exist when you try in the other terminal. Note that I am not passing any project to GCSFileSystem, which I should have been (perhaps as a command line argument, or just hard-coded for now). |
|
I would try with a new directory name each time (sorry for pollution) as fuse does not necessarily release the previous mount. I'm sure there is a proper way to do that. |
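A minimal sketch of the "new directory each time" workaround, using only the standard library. The `gcsfuse-` prefix is an arbitrary choice for illustration. On Linux, `fusermount -u <dir>` is the usual way to release a stale FUSE mount; whether it is the right fix for this case is an assumption.

```python
import os
import tempfile

# Create a fresh, uniquely named, empty directory to use as the mount point,
# so a previous mount that FUSE has not released cannot get in the way.
mountpath = tempfile.mkdtemp(prefix="gcsfuse-")

print(mountpath)
print(os.listdir(mountpath))  # a new mount point starts out empty
```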
|
Your recent commit helps. I suspect that I needed to specify a project |
This fails in an interesting way for me:
In [1]: import netCDF4
In [2]: netCDF4.Dataset?
In [3]: ds = netCDF4.Dataset('fuse4/newmann-met-ensemble-netcdf/conus_ens_001.nc')
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 139690544854784:
#000: H5L.c line 1183 in H5Literate(): link iteration failed
major: Symbol table
minor: Iteration failed
#001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
major: Symbol table
minor: Iteration failed
#002: H5Gobj.c line 693 in H5G__obj_iterate(): can't iterate over dense links
major: Symbol table
minor: Iteration failed
#003: H5Gdense.c line 1069 in H5G__dense_iterate(): error building table of links
major: Symbol table
minor: Can't get value
#004: H5Gdense.c line 863 in H5G__dense_build_table(): error iterating over links
major: Symbol table
minor: Can't move to next iterator location
#005: H5Gdense.c line 1060 in H5G__dense_iterate(): link iteration failed
major: Symbol table
minor: Iteration failed
#006: H5B2.c line 389 in H5B2_iterate(): node iteration failed
major: B-Tree node
minor: Unable to list node
#007: H5B2int.c line 2059 in H5B2_iterate_node(): unable to protect B-tree leaf node
major: B-Tree node
minor: Unable to protect metadata
#008: H5B2int.c line 1870 in H5B2_protect_leaf(): unable to protect B-tree leaf node
major: B-Tree node
minor: Unable to protect metadata
#009: H5AC.c line 1262 in H5AC_protect(): H5C_protect() failed.
major: Object cache
minor: Unable to protect metadata
#010: H5C.c line 3574 in H5C_protect(): can't load entry
major: Object cache
minor: Unable to load metadata into cache
#011: H5C.c line 7954 in H5C_load_entry(): unable to load entry
major: Object cache
minor: Unable to load metadata into cache
#012: H5B2cache.c line 875 in H5B2__cache_leaf_load(): wrong B-tree leaf node signature
major: B-Tree node
minor: Unable to load metadata into cache
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 139690544854784:
#000: H5L.c line 1183 in H5Literate(): link iteration failed
major: Symbol table
minor: Iteration failed
#001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
major: Symbol table
minor: Iteration failed
#002: H5Gobj.c line 693 in H5G__obj_iterate(): can't iterate over dense links
major: Symbol table
minor: Iteration failed
#003: H5Gdense.c line 1069 in H5G__dense_iterate(): error building table of links
major: Symbol table
minor: Can't get value
#004: H5Gdense.c line 863 in H5G__dense_build_table(): error iterating over links
major: Symbol table
minor: Can't move to next iterator location
#005: H5Gdense.c line 1060 in H5G__dense_iterate(): link iteration failed
major: Symbol table
minor: Iteration failed
#006: H5B2.c line 389 in H5B2_iterate(): node iteration failed
major: B-Tree node
minor: Unable to list node
#007: H5B2int.c line 2059 in H5B2_iterate_node(): unable to protect B-tree leaf node
major: B-Tree node
minor: Unable to protect metadata
#008: H5B2int.c line 1870 in H5B2_protect_leaf(): unable to protect B-tree leaf node
major: B-Tree node
minor: Unable to protect metadata
#009: H5AC.c line 1262 in H5AC_protect(): H5C_protect() failed.
major: Object cache
minor: Unable to protect metadata
#010: H5C.c line 3574 in H5C_protect(): can't load entry
major: Object cache
minor: Unable to load metadata into cache
#011: H5C.c line 7954 in H5C_load_entry(): unable to load entry
major: Object cache
minor: Unable to load metadata into cache
#012: H5B2cache.c line 875 in H5B2__cache_leaf_load(): wrong B-tree leaf node signature
major: B-Tree node
minor: Unable to load metadata into cache
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-3-74b3e4fd7f67> in <module>()
----> 1 ds = netCDF4.Dataset('fuse4/newmann-met-ensemble-netcdf/conus_ens_001.nc')
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:13992)()
OSError: NetCDF: HDF error |
The caching (which also fixes in-filling when jumping about in a file) reduced the time to open an xarray dataset from 36s to 11s. |
Yeah I was running into that last night and was surprised that my machine was still downloading data an hour later :) |
I could use eyes on the logic in |
gcsfs/core.py
Outdated
new = _fetch_range(self.gcsfs.header, self.details,
                   self.end, end + self.blocksize)
self.end = end + self.blocksize
self.cache = self.cache + new
Why do we always add self.blocksize to end? What if end - start > self.blocksize?
The idea is to read past the currently requested data by a blocksize, so that further (small) reads will not need additional requests.
We might do something like max(start + self.blocksize, end) so that we don't do this on large reads.
Sure, could do, however the little extra may make little difference for longer reads
Fair enough
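To make the two read-ahead policies in this thread concrete, here is a minimal sketch, not the gcsfs implementation. `ReadAheadBuffer`, its in-memory `data`, and the `cap_large_reads` flag are hypothetical names for illustration; `_fetch_range` here only slices a byte string in place of a remote range request.

```python
class ReadAheadBuffer:
    """Toy forward read-ahead cache over an in-memory byte string."""

    def __init__(self, data, blocksize, cap_large_reads=False):
        self.data = data                  # stands in for the remote object
        self.blocksize = blocksize
        self.cap_large_reads = cap_large_reads
        self.start = 0                    # offset of the first cached byte
        self.end = 0                      # offset one past the last cached byte
        self.cache = b""
        self.requests = 0                 # number of simulated remote fetches

    def _fetch_range(self, start, end):
        # Hypothetical stand-in for an HTTP range request.
        self.requests += 1
        return self.data[start:end]

    def read(self, start, end):
        end = min(end, len(self.data))
        if not self.cache or start < self.start or start > self.end:
            # Request is not contiguous with the cache: start over.
            self.start = self.end = start
            self.cache = b""
        if end > self.end:
            if self.cap_large_reads:
                # Suggested variant: no extra read-ahead on large reads.
                new_end = max(start + self.blocksize, end)
            else:
                # Policy in the diff: always read one blocksize past `end`.
                new_end = end + self.blocksize
            new_end = min(new_end, len(self.data))
            self.cache += self._fetch_range(self.end, new_end)
            self.end = new_end
        return self.cache[start - self.start:end - self.start]


data = bytes(range(256)) * 4
buf = ReadAheadBuffer(data, blocksize=100)
buf.read(0, 10)
buf.read(10, 20)          # served from the cache, no second request
print(buf.requests)
```

With the capped variant, a 500-byte read with a 100-byte blocksize fetches exactly 500 bytes rather than 600; as noted above, that difference is small relative to the read itself.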
Tests pass in direct mode, not with vcr. |
Looks like we treat the directory also as a file
|
For reference, here are some times from the Anaconda Austin office
I'm also somewhat concerned about these NaNs. cc @jhamman |
The NaNs should be fine - there is a |
It turns out that I wasn't fully updated. I just updated though and am getting similar results. I plan to work on a small click-based CLI and push it up here |
Correction: it really matters which file you are looking at for the timings, because of the layout of the chunks that hdf5 accesses. Again, this suggests we probably want to cache multiple small chunks of a given file for speed. |
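One way to cache multiple small chunks of a file is a block-keyed LRU cache. The sketch below is an assumption about what that could look like, not code from this PR. `BlockCache` and its `fetch(start, end)` callable are hypothetical, and the fetcher here just slices an in-memory byte string.

```python
from collections import OrderedDict


class BlockCache:
    """Keep up to `maxblocks` fixed-size blocks, evicting the least recently used."""

    def __init__(self, fetch, blocksize, maxblocks=32):
        self.fetch = fetch            # callable(start, end) -> bytes
        self.blocksize = blocksize
        self.maxblocks = maxblocks    # assumed to be at least 1
        self.blocks = OrderedDict()   # block index -> bytes, oldest first
        self.misses = 0

    def _block(self, index):
        if index in self.blocks:
            self.blocks.move_to_end(index)       # mark as recently used
        else:
            self.misses += 1
            lo = index * self.blocksize
            self.blocks[index] = self.fetch(lo, lo + self.blocksize)
            if len(self.blocks) > self.maxblocks:
                self.blocks.popitem(last=False)  # drop the oldest block
        return self.blocks[index]

    def read(self, start, end):
        if end <= start:
            return b""
        first = start // self.blocksize
        last = (end - 1) // self.blocksize
        joined = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self.blocksize
        return joined[offset:offset + (end - start)]


data = bytes(range(256)) * 4
cache = BlockCache(lambda s, e: data[s:e], blocksize=64, maxblocks=4)

# Jump between two distant regions, as a chunked HDF5 layout might.
cache.read(0, 10)
cache.read(900, 910)
cache.read(5, 15)
cache.read(905, 915)
print(cache.misses)
```

Unlike a single contiguous buffer, jumping back and forth between regions does not throw away the earlier data, so only the first touch of each block costs a request.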
@mrocklin , you are welcome to push a CLI here, but thoughts on how to sensibly test this are also welcome. |
Tests now pass following #62, but there are no tests here of FUSE itself. I will check whether starting the interface in-process, with or without threads, works. |
@mrocklin , I believe you solved how to use FUSE on TravisCI? |
@martindurant I didn't solve FUSE on TravisCI. However I did solve fuse on a docker container, which is presumably harder. I recommend running travis with |
needed only for time parsing in gcsfuse
The simple test now passes on travis, thanks @mrocklin |
I haven't taken a deep look, but a cursory glance makes me pretty happy. |
Implemented some writing in FUSE too.
@mrocklin : the code now passes around File objects explicitly, rather than using a hand-rolled cache. This might make things more efficient, but won't make anything worse, and I think is cleaner. Also, you can now write, including opening files in text mode from python (which is what the test does now). |
gcsfs/tests/test_fuse.py
Outdated
fuse.FUSE(
    GCSFS(TEST_BUCKET, gcs=gcs), mountpath, nothreads=False,
    foreground=True))
th.daemon = True
This approach works in Python 2 or Python 3. I recommend using it in both cases.
assert 'hello' in files
with open(os.path.join(mountpath, 'hello'), 'r') as f:
    # NB this is in TEXT mode
    assert f.read() == 'hello'
It's nice to see this example in tests.
Is there a way to clean up the thread after this test? Not a big deal, just curious.
I am not sure how to achieve that. Calling FUSE in the thread causes it to block, so join() would never return, and I don't see how to use an interrupt signal, which is what FUSE is expecting. That is the reason for daemon above: to make sure that the thread and the process do exit when the testing is done and the main thread ends.
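The daemon-thread behaviour described here can be shown without FUSE at all. In this sketch `serve_forever` is a hypothetical stand-in for the blocking `fuse.FUSE(..., foreground=True)` call; it simply waits on an event that is never set.

```python
import threading


def serve_forever(ready):
    # Stand-in for a blocking call such as fuse.FUSE(..., foreground=True).
    ready.set()
    threading.Event().wait()   # never set, so this blocks indefinitely


ready = threading.Event()
th = threading.Thread(target=serve_forever, args=(ready,))
th.daemon = True               # do not keep the process alive for this thread
th.start()

ready.wait(timeout=5)          # wait until the thread is actually running
th.join(timeout=0.1)           # a plain join() would hang; this one times out
print(th.is_alive())           # still blocked, but the process can exit anyway
```

Because the thread is a daemon, the interpreter exits when the main thread finishes, even though the blocking call never returns.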
Now the tests pass locally but not on travis, and I don't know why.
gcsfs/tests/test_fuse.py
Outdated
@@ -29,6 +29,7 @@ def test_fuse(token_restore):
     with open(os.path.join(mountpath, 'hello'), 'w') as f:
         # NB this is in TEXT mode
         f.write('hello')
+    time.sleep(1)
Typically I would do something like the following:
from time import sleep, time
start = time()
while 'hello' not in os.listdir(mountpath):
    sleep(0.1)
    assert time() < start + 5
However this may not work well with VCR. In general I'm surprised that VCR wouldn't result in entirely equivalent results on travis-ci
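The polling pattern above can be wrapped in a small helper so the test fails with a clear message instead of sleeping for a fixed time. This is a sketch: `wait_for` is a hypothetical name, and the delayed writer thread only simulates a laggy filesystem using an ordinary temporary directory.

```python
import os
import tempfile
import threading
import time


def wait_for(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it is true, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise AssertionError("condition not met within %.1fs" % timeout)
        time.sleep(interval)


mountpath = tempfile.mkdtemp()   # ordinary directory standing in for the mount


def delayed_write():
    time.sleep(0.3)              # simulate the file appearing late
    with open(os.path.join(mountpath, 'hello'), 'w') as f:
        f.write('hello')


threading.Thread(target=delayed_write).start()
wait_for(lambda: 'hello' in os.listdir(mountpath))
print(os.listdir(mountpath))
```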
A good idea if this works, but I don't know the basic reason it's different. Maybe fuse on travis works somehow differently, laggy or with different caching.
...but it still doesn't work :(
Note that test does pass in docker linux. Should that be good enough for merger?
You maintain this library. Up to you :)
Woot. Thanks @martindurant
…On Mon, Jan 15, 2018 at 12:49 PM, Martin Durant wrote:
Merged #53.
supersedes #51
Example:
In one terminal
in another