Incorrect domains on external links #22

Closed
rayosborn opened this issue Mar 4, 2017 · 6 comments

@rayosborn

rayosborn commented Mar 4, 2017

We have an h5serv server running, and loading regular HDF5 files works well. However, if the file contains an external link, it cannot access the external file because the file path is not converted to a valid domain.

Here's some output when accessing a file with path mullite/mullite_300K.nxs with respect to the h5serv datapath, with the root domain name of exfac (the config file sets the file extension to be .nxs on this server):

>>> b=h5pyd.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://some.server:5000')
>>> c=b['/entry/transform/v']

KeyError                                  Traceback (most recent call last)
<ipython-input-5-610e75ab34f0> in <module>()
----> 1 c=b['/entry/transform/v']

/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/group.pyc in __getitem__(self, name)
    327             except IOError:
    328                 # unable to find external link
--> 329                 raise KeyError("Unable to open file: " + link_json['h5domain'])
    330             return f[link_json['h5path']]
    331 

KeyError: u'Unable to open file: 300K/transform.nxs' 

Presumably, h5pyd should convert the external file path to a valid domain string. In this case, the file path is relative to the parent HDF5 file. I'm not sure what the correct domain name would be if the file path were absolute.
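
To make the expected mapping concrete, here is a minimal sketch of how a path under the server's data directory could be turned into a DNS-style domain. The helper name `path_to_domain` and its parameters are hypothetical and are not taken from h5pyd or h5serv; the sketch only reproduces the path and domain quoted above.

```python
import posixpath


def path_to_domain(file_path, root_domain, default_ext=".nxs"):
    """Hypothetical helper: map a data-directory-relative path to a domain."""
    # Normalise the path and drop any leading slash so it is relative.
    path = posixpath.normpath(file_path).lstrip("/")
    # Strip only the server's default extension (".nxs" on this server).
    if path.endswith(default_ext):
        path = path[: -len(default_ext)]
    # Reverse the components: file name first, root domain last.
    parts = path.split("/")
    return ".".join(parts[::-1] + [root_domain])


print(path_to_domain("mullite/mullite_300K.nxs", "exfac"))
# mullite_300K.mullite.exfac
```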

@jreadey
Member

jreadey commented Mar 9, 2017

I've implemented some fixes for this in h5serv - please update your h5serv repo and try it out.
My change always returns an absolute DNS-style name. Relative filenames should be ok.
It should also work with absolute filenames in the link that point to the correct location in the server data directory. Again it will return a DNS name.

There are a bunch of edge cases, but I think this should be good for common usage.
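
The behaviour described above (relative links resolved against the parent file, absolute links accepted when they point into the data directory, and a DNS-style name returned in both cases) could look roughly like the sketch below. This is an illustration, not the actual h5serv code: the function names and the `/data` data directory are assumptions.

```python
import posixpath


def resolve_link_path(parent_path, link_filename, data_dir="/data"):
    """Hypothetical helper: path of the linked file relative to data_dir."""
    if posixpath.isabs(link_filename):
        # Absolute link: assumed to point somewhere under the data directory.
        target = posixpath.normpath(link_filename)
        root = posixpath.normpath(data_dir)
        if not target.startswith(root + "/"):
            raise ValueError("link target is outside the data directory")
        return target[len(root) + 1:]
    # Relative link: interpreted relative to the parent file's directory.
    parent_dir = posixpath.dirname(parent_path)
    target = posixpath.normpath(posixpath.join(parent_dir, link_filename))
    if target == ".." or target.startswith("../"):
        raise ValueError("link target escapes the data directory")
    return target


def to_domain(rel_path, root_domain, ext=".nxs"):
    """Hypothetical helper: reverse the path components into a domain."""
    stem = rel_path[: -len(ext)] if rel_path.endswith(ext) else rel_path
    return ".".join(stem.split("/")[::-1] + [root_domain])


rel = resolve_link_path("mullite/mullite_300K.nxs", "300K/transform.nxs")
print(rel)                       # mullite/300K/transform.nxs
print(to_domain(rel, "exfac"))   # transform.300K.mullite.exfac
```

Which edge cases matter (links that climb out of the data directory, symlinks, mixed extensions) depends on the server setup, so the two `ValueError` checks are only one possible policy.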

@rayosborn
Author

It works, although it is tripped up if the external file has a different extension than the default server extension. In my example, the file mullite/mullite_300K.h5 has an external link to 300K/transform.nxs. With .h5 as the default extension, I get:

>>> import h5pyd as h5
>>> a=h5.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://34.193.81.207:5000')
>>> a['/entry/transform/v']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-c6429c132039> in <module>()
----> 1 a['/entry/transform/v']

/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/group.pyc in __getitem__(self, name)
    349             except IOError:
    350                 # unable to find external link
--> 351                 raise KeyError("Unable to open file: " + link_json['h5domain'])
    352             return f[link_json['h5path']]
    353 

KeyError: u'Unable to open file: transform.nxs.300K.mullite.exfac'

So the non-default extension, .nxs, is being included in the domain name. If I rename the parent file to mullite/mullite_300K.nxs and restart the server with .nxs as the default extension, i.e., so that it matches the extension of the linked file, then it works:

>>> a['/entry/transform/v']
<HDF5 dataset "v": shape (801, 901, 901), type "<f4">
>>> a.filename
'mullite_300K.mullite.exfac'
>>> a['/entry/transform/v'].file
<HDF5 file "transform.300K.mullite.exfac" (mode r)>
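
The mismatch can be reproduced with a small sketch. Both functions below are hypothetical and are not h5serv's implementation: the first strips only the server's default extension, which yields the failing domain shown above, and the second strips any extension from an assumed list of known HDF5 extensions.

```python
import posixpath

# Assumption: the set of extensions a server might be willing to recognise.
KNOWN_EXTS = (".h5", ".nxs")


def domain_default_ext_only(rel_path, root_domain, default_ext):
    """Strip only the default extension before building the domain."""
    stem = rel_path
    if rel_path.endswith(default_ext):
        stem = rel_path[: -len(default_ext)]
    return ".".join(stem.split("/")[::-1] + [root_domain])


def domain_any_known_ext(rel_path, root_domain, known_exts=KNOWN_EXTS):
    """Strip whichever known extension the file happens to have."""
    stem, ext = posixpath.splitext(rel_path)
    if ext not in known_exts:
        stem = rel_path  # unknown extension: leave the name untouched
    return ".".join(stem.split("/")[::-1] + [root_domain])


path = "mullite/300K/transform.nxs"
print(domain_default_ext_only(path, "exfac", ".h5"))
# transform.nxs.300K.mullite.exfac
print(domain_any_known_ext(path, "exfac"))
# transform.300K.mullite.exfac
```

Note that the second variant drops the extension from the domain, so a server using it would need some way to find the file again, for example by trying each known extension in turn.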

@rayosborn
Author

There seems to be another problem. If a group contains an externally linked dataset and I call the group's items() method, the external link does not get resolved in the returned value, even though it does get resolved when referencing the dataset directly.

>>> a=h5.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://34.193.81.207:5000')
>>> a['/entry/transform'].items()
[(u'Qk', <HDF5 dataset "Qk": shape (901,), type "<f8">),
 (u'Ql', <HDF5 dataset "Ql": shape (801,), type "<f8">),
 (u'Qh', <HDF5 dataset "Qh": shape (901,), type "<f8">),
 (u'v', None)] 
>>> a['/entry/transform/v']
<HDF5 dataset "v": shape (801, 901, 901), type "<f4">
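
Since direct indexing does resolve the link, a possible client-side workaround is to re-fetch any entry that items() returns as None. The sketch below is only an illustration: `items_with_fallback` is a hypothetical helper, and `FakeGroup` is a stand-in that mimics the behaviour shown above so the example runs without a server.

```python
def items_with_fallback(group):
    """Return (name, object) pairs, re-fetching entries that came back as None."""
    resolved = []
    for name, obj in group.items():
        if obj is None:
            # Direct indexing resolves the external link, per the output above.
            obj = group[name]
        resolved.append((name, obj))
    return resolved


class FakeGroup(dict):
    """Stand-in for a group whose items() leaves external links unresolved."""

    def __init__(self, data, external):
        dict.__init__(self, data)
        self._external = set(external)

    def items(self):
        # Mimic the reported behaviour: external links show up as None.
        return [(k, None if k in self._external else v)
                for k, v in dict.items(self)]


group = FakeGroup({"Qk": "dataset Qk", "v": "dataset v"}, external=["v"])
print(items_with_fallback(group))
```

With a real h5pyd group, the same function would be called as `items_with_fallback(a['/entry/transform'])`.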

@jreadey
Member

jreadey commented Mar 9, 2017

Ah, it looks like I did the transform for the link operation, but not for the links one.
I've fixed that now; please update your h5serv repo.

@rayosborn
Author

The items function now returns all the values, including the external links. Fixing this has uncovered another possible inconsistency with h5py, which I will post as another issue. Thanks for all the work.

@jreadey
Member

jreadey commented Mar 9, 2017

Good. I'll close this one then.

@jreadey jreadey closed this as completed Mar 9, 2017