New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add meta attribute to maps #1291

Merged
merged 8 commits into from Feb 12, 2018

Conversation

Projects
None yet
5 participants
@JouvinLea
Contributor

JouvinLea commented Feb 7, 2018

Add meta attribut to the Map class.
For the moment I only added the possibilty to write the meta attribut.
For the read options, it's a little more complicated because for example you don't want just read the header and put it as the meta attribut since the header could contain the wcs info that is already present in the MAP object.

@woodmd

Looks good overall but we should address what we think the best scheme is for serializing meta data to FITS.

hdulist = [fits.PrimaryHDU(), self.make_hdu(**kwargs)]
hdu = self.make_hdu(**kwargs)
header = hdu.header
header.update(self.meta)

This comment has been minimized.

@woodmd

woodmd Feb 7, 2018

Member

My preferred solution for serializing meta data would be to dump the meta dictionary into a single header keyword (see also my comment to #1250):

import json
header['META'] = json.dumps(self.meta)

This has several advantages over the FITS keyword approach:

  • No potential for collisions with other header keywords.
  • No restriction on the length of keywords. Writing meta data keywords to the FITS header directly will trigger HIERARCH cards for all meta keywords longer than 8 characters.
  • Meta data keywords can be case sensitive.
  • Supports meta data with a hierarchical structure via nested dictionaries.

The only real disadvantage I can see is that you lose some readability when manually looking at the header contents (e.g. in fv).

This comment has been minimized.

@cdeil

cdeil Feb 7, 2018

Member

@woodmd - Did we ever discuss abandoning FITS completely in favour of something with proper metadata and grouping support, concretely HDF5?

Personally I think it's worth considering, or if we stick with FITS, to use some "hack" like what Matthew proposed to get nicer meta-data support and to get rid of the problem of merging on write / disentangling on write two headers, within FITS.

@JouvinLea @registerrier - Let's discuss tomorrow and get some more opinions from the people involved in CTA metadata at the Gammapy meeting before making a decision here.

This comment has been minimized.

@woodmd

woodmd Feb 8, 2018

Member

I'm a big fan of hdf5 myself so I think having some hdf5 support built into gammapy would be a great idea. However I think abandoning FITS entirely is not realistic in the short-term given the huge amount of existing astro software and tools that rely on this format. What I could imagine is supporting hdf5 as an alternative output format option (e.g. triggered by a keyword argument to a class write method). At least initially you probably want to write a generic FITS to HDF5 conversion utility so that you don't have to rewrite all of your existing FITS serialization methods. Maybe a utility like this already exists in another package?

This comment has been minimized.

@jlenain

jlenain Feb 8, 2018

Contributor

Hi @woodmd, @cdeil, maybe one could rely on the external package fits2hdf (http://fits2hdf.readthedocs.io/en/latest/) to generically convert from FITS to HDF5 at the moment? I have never given it a try myself, though, so I don't know if that would suit our needs.

This comment has been minimized.

@JouvinLea

JouvinLea Feb 8, 2018

Contributor

Agree that we have to think to maybe support other format but not in this PR...

The thing with this line header['META'] = json.dumps(self.meta) is that really to acces then it will be a mess but I agree that you can have collisions with other header keywords.

This comment has been minimized.

@woodmd

woodmd Feb 8, 2018

Member

I agree that we should defer the discussion of supporting HDF5 (but maybe someone should open an issue thread on this topic).

Regarding the meta data serialization one point I still don't understand is how you will reconstitute the meta dictionary when reading a map back from a FITS HDU. Did you plan to ingest the whole FITS header into the meta dict? How will you distinguish between FITS keywords that were part of the meta data and those that are part of the map format spec? I think this is another place where the META keyword approach has some advantages. To deserialize the meta data you would simply do:

self.meta = json.loads(header['META'], object_pairs_hook=OrderedDict)

This comment has been minimized.

@cdeil

cdeil Feb 9, 2018

Member

I put "HDF5?" on the agenda of the gammapy.maps call next week:
https://github.com/gammapy/gammapy-meetings/tree/master/2018-02-15

I think HDF5 serialisation for our maps would be nice, but for IRFs it would be even nicer (simpler to implement, more likely to be used). In both cases it would help us have a good internal data model and an I/O that can deal with more than one format, the HDF5 that is nicer and something as bad as FITS.

For http://fits2hdf.readthedocs.io/en/latest/ I'm not sure if it's a good idea. If you just want to support HDF5 to get the performance benefits, then it might be the way to go. If you want to also take advantage of better metadata and dataset grouping support in HDF5, then some automatic translation from FITS to HDF5 is not what you want. I didn't really look at it yet, whoever starts to add HDF5 support in Gammapy would have to do some research before putting something ...

I'm +1 to use the JSON META hack for now. It allows us to move on with using META in science analysis, and if someone thinks of a better solution or something emerges in CTA we can switch to that. My understanding is that also in ctapipe they don't want to live with the limitations of FITS, and just store metadata as a JSON serialised string, either in the header or as a binary blob.

if data is None:
shape = tuple([np.max(geom.npix)] + [ax.nbin for ax in geom.axes])
data = SparseArray(shape[::-1], dtype=dtype)
elif isinstance(data, np.ndarray):
data = SparseArray.from_array(data)
super(HpxSparseMap, self).__init__(geom, data)
if meta is None:

This comment has been minimized.

@woodmd

woodmd Feb 7, 2018

Member

Why are you initializing the meta dictionary here? Doesn't it already get initialized in the Map constructor?

Lea Jouvin and others added some commits Feb 8, 2018

@cdeil cdeil added the feature label Feb 9, 2018

@cdeil cdeil added this to the 0.8 milestone Feb 9, 2018

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 9, 2018

@JouvinLea and I tried to implement the suggestion with JSON META. We did some work in 33088c1 and I think write works. But read is failing and we are having difficulty implementing in. We tried adding something in HpxNDMap.from_hdu, but it's not working yet, and probably the from_hdu methods in the three concrete classes we have should be refactored anyways to be simpler (have the cls() call at the end)? and common functionality like this JSON META handling in a base class method or a helper function?

@woodmd - do you have time to continue here? Or alternatively, let us know how it should be done and we'll try to implement it.

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 9, 2018

Ok I think I have a working implementation for reading meta data using a base class method as you suggested. For the moment I've left the rest of the from_hdu methods as they are. These methods are always going to be somewhat complex as they encode much of the logic for translating from the GADF format to our internal memory representation. Refactoring them would probably be better left to a separate PR.

Note that we should probably also update the GADF maps spec to include the META keyword. Are we sure that META isn't already in use by other software? Do we prefer META to METADATA?

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 9, 2018

Refactoring them would probably be better left to a separate PR.

The main thing that I found weird was that you created the map in the middle, and then do stuff to data and geom. Is it not possible / cleaner to build the geom, then the data, then the map on the last line via return cls(geom, data, meta)?

Note that we should probably also update the GADF maps spec to include the META keyword. Are we sure that META isn't already in use by other software? Do we prefer META to METADATA?

How about MAPMETA? There the probability of name collision with existing files / tools seems very low. I'm +1 to mention that we do that in the spec. Maybe by careful wording, saying that it's an option to use this, and a result of the inherent strong limitation of FITS to not have groups for data or metadata, it has a good chance of going in. (Or maybe no-one will object, not that many people follow the spec anyways).

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 9, 2018

I'm looking at 1d29d35 and don't understand how your working code differed from the one that gave us a test fail in test_map_meta_read_write earlier today. @woodmd - do you know what we did wrong?

Otherwise, this looks ready to merge to me.

My one suggestion / question would be if the make_test_map adds any value?
Can't we delete it and just have a map create call in the test?
I see that it adds two extra axes, but to test the map meta I/O, the axes don't really matter, no?

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 9, 2018

The main thing that I found weird was that you created the map in the middle, and then do stuff to data and geom. Is it not possible / cleaner to build the geom, then the data, then the map on the last line via return cls(geom, data, meta)

The reason for the current structure is that the logic for mapping pixel values to the data array is contained within the map class itself so it seemed easier to instantiate a map and then fill it using the accessor methods rather than trying to duplicate the mapping logic within this method. Alternatively we might be able to use the geometry methods to fill the data array and then pass that to the map constructor. I'd need to review these methods more carefully to see how easy it would be to implement in this way.

How about MAPMETA?

I think there could be an advantage of using the same keyword for meta data everywhere in gammapy (assuming that you want to adopt the same convention for other classes) since this would allow users to more easily find the meta data when manually inspecting the file. On this basis I think META or METADATA would be a better choice.

@woodmd - do you know what we did wrong?

I think it may be because you were using json.load instead of json.loads.

Otherwise, this looks ready to merge to me.

It looks like we have a test failure under one of the python 3.6 builds. I think this occurs due to a bug in numpy 1.10 but I'm not completely sure. I'm not able to reproduce it locally in a python 3.6 environment. Is there a particular reason that you've pinned an older numpy version for this test? Should we mark this test with xfail?

Can't we delete it and just have a map create call in the test?
I see that it adds two extra axes, but to test the map meta I/O, the axes don't really matter, no?

Fine with me.

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 9, 2018

The reason for the current structure is that the logic for mapping pixel values to the data array is contained within the map class itself

What do you mean by "mapping pixel values to the data array"? I thought that would be exactly the job of geom, and the map would only hold the pixel values and be involved in pixel value computations. Although I guess the map re-exposes or even does some mapping logic. I'll familiarise myself more with this and then open an issue for discussion if I think it can be improved.


OK, let's leave META then (or change to METADATA if you prefer).


I see the fail on the numpy 1.10 build:
https://travis-ci.org/gammapy/gammapy/jobs/339635401#L3554

Weird that it only shows up now with this test, no?

Indeed we have on purpose a CI build for the last Numpy version we support, at the moment set to 1.10:

env: PYTHON_VERSION=3.6 NUMPY_VERSION=1.10 SETUP_CMD='test -V'

@woodmd - Are you sure it's an old Numpy bug, and not something related to astropy.io.fits? Is there a way to work around the issue? If no, do you know up to which Numpy version we have to bump to get this to work? Maybe xfail that test for the old Numpy with a comment, so that this can go in? I could investigate next week.


@woodmd - I'm offline now. Either you finish it up, or I'll do it in the next days.

@woodmd

This comment has been minimized.

Member

woodmd commented Feb 9, 2018

What do you mean by "mapping pixel values to the data array"? I thought that would be exactly the job of geom, and the map would only hold the pixel values and be involved in pixel value computations.

I mean mapping pixel indices to data array indices. The logic for this mapping is implemented in the geometry classes so I'm fairly sure it should be possible to refactor these methods as you suggest.

Are you sure it's an old Numpy bug, and not something related to astropy.io.fits?

Yes the same bug already appeared once before (see #1193 and #1199). At that time I fixed it by adding at the top of some of the test modules:

pytest.importorskip('numpy', '1.12.0')

The reason we see failures here is that a skip statement was needed after adding new tests in test_base.py. I've added the line to test_base.py but if we want to avoid this in the future we might bump the pinned numpy version to 1.12.0. I also realized that HpxGeom now uses a method that first appeared in 1.13.0 so I've bumped the minimum numpy version in test_hpx.py as well.

@cdeil

cdeil approved these changes Feb 11, 2018

Looks good to me.

@registerrier @JouvinLea - Ready to merge?

@JouvinLea

This comment has been minimized.

Contributor

JouvinLea commented Feb 11, 2018

@cdeil si!

@registerrier

This comment has been minimized.

Contributor

registerrier commented Feb 12, 2018

Since I am the last one to agree, I am merging.

@registerrier registerrier merged commit 7063183 into gammapy:master Feb 12, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cdeil cdeil changed the title from Add meta maps to Add meta attribute to maps Apr 11, 2018

@cdeil cdeil removed this from the 0.8 milestone Apr 11, 2018

@cdeil cdeil added this to the 0.7 milestone Apr 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment