
Implement MapDataset FITS I/O #2264

Merged: 7 commits merged into gammapy:master from map_dataset_fits_io on Jul 2, 2019

Conversation

@adonath (Member) commented Jun 28, 2019

This PR implements single-file FITS I/O for MapDataset. It includes the following changes:

  • Add MapDataset.to_hdulist() method.
  • Add MapDataset.from_hdulist() classmethod.
  • Add MapDataset.write() method.
  • Add MapDataset.read() classmethod.
  • Refactor and simplify cube/test_fit.py

Model serialisation is ignored for now; it is unclear whether it will be kept separate anyway.
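As a rough illustration of the round trip this enables (a minimal sketch, not the PR's exact code: the file name, the import path and the overwrite keyword are assumptions; only the four method names listed above come from the PR):

from gammapy.cube import MapDataset  # import path assumed for the Gammapy version of this PR

# `dataset` is assumed to be an existing, fully populated MapDataset
dataset.write("map_dataset.fits", overwrite=True)   # single-file FITS serialisation
dataset_new = MapDataset.read("map_dataset.fits")   # read it back

# the HDUList-level methods are also available
hdulist = dataset.to_hdulist()
dataset_copy = MapDataset.from_hdulist(hdulist)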

@adonath adonath self-assigned this Jun 28, 2019
@adonath adonath added this to the 0.13 milestone Jun 28, 2019
@cdeil (Member) left a comment

@adonath - Thanks!

Some small suggestions inline.

Map dataset list of HDUs.
"""
# TODO: what todo about the model and background model parameters?
exclude_primary = slice(1, 3)
@cdeil (Member) Jul 1, 2019

Maybe put [1:] inline to remove the primary HDUs below? Or, if you want to keep this variable, I'd suggest slice(1, None) instead of slice(1, 3), to make it clearer that this really only removes the first element.
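For illustration, a tiny standalone Python example of the difference (not Gammapy code):

hdu_names = ["PRIMARY", "COUNTS", "COUNTS_BANDS", "EXPOSURE"]

# slice(1, 3) keeps only elements 1 and 2 and silently drops anything from index 3 on
print(hdu_names[slice(1, 3)])     # ['COUNTS', 'COUNTS_BANDS']

# slice(1, None) is equivalent to [1:]: it drops only the first element (the primary HDU)
print(hdu_names[slice(1, None)])  # ['COUNTS', 'COUNTS_BANDS', 'EXPOSURE']
print(hdu_names[1:])              # ['COUNTS', 'COUNTS_BANDS', 'EXPOSURE']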

@adonath (Member, Author) Jul 1, 2019

Done

assert_allclose(dataset.psf.data, dataset_new.psf.data)
assert_allclose(dataset.exposure.data, dataset_new.exposure.data)
assert_allclose(dataset.mask_fit, dataset_new.mask_fit)
assert_allclose(dataset.mask_safe, dataset_new.mask_safe)
@cdeil (Member) Jul 1, 2019

Extend this test a bit? If you only assert on the data, the following things could still go wrong without being noticed:

  • an HDU name changes and existing files become unreadable
  • the energy HDUs clobber each other or have non-unique names and thus aren't read back correctly

So my suggestion is to add more asserts: on the HDU names in the FITS HDUList, and also asserts that check that the energy axes are read back correctly.
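A hedged sketch of what the extra energy-axis checks could look like (dataset and dataset_new are the objects from the test above; treating axes[0] as the energy axis and using the .edges attribute are assumptions about the map geometry API):

from numpy.testing import assert_allclose

def check_energy_axes(dataset, dataset_new):
    """Check that the energy axes of the counts and exposure geometries survive FITS I/O."""
    for name in ("counts", "exposure"):
        axis = getattr(dataset, name).geom.axes[0]      # energy axis, assumed to be first
        axis_new = getattr(dataset_new, name).geom.axes[0]
        assert axis.name == axis_new.name
        assert_allclose(axis.edges, axis_new.edges)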

@adonath (Member, Author) Jul 1, 2019

Done

@cdeil (Member) Jul 1, 2019

I'm pretty sure we will get issues in the coming months and years because we keep changing the content and HDU names for our FITS serialisations. Probably we'll have to introduce versions at some point, to be able to read back old files with newer Gammapy versions.

It would be really good to establish via a test what we store at the moment, i.e. to add an assert on what [_.name for _ in hdulist] is. Then we'll be more conscious of data file content changes, because the test fails. Without that, it's easy to change whatever you like and things keep working, because there is no stable reference.

@adonath (Member, Author) Jul 1, 2019

I added the additional asserts on the HDU names.

-------
dataset : `MapDataset`
Map dataset.
@cdeil (Member) Jul 1, 2019

Remove empty line

@adonath (Member, Author) Jul 1, 2019

Done

-------
dataset : `MapDataset`
Map dataset.
@cdeil (Member) Jul 1, 2019

remove empty line

@adonath (Member, Author) Jul 1, 2019

Done

@adonath (Member, Author) commented Jul 1, 2019

Thanks for the review @cdeil! I added a few more asserts on energy axes and map geometries.

@registerrier (Contributor) left a comment

Thanks @adonath.

Having all maps in one single file is certainly appealing, but it might be troublesome because of the limited flexibility, unless you leave the user the ability to change the extension names.
Another approach is to have convenience methods such as load_psf or load_edisp on the Dataset.

A YAML summary of the MapDataset content would also be an easy way to load everything at once.

I think the mask_fit should not be stored with the reduced data and IRFs; it is an analysis feature, not a data-reduction product.

hdulist.append(hdus["EDISP"])
hdulist.append(hdus["EDISP_EBOUNDS"])

if self.psf is not None:
@registerrier (Contributor) Jul 1, 2019

I would use PSF_KERNEL as the extension name, since a PSFMap will/could also be stored at some point.

@adonath (Member, Author) Jul 1, 2019

Good point, I guess we'll allow for both (a kernel as well as a map) in the future.

@adonath (Member, Author) Jul 1, 2019

I implemented a distinction between the two cases for both the PSF and the EDISP (rough sketch below):

  • If it's a PSFMap object it is stored as PSF; if it is a PSFKernel it is stored as PSF_KERNEL
  • If it's an EDispMap object it is stored as EDISP; if it is an EnergyDispersion object it is stored as EDISP_MATRIX
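A rough sketch of that branching (the class names come from the comment above; the import paths are assumptions and have moved between Gammapy versions):

def psf_hdu_name(psf):
    """HDU name for the PSF component, following the convention described above."""
    from gammapy.cube import PSFMap  # assumed import path

    return "PSF" if isinstance(psf, PSFMap) else "PSF_KERNEL"


def edisp_hdu_name(edisp):
    """HDU name for the energy dispersion component."""
    from gammapy.cube import EDispMap  # assumed import path

    return "EDISP" if isinstance(edisp, EDispMap) else "EDISP_MATRIX"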

mask_safe_map = Map.from_geom(self.counts.geom, data=self.mask_safe.astype(int))
hdulist += mask_safe_map.to_hdulist(hdu="mask-safe")[exclude_primary]

if self.mask_fit is not None:
@registerrier (Contributor) Jul 1, 2019

Why would you store the mask_fit along with the data?

I can see the point of having a YAML file listing the data FITS files, the model used and the fitting-mask file, but I think that mixing the fit mask with the data is not a good idea, given the distinction between data reduction and data fitting.

@adonath (Member, Author) Jul 1, 2019

I wasn't 100% sure about this either, but eventually decided to add this information to the FITS file. I see two options:

  1. Use the single FITS file serialisation only for data-reduction and only store the data associated with this.
  2. Aim for a full MapDataset serialisation in a single FITS file (maybe with a separate model file) and really serialise all the information.

I have a slight preference for the second option: aim for a full MapDataset serialisation, so that the state of an analysis can be stored and restored at any time, which would include the fit mask as well. Maybe we can discuss this again.

@@ -172,6 +178,113 @@ def likelihood(self):

return stat

def to_hdulist(self):
@registerrier (Contributor) Jul 1, 2019

I think the extension names should be optional arguments here. This is the only way to get some flexibility and backward compatibility.

@registerrier (Contributor) Jul 1, 2019

How do you deal with bands HDUs? Shouldn't they have their own names? Counts, background and exposure don't have the same bands.

@adonath (Member, Author) Jul 1, 2019

The bands HDUs are already handled correctly: they are distinguished by a corresponding prefix in the HDU name. Here is an example HDU list resulting from a call to .to_hdulist():

Filename: (No file associated with this HDUList)
No.  Name              Ver  Type         Cards  Dimensions     Format
  0  PRIMARY           1    PrimaryHDU   4      ()
  1  COUNTS            1    ImageHDU     28     (100, 100, 2)  float64
  2  COUNTS_BANDS      1    BinTableHDU  22     2R x 4C        ['I', 'E', 'E', 'E']
  3  EXPOSURE          1    ImageHDU     28     (100, 100, 3)  float64
  4  EXPOSURE_BANDS    1    BinTableHDU  22     3R x 4C        ['I', 'E', 'E', 'E']
  5  BACKGROUND        1    ImageHDU     28     (100, 100, 2)  float64
  6  BACKGROUND_BANDS  1    BinTableHDU  22     2R x 4C        ['I', 'E', 'E', 'E']
  7  EDISP             1    BinTableHDU  31     3R x 6C        ['E', 'E', 'I', 'PI()', 'PI()', 'PE()']
  8  EDISP_EBOUNDS     1    BinTableHDU  26     2R x 3C        ['I', 'D', 'D']
  9  PSF               1    ImageHDU     28     (51, 51, 3)    float32
 10  PSF_BANDS         1    BinTableHDU  22     3R x 4C        ['I', 'E', 'E', 'E']
 11  MASK_FIT          1    ImageHDU     28     (100, 100, 2)  int64
 12  MASK_FIT_BANDS    1    BinTableHDU  22     2R x 4C        ['I', 'E', 'E', 'E']

I would be hesitant to introduce the flexibility to let users choose the HDU names. I'd prefer convention over configuration here, until we find a real shortcoming of the current naming scheme. I agree that it's useful for backwards compatibility, but I guess we don't want to allow HDU names to be chosen freely either? Otherwise we will end up with a lot of incompatible FITS files, where HDUs have to be assigned by hand every time, because somebody changed the name from COUNTS to ON or from EDISP to ENERGY_DISPERSION. I'll check how we handle this elsewhere in Gammapy and comment again.

@cdeil (Member) Jul 1, 2019

> I think the extension names should be optional arguments here. This is the only way to get some flexibility and backward compatibility.

Similar to what @adonath replied, I would also not introduce that flexibility, because the more diversity we have, the harder it will be to support old files.

My suggestion would be to merge as-is (with an assert on the current HDU names added, to make it clear what we have), and maybe, if you want, open a follow-up issue where we can discuss options for adding versioning or a level of indirection (like a YAML file, or a dict stored as a JSON blob in the primary header) that would allow us to evolve the content and names in the future while keeping support for older formats. (Not needed immediately, but after v1.0; for the next few months changes are still allowed.)

return hdulist

@classmethod
def from_hdulist(cls, hdulist):
@registerrier (Contributor) Jul 1, 2019

Same thing about arguments here.

@adonath adonath force-pushed the map_dataset_fits_io branch from 5bd5a9f to 540f39a Jul 1, 2019
@adonath adonath force-pushed the map_dataset_fits_io branch from 540f39a to ae8f3b5 Jul 1, 2019
@cdeil (Member) left a comment

I find test_fit.py confusing. Why did you remove the @pytest.fixture for most cases but leave it for sky_model? PyCharm also flags e.g. def background(geom) with "shadows name from outer scope", because there is a geom function above.
I suggest changing back to fixtures; if you can't for some reason, then call the functions make_geom, make_background, etc., so that you can write geom = make_geom(...), background = make_background(...), etc. in the test functions.

assert "MASK_FIT" in hdulist
assert "MASK_FIT_BANDS" in hdulist
assert "MASK_SAFE" in hdulist
assert "MASK_SAFE_BANDS" in hdulist
@cdeil (Member) Jul 1, 2019

There could be other stuff in the hdulist (e.g. extra "garbage" PrimaryHDU objects) and we wouldn't notice. I suggest putting actual = [_.name for _ in hdulist] and expected = ["COUNTS", ...] and then assert actual == expected.
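A sketch of the suggested pattern (the expected list is illustrative, assembled from names quoted elsewhere in this thread; the exact list depends on which components the dataset contains, and the PSF/EDISP entries may instead be PSF_KERNEL / EDISP_MATRIX depending on the component types discussed above):

actual = [hdu.name for hdu in hdulist]
expected = [
    "PRIMARY",
    "COUNTS", "COUNTS_BANDS",
    "EXPOSURE", "EXPOSURE_BANDS",
    "BACKGROUND", "BACKGROUND_BANDS",
    "EDISP", "EDISP_EBOUNDS",
    "PSF", "PSF_BANDS",
    "MASK_SAFE", "MASK_SAFE_BANDS",
    "MASK_FIT", "MASK_FIT_BANDS",
]
assert actual == expected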

@adonath (Member, Author) Jul 2, 2019

Done

@adonath adonath merged commit 16d43ea into gammapy:master Jul 2, 2019
3 of 9 checks passed