Introduce container readers. #704

stscieisenhamer · 2015-07-13T18:51:11Z

While thinking about issue #642 and dealing with datasets with many FITS extensions, I came up with a generic FITS loader.

I haven't done tests/docs yet, but wanted to throw this out for thoughts, and possibly for it to become the default for FITS files.

FITS is the first.

ChrisBeaumont · 2015-07-15T20:10:30Z

@stscieisenhamer can you add some comments here about what this change does? It looks like the main difference from the current FITS loader is that it loads all extensions, creating as many datasets as needed so that all components within a dataset have the same shape.

This also seems relevant to @astrofrog's work on adding UI to let users configure how datasets and components are organized at file load time.

stscieisenhamer · 2015-07-15T20:41:38Z

Sorry, yes, I indeed need to be more verbose, in life in general.

However, @ChrisBeaumont has it all: a bulk reader that reads as many extensions as it can and groups those of like dimensionality into common datasets, with some sort of reasonable name guessing, though that could use some work. Having such "bulk" loaders or, to use the coined term, container loaders, would match the spirit of data exploration: when one does not know what the data really is, such loaders offer a low-barrier way to learn about a given container.

This would work in conjunction with UI that allows specification of different subsets of a container to be read, once one knows what to look for and what is needed.

With HDF5 and the future ASDF formats, both of which can contain many sets of data, not all necessarily of like dimensionality, having such a class of "container loaders" seems to be advantageous, if practical.
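As a rough illustration of the grouping described above (a sketch, not the code in this PR), like-shaped extensions could be bucketed as follows. The extension names and shapes are made up, and plain `(name, shape)` tuples stand in for real FITS HDUs.

```python
from collections import OrderedDict


def group_by_shape(extensions):
    """Bucket (name, shape) pairs so that each group holds extensions of one shape.

    `extensions` is a hypothetical stand-in for the HDUs of a FITS file:
    an iterable of (extension_name, shape_tuple) pairs.
    """
    groups = OrderedDict()
    for name, shape in extensions:
        # Extensions with no data (e.g. an empty primary HDU) are skipped.
        if not shape:
            continue
        groups.setdefault(shape, []).append(name)
    return groups


# Made-up file layout: an empty primary HDU, a cube with error and mask
# extensions, and one 2D image.
extensions = [
    ("PRIMARY", ()),
    ("SCI", (10, 64, 64)),
    ("ERR", (10, 64, 64)),
    ("DQ", (10, 64, 64)),
    ("PREVIEW", (64, 64)),
]

for shape, names in group_by_shape(extensions).items():
    print(shape, names)
```

Each resulting group would become one dataset, with the member extensions as its components.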

astrofrog · 2015-08-15T17:34:59Z

@stscieisenhamer - this container idea is great. Would you have time in the coming few days to finish this up? The main things that need to be done:

- Get the tests to pass on AppVeyor and Travis
- Add a changelog entry

(if you don't have time, I can copy over these changes to another PR and finish it up)

Another comment I have, which we can address after this PR is merged, is that I think that we should prompt the user about whether to merge datasets or not rather than doing it automatically. The issue is that even if the shape matches, the WCS may not. For example, the WFCAM data files (for the UKIDSS survey) include four different images from different CCD chips as extensions. The shapes match, but the WCS do not. It's surprisingly difficult to check if two WCSs are exactly the same, hence why I'm suggesting that later we should make sure that we allow the user to decide whether or not to merge extensions. But this is quite a bit more work and this PR is already a big improvement over the current situation.

stscieisenhamer · 2015-08-17T14:47:52Z

@astrofrog I should be able to finish this up by this Friday, 21 Aug.

I agree about the WCS issue, and just in general, the user should have some input. This actually came up in a slightly different context, described below, during the last sprint. There were two approaches:

1. Never merge: just make a data set with each extension.
2. Merge UNLESS there is an explicit WCS defined.

Option 1 is, to my sensibilities, really dumb. However, it has the advantage that people know their FITS files by their extensions, and this would make it very explicit what went where. I prefer option 2 and, unless persuaded otherwise (which would be very easy), will attempt that approach.

The context in which we hit this was one where a dataset with matching shape already existed. In fact, I was going to open an issue about it. When datasets are merged, it appears that the WCS of the new data takes over the dataset. In this situation, if a WCS has already been set, it should remain set. Alternatively, the user could choose which WCS to use, or the merge could simply be disallowed if both the existing dataset and the data to be merged have explicit WCS definitions.
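To make two of the alternatives above concrete, here is a minimal sketch of a WCS precedence rule for merges. The function name, the `refuse_conflicts` flag, and the use of `None` for "no explicit WCS" are all assumptions for illustration; this is not how glue currently behaves.

```python
def choose_wcs(existing_wcs, incoming_wcs, refuse_conflicts=False):
    """Pick the WCS a merged dataset should keep.

    `None` means "no explicit WCS". All names here are hypothetical.
    """
    if existing_wcs is not None and incoming_wcs is not None:
        if refuse_conflicts:
            # Strict policy: two explicit WCS definitions block the merge.
            raise ValueError("both datasets define a WCS; refusing to merge")
        # Lenient policy: an already-set WCS is never overwritten.
        return existing_wcs
    # At most one side has a WCS: keep whichever exists (or None).
    return existing_wcs if existing_wcs is not None else incoming_wcs
```

The third alternative, letting the user pick, would replace the `refuse_conflicts` branch with a prompt.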

astrofrog · 2015-08-17T14:56:41Z

@stscieisenhamer - I'm fine with option 2 for now. As I said, I think the 'right' thing in future will probably be to have a pop-up window that helps the user customize the merging or not of datasets. But for now I like option 2.

astrofrog · 2015-08-19T06:27:01Z

glue/core/data_factories/containers.py

+fits_container.label = "FITS full file"
+fits_container.identifier = is_fits
+__factories__.append(fits_container)
+set_default_factory('fits', fits_container)


As done in #724, please now register the data factory using the @data_factory decorator. If you want this to take precedence over any other FITS reader, you can just set the priority in that decorator to say 100 (it has to be below 1000 because that is for the dendrogram reader, which is a more specialized format and should take precedence).
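To show how a priority would decide between competing readers, here is a self-contained toy registry. It only mimics the idea of the `@data_factory` decorator; the registry, `find_factory`, the `is_dendrogram` check, and the priority of the basic reader are assumptions, so check glue's own decorator for the real signature.

```python
_registry = []  # toy stand-in for glue's data factory registry


def data_factory(label, identifier, priority=0):
    """Toy decorator: record a loader with its label, identifier and priority."""
    def decorate(func):
        _registry.append((priority, label, identifier, func))
        return func
    return decorate


def is_fits(filename):
    return filename.lower().endswith((".fits", ".fit"))


def is_dendrogram(filename):
    # Hypothetical check: pretend dendrogram files carry this suffix.
    return filename.lower().endswith("_dendro.fits")


@data_factory("FITS file", is_fits, priority=0)
def basic_fits(filename):
    return "basic_fits"


@data_factory("FITS full file", is_fits, priority=100)
def fits_container(filename):
    return "fits_container"


@data_factory("Dendrogram", is_dendrogram, priority=1000)
def dendrogram(filename):
    return "dendrogram"


def find_factory(filename):
    """Return the highest-priority loader whose identifier accepts the file."""
    matches = [entry for entry in _registry if entry[2](filename)]
    if not matches:
        raise KeyError("no data factory for %s" % filename)
    return max(matches, key=lambda entry: entry[0])[3]
```

With these numbers, an ordinary FITS file goes to `fits_container`, while a file that the more specialized dendrogram identifier also accepts still goes to the dendrogram reader.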

The new data_factory decorator is used. Also brought in the updated code from the original development in another project.

Only if an extension has no WCS defined will the generic FITS loader combine extensions into the same Data container. Otherwise, the user is prompted, through the standard glue framework, to combine like-shaped data. Hence the user has the option of preserving WCS information or not. The naming scheme is also changed from shape-based to extension-based. Whether this is a positive change remains to be seen.

stscieisenhamer · 2015-08-20T21:20:02Z

Did the check on whether a wcs exists or not. If one exists, a new Data element will be created.

I happen to have a good dataset where this can potentially blow up. There is a cube where three extensions exist for the data, error, and mask. The error and mask extensions do not have a WCS, and the expected happens.

However, there are also 12 other extensions, all the same shape and all with WCS information. So, the loader makes no attempt to merge them. However, this triggers the standard glue prompt about whether to merge the datasets. This, I believe, has the same effect as creating a dedicated UI for this situation. Of course, the generic prompt makes no mention of whether the WCS information is the same or different, but it is at least now consistent with any other data format read, and you do get the user input.

There is also another change that an opinion would be good on. Previously, the Data labels included the shape. Now, because different datasets may still have the same shape, I have gone to an extension-name-based labeling. For FITS users, this may (or may not) be better. Thoughts?
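For discussion, one possible extension-based scheme is sketched below. The exact format (`file[EXTNAME]` or `file[EXTNAME,EXTVER]`) and the function name are assumptions, not necessarily what the loader emits.

```python
import os


def extension_label(filename, extname, extver=None):
    """Build a label such as 'cube[SCI]' or 'cube[SCI,2]' (hypothetical scheme)."""
    base = os.path.splitext(os.path.basename(filename))[0]
    if extver is None:
        return "%s[%s]" % (base, extname)
    return "%s[%s,%d]" % (base, extname, extver)
```

Unlike shape-based labels, two same-shaped extensions such as `SCI,1` and `SCI,2` get distinct names this way.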

I will come back and check the testing state later this evening and fix as needed.

astrofrog · 2015-08-20T22:09:34Z

@stscieisenhamer - given your example, I'm actually inclined to say that we should not merge datasets by default, but instead just let the default prompt come up and let the user choose. The merging will be very context-dependent, and I can also find examples of datasets where I would not want merging to happen but it would happen automatically in the current PR. At the end of the day, it's easier to merge datasets than to unmerge them.

So for now, would it be ok to simply turn off the auto-merging?

Another quick note - I noticed that you merged master into your branch, but to keep the history clean, we normally rebase the branch on master instead. Would you mind rebasing this branch against the upstream master to get rid of the merge commit?

Thanks for your work on this!

astrofrog · 2015-08-20T22:13:45Z

Actually let me make a small modification to my suggestion: if you prefer maybe you can add an option to the data factory called auto_merge that defaults to False. But if set to True, then it follows your current merging rules. Then we can see in future whether we can expose that option, but for now it would have the effect of never merging automatically, while having the benefit of keeping the code there for auto-merging.
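A minimal sketch of what such an option could look like, using `(name, shape, has_wcs)` tuples as a stand-in for FITS HDUs. The function name and the tie-breaking rule (a WCS-less extension joins the most recently created dataset of the same shape) are assumptions rather than the PR's actual logic.

```python
def plan_datasets(extensions, auto_merge=False):
    """Decide which extensions end up together in one dataset.

    `extensions` is a list of (name, shape, has_wcs) tuples, a hypothetical
    stand-in for FITS HDUs. Returns a list of lists of extension names.
    """
    if not auto_merge:
        # Default: one dataset per extension, never merge automatically.
        return [[name] for name, _, _ in extensions]

    datasets = []    # groups of extension names, in order of creation
    open_group = {}  # shape -> group that WCS-less extensions may join
    for name, shape, has_wcs in extensions:
        if has_wcs or shape not in open_group:
            # An explicit WCS, or a shape not seen before, starts a new dataset.
            group = [name]
            datasets.append(group)
            open_group[shape] = group
        else:
            # No WCS and a same-shaped dataset exists: merge into it.
            open_group[shape].append(name)
    return datasets


# Made-up layout: a cube with WCS-less error and mask extensions, followed by
# same-shaped images that each carry their own WCS.
extensions = [
    ("SCI", (10, 64, 64), True),
    ("ERR", (10, 64, 64), False),
    ("DQ", (10, 64, 64), False),
    ("IMG1", (64, 64), True),
    ("IMG2", (64, 64), True),
]
```

With the default, every extension maps to its own dataset and the standard merge prompt is left to decide; with `auto_merge=True`, only the WCS-less `ERR` and `DQ` are folded into the `SCI` dataset.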

stscieisenhamer · 2015-08-25T14:49:36Z

srysly: 2.6? grrr...

astrofrog · 2015-08-25T16:12:14Z

@stscieisenhamer - it looks like this needs to be rebased as it includes commits that are not related to this pull request (and also to get rid of the merge commit). Make sure you rebase against the glue-viz/glue repository, not your fork.

If you have any issues with doing this, I'm also happy to do the rebasing and open another pull request - just let me know.

stscieisenhamer · 2015-08-25T16:18:18Z

My apologies. I've been trying to do it, but there seems to be some serious confusion. In particular, in doing the squashing, something is meeting a horrible death. So, I'm going to take you up on your offer; a new PR seems to be the only way out.

stscieisenhamer · 2015-08-25T16:19:07Z

Otherwise, I did implement the one-to-one importing of extensions to data objects, fixed the current tests, and implemented a specific test for this container.

astrofrog · 2015-08-25T16:26:31Z

@stscieisenhamer - thanks!

Note that you don't need to squash when rebasing, but it looks like you may have rebased against your fork's master rather than the upstream master. But in any case, I'll open a new PR with a rebased version, and will put the commands I use here for future reference.

stscieisenhamer · 2015-08-25T16:29:09Z

maybe, but it must have happened sometime earlier than today...hence the weird state. my apologies again. my next step was to basically do the new PR...

astrofrog · 2015-08-25T16:35:15Z

@stscieisenhamer - ah sorry, I misunderstood, would you prefer to open the new PR? (if not, then I can do so)

stscieisenhamer · 2015-08-25T16:37:05Z

Sure! I think I can get that part right at least. Will do this later today.

astrofrog · 2015-08-25T16:45:09Z

@stscieisenhamer - ok, thanks. Just to let you know, I did decide to do the rebase to better understand what the issue was, and I have a cleaned up version in the following branch

https://github.com/astrofrog/glue/tree/fits-containers

If you like, we can just open a pull request from that branch (unless you would prefer to try and do it)

astrofrog · 2015-08-25T16:49:44Z

Just based on the rebase, I think what might have happened is that you merged the latest master into your branch and possibly then rebased on a version of master that wasn't the latest upstream one.

In future, I would recommend basically never merging master into your branch. Instead, always just rebase using:

git fetch upstream
git rebase -i upstream/master

astrofrog · 2015-08-25T16:51:14Z

However, I should note that the rebase now is very difficult, and it took me a while to get right, so if you want to open a new PR not based on the branch I pushed, you would be better off making a completely new branch.

stscieisenhamer · 2015-08-25T16:54:18Z

First, let's just go with the PR on your branch; I've messed up enough today.

But, good news for me: that process is exactly what I've been doing. But, bad news, I must have done something bad along the way; I'm guessing a too-quick-on-the-bash-history execution... 👎 Probably real late one night.

My apologies again. -1 on the git-fu

astrofrog · 2015-08-25T17:00:36Z

No problem! The new PR is at #732

stscieisenhamer · 2015-08-25T18:56:02Z

Now in PR #732

stscieisenhamer added 2 commits July 13, 2015 14:20

Introduce container readers.

108706a

FITS is the first.

Reset gridded to what is in master.

693209c

Merge pull request #724 from astrofrog/decorate-data-factories

e869b62

Remove use of __factories__, use @data_factory for all data factories instead

astrofrog reviewed Aug 19, 2015

stscieisenhamer added 5 commits August 20, 2015 17:07

Introduce container readers.

df2f71b

FITS is the first.

Added dragging for dynamic spectrum updating.

3c78b9e

Bring the fits containter up-to-date with plugins

6106bcf

The new data_factory decorator is used. Also brought in the updated code from the original development in another project.

Merge branch 'multi_loader' of github-stsci:stscieisenhamer/glue into…

e0bb122

… multi_loader

astrofrog and others added 2 commits August 24, 2015 21:04

Introduce container loaders, starting with FITS

3322d95

Fix fits module inclusion for tests

88c75d5

Fix return value of the fits_container data loader

18f9222

astrofrog mentioned this pull request Aug 25, 2015

FITS containers [rebased] #732

Merged

stscieisenhamer closed this Aug 25, 2015
