Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add observation sanity check method to DataStore #1082

Merged
merged 4 commits into from Aug 3, 2017
Merged

Conversation

@lmohrmann
Copy link
Contributor

@lmohrmann lmohrmann commented Jul 3, 2017

This simple routine performs some sanity checks on the events and IRF files of the observations in the DataStore.

@cdeil cdeil self-assigned this Jul 3, 2017
@cdeil cdeil added the feature label Jul 3, 2017
@cdeil cdeil added this to the 0.7 milestone Jul 3, 2017
@cdeil
Copy link
Member

@cdeil cdeil commented Jul 3, 2017

@lmohrmann - Thanks!

Ideally, I think it would be much better if we always implement checks by putting results in dicts like in
https://github.com/gammapy/gammapy/blob/master/gammapy/irf/psf_check.py
because then those can be logged, but could also be e.g. summarised in a table or more easily used in processing pipelines.

Another suggestion I have is that it would be nice if you add a test that exercises the new code at least once to make sure it's working (and keeps working as we refactor Gammapy). There's this CTA example
https://github.com/gammapy/gammapy-extra/tree/master/test_datasets/cta_1dc
and from HESS we have datasets/hess-crab4-hd-hap-prod2 and datasets/hess-crab4-pa and if none of those works for some reason I could add a more recent one from HAP.

@lmohrmann - Let me know if you want to implement either of these suggestions here, or just merge as-is and leave those suggestions to the future.

@@ -24,6 +24,8 @@
'ObservationList',
]

format = "%(levelname)8s %(filename)s:%(lineno)s - %(funcName)s --- %(message)s"
logging.basicConfig(level=logging.INFO, format=format)

This comment has been minimized.

@cdeil

cdeil Jul 3, 2017
Member

Python libraries should never configure loggers. That makes it very hard for users to do so and confusing when import gammapy.data changes the global logging behaviour.

So if you keep the logging in check_observations, please remove those two lines here and instead put them in the script doing the processing (having it in cli scripts like hap-data-fits-export is OK).

@@ -284,6 +286,29 @@ def load_many(self, obs_ids, hdu_type=None, hdu_class=None):

return things

def check_observations(self):
"""Check all observations regarding their values in events, aeff, edisp, psf

This comment has been minimized.

@cdeil

cdeil Jul 3, 2017
Member

Say in the docstring what the method does. I.e. if you keep as-is, say that it's logging the results.

@lmohrmann
Copy link
Contributor Author

@lmohrmann lmohrmann commented Jul 4, 2017

@cdeil I tried to implement your suggestions, see the new commit. The test in test_data_store.py fails if I use the CTA 1DC data or datasets/hess-crab4-pa because it cannot find the files at the expected locations.

@@ -211,3 +211,10 @@ def test_make_mean_edisp(tmpdir):
i = np.where(rmf.data.evaluate(e_reco=Energy(40, "TeV")) != 0)[0]
i2 = np.where(rmf2.data.evaluate(e_reco=Energy(40, "TeV")) != 0)[0]
assert_equal(i, i2)


@requires_data('gammapy-extra')

This comment has been minimized.

@cdeil

cdeil Jul 19, 2017
Member

You have to add @requires_dependency('yaml') here: https://travis-ci.org/gammapy/gammapy/jobs/249970758#L1794


# Loop over all obs_ids in obs_table
for obs_id in self.obs_table['OBS_ID']:
messages = []

This comment has been minimized.

@cdeil

cdeil Jul 19, 2017
Member

Everything in the for loop should be moved to a separate method. Would it fit better on the obs class?

results[obs_id] = messages
nfail += 1

print('Checks failed for {:d} out of {:d} observations'.format(nfail, len(self.obs_table['OBS_ID'])))

This comment has been minimized.

@cdeil

cdeil Jul 19, 2017
Member

If you print here, then it's very hard for callers to suppress (e.g. from pipelines).
I would suggest you leave this print to the caller, or if you want to do it from Gammapy, make a separate methods where you print the report.

Copy link
Member

@cdeil cdeil left a comment

@lmohrmann - sorry for not being responsive. I've left a few inline comments. Let me know if they make sense or if you have any questions.

@lmohrmann
Copy link
Contributor Author

@lmohrmann lmohrmann commented Jul 20, 2017

I tried to implement your suggestions again, see the latest commit. Let me know if this looks OK now.

@cdeil
Copy link
Member

@cdeil cdeil commented Aug 3, 2017

@lmohrmann - Thanks!

@cdeil cdeil merged commit 9e8b671 into gammapy:master Aug 3, 2017
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@cdeil
Copy link
Member

@cdeil cdeil commented Aug 3, 2017

@lmohrmann - I see at https://github.com/gammapy/gammapy/pull/1082/commits that you merged master into this feature branch: b6495e4

This is not a problem in this case, we'll leave as-is.

But generally merge commits into feature branches means that overall the git commit history of the project is complex, and makes it harder to review / understand of when which changes happened. So for the future, please rebase on upstream master instead of merging it into your feature branch. We don't have git / developer docs for Gammapy, but it's the same as for Astropy (see http://astropy.readthedocs.io/en/latest/development/workflow/development_workflow.html#rebase-but-only-if-asked) or most other projects on Github.

@cdeil cdeil changed the title Add routine to DataStore that performs sanity checks on observations Add observation sanity check method to DataStore Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants