Access of data in notebooks #1784

Closed
cdeil opened this issue Sep 12, 2018 · 8 comments

@cdeil
Contributor

cdeil commented Sep 12, 2018

@Bultako - I've started switching to relative paths for data access in the notebooks I'm editing, for example https://github.com/gammapy/gammapy-extra/blob/master/notebooks/background_model.ipynb just now.

So I put this:

data_store = DataStore.from_dir("../datasets/hess-dl3-dr1")

instead of this:

data_store = DataStore.from_dir("$GAMMAPY_EXTRA/datasets/hess-dl3-dr1")

This will work with the new gammapy download solution, right?
If yes, OK if I change all notebooks to this way to access data files today?

@cdeil cdeil added the question label Sep 12, 2018
@cdeil cdeil added this to the 0.8 milestone Sep 12, 2018
@cdeil cdeil added this to To Do in DOCUMENTATION via automation Sep 12, 2018
@cdeil
Contributor Author

cdeil commented Sep 12, 2018

I guess the alternative is this:

data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1")

@Bultako - Have you decided on this already, whether to introduce an env var for data access or not?
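
For illustration, a minimal sketch of how a path such as `$GAMMAPY_DATA/hess-dl3-dr1` could be resolved with the Python standard library. The helper name `resolve_data_path` is hypothetical and is not claimed to be how Gammapy does it; the directory `/tmp/gammapy-data` is only an example value.

```python
import os
from pathlib import Path


def resolve_data_path(path):
    """Expand environment variables such as $GAMMAPY_DATA in a path string.

    Hypothetical helper for illustration, not a Gammapy function.
    """
    return Path(os.path.expandvars(path))


# Example value only; each user would point this at their own data folder.
os.environ["GAMMAPY_DATA"] = "/tmp/gammapy-data"

resolved = resolve_data_path("$GAMMAPY_DATA/hess-dl3-dr1")
print(resolved)
```

With this approach the notebook code stays the same wherever the data lives; only the value of the environment variable changes.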

@Bultako
Member

Bultako commented Sep 12, 2018

@cdeil
Yes, you can continue with this.

I had decided to declare the datasets required for each notebook in the notebooks.yaml file, so I could download only the datasets needed for the tutorials with gammapy download tutorials.

I have seen env variables like $GAMMACAT inside the code of some classes in gammapy.catalogs. These functions will continue to fail if the user has not declared the corresponding env variable.
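
A rough sketch of the failure mode described here, assuming a small guard function that checks for the variable before any file access. The name `require_env_var` and the error message are hypothetical; the actual code in gammapy.catalogs may behave differently.

```python
import os


def require_env_var(name):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration, not part of Gammapy.
    """
    value = os.environ.get(name)
    if not value:
        raise OSError(
            f"Environment variable {name} is not set; "
            "it is needed to locate the data files."
        )
    return value


# Demonstrate the failure with a variable that is removed on purpose.
os.environ.pop("GAMMACAT", None)
try:
    require_env_var("GAMMACAT")
except OSError as exc:
    message = str(exc)
    print(message)
```

Failing early with the variable name in the message makes it easier for a user to see what has to be declared.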

@Bultako
Member

Bultako commented Sep 12, 2018

@cdeil

I have modified notebooks.yaml in gammapy/gammapy-extra@225fe01 so we can also declare the datasets used by each notebook.

You will see that the fermi_lat notebook needs the $GAMMAPY_FERMI_LAT_DATA env var to access datasets that I have not found in $GAMMAPY_EXTRA/datasets. My idea was to store all datasets needed by the notebooks in $GAMMAPY_EXTRA/datasets. I think this is the only notebook using data not present in that folder.

@cdeil
Contributor Author

cdeil commented Sep 12, 2018

It's good to have a record which tutorials use which data.

I think we could just record "cta-1dc" instead of "cta-1dc/caldb/data/cta/1dc/bcf/South_z20_50h/irf_file.fits", i.e. work under the assumption that example datasets are fetched as a whole, not individual files from there.
A listing of files within a given dataset would anyways exist in a separate index file, no?
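
As a sketch of the idea of recording whole datasets rather than individual files, the snippet below reduces declared entries to their top-level dataset folder. The function `dataset_name` and the list `declared_entries` are hypothetical and do not reflect the actual layout of notebooks.yaml.

```python
from pathlib import PurePosixPath


def dataset_name(entry):
    """Return the top-level dataset folder of a relative path."""
    return PurePosixPath(entry).parts[0]


# Hypothetical declarations for illustration: one full file path, one dataset name.
declared_entries = [
    "cta-1dc/caldb/data/cta/1dc/bcf/South_z20_50h/irf_file.fits",
    "hess-dl3-dr1",
]

# Collapse the entries to the unique datasets that would be fetched as a whole.
datasets = sorted({dataset_name(entry) for entry in declared_entries})
print(datasets)
```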

@cdeil
Contributor Author

cdeil commented Sep 12, 2018

Coming back to the question above of what to put in the notebooks for now, the options are:

data_store = DataStore.from_dir("../datasets/hess-dl3-dr1")
data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1")

Actually, talking about this with @adonath at lunch, we realised that ../datasets/hess-dl3-dr1 has a problem and $GAMMAPY_DATA/hess-dl3-dr1 might be better.

If we maintain the notebooks in the Gammapy code repo, then probably devs want to execute and work on them there in-place, no? But then with ../datasets/hess-dl3-dr1 we would land either somewhere else inside the Gammapy code repo or even outside it, and having to put the data exactly there is inconvenient, no? I guess we could also copy notebooks back and forth during development, but maybe the env var solution is simpler overall?
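
The problem with the relative path can be sketched as follows: the same string points to different places depending on the folder the notebook is run from. The folder names under `/home/user` are hypothetical and only serve as an example.

```python
import posixpath


def resolve_relative(working_dir, relative_path):
    """Resolve a relative path against a working directory, purely lexically."""
    return posixpath.normpath(posixpath.join(working_dir, relative_path))


relative = "../datasets/hess-dl3-dr1"

# Hypothetical notebook locations, for illustration only.
in_extra = resolve_relative("/home/user/gammapy-extra/notebooks", relative)
in_code_repo = resolve_relative("/home/user/gammapy/tutorials", relative)

print(in_extra)
print(in_code_repo)
```

In the second case the data would have to sit inside the code repository for the notebook to run in place, which is the inconvenience mentioned above.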

@Bultako
Member

Bultako commented Sep 12, 2018

I would prefer to keep the tutorials in the Gammapy repo and the development notebooks in the Gammapy-extra repo.

The tutorials should be seen as part of the docs (like the RST files), more or less fixed, with the main changes occurring when a new stable release is published. If we want to modify the notebooks for the tutorials/docs, we run gammapy download tutorials and work in the local folder it creates, then copy the notebooks to the local Gammapy git repo after stripping the output and avoiding access to datasets via env vars.

By contrast, development notebooks may change a lot, as has been the case up to now.

@Bultako
Member

Bultako commented Sep 12, 2018

> I think we could just record "cta-1dc" instead of "cta-1dc/caldb/data/cta/1dc/bcf/South_z20_50h/irf_file.fits"

Ok.
At the moment, since we are still working with the GitHub API, please add a trailing slash / when declaring folders.

@cdeil
Contributor Author

cdeil commented Sep 21, 2018

This is resolved. We are using GAMMAPY_DATA.
http://docs.gammapy.org/dev/getting-started.html#download-tutorials

Closing issue.

@cdeil cdeil closed this as completed Sep 21, 2018
DOCUMENTATION automation moved this from To Do to Done Sep 21, 2018