
Automatic retrieval of multiple regions #100

Open
facusapienza21 opened this issue Jul 9, 2020 · 0 comments

facusapienza21 commented Jul 9, 2020

The problem

The current implementation of icepyx requires accessing each region independently: logging into Earthdata and then downloading individual h5 files for every request. Even though a for loop can issue all of these requests (see https://github.com/ICESAT-2HackWeek/data-access/blob/master/ICESat-2Hackweek_tutorial_locations.ipynb), this approach has two difficulties: (i) logging into Earthdata for each individual request, and (ii) downloading each request into separate h5 files.
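For concreteness, the loop-based workaround looks roughly like the sketch below. The icepyx calls are commented out because they need network access and Earthdata credentials, and their exact names are assumptions, not the confirmed icepyx API; only the request bookkeeping runs as written:

```python
# Illustrative sketch of the current one-region-at-a-time workflow.
# Region bounding boxes and date ranges below are made-up examples.

regions = [
    ((-70.0, -55.0, -69.0, -54.0), ["2019-02-20", "2019-02-28"]),
    ((-69.5, -49.0, -68.5, -48.0), ["2019-02-20", "2019-02-28"]),
]

requests = []
for bbox, date_range in regions:
    # region = icepyx.Query("ATL03", list(bbox), date_range)  # assumed API
    # region.earthdata_login(uid, email)   # (i) login repeated per request
    # region.download_granules("./data")   # (ii) separate h5 files per request
    requests.append({"spatial_extent": bbox, "date_range": date_range})

print(len(requests))  # one independent NSIDC request per region
```

Each pass through the loop is a fully independent NSIDC transaction, which is exactly the overhead this issue proposes to remove.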

A sketch of a solution

We (@fperez, @lheagy, @espg, @tsnow03, @mrsiegfried, @alicecima, @jonathan-taylor) think there is a way to partially bypass (i) by storing our credentials, but even so we still have to make multiple calls to NSIDC, which is time-consuming and does not solve (ii). The bottleneck therefore appears to be at the NSIDC API level (@asteiker may have some ideas here?), not just in the icepyx code.
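On the credential-storage side, NASA's Earthdata login already honors a `.netrc` file, so a single stored entry can remove the interactive login from every request. A minimal sketch using Python's standard-library `netrc` parser; the machine name `urs.earthdata.nasa.gov` is the real Earthdata host, but the credentials and temporary file path here are placeholders (in practice the entry lives in `~/.netrc` with `0600` permissions):

```python
import netrc
import os
import tempfile

# Write a .netrc-style entry to a temp file for illustration only.
content = "machine urs.earthdata.nasa.gov login jdoe password hunter2\n"
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(content)
    path = f.name

# Any tool that reads .netrc can now authenticate without prompting.
login, _, password = netrc.netrc(path).authenticators("urs.earthdata.nasa.gov")
print(login)  # jdoe
os.unlink(path)
```

This only addresses (i); the repeated NSIDC calls and per-request h5 files remain.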

A different workflow could be something like this:

import icepyx

icepyx.login(email, password)  # authenticate once per session

request_list = []
for lat, lon, date in regions:
    request_list.append(
        icepyx.request(polygon(lat, lon), date)
    )

# This should do some smart parsing: figure out
# which files have common data.
data = icepyx.request(request_list)
# First pass: metadata query --
#   loop over the requests and figure out which h5 files are needed.
# Then request only the needed files.

Here, we download the required h5 files with a single call to NSIDC, implemented efficiently enough that different regions can be stored in the same h5 file. This would be an important contribution for the case where we want to look at ATL03 data in many localized regions scattered across a large area, without having to retrieve the full dataset for that area.

I am aware that there are many challenges in solving this problem, but it could be a great contribution to icepyx, and I am happy to help on this front.

@JessicaS11 added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Jul 9, 2020