the problem

The current implementation of icepyx requires us to access different regions independently, log in to Earthdata, and then download an individual h5 file for each request. Even though it is possible to make all of these requests in a for loop (see https://github.com/ICESAT-2HackWeek/data-access/blob/master/ICESat-2Hackweek_tutorial_locations.ipynb), this approach has two difficulties: (i) we have to log in to Earthdata for each individual request, and (ii) each request is downloaded as its own h5 file.
a sketch of a solution
We (@fperez, @lheagy, @espg, @tsnow03, @mrsiegfried, @alicecima, @jonathan-taylor) think that (i) can be partially bypassed by storing our credentials, but even then we would have to make multiple calls to NSIDC, which is time-consuming and does not solve (ii). The bottleneck therefore appears to be at the NSIDC API level (@asteiker may have some ideas here?), not just in the icepyx code.
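As a side note on storing credentials: one common option for Earthdata is a `.netrc` file with a `urs.earthdata.nasa.gov` entry, which Python can read with the standard-library `netrc` module. The sketch below is only an illustration (the sample entry is written to a temporary file for the demo; the username and password are made up):

```python
import netrc
import os
import tempfile

# Illustrative .netrc entry for Earthdata login (credentials are fake).
# In practice this line would live in ~/.netrc with restricted permissions.
sample = "machine urs.earthdata.nasa.gov login jdoe password s3cret\n"

# Write the sample entry to a temp file so the demo is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(sample)
    path = f.name

# Parse the file and look up the Earthdata host.
auth = netrc.netrc(path)
login, _, password = auth.authenticators("urs.earthdata.nasa.gov")
print(login)  # jdoe

os.remove(path)
```

With credentials stored this way, a batching layer could authenticate once at the start of a session instead of prompting on every request.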
A different workflow could be something like this:
```python
import icepyx

icepyx.login(email, password)

request_list = []
for lat, lon, date in regions:
    request_list.append(
        icepyx.request(polygon(lat, lon, date))
    )

# this should do some smart parsing - figuring out
# which files have common data
data = icepyx.request(request_list)
# first loop: metadata query
# loop over and figure out which h5 files are needed
# then only request needed files
```
Here, we download the required h5 files with a single call to NSIDC, implemented efficiently so that different regions can be stored in the same h5 file. This would be an important contribution for the case where we want to look at ATL03 data in many different localized regions within a large area, without having to retrieve the full dataset for that area.
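The "smart parsing" step above could be sketched as a granule-deduplication pass: assuming each metadata query returns the list of granule filenames covering a region, we can invert that mapping so each unique h5 file is requested from NSIDC only once and then sliced for every region it serves. The `plan_downloads` helper and the granule names below are hypothetical, not part of icepyx:

```python
def plan_downloads(per_region_granules):
    """Invert {region: [granules]} into {granule: [regions]} so each
    unique granule is downloaded once, then served to all its regions."""
    needed = {}
    for region, granules in per_region_granules.items():
        for granule in granules:
            needed.setdefault(granule, []).append(region)
    return needed

# Hypothetical metadata-query results for two regions; note the shared file.
queries = {
    "jakobshavn": ["ATL03_A.h5", "ATL03_B.h5"],
    "helheim": ["ATL03_B.h5", "ATL03_C.h5"],
}

plan = plan_downloads(queries)
print(sorted(plan))        # ['ATL03_A.h5', 'ATL03_B.h5', 'ATL03_C.h5']
print(plan["ATL03_B.h5"])  # ['jakobshavn', 'helheim']
```

Three downloads instead of four in this toy case; the savings grow quickly when many localized regions fall along the same ground tracks.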
I am aware that there are many challenges in solving this problem, but it could be a great contribution to icepyx and I am happy to help on this front.