retrieve databundle depends on build cutout settings #853

Open
2 tasks done
martacki opened this issue Sep 5, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@martacki
Collaborator

martacki commented Sep 5, 2023

Checklist

  • I am using the current main branch or the latest release. Please indicate.
  • I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

When rule retrieve_databundle_light is executed, while build_cutout is set to False, it tries to download the file cutouts/cutout-2013-era5 which eventually fails.
I'm not sure if this is intentional, but it is very annoying and hard to spot. Build_cutout at this stage is not even executed, and the cutout is not needed.

Maybe I'm misinterpreting some intentional behavior here, but I'm sure there is a bug somewhere because retrieve_databundle_light should execute regardless of the build_cutout settings, in my opinion.

Error Message

MissingOutputException in rule retrieve_databundle_light in file */pypsa-earth/Snakefile, line 147:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
cutouts/cutout-2013-era5.nc
@martacki martacki added the bug Something isn't working label Sep 5, 2023
@martacki
Collaborator Author

martacki commented Sep 5, 2023

Not sure if this links to #812

@davide-f
Member

davide-f commented Sep 5, 2023

Hello! Thanks for posting!
Which country were you testing?
Could you share the complete log?
Sometimes, for regions outside Africa, Google Drive (the only source of those files) limits the number of downloads, which may cause this issue.

@Emre-Yorat89
Contributor

I think I have the same issue for Türkiye. The retrieve databundle rule fetches the sandbox links, but it does not even download the cutout bundles, which have Google Drive links; it fails directly with the error below. I can download the cutout bundles manually, so I believe the download limit is not the cause of the error. I am also not sure whether it is connected to the build cutout setting in the config file.
[Image: bundles_to_be_downloaded_]

@ekatef
Collaborator

ekatef commented Sep 11, 2023

Thanks a lot for reporting, @martacki and @Emre-Yorat89!

I can reproduce the issue for Türkiye (@Emre-Yorat89, thank you so much for the detailed analysis!). The problem is indeed linked to loading from Google Drive: gdd.download_file_from_google_drive() returns an empty zip file, which causes further trouble when the script tries to unzip it.
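
The empty-archive failure described above could be caught before unzipping. Below is a minimal sketch using only the Python standard library; the helper name `validate_zip` is hypothetical and not part of pypsa-earth or googledrivedownloader.

```python
import os
import zipfile


def validate_zip(path):
    """Raise a descriptive error if `path` is missing, empty, or not a zip archive.

    Hypothetical helper for illustration; returns `path` unchanged when it is valid.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Download did not produce a file: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"Downloaded file is empty: {path}")
    if not zipfile.is_zipfile(path):
        raise ValueError(f"Downloaded file is not a valid zip archive: {path}")
    return path
```

Calling such a check right after the download would turn the late MissingOutputException into an error that points at the download step itself.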

I'm not sure it is connected with a daily quota, since in that case we should get a 403 error, according to the Google documentation. Could it be that Google has changed the behaviour without updating the docs? 🤔

As for the effect of build_cutout: setting build_cutout: true bypasses loading the cutout, which is currently the only data type loaded from Google Drive instead of Zenodo.
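
To illustrate the dependency described above, here is a minimal sketch. It is not the actual pypsa-earth code: the function `select_bundles`, the `category` field, and the `bundle_data` entry are hypothetical, and only the `build_cutout` flag and the bundle name `bundle_cutouts_asia` come from this thread.

```python
def select_bundles(bundles, build_cutout):
    """Return the names of the bundles that would be downloaded.

    Hypothetical logic: when the cutout is built locally (build_cutout=True),
    bundles in the "cutouts" category are skipped, so nothing is fetched
    from Google Drive.
    """
    return [
        name
        for name, meta in bundles.items()
        if not (build_cutout and meta.get("category") == "cutouts")
    ]


# Illustrative configuration, not copied from configs/bundle_config.yaml
bundles = {
    "bundle_data": {"category": "data"},
    "bundle_cutouts_asia": {"category": "cutouts"},
}
```

Under these assumptions, the Google Drive download is attempted only when build_cutout is false, which matches the behaviour reported in this issue.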

As a temporary fix, the cutout can be loaded manually using the URLs specified in configs/bundle_config.yaml.

@Emre-Yorat89
Contributor

Hello,
I have run a couple of simple experiments with the googledrivedownloader package using the code below. On the first try, the downloaded file was a corrupt zip file. Changing the sharing option from "Restricted" to "Anyone with the link" on Google Drive solved the issue. Hopefully this is also the case for our problem.
[Image: gdd]

@ekatef
Collaborator

ekatef commented Sep 11, 2023

> Hello, I have made a couple of simple experiments with the googledrivedownloader package with the below code. When I first tried it the downloaded was a corrupt zip file. After changing the sharing option from "Restricted" to "Anyone with the link" on google drive solved the issue. Hopefully this is also the case for our problem.

Thanks for testing, @Emre-Yorat89! I have checked the "General access" options for bundle_cutouts_northamerica and bundle_cutouts_asia, and sharing by link appears to be on: "Anyone with the link" corresponds to Viewer rights, which should also allow downloading the file... Still, I feel that your idea points in the right direction.

@ekatef
Collaborator

ekatef commented Sep 13, 2023

Update after some additional testing: the cause of the trouble does in fact seem to be the number of downloads. While an initial request to Google Drive returns status 200 (everything is fine), an authorised request

https://github.com/ndrplz/google-drive-downloader/blob/be1aba9e2e43b2375475f19d8214ca50a8621bd6/google_drive_downloader/google_drive_downloader.py#L58-L61

returns 429, which means precisely "Too Many Requests".

For the time being, a quick fix is to download the cutout file manually via the links provided in /configs/bundle_config.yaml.

It would probably be nice to check the server's response status and emit a meaningful warning or error.
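
A status check of this kind might look like the sketch below. The function name `describe_download_status` and the message wording are hypothetical; only the status codes 200, 403 and 429 and the path configs/bundle_config.yaml come from this thread. The sketch takes the numeric status code as input, so it does not depend on how the request itself is made.

```python
from http import HTTPStatus


def describe_download_status(status_code, url):
    """Return None if the response is fine, else a human-readable explanation.

    Hypothetical helper for illustration; it only interprets a status code.
    """
    if status_code == HTTPStatus.OK:
        return None
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:
        return (
            f"Download from {url} was rejected with HTTP 429 (Too Many Requests). "
            "Try again later or download the file manually using the URLs "
            "in configs/bundle_config.yaml."
        )
    if status_code == HTTPStatus.FORBIDDEN:
        return (
            f"Download from {url} was rejected with HTTP 403 (Forbidden). "
            "Check the sharing settings of the file or a possible quota limit."
        )
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Code is not a registered HTTP status
        phrase = "Unknown status"
    return f"Download from {url} failed with HTTP {status_code} ({phrase})."
```

The returned message could be logged as a warning or raised as an error before any attempt to unzip the downloaded file.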

@ekatef ekatef mentioned this issue Sep 13, 2023
8 tasks
@ekatef
Collaborator

ekatef commented Jan 6, 2024

Hello @martacki! Thank you for reporting this issue. It has been investigated in more detail in #866 and fixed by #911, so data retrieval should work properly now. Do you have any additional comments, or can we count this issue as completed? 🙂
