download_hemicycle.pl fails or misses files (not confirmed) #131

Closed
paulineleon opened this issue Feb 27, 2020 · 8 comments

Comments

@paulineleon
Contributor

I did not investigate, and it may not be an issue with the script itself, but I figured it was better to write this down for when (if) it happens again.

  • Two files were not downloaded although they exist: http:__www.assemblee-nationale.fr_15_cri_2019-2020_20200051.asp and http:__www.assemblee-nationale.fr_15_cri_2018-2019_20190274.asp (maybe they are not listed on the page)
  • One file, http:__www.assemblee-nationale.fr_15_cri_2019-2020_20200129.asp, was downloaded but contained binary data instead of the expected HTML (maybe the server misbehaved)
@RouxRC
Member

RouxRC commented Mar 1, 2020

That's weird. Maybe the script should run more tests to ensure the content is actual HTML, and retry the download a couple of times otherwise.
For the first issue I'm not sure I get the problem: you got the files but they were empty? The code checks that files have a non-zero size, so I would not expect that to happen, but maybe the files did weigh a little something?
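
The check-and-retry idea suggested above could look roughly like the sketch below. It is written in Python for illustration and is not taken from download_hemicycle.pl; the names `looks_like_html` and `fetch_html`, the 1024-byte window, and the injected `fetch` callable are all assumptions made for this example.

```python
import time
from typing import Callable, Optional


def looks_like_html(data: bytes) -> bool:
    """Heuristic check: non-empty, no NUL bytes, and an HTML marker near the start."""
    head = data[:1024]
    if not head or b"\x00" in head:
        return False
    head = head.lstrip().lower()
    return head.startswith(b"<!doctype html") or b"<html" in head


def fetch_html(
    url: str,
    fetch: Callable[[str], bytes],  # hypothetical downloader, injected by the caller
    attempts: int = 3,
    delay: float = 1.0,
) -> bytes:
    """Call fetch(url) until the body looks like HTML, at most `attempts` times."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
        except OSError as exc:  # network errors such as urllib's URLError are OSErrors
            last_error = exc
        else:
            if looks_like_html(data):
                return data
            last_error = ValueError(f"{url}: body does not look like HTML")
        if attempt < attempts:
            time.sleep(delay)  # short pause before the next try
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

With the standard library, the downloader could be passed as `fetch=lambda u: urllib.request.urlopen(u, timeout=30).read()`. The heuristic is deliberately loose: it only rejects empty or obviously binary bodies and does not validate the markup.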

@paulineleon
Contributor Author

For the first issue I'm not sure I get the problem: you got the files but they were empty?

The files were missing. Maybe they are also missing from the page, I did not check. If you don't mind I propose we leave this open for a month or two and close it if nobody ran into this again by then. What do you think?

@RouxRC
Member

RouxRC commented Mar 1, 2020

As you like, although I don't think many other people will use it by then ;)

@paulineleon
Contributor Author

I don't think many other people will use it by then ;)

No human indeed! But a gentle bot will add to a repository daily. And another bot runs consistency checks like this one and will notice if something is missing or corrupt.

@RouxRC
Member

RouxRC commented Mar 1, 2020

sounds good then!

@RouxRC
Member

RouxRC commented Mar 1, 2020

Ha, and since you version the compte-rendus, you might want to use the script differently: AN regularly updates these CR, so versioning them could be interesting (for instance you could delete the last 50 before running the script every time). We don't have the manpower to handle these updates, so we would rather wait a few hours/days before loading them and only reload if something critical is pointed out to us here.
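
A minimal sketch of the "delete the last 50 before each run" idea, in Python for illustration. It assumes the downloaded files sit in one directory and that sorting their names puts the most recent compte-rendus last; `delete_latest`, the `*.asp` pattern, and the directory layout are hypothetical and do not come from the script.

```python
from pathlib import Path
from typing import List


def delete_latest(directory: Path, count: int = 50, pattern: str = "*.asp") -> List[Path]:
    """Delete the `count` files whose names sort last, and return their paths.

    Assumption: file names sort chronologically, so the last names are the
    most recently published files and the ones most likely to be updated.
    """
    files = sorted(p for p in directory.glob(pattern) if p.is_file())
    latest = files[-count:] if count > 0 else []
    for path in latest:
        path.unlink()  # the next download run is expected to fetch these again
    return latest
```

Running this right before the download script would force a fresh copy of the newest files, so a versioned repository would record any changes made since the previous run.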

@paulineleon paulineleon changed the title download_hemicycle.pl fails or misses files download_hemicycle.pl fails or misses files (to be confirmed) Mar 1, 2020
@paulineleon paulineleon changed the title download_hemicycle.pl fails or misses files (to be confirmed) download_hemicycle.pl fails or misses files (not confirmed) Mar 1, 2020
@paulineleon
Contributor Author

Thanks for the hint 💯. An issue was created so as not to forget about it.

@RouxRC
Member

RouxRC commented May 9, 2022

obsolete since AN's redesign

@RouxRC RouxRC closed this as completed May 9, 2022