Proposal event missing for some dossiers #49
Comments
interesting find! will investigate the reason and hopefully fix this asap!
hah, yeah, those are old dossiers, probably scraped with an old version of the scraper that did not capture this data, or the data was not available at the time of scraping. normally we only scrape new docs, and only on special occasions do we scrape everything. to fix this i need to run a full re-scrape. if this is not urgent, ping me during the summer break of the EP and i'll trigger a full re-scrape of everything.
That's good to know -- I will get back to you in the summer break! Re. scraping new cases only, I assume this includes pending dossiers in addition to new dossiers?
only new and unclosed dossiers, yes.
@stef Would now be a good time to re-scrape? :)
done. if you confirm, please close this issue.
Thank you! I did a quick scan through the 2023-08-07
The problem seems to persist in recent dumps (with increasing tendency). I've looked into 2014/0285 as an example and it seems like the events get changed (
if it is added/deleted repeatedly, then that means this is a problem with the webserver(s) of the european parliament, possibly some load-balancing synchronization problem. we also have the problem that some server is misconfigured and returns french language output instead of english language output. if we can find out how to reproduce this, and it indeed points at the european parliament, we could raise this issue with them...
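One way to check whether the event really is added and deleted repeatedly would be to compare the same dossier across several dumps or repeated scrapes. The following is only a minimal sketch: `event_presence_changes` is a hypothetical helper, not part of the scraper, and it assumes you have already collected, per snapshot, the set of event type strings for one dossier.

```python
def event_presence_changes(snapshots, event_type="Legislative proposal published"):
    """Report when an event type appears or disappears between snapshots.

    `snapshots` is an iterable of (label, event_types) pairs in chronological
    order, where `label` is e.g. a dump date and `event_types` is the
    collection of event type strings seen for one dossier in that snapshot.
    """
    changes = []
    previous = None  # presence in the previous snapshot; None before the first
    for label, event_types in snapshots:
        present = event_type in event_types
        if previous is not None and present != previous:
            changes.append((label, "added" if present else "removed"))
        previous = present
    return changes
```

More than one transition for the same dossier would suggest flapping on the server side rather than a one-off gap from an old scrape.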
The 'Legislative proposal published' event seems to be missing from the dumps for some dossiers.
I came across two cases so far:
https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?reference=2014/0285(COD)&l=en
https://oeil.secure.europarl.europa.eu/oeil//popups/ficheprocedure.do?reference=2003/0262(COD)&l=en
When scraping these pages manually using `scrapers.dossier.scrape()`, all events seem to be included in the resulting dict but not in the `ep_dossiers.json` data dump (2022-05-22).
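To find other affected dossiers, a scan over the dump along these lines might help. This is a rough sketch under assumptions that are not confirmed in this thread: that the decompressed dump is a single JSON array of dossier dicts, that each dossier keeps its events in an `events` list whose items carry a `type` string, and that the reference sits under `procedure.reference`. The helper names are made up for illustration.

```python
import json

PROPOSAL_EVENT = "Legislative proposal published"


def dossiers_missing_event(dossiers, event_type=PROPOSAL_EVENT):
    """Return the references of dossiers that have no event of `event_type`."""
    missing = []
    for dossier in dossiers:
        # Assumed layout: "events" is a list of dicts with a "type" key.
        events = dossier.get("events") or []
        types = {event.get("type") for event in events}
        if event_type not in types:
            # Assumed layout: the reference is stored under procedure.reference.
            procedure = dossier.get("procedure") or {}
            missing.append(procedure.get("reference"))
    return missing


def load_dump(path):
    """Load an already decompressed dump, assumed to be one JSON array."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Usage would be something like `dossiers_missing_event(load_dump("ep_dossiers.json"))`, after decompressing the dump locally. The resulting list can then be compared against a manual scrape of the same references.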