Minor issue with get_stats19 and multiple years #168

Closed
agila5 opened this issue Jun 4, 2020 · 3 comments · Fixed by #169

@agila5
Collaborator

agila5 commented Jun 4, 2020

Hi! I don't know all the internals of stats19 or the details of STATS19 data, but I wanted to point out a minor issue with get_stats19 and multiple years.

It seems that if the selected year is between 2005 and 2014, the function downloads all car crash data from 2005 to 2014:

> library(stats19)
Data provided under OGL v3.0. Cite the source and link to:
www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
> get_stats19(2005)
No files of that type found for that year.
Files identified: Stats19_Data_2005-2014.zip

http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip
Attempt downloading from: 
trying URL 'http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip'
Content type 'application/x-zip-compressed' length 108358586 bytes (103.3 MB)

The problem is that if I ask for all car crashes between 2005 and 2010 (i.e. get_stats19(2005:2010)), the same data are read six times. This is not a serious problem and the workaround is easy (just ask for 2005), but it may be worth emitting a warning in these cases and changing the input years to only 2005. What do you think?

@Robinlovelace
Member

That is an issue. It's because there is a single file for all those years. Could a solution be to pre-check the years and if there are multiple years within that range remove all but one of them (the most recent)?
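
A rough sketch of such a pre-check is shown here, for illustration only. The helper name `collapse_bundled_years` and its `bundle` argument are hypothetical and not part of the stats19 package; the sketch assumes that every year from 2005 to 2014 is served by the single Stats19_Data_2005-2014.zip archive, as the console output in this thread suggests.

```r
# Hypothetical helper, NOT part of stats19: collapse requested years that
# all live in the same multi-year archive down to a single year.
# Assumption: 2005 to 2014 are bundled in one file (Stats19_Data_2005-2014.zip).
collapse_bundled_years <- function(years, bundle = 2005:2014) {
  years <- sort(unique(years))
  in_bundle <- years[years %in% bundle]
  if (length(in_bundle) > 1) {
    # Keep only the most recent bundled year, as suggested above
    keep <- max(in_bundle)
    warning(
      "Years ", paste(in_bundle, collapse = ", "),
      " come from the same file; requesting only ", keep, ".",
      call. = FALSE
    )
    years <- sort(c(years[!years %in% bundle], keep))
  }
  years
}

# Six requested years inside the bundle collapse to one (with a warning)
years_a <- suppressWarnings(collapse_bundled_years(2005:2010))

# Years outside the bundle are left untouched
years_b <- suppressWarnings(collapse_bundled_years(c(2008, 2012, 2016, 2017)))
```

The result could then be passed on as the year argument, so the shared archive is downloaded and read once. Whether the returned rows should afterwards be filtered back to the years originally requested is a separate question that this sketch does not address.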

@agila5
Collaborator Author

agila5 commented Jun 4, 2020

> Could a solution be to pre-check the years and if there are multiple years within that range remove all but one of them (the most recent)?

IMO yes. I can work on a PR in the next few days.

@Robinlovelace
Member

If you're willing and able that would be amazing. Thanks for reporting and (if you find time and motivation) potentially fixing this pesky edge case issue!
