London dates not parsed correctly in bike_daily_trips() #85

Closed
ghost opened this issue Aug 9, 2018 · 4 comments

ghost commented Aug 9, 2018

It seems that the London files (CSV or Excel) store dates as Day / Month / Year, rather than Month / Day / Year as seen in other cities. bike_daily_trips() appears to expect the latter, which results in incorrectly parsed dates as well as a huge number of NAs. I haven't checked other functions, but I expect the issue to be present throughout.
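
As a minimal sketch of the failure mode described above (base R only, with made-up timestamps rather than values taken from the actual London files): reading a Day / Month / Year string with a Month / Day / Year format yields NA whenever the day exceeds 12, and silently swaps day and month otherwise.

```r
# Two illustrative timestamps in Day/Month/Year layout (hypothetical values)
london <- c("31/12/2016 23:59", "02/01/2017 08:15")

# Parsed with the wrong (Month/Day/Year) format:
# the first value has no valid month and becomes NA,
# the second is silently read as 1 February instead of 2 January
as_mdy <- as.POSIXct(london, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Parsed with the matching (Day/Month/Year) format
as_dmy <- as.POSIXct(london, format = "%d/%m/%Y %H:%M", tz = "UTC")

print(as_mdy)
print(as_dmy)
```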

In case it helps, after downloading the files I needed, I ran the following to read in the CSV files, change their date format, and save them back to disk:

library(tidyverse)

# Directory holding the downloaded London trip files
dir_lo <- file.path(getwd(), "bikedata/London")
files_lo <- list.files(path = dir_lo, pattern = "\\.csv$")

# Read each file and rewrite the Day/Month/Year timestamps as Month/Day/Year
all_lo_data <- files_lo %>% 
	map(function(x) {
		read_csv(file.path(dir_lo, x)) %>% 
			mutate_at(vars(`Start Date`, `End Date`),
				~ format(lubridate::dmy_hm(.), "%m/%d/%Y %H:%M:%S"))
	})

# Overwrite the original files with the reformatted data
walk2(all_lo_data, files_lo, ~ write_csv(.x, file.path(dir_lo, .y)))

mpadge commented Aug 10, 2018

Thanks for that - the whole package recently got restructured to do more intelligent auto-parsing of dates, but I'm now reliant on people directly finding these kinds of inconsistencies. It should be relatively straightforward to fix, so I'll get on to it asap.

mpadge commented Aug 15, 2018

Actually seems okay. Can you make sure you have the latest version:

packageVersion("bikedata")
# [1] ‘0.2.0.100’

All dates for all systems should be DD/MM/YY or DD/MM/YYYY, or else other idiosyncratic forms, but there are no systems which use MM/DD/YY(YY). Feel free to re-open if the error recurs, but if so, could you please indicate which date ranges cause the errors (because London has thousands of files by now).

@mpadge mpadge closed this as completed Aug 15, 2018
@mpadge mpadge reopened this Aug 15, 2018

mpadge commented Aug 15, 2018

Oh sorry, I just noticed that you got the error from bike_daily_trips(), which I can indeed reproduce. The pattern changes (in SQL date form) from 2016-31-12 (YYYY-DD-MM) to 2017-01-01 (YYYY-MM-DD).
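
A rough way to spot this kind of mixed pattern, sketched here in base R on the two example values above (the `stored` vector is hypothetical and this is not how the package itself checks dates): values that fail to parse as YYYY-MM-DD can be flagged and re-read as YYYY-DD-MM. Note that this only catches entries whose day is greater than 12; swapped values with a day of 12 or less still parse, just to the wrong date.

```r
# The two date strings quoted above, as they might appear in the database
stored <- c("2016-31-12", "2017-01-01")

# Try the expected YYYY-MM-DD layout; impossible months give NA
as_ymd <- as.Date(stored, format = "%Y-%m-%d")
bad <- is.na(as_ymd)

# Re-read only the failures as YYYY-DD-MM
as_ydm <- as.Date(stored[bad], format = "%Y-%d-%m")

print(stored[bad])
print(as_ydm)
```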

@mpadge mpadge closed this as completed in 9cfc8bd Aug 15, 2018

mpadge commented Aug 15, 2018

Sorry, misunderstood the problem myself there for a while. You were right: the dates for London weren't being processed properly at all. That PR should now fix it, and bike_daily_trips() should be free of NAs. Thanks for digging this up and helping fix this important 🐛 !
