Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ps_create_datetime, ps_create_date? #2

Open
sebdalgarno opened this issue Jan 18, 2018 · 17 comments
Open

ps_create_datetime, ps_create_date? #2

sebdalgarno opened this issue Jan 18, 2018 · 17 comments
Assignees

Comments

@sebdalgarno
Copy link
Member

opposite of ps_separate_date/datetime

Takes a vector of column names to create POSIXct or Date, assign tz, convert tz, automatically remove columns used to create datetime/date
something like:

ps_create_datetime <- function(data, col = "DateTime", from = c("Year", "Month", "Day", 
                                                                "Hour", "Min", "Second"), 
                               remove = TRUE, tz = tz_data, tz_convert = tz_analysis) {
  
  x[col] <- with_tz(lubridate::ymd_hms(paste(paste(x[[from[1]]], x[[from[2]]], x[[from[3]]], sep = '-'),
                                        paste(x[[from[4]]], x[[from[5]]], x[[from[6]]], sep = ":")),
                                  tz = tz),
               tz = tz_convert)
  
  if(remove) x <- x[, setdiff(colnames(x), from)]
  
  x
}
@joethorley
Copy link
Member

I like my only suggestions are that it should be function(data, year = "Year", month = "Month", day = "Day", hour = "Hour", minute = "Minute", second = "Second", col = "DateTime", remove = TRUE, tz = getOption(poisix.tz, "Etc/GMT+8")) because

  1. folks may only want to change the column name of 1 of the columns
  2. the conversion can occur afterwards
  3. the default time should be a global option

An interesting idea is if for example Second = 0L then the seconds are automatically set to 0 (useful if column missing)

@sebdalgarno
Copy link
Member Author

sebdalgarno commented Jan 19, 2018

I wonder if there should be two functions for datetime.

  1. takes 6 columns (year, month, day, hour, minute, second) - ps_ymdhms_datetime
  2. takes 2 columns (date, time) - ps_dt_datetime

Seems to me that these are the two most likely scenarios

@sebdalgarno
Copy link
Member Author

This opens a whole can of worms as datetimes can come in so many formats, but if we can capture the most common...

@joethorley
Copy link
Member

An elegant solutions seems to be to just have the one constructor function based on year, month, day, hour, minute, second etc plus a generic deconstruction function that takes a single data/time object and decomposes it into Year, Month, Day or whatever makes sense. The user can then break apart and put back together as they see fit. The only trick thing is what to do about the tz. We could add this to a column as well. Thoughts?

@sebdalgarno
Copy link
Member Author

I like the idea. I also like the construct verb better than create. Could the deconstruction function just deal with character instead of date/time?...that way tz only comes in once the construction happens.

@joethorley
Copy link
Member

I think the question of how to go from a character to a date time is different to the question of how to decompose a datetime (which is useful for combining) versus how to construct a date time from year, month, day etc. The character to date time functions in lubridate are well developed so may handle this problem?

@sebdalgarno
Copy link
Member Author

sebdalgarno commented Jan 19, 2018

yes the lubridate functions are very good and no need to reinvent the wheel. If there is a character Date and Time column it probably just makes most sense to do
lubridate::ymd_hms(paste(Date, Time))

I think constructing from year, month, day, hour, minute, second is still useful though.

perhaps there could be an option in ps_separate_datetime/date to add tz column, which might be useful if planning to deconstruct and reconstruct.

@joethorley
Copy link
Member

yes I think the tz argument which could be NULL by default indicating don't create when deconstructing is the best solution. also the deconstruction function should include year = "Year" etc to allow the user to name each of the name columns or in the case of year = NULL to not construct at all.

@joethorley
Copy link
Member

also it might be worth having the arguments suffix = "" and prefix = "" which as character scalars allow the user to quickly change all the column from the default of Year, Month, Day, etc to SiteYear, SiteMonth, SiteDay simply by setting suffix = "Site"

@sebdalgarno
Copy link
Member Author

yes great ideas. Do you want me to try to take this one on? The function seems to be almost done.

@joethorley
Copy link
Member

yes please

@sebdalgarno sebdalgarno self-assigned this Jan 19, 2018
@sebdalgarno
Copy link
Member Author

what is the best way to allow both a column name or an integer?:
e.g. second = "Second" or second = 0L
This?

if(check_string(second)) second -> data[[second]]
if(check_vector(second, 1L)) second -> second
# repeat for year, month, day, hour, minute and paste

@joethorley
Copy link
Member

checkor(check_string(second), check_count(second, coerce = TRUE)) checks that second is a string (character vector of length 1) or a count (non-negative integer of length 1) or a numeric of length 1 that can be coerced to a count

@joethorley
Copy link
Member

check_colnames(data, second) checks that data has a column with the same name as the value of second and check_vector(second, length = 1L, values = c(0L,59L,NA)) checks that second is a integer of length 1 with values between 0 and 59 and possibly missing values

@sebdalgarno
Copy link
Member Author

OK that's great, but then what is best way to take a combination of integer and column names and paste into a format that lubridate::ymd_hms can read?

@joethorley
Copy link
Member

if second = 0L for example create a new column in the data frame called say ..second (after checking that ..second is not already an existing column) and then set second <- "..second" and carry on as if the user passed second as a column name

@joethorley
Copy link
Member

note check_missing_colnames(data, "..second") may be useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants