# Dates

R usually treats dates as character strings unless they are explicitly converted to a date type.

In [1]:
library(tidyverse)
library(lubridate)

reg_data <- read_csv("https://github.com/CALDISS-AAU/workshop_r-table-data/raw/master/data/bef_dream_2015.csv")

Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.1.1       v purrr   0.3.2  
v tibble  2.1.1       v dplyr   0.8.0.1
v tidyr   0.8.3       v stringr 1.4.0  
v readr   1.3.1       v forcats 0.4.0  
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date

Parsed with column specification:
cols(
  .default = col_double(),
  PNR = col_character(),
  FOED_DAG = col_character()
)
See spec(...) for full column specifications.


In [3]:
head(reg_data)

PNR,KOEN,FOED_DAG,br_2010_01,br_2010_02,br_2010_03,br_2010_04,br_2010_05,br_2010_06,br_2010_07,...,br_2015_03,br_2015_04,br_2015_05,br_2015_06,br_2015_07,br_2015_08,br_2015_09,br_2015_10,br_2015_11,br_2015_12
5532,2,23may1942,,,,,,,,...,,,,,,,,,,
5562,2,28jun1971,,,,,,,,...,,,,,,,,,,
7589,1,21jan1955,110200.0,852010.0,851000.0,741010.0,422200.0,862100.0,889920.0,...,463500.0,910200.0,429900.0,581200.0,62000.0,,464100.0,390000.0,855300.0,771100.0
9287,1,29aug1968,,,,,,,,...,,,,,,,,,,
14523,1,08nov1957,881030.0,,853110.0,869020.0,852010.0,871020.0,873010.0,...,522300.0,431100.0,873020.0,856000.0,,931100.0,471130.0,856000.0,881020.0,461710.0
17543,1,24jun1952,869010.0,105100.0,889140.0,,,910110.0,464700.0,...,881010.0,854200.0,581410.0,522990.0,471120.0,421300.0,841100.0,,851000.0,855200.0


The data above is simulated [DREAM data](https://www.dst.dk/da/TilSalg/Forskningsservice/Data/Andre_Styrelser.aspx) on individuals' employment status.

The variable `FOED_DAG` contains the date of birth, but R currently treats it as a character string:

In [5]:
class(reg_data$FOED_DAG)

In [7]:
reg_data$FOED_DAG[1] > reg_data$FOED_DAG[2]
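
Comparing character strings is alphabetical, character by character, so the result need not match the chronological order. A minimal sketch with two made-up strings in the same format as `FOED_DAG` (the values are hypothetical, and parsing the month abbreviations assumes an English locale):

```r
library(lubridate)

# Two hypothetical birth dates written like the values in FOED_DAG
earlier <- "23may1999"
later   <- "05jan2001"

# As strings: the first characters "2" and "0" decide the comparison
earlier > later              # TRUE, although 1999 comes before 2001

# As dates: the comparison is chronological
dmy(earlier) > dmy(later)    # FALSE
```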

## Handling datetime with `lubridate`

There are base functions for handling dates and times in R, but the package [`lubridate`](https://lubridate.tidyverse.org/) from the tidyverse makes dealing with them a lot simpler.

The main functions for converting are `ymd` (short for "year-month-day") and `ymd_hms` (short for "year-month-day hours-minutes-seconds"). `lubridate` contains functions for a wide variety of date and time orderings, so one simply picks the function whose name matches the order in which the information is given:

In [11]:
test_date <- "1942-08-12"

print(test_date)
print(class(test_date))

[1] "1942-08-12"
[1] "character"


In [12]:
test_date <- ymd(test_date)

print(test_date)
print(class(test_date))

[1] "1942-08-12"
[1] "Date"


Notice that regardless of the original order, the converted date is displayed in the format "YYYY-MM-DD".

In [13]:
test_date2 <- "31-07-1965"
test_date2 <- dmy(test_date2)

print(test_date2)
print(class(test_date2))

[1] "1965-07-31"
[1] "Date"
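
The same naming pattern covers other orderings and time components. A short sketch with made-up input strings; the separators used here are arbitrary choices for illustration:

```r
library(lubridate)

# The same day written in three different orders
d_ymd <- ymd("1942-08-12")
d_mdy <- mdy("08/12/1942")
d_dmy <- dmy("12.08.1942")

# All three parse to the same Date value
d_ymd == d_mdy
d_ymd == d_dmy

# Adding a time component gives a date-time (POSIXct), in UTC by default
dt <- ymd_hms("1942-08-12 14:30:00")
class(dt)
hour(dt)
```

Choosing the wrong function does not always fail loudly: "08/12/1942" is also a valid input to `dmy`, which would read it as 8 December, so it is worth checking a few converted values against the raw strings.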


### Working with datetime

`lubridate` functions are vectorized, so they work on whole variables as well. To convert the date information, simply apply the function that matches the date format:

In [15]:
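reg_data <- reg_data %>%
    # FOED_DAG is written day-month-year (e.g. "23may1942"), so use dmy()
    mutate(date_of_birth = dmy(FOED_DAG)) %>%
    # move the key columns to the front and keep all remaining columns
    select(PNR, FOED_DAG, date_of_birth, KOEN, everything())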

head(reg_data)

PNR,FOED_DAG,date_of_birth,KOEN,br_2010_01,br_2010_02,br_2010_03,br_2010_04,br_2010_05,br_2010_06,...,br_2015_03,br_2015_04,br_2015_05,br_2015_06,br_2015_07,br_2015_08,br_2015_09,br_2015_10,br_2015_11,br_2015_12
5532,23may1942,1942-05-23,2,,,,,,,...,,,,,,,,,,
5562,28jun1971,1971-06-28,2,,,,,,,...,,,,,,,,,,
7589,21jan1955,1955-01-21,1,110200.0,852010.0,851000.0,741010.0,422200.0,862100.0,...,463500.0,910200.0,429900.0,581200.0,62000.0,,464100.0,390000.0,855300.0,771100.0
9287,29aug1968,1968-08-29,1,,,,,,,...,,,,,,,,,,
14523,08nov1957,1957-11-08,1,881030.0,,853110.0,869020.0,852010.0,871020.0,...,522300.0,431100.0,873020.0,856000.0,,931100.0,471130.0,856000.0,881020.0,461710.0
17543,24jun1952,1952-06-24,1,869010.0,105100.0,889140.0,,,910110.0,...,881010.0,854200.0,581410.0,522990.0,471120.0,421300.0,841100.0,,851000.0,855200.0
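
Values that do not match the expected format are returned as `NA`, so it can be worth counting failed conversions after parsing. A minimal sketch on a hypothetical vector (the third value is deliberately invalid; `quiet = TRUE` only suppresses the parsing warning):

```r
library(lubridate)

# Hypothetical raw values; the last one cannot be read as a date
raw_dates <- c("23may1942", "28jun1971", "not a date")

parsed <- dmy(raw_dates, quiet = TRUE)
parsed

# Count values that were present in the raw data but failed to parse
n_failed <- sum(is.na(parsed) & !is.na(raw_dates))
n_failed    # 1
```

The same check could be run on `reg_data` by comparing `date_of_birth` against `FOED_DAG`.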


Once converted to dates, the individual components are easy to extract from the variable:

In [16]:
print(reg_data$date_of_birth[1])
print(year(reg_data$date_of_birth[1]))
print(month(reg_data$date_of_birth[1]))
print(mday(reg_data$date_of_birth[1]))

[1] "1942-05-23"
[1] 1942
[1] 5
[1] 23
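
Because the values are now real dates, arithmetic on them works as well. A sketch that computes age in whole years for the first birth date shown above; the reference date `ref_date` is a hypothetical choice for illustration, not something defined by the data:

```r
library(lubridate)

birth    <- dmy("23may1942")
ref_date <- ymd("2015-12-31")   # hypothetical reference date

# Number of completed years between the two dates
age <- interval(birth, ref_date) %/% years(1)
age    # 73
```

Since `interval()` is vectorized, the same expression can be used inside `mutate()` to add an age column for every row.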
