<h1>Data Preprocessing with dplyr</h1>

This notebook preprocesses the historical Seoul bike-sharing demand dataset, the core dataset used later to build a predictive model.

It contains the following columns:

- `DATE` - Date in `DD/MM/YYYY` format
- `RENTED_BIKE_COUNT` - Count of bikes rented at each hour
- `HOUR` - Hour of the day
- `TEMPERATURE` - Temperature in Celsius
- `HUMIDITY` - Unit is `%`
- `WIND_SPEED` - Unit is `m/s`
- `VISIBILITY` - Unit is `10 m`
- `DEW_POINT_TEMPERATURE` - The temperature to which the air would have to cool down in order to reach saturation, in Celsius
- `SOLAR_RADIATION` - Unit is `MJ/m2`
- `RAINFALL` - Unit is `mm`
- `SNOWFALL` - Unit is `cm`
- `SEASONS` - `Winter`, `Spring`, `Summer`, `Autumn`
- `HOLIDAY` - `Holiday`, `No Holiday`
- `FUNCTIONING_DAY` - `Yes` (functional hours) or `No` (non-functional hours)

**`tidyverse` is used to perform the following data wrangling tasks:**

- Detect and handle missing values.
- Create indicator (dummy) variables for categorical variables.
- Normalise data.

Column names will need to be standardised again (as new columns will be added).

Import `tidyverse`.

In [1]:
# Install tidyverse first if it is not available:
# install.packages("tidyverse")
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Download and review the dataset

Load the bike-sharing system data from `raw_seoul_bike_sharing.csv`.


In [2]:
bike_sharing_df <- read_csv("raw_seoul_bike_sharing.csv")

Parsed with column specification:
cols(
  DATE = col_character(),
  RENTED_BIKE_COUNT = col_double(),
  HOUR = col_double(),
  TEMPERATURE = col_double(),
  HUMIDITY = col_double(),
  WIND_SPEED = col_double(),
  VISIBILITY = col_double(),
  DEW_POINT_TEMPERATURE = col_double(),
  SOLAR_RADIATION = col_double(),
  RAINFALL = col_double(),
  SNOWFALL = col_double(),
  SEASONS = col_character(),
  HOLIDAY = col_character(),
  FUNCTIONING_DAY = col_character()
)


In [3]:
# If needed, the raw file can be downloaded again from:
# url <- "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/datasets/raw_seoul_bike_sharing.csv"
# NB Some column names will need to be standardised.

Review the dataset.


In [4]:
summary(bike_sharing_df)
dim(bike_sharing_df)

     DATE           RENTED_BIKE_COUNT      HOUR        TEMPERATURE    
 Length:8760        Min.   :   2.0    Min.   : 0.00   Min.   :-17.80  
 Class :character   1st Qu.: 214.0    1st Qu.: 5.75   1st Qu.:  3.40  
 Mode  :character   Median : 542.0    Median :11.50   Median : 13.70  
                    Mean   : 729.2    Mean   :11.50   Mean   : 12.87  
                    3rd Qu.:1084.0    3rd Qu.:17.25   3rd Qu.: 22.50  
                    Max.   :3556.0    Max.   :23.00   Max.   : 39.40  
                    NA's   :295                       NA's   :11      
    HUMIDITY       WIND_SPEED      VISIBILITY   DEW_POINT_TEMPERATURE
 Min.   : 0.00   Min.   :0.000   Min.   :  27   Min.   :-30.600      
 1st Qu.:42.00   1st Qu.:0.900   1st Qu.: 940   1st Qu.: -4.700      
 Median :57.00   Median :1.500   Median :1698   Median :  5.100      
 Mean   :58.23   Mean   :1.725   Mean   :1437   Mean   :  4.074      
 3rd Qu.:74.00   3rd Qu.:2.300   3rd Qu.:2000   3rd Qu.: 14.800      
 Max.   :98.

Points of note:
- Columns `RENTED_BIKE_COUNT`, `TEMPERATURE`, `HUMIDITY`, `WIND_SPEED`, `VISIBILITY`, `DEW_POINT_TEMPERATURE`, `SOLAR_RADIATION`, `RAINFALL`, `SNOWFALL` are numerical variables/columns and require normalisation.
- `RENTED_BIKE_COUNT` and `TEMPERATURE` have some missing values (NA's) that need to be handled properly.
- `SEASONS`, `HOLIDAY`, `FUNCTIONING_DAY` are categorical variables which need to be converted into indicator columns or dummy variables.
- `HOUR` is read as a numerical variable, but it is in fact a categorical variable with levels ranging from 0 to 23.

## Detect and handle missing values

- `RENTED_BIKE_COUNT` has 295 missing values.
- `TEMPERATURE` has 11 missing values.

These gaps may come from values that were never recorded, or from malfunctioning bike-sharing systems or weather sensor networks. Either way, they have to be handled before modelling.
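As a quick cross-check on `summary()`, the NA count of every column can be computed in one call. The base-R sketch below runs on a small made-up data frame, `toy_df`, used only for illustration; on the real data the same call would be `colSums(is.na(bike_sharing_df))`.

```r
# Made-up toy data standing in for bike_sharing_df (illustration only)
toy_df <- data.frame(
  RENTED_BIKE_COUNT = c(254, NA, 173, NA),
  TEMPERATURE       = c(-5.2, -5.5, NA, -6.2),
  SEASONS           = c("Winter", "Winter", "Winter", "Winter"),
  stringsAsFactors  = FALSE
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(toy_df))
na_counts

# Keep only the columns that actually contain missing values
na_counts[na_counts > 0]
```

With the tidyverse loaded, `bike_sharing_df %>% summarise_all(~ sum(is.na(.)))` returns the same counts as a one-row tibble.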


### Handle missing values in the `RENTED_BIKE_COUNT` column:


`RENTED_BIKE_COUNT` is the **response/dependent variable**, which will later be predicted from the other **predictor/independent variables**.
Missing values are not normally allowed in the response variable, so they must either be dropped or imputed.

The NAs make up only about 3.4% (295/8760) of the column, so the rows with a missing value in this column can be dropped.

In [5]:
# Count the missing values in RENTED_BIKE_COUNT
bike_sharing_df %>%
  summarize(count = sum(is.na(RENTED_BIKE_COUNT)))

# Drop the rows where RENTED_BIKE_COUNT is NA
bike_sharing_df_dropmissing <- bike_sharing_df %>% drop_na(RENTED_BIKE_COUNT)
# Base R equivalent:
# bike_sharing_df_dropmissing <- bike_sharing_df[!is.na(bike_sharing_df$RENTED_BIKE_COUNT), ]

# Avoid na.omit() here: na.omit(bike_sharing_df$RENTED_BIKE_COUNT) returns a vector, not the
# data frame, and na.omit(bike_sharing_df) would also drop rows with NA in TEMPERATURE,
# so the number of deleted rows would not match the number of missing counts.

count
<int>
295
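The difference between dropping rows on one column and calling `na.omit()` on the whole data frame shows up on a tiny made-up example: `na.omit()` also removes the row whose only gap is in `TEMPERATURE`. The data below is invented for illustration.

```r
# Made-up toy data: two NAs in the response, one NA in a predictor
toy_df <- data.frame(
  RENTED_BIKE_COUNT = c(254, NA, 173, NA, 78),
  TEMPERATURE       = c(-5.2, -5.5, NA, -6.2, -6.0)
)

# Drop only the rows where the response is missing
kept <- toy_df[!is.na(toy_df$RENTED_BIKE_COUNT), ]
nrow(toy_df) - nrow(kept)      # rows removed: 2

# na.omit() drops every row with an NA in any column
omitted <- na.omit(toy_df)
nrow(toy_df) - nrow(omitted)   # rows removed: 3
```

On the full dataset, dropping only the 295 rows with a missing `RENTED_BIKE_COUNT` should leave 8760 - 295 = 8465 rows.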


In [6]:
# Get the dataset dimensions after dropping NA rows
bike_sharing_df_dropmissing %>%
  summarize(count = sum(is.na(RENTED_BIKE_COUNT)))
dim(bike_sharing_df_dropmissing)

count
<int>
0


### Handle missing values in the `TEMPERATURE` column:

`TEMPERATURE` is an important predictor variable and is likely to be positively correlated with `RENTED_BIKE_COUNT`: people may avoid cycling in winter and be more likely to rent a bike in summer.

Missing `TEMPERATURE` values can be imputed, since temperature can be estimated statistically with reasonable reliability.

Review the `TEMPERATURE` column for missing values.

In [7]:
bike_sharing_df_dropmissing %>% filter(is.na(TEMPERATURE))

DATE,RENTED_BIKE_COUNT,HOUR,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,SEASONS,HOLIDAY,FUNCTIONING_DAY
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>
07/06/2018,3221,18,,57,2.7,1217,16.4,0.96,0.0,0,Summer,No Holiday,Yes
12/06/2018,1246,14,,45,2.2,1961,12.7,1.39,0.0,0,Summer,No Holiday,Yes
13/06/2018,2664,17,,57,3.3,919,16.4,0.87,0.0,0,Summer,No Holiday,Yes
17/06/2018,2330,17,,58,3.3,865,16.7,0.66,0.0,0,Summer,No Holiday,Yes
20/06/2018,2741,19,,61,2.7,1236,17.5,0.6,0.0,0,Summer,No Holiday,Yes
30/06/2018,1144,13,,87,1.7,390,23.2,0.71,3.5,0,Summer,No Holiday,Yes
05/07/2018,827,10,,75,1.1,1028,20.8,1.22,0.0,0,Summer,No Holiday,Yes
11/07/2018,634,9,,96,0.6,450,24.9,0.41,0.0,0,Summer,No Holiday,Yes
12/07/2018,593,6,,93,1.1,852,24.3,0.01,0.0,0,Summer,No Holiday,Yes
21/07/2018,347,4,,77,1.2,1203,21.2,0.0,0.0,0,Summer,No Holiday,Yes


All 10 remaining rows with a missing `TEMPERATURE` have `SEASONS == "Summer"` (the eleventh missing value was in a row already dropped for a missing `RENTED_BIKE_COUNT`), so it is reasonable to impute them with the **mean summer temperature** rather than the mean over the whole dataset.
NA values are excluded when calculating the mean.

Impute the missing values in the `TEMPERATURE` column with this summer mean.
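Because every gap here falls in summer, a single summer mean is enough. If missing temperatures were spread across several seasons, one option (an extension, not what this notebook does) is to fill each gap with the mean of its own season. A base-R sketch using `ave()` on made-up data; `fill_with_mean()` is a hypothetical helper:

```r
# Made-up toy data with gaps in two different seasons
toy_df <- data.frame(
  SEASONS     = c("Summer", "Summer", "Summer", "Winter", "Winter", "Winter"),
  TEMPERATURE = c(25, NA, 27, -4, -6, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NAs in x with the mean of the non-missing values of x
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() applies fill_with_mean() separately within each season group
toy_df$TEMPERATURE <- ave(toy_df$TEMPERATURE, toy_df$SEASONS, FUN = fill_with_mean)
toy_df$TEMPERATURE
# -> 25 26 27 -4 -6 -5
```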


In [8]:
# Calculate the mean summer temperature,
# excluding NA values from the calculation
mean_summer_temp <- bike_sharing_df_dropmissing %>%
  filter(SEASONS == "Summer") %>%
  pull(TEMPERATURE) %>%
  mean(na.rm = TRUE)
mean_summer_temp

In [9]:
# Impute missing values for TEMPERATURE column with summer average temperature
bike_sharing_df_imputed <- bike_sharing_df_dropmissing %>% replace_na(list(TEMPERATURE = mean_summer_temp))
bike_sharing_df_imputed %>% filter(is.na(TEMPERATURE))

DATE,RENTED_BIKE_COUNT,HOUR,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,SEASONS,HOLIDAY,FUNCTIONING_DAY
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>


Review the dataset to ensure that there are no residual NAs.

In [10]:
# Confirm that no missing values remain in TEMPERATURE, then check the dimensions
bike_sharing_df_imputed %>%
  summarize(count = sum(is.na(TEMPERATURE)))
dim(bike_sharing_df_imputed)

count
<int>
0


Save the dataset to `seoul_bike_sharing.csv`.

In [11]:
# Save the dataset as `seoul_bike_sharing.csv`
head(bike_sharing_df_imputed)
write.csv(bike_sharing_df_imputed, file = "seoul_bike_sharing.csv", row.names = FALSE)

DATE,RENTED_BIKE_COUNT,HOUR,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,SEASONS,HOLIDAY,FUNCTIONING_DAY
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>
01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0,0,0,Winter,No Holiday,Yes
01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,100,5,-6.4,37,1.5,2000,-18.7,0,0,0,Winter,No Holiday,Yes


## Create indicator (dummy) variables for categorical variables.

Regression models work on numeric inputs, so categorical variables need to be converted into indicator variables before modelling.

- In the bike-sharing demand dataset, `SEASONS`, `HOLIDAY`, `FUNCTIONING_DAY` are categorical variables.
- As previously indicated, `HOUR` is read as a numerical variable but is in fact a categorical variable with levels ranging from 0 to 23.

Convert the `HOUR` column from numeric into character.

In [12]:
# Use mutate() to convert HOUR into character type
bike_sharing_df_char <- bike_sharing_df_imputed %>% mutate(HOUR = as.character(HOUR))

Review the dataframe.

In [13]:
head(bike_sharing_df_char)

DATE,RENTED_BIKE_COUNT,HOUR,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,SEASONS,HOLIDAY,FUNCTIONING_DAY
<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>
01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0,0,0,Winter,No Holiday,Yes
01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0,0,0,Winter,No Holiday,Yes
01/12/2017,100,5,-6.4,37,1.5,2000,-18.7,0,0,0,Winter,No Holiday,Yes


`SEASONS`, `HOLIDAY`, `FUNCTIONING_DAY` and `HOUR` are now character columns and can be converted into indicator variables.

`SEASONS` has four values: `Spring`, `Summer`, `Autumn` and `Winter`. Four indicator (dummy) variables, `Spring`, `Summer`, `Autumn` and `Winter`, will be created, each taking the value 0 or 1 (**one-hot encoding**).

**Example:** for a row whose `SEASONS` value is `Spring`, the new `Spring` column is set to 1 and the `Summer`, `Autumn` and `Winter` columns are set to 0:

|Spring|Summer|Autumn|Winter|
|----- |------|------|------|
|     1|     0|     0|     0| 
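The encoding in the table can be sketched in base R with `outer()`, which compares every value against every level. `one_hot()` is a hypothetical helper written for illustration, not part of the tidyverse:

```r
# Hypothetical helper: one-hot encode a character vector in base R
one_hot <- function(x) {
  lv <- sort(unique(x))            # one column per distinct value
  # outer() compares each value with each level, giving a logical matrix;
  # multiplying by 1L turns TRUE/FALSE into 1/0
  m <- outer(x, lv, "==") * 1L
  colnames(m) <- lv
  m
}

seasons <- c("Spring", "Winter", "Summer", "Spring", "Autumn")
one_hot(seasons)
```

Each row has exactly one 1. Base R's `model.matrix(~ SEASONS - 1, data)` builds the same indicators, with column names prefixed by `SEASONS`.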

In [14]:
# Check if FUNCTIONING_DAY has missing values
bike_sharing_df_char %>%
  summarize(count = sum(is.na(FUNCTIONING_DAY)))

# Count the rows for each SEASONS value
# bike_sharing_df_char %>% group_by(SEASONS) %>% summarize(count = n())
bike_sharing_df_char %>% count(SEASONS)

count
<int>
0


SEASONS,n
<chr>,<int>
Autumn,1937
Spring,2160
Summer,2208
Winter,2160


After the rows with a missing `RENTED_BIKE_COUNT` were dropped, `FUNCTIONING_DAY` contains only one value (`Yes`). A constant column carries no information for the model, so it is not converted into indicator variables.
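A quick way to spot such constant columns is to count the distinct values in each categorical column. The rows below are made up to mimic the data after the NA rows were dropped:

```r
# Made-up toy rows mimicking the categorical columns after dropping NA rows
toy_df <- data.frame(
  SEASONS         = c("Winter", "Spring", "Summer", "Autumn"),
  HOLIDAY         = c("No Holiday", "Holiday", "No Holiday", "No Holiday"),
  FUNCTIONING_DAY = c("Yes", "Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Number of distinct values per column
n_distinct_vals <- sapply(toy_df, function(col) length(unique(col)))
n_distinct_vals

# Columns with a single value are candidates to leave out of the encoding
names(n_distinct_vals)[n_distinct_vals == 1]
```

On the notebook's own data, `bike_sharing_df_char %>% count(FUNCTIONING_DAY)` would show the same thing directly.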

In [15]:
# Convert SEASONS, HOLIDAY, FUNCTIONING_DAY, and HOUR columns into indicator columns.
bike_sharing_df_converted <- bike_sharing_df_char %>%
    mutate(dummy = 1) %>% spread(key = SEASONS, value = dummy, fill = 0) %>%
    mutate(dummy = 1) %>% spread(key = HOLIDAY, value = dummy, fill = 0) %>%
    mutate(dummy = 1) %>% spread(key = HOUR, value = dummy, fill = 0)

# FUNCTIONING_DAY holds a single value ("Yes") after the NA rows were dropped, so it is not converted

Review the dataset.

In [16]:
# Review the dataset summary again to make sure the indicator columns are created properly
head(bike_sharing_df_converted)
# summary(bike_sharing_df_converted)

# Summarise columns and their types
bike_sharing_df_converted %>% 
    summarize_all(class) %>%
    gather(variable, class)

DATE,RENTED_BIKE_COUNT,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,⋯,21,22,23,3,4,5,6,7,8,9
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
01/12/2017,254,-5.2,37,2.2,2000,-17.6,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,204,-5.5,38,0.8,2000,-17.6,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,173,-6.0,39,1.0,2000,-17.7,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,107,-6.2,40,0.9,2000,-17.6,0,0,0,⋯,0,0,0,1,0,0,0,0,0,0
01/12/2017,78,-6.0,36,2.3,2000,-18.6,0,0,0,⋯,0,0,0,0,1,0,0,0,0,0
01/12/2017,100,-6.4,37,1.5,2000,-18.7,0,0,0,⋯,0,0,0,0,0,1,0,0,0,0


variable,class
<chr>,<chr>
DATE,character
RENTED_BIKE_COUNT,numeric
TEMPERATURE,numeric
HUMIDITY,numeric
WIND_SPEED,numeric
VISIBILITY,numeric
DEW_POINT_TEMPERATURE,numeric
SOLAR_RADIATION,numeric
RAINFALL,numeric
SNOWFALL,numeric


Save the dataset to `seoul_bike_sharing_converted.csv`.

In [17]:
# Save the dataset as `seoul_bike_sharing_converted.csv`
write_csv(bike_sharing_df_converted, "seoul_bike_sharing_converted.csv")

## Normalise the dataset.

`RENTED_BIKE_COUNT`, `TEMPERATURE`, `HUMIDITY`, `WIND_SPEED`, `VISIBILITY`, `DEW_POINT_TEMPERATURE`, `SOLAR_RADIATION`, `RAINFALL`, `SNOWFALL` are numerical variables with different units and ranges.\
Columns with large values may dominate (bias) the predictive models and degrade model accuracy, so these columns are normalised to bring them into a comparable range.

Min-max normalisation is used for this project. 

**Min-max** rescales each value in a column by first subtracting the minimum value of the column from each value, and then dividing the result by the difference between the maximum and minimum values of the column. The column gets re-scaled such that the minimum becomes 0 and the maximum becomes 1.

$$x_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}$$
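As a worked example, take the first `TEMPERATURE` value (-5.2 °C) together with the column minimum of -17.8 °C and maximum of 39.4 °C from the summary of the raw data:

$$x_{new} = \frac{-5.2 - (-17.8)}{39.4 - (-17.8)} = \frac{12.6}{57.2} \approx 0.2203$$

This agrees with the value `0.2202797` in the first row of the normalised dataset.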


Apply min-max normalisation to `RENTED_BIKE_COUNT`, `TEMPERATURE`, `HUMIDITY`, `WIND_SPEED`, `VISIBILITY`, `DEW_POINT_TEMPERATURE`, `SOLAR_RADIATION`, `RAINFALL` and `SNOWFALL`.

In [18]:
# Define min-max normalisation function
norm_minmax <- function(x) (x - min(x)) / (max(x) - min(x)) # x is a numeric vector; values are rescaled to [0, 1]

In [19]:
# Use `mutate_at()` to apply min-max normalisation on each column.

# Define column list
cols <- c("RENTED_BIKE_COUNT", "TEMPERATURE", "HUMIDITY", "WIND_SPEED", "VISIBILITY", "DEW_POINT_TEMPERATURE", "SOLAR_RADIATION", "RAINFALL", "SNOWFALL")

# Apply normalisation using mutate_at
bike_sharing_df_normalised <- bike_sharing_df_converted %>% 
    mutate_at(cols, norm_minmax)
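`norm_minmax()` returns `NaN` for a constant column (0 divided by 0) and `NA` everywhere if the column contains a missing value. The hypothetical variant below is a sketch, not part of the original pipeline, that guards against both. It is applied to several columns with `lapply()`, a base-R analogue of `mutate_at()`. The toy data is made up, with `TEMPERATURE` values echoing the real column's minimum, first value and maximum.

```r
# Hypothetical NA- and constant-safe variant of norm_minmax()
norm_minmax_safe <- function(x) {
  rng  <- range(x, na.rm = TRUE)   # c(min, max) of the non-missing values
  span <- rng[2] - rng[1]
  if (span == 0) {
    # No spread: map every non-missing value to 0 instead of dividing by zero
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / span              # NAs stay NA
}

# Made-up data for illustration
toy_df <- data.frame(
  TEMPERATURE = c(-17.8, -5.2, 39.4, NA),
  SNOWFALL    = c(0, 0, 0, 0)
)
toy_cols <- c("TEMPERATURE", "SNOWFALL")

# Base-R analogue of mutate_at(toy_cols, norm_minmax_safe)
toy_df[toy_cols] <- lapply(toy_df[toy_cols], norm_minmax_safe)
toy_df
```

Whether a constant column should map to 0, or be dropped before modelling as was done with `FUNCTIONING_DAY`, is a design choice.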

Review the dataset.

In [20]:
# Review the dataset again to ensure that the numeric columns range between 0 and 1.
head(bike_sharing_df_normalised)
summary(bike_sharing_df_normalised)

DATE,RENTED_BIKE_COUNT,TEMPERATURE,HUMIDITY,WIND_SPEED,VISIBILITY,DEW_POINT_TEMPERATURE,SOLAR_RADIATION,RAINFALL,SNOWFALL,⋯,21,22,23,3,4,5,6,7,8,9
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
01/12/2017,0.07090602,0.2202797,0.377551,0.2972973,1,0.2249135,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,0.05683737,0.215035,0.3877551,0.1081081,1,0.2249135,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,0.0481148,0.2062937,0.3979592,0.1351351,1,0.2231834,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
01/12/2017,0.02954418,0.2027972,0.4081633,0.1216216,1,0.2249135,0,0,0,⋯,0,0,0,1,0,0,0,0,0,0
01/12/2017,0.02138436,0.2062937,0.3673469,0.3108108,1,0.2076125,0,0,0,⋯,0,0,0,0,1,0,0,0,0,0
01/12/2017,0.02757456,0.1993007,0.377551,0.2027027,1,0.2058824,0,0,0,⋯,0,0,0,0,0,1,0,0,0,0


     DATE           RENTED_BIKE_COUNT  TEMPERATURE        HUMIDITY     
 Length:8465        Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
 Class :character   1st Qu.:0.05965   1st Qu.:0.3636   1st Qu.:0.4286  
 Mode  :character   Median :0.15194   Median :0.5472   Median :0.5816  
                    Mean   :0.20460   Mean   :0.5345   Mean   :0.5933  
                    3rd Qu.:0.30445   3rd Qu.:0.7080   3rd Qu.:0.7551  
                    Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
   WIND_SPEED       VISIBILITY     DEW_POINT_TEMPERATURE SOLAR_RADIATION   
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000        Min.   :0.000000  
 1st Qu.:0.1216   1st Qu.:0.4602   1st Qu.:0.4412        1st Qu.:0.000000  
 Median :0.2027   Median :0.8429   Median :0.6107        Median :0.002841  
 Mean   :0.2332   Mean   :0.7131   Mean   :0.5977        Mean   :0.161326  
 3rd Qu.:0.3108   3rd Qu.:1.0000   3rd Qu.:0.7924        3rd Qu.:0.264205  
 Max.   :1.0000   Max.   :1.0000   Max. 

Save the dataset to `seoul_bike_sharing_converted_normalized.csv`.

In [21]:
# Save the dataset as `seoul_bike_sharing_converted_normalized.csv`
write_csv(bike_sharing_df_normalised, "seoul_bike_sharing_converted_normalized.csv")

## Standardise the column names again for the preprocessed dataset

Since new indicator variables have been added, column names need to be standardised again.\
The following loop standardises the column names of each file saved in this notebook.
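The renaming rule can also be written as a small reusable function. The name `standardise_names()` is hypothetical; the sketch uses base R's `toupper()` and `gsub()`, replacing each run of whitespace with a single underscore:

```r
# Hypothetical helper mirroring the renaming rule used in the loop
standardise_names <- function(nms) {
  nms <- toupper(nms)              # upper-case every name
  gsub("\\s+", "_", nms)           # each whitespace run -> one underscore
}

standardise_names(c("No Holiday", "Winter", "functioning day", "23"))
# -> "NO_HOLIDAY" "WINTER" "FUNCTIONING_DAY" "23"
```

Hour columns such as `23` are left unchanged; they remain non-syntactic names, so they need backticks (for example ``df$`23` ``) when referenced in R code.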

In [22]:
# Dataset list
dataset_list <- c('seoul_bike_sharing.csv', 'seoul_bike_sharing_converted.csv', 'seoul_bike_sharing_converted_normalized.csv')

for (dataset_name in dataset_list){
    # Read dataset
    dataset <- read_csv(dataset_name)
    # Standardise its column names:
    # Convert all column names to uppercase
    names(dataset) <- toupper(names(dataset))
    # Replace any whitespace separators with underscores, using str_replace_all()
    names(dataset) <- str_replace_all(names(dataset), "\\s+", "_")
    # Save the dataset back
    write.csv(dataset, dataset_name, row.names=FALSE)
}

Parsed with column specification:
cols(
  DATE = col_character(),
  RENTED_BIKE_COUNT = col_double(),
  HOUR = col_double(),
  TEMPERATURE = col_double(),
  HUMIDITY = col_double(),
  WIND_SPEED = col_double(),
  VISIBILITY = col_double(),
  DEW_POINT_TEMPERATURE = col_double(),
  SOLAR_RADIATION = col_double(),
  RAINFALL = col_double(),
  SNOWFALL = col_double(),
  SEASONS = col_character(),
  HOLIDAY = col_character(),
  FUNCTIONING_DAY = col_character()
)
Parsed with column specification:
cols(
  .default = col_double(),
  DATE = col_character(),
  FUNCTIONING_DAY = col_character()
)
See spec(...) for full column specifications.
Parsed with column specification:
cols(
  .default = col_double(),
  DATE = col_character(),
  FUNCTIONING_DAY = col_character()
)
See spec(...) for full column specifications.
