# Dataframes Exercise - Stock prices
### AJ Zerouali, 23/06/14

Plan:
- Download hourly data over 3 months (2023-03-01 to 2023-06-01) for 5 stocks using *yahoofinancer*.
- Create a list of dates that we want in the dataset.
- For each stock, ensure there are values for the entire list of timestamps. 

**To do (23/06/14):**
These are operations/topics I haven't treated yet:
- Using *NULL* to drop columns (Kabacoff, 3.10.2)
- Using the *subset()* function for selections (Kabacoff, 3.10.4)
- Dealing with *NA*, missing data (Kabacoff, 3.5)
- Using the *dplyr* package to manage dataframes (Kabacoff, 3.11)
- Making pivot tables
- Applying functions to dataframes
- Front-fill and Back-fill of missing data


## Downloading and saving the data

In [2]:
# Import yahoo finance
library(yahoofinancer)

Instantiate the list of tickers and then the list of objects:

In [8]:
# List of tickers
ticker_list <- c("JNJ", "AAPL", "MSFT", "CVX", "PFE")
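# Init. list of yahoofinancer::Ticker objects
# (note: list(ticker_list) creates one unnamed element holding the ticker vector;
#  the named entries are added by the loop in the next cell)
TickerObj_list <- list(ticker_list)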

Initialize each object with the corresponding symbol:

In [9]:
# Instantiate a ticker object for each entry of the list
for (i in c(1:length(ticker_list))){
    new_Ticker <- Ticker$new(symbol = ticker_list[i])
    TickerObj_list[[ticker_list[i]]] <- new_Ticker
    }
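
A side note on the initialization above: *list(ticker_list)* creates a one-element list that holds the whole ticker vector, which is why an unnamed first entry shows up when the list is printed. Below is a minimal sketch of an alternative. It uses a hypothetical stand-in constructor *make_obj* (not part of *yahoofinancer*) so that it runs offline:

```r
# Tickers as in the notebook
ticker_list <- c("JNJ", "AAPL", "MSFT", "CVX", "PFE")

# list(ticker_list) wraps the whole vector in a single unnamed element
length(list(ticker_list))

# Hypothetical stand-in for Ticker$new(symbol = ...), for illustration only
make_obj <- function(symbol) list(symbol = symbol)

# Build a named list with exactly one entry per ticker
obj_list <- setNames(lapply(ticker_list, make_obj), ticker_list)
names(obj_list)
```

Replacing *make_obj* with the real constructor should give a list with five named entries and no leftover first element, though that has not been run here.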


In [14]:
TickerObj_list

[[1]]
[1] "JNJ"  "AAPL" "MSFT" "CVX"  "PFE" 

$JNJ
<Ticker>
  Public:
    asset_profile: active binding
    calendar_events: active binding
    clone: function (deep = FALSE) 
    company_officers: active binding
    earnings: active binding
    earnings_history: active binding
    earnings_trend: active binding
    esg_scores: active binding
    financial_data: active binding
    fund_bond_holdings: active binding
    fund_bond_ratings: active binding
    fund_equity_holdings: active binding
    fund_holding_info: active binding
    fund_holdings: active binding
    fund_ownership: active binding
    fund_performance: active binding
    fund_profile: active binding
    fund_section_weightings: active binding
    fund_top_holdings: active binding
    get_balance_sheet: function (frequency = c("annual", "quarter"), clean_names = TRUE) 
    get_cash_flow: function (frequency = c("annual", "quarter"), clean_names = TRUE) 
    get_history: function (period = "ytd", interval = "1d", start =

Initialize the list of dataframes, set the download parameters, and download the data for each ticker:

In [22]:
# Ticker$get_history() parameters
start_date <- "2023-03-01"
end_date <- "2023-06-01"
interval <- "1h"
# Init. list of dataframes
# (as above, list(ticker_list) adds one unnamed element holding the ticker vector)
dataframes_list <- list(ticker_list)

In [23]:
# Download data for each ticker
for (i in c(1:length(ticker_list))){
    df_temp <- TickerObj_list[[ticker_list[i]]]$get_history(interval = interval,
                                                            start = start_date,
                                                            end = end_date)
    dataframes_list[[ticker_list[i]]] <- df_temp
    }

In [36]:
# Save to CSVs
path_dir <- "./datasets/"
# Note: the suffix says "30min", but the data above were downloaded with interval = "1h"
fname_suffix <- "_30min_2303-2306.csv"
for (i in c(1:length(ticker_list))){
    # File name
    fname <- paste(path_dir, ticker_list[i], fname_suffix, sep = "")
    # Save i-th dataframe to CSV
    write.csv(x = dataframes_list[[ticker_list[i]]], file = fname)
    }
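
Note that *write.csv()* also writes the row names by default, which is where an extra "X" column comes from when such a file is read back. Here is a small sketch using a toy data frame and temporary files (both are illustrative assumptions, not the notebook's data):

```r
# Toy data frame standing in for one downloaded dataframe
df_toy <- data.frame(date = c("2023-03-01 14:30:00", "2023-03-01 15:30:00"),
                     close = c(40.535, 40.489))

# Default behaviour: row names are written as an unnamed first column
f1 <- tempfile(fileext = ".csv")
write.csv(x = df_toy, file = f1)
colnames(read.csv(f1))

# With row.names = FALSE only the data columns are written
f2 <- tempfile(fileext = ".csv")
write.csv(x = df_toy, file = f2, row.names = FALSE)
colnames(read.csv(f2))
```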

In [30]:
# Check the lengths
for (i in c(1:length(ticker_list))){
    cat("nrow(dataframes_list$", ticker_list[i],
        ") = ", nrow(dataframes_list[[ticker_list[i]]]),
        "\n"
       )
    }

nrow(dataframes_list$ JNJ ) =  449 
nrow(dataframes_list$ AAPL ) =  449 
nrow(dataframes_list$ MSFT ) =  449 
nrow(dataframes_list$ CVX ) =  449 
nrow(dataframes_list$ PFE ) =  449 
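
The row counts agree, but equal lengths alone do not guarantee that every dataframe covers the same timestamps. Below is a hedged sketch of a stricter check on toy data frames with a *date* column (the toy values and the name *df_list* are assumptions for illustration):

```r
# Toy stand-ins for the per-ticker dataframes
df_list <- list(
  AAA = data.frame(date = c("t1", "t2", "t3"), close = c(1, 2, 3)),
  BBB = data.frame(date = c("t1", "t3"), close = c(5, 7))
)

# Union of all timestamps seen in any dataframe
all_dates <- sort(unique(unlist(lapply(df_list, function(d) d$date))))

# For each ticker, list the timestamps that are missing
missing_dates <- lapply(df_list, function(d) setdiff(all_dates, d$date))
missing_dates
```

An empty character vector for a ticker means that it has a row for every timestamp in the union.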


## Working on a loaded CSV

Load a dataset and see what it looks like

In [1]:
df_load_test <- read.csv(file = "./datasets/PFE_30min_2303-2306.csv")

In [2]:
head(df_load_test)

Unnamed: 0_level_0,X,date,volume,high,low,open,close
Unnamed: 0_level_1,<int>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,2023-03-01 14:30:00,4054585,40.76,40.4515,40.56,40.535
2,2,2023-03-01 15:30:00,2671482,40.71,40.39,40.535,40.489
3,3,2023-03-01 16:30:00,1729663,40.6,40.44,40.4815,40.445
4,4,2023-03-01 17:30:00,1549711,40.48,40.36,40.45,40.38
5,5,2023-03-01 18:30:00,2432733,40.38,40.14,40.38,40.155
6,6,2023-03-01 19:30:00,2197362,40.29,40.135,40.15,40.2295


Do the following manipulations:
1) Rename the columns with "Date", "Vol",... , "Close".
2) Discuss the row and column selection system of R dataframes.
3) Re-order the columns to "Date", "Open", "High", "Low", "Close", "Vol".
4) Drop the "X", "Open", "High", and "Low"  columns.
5) Remove the last date "2023-06-13", restrict timestamps to "16:30".
6) Adding columns: a "Cl_Rtns" column with the close returns, "Tic" column with ticker name.
7) Concatenating dataframes along rows/columns.
8) Merging dataframes.
9) Convert the date column to *POSIXct*.
10) Use the date column as the row names (index) of the dataframe.

In [3]:
# Assign to working dataset
df_PFE <- df_load_test

### (1) Renaming columns

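First, create a copy of the working dataframe. R uses copy-on-modify semantics, so a plain assignment with "<-" already yields an object that can be changed without affecting the original; wrapping the source in *data.frame()* simply makes the intent of working on a separate copy explicit: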

In [4]:
# Copy df_PFE to df_X
df_X <- data.frame(df_PFE)
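
A quick way to convince oneself of the copy behaviour, sketched on a toy data frame (the names *df_a* and *df_b* are illustrative only):

```r
df_a <- data.frame(x = 1:3)

# Plain assignment
df_b <- df_a

# Modifying df_b triggers a copy; df_a is left untouched
df_b$x[1] <- 100L

df_a$x
df_b$x
```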

In [5]:
head(df_X)

Unnamed: 0_level_0,X,date,volume,high,low,open,close
Unnamed: 0_level_1,<int>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,2023-03-01 14:30:00,4054585,40.76,40.4515,40.56,40.535
2,2,2023-03-01 15:30:00,2671482,40.71,40.39,40.535,40.489
3,3,2023-03-01 16:30:00,1729663,40.6,40.44,40.4815,40.445
4,4,2023-03-01 17:30:00,1549711,40.48,40.36,40.45,40.38
5,5,2023-03-01 18:30:00,2432733,40.38,40.14,40.38,40.155
6,6,2023-03-01 19:30:00,2197362,40.29,40.135,40.15,40.2295


Renaming columns is done with the *colnames()* function:

In [6]:
colnames(df_X)

In [7]:
colnames(df_X) <- c("X", "Date", "Vol", "High", "Low", "Open", "Close")

In [8]:
head(df_X)

Unnamed: 0_level_0,X,Date,Vol,High,Low,Open,Close
Unnamed: 0_level_1,<int>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,2023-03-01 14:30:00,4054585,40.76,40.4515,40.56,40.535
2,2,2023-03-01 15:30:00,2671482,40.71,40.39,40.535,40.489
3,3,2023-03-01 16:30:00,1729663,40.6,40.44,40.4815,40.445
4,4,2023-03-01 17:30:00,1549711,40.48,40.36,40.45,40.38
5,5,2023-03-01 18:30:00,2432733,40.38,40.14,40.38,40.155
6,6,2023-03-01 19:30:00,2197362,40.29,40.135,40.15,40.2295
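
Assigning the full vector of names requires knowing the column order. As a sketch of an alternative, columns can be renamed by matching on their old names. The toy data frame and the *lookup* vector below are assumptions for illustration:

```r
df_toy <- data.frame(date = "2023-03-01 14:30:00", volume = 4054585L, close = 40.535)

# Rename one column by matching on its old name
names(df_toy)[names(df_toy) == "volume"] <- "Vol"

# Rename several columns through an old-name -> new-name lookup
lookup <- c(date = "Date", close = "Close")
hits <- names(df_toy) %in% names(lookup)
names(df_toy)[hits] <- unname(lookup[names(df_toy)[hits]])

names(df_toy)
```

For data frames, *names()* and *colnames()* refer to the same attribute, so either can be used here.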


### (2) R's row/column selection system



#### Selecting rows and columns with vectors

Given a *data.frame* *df_X*, the *[ **row indices**, **column indices**]* operator selects the desired rows and columns.

If we select only the *Date* column and the rows 5 to 10:

In [9]:
df_X[c(5:10), "Date"]

Selecting a single column this way returns a plain vector rather than a *data.frame*, so the class is that of the values:

In [65]:
class(df_X[c(5:10), "Date"])

This is the same as column selection using $:

In [67]:
(df_X$Date)[1:5]

In [10]:
class((df_X$Date)[1:5])

In [11]:
class(df_X[c(1:5), "Date"])
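
If a one-column *data.frame* is wanted instead of a vector, the *drop = FALSE* argument of the *[ ]* operator can be used. A minimal sketch on toy data:

```r
df_toy <- data.frame(Date = c("a", "b", "c"), Close = c(1.5, 2.5, 3.5),
                     stringsAsFactors = FALSE)

v <- df_toy[1:2, "Date"]                # character vector
d <- df_toy[1:2, "Date", drop = FALSE]  # one-column data.frame

class(v)
class(d)
```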

Now, **to select multiple columns**, we can use either **integer** indices, which refer to the indices of *colnames(df_X)*:

In [12]:
df_X[c(1:5), c(2:length(df_X))]

Unnamed: 0_level_0,Date,Vol,High,Low,Open,Close
Unnamed: 0_level_1,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
1,2023-03-01 14:30:00,4054585,40.76,40.4515,40.56,40.535
2,2023-03-01 15:30:00,2671482,40.71,40.39,40.535,40.489
3,2023-03-01 16:30:00,1729663,40.6,40.44,40.4815,40.445
4,2023-03-01 17:30:00,1549711,40.48,40.36,40.45,40.38
5,2023-03-01 18:30:00,2432733,40.38,40.14,40.38,40.155


or more conveniently, we can use **vectors of column labels**:

In [13]:
tail(df_X[, c("Date", "Close", "Vol")])

Unnamed: 0_level_0,Date,Close,Vol
Unnamed: 0_level_1,<chr>,<dbl>,<int>
444,2023-05-31 15:30:00,37.6999,2566348
445,2023-05-31 16:30:00,37.555,2036227
446,2023-05-31 17:30:00,37.8356,3932752
447,2023-05-31 18:30:00,37.97,3134408
448,2023-05-31 19:30:00,38.065,6364509
449,2023-06-13 20:00:00,40.28,0


As we can see from the last cell, we can in fact order the columns however we want:

In [14]:
df_X[c(200:205), c(2,6,4,5,7,3)]

Unnamed: 0_level_0,Date,Open,High,Low,Close,Vol
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<int>
200,2023-04-11 16:30:00,41.9262,41.99,41.845,41.925,1520230
201,2023-04-11 17:30:00,41.9229,41.955,41.86,41.87,1785593
202,2023-04-11 18:30:00,41.88,41.92,41.8,41.88,2358117
203,2023-04-11 19:30:00,41.88,41.8962,41.76,41.79,2274593
204,2023-04-12 13:30:00,41.63,41.79,41.35,41.6652,2634618
205,2023-04-12 14:30:00,41.665,41.665,41.47,41.555,1548028


**Crucial comment:** Unlike Pandas dataframes, when using the *[**rows**, **columns**]* operator to select rows only, we **must** use the "," to indicate *all columns*. For columns only, writing the leading "," to indicate *all rows* is the safest habit, since a single index without a comma is always interpreted as a column selection. Suppose we want all rows and only columns "Date", "Close", "Vol", then we write:

        df_X[, c("Date", "Close", "Vol")]
 
and if we want only rows 100 to 200 and all columns, we write:

        df_X[c(100:200), ]
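
The three indexing forms can be compared side by side on a toy data frame (a sketch; the data are made up):

```r
df_toy <- data.frame(Date = c("a", "b", "c"), Close = c(1, 2, 3), Vol = c(10L, 20L, 30L))

rows_only  <- df_toy[2:3, ]               # rows 2-3, all columns (comma required)
cols_comma <- df_toy[, c("Date", "Vol")]  # all rows, two columns
cols_list  <- df_toy[c("Date", "Vol")]    # single index: treated as a column selection

dim(rows_only)
isTRUE(all.equal(cols_comma, cols_list))
```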

#### Conditional selections

For example, suppose we only want the rows of *df_X* where the close price is between 38 and 40. Just like in Pandas, we will use the condition:
    
    (df_X$Close >= 38) & (df_X$Close <= 40)
**on the rows, with the elementwise & instead of the scalar && (which is meant for single TRUE/FALSE values, e.g. inside *if*)**.

In [15]:
nrow(df_X[(df_X$Close >= 38) & (df_X$Close <= 40),])

In [108]:
min(df_X$Close)

In [109]:
max(df_X$Close)

For the "or" operator, we similarly use the elementwise "|", so that if we want the rows where the close price is at most 37 or at least 41, we use:

        df_X[(df_X$Close >= 41) | (df_X$Close <= 37), ]

In [16]:
nrow(df_X[(df_X$Close >= 41) | (df_X$Close <= 37),])
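
To make the difference between the elementwise and the scalar operators concrete, here is a small sketch on a made-up price vector (*close_px* is an illustrative name):

```r
close_px <- c(36.5, 38.2, 39.9, 41.3)

# Elementwise operators return one TRUE/FALSE per element: usable as a row mask
in_band  <- (close_px >= 38) & (close_px <= 40)
out_band <- (close_px >= 41) | (close_px <= 37)
in_band
out_band

# && and || combine single TRUE/FALSE values, as needed in if()
if (length(close_px) > 0 && all(close_px > 0)) message("all prices positive")

# Number of rows the first mask would select
sum(in_band)
```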

Similarly, suppose we want to select all the 14:30 timestamps in the "Date" column. First, the values in the date column are all strings of length 19:

In [18]:
# Example string
str_ex <- df_X[1, "Date"]
nchar(str_ex)

and the time is expressed in characters 12 to 19:

In [19]:
# Target char
tgt_char <- "14:30:00"
unlist(gregexpr(tgt_char, str_ex))

In [20]:
substr(str_ex,12,nchar(str_ex)) == tgt_char

We can thus select all rows of *df_X*, for which the last 8 characters are "14:30:00":

In [21]:
# Constants: columns to keep and character positions of the time within the string
col_vec <- c("Date", "Close", "Vol")
start_idx <- unlist(gregexpr(tgt_char, str_ex))
end_idx <- nchar(str_ex)

In [22]:
# Assign new
df_Y <- df_X[(substr(df_X$Date,start_idx,end_idx)==tgt_char), 
             col_vec]

In [23]:
head(df_Y)

Unnamed: 0_level_0,Date,Close,Vol
Unnamed: 0_level_1,<chr>,<dbl>,<int>
1,2023-03-01 14:30:00,40.535,4054585
8,2023-03-02 14:30:00,40.015,3320980
15,2023-03-03 14:30:00,41.02,5496986
22,2023-03-06 14:30:00,41.075,3449676
29,2023-03-07 14:30:00,40.46,3784700
36,2023-03-08 14:30:00,40.145,2943005


In [135]:
nrow(df_Y)
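
As a possible shortcut, the same mask can be built without computing character positions, using base R's *endsWith()* or a regular expression anchored at the end of the string. A sketch on a few made-up timestamps:

```r
dates <- c("2023-03-01 14:30:00", "2023-03-01 15:30:00", "2023-03-02 14:30:00")
tgt_char <- "14:30:00"

# TRUE where the string ends with the target time
mask_a <- endsWith(dates, tgt_char)

# Same test with a regular expression anchored at the end
mask_b <- grepl(paste0(tgt_char, "$"), dates)

mask_a
identical(mask_a, mask_b)
```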

### (3) Re-ordering and dropping columns

In view of the previous section, re-ordering columns amounts to selecting them in the desired order and assigning the result to a new dataframe. Returning to the PFE dataframe loaded at the beginning of this part, we will create a dataframe containing only the date, close and volume columns.

In [25]:
head(df_PFE)

Unnamed: 0_level_0,X,date,volume,high,low,open,close
Unnamed: 0_level_1,<int>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,2023-03-01 14:30:00,4054585,40.76,40.4515,40.56,40.535
2,2,2023-03-01 15:30:00,2671482,40.71,40.39,40.535,40.489
3,3,2023-03-01 16:30:00,1729663,40.6,40.44,40.4815,40.445
4,4,2023-03-01 17:30:00,1549711,40.48,40.36,40.45,40.38
5,5,2023-03-01 18:30:00,2432733,40.38,40.14,40.38,40.155
6,6,2023-03-01 19:30:00,2197362,40.29,40.135,40.15,40.2295


In [27]:
# Columns to display
cols_vec_old <- c("date", "close", "volume")
cols_vec_new <- c("Date", "Close", "Vol")
# Drop columns 
df_X <- df_PFE[, cols_vec_old]
# Rename columns
colnames(df_X)<-cols_vec_new
# Display
head(df_X)

Unnamed: 0_level_0,Date,Close,Vol
Unnamed: 0_level_1,<chr>,<dbl>,<int>
1,2023-03-01 14:30:00,40.535,4054585
2,2023-03-01 15:30:00,40.489,2671482
3,2023-03-01 16:30:00,40.445,1729663
4,2023-03-01 17:30:00,40.38,1549711
5,2023-03-01 18:30:00,40.155,2432733
6,2023-03-01 19:30:00,40.2295,2197362


Another way of dropping columns (Kabacoff, 3.10.2) is to use a set complement:

In [34]:
# Logical mask of the columns to drop
drop_cols <- colnames(df_PFE) %in% c("X", "open", "high", "low")
# Assign new df with !drop_cols
df_Y <- df_PFE[!drop_cols]

In [35]:
head(df_Y)

Unnamed: 0_level_0,date,volume,close
Unnamed: 0_level_1,<chr>,<int>,<dbl>
1,2023-03-01 14:30:00,4054585,40.535
2,2023-03-01 15:30:00,2671482,40.489
3,2023-03-01 16:30:00,1729663,40.445
4,2023-03-01 17:30:00,1549711,40.38
5,2023-03-01 18:30:00,2432733,40.155
6,2023-03-01 19:30:00,2197362,40.2295


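Note that no "," is needed here: with a single index, *[ ]* treats the dataframe as a list of columns, whether the index is a logical, integer or name vector. Remember the "%in%" operator for building such masks.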
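
Two further ways of dropping columns, both listed in the to-do at the top, are assigning *NULL* to a column and using *subset()* with a negated *select*. A minimal sketch on toy data (the data frame is made up):

```r
df_toy <- data.frame(X = 1:2, date = c("a", "b"), open = c(1, 2), close = c(3, 4))

# Drop columns one at a time by assigning NULL
df_a <- df_toy
df_a$X <- NULL
df_a$open <- NULL

# Drop several columns at once with subset()
df_b <- subset(df_toy, select = -c(X, open))

names(df_a)
names(df_b)
```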

### (4) Restricting rows to specific timestamps

Again, this is done with conditional selection on the rows and re-assignment. First, let's make a smaller set of timestamps, one for each date, at 16:30:

In [6]:
# Init. timestamp vec (empty vec)
timestamp_vec <- c()
# PFE timestamps
pfe_timestamps <- df_PFE$date
# tgt_str 
tgt_str <- "16:30:00"

for (i in c(1:nrow(df_PFE))){
    if (substr(pfe_timestamps[i],12,19) == tgt_str){
        timestamp_vec<-append(timestamp_vec, pfe_timestamps[i])
        }
    }
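
The loop above can also be written as a single vectorized selection, since *substr()* and "==" both operate on whole vectors. A sketch on a few made-up timestamps, reusing the notebook's variable names for readability:

```r
pfe_timestamps <- c("2023-03-01 15:30:00", "2023-03-01 16:30:00", "2023-03-02 16:30:00")
tgt_str <- "16:30:00"

# Keep the timestamps whose characters 12 to 19 equal the target time
timestamp_vec <- pfe_timestamps[substr(pfe_timestamps, 12, 19) == tgt_str]
timestamp_vec
```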


Now we can easily extract our desired dataframe:

In [56]:
df_X <- df_PFE[df_PFE$date %in% timestamp_vec, c("date", "close", "volume")]

In [58]:
tail(df_X)

Unnamed: 0_level_0,date,close,volume
Unnamed: 0_level_1,<chr>,<dbl>,<int>
410,2023-05-23 16:30:00,39.5383,10235140
417,2023-05-24 16:30:00,38.9,2437953
424,2023-05-25 16:30:00,38.01,2218692
431,2023-05-26 16:30:00,37.735,1421199
438,2023-05-30 16:30:00,36.9985,2465908
445,2023-05-31 16:30:00,37.555,2036227


### (5) Adding columns

Continuing with the last *df_X*, add a "tic" column:

In [59]:
df_X_ <- data.frame(df_X)

In [60]:
df_X <- data.frame(df_X, tic = "PFE")
head(df_X)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
3,2023-03-01 16:30:00,40.445,1729663,PFE
10,2023-03-02 16:30:00,40.295,1849920,PFE
17,2023-03-03 16:30:00,40.965,1685599,PFE
24,2023-03-06 16:30:00,41.185,1870325,PFE
31,2023-03-07 16:30:00,40.35,3756535,PFE
38,2023-03-08 16:30:00,40.03,2351311,PFE


Compute the close-to-close returns, in percent:

In [70]:
# Price vec
close_vec <- array(df_X$close, dim = length(df_X$close))
n <- length(close_vec)
# Init retns vec
retns_vec <- array(0, dim = n)
# Compute returns; note the parentheses in 1:(n-1), since 1:n-1 means (1:n)-1
retns_vec[2:n] <- (close_vec[2:n] - close_vec[1:(n-1)])/close_vec[1:(n-1)]
# Express in percent
retns_vec <- 100*retns_vec
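
For comparison, the same computation can be sketched with *diff()*, which avoids explicit index ranges and the operator-precedence pitfall of writing *1:n-1*. The prices below are made up:

```r
close_vec <- c(100, 110, 99)
n <- length(close_vec)

# Precedence pitfall: ":" binds tighter than "-"
1:n-1      # (1:n) - 1, i.e. 0 1 2
1:(n-1)    # 1 2

# Percentage returns, with 0 for the first observation
retns_vec <- c(0, 100 * diff(close_vec) / head(close_vec, -1))
retns_vec
```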

Check that the lengths match, then add the column:

In [71]:
length(retns_vec) == length(df_X$date)

In [73]:
# Add returns column
df_X <- data.frame(df_X, rtrn = retns_vec)
df_X <- df_X[, c("date", "close", "rtrn", "tic")]
# Display
head(df_X)

Unnamed: 0_level_0,date,close,rtrn,tic
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>,<chr>
3,2023-03-01 16:30:00,40.445,0.0,PFE
10,2023-03-02 16:30:00,40.295,-0.3708778,PFE
17,2023-03-03 16:30:00,40.965,1.6627423,PFE
24,2023-03-06 16:30:00,41.185,0.5370468,PFE
31,2023-03-07 16:30:00,40.35,-2.0274441,PFE
38,2023-03-08 16:30:00,40.03,-0.79306,PFE


### (6) Merging dataframes

Load the "JNJ" prices

In [4]:
df_JNJ <- read.csv(file = "./datasets/JNJ_30min_2303-2306.csv")

In [7]:
class(timestamp_vec)

In [8]:
df_X1 <- data.frame(df_PFE[df_PFE$date %in% timestamp_vec, c("date", "close", "volume")])
df_X2 <- data.frame(df_JNJ[df_JNJ$date %in% timestamp_vec, c("date", "close", "volume")])

In [9]:
df_X1 <- data.frame(df_X1, tic = "PFE")
df_X2 <- data.frame(df_X2, tic = "JNJ")

In [10]:
head(df_X1)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
3,2023-03-01 16:30:00,40.445,1729663,PFE
10,2023-03-02 16:30:00,40.295,1849920,PFE
17,2023-03-03 16:30:00,40.965,1685599,PFE
24,2023-03-06 16:30:00,41.185,1870325,PFE
31,2023-03-07 16:30:00,40.35,3756535,PFE
38,2023-03-08 16:30:00,40.03,2351311,PFE


In [11]:
colnames(df_X1)==colnames(df_X2)

Merging on the date column with *merge()* gives:

In [20]:
df_Y <- merge(df_X1, df_X2, by = c("date"))

In [21]:
head(df_Y)

Unnamed: 0_level_0,date,close.x,volume.x,tic.x,close.y,volume.y,tic.y
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<dbl>,<int>,<chr>
1,2023-03-01 16:30:00,40.445,1729663,PFE,152.22,610970,JNJ
2,2023-03-02 16:30:00,40.295,1849920,PFE,151.59,672695,JNJ
3,2023-03-03 16:30:00,40.965,1685599,PFE,153.425,398069,JNJ
4,2023-03-06 16:30:00,41.185,1870325,PFE,155.4283,603910,JNJ
5,2023-03-07 16:30:00,40.35,3756535,PFE,153.65,853414,JNJ
6,2023-03-08 16:30:00,40.03,2351311,PFE,153.1182,514777,JNJ
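
The default ".x" and ".y" suffixes can be replaced with more descriptive ones through the *suffixes* argument of *merge()*. A sketch on toy data (values are made up):

```r
df_p <- data.frame(date = c("d1", "d2"), close = c(40.4, 40.3))
df_j <- data.frame(date = c("d1", "d2"), close = c(152.2, 151.6))

# Label the clashing columns by ticker instead of .x / .y
df_m <- merge(df_p, df_j, by = "date", suffixes = c(".PFE", ".JNJ"))
names(df_m)
```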


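By default *merge()* performs an inner join on the *by* columns and places the remaining columns side by side (hence the ".x" and ".y" suffixes). To also keep rows that have no match in the other dataframe, use the *all.x* and *all.y* parameters. Merging on all columns while keeping all rows from both dataframes gives the following: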

In [22]:
df_Y1 <- merge(df_X1, df_X2, by = colnames(df_X1), all.x = TRUE, all.y = TRUE)

In [23]:
head(df_Y1)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
1,2023-03-01 16:30:00,40.445,1729663,PFE
2,2023-03-01 16:30:00,152.22,610970,JNJ
3,2023-03-02 16:30:00,40.295,1849920,PFE
4,2023-03-02 16:30:00,151.59,672695,JNJ
5,2023-03-03 16:30:00,40.965,1685599,PFE
6,2023-03-03 16:30:00,153.425,398069,JNJ
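
The effect of *all.x* and *all.y* is easier to see when the keys only partially overlap. Here is a sketch on two toy data frames (made-up keys and values):

```r
df_l <- data.frame(date = c("d1", "d2", "d3"), a = 1:3)
df_r <- data.frame(date = c("d2", "d3", "d4"), b = 4:6)

inner_df <- merge(df_l, df_r, by = "date")                # keys present in both
left_df  <- merge(df_l, df_r, by = "date", all.x = TRUE)  # all keys of df_l
outer_df <- merge(df_l, df_r, by = "date", all = TRUE)    # all keys of either

nrow(inner_df)
nrow(left_df)
outer_df
```

Unmatched cells are filled with *NA*, which connects to the missing-data topic in the to-do list.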


### (7) Concatenating dataframes and sorting

Vertical concatenation is done using the *rbind()* function.

In [122]:
df_Y <- rbind(df_X1, df_X2)

In [123]:
head(df_Y)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
3,2023-03-01 16:30:00,40.445,1729663,PFE
10,2023-03-02 16:30:00,40.295,1849920,PFE
17,2023-03-03 16:30:00,40.965,1685599,PFE
24,2023-03-06 16:30:00,41.185,1870325,PFE
31,2023-03-07 16:30:00,40.35,3756535,PFE
38,2023-03-08 16:30:00,40.03,2351311,PFE


In [124]:
tail(df_Y)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
4101,2023-05-23 16:30:00,156.88,422191,JNJ
4171,2023-05-24 16:30:00,156.61,334051,JNJ
4241,2023-05-25 16:30:00,154.365,453712,JNJ
4311,2023-05-26 16:30:00,154.445,636958,JNJ
4381,2023-05-30 16:30:00,153.8217,464895,JNJ
4451,2023-05-31 16:30:00,154.33,356531,JNJ


Sorting is done by indexing the rows with the *order()* function applied to the columns. This is analogous to Pandas' *sort_values(by=**col_list**)*:

In [126]:
df_Y <- df_Y[order(df_Y$date, df_Y$tic),]

In [127]:
head(df_Y)

Unnamed: 0_level_0,date,close,volume,tic
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>
32,2023-03-01 16:30:00,152.22,610970,JNJ
3,2023-03-01 16:30:00,40.445,1729663,PFE
101,2023-03-02 16:30:00,151.59,672695,JNJ
10,2023-03-02 16:30:00,40.295,1849920,PFE
171,2023-03-03 16:30:00,153.425,398069,JNJ
17,2023-03-03 16:30:00,40.965,1685599,PFE
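
After *rbind()* and *order()*, the row names no longer run from 1 to the number of rows. If that matters, they can be reset by assigning *NULL*. A sketch on toy data:

```r
df_a <- data.frame(date = c("d1", "d2"), tic = "PFE")
df_b <- data.frame(date = c("d1", "d2"), tic = "JNJ")

# Stack, then sort by date and ticker
df_s <- rbind(df_a, df_b)
df_s <- df_s[order(df_s$date, df_s$tic), ]
rownames(df_s)   # row names follow the original positions

# Reset to 1, 2, 3, ...
rownames(df_s) <- NULL
df_s
```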


Suppose we want to concatenate along columns, which is done using *cbind()* in R. First, let's make a JNJ dataframe with data at 16:30 only:

In [26]:
# Init. timestamp vec (empty vec)
timestamp_vec <- c()
# PFE timestamps
data_timestamps <- df_PFE$date
# tgt_str 
tgt_str <- "16:30:00"

for (i in c(1:nrow(df_PFE))){
    if (substr(data_timestamps[i],12,19) == tgt_str){
        timestamp_vec<-append(timestamp_vec, data_timestamps[i])
        }
    }
# Shorten dataframe
df_X2 <- data.frame(df_JNJ[df_JNJ$date %in% timestamp_vec,])
df_X2 <- df_X2[, colnames(df_X2) != "X"]
head(df_X2)

Unnamed: 0_level_0,date,volume,high,low,open,close
Unnamed: 0_level_1,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
3,2023-03-01 16:30:00,610970,152.69,152.2,152.34,152.22
10,2023-03-02 16:30:00,672695,152.07,151.54,151.585,151.59
17,2023-03-03 16:30:00,398069,153.72,153.21,153.28,153.425
24,2023-03-06 16:30:00,603910,155.65,155.03,155.21,155.4283
31,2023-03-07 16:30:00,853414,154.23,153.35,154.19,153.65
38,2023-03-08 16:30:00,514777,153.4799,152.85,153.26,153.1182


Next, make a dataframe with JNJ close return and ticker symbol columns:

In [27]:
# Make close returns vector for df_JNJ (as fractions, not percent)
clpr_vec <- array(df_X2$close, dim = nrow(df_X2))
rtrn_vec <- array(0, dim = nrow(df_X2))
n <- length(clpr_vec)
# Parenthesize 1:(n-1): without the parentheses, 1:n-1 means (1:n)-1
rtrn_vec[2:n] <- (clpr_vec[2:n] - clpr_vec[1:(n-1)])/clpr_vec[1:(n-1)]

# New data frame
df_Z2 <- data.frame(cl_rtrns = rtrn_vec, tic = "JNJ")

In [28]:
head(df_Z2)

Unnamed: 0_level_0,cl_rtrns,tic
Unnamed: 0_level_1,<dbl>,<chr>
1,0.0,JNJ
2,-0.004138779,JNJ
3,0.012105065,JNJ
4,0.013057167,JNJ
5,-0.011441321,JNJ
6,-0.003461109,JNJ


In [29]:
# Concatenate horizontally/along columns
df_Y2 <- cbind(df_X2, df_Z2)
head(df_Y2)

Unnamed: 0_level_0,date,volume,high,low,open,close,cl_rtrns,tic
Unnamed: 0_level_1,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
3,2023-03-01 16:30:00,610970,152.69,152.2,152.34,152.22,0.0,JNJ
10,2023-03-02 16:30:00,672695,152.07,151.54,151.585,151.59,-0.004138779,JNJ
17,2023-03-03 16:30:00,398069,153.72,153.21,153.28,153.425,0.012105065,JNJ
24,2023-03-06 16:30:00,603910,155.65,155.03,155.21,155.4283,0.013057167,JNJ
31,2023-03-07 16:30:00,853414,154.23,153.35,154.19,153.65,-0.011441321,JNJ
38,2023-03-08 16:30:00,514777,153.4799,152.85,153.26,153.1182,-0.003461109,JNJ


### (8) Converting the date column to *POSIX*

(Continuing from sec. 5)

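R provides *as.Date()* and *as.POSIXlt()* for this. Note that going through *as.Date()* first keeps only the calendar date, so the time of day is dropped: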

In [74]:
df_Y <- data.frame(df_X)

In [79]:
df_Y$date <- as.Date(df_Y$date)
df_Y$date <- as.POSIXlt(df_Y$date)

In [81]:
head(df_Y)

Unnamed: 0_level_0,date,close,rtrn,tic
Unnamed: 0_level_1,<dttm>,<dbl>,<dbl>,<chr>
3,2023-03-01,40.445,0.0,PFE
10,2023-03-02,40.295,-0.3708778,PFE
17,2023-03-03,40.965,1.6627423,PFE
24,2023-03-06,41.185,0.5370468,PFE
31,2023-03-07,40.35,-2.0274441,PFE
38,2023-03-08,40.03,-0.79306,PFE
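
To keep the time of day, the strings can be parsed directly with *as.POSIXct()* and an explicit format. The sketch below uses two made-up timestamps; the choice of *tz = "UTC"* is an assumption about how the data are stamped, not something established in this notebook:

```r
date_chr <- c("2023-03-01 16:30:00", "2023-03-02 16:30:00")

# Parse date and time in one step (time zone is an assumption)
date_ct <- as.POSIXct(date_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(date_ct)
format(date_ct, "%H:%M")

# Date-time arithmetic now works
difftime(date_ct[2], date_ct[1], units = "hours")
```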


### (9) Converting the date column to row names

Row names are always stored as character strings, so this coerces the dates to characters:

In [82]:
df_X <- data.frame(df_Y)

In [85]:
rownames(df_X) <- df_X$date

In [87]:
df_X <- df_X[!(colnames(df_X) %in% c("date"))]

In [88]:
head(df_X)

Unnamed: 0_level_0,close,rtrn,tic
Unnamed: 0_level_1,<dbl>,<dbl>,<chr>
2023-03-01,40.445,0.0,PFE
2023-03-02,40.295,-0.3708778,PFE
2023-03-03,40.965,1.6627423,PFE
2023-03-06,41.185,0.5370468,PFE
2023-03-07,40.35,-2.0274441,PFE
2023-03-08,40.03,-0.79306,PFE


In [89]:
class(rownames(df_X))
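
Since the row names are characters, rows can be looked up by their date label, and the labels can be converted back to dates when needed. A sketch on a toy data frame (made-up values):

```r
df_toy <- data.frame(close = c(40.445, 40.295),
                     row.names = c("2023-03-01", "2023-03-02"))

class(rownames(df_toy))

# Look up a row by its label
df_toy["2023-03-02", "close"]

# Recover Date objects from the labels
as.Date(rownames(df_toy))
```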