---
title: New Package yfR
author:
- Marcelo S. Perlin
date: '2022-07-26'
slug: package-yfr
categories: []
tags:
- yfR
- yahoo-finance
- stocks
- community
- packages
- data-access
- data-extraction
- software-peer-review
package_version: 1.0.0
description: A simple tutorial for package yfR
twitterImg: blog/2022/07/26/package-yfr/manystocks-1.png
twitterAlt: Free stock price data with yfR
tweet: New package yfR by @msperlin!
output:
  html_document:
    keep_md: yes
---
```{r setup, include=FALSE}
# Options to have images saved in the post folder
# And to disable symbols before output
knitr::opts_chunk$set(fig.path = "", comment = "")
# knitr hook to make images output use Hugo options
knitr::knit_hooks$set(
  plot = function(x, options) {
    hugoopts <- options$hugoopts
    paste0(
      "{{<figure src=",
      '"', x, '" ',
      if (!is.null(hugoopts)) {
        glue::glue_collapse(
          glue::glue('{names(hugoopts)}="{hugoopts}"'),
          sep = " "
        )
      },
      ">}}\n"
    )
  }
)
# knitr hook to use Hugo highlighting options
knitr::knit_hooks$set(
  source = function(x, options) {
    hlopts <- options$hlopts
    paste0(
      "```r ",
      if (!is.null(hlopts)) {
        paste0("{",
          glue::glue_collapse(
            glue::glue('{names(hlopts)}={hlopts}'),
            sep = ","
          ), "}"
        )
      },
      "\n", glue::glue_collapse(x, sep = "\n"), "\n```\n"
    )
  }
)
```
Package yfR recently passed [peer review at rOpenSci](https://github.com/ropensci/software-review/issues/523) and is all about downloading stock price data from [Yahoo Finance (YF)](https://finance.yahoo.com/). I wrote this package to solve a particular problem I had as a teacher: I needed a large volume of clean stock price data to use in my classes, either for explaining how financial markets work or for class exercises. While there are several R packages to import raw data from YF, none solved my problem.
Package yfR facilitates the importation of data, organizing it in the `tidy` format and speeding up the process using a cache system and parallel computing. yfR is a backwards-incompatible substitute for [BatchGetSymbols](https://CRAN.R-project.org/package=BatchGetSymbols), released in 2016 (see vignette [yfR and BatchGetSymbols](https://docs.ropensci.org/yfR/articles/diff-batchgetsymbols.html) for details).
# Introducing yfR
[Yahoo Finance](https://finance.yahoo.com/) provides a vast repository of stock price data around the globe. It covers a significant number of markets and assets, and is therefore used extensively in academic research and teaching. In order to import the financial data from YF, all you need is a ticker (id of a stock, e.g. "GM" for [General Motors](https://finance.yahoo.com/quote/GM?p=GM&.tsrc=fin-srch)) and a time period -- first and last date.
## Features of yfR
Package yfR distinguishes itself from other similar packages with the following features:
- Fetches daily/weekly/monthly/annual stock prices/returns from Yahoo Finance and outputs a dataframe (tibble) in the long format (stacked data);
- A feature called **collections** facilitates the download of multiple tickers from a particular market/index. You can, for example, download data for all stocks in the SP500 index with a simple call to `yf_collection_get("SP500")`;
- A session-persistent smart cache system is available by default. This means that the data is saved locally and only missing portions are downloaded, if needed;
- All dates are compared to a benchmark index such as SP500 (^GSPC) and, whenever an individual asset does not have a sufficient number of dates, the software drops it from the output. This means you can choose to ignore tickers with a high proportion of missing dates;
- A customized function called `yf_convert_to_wide()` can transform the long dataframe into a wide format (tickers as columns), which is widely used in portfolio optimization. The output is a list where each element is a different target variable (prices, returns, volumes);
- Parallel computing with package [furrr](https://furrr.futureverse.org/) is available, speeding up the data importation process.
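
To make the long versus wide distinction concrete, here is a minimal sketch using base R's `reshape()` on synthetic data. It is not how `yf_convert_to_wide()` is implemented; the tickers `AAA` and `BBB` and their prices are made up for illustration.

```r
# Synthetic long-format data mimicking the shape of a yf_get() result
# (hypothetical tickers and prices, for illustration only)
df_long <- data.frame(
  ticker = rep(c("AAA", "BBB"), each = 3),
  ref_date = rep(as.Date("2022-01-03") + 0:2, times = 2),
  price_adjusted = c(10, 11, 12, 50, 49, 51)
)

# Base R reshape: one row per date, one column per ticker
df_wide <- reshape(
  df_long,
  idvar = "ref_date",
  timevar = "ticker",
  direction = "wide"
)

# Tidy up the column names produced by reshape ("price_adjusted.AAA" -> "AAA")
names(df_wide) <- sub("^price_adjusted\\.", "", names(df_wide))
rownames(df_wide) <- NULL
df_wide
```

In the wide table each row is a date and each column is a ticker, which is the layout most portfolio optimization routines expect.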
## Available columns
The main function of the package, `yfR::yf_get`, returns a dataframe with the financial data. All price data is measured in the currency of the exchange where the asset trades. For example, price data for GM (NYSE/US) is measured in US dollars, while price data for PETR3.SA (B3/BR) is measured in Reais (Brazilian currency).
The returned data contains the following columns:
- `ticker`: The requested tickers (ids of stocks);
- `ref_date`: The reference day (this can also be year/month/week when using argument `freq_data`);
- `price_open`: The opening price of the day/period;
- `price_high`: The highest price of the day/period;
- `price_low`: The lowest price of the day/period;
- `price_close`: The closing/last price of the day/period;
- `volume`: The financial volume of the day/period, in the unit of the exchange;
- `price_adjusted`: The stock price adjusted for corporate events such as splits, dividends and others -- this is usually what you want/need for studying stocks as it represents the **real** financial performance for stockholders;
- `ret_adjusted_prices`: The arithmetic or log return (see input `type_return`) for the adjusted stock prices;
- `ret_closing_prices`: The arithmetic or log return (see input `type_return`) for the closing stock prices;
- `cumret_adjusted_prices`: The accumulated arithmetic/log return for the period (starts at 100%).
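
As a rough illustration of what the return columns contain, the sketch below computes arithmetic returns, log returns and the accumulated return from a short price vector. The prices are hypothetical and the code is only a sketch of the definitions, not yfR's internal implementation.

```r
# Hypothetical adjusted prices for one ticker (illustration only)
price_adjusted <- c(100, 110, 99, 108.9)

# Arithmetic return: p_t / p_{t-1} - 1 (the first observation has no return)
ret_arit <- c(NA, price_adjusted[-1] / head(price_adjusted, -1) - 1)

# Log return: log(p_t / p_{t-1})
ret_log <- c(NA, diff(log(price_adjusted)))

# Accumulated arithmetic return, starting at 1 (i.e. 100%);
# the missing first return is treated as zero
cumret <- cumprod(1 + ifelse(is.na(ret_arit), 0, ret_arit))

data.frame(price_adjusted, ret_arit, ret_log, cumret)
```

Because the accumulated return starts at 100%, it always equals the current price divided by the first price, which makes assets with very different price levels directly comparable.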
# Installation
Package yfR is available in its stable version on CRAN, but you can also find the latest features and bug fixes on GitHub and in the rOpenSci repository. Below you can find the R commands for installation in each case.
```
# CRAN (stable)
install.packages('yfR')
# GitHub (dev version)
devtools::install_github('ropensci/yfR')
# rOpenSci
install.packages("yfR", repos = c("https://ropensci.r-universe.dev", "https://cloud.r-project.org"))
```
# Examples of usage
## The SP500 historical performance
In this example we are going to download price data for the SP500 index from 1950 to today (`r Sys.Date()`), analyze its financial performance and also visualize its prices using `ggplot2`.
```{r, message=FALSE}
library(yfR)
library(lubridate) # for date manipulations
library(dplyr) # for data manipulations
# set options for algorithm
my_ticker <- '^GSPC'
first_date <- "1950-01-01"
last_date <- Sys.Date()
# fetch data
df_yf <- yf_get(tickers = my_ticker,
                first_date = first_date,
                last_date = last_date)
# output is a tibble with data
glimpse(df_yf)
```
The output of yfR is a tibble (dataframe) with the stock price data. We can use it to 1) get the number of years within the data, and 2) calculate the total financial performance of the index over the period:
```{r}
n_years <- interval(min(df_yf$ref_date),
                    max(df_yf$ref_date)) / years(1)
total_return <- last(df_yf$price_adjusted) / first(df_yf$price_adjusted) - 1
cat(paste0("n_years = ", n_years, "\n",
           "total_return = ", total_return))
```
On `r min(df_yf$ref_date)`, the index was valued at `r dplyr::first(df_yf$price_adjusted)`. Today (`r Sys.Date()`), after roughly `r floor(n_years)` years, the value of the index is `r dplyr::last(df_yf$price_adjusted)`. The total return for the SP500, without accounting for inflation, is equivalent to an impressive `r scales::percent(total_return)`! Overall, anyone holding stocks for that long has done very well financially.
We can also calculate performance as the compounded annual return, which is the usual figure reported when looking at stocks over the long run:
```{r}
ret_comp <- (1 + total_return)^(1/n_years) - 1
cat(paste0("Comp Return = ",
scales::percent(ret_comp, accuracy = 0.01)))
```
Over the `r floor(n_years)` years of data, the SP500 index delivered a compounded annual return of `r scales::percent(ret_comp, accuracy = 0.01)`. This is in line with the roughly 8% per year commonly reported in the media.
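
A quick sanity check on the formula, using round hypothetical numbers rather than the downloaded data: the compounded annual return is the constant yearly rate that, applied over the whole period, reproduces the total return. The variable names ending in `_ex` are illustrative only.

```r
# Hypothetical figures: an asset that quadruples
# (total return of 300%) over 20 years
total_return_ex <- 3
n_years_ex <- 20

# Compounded annual return: the constant yearly rate
# that reproduces the total return
ret_comp_ex <- (1 + total_return_ex)^(1 / n_years_ex) - 1

# Compounding that rate over the full period recovers the total return
recovered <- (1 + ret_comp_ex)^n_years_ex - 1

cat("Compounded annual return:", round(ret_comp_ex, 4), "\n")
cat("Recovered total return:", recovered, "\n")
```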
To visualize the data, we can use a log plot and see the value of the SP500 index over time:
```{r sp500-01, message = FALSE, hugoopts=list(alt="Black and white line graph showing the SP500 index value increasing over time. The x axis is time from 1950 to 2020 and the y axis is on a log scale and shows index values increasing from <30 to >3000.", caption="SP500 index value since 1950", width=600)}
library(ggplot2)
p <- ggplot(df_yf, aes(x = ref_date, y = price_adjusted)) +
  geom_line() +
  labs(
    title = paste0("SP500 Index Value (",
                   year(min(df_yf$ref_date)), ' - ',
                   year(max(df_yf$ref_date)), ")"
    ),
    x = "Time",
    y = "Index Value",
    caption = "Data from Yahoo Finance <https://finance.yahoo.com/>") +
  theme_light() +
  scale_y_log10()
p
```
## Performance of many stocks
In this second example, instead of using a single stock/index, we will investigate the financial performance of a set of ten stocks using `dplyr`. First, let's download the current composition of the SP500 index and select 10 random stocks.
```{r}
set.seed(20220713)
n_tickers <- 10
df_sp500 <- yf_index_composition("SP500")
rnd_tickers <- sample(df_sp500$ticker, n_tickers)
cat(paste0("The selected tickers are: ",
paste0(rnd_tickers, collapse = ", ")))
```
And now we fetch the data using `yfR::yf_get`:
```{r, message=FALSE}
df_yf <- yf_get(tickers = rnd_tickers,
                first_date = '2010-01-01',
                last_date = Sys.Date())
```
Out of the `r n_tickers` stocks, one was left out due to the high number of missing days. Internally, `yf_get` compares the dates of every ticker to a benchmark time series, in this case the SP500 index itself (see `yf_get`'s argument `bench_ticker`). Whenever the proportion of benchmark dates available for a ticker falls below the threshold (default `thresh_bad_data = 0.75`), the algorithm drops the ticker from the output. In the end, we are left with just nine stocks.
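
The filtering idea can be sketched with synthetic dates. This is only an illustration, not yfR's internal code: the tickers and dates are made up, and the sketch assumes a ticker is kept when its share of benchmark dates is at least the threshold.

```r
# Hypothetical benchmark calendar: 10 trading days
bench_dates <- as.Date("2022-01-03") + 0:9

# Hypothetical date coverage for two tickers
ticker_dates <- list(
  AAA = bench_dates,       # complete history
  BBB = bench_dates[1:5]   # only half of the benchmark dates
)

thresh_bad_data <- 0.75

# Proportion of benchmark dates that each ticker actually has
prop_valid <- sapply(ticker_dates, function(d) mean(bench_dates %in% d))

# Keep only tickers whose coverage reaches the threshold
kept <- names(prop_valid)[prop_valid >= thresh_bad_data]
kept
```

Here `BBB` covers only half of the benchmark dates and is dropped, while `AAA` is kept.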
First, let's look at their accumulated return over time:
```{r manystocks, hugoopts=list(alt="Line graph showing the accumulated returns of 9 stocks on the SP500 index value. The x axis shows time running from 2010 to 2022, while the y axis shows accumulated return (from 100%) ranging from 0.1 to > 10 on a log scale. Three stocks show sharply increasing patterns, four show moderately increasing patterns and two show fluctuating horizontal trends.", caption="Accumulated Return of 9 stocks", width=600)}
library(ggplot2)
p <- ggplot(df_yf,
            aes(x = ref_date,
                y = cumret_adjusted_prices,
                color = ticker)) +
  geom_line() +
  labs(
    title = paste0("Accumulated Return of SP500 Stocks (",
                   year(min(df_yf$ref_date)), ' - ',
                   year(max(df_yf$ref_date)), ")"
    ),
    x = "Time",
    y = "Accumulated Return (from 100%)",
    caption = "Data from Yahoo Finance <https://finance.yahoo.com/>") +
  theme_light() +
  scale_y_log10()
p
```
As we can see, some stocks, such as AMZN and AAPL, did much better than others. We can check this numerically by reporting their compounded return over the period:
```{r, message = FALSE}
library(dplyr)
tab_perf <- df_yf |>
  group_by(ticker) |>
  summarise(
    n_years = interval(min(ref_date),
                       max(ref_date)) / years(1),
    total_ret = last(price_adjusted) / first(price_adjusted) - 1,
    ret_comp = (1 + total_ret)^(1 / n_years) - 1
  )
tab_perf |>
  mutate(n_years = floor(n_years),
         total_ret = scales::percent(total_ret),
         ret_comp = scales::percent(ret_comp)) |>
  knitr::kable(caption = "Financial Performance of Several Stocks")
```
# Final thoughts
Package yfR was created to facilitate the importation and organization of YF data sets. In the examples of this post, we can see how easy it is to download the data and compute some simple performance statistics. We have only scratched the surface: there are many ways to analyze stock data beyond financial performance.
# Acknowledgements
Package yfR was [reviewed](https://github.com/ropensci/software-review/issues/523) by [Alexander Fischer](https://github.com/s3alfisc) and [Nic Crane](https://github.com/thisisnic), and I'm very grateful for their feedback, which improved the package significantly. I'm also grateful to [Joshua Ulrich](https://www.quantmod.com/), the maintainer of [quantmod](https://www.quantmod.com/), who wrote `quantmod::getSymbols`, the main function used by `yfR::yf_get`.