This package allows users to simulate commodity futures data from two models, the Schwartz and Smith two-factor model (Schwartz & Smith, 2000) and the polynomial diffusion model (Filipovic & Larsson, 2016), through either a GUI or R scripts. Additionally, it estimates the hidden state variables and contract prices using the Kalman Filter (KF), Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF).
PDSim can be accessed in two ways:
- You can use PDSim on the Shiny server. This way, you don't need to have R installed on your computer. Just go to https://peilunhe.shinyapps.io/pdsim/ and use it there.
- Alternatively, you can download and run PDSim locally by running the following R code:
# install.packages("devtools") # uncomment if you do not have devtools installed
devtools::install_github("peilun-he/PDSim", build_vignettes = TRUE)
PDSim::run_app()
A tutorial on how to use this app is available by running the following code and selecting "PDSim app tutorial":
browseVignettes("PDSim")
For users who prefer not to modify their system, we also provide a Docker installation. Please follow these steps:
- Make sure you have Docker installed on your machine. Then download all files.
- From your terminal, in the directory where the Dockerfile is located, run:
docker build -t pdsim .
to build the image. For MacBook users with M1/M2/M3 chips, if building the image fails with a "no match for platform in manifest" error, please run this instead:
docker build -t pdsim . --platform=linux/amd64 --no-cache
- To start a container, run
docker run -p 8787:8787 --name pdsim1 pdsim
- Open a browser and go to localhost:8787. The default username is rstudio, and the randomly generated password is printed in the terminal.
- Finally, you can use PDSim by running
PDSim::run_app()
in the opened RStudio Server.
The graphical user interface (GUI) is an easy way for everyone to use the PDSim package, even without any programming knowledge. Simply enter all the necessary parameters; PDSim will simulate the data and provide interactive visualisations. Currently, PDSim can simulate data from two models, the Schwartz and Smith two-factor model (Schwartz & Smith, 2000) and the polynomial diffusion model (Filipovic & Larsson, 2016). In this section, we explain how to use the GUI to simulate data. A detailed description of the two models is available in Model Description.
First, we set the global configuration, such as the number of observations (trading days) and the number of contracts. We also select the model from which the simulated data are generated.
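For the Schwartz-Smith model (Schwartz & Smith, 2000), we assume the logarithm of the spot price $S_t$ is the sum of two hidden factors, a short-term deviation $\chi_t$ and a long-term equilibrium level $\xi_t$, i.e., $\log(S_t) = \chi_t + \xi_t$.
If users have special requirements for the standard errors, they should use an R script.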
Finally, all the simulated data are downloadable. Please click the Download prices and Download maturities buttons to download the futures price and maturity data.
Please note that, even though Schwartz and Smith (2000) model the logarithm of the spot price, all data downloaded or plotted are actual prices; they have been exponentiated.
The other button, Generate new data, is designed for users who want to simulate multiple realisations from the same set of parameters. Each click draws a new set of random noises, so the futures prices change as well. This button is not needed if users only need one realisation. The data are updated automatically whenever you change any parameter.
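In an R script, the effect of Generate new data can be reproduced by calling simulate_data with different seeds. The sketch below assumes the argument order used in the R-script examples later in this document (the last argument is the seed) and reuses their Schwartz-Smith setup; the seed values themselves are arbitrary.

```r
library(PDSim)

n_obs <- 100
n_contract <- 10
dt <- 1/360
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
         seq(from = 0.1, to = 0.01, length.out = n_contract))
x0 <- c(0, 1/0.3)
func_f <- function(xt, par) state_linear(xt, par, dt)
func_g <- function(xt, par, mats) measurement_linear(xt, par, mats)

# Same parameters, different seeds: each element is an independent realisation
seeds <- c(1234, 2024, 42)
realisations <- lapply(seeds, function(s)
  simulate_data(par, x0, n_obs, n_contract, func_f, func_g, 0, "Gaussian", s))

# measurement_linear gives log prices; exponentiate to match what the GUI shows
prices <- lapply(realisations, function(d) exp(d$yt))
```

Reusing one seed should return identical data, which mirrors the fixed-seed behaviour of the GUI.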
The procedure for simulating data from the polynomial diffusion model (Filipovic & Larsson, 2016) closely resembles that of the Schwartz and Smith model (Schwartz & Smith, 2000). Nevertheless, it involves the specification of additional parameters.
First, let's look at the difference between these two models. Both the polynomial diffusion model (Filipovic & Larsson, 2016) and the Schwartz and Smith model (Schwartz & Smith, 2000) assume that the spot price $S_t$ is driven by two hidden factors, $\chi_t$ and $\xi_t$.
All other procedures are the same as the Schwartz and Smith model (Schwartz & Smith, 2000).
- Once users enter all parameters, the data will be generated automatically. Users do NOT need to click any buttons. However, if users wish to generate more realisations under the same set of parameters, please click the 'Generate new data' button.
- The seed to generate random numbers is fixed, i.e., for the same set of parameters, users will get exactly the same data every time they use PDSim.
- Futures prices in all tables / plots are REAL prices (NOT the logarithm), no matter which model is used.
- The 95% confidence interval is shown as a grey ribbon on each plot.
- Because of a limitation of the filtering methods, the standard error of the estimated futures price on the first day is extremely large. All plots of contract estimates therefore start from the second day.
The GUI should suffice for most users. However, if you want more control over the simulated data, you can use R scripts. In this section, we discuss how to use the exported functions of this package to simulate data, and how to use the Kalman Filter (KF), Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) to estimate the hidden state variables.
Firstly, load the package:
library(PDSim)
If you don't have PDSim installed, please refer to Installation.
Next, we specify the necessary global setups:
n_obs <- 100 # number of observations
n_contract <- 10 # number of contracts
dt <- 1/360 # interval between two consecutive time points,
# where 1/360 represents daily data
Next, we specify the parameters. For the Schwartz-Smith model (Schwartz & Smith, 2000), there are no model coefficients.
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
seq(from = 0.1, to = 0.01, length.out = n_contract)) # set of parameters
x0 <- c(0, 1/0.3) # initial values of state variables
n_coe <- 0 # number of model coefficients
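The parameters are, in order: $\kappa$, $\gamma$, $\mu$, $\sigma_\chi$, $\sigma_\xi$, $\rho$, $\lambda_\chi$, $\lambda_\xi$, followed by the measurement errors (one per contract).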
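Because the parameters are read by position, it can help to label them when checking a configuration. The names below are hypothetical labels chosen for readability, following the order given in the comments of the R-script examples later in this document; PDSim itself does not require them.

```r
n_contract <- 10
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
         seq(from = 0.1, to = 0.01, length.out = n_contract))

# Illustrative labels only (hypothetical); the order is what matters
names(par) <- c("kappa", "gamma", "mu", "sigma_chi", "sigma_xi",
                "rho", "lambda_chi", "lambda_xi",
                paste0("meas_err_", seq_len(n_contract)))

par[c("kappa", "gamma", "rho")]  # quick check of selected entries
```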
Then, we specify the measurement and state equations. You can use the exported functions measurement_linear and state_linear directly, or write your own functions.
# state equation
func_f <- function(xt, par) state_linear(xt, par, dt)
# measurement equation
func_g <- function(xt, par, mats) measurement_linear(xt, par, mats)
Finally, we can simulate the futures price, time to maturity, and hidden state variables:
dat <- simulate_data(par, x0, n_obs, n_contract,
func_f, func_g, n_coe, "Gaussian", 1234)
log_price <- dat$yt # logarithm of futures price
mats <- dat$mats # time to maturity
xt <- dat$xt # state variables
Please note that measurement_linear returns the logarithm of the futures price (as required by the Schwartz and Smith model), so the simulated data are also log prices.
Additionally, we can estimate the hidden state variables using the Kalman Filter (KF):
# delivery_time is unnecessary as we don't have seasonality
est <- KF(par = c(par, x0), yt = log_price, mats = mats,
delivery_time = 0, dt = dt, smoothing = FALSE,
seasonality = "None")
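To show what a Kalman filter does at each time step, here is a generic base-R sketch of the predict/update recursion for a linear Gaussian state-space model $x_t = c + F x_{t-1} + w_t$, $y_t = d + H x_t + v_t$. This is not PDSim's KF implementation: in the Schwartz-Smith model the measurement intercept and loadings depend on each contract's time to maturity, whereas this sketch keeps $c$, $F$, $d$ and $H$ fixed for simplicity. All argument names are hypothetical.

```r
# y: n x p matrix of observations; x0, P0: initial state mean and covariance
kalman_filter <- function(y, c_vec, F_mat, Q, d_vec, H_mat, R, x0, P0) {
  n <- nrow(y)
  m <- length(x0)
  x_filt <- matrix(NA_real_, n, m)
  x <- x0
  P <- P0
  for (t in seq_len(n)) {
    # Predict the state one step ahead
    x_pred <- c_vec + F_mat %*% x
    P_pred <- F_mat %*% P %*% t(F_mat) + Q
    # Update with the observation at time t
    y_pred <- d_vec + H_mat %*% x_pred
    S <- H_mat %*% P_pred %*% t(H_mat) + R   # innovation covariance
    K <- P_pred %*% t(H_mat) %*% solve(S)    # Kalman gain
    x <- x_pred + K %*% (y[t, ] - y_pred)
    P <- (diag(m) - K %*% H_mat) %*% P_pred
    x_filt[t, ] <- x
  }
  x_filt
}
```

With almost no measurement noise, the filtered state should track the observations closely, which gives a quick sanity check.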
For the polynomial diffusion model (Filipovic & Larsson, 2016), we have to specify both parameters and model coefficients:
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
seq(from = 0.1, to = 0.01, length.out = n_contract)) # set of parameters
x0 <- c(0, 1/0.3) # initial values of state variables
n_coe <- 6 # number of model coefficients
par_coe <- c(1, 1, 1, 1, 1, 1) # model coefficients
Currently, PDSim supports polynomials of order 2, i.e., 6 model coefficients.
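The count of 6 follows from the number of monomials in two variables with total degree at most 2, which is $\binom{2+2}{2} = 6$. The sketch below evaluates a degree-2 polynomial spot price; the basis ordering $(1, \chi, \xi, \chi^2, \chi\xi, \xi^2)$ is an assumption made for illustration and may not match the order PDSim uses internally for par_coe.

```r
# Number of monomials in d variables with total degree <= n
n_basis <- function(n, d = 2) choose(n + d, d)
n_basis(2)  # 6, matching n_coe <- 6

# Assumed (hypothetical) basis ordering: (1, chi, xi, chi^2, chi*xi, xi^2)
basis_h <- function(chi, xi) c(1, chi, xi, chi^2, chi * xi, xi^2)

# Spot price as a polynomial of the state: S = sum_i alpha_i * H_i(chi, xi)
spot_poly <- function(chi, xi, alpha) sum(alpha * basis_h(chi, xi))

par_coe <- c(1, 1, 1, 1, 1, 1)
spot_poly(0, 1/0.3, par_coe)  # 1 + xi + xi^2 at chi = 0
```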
Then, we specify the measurement and state equations. Again, you can use the exported functions state_linear and measurement_polynomial.
# state equation
func_f <- function(xt, par) state_linear(xt, par, dt)
# measurement equation
func_g <- function(xt, par, mats) measurement_polynomial(xt, par, mats, 2, n_coe)
Finally, simulate the data:
dat <- simulate_data(c(par, par_coe), x0, n_obs, n_contract,
func_f, func_g, n_coe, "Gaussian", 1234)
price <- dat$yt # measurement_polynomial function returns the futures price
mats <- dat$mats # time to maturity
xt <- dat$xt # state variables
Note that measurement_polynomial returns the actual price, rather than the logarithm.
We can also estimate the hidden state variables using the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF):
est_EKF <- EKF(c(par, par_coe, x0), price, mats, func_f, func_g, dt, n_coe, "Gaussian")
est_UKF <- UKF(c(par, par_coe, x0), price, mats, func_f, func_g, dt, n_coe, "Gaussian")
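One way to compare the two filters is to map the filtered states back to contract prices and compute a per-contract RMSE, following the pattern used in the polynomial diffusion test later in this document. This sketch assumes the UKF output exposes xt_filter in the same form as the EKF output; if the setup matches that test, the EKF row should agree with the RMSE values quoted there.

```r
library(PDSim)

n_obs <- 100
n_contract <- 10
dt <- 1/360
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
         seq(from = 0.1, to = 0.01, length.out = n_contract))
x0 <- c(0, 1/0.3)
n_coe <- 6
par_coe <- c(1, 1, 1, 1, 1, 1)
func_f <- function(xt, par) state_linear(xt, par, dt)
func_g <- function(xt, par, mats) measurement_polynomial(xt, par, mats, 2, n_coe)

dat <- simulate_data(c(par, par_coe), x0, n_obs, n_contract,
                     func_f, func_g, n_coe, "Gaussian", 1234)
price <- dat$yt
mats <- dat$mats

est_EKF <- EKF(c(par, par_coe, x0), price, mats, func_f, func_g, dt, n_coe, "Gaussian")
est_UKF <- UKF(c(par, par_coe, x0), price, mats, func_f, func_g, dt, n_coe, "Gaussian")

# Per-contract RMSE between simulated prices and prices implied by filtered states
contract_rmse <- function(est) {
  yt_hat <- func_g(t(est$xt_filter), c(par, par_coe), mats)$y
  sqrt(colMeans((price - yt_hat)^2))
}
rmse <- rbind(EKF = contract_rmse(est_EKF), UKF = contract_rmse(est_UKF))
round(rmse, 4)
```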
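The spot price $S_t$ in the Schwartz and Smith model satisfies $\log(S_t) = \chi_t + \xi_t$, where the short-term deviation $\chi_t$ and the long-term equilibrium level $\xi_t$ follow
$$d\chi_t = -\kappa \chi_t \, dt + \sigma_\chi \, dW_t^\chi, \qquad d\xi_t = (\mu - \gamma \xi_t) \, dt + \sigma_\xi \, dW_t^\xi, \qquad dW_t^\chi \, dW_t^\xi = \rho \, dt.$$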
Theoretically, there are few constraints on parameters, apart from those outlined
above, where
Under the arbitrage-free assumption, the futures price
where
and
and we assume
Moreover, we assume
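The arbitrage-free futures price can be computed directly from the factor dynamics. The base-R sketch below assumes the standard Schwartz-Smith risk-neutral dynamics $d\chi_t = (-\kappa\chi_t - \lambda_\chi)\,dt + \sigma_\chi\,dW_t^{\chi*}$ and $d\xi_t = (\mu - \lambda_\xi - \gamma\xi_t)\,dt + \sigma_\xi\,dW_t^{\xi*}$ with correlation $\rho$, where $\gamma = 0$ recovers the original model. Under these assumptions $\log F_{t,T}$ equals the risk-neutral mean of $\log S_T$ plus half its variance. The function name and argument names are hypothetical and are not part of PDSim.

```r
# log F(t, T) = exp(-kappa*tau)*chi + exp(-gamma*tau)*xi + A(tau), tau = T - t
ss_log_futures <- function(chi, xi, tau, kappa, gamma, mu, sigma_chi, sigma_xi,
                           rho, lambda_chi, lambda_xi) {
  # (1 - exp(-a * tau)) / a, using its limit tau when a = 0 (e.g. gamma = 0)
  g <- function(a) if (a == 0) tau else (1 - exp(-a * tau)) / a
  # Risk-neutral drift contributions to the mean of log S_T
  mean_part <- -lambda_chi * g(kappa) + (mu - lambda_xi) * g(gamma)
  # Variance of log S_T = chi_T + xi_T given time-t information
  var_part <- sigma_chi^2 * g(2 * kappa) + sigma_xi^2 * g(2 * gamma) +
    2 * rho * sigma_chi * sigma_xi * g(kappa + gamma)
  exp(-kappa * tau) * chi + exp(-gamma * tau) * xi + mean_part + 0.5 * var_part
}

# Example: a small term structure using the parameters from the R-script section
ss_log_futures(0, 1/0.3, tau = c(0.1, 0.5, 1, 2), kappa = 0.5, gamma = 0.3,
               mu = 1, sigma_chi = 1.5, sigma_xi = 1.3, rho = -0.3,
               lambda_chi = 0.5, lambda_xi = 0.3)
```

At $\tau = 0$ the formula collapses to $\chi_t + \xi_t$, i.e., the futures price equals the spot price.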
Under the polynomial diffusion framework, the spot price
Now, consider any processes that follow the stochastic differential equation
Theorem 1: Let
Obviously, the hidden state vector
The basis
Then, by Theorem 1, the futures price
Therefore, we have the non-linear state-space model
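The key computation in the polynomial diffusion framework is Theorem 1 of Filipovic and Larsson (2016): for a polynomial $p(x) = H(x)^\top \vec{p}$, the conditional expectation is $E[p(X_T) \mid X_t] = H(X_t)^\top e^{G(T-t)} \vec{p}$, where $G$ is the matrix of the generator on the basis $H$. The base-R sketch below builds $G$ for the assumed basis $(1, \chi, \xi, \chi^2, \chi\xi, \xi^2)$ under the assumed risk-neutral dynamics $d\chi_t = (-\kappa\chi_t - \lambda_\chi)\,dt + \sigma_\chi\,dW_t^{\chi}$, $d\xi_t = (\mu - \lambda_\xi - \gamma\xi_t)\,dt + \sigma_\xi\,dW_t^{\xi}$ with correlation $\rho$. The basis order, dynamics and function names are illustrative assumptions, not PDSim's internal code; the matrix exponential uses scaling and squaring with a truncated Taylor series so that only base R is needed.

```r
# Generator matrix: column j holds the coordinates of L(H_j) in the basis H
pd_generator <- function(kappa, gamma, mu, sigma_chi, sigma_xi, rho,
                         lambda_chi, lambda_xi) {
  a <- -lambda_chi      # constant drift term of chi
  b <- mu - lambda_xi   # constant drift term of xi
  cov_cx <- rho * sigma_chi * sigma_xi
  G <- matrix(0, 6, 6)  # column 1 stays zero: L(1) = 0
  G[, 2] <- c(a, -kappa, 0, 0, 0, 0)                         # L(chi)
  G[, 3] <- c(b, 0, -gamma, 0, 0, 0)                         # L(xi)
  G[, 4] <- c(sigma_chi^2, 2 * a, 0, -2 * kappa, 0, 0)       # L(chi^2)
  G[, 5] <- c(cov_cx, b, a, 0, -(kappa + gamma), 0)          # L(chi * xi)
  G[, 6] <- c(sigma_xi^2, 0, 2 * b, 0, 0, -2 * gamma)        # L(xi^2)
  G
}

# Matrix exponential by scaling and squaring with a truncated Taylor series
mat_exp <- function(A, order = 20) {
  s <- max(0, ceiling(log2(max(1, norm(A, "1"))))) + 1
  A_s <- A / 2^s
  E <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in seq_len(order)) {
    term <- term %*% A_s / k
    E <- E + term
  }
  for (i in seq_len(s)) E <- E %*% E
  E
}

# Futures price F = H(x_t)^T exp(G * tau) alpha, with alpha the polynomial coefficients
pd_futures <- function(chi, xi, tau, alpha, G) {
  h <- c(1, chi, xi, chi^2, chi * xi, xi^2)
  drop(h %*% mat_exp(G * tau) %*% alpha)
}

G <- pd_generator(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3)
pd_futures(0, 1/0.3, 0.5, rep(1, 6), G)
```

Because the degree-1 part of the basis is invariant under $G$, choosing $\vec{p} = (0, 1, 0, 0, 0, 0)$ returns the risk-neutral mean of $\chi_T$, which gives a closed-form check.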
In this section, we explore various tests that users can employ to validate the full functionality of PDSim. Firstly, we introduce unit tests, which are accessible within the PDSim application. Next, we present replications of Schwartz and Smith's results, followed by individual tests for each model utilizing an R script. Finally, we offer real-world data applications to demonstrate the accuracy of PDSim.
Users can run a unit test under the "Unit Tests" tab of the PDSim navigation bar to ensure that all functionalities of PDSim are operating correctly. The test proceeds as follows. First, users define the desired number of trajectories and the relevant parameters. PDSim then simulates trajectories based on these specifications. Once the simulation is complete, KF/EKF/UKF is used to estimate the trajectories together with their 95% confidence intervals. Finally, the coverage rate, defined as the proportion of trajectories for which over 95% of points fall within the confidence interval, is computed. We expect the coverage rate to be above 95%, but it is affected by the measurement noise.
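The coverage rate described above can be computed in a few lines of base R. This is a sketch of the stated definition, not PDSim's code; it assumes one column per trajectory and reads "over 95%" as a strict inequality.

```r
# truth, lower, upper: n_obs x n_traj matrices (one column per trajectory)
coverage_rate <- function(truth, lower, upper, point_level = 0.95) {
  inside <- truth >= lower & truth <= upper
  prop_inside <- colMeans(inside)   # share of points inside the CI, per trajectory
  mean(prop_inside > point_level)   # share of trajectories meeting the criterion
}

# Toy usage with synthetic data
set.seed(1)
n_obs <- 100
n_traj <- 20
truth <- matrix(rnorm(n_obs * n_traj), n_obs, n_traj)
est <- truth + matrix(rnorm(n_obs * n_traj, sd = 0.1), n_obs, n_traj)
coverage_rate(truth, est - 1.96 * 0.1, est + 1.96 * 0.1)
```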
Users receive detailed feedback under the 'Results' tab panel. Moreover, PDSim generates two plots: one illustrating the trajectory with the highest coverage rate and another depicting the trajectory with the lowest coverage rate. Additionally, a table presents the coverage rate for each trajectory.
It's important to note a few considerations: firstly, for simplicity, only a single contract is simulated, regardless of the number specified by the user. Secondly, if the coverage rate falls below 95%, users are advised to either increase the number of trajectories or adjust parameters. Lastly, users are informed that extensive simulations may lead to longer processing times; for instance, generating results for 100 trajectories typically requires around 15 seconds on a standard laptop.
In this section, we reproduce Figure 1 and Figure 4 from Schwartz and Smith's paper using our own implementation.
The figure below displays the replication of Figure 1 from Schwartz and Smith's paper. This figure illustrates the mean simulated spot price.
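A figure of this kind summarises many simulated spot price paths. The base-R sketch below shows one way to produce such a summary: it simulates $\chi_t$ and $\xi_t$ with the exact one-step Gaussian transition of the two mean-reverting factors (assuming $d\chi_t = -\kappa\chi_t\,dt + \sigma_\chi\,dW_t^\chi$ and $d\xi_t = (\mu - \gamma\xi_t)\,dt + \sigma_\xi\,dW_t^\xi$), then takes the mean and a percentile band across paths. The parameter values are hypothetical placeholders, not Schwartz and Smith's estimates, and the 10th/90th percentiles are an illustrative choice.

```r
simulate_ss_spot <- function(n_paths, n_steps, dt, x0, kappa, gamma, mu,
                             sigma_chi, sigma_xi, rho) {
  g <- function(a) if (a == 0) dt else (1 - exp(-a * dt)) / a
  # Exact one-step covariance of the (chi, xi) innovations
  cov_cx <- rho * sigma_chi * sigma_xi * g(kappa + gamma)
  S <- matrix(c(sigma_chi^2 * g(2 * kappa), cov_cx,
                cov_cx, sigma_xi^2 * g(2 * gamma)), 2, 2)
  L <- t(chol(S))
  chi <- rep(x0[1], n_paths)
  xi <- rep(x0[2], n_paths)
  spot <- matrix(NA_real_, n_steps + 1, n_paths)
  spot[1, ] <- exp(chi + xi)
  for (i in seq_len(n_steps)) {
    z <- L %*% matrix(rnorm(2 * n_paths), 2, n_paths)  # correlated shocks
    chi <- exp(-kappa * dt) * chi + z[1, ]
    xi <- exp(-gamma * dt) * xi + mu * g(gamma) + z[2, ]
    spot[i + 1, ] <- exp(chi + xi)                     # S_t = exp(chi + xi)
  }
  spot
}

set.seed(1234)
spot <- simulate_ss_spot(1000, 360, 1/360, c(0, 1), kappa = 1.5, gamma = 0.3,
                         mu = 0.3, sigma_chi = 0.3, sigma_xi = 0.2, rho = -0.3)
mean_spot <- rowMeans(spot)
band <- t(apply(spot, 1, quantile, probs = c(0.1, 0.9), names = FALSE))
```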
Below is a plot depicting the polynomial diffusion model. In this model,
the spot price is represented as
Below are two plots replicating Figure 4 from Schwartz and Smith's paper. Since we lack access to their original data, we simulate trajectories using their estimated parameters instead. The first plot illustrates the simulated spot price alongside the estimated spot price, both within the 95% confidence interval of estimation. The second plot displays the estimated long-term component, also within the 95% confidence interval.
Below are two plots presenting the estimated spot price and long-term component for the polynomial diffusion model. Each point falls within the 95% confidence interval of the estimation.
Users can use the following code snippet to evaluate the performance of simulation and estimation for the Schwartz and Smith model:
library(ggplot2)
n_obs <- 100 # number of observations
n_contract <- 10 # number of contracts
dt <- 1/360 # interval between two consecutive time points,
# where 1/360 represents daily data
seed <- 1234 # seed for random number
# In the order of: kappa, gamma, mu, sigma_chi, sigma_xi,
# rho, lambda_chi, lambda_xi, measurement errors
par <- c(1.5, 1.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
seq(from = 0.01, to = 0.001, length.out = n_contract)) # set of parameters
x0 <- c(0, 1/0.3) # initial values of state variables
n_coe <- 0 # number of model coefficients
# state equation
func_f <- function(xt, par) state_linear(xt, par, dt)
# measurement equation
func_g <- function(xt, par, mats) measurement_linear(xt, par, mats)
sim <- simulate_data(par, x0, n_obs, n_contract,
func_f, func_g, n_coe, "Gaussian", seed)
log_price <- sim$yt # logarithm of futures price
mats <- sim$mats # time to maturity
xt <- sim$xt # state variables
# delivery_time is unnecessary as we don't have seasonality
est <- KF(par = c(par, x0), yt = log_price, mats = mats,
delivery_time = 0, dt = dt, smoothing = FALSE,
seasonality = "None")
yt_hat <- data.frame(exp(func_g(t(est$xt_filter), par, mats)$y))
# rmse should be:
# 0.1979, 0.1502, 0.1198, 0.0799, 0.0526
# 0.0422, 0.0283, 0.0195, 0.0102, 0.0031
rmse <- sqrt( colMeans((exp(sim$yt) - yt_hat)^2) )
round(rmse, 4)
The code should run without errors, and the resulting RMSE should match the values given in the comments. Moreover, users can plot the simulated and estimated futures prices of the first available contract using the following code:
cov_y <- est$cov_y # covariance matrix
contract <- 1 # 1st available contract
CI_lower <- qlnorm(0.025,
meanlog = log(yt_hat[, contract]),
sdlog = sqrt(cov_y[contract, contract, ]))
CI_upper <- qlnorm(0.975,
meanlog = log(yt_hat[, contract]),
sdlog = sqrt(cov_y[contract, contract, ]))
trunc <- 2 # start plotting at the 2nd observation (the 1st-day standard error is very large)
colors <- c("Simulated" = "black", "Estimated" = "red")
ggplot(mapping = aes(x = trunc: n_obs)) +
geom_line(aes(y = exp(log_price[trunc: n_obs, contract]),
color = "Simulated")) +
geom_line(aes(y = yt_hat[trunc: n_obs, contract], color = "Estimated")) +
geom_ribbon(aes(ymin = CI_lower[trunc: n_obs],
ymax = CI_upper[trunc: n_obs]),
alpha = 0.2) +
labs(x = "Dates", y = "Futures prices (SS)", color = "",
title = "Simulated vs estimated futures price
from Schwartz and Smith model") +
scale_color_manual(values = colors)
Users should expect to generate the following plot, which mirrors the "Simulated vs Estimated Contract" plot in the application. In the plot, the black curve denotes the simulated futures price, while the red curve denotes the estimated futures price. The grey ribbon shows the 95% confidence interval.
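The Schwartz and Smith code above uses qlnorm because the filter works on log prices: a symmetric Gaussian interval for the log price becomes an asymmetric interval for the price once exponentiated. The polynomial diffusion code instead uses a symmetric ±1.96 standard-error band directly on prices. A small base-R check of the lognormal relationship (the numbers are arbitrary):

```r
m <- log(50)  # mean of the log price
s <- 0.1      # standard deviation of the log price
ci_lnorm <- qlnorm(c(0.025, 0.975), meanlog = m, sdlog = s)
ci_exp <- exp(qnorm(c(0.025, 0.975), mean = m, sd = s))
all.equal(ci_lnorm, ci_exp)  # TRUE
ci_lnorm - 50                # the upper half-width exceeds the lower one
```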
Users can use the following code snippet to evaluate the performance of simulation and estimation for the polynomial diffusion model:
library(ggplot2)
n_obs <- 100 # number of observations
n_contract <- 10 # number of contracts
dt <- 1/360 # interval between two consecutive time points,
# where 1/360 represents daily data
seed <- 1234 # seed for random number
# In the order of: kappa, gamma, mu, sigma_chi, sigma_xi,
# rho, lambda_chi, lambda_xi, measurement errors
par <- c(0.5, 0.3, 1, 1.5, 1.3, -0.3, 0.5, 0.3,
seq(from = 0.1, to = 0.01, length.out = n_contract)) # set of parameters
x0 <- c(0, 1/0.3) # initial values of state variables
n_coe <- 6 # number of model coefficients
par_coe <- c(1, 1, 1, 1, 1, 1) # model coefficients
# state equation
func_f <- function(xt, par) state_linear(xt, par, dt)
# measurement equation
func_g <- function(xt, par, mats) measurement_polynomial(xt, par, mats, 2, n_coe)
sim <- simulate_data(c(par, par_coe), x0, n_obs, n_contract,
func_f, func_g, n_coe, "Gaussian", seed)
price <- sim$yt # measurement_polynomial function returns the futures price
mats <- sim$mats # time to maturity
xt <- sim$xt # state variables
est <- EKF(c(par, par_coe, x0), price, mats, func_f, func_g, dt, n_coe, "Gaussian")
yt_hat <- data.frame(func_g(t(est$xt_filter), c(par, par_coe), mats)$y)
# rmse should be:
# 0.0897, 0.0841, 0.0837, 0.0676, 0.0535,
# 0.0477, 0.0369, 0.0307, 0.0189, 0.0093
rmse <- sqrt( colMeans((sim$yt - yt_hat)^2) )
round(rmse, 4)
The code should run without errors, and the resulting RMSE should match the values given in the comments. Moreover, users can plot the simulated and estimated futures prices of the first available contract using the following code:
cov_y <- est$cov_y # covariance matrix
contract <- 1 # 1st available contract
CI_lower <- yt_hat[, contract] - 1.96 * sqrt(cov_y[contract, contract, ])
CI_upper <- yt_hat[, contract] + 1.96 * sqrt(cov_y[contract, contract, ])
trunc <- 2 # start plotting at the 2nd observation (the 1st-day standard error is very large)
colors <- c("Simulated" = "blue", "Estimated" = "green")
ggplot(mapping = aes(x = trunc: n_obs)) +
geom_line(aes(y = price[trunc: n_obs, contract], color = "Simulated")) +
geom_line(aes(y = yt_hat[trunc: n_obs, contract], color = "Estimated")) +
geom_ribbon(aes(ymin = CI_lower[trunc: n_obs],
ymax = CI_upper[trunc: n_obs]),
alpha = 0.2) +
labs(x = "Dates", y = "Futures prices (PD)", color = "",
title = "Simulated vs estimated futures price
from polynomial diffusion model") +
scale_color_manual(values = colors)
Users should expect to generate the following plot, which mirrors the "Simulated vs Estimated Contract" plot in the application. In the plot, the blue curve denotes the simulated futures price, while the green curve denotes the estimated futures price. The grey ribbon shows the 95% confidence interval.
In this section, we will illustrate the simulation accuracy through the following figure by demonstrating that with appropriate parameters, our simulated data closely aligns with the real data.
The black curve represents the WTI crude oil futures with a 1-month maturity, spanning from 1 November 2014 to 30 June 2015. First, we estimated the parameters from the real data. Parameter estimation falls outside the scope of PDSim, so we do not go into the details here; interested readers can refer to Ames et al. (2020), Cortazar et al. (2019), Cortazar and Naranjo (2006), Kleisinger-Yu et al. (2020), and Sørensen (2002). The futures prices estimated by the Schwartz and Smith model and the polynomial diffusion model are represented by the solid and dashed red lines, respectively. We then used the estimated parameters to simulate 1000 sample paths. In the generated plot, blue curves depict the Schwartz and Smith model, while green curves represent the polynomial diffusion model.
We used the following parameter values to simulate data from the Schwartz and Smith model:
From this plot, it is evident that, regardless of the model used for simulation, the percentile band consistently encompasses the actual price. Additionally, while the Schwartz and Smith model provides a more accurate point estimate, it also exhibits larger measurement errors. Consequently, its band is wider than that of the polynomial diffusion model.
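The claim that the band encompasses the actual price can be quantified as the share of days on which the observed price lies inside the band. A base-R sketch, assuming the simulated paths are stored as a days-by-paths matrix; the 2.5%/97.5% percentile levels and the function names are hypothetical choices, since the levels used for the plot are not stated here.

```r
# sim_paths: n_days x n_paths matrix of simulated prices
percentile_band <- function(sim_paths, probs = c(0.025, 0.975)) {
  t(apply(sim_paths, 1, quantile, probs = probs, names = FALSE))
}

# actual: length-n_days vector of observed prices
share_inside <- function(actual, band) {
  mean(actual >= band[, 1] & actual <= band[, 2])
}

# Toy usage: every day has the same simulated distribution 1, ..., 100
sim_paths <- matrix(rep(1:100, times = 5), nrow = 5, byrow = TRUE)
band <- percentile_band(sim_paths)
share_inside(c(50, 2, 98, 60, 4), band)  # 3 of 5 days inside
```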
If you find any bugs or want to make a contribution to this package, please create a GitHub issue at: https://github.com/peilun-he/PDSim/issues.
Additionally, you are very welcome to provide any kind of feedback and comments. Please send me an email at: peilun.he93@gmail.com.
If you have questions about how to use this package, please also send me an email. I will get back to you as soon as possible.
We would like to thank Sam Forbes, Blake Rayfield and Mark Van de Vyver for testing PDSim and providing valuable feedback and suggestions.
Version 3.0.0 (current version):
- Incorporate the original Schwartz and Smith model, where $\gamma = 0$.
- Add a new tab panel for unit tests.
- Docker installation is added.
Version 2.1.2:
- Main functions are exported, with short executable examples.
- Add Contributions and Supports section.
Version 2.1.1:
- Add a vignette.
Version 2.1:
- PDSim is packaged into an R package. Some structures were changed to achieve this.
- An exported function "run_app" is added to run PDSim.
- Add some documentation.
Version 2.0:
- Add navigation bar: welcome page, app, user guide, team members.
- Descriptions of models and some hints are added to the user guide page.
- Allow users to download simulated data as csv files.
- Add 95% confidence intervals to all estimations.
- Add a 3D surface of data.
- Allow users to generate new realisations of data using the same set of parameters.
- Bugs fixed.
Version 1.0: basic functions and UI
Ames, M., Bagnarosa, G., Matsui, T., Peters, G. W., & Shevchenko, P. V. (2020). Which risk factors drive oil futures price curves? Energy Economics, 87, 104676.
Aspinall, T., Gepp, A., Harris, G., Kelly, S., Southam, C., & Vanstone, B. (2022). NFCP: N-factor commodity pricing through term structure estimation. The Comprehensive R Archive Network. https://cran.rstudio.com/web/packages/NFCP/index.html.
Cortazar, G., Millard, C., Ortega, H., & Schwartz, E. S. (2019). Commodity price forecasts, futures prices, and pricing models. Management Science, 65(9), 4141–4155.
Cortazar, G., & Naranjo, L. (2006). An N‐factor Gaussian model of oil futures prices. Journal of Futures Markets: Futures, Options, and Other Derivative Products, 26(3), 243–268.
Filipovic, D., & Larsson, M. (2016). Polynomial diffusions and applications in finance. Finance and Stochastics, 20(4), 931–972.
Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Julier, S. J., & Uhlmann, J. K. (1997). New extension of the Kalman filter to nonlinear systems. Signal Processing, Sensor Fusion, and Target Recognition VI, 3068, 182–193.
Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401–422.
Kleisinger-Yu, X., Komaric, V., Larsson, M., & Regez, M. (2020). A multifactor polynomial framework for long-term electricity forwards with delivery period. SIAM Journal on Financial Mathematics, 11(3), 928–957.
Peters, G. W., Briers, M., Shevchenko, P., & Doucet, A. (2013). Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts. Methodology and Computing in Applied Probability, 15, 841–874.
Risk.net. (n.d.). No arbitrage pricing. Retrieved from https://www.risk.net/definition/no-arbitrage-pricing.
Schwartz, E. S., & Smith, J. E. (2000). Short-term variations and long-term dynamics in commodity prices. Management Science, 46(7), 893–911.
Sørensen, C. (2002). Modeling seasonality in agricultural commodity futures. Journal of Futures Markets: Futures, Options, and Other Derivative Products, 22(5), 393–426.
Wan, E. A., & Van Der Merwe, R. (2000). The unscented Kalman filter for nonlinear estimation. Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), 153–158.