# Whiskey price predictions: Feb 2022

## Problem Statement

People in the investment management industry are constantly looking for new sources of alpha. In recent times, family offices, asset managers & hedge funds in London have been seeking alpha in luxury wine and fine art. We want to know whether the same approach applies to the luxury whiskey market.

Note: the project was completed in early Mar 2022 and revisited on 3rd Aug 2022.

### Suggested improvements for the project:
- A dynamically updating model is preferred for this project to ensure that live auction data stays up to date.
- Model deployment.

## Univariate Time Series

### Augmented Dickey Fuller: Stationarity

#### Raw

![adf_test_statistic.png](data/adf_test_statistic.png)

#### Average (taking the average of OHLC data)

![adf_average.png](data/adf_average.png)


Using a p-value cutoff of 0.05, all price series were found to be stationary (p < 0.05, which rejects the ADF null hypothesis of a unit root), with the exceptions of Hibiki (average) and Yamazaki 18 (average), shown above. Second-order differencing would be required to enforce stationarity in these two series, while first-order differencing would be used for the rest.

### Holt Winters Exponential Smoothing

One of the first technical indicators that a trader performing old-school technical analysis is introduced to is the moving average. The moving average comes in several forms, including the simple moving average, exponential moving average, and weighted moving average. Holt-Winters Exponential Smoothing (HWES), also known as triple exponential smoothing, extends the exponential moving average by smoothing three components: level, trend and seasonality (the presence of cyclical patterns in the data). A multiplicative model will be used because of the resilience of its results.

## Multivariate Time Series

### Granger Causality

A Granger causality matrix was constructed with the following data:
- OHLC prices of all 4 whiskeys
- Bitcoin
- Ethereum
- Whiskey stocks

If the p-value of the Granger causality test is less than 0.05, we reject the null hypothesis that X does not Granger-cause Y. (This is a double negative: a p-value of less than 0.05 implies that X Granger-causes Y, i.e. past values of X help predict Y.)

For Macallan:
- None of the variables Granger-cause Macallan high.
- Yamazaki 25 bid and Ethereum's OHLC prices Granger-cause Macallan's average price.

For Hibiki high:
- Ethereum, Macallan bid and Yamazaki 25 bid Granger-cause Hibiki high.
- Bitcoin volume and DEO volume Granger-cause Hibiki high.

For Yamazaki 18 high:
- Macallan average, Bitcoin OHLC + volume, Ethereum OHLC + volume and DEO OHLC + volume Granger-cause it.

For Yamazaki 25:
- Cryptocurrency (Bitcoin + Ethereum) OHLC data Granger-causes it.
- Yamazaki 18 low Granger-causes Yamazaki 25 high.

#### Conclusions:
- Interestingly, Bitcoin and Ethereum's OHLC data Granger-cause Yamazaki 18 and 25 high prices.
- Of the two cryptocurrencies' price data, only Ethereum's Granger-causes Hibiki high; Bitcoin appears only through its volume.

### VAR

A VAR (vector autoregression) model was fitted to the 4 whiskeys, Bitcoin, Ethereum, and whiskey stocks such as DEO.

Results:
- For monthly data, VAR(1) was the best model, with an AIC of -1400.
- For weekly data, VAR(10) was the best model, with an AIC of -1208.

![var.png](data/var.png)

### Durbin Watson (after VAR)

![dw.png](data/dw.png)

The Durbin-Watson test checks for autocorrelation in the residuals (error terms) of the autoregressive model. The test statistic ranges from 0 to 4, with a value near 2 indicating no autocorrelation. If the statistic is < 1.5, positive autocorrelation between residuals is present; if the statistic is > 2.5, negative autocorrelation between residuals is present.

Checking for and eliminating autocorrelation is pivotal because, under positive autocorrelation, the F-statistic will be inflated and MSE and RMSE will be artificially low, leading to spurious results.

Autocorrelation was present in: Macallan high and Yamazaki 18 average.
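The statistic and the 1.5 / 2.5 thresholds can be illustrated with `durbin_watson` from `statsmodels`. The residuals below are synthetic, and `classify_dw` is a hypothetical helper that simply encodes the rule of thumb quoted above.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson


def classify_dw(stat: float, low: float = 1.5, high: float = 2.5) -> str:
    """Hypothetical helper applying the rule-of-thumb thresholds from the text."""
    if stat < low:
        return "positive autocorrelation"
    if stat > high:
        return "negative autocorrelation"
    return "no notable autocorrelation"


rng = np.random.default_rng(0)
white = rng.normal(size=500)  # uncorrelated residuals

# AR(1) residuals with coefficient 0.8: strongly positively autocorrelated.
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + white[t]

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2 * (1 - lag-1 autocorrelation).
dw_white = durbin_watson(white)
dw_ar = durbin_watson(ar)

print(round(dw_white, 2), classify_dw(dw_white))
print(round(dw_ar, 2), classify_dw(dw_ar))
```

After a VAR fit, the same function would be applied column by column to the model's residuals.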


## Results of 116 XGBR Models

### Best performing models (RMSE):

![rmse.png](data/rmse.png)

RMSE was used as the metric for model selection. The data transformation used was log price, followed by RobustScaler. The best models used the raw data for Macallan, Hibiki and Yamazaki 18, and the average data for Yamazaki 25, the latter producing an RMSE of 0.162.

Hyperparameter tuning, notably regularization and early stopping, may be used to reduce overfitting.


### Discoveries
Adding time dummies and lagged data drastically worsened the predictability of Macallan prices (RMSE of 0.97 instead of the 0.289 shown here). This, along with the Granger causality matrix for Macallan high and average prices, suggests that past values of the other variables do not predict Macallan high prices, and that Macallan high prices may depend on exogenous factors not accounted for by the model (e.g. sentiment, hype). The branding and sentiment of the whiskey are likely to have greater influence over its price than quantitative factors.

For Hibiki, adding time dummies + lagged data performed better than adding lagged data alone (best RMSE with lagged data alone: 0.132, vs 0.1233 with time dummies + lagged data), suggesting that seasonality and other time factors may be pertinent to price prediction. Further analysis is given under "Feature Importances".

For Yamazaki 18, adding lagged data alone performed better than adding time dummies + lagged data (0.2307 vs 0.2472).

Note that for both Hibiki and Yamazaki 18, modelling raw data yielded the best RMSE results (0.0805 and 0.196 respectively).
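For reference, lagged features and time dummies of the kind compared above can be built with pandas as in this sketch. The function name, the column name `price`, the three lags and the month-only dummies are assumptions, since the source does not list the exact features used.

```python
import numpy as np
import pandas as pd


def add_lags_and_time_dummies(df: pd.DataFrame, price_col: str = "price", lags=(1, 2, 3)) -> pd.DataFrame:
    """Hypothetical helper: append lagged prices and one-hot month dummies."""
    out = df.copy()
    for k in lags:
        # Price observed k periods earlier.
        out[f"{price_col}_lag{k}"] = out[price_col].shift(k)
    # One 0/1 column per calendar month (month_1 ... month_12).
    month = pd.Series(out.index.month, index=out.index)
    out = out.join(pd.get_dummies(month, prefix="month", dtype=int))
    # The first max(lags) rows have no lag history and are dropped.
    return out.dropna()


# Two years of synthetic monthly prices.
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
prices = pd.DataFrame({"price": np.linspace(100.0, 146.0, 24)}, index=idx)

features = add_lags_and_time_dummies(prices)
print(features.iloc[:3, :6])
```

Only past values enter the lag columns, so the features can be used for forecasting without leaking future prices.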

## Feature Importances & Hedonic Regression

### Visual interpretation

#### Macallan

![macallan_importances.png](data/macallan_importances.png)

#### Hibiki

![hibiki_importances.png](data/hibiki_importances.png)

#### Yamazaki 18

![yamazaki_18_importances.png](data/yamazaki_18_importances.png)

#### Yamazaki 25 

![yamazaki_25_importances.png](data/yamazaki_25_importances.png)

Based on the most important features from the XGBoost regression models, the hedonic regression for each brand can be written as:

 - Macallan_price = 𝛽0 + 𝛽1·lot + 𝛽2·res + 𝛽3·cask_type_sherry + 𝛽4·cask_type_sherry_wood + 𝛽5·bottler_squaldron_malt + 𝛽6·bottler_douglas_lang + 𝛽7·bottler_squadron_malts_eigen + 𝛽8·bottler_wilson_and_morgan
 - Hibiki_price = 𝛽0 + 𝛽1·age + 𝛽2·res + 𝛽3·date + 𝛽4·quantity + 𝛽5·lot + 𝛽6·bid + 𝛽7·vintage
 - Yamazaki_18_price = 𝛽0 + 𝛽1·abv + 𝛽2·res + 𝛽3·vintage + 𝛽4·date + 𝛽5·bottler_official + 𝛽6·lot + 𝛽7·quantity + 𝛽8·bid
 - Yamazaki_25_price = 𝛽0 + 𝛽1·abv + 𝛽2·bottler_official + 𝛽3·res + 𝛽4·bid + 𝛽5·date
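To make the form of these equations concrete, the sketch below estimates the 𝛽 coefficients of a small hedonic regression by ordinary least squares with NumPy. It uses three of the Yamazaki 25 feature names, but the values, the "true" coefficients and the treatment of `res` as a 0/1 variable are invented placeholders, not project data.

```python
import numpy as np
import pandas as pd

# Synthetic lots with known coefficients, so the fit can be checked.
rng = np.random.default_rng(0)
n = 500
lots = pd.DataFrame({
    "abv": rng.normal(43.0, 1.0, n),
    "bottler_official": rng.integers(0, 2, n),
    "res": rng.integers(0, 2, n),  # placeholder values; meaning of `res` is not defined here
})
true_beta = {"abv": 50.0, "bottler_official": 300.0, "res": 120.0}
price = 1000.0 + sum(true_beta[c] * lots[c] for c in true_beta) + rng.normal(0, 20, n)

# Design matrix with a leading column of ones for the intercept (beta_0).
X = np.column_stack([np.ones(n), lots.to_numpy(dtype=float)])
beta_hat, *_ = np.linalg.lstsq(X, price.to_numpy(), rcond=None)

coefs = pd.Series(beta_hat, index=["intercept", *lots.columns])
print(coefs.round(2))
```

Each fitted coefficient is read as the price change associated with a one-unit change in that attribute, holding the others fixed.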

## Time dummies

![td_fi.png](data/td_fi.png)

Seasonality was found in Hibiki and Yamazaki:

- Months 4, 3 and 7 (April, March and July) are when Hibiki prices peak.
- Month 4 (April) is when Yamazaki 18 and Yamazaki 25 prices peak.

## Investment recommendations

- Look into possible seasonal arbitrage opportunities for whiskeys.
- According to HWES, Yamazaki 18 high prices are poised to increase further, which makes it a suitable investment.
- Macallan prices are expected to correct from the 2021 peak of 80k to a more reasonable 10k to 20k. The best strategy is short-term seasonal arbitrage: buy Macallan in months where prices are expected to be lower (e.g. July) and sell in months where prices are expected to be higher (e.g. December).