# Commodity Return Replication – Walkthrough
This notebook documents how we approximately replicate the commodity return section of He, Kelly, and Manela (2017, henceforth HKM).

## Methodology
In the original paper, the authors construct monthly returns for **23 or 24** commodities, selected based on data availability in the Commodity Research Bureau (CRB) dataset. The return calculation follows the method outlined in Yang’s earlier paper, which uses the longest available contract maturing in ≤12 months and the shortest contract maturing in ≥1 month. **(TODO: refine this description with the exact formula.)**

He, Kelly, and Manela modify Yang’s approach slightly by limiting contract selection to a maximum of 4 months ahead, likely to reduce maturity-driven variation and to align with the idea of short-term roll-based returns.

## Data Limitations
In our attempt to replicate their results, we encountered several data issues. The CRB dataset used in both Yang’s and HKM’s papers is no longer publicly accessible or consistently maintained. Moreover, Yang’s paper includes a detailed summary table (Table 1) listing the number of available contracts per commodity, and the contract counts in our available dataset matched these poorly. This made direct replication of Yang’s results, let alone HKM’s, impractical.

## Alternative Approach
Fortunately, He, Kelly, and Manela note in their paper that they also tried a methodology proposed by Koijen, Moskowitz, Pedersen, and Vrugt (2018, henceforth KMPV), which produced a very similar result. Importantly, the KMPV paper provides Bloomberg tickers for commodity futures, allowing us to extract consistent price data.

Given this, our replication follows the KMPV approach, using Bloomberg-derived data. Although KMPV do not explicitly provide a return formula – as they directly use monthly return series from Bloomberg – we computed monthly returns manually where necessary and matched them to the anonymous HKM series via correlation.

## Replication Summary
We pulled commodity price data using Bloomberg tickers provided in KMPV.

Monthly returns were calculated using either:

- the % change in front-month contracts, or
- a rolling method using a predefined rolling schedule (e.g., 1M-2M-3M repeat).
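The first option above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `monthly_returns_from_prices`, the ticker label `CL`, and the toy prices are all hypothetical, and it assumes the input is a table of front-month prices indexed by date with one column per commodity. The rolling-schedule variant is not shown because its exact schedule is not specified here.

```python
import pandas as pd


def monthly_returns_from_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Percent change of month-end prices.

    `prices` is assumed to have a DatetimeIndex and one column per commodity.
    """
    # Keep the last available price within each calendar month.
    month_end = prices.groupby(prices.index.to_period("M")).last()
    # Simple month-over-month percent change; the first month has no return.
    return month_end.pct_change(fill_method=None).dropna(how="all")


# Toy data (made up for illustration only).
prices = pd.DataFrame(
    {"CL": [100.0, 100.0, 105.0, 110.0, 99.0]},
    index=pd.to_datetime(
        ["2020-01-30", "2020-01-31", "2020-02-27", "2020-02-28", "2020-03-31"]
    ),
)
rets = monthly_returns_from_prices(prices)
print(rets)
```

Grouping by calendar month and taking the last observation avoids having to assume that every month ends on a trading day.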

We then compared our generated return series against the anonymous HKM commodity portfolios by computing a correlation matrix.
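One way to build such a cross-correlation matrix is sketched below. The helper name `cross_correlation`, the column labels, and the toy returns are assumptions made for illustration; the sketch assumes both return tables share a date index and computes a Pearson correlation for every (our commodity, HKM portfolio) pair.

```python
import pandas as pd


def cross_correlation(ours: pd.DataFrame, hkm: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of each column in `ours` with each column in `hkm`."""
    # Keep only the dates present in both tables.
    ours_a, hkm_a = ours.align(hkm, join="inner", axis=0)
    corr = pd.DataFrame(index=ours.columns, columns=hkm.columns, dtype=float)
    for a in ours.columns:
        for b in hkm.columns:
            # Series.corr drops pairwise-missing observations before computing.
            corr.loc[a, b] = ours_a[a].corr(hkm_a[b])
    return corr


# Toy data (made up): the "HKM" columns are simple transforms of ours.
idx = pd.period_range("2020-01", periods=4, freq="M")
ours = pd.DataFrame(
    {"Gold": [0.01, 0.02, -0.01, 0.03], "Corn": [0.02, -0.01, 0.00, 0.01]},
    index=idx,
)
hkm = pd.DataFrame(
    {"c1": 2 * ours["Corn"], "c2": ours["Gold"] + 0.001},
    index=idx,
)
corr = cross_correlation(ours, hkm)
print(corr.round(3))
```

Rows are our labeled commodities and columns are the anonymous series, so each row shows how strongly one labeled commodity resembles every candidate.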

To align the two sets of portfolios (our labeled commodities vs. HKM’s anonymous ones), we used a linear assignment algorithm to find the optimal one-to-one mapping that maximizes total correlation.
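The matching step can be illustrated with SciPy's `linear_sum_assignment`, which solves the linear assignment problem. The correlation values and labels below are invented for the example; only the use of an optimal one-to-one assignment that maximizes total correlation comes from the description above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical labels and correlations (rows: ours, columns: anonymous HKM series).
ours_labels = ["Gold", "Corn", "Crude"]
hkm_labels = ["c1", "c2", "c3"]
corr = np.array(
    [
        [0.20, 0.90, 0.10],  # Gold
        [0.80, 0.85, 0.30],  # Corn
        [0.10, 0.20, 0.70],  # Crude
    ]
)

# maximize=True picks the one-to-one pairing with the largest total correlation.
rows, cols = linear_sum_assignment(corr, maximize=True)
mapping = {ours_labels[r]: hkm_labels[c] for r, c in zip(rows, cols)}
total = corr[rows, cols].sum()

print(mapping)
print(round(total, 2))
```

In this toy matrix both Gold and Corn correlate most strongly with `c2`, so picking each row's best match independently would assign `c2` twice. The assignment solver resolves the conflict by giving `c2` to Gold and `c1` to Corn, which yields the highest total.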

## Correlation Testing
Using the Pearson correlation matrix, we tested which commodities in our KMPV-based dataset best approximate each of the HKM anonymous return series. A heatmap and table summary are included in the correlation section of this notebook.

## Graphing and Historical Comparison
To further validate our replication, we plotted the historical return series for the best-matched pairs and compared their trends. These graphs show that many of the KMPV-based commodity returns closely track the HKM counterparts over time.

