# Robust Identification of Investor Beliefs

by [Xiaohong Chen](https://economics.yale.edu/people/faculty/xiaohong-chen), [Lars Peter Hansen](http://larspeterhansen.org/) and [Peter G. Hansen](https://mitsloan.mit.edu/phd/students/peter-hansen).

The latest version of the paper can be found [here](http://larspeterhansen.org/research/papers/).

Notebook by: Han Xu, Zhenhuan Xie.

## 1. Overview

This notebook provides the source code and explanations for how we solve the dynamic problem in Section 3 of the paper. It also provides the source code for the figures in Section 4, as well as additional results that we did not report in the paper. Before we describe and implement the computation, let's first install and load the necessary Python packages (and set up the server environment if you are running this notebook on Google Colab) by running the following cell.

In [None]:
# Check if the notebook is open in Google Colab
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Set up Google Colab environment
if IN_COLAB:
    import os
    # Link your Google Drive to Google Colab
    drive.mount('/content/gdrive')
    %cd '/content/gdrive/My Drive'
    # Create a folder to store our project
    if 'Belief_project' in os.listdir():
        %cd '/content/gdrive/My Drive/Belief_project'
        ! git pull
    else:
        ! mkdir '/content/gdrive/My Drive/Belief_project/'
        %cd '/content/gdrive/My Drive/Belief_project/'
    # Clone GitHub repo to the folder and change working directory to the repo
    if 'Beliefs' not in os.listdir():
        ! git clone https://github.com/lphansen/Beliefs.git
    %cd '/content/gdrive/My Drive/Belief_project/Beliefs'

# Set up local environment
else:
    try:
        import plotly
    except ImportError:
        import sys
        !{sys.executable} -m pip install plotly

import time
import pandas as pd
import numpy as np
from source.preprocessing import preprocess_data
from source.solver import solve, find_ξ
print('----------Successfully Loaded Python Packages----------')

## 2. Moment Bounds

### 2.1 Relative Entropy Specification
Recall **Problem 3.2** in the paper.  For a real number $\mu$ and a random variable $v_0$, 

\begin{equation}
\mu = \min_{N_1 \ge 0} \mathbb{E}\left(N_1\left[g(X_1)+\xi\log N_1 + v_1\right]\mid \mathfrak{I}_0\right) - v_0
\end{equation}
*subject to constraints*:
\begin{align*}
\mathbb{E}\left[N_1 f(X_1)\mid\mathfrak{I}_0\right] &= 0\\
\mathbb{E}\left[N_1 \mid \mathfrak{I}_0\right] &= 1
\end{align*}
where $v_1$ is a version of $v_0$ shifted forward one time period.

By **Proposition 3.8**, this problem can be solved by finding the solution to:

\begin{equation}
\epsilon = \min_{\lambda_0}\mathbb E \left(\exp \left[-\frac{1}{\xi}g(X_1)+\lambda_0\cdot f(X_1)\right]\left( \frac{e_1}{e_0}\right) \mid \mathfrak{I}_0\right)
\end{equation}

*where*
\begin{align*}
\mu &= -\xi \log \epsilon,\\
v_0 &= -\xi \log e_0.
\end{align*}
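
As a rough illustration of the dual problem, the sketch below solves a static version (with $e_1/e_0 = 1$) on simulated data. It is not the solver in `source.solver`: the function name `solve_static_dual`, the damped Newton method, and the simulated `g_demo` and `f_demo` are all assumptions made for this example.

```python
import numpy as np

def solve_static_dual(g, f, xi, tol=1e-10, max_iter=200):
    """Minimize mean(exp(-g/xi + f @ lam)) over lam by damped Newton steps."""
    lam = np.zeros(f.shape[1])
    objective = lambda l: np.mean(np.exp(-g / xi + f @ l))
    for _ in range(max_iter):
        w = np.exp(-g / xi + f @ lam)            # unnormalized tilting weights
        grad = f.T @ w / len(g)                  # gradient of the objective
        if np.linalg.norm(grad) < tol:
            break
        hess = (f * w[:, None]).T @ f / len(g)   # Hessian of the convex objective
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Backtrack until the objective does not increase
        while objective(lam - t * step) > objective(lam) and t > 1e-10:
            t *= 0.5
        lam = lam - t * step
    eps = objective(lam)
    N = np.exp(-g / xi + f @ lam) / eps          # implied distortion, sample mean one
    return lam, eps, N

# Simulated stand-ins for g(X_1) and f(X_1); these are not the paper's data
rng = np.random.default_rng(0)
T = 2000
f_demo = rng.normal(0.05, 1.0, size=(T, 2))
g_demo = 0.02 + 0.1 * f_demo[:, 0] + rng.normal(0.0, 0.05, size=T)

xi = 1.0
lam_star, eps_star, N_star = solve_static_dual(g_demo, f_demo, xi)
mu_star = -xi * np.log(eps_star)                 # mu = -xi * log(epsilon)

print("E[N]             =", N_star.mean())
print("E[N f]           =", (N_star[:, None] * f_demo).mean(axis=0))
print("distorted E[g]   =", (N_star * g_demo).mean())
print("relative entropy =", (N_star * np.log(N_star)).mean())
```

At the minimizer, the first-order condition for $\lambda_0$ is exactly the moment restriction $\mathbb{E}[N_1 f(X_1)] = 0$, which is why the second printed line should be close to zero.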

The optimized results will depend on the choice of $\xi$. Alternative values of $\xi$ imply alternative bounds on the expectation of $g(X_1)$ and the corresponding relative entropy.  Below is an illustration of how the minimized objectives $\mu^*$ and $\epsilon^*$ change with $\xi$. Data and calculation details are described later in Section 3. 

In [None]:
from source.plots import objective_vs_ξ
time_start = time.time()
objective_vs_ξ(n_states=3) # Here we use relative entropy divergence
print('Time spent:', round(time.time()-time_start,2),'s')

The implied solution for the probability distortion is:

\begin{equation}
N_1^* = \frac{\exp \left[-\frac{1}{\xi}g(X_1)+\lambda^*_0(Z_0)\cdot f(X_1)\right]e_1^*}{\epsilon^*e_0^*}
\end{equation}

where $\lambda^*_0$ is the optimizing choice for $\lambda_0$ and $\left(\epsilon^*,e_0^*\right)$ are selected so that the process induced by the resulting $\sf Q$ is stochastically stable. The conditional expectation implied by the bound is

\begin{equation}
\mathbb{E}\left[N_1^*g(X_1)\mid \mathfrak{I}_0\right]
\end{equation}

which in turn implies a bound on the unconditional expectation equal to

\begin{equation}
\int \mathbb{E}\left[N_1^*g(X_1)\mid\mathfrak{I}_0\right]d \sf Q_0^*
\end{equation}

The implied relative entropy is

\begin{equation}
\int \mathbb{E}\left(N_1^*\log N_1^*\mid \mathfrak{I}_0\right)d \sf Q_0^*
\end{equation}

### 2.2 Quadratic Specification

Instead of using relative entropy as the divergence measure, we can also use a quadratic specification, as discussed in the appendix of the paper. For a real number $\mu$ and a random variable $v_0$,
\begin{equation}
\mu = \min_{N_1\geq 0}\mathbb E \left(N_1 \left[g(X_1)+v_1\right]+\frac{\xi}{2}(N_1^2-N_1) \mid \mathfrak{I}_0\right) - v_0
\end{equation}

*subject to constraints*:
\begin{align*}
&\mathbb{E}\left[N_1f(X_1)\mid\mathfrak{I}_0\right] = 0\\
&\mathbb{E}\left[N_1\mid\mathfrak{I}_0\right] = 1
\end{align*}

Similarly, the problem can be solved by finding the solution to:

\begin{equation}
\mu = \max_{\lambda_1,\lambda_2} -\frac{\xi}{2}\mathbb{E}\left[\left(\left[\frac{1}{2}-\frac{1}{\xi}\left[g(X_1)+v_1+\lambda_1 \cdot f(X_1) + \lambda_2\right]\right]^+\right)^2\mid\mathfrak{I}_0\right]-\lambda_2-v_0
\end{equation}

The implied solution for the probability distortion is:

\begin{equation}
N_1^* = \left[\frac{1}{2} - \frac{1}{\xi}\left[g(X_1)+v_1^*+\lambda_1^*\cdot f(X_1)+\lambda_2^*\right]\right]^+
\end{equation}
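
To see how the truncation $[\cdot]^+$ operates, here is a minimal sketch of a simplified static case. It assumes $v_1 = 0$ and drops the moment restrictions (so $\lambda_1 = 0$), leaving only the normalization $\mathbb{E}[N_1] = 1$, which pins down $\lambda_2$. The function name `quadratic_distortion`, the bisection bounds, and the simulated `g_q` are hypothetical choices for this example and are not taken from the repository.

```python
import numpy as np

def quadratic_distortion(g, xi, lo=-50.0, hi=50.0, n_bisect=200):
    """Find lam2 so that N = [1/2 - (g + lam2)/xi]^+ has sample mean one.

    The sample mean of N is nonincreasing in lam2, so bisection applies.
    The bracket [lo, hi] is assumed wide enough to contain the solution.
    """
    mean_N = lambda lam2: np.mean(np.maximum(0.5 - (g + lam2) / xi, 0.0))
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if mean_N(mid) > 1.0:    # mean too large, so lam2 must increase
            lo = mid
        else:
            hi = mid
    lam2 = 0.5 * (lo + hi)
    N = np.maximum(0.5 - (g + lam2) / xi, 0.0)
    return lam2, N

# Simulated stand-in for g(X_1); not the paper's data
rng = np.random.default_rng(3)
g_q = rng.normal(0.02, 0.1, size=5000)

lam2_star, N_q = quadratic_distortion(g_q, xi=0.2)
print("E[N]           =", N_q.mean())
print("share with N=0 =", np.mean(N_q == 0.0))
print("sample E[g]    =", g_q.mean())
print("distorted E[g] =", (N_q * g_q).mean())
```

Unlike the exponential tilting under relative entropy, the quadratic specification can assign zero probability to some realizations, here the ones with the largest values of $g$.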

### 2.3 Chernoff Entropy

In addition, we show how to calculate the Chernoff entropy mentioned in the paper. Suppose $P$ and $\tilde{P}$ are transition probability matrices of two Markov processes.

- Fix $0<s<1$. Calculate the matrix $H_s\left(P,\tilde{P}\right)$:
$$
H_s\left(P,\tilde{P}\right)_{ij} = [P_{ij}]^s [\tilde{P}_{ij}]^{1-s}
$$ 
for $1\leq i,j \leq 741$.


- Calculate the spectral radius of $H_s\left(P,\tilde{P}\right)$:
$$
r = \max_{1\leq i\leq 741} \left\{|\lambda_i|\right\}
$$
where $\{\lambda_i\}$ are the (possibly complex) eigenvalues for $H_s\left(P,\tilde{P}\right)$.


- Minimize $r$ with respect to $s$. Denote the minimized $r$ as $r^*$. Then $1-r^*$ is the Chernoff entropy.
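
The three steps above can be sketched with NumPy as follows. This is an illustrative version on small hypothetical $2\times 2$ matrices, not the `chernoff_entropy` function in `source.utilities`; the name `chernoff_entropy_sketch` and the grid search over $s$ are assumptions made for this example.

```python
import numpy as np

def chernoff_entropy_sketch(P, P_tilde, grid_size=999):
    """Grid-search s in (0, 1) to minimize the spectral radius of H_s."""
    s_grid = np.linspace(0.0, 1.0, grid_size + 2)[1:-1]   # interior points only
    radii = np.empty(len(s_grid))
    for i, s in enumerate(s_grid):
        H = P**s * P_tilde**(1.0 - s)                     # elementwise powers
        radii[i] = np.max(np.abs(np.linalg.eigvals(H)))   # spectral radius
    best = np.argmin(radii)
    return 1.0 - radii[best], s_grid[best]

# Hypothetical transition matrices for illustration
P_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
P_tilde_demo = np.array([[0.6, 0.4],
                         [0.5, 0.5]])

rate, s_opt = chernoff_entropy_sketch(P_demo, P_tilde_demo)
print("Chernoff entropy:", rate, "at s =", s_opt)
```

When the two matrices coincide, $H_s$ equals $P$ for every $s$, its spectral radius is one, and the measure is zero.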

The Chernoff measure is motivated by a common decay rate imposed on type I and type II errors of testing one model against another, and is expected to be considerably smaller than the corresponding relative entropy. We computed it using the approach described in Newman and Stuck (1979) for Markov processes. While symmetric, this measure is less tractable to implement and is not included in the family of recursive divergences that we describe. We use it merely to provide, ex post, additional information about the magnitude of the bound.


To compute the bounds on the expected logarithmic return on the market, we let the logarithm of the return on wealth be our $g$.

To compute the bounds on risk premium and generalized volatility, we extend the previous approach as follows:

### 2.4 Bounding Risk Premia

- Set $g(X_1)=R^w_1-\zeta R^f_1$ where $\zeta$ is a "multiplier" that we will search over;


- for alternative $\zeta$, deduce $N_1^*(\zeta)$ and $\sf Q_0^*(\zeta)$ as described in the paper;


- compute:

$$
\log \int \mathbb{E}\left[N_1^*(\zeta)R^w_1\mid \mathfrak{I}_0\right]d \sf Q_0^*(\zeta) - \log \int \mathbb{E}\left[N_1^*(\zeta)R^f_1\mid \mathfrak{I}_0\right]d \sf Q_0^*(\zeta)
$$
and minimize with respect to $\zeta$;


- set $g(X_1)=-R^w_1+\zeta R^f_1$, repeat, and use the negative of the minimizer to obtain the upper bound.

### 2.5 Bounding Volatility

We show how to bound an entropic measure of volatility. Other measures could be computed using a similar approach.

- Set $g(X_1)=R^w_1-\zeta \log R^w_1$ where $\zeta$ is a "multiplier" that we will search over;


- for alternative $\zeta$, deduce $N_1^*(\zeta)$ and $\sf Q_0^*(\zeta)$ as described in the paper;


- compute:

$$
\log \int \mathbb{E}\left[N_1^*(\zeta)R^w_1\mid \mathfrak{I}_0\right]d {\sf Q_0}^*(\zeta) - \int \mathbb{E}\left[N_1^*(\zeta)\log R^w_1\mid \mathfrak{I}_0\right]d {\sf Q_0}^*(\zeta)
$$
and minimize with respect to $\zeta$;


- set $g(X_1)=-R^w_1+\zeta \log R^w_1$, repeat, and use the negative of the minimizer to obtain the upper bound.

## 3. Code Implementation

### 3.1 Data
The file “UnitaryData.csv” contains the following data from 1954-2016:

- The first four columns contain Euler equation errors from the unitary risk aversion model corresponding to the 3-month T-bill rate, the market excess return, the SMB excess return, and the HML excess return, respectively. Under a feasible belief distortion, all four of these variables should have an expectation of zero (conditional or unconditional).


- The column “d.p” contains the dividend-price ratio for the CRSP value-weighted index, computed at the start of the return period. Hence functions of d.p[i] (i.e. quantile indicator functions) are valid instruments for the returns in row i.


- The final column “log.RW” contains values of the logarithmic return on the CRSP value-weighted index. We use this as a proxy for the logarithmic return on wealth. This is the random variable whose expectation we are interested in bounding.

All returns are quarterly and inflation-adjusted.

In [None]:
# Load data
data = pd.read_csv('data/UnitaryData.csv')
# Show statistics of the data
data.describe()

Given our direct use of dividend-price measures, we purposefully choose a coarse conditioning of information and split the dividend-price ratios into $n$ bins using the empirical percentiles. We take the dividend-price percentiles to be an $n$-state Markov process. Then we multiply each of the first four columns by each of the $n$ columns of the indicator function of dividend-price percentiles to form a $4n$-dimensional $f$.

When bounding the expected logarithmic return on wealth, we take $\log R^w$ as our $g$. When bounding the risk premium and volatility, we define $g$ as discussed above in Sections 2.4 and 2.5.
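
The construction of the $4n$-dimensional $f$ can be sketched as below. This is not the code in `source.preprocessing`: the simulated `dp` and `errors` arrays stand in for the "d.p" column and the four Euler-error columns, and both the column ordering of `f_inst` and the way `z0` and `z1` are lined up are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 240, 3
dp = rng.lognormal(mean=-3.5, sigma=0.3, size=T)   # stand-in for the "d.p" column
errors = rng.normal(0.0, 0.08, size=(T, 4))        # stand-in for the four Euler errors

# Interior empirical percentiles split d.p into n bins
cutoffs = np.quantile(dp, np.arange(1, n) / n)
state = np.digitize(dp, cutoffs)                   # integer bin labels 0, ..., n-1
indicators = np.eye(n)[state]                      # T x n one-hot matrix

# Multiply each error column by each indicator column, giving a T x 4n array.
# Column 4-index * n + bin-index holds error * 1{d.p in bin} (assumed ordering).
f_inst = (errors[:, :, None] * indicators[:, None, :]).reshape(T, 4 * n)

# Current and next-period states of the n-state process (assumed alignment)
z0, z1 = state[:-1], state[1:]

print("bin counts:", np.bincount(state, minlength=n))
print("shape of f:", f_inst.shape)
```

Each row of `f_inst` has only four nonzero entries, namely the four errors placed in the block that corresponds to that row's dividend-price bin.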

### 3.2 Computational Strategy

Since we have $n$ distinct states in our application, we can represent the function $e(\cdot)$ as an $n$-dimensional vector. Additionally, we are free to normalize the first entry of this vector to $e_1=1$. We can solve the dual problem numerically by something analogous to value function iteration for $e=(1,e_2,...,e_n)$. Here is the iteration scheme:


1\. Guess $e={\mathbb{1}}_{n\times 1}$.

2\. For $k \in \{1,2,...,n\}$, solve
\begin{equation}
v_k = \min_{\lambda_0} \hat{\mathbb{E}}\left(\exp \left[-\frac{1}{\xi}g(X_1) + \lambda_0 \cdot f(X_1)\right]e(Z_1)\mid Z_0 = k\right)
\end{equation}

3\. Store
\begin{align*}
\hat{e} &= v/v_1 \\
\hat{\epsilon} &= v_1 \\
\text{error} &= \|\hat{e}-e\|
\end{align*}

4\. Set $e = \hat{e}$.

5\. Iterate steps 2-4 until error is smaller than $10^{-9}$.
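
The iteration can be sketched on simulated data as follows. This is an illustrative version rather than the `solve` function in `source.solver`: the helper names, the Newton inner solver, and the simulated inputs are assumptions. Following the dual problem in Section 2.1, the function $e$ inside the expectation is evaluated at next period's state `z1`, and states are labeled from 0, so the normalization applies to `e[0]`.

```python
import numpy as np

def min_tilted_mean(a, f, n_newton=50):
    """Minimize mean(exp(a + f @ lam)) over lam with damped Newton steps."""
    lam = np.zeros(f.shape[1])
    obj = lambda l: np.mean(np.exp(a + f @ l))
    for _ in range(n_newton):
        w = np.exp(a + f @ lam)
        grad = f.T @ w / len(a)
        if np.linalg.norm(grad) < 1e-12:
            break
        hess = (f * w[:, None]).T @ f / len(a)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while obj(lam - t * step) > obj(lam) and t > 1e-10:
            t *= 0.5                                  # backtracking line search
        lam = lam - t * step
    return obj(lam), lam

def iterate_e(g, f, z0, z1, xi, n, tol=1e-9, max_iter=1000):
    e = np.ones(n)                                    # step 1: initial guess
    for _ in range(max_iter):
        v, lams = np.empty(n), []
        for k in range(n):                            # step 2: one problem per state
            idx = z0 == k
            a = -g[idx] / xi + np.log(e[z1[idx]])
            v[k], lam_k = min_tilted_mean(a, f[idx])
            lams.append(lam_k)
        e_new = v / v[0]                              # step 3: normalize and compare
        error = np.max(np.abs(e_new - e))
        e = e_new                                     # step 4: update the guess
        if error < tol:                               # step 5: stopping rule
            break
    return v[0], e, np.array(lams)

# Simulated two-state example; not the paper's data
rng = np.random.default_rng(2)
T, n = 1500, 2
z = np.zeros(T + 1, dtype=int)
for t in range(T):
    z[t + 1] = z[t] if rng.random() < 0.7 else 1 - z[t]
z0, z1 = z[:-1], z[1:]
f_sim = rng.normal(0.02, 1.0, size=(T, 2))
g_sim = 0.01 + 0.02 * z1 + 0.05 * f_sim[:, 0] + rng.normal(0.0, 0.05, size=T)

xi = 1.0
eps_star, e_star, lam_star = iterate_e(g_sim, f_sim, z0, z1, xi, n)

# Implied belief distortion, using the state-specific multiplier for each row
tilt = np.exp(-g_sim / xi + np.sum(f_sim * lam_star[z0], axis=1))
N_sim = tilt * e_star[z1] / (eps_star * e_star[z0])
for k in range(n):
    print("state", k, "E[N | Z0 = k] =", N_sim[z0 == k].mean())
```

At convergence, the conditional mean of the distortion should be close to one in every state, which gives a quick check on the computed $\epsilon^*$ and $e^*$.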

Once we have (approximately) stationary values for $\epsilon^*$ and $e^*$ as well as the optimizing $\lambda_0^*$, we can form the conditional belief distortion
\begin{equation}
N_1 = \frac{1}{\epsilon^*} \exp \left[-\frac{1}{\xi}g(X_1)+\lambda_0^* \cdot f(X_1)\right]\frac{e^*(Z_1)}{e^*(Z_0)}
\end{equation}

To obtain the unconditional relative entropy, we need to average across states using the implied stationary distribution coming from the distorted probabilities. Define an $n\times n$ matrix $\tilde{P}$ by
$$
\tilde{P}_{i,j} = \hat{\mathbb{E}}\left[N_1 \mathbb{1}\left(Z_1 = j\right)\mid Z_0 = i\right]
$$

$\tilde{P}$ should be a transition probability matrix, so $\tilde{P}\mathbb{1}=\mathbb{1}$. Next, solve for the stationary distribution $\tilde{\pi}\in \mathbb{R}^n$ as the dominant left eigenvector of $\tilde{P}$, i.e.
\begin{equation}
\tilde{\pi}^\prime \tilde{P} = \tilde{\pi}^\prime
\end{equation}

Then, the unconditional relative entropy can be computed as
\begin{equation}
\text{RE}(\xi) = \sum_{k=1}^{n}\hat{\mathbb{E}}\left[N_1\log N_1 \mid Z_0 = k\right]\cdot \tilde{\pi}_k
\end{equation}
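
A minimal sketch of these last steps, on a tiny hand-made example, is given below. The helper names `distorted_transition` and `stationary_distribution`, as well as the toy values of `N_toy`, `z0_toy`, and `z1_toy`, are hypothetical; the toy distortion is chosen so that its conditional mean is one in each state.

```python
import numpy as np

def distorted_transition(N, z0, z1, n):
    """P_tilde[i, j] = sample mean of N * 1{z1 = j} over rows with z0 = i."""
    P = np.zeros((n, n))
    for i in range(n):
        idx = z0 == i
        for j in range(n):
            P[i, j] = np.mean(N[idx] * (z1[idx] == j))
    return P

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to one, scaled to sum to one."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    i = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, i])
    return pi / pi.sum()

# Toy inputs: two states, six observations, conditional mean of N equal to one
z0_toy = np.array([0, 0, 0, 0, 1, 1])
z1_toy = np.array([0, 0, 0, 1, 0, 1])
N_toy = np.array([0.8, 0.8, 0.8, 1.6, 1.2, 0.8])

n = 2
P_tilde = distorted_transition(N_toy, z0_toy, z1_toy, n)
pi_tilde = stationary_distribution(P_tilde)

# Conditional relative entropies, then the average under pi_tilde
cond_re = np.array([np.mean(N_toy[z0_toy == k] * np.log(N_toy[z0_toy == k]))
                    for k in range(n)])
RE = cond_re @ pi_tilde

print("P_tilde =\n", P_tilde)
print("pi_tilde =", pi_tilde)
print("RE =", RE)
```

Dividing the eigenvector by its sum fixes both its scale and its sign, since `np.linalg.eig` returns eigenvectors only up to a scalar multiple.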

Note 1: the implementation is assuming a relative entropy divergence, but the iteration scheme also works for the quadratic specification of divergence.

Note 2: in the following code implementation, we set $n=3$ as used in the paper. Users can specify a different $n$ by changing the `n_states` argument.

### 3.3 Results

In [None]:
from source.plots import print_results
time_start = time.time()

n_states = 3 # User can set to other positive integers
quadratic = False # User can set to True to use quadratic divergence

div = 'QD' if quadratic else 'RE'
f, log_Rw, z0, z1, Rf, Rm, SMB, HML = preprocess_data(n_states)
g = log_Rw # Set g to be log return on wealth

# Minimum divergence case
result_min = solve(f, g, z0, z1, ξ=10., quadratic=quadratic,
                   tol=1e-9, max_iter=1000)

# 20% higher divergence case, lower bound problem
ξ_20_lower = find_ξ(solver_args=(f, g, z0, z1, quadratic, 1e-9, 1000),
                    min_div=result_min[div], pct=0.2, initial_guess=1.,
                    interval=(0, 10.), tol=1e-5, max_iter=100)
result_lower = solve(f, g, z0, z1, ξ=ξ_20_lower, quadratic=quadratic,
                     tol=1e-9, max_iter=1000)

# 20% higher divergence case, upper bound problem
ξ_20_upper = find_ξ(solver_args=(f, -g, z0, z1, quadratic, 1e-9, 1000),
                    min_div=result_min[div], pct=0.2, initial_guess=1.,
                    interval=(0, 10.), tol=1e-5, max_iter=100)
result_upper = solve(f, -g, z0, z1, ξ=ξ_20_upper, quadratic=quadratic,
                     tol=1e-9, max_iter=1000)

print_results(result_lower, result_upper, quadratic)

print('Time spent:', round(time.time()-time_start,2),'s')

In [None]:
from source.utilities import construct_transition_matrix, chernoff_entropy
time_start = time.time()
# Compute Chernoff entropy for empirical distribution and distorted distribution with min rel entropy
P_big, P_big_tilde = construct_transition_matrix(f, g, z0, z1, result_min['ξ'],
                                                 result_min['P'], result_min['P_tilde'],
                                                 result_min['λ'], result_min['v'],
                                                 result_min['μ'], quadratic)
decay_rate, optimal_s = chernoff_entropy(P_big, P_big_tilde, grid_size=1000)
print('Chernoff entropy at the minimum: ', np.around(decay_rate,4))
print('Time spent:', round(time.time()-time_start,2),'s')

In [None]:
from source.plots import entropy_moment_bounds
time_start = time.time() 
entropy_moment_bounds(n_states) # Here we use relative entropy divergence
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))

## 4. Tables and plots

In [None]:
from source.plots import figure_1
time_start = time.time()
print('Figure 1: Expected log market return')
figure_1()
print('Note 1: here we use n_states = 3 and a relative entropy divergence.')
print('Note 2: user can control the slider to change the percent increase')
print('        of relative entropy from minimum.')
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))

In [None]:
from source.plots import figure_2
time_start = time.time()
print("Figure 2: Proportional risk compensations")
figure_2()
print('Note 1: here we use n_states = 3 and a relative entropy divergence.')
print('Note 2: user can control the slider to change the percent increase')
print('        of relative entropy from minimum.')
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))

In [None]:
from source.plots import table_1
time_start = time.time()
table_1(n_states=3, quadratic=False)
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))

In [None]:
from source.plots import table_2
time_start = time.time()
table_2()
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))

In [None]:
from source.plots import table_3
time_start = time.time()
table_3()
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))    

In [None]:
from source.plots import table_4
time_start = time.time()
table_4()
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))    

In [None]:
from source.plots import table_5
time_start = time.time()
table_5(n_states=3)
print("Time spent: %s seconds ---" % (round(time.time()-time_start,4)))    