# Stochastic Volatility Model

Asset prices have time-varying volatility (the variance of day-over-day returns). In some periods returns are highly variable, while in others they are very stable. Stochastic volatility models capture this with a latent volatility variable that evolves as a stochastic process. The following model is similar to the one described in the No-U-Turn Sampler paper, Hoffman and Gelman (2011), p. 21.
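Written out, the model coded in the last cell of this notebook reads as below. This is a transcription of that code, assuming PyMC3 3.x defaults: `StudentT` is parameterized by a precision `lam`, so `lam = exp(-2 s_i)` corresponds to a scale of `exp(s_i)`, and `GaussianRandomWalk` gives the first value $s_1$ a flat prior.

$$
\begin{aligned}
\sigma &\sim \operatorname{Exponential}(50) \\
\nu &\sim \operatorname{Exponential}(0.1) \\
s_i \mid s_{i-1} &\sim \mathcal{N}\left(s_{i-1}, \sigma^2\right), \quad i = 2, \dots, n \\
r_i \mid s_i, \nu &\sim t_{\nu}\left(0,\ \lambda_i = e^{-2 s_i}\right), \quad i = 1, \dots, n
\end{aligned}
$$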

See the PyMC3 documentation on this topic: https://docs.pymc.io/notebooks/stochastic_volatility.html
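To see what data from this kind of model looks like, here is a minimal NumPy sketch that draws one path from the generative process described above. The function name `simulate_sv` and the default `step_size=0.02` and `nu=5.0` are illustrative assumptions, not values from the notebook, and the walk is started at 0 rather than from PyMC3's flat prior on the first step.

```python
import numpy as np

def simulate_sv(n, step_size=0.02, nu=5.0, seed=0):
    """Hypothetical helper: draw one path of latent log-volatility s and returns r.

    s is a Gaussian random walk started at 0 (an assumption), and each return
    is Student-t with scale exp(s_i), i.e. precision exp(-2 * s_i).
    """
    rng = np.random.default_rng(seed)
    # Cumulative sum of Gaussian steps gives the latent log-volatility path
    steps = rng.normal(0.0, step_size, size=n - 1)
    s = np.concatenate([[0.0], np.cumsum(steps)])
    # Standard Student-t draws, rescaled by the current volatility exp(s)
    r = np.exp(s) * rng.standard_t(nu, size=n)
    return s, r

s, r = simulate_sv(1_000)
print(s.shape, r.shape)  # (1000,) (1000,)
```

Plotting `np.abs(r)` next to `np.exp(s)` should show stretches of large and small returns that follow the latent volatility, which is the behaviour the model is meant to capture.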

In [16]:
import pymc3 as pm
import os
import pandas as pd
import numpy as np
from tqdm import tqdm

In [10]:
data_dir = os.path.join(os.pardir, "data")
raw_dir = os.path.join(data_dir, "raw")

In [11]:
train = pd.read_csv(
    os.path.join(raw_dir, "train", "train.csv"),
#     nrows=10000000,
    dtype={"acoustic_data": np.int16, "time_to_failure": np.float64},
)
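The `dtype` argument keeps the training file compact: storing `acoustic_data` as `np.int16` uses 2 bytes per reading instead of the 8 bytes of pandas' default `int64`. The sketch below shows the difference on a tiny in-memory CSV whose values are invented for illustration; the author's choice of `np.int16` assumes every reading fits in the range -32768 to 32767.

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for train.csv (values invented for illustration)
csv_text = "acoustic_data,time_to_failure\n12,1.5\n5,1.4\n4,1.3\n"

default = pd.read_csv(io.StringIO(csv_text))
compact = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"acoustic_data": np.int16, "time_to_failure": np.float64},
)

# Bytes used by the integer column alone (index excluded)
print(default["acoustic_data"].memory_usage(index=False))  # 24 (int64: 8 bytes x 3)
print(compact["acoustic_data"].memory_usage(index=False))  # 6 (int16: 2 bytes x 3)
```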

In [12]:
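# Split the signal into non-overlapping segments of 150,000 rows;
# X_tr will hold derived features and y_tr the target, one row per segment
rows = 150_000
segments = train.shape[0] // rows  # a trailing partial segment is dropped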

X_tr = pd.DataFrame(index=range(segments), dtype=np.float64)

y_tr = pd.DataFrame(index=range(segments), dtype=np.float64, columns=['time_to_failure'])

In [13]:
submission_path = os.path.join(data_dir, "processed", "sample_submission.csv")
submission = pd.read_csv(submission_path, index_col='seg_id')
X_test = pd.DataFrame(columns=X_tr.columns, dtype=np.float64, index=submission.index)

In [17]:
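# Explore only the first segment for now
for segment in tqdm(range(segments)[:1]):
    seg = train.iloc[segment*rows:segment*rows+rows]
    x = pd.Series(seg['acoustic_data'].values)
    # Target: time to failure at the last sample of the segment
    y = seg['time_to_failure'].values[-1]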
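Because the segments are contiguous, equally sized and non-overlapping, the loop above can also be written as a single reshape. This is a minimal sketch under that assumption: the helper `segment_features` and the four features (mean, standard deviation, max, min) are hypothetical examples of "simple derived features", not ones computed in this notebook, which leaves `X_tr` empty.

```python
import numpy as np
import pandas as pd

def segment_features(acoustic, time_to_failure, rows):
    """Hypothetical helper: per-segment features and targets without a Python loop.

    Drops the trailing partial segment, like train.shape[0] // rows.
    """
    segments = len(acoustic) // rows
    n = segments * rows
    # One row per segment, one column per sample inside that segment
    blocks = np.asarray(acoustic[:n], dtype=np.float64).reshape(segments, rows)
    X = pd.DataFrame({
        "mean": blocks.mean(axis=1),
        "std": blocks.std(axis=1),
        "max": blocks.max(axis=1),
        "min": blocks.min(axis=1),
    })
    # Target: time_to_failure at the last sample of each segment
    ttf = np.asarray(time_to_failure[:n], dtype=np.float64).reshape(segments, rows)
    y = pd.DataFrame({"time_to_failure": ttf[:, -1]})
    return X, y

# Toy data: 10 samples, segments of 4 -> 2 full segments, last 2 samples dropped
X, y = segment_features(np.arange(10), np.arange(10, 0, -1, dtype=float), rows=4)
print(X["mean"].tolist())             # [1.5, 5.5]
print(y["time_to_failure"].tolist())  # [7.0, 3.0]
```

On the real data this would be called as `segment_features(train["acoustic_data"].values, train["time_to_failure"].values, rows)`; the reshape avoids building one `pd.Series` per segment.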



In [18]:
x.shape

(150000,)

In [20]:
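# Keep every 10th reading (150,000 -> 15,000 points) to keep the model small;
# the acoustic signal stands in for the asset returns of the original example
returns = x.iloc[::10].values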
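In the financial setting the model was built for, `returns` would be log returns, $\log p_t - \log p_{t-1}$, which sit roughly around zero; the `StudentT` likelihood in the model cell has its location fixed at 0 (PyMC3's default `mu=0`). The sketch below shows both the log-return computation and one possible preprocessing of the acoustic signal, subsampling and then subtracting the mean. The helper names are hypothetical and the centering step is an assumption, not something this notebook does.

```python
import numpy as np

def log_returns(prices):
    """Day-over-day log returns: log(p_t) - log(p_{t-1})."""
    prices = np.asarray(prices, dtype=np.float64)
    return np.diff(np.log(prices))

def subsample_centered(signal, step=10):
    """Keep every `step`-th value, then subtract the mean (centering is an assumption)."""
    kept = np.asarray(signal, dtype=np.float64)[::step]
    return kept - kept.mean()

print(log_returns([100.0, 110.0, 99.0]))  # approximately log(1.1) and log(0.9)
print(subsample_centered(np.arange(150_000), step=10).shape)  # (15000,)
```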

In [21]:
returns

array([12,  5,  4, ...,  6,  8, 13], dtype=int16)

In [22]:
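with pm.Model() as model:
    # Prior on the step size of the latent log-volatility random walk
    step_size = pm.Exponential('sigma', 50.)
    # Latent log-volatility, one value per observation
    s = pm.GaussianRandomWalk('s', sd=step_size,
                              shape=len(returns))

    # Degrees of freedom of the heavy-tailed Student-t likelihood
    nu = pm.Exponential('nu', .1)

    # lam is a precision, so exp(-2*s) gives each observation a scale of exp(s)
    r = pm.StudentT('r', nu=nu,
                    lam=pm.math.exp(-2*s),
                    observed=returns)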

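AttributeError: module 'numpy.core.multiarray' has no attribute '_get_ndarray_c_version'

This error is typically reported when a Theano release older than 1.0.4 (the backend PyMC3 3.x uses) runs against NumPy 1.16 or newer; upgrading Theano is the usual fix.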
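Since the model cell fails before it can be sampled, a plain NumPy version of its joint log density can help check what the PyMC3 model is meant to compute. This is a sketch under stated assumptions: the helper names are hypothetical, the first random-walk value gets a flat (constant) prior as in PyMC3 3.x's default, and the density is on the original scale, so it omits the Jacobian terms PyMC3 adds when it samples positive variables such as `sigma` and `nu` on a log scale.

```python
import math

import numpy as np

def student_t_logpdf(r, nu, lam):
    """Student-t log density with location 0 and precision lam (PyMC3's `lam`)."""
    r = np.asarray(r, dtype=np.float64)
    lam = np.asarray(lam, dtype=np.float64)
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            + 0.5 * np.log(lam / (nu * np.pi))
            - (nu + 1) / 2 * np.log1p(lam * r**2 / nu))

def random_walk_logpdf(s, sigma):
    """Steps s[i] - s[i-1] ~ N(0, sigma^2); s[0] has a flat prior (contributes 0)."""
    steps = np.diff(np.asarray(s, dtype=np.float64))
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - steps**2 / (2 * sigma**2)))

def sv_logp(returns, s, sigma, nu):
    """Unnormalized joint log density of the model, untransformed scale."""
    logp = math.log(50.0) - 50.0 * sigma   # sigma ~ Exponential(50)
    logp += math.log(0.1) - 0.1 * nu       # nu ~ Exponential(0.1)
    logp += random_walk_logpdf(s, sigma)
    lam = np.exp(-2 * np.asarray(s, dtype=np.float64))
    logp += float(np.sum(student_t_logpdf(returns, nu, lam)))
    return logp

print(sv_logp(np.array([0.1, -0.2, 0.05]), np.zeros(3), sigma=0.02, nu=5.0))
```

With `nu=1` and `lam=1` the Student-t reduces to a standard Cauchy, whose log density at 0 is $-\log \pi$, which gives a quick sanity check on `student_t_logpdf`.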