# 00 — Data Ingestion (Step-by-step)
**Goal:** Walk through using `get_market_data` and `get_options_data` with caching.

---


In [1]:
# 1. Environment setup
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))

from datetime import date, timedelta
from oami.config import initialize_environment
from oami.data_layer import get_market_data, get_options_data


api_key = initialize_environment()
print('API key configured' if api_key != 'YOUR_KEY_HERE' else 'Running in offline mode using cache')


✅ Polygon API key detected and loaded.
API key configured


## 2. Choose symbol and date window
We'll work with `SPY` and fetch the last 30 calendar days of data.


In [2]:
symbol = 'SPY'
end_date = date.today()
start_date = end_date - timedelta(days=30)
print(f'Requesting data from {start_date} to {end_date}')


Requesting data from 2025-09-27 to 2025-10-27
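
Because the window counts calendar days, the number of daily bars returned will be smaller than 30. As a rough sanity check, the sketch below counts the weekdays in the window using only the standard library. It ignores exchange holidays, so the result is an upper bound, and the helper name `count_weekdays` is illustrative rather than part of `oami`.

```python
from datetime import date, timedelta

def count_weekdays(start: date, end: date) -> int:
    """Count Monday-Friday dates in the inclusive range [start, end]."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )

# The window printed above: 2025-09-27 (a Saturday) to 2025-10-27 (a Monday).
expected_bars = count_weekdays(date(2025, 9, 27), date(2025, 10, 27))
print("Weekdays in window:", expected_bars)
```

For this window the count is 21, which matches the number of market rows reported in the next section.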


## 3. Fetch market data
`get_market_data` reads from the consolidated HDF5 cache at `data/cache/oami_store.h5`. When part of the requested window is not yet cached, the helper fetches the missing ranges from Polygon and persists them to the cache immediately.

In [3]:
market_df = get_market_data(
    symbol=symbol,
    start=start_date.isoformat(),
    end=end_date.isoformat(),
    interval='1D',
    use_cache=True,
)
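print('Market rows:', len(market_df))
market_df.head()      # not rendered: a cell displays only its last expression
market_df.info()      # prints the column summary to stdout
market_df.describe()  # last expression, rendered as the cell output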

Fetching market data SPY...done
Market rows: 21
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Timestamp  21 non-null     datetime64[ns]
 1   Open       21 non-null     float64       
 2   High       21 non-null     float64       
 3   Low        21 non-null     float64       
 4   Close      21 non-null     float64       
 5   Volume     21 non-null     float64       
dtypes: datetime64[ns](1), float64(5)
memory usage: 1.1 KB


| | Timestamp | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| count | 21 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 |
| mean | 2025-10-12 02:42:51.428571392 | 668.571429 | 671.388605 | 664.918995 | 668.261667 | 76265080.0 |
| min | 2025-09-28 21:00:00 | 657.17 | 665.13 | 652.84 | 653.02 | 34200380.0 |
| 25% | 2025-10-05 21:00:00 | 664.36 | 668.71 | 659.7679 | 664.39 | 60702240.0 |
| 50% | 2025-10-12 21:00:00 | 669.99 | 672.21 | 666.78 | 669.12 | 72545370.0 |
| 75% | 2025-10-19 21:00:00 | 672.0 | 672.99 | 669.21 | 671.3 | 81702560.0 |
| max | 2025-10-26 21:00:00 | 682.73 | 683.99 | 682.115 | 683.865 | 159422600.0 |
| std | | 5.979157 | 4.533182 | 7.108983 | 6.343241 | 25106780.0 |


In [5]:
market_df
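
The gap-filling behaviour described in this section can be pictured with a small sketch. It is not the `oami` implementation: the function name `find_missing_ranges` and the idea of tracking coverage as a set of cached dates are assumptions made for illustration only.

```python
from datetime import date, timedelta

def find_missing_ranges(cached: set, start: date, end: date) -> list:
    """Return inclusive (first, last) date ranges in [start, end] absent from `cached`."""
    gaps = []
    gap_start = None
    day = start
    while day <= end:
        if day not in cached:
            # Open a new gap if we are not already inside one.
            if gap_start is None:
                gap_start = day
        elif gap_start is not None:
            # A cached day closes the gap that ended on the previous day.
            gaps.append((gap_start, day - timedelta(days=1)))
            gap_start = None
        day += timedelta(days=1)
    if gap_start is not None:
        gaps.append((gap_start, end))
    return gaps

# Hypothetical cache that covers 3-5 October only.
cached_days = {date(2025, 10, 3), date(2025, 10, 4), date(2025, 10, 5)}
gaps = find_missing_ranges(cached_days, date(2025, 10, 1), date(2025, 10, 7))
for first, last in gaps:
    print(f"fetch {first} -> {last}")
```

Under this model, only the two ranges on either side of the cached days would need a remote request; everything else is served from the store.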

## 4. Fetch options data
`get_options_data` returns contract metadata derived from the HDF5 cache and backfills any missing expirations or strike slices before responding.

In [None]:
options_df = get_options_data(
    symbol=symbol,
    start_date=start_date.isoformat(),
    end_date=end_date.isoformat(),
    interval='1D',
    use_cache=True,
    look_forward=14,
)
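print('Options rows:', len(options_df))
options_df.head()      # not rendered: a cell displays only its last expression
options_df.info()      # prints the column summary to stdout
options_df.describe()  # last expression, rendered as the cell output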

In [18]:
options_df.iloc[200, 0]  # row 200, first column (single positional lookup)

In [12]:
options_df.iloc[0, 6]  # first row, seventh column (single positional lookup)

## 5. Inspect the cache keys
Use `pandas.HDFStore` to view the contents of the cache. Keys under `/stocks` contain market bars, while keys under `/options` hold per-contract aggregates.

In [8]:
from oami.utils.cache_manager import H5_PATH
import pandas as pd

# Open the store read-only so that inspection cannot modify the cache.
with pd.HDFStore(H5_PATH, mode='r') as store:
    print('Cache located at:', H5_PATH)
    print('Available keys:')
    for key in store.keys():
        print(' ', key)
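
When the store holds many keys, it can help to group them by their top-level node. The sketch below works on any list of HDF5-style key strings, such as the one returned by `store.keys()`. The sample keys are made up for illustration, and the actual layout beneath `/stocks` and `/options` may differ.

```python
from collections import defaultdict

def group_keys_by_root(keys):
    """Group '/root/child/...' keys by their first path component."""
    grouped = defaultdict(list)
    for key in keys:
        parts = key.strip("/").split("/")
        grouped[parts[0]].append(key)
    return dict(grouped)

# Hypothetical keys; replace with store.keys() from an open pandas.HDFStore.
sample_keys = ["/stocks/SPY/1D", "/options/SPY/1D", "/stocks/QQQ/1D"]
grouped_keys = group_keys_by_root(sample_keys)
for root, members in sorted(grouped_keys.items()):
    print(f"/{root}: {len(members)} key(s)")
```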

## 6. Re-using cached data
Running the notebook again with the same window pulls everything directly from the HDF5 cache, so no additional API calls are required.
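
This is the behaviour of a read-through cache. The toy sketch below uses an in-memory dict in place of the HDF5 store and a placeholder function in place of the Polygon client. Both are stand-ins that are not part of `oami`; they only show that a repeated request triggers no further fetches.

```python
class ReadThroughCache:
    """Toy read-through cache keyed by (symbol, start, end)."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that would hit the remote source
        self._store = {}       # stand-in for the on-disk HDF5 store
        self.fetch_count = 0   # number of remote calls made so far

    def get(self, symbol, start, end):
        key = (symbol, start, end)
        if key not in self._store:
            # Cache miss: fetch once and persist the result.
            self.fetch_count += 1
            self._store[key] = self._fetch(symbol, start, end)
        return self._store[key]

def fake_fetch(symbol, start, end):
    # Hypothetical remote call; returns a placeholder payload.
    return [f"{symbol} bars {start}..{end}"]

cache = ReadThroughCache(fake_fetch)
first = cache.get("SPY", "2025-09-27", "2025-10-27")
second = cache.get("SPY", "2025-09-27", "2025-10-27")
print("Remote fetches:", cache.fetch_count)
```

The second `get` call returns the stored object without invoking `fake_fetch`, so the counter stays at one.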