# Datasets

Reliable sources and quick-start code

# Financial Datasets: Quick Start

Use these beginner-friendly data sources. Start small (a few tickers,
short date ranges) to keep things fast and clear.

## Yahoo Finance (Free)

Suitable for equities, indices, ETFs.

``` python
import yfinance as yf

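# Minimal example. auto_adjust=False keeps the separate "Adj Close" column;
# recent yfinance releases default to auto_adjust=True, which drops it.
df = yf.download(["AAPL", "MSFT"], start="2022-01-01", end="2023-01-01", auto_adjust=False)
df["Adj Close"].head()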

# Simple return calculation
returns = df["Adj Close"].pct_change().dropna()
returns.tail()
```

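Notes: adjusted prices reflect splits and dividends and are occasionally
revised, so a later download can differ from an earlier one; verify the
data before using it in assessed work.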
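
One way to act on this note is a quick sanity check before relying on a
download. The sketch below is illustrative only: `check_prices` is a
hypothetical helper (not part of yfinance), and the sample table is made
up to stand in for `df["Adj Close"]`.

``` python
import pandas as pd


def check_prices(prices: pd.DataFrame) -> dict:
    """Summarize basic data-quality facts about a date-indexed price table."""
    return {
        "rows": len(prices),
        "missing_per_column": prices.isna().sum().to_dict(),
        "duplicate_dates": int(prices.index.duplicated().sum()),
        "dates_sorted": bool(prices.index.is_monotonic_increasing),
    }


# Synthetic stand-in for df["Adj Close"]; the values are made up.
dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"])
sample = pd.DataFrame(
    {"AAPL": [100.0, None, 102.0], "MSFT": [200.0, 201.0, 202.0]},
    index=dates,
)

report = check_prices(sample)
print(report)
```

With real data, call `check_prices(df["Adj Close"])` and look into any
missing values or duplicate dates before computing returns.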

## FRED (Macro/Economic)

Federal Reserve economic series (GDP, CPI, rates).

``` python
import pandas_datareader.data as web

gdp = web.DataReader("GDP", "fred", start="2018-01-01")
gdp.tail()
```
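
The FRED `GDP` series is quarterly, so a growth rate can be computed the
same way as the return calculation in the Yahoo Finance example. The
sketch below uses a made-up table in place of the downloaded `gdp` frame
so it runs offline; the numbers are placeholders, not real GDP values.

``` python
import pandas as pd

# Synthetic quarterly levels standing in for the "GDP" column; values are made up.
idx = pd.date_range("2022-01-01", periods=5, freq="QS")
gdp_sample = pd.DataFrame({"GDP": [100.0, 101.0, 102.0, 103.0, 104.0]}, index=idx)

# Quarter-over-quarter growth: change versus the previous quarter
qoq = gdp_sample["GDP"].pct_change()

# Year-over-year growth: change versus the same quarter one year earlier
yoy = gdp_sample["GDP"].pct_change(periods=4)

print(pd.DataFrame({"qoq": qoq, "yoy": yoy}))
```

The first four year-over-year values are missing because there is no
quarter one year earlier to compare against.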

## CSV Fallback (Offline-friendly)

If network access is blocked, work from local CSVs.

``` python
import pandas as pd

prices = pd.read_csv("prices_sample.csv", parse_dates=["Date"], index_col="Date")
prices.head()
```

How to create a CSV quickly:

``` python
# After pulling with yfinance, save for later offline use
df["Adj Close"].to_csv("prices_sample.csv")
```
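
The two steps above (save once, read later) can be combined into a small
cache-style loader. This is a minimal sketch under stated assumptions:
`load_prices` and `fake_fetch` are hypothetical names, and `fake_fetch`
returns made-up values in place of a real network download.

``` python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def load_prices(csv_path: Path, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Read prices from csv_path if it exists; otherwise fetch and save them."""
    if csv_path.exists():
        return pd.read_csv(csv_path, parse_dates=["Date"], index_col="Date")
    prices = fetch().rename_axis("Date")  # make sure the index column is named
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    prices.to_csv(csv_path)
    return prices


calls = []  # records each time the fetcher is actually used


def fake_fetch() -> pd.DataFrame:
    # Hypothetical stand-in for a network download; values are made up.
    calls.append("fetch")
    dates = pd.to_datetime(["2022-01-03", "2022-01-04"])
    return pd.DataFrame({"AAPL": [100.0, 101.0]}, index=dates)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "prices_sample.csv"
    first = load_prices(path, fake_fetch)   # file missing: fetches and writes the CSV
    second = load_prices(path, fake_fetch)  # file present: reads the CSV, no fetch

print(len(calls))
```

In practice the fetcher could be something like
`lambda: yf.download(..., auto_adjust=False)["Adj Close"]`, so the network
is only needed the first time.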

## Good practices

-   Start with one or two tickers, short windows
-   Inspect each DataFrame with `.info()` and check the source
    documentation for series definitions
-   Keep a small “data” folder with versioned CSV snapshots
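
A simple way to version snapshots is to put the download date in the file
name. The sketch below is one possible convention, not a required one:
`save_snapshot` and the `name_YYYY-MM-DD.csv` pattern are assumptions
made for illustration, and the sample prices are made up.

``` python
import tempfile
from datetime import date
from pathlib import Path

import pandas as pd


def save_snapshot(prices: pd.DataFrame, folder: Path, name: str, as_of: date) -> Path:
    """Write prices to folder/name_YYYY-MM-DD.csv and return the path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}_{as_of.isoformat()}.csv"
    prices.to_csv(path)
    return path


# Synthetic prices; the values are made up.
dates = pd.to_datetime(["2022-01-03", "2022-01-04"])
sample = pd.DataFrame({"AAPL": [100.0, 101.0]}, index=dates).rename_axis("Date")

with tempfile.TemporaryDirectory() as tmp:
    saved = save_snapshot(sample, Path(tmp) / "data", "prices", date(2023, 1, 2))
    saved_name = saved.name
    saved_ok = saved.exists()

print(saved_name)  # prices_2023-01-02.csv
```

Passing `date.today()` as `as_of` stamps each snapshot with the day it
was pulled, so older files are never overwritten.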

## JKP Global Factor Data (Replication resources)

The JKP initiative (Jensen–Kelly–Pedersen) provides a curated, global
factor dataset, documentation, and analysis tools:

-   Portal: https://jkpfactors.com
-   Documentation (factor definitions, availability):
    https://jkpfactors.s3.amazonaws.com/documents/Documentation.pdf
-   JKP/WRDS Guide: https://jkpfactors.com/jkp-wrds-guide
-   GitHub (related research replication):
    https://github.com/bkelly-lab/ReplicationCrisis

Notes and usage:

-   Access may require registration and/or institutional subscriptions
    (e.g., WRDS). Follow the portal’s terms and documentation.
-   For coursework, prefer small, well-documented slices (few factors,
    limited horizon) and record exactly which series/versions you used.
-   Context papers: Jensen, Kelly, and Pedersen (2024); methodology
    links to Kelly, Malamud, and Zhou (2022) and Gu, Kelly, and Xiu
    (2020) for model design and evaluation.
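
Recording which series and versions were used can be as light as a small
JSON record stored next to the data file. The following is a sketch, not
a JKP or WRDS tool: `describe_file`, the file name, and the factor names
are all hypothetical placeholders, and the file content is dummy text.

``` python
import hashlib
import json
import tempfile
from pathlib import Path


def describe_file(path: Path, source: str, series: list[str], note: str = "") -> dict:
    """Build a provenance record: file name, content hash, source, and series used."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "file": path.name,
        "sha256": digest,  # changes whenever the file content changes
        "source": source,
        "series": series,
        "note": note,
    }


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "factors_slice.csv"  # hypothetical file name
    data_file.write_text("date,factor_a,factor_b\n2020-01-31,0.01,-0.02\n")  # dummy content

    record = describe_file(
        data_file,
        source="https://jkpfactors.com",
        series=["factor_a", "factor_b"],  # placeholder names, not real JKP factors
        note="Downloaded for coursework; limited horizon.",
    )
    record_json = json.dumps(record, indent=2)

print(record_json)
```

Saving `record_json` alongside the CSV lets a reader confirm later that
the file is the same one the analysis used.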

Gu, Shihao, Bryan Kelly, and Dacheng Xiu. 2020. “Empirical Asset Pricing
via Machine Learning.” *Review of Financial Studies*.
<https://doi.org/10.1093/rfs/hhaa009>.

Jensen, Theis I., Bryan T. Kelly, and Lasse Heje Pedersen. 2024. “Is
There a Replication Crisis in Finance?” *Journal of Finance*.
<https://doi.org/10.1111/jofi.13249>.

Kelly, Bryan T., Semyon Malamud, and Kangying Zhou. 2022. “The Virtue of
Complexity in Return Prediction.” Working Paper w30217. National Bureau
of Economic Research. <https://doi.org/10.3386/w30217>.