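# Commitments of Traders (COT) Report

The Commitments of Traders (COT) report is published by the U.S. Commodity Futures Trading Commission (CFTC). It breaks down the positions of large traders in U.S. futures markets, for example how many long and short contracts each trader category holds, together with open interest.

The report is published once a week, on Friday afternoon U.S. Eastern Time, and reflects positions as of the Tuesday of the same week.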
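
The gap between the Tuesday snapshot and the Friday release matters for any backtest: positions dated Tuesday are not public until Friday. The sketch below maps a report date to its release date, assuming the regular schedule described above with no holiday delays. The helper name `release_date` is hypothetical and not part of the notebook.

```python
import pandas as pd


def release_date(report_date: pd.Timestamp) -> pd.Timestamp:
    """Assume a report dated Tuesday is released on the Friday of the same week."""
    return report_date + pd.Timedelta(days=3)


report = pd.Timestamp("2025-01-21")  # a Tuesday report date
released = release_date(report)
print(report.day_name(), "->", released.day_name())
```

Shifting signals to the release date (or later) is one way to avoid look-ahead bias when pairing the report with prices.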

#### Report types

1. Legacy
2. Supplemental
3. Disaggregated
4. Traders in Financial Futures (TFF)

The Legacy, Disaggregated, and TFF reports each come in two versions: futures only, and futures and options combined.

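#### Legacy report

The Legacy report splits market participants into two groups: non-commercial and commercial traders.

* Non-commercial: generally speculative positions, for example contracts held by hedge funds or investment banks.
* Commercial: generally commodity producers or manufacturers, who mainly use futures to hedge the price risk of their products.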

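#### Supplemental report

The Supplemental report covers 13 selected agricultural commodity contracts and splits market participants into three groups: non-commercial, commercial, and index traders.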

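#### Disaggregated report

The Disaggregated report refines the Legacy report by splitting market participants further:

* Producer/Merchant/Processor/User: entities that generally use futures to hedge commodity risk
* Swap Dealers: their futures positions mainly hedge the risk of swap transactions; their counterparties may be producers as well as hedge funds
* Managed Money: entities that specialize in futures trading, such as commodity trading advisors (CTAs), commodity pool operators (CPOs), or hedge funds
* Other Reportables: reportable traders that do not fall into any of the three categories above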

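#### Traders in Financial Futures report

The TFF report covers financial contracts such as currencies, U.S. Treasuries, Eurodollars, equities, and the Bloomberg Commodity Index. It splits market participants into four groups:

* Dealer/Intermediary: regarded as the "sell side" of the market; these participants design and sell financial products to clients, for example investment banks, swap dealers, and other derivatives firms
* Asset Manager/Institutional: institutional investors such as pension funds, endowments, insurance companies, and mutual funds
* Leveraged Funds: mainly hedge funds, CTAs, CPOs, and similar firms
* Other Reportables: participants outside the three categories above, which generally use futures to hedge risk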

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from indicators import lowpass_filter

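# Legacy report

## 1. Data

- Download the COT reports from the [CFTC website](https://publicreporting.cftc.gov/stories/s/r4w3-av2u) and save them as local CSV files
- Download historical prices for the key assets from Yahoo Finance and save them as local CSV files

The downloaded COT file contains the position history of every product, so a specific contract is selected by its CFTC contract market code, for example:

- ICE U.S. Dollar Index: 098662
- E-mini S&P 500: 13874A
- Gold: 088691

Contract codes can be looked up [here](https://www.tradingster.com/).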
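
Some contract codes start with a zero, so how the CSV is parsed matters: if pandas infers a numeric type for a code column, the leading zero is lost and a string comparison such as `== "098662"` no longer matches. The sketch below shows the effect on a tiny in-memory CSV. The column name `code` and the two-row file are made up for illustration.

```python
import io

import pandas as pd

# Illustrative two-row file, not real COT data
numeric_only = "code\n098662\n088691\n"

# Default parsing infers integers and drops the leading zero
default = pd.read_csv(io.StringIO(numeric_only))

# Forcing a string dtype keeps the code exactly as written
as_text = pd.read_csv(io.StringIO(numeric_only), dtype={"code": str})

print(default["code"].tolist())
print(as_text["code"].tolist())
```

In the real file, codes such as 13874A contain letters, so the column often ends up as text anyway. Passing an explicit `dtype` removes the dependence on type inference.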

In [2]:
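# Path to the COT report file
cot_file_path = "~/Downloads/cot_legacy_futures_only.csv"

# CFTC contract market code
code = "098662"

# Path to the price data file
price_file_path = "../data/yahoo/ICE US Dollar Index.csv"

# Read the COT report; keep contract codes as strings so leading zeros survive
cot = pd.read_csv(cot_file_path, dtype={"CFTC_Contract_Market_Code": str})

# Read the price data
price = pd.read_csv(price_file_path, index_col=0, parse_dates=True)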

Clean and merge the datasets

In [3]:
# The COT report has many fields; keep only the few that matter here
fields = {
    "Report_Date_as_YYYY_MM_DD": "date",
    "CFTC_Contract_Market_Code": "code",
    "Commodity Name": "commodity",
    "NonComm_Positions_Long_All": "non_comm_long",
    "NonComm_Positions_Short_All": "non_comm_short",
}

# Clean the COT position data
clean_cot = (
    cot.query("CFTC_Contract_Market_Code == @code")
    .loc[:, list(fields.keys())]
    .rename(columns=fields)
    .assign(date=lambda x: pd.to_datetime(x["date"], format="%m/%d/%Y %I:%M:%S %p"))
    .set_index("date")
    .sort_index()
)

# Merge the datasets
df = (
    clean_cot.join(price["Close"], how="left")
    .rename(columns={"Close": "price"})
    .dropna()
)

df

| date | code | commodity | non_comm_long | non_comm_short | price |
|---|---|---|---|---|---|
| 2000-01-04 | 098662 | U.S. DOLLAR INDEX | 2203 | 1967 | 100.410004 |
| 2000-01-11 | 098662 | U.S. DOLLAR INDEX | 3652 | 6236 | 100.559998 |
| 2000-01-18 | 098662 | U.S. DOLLAR INDEX | 2932 | 1675 | 101.860001 |
| 2000-01-25 | 098662 | U.S. DOLLAR INDEX | 2801 | 1521 | 102.410004 |
| 2000-02-01 | 098662 | U.S. DOLLAR INDEX | 2594 | 1831 | 104.919998 |
| ... | ... | ... | ... | ... | ... |
| 2024-12-31 | 098662 | U.S. DOLLAR INDEX | 26598 | 19721 | 108.489998 |
| 2025-01-07 | 098662 | U.S. DOLLAR INDEX | 27958 | 19313 | 108.540001 |
| 2025-01-14 | 098662 | U.S. DOLLAR INDEX | 29512 | 16783 | 109.269997 |
| 2025-01-21 | 098662 | U.S. DOLLAR INDEX | 28811 | 13939 | 108.059998 |
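
The left join followed by `dropna()` discards any report week whose Tuesday has no row in the price file, for instance when the market was closed that day. One possible alternative, sketched below on made-up numbers, is `pd.merge_asof`, which pairs each report date with the most recent close on or before it. Whether a stale price is acceptable depends on the analysis, so treat this as an option rather than a replacement.

```python
import pandas as pd

# Made-up values: the first report date has no matching price row
cot = pd.DataFrame(
    {"non_comm_long": [100, 120]},
    index=pd.to_datetime(["2024-01-02", "2024-01-09"]),
)
price = pd.DataFrame(
    {"Close": [101.0, 102.0, 103.0]},
    index=pd.to_datetime(["2023-12-29", "2024-01-08", "2024-01-09"]),
)

# Both sides must be sorted by the join key; "backward" picks the last
# price at or before each report date
merged = pd.merge_asof(
    cot.sort_index(),
    price[["Close"]].sort_index(),
    left_index=True,
    right_index=True,
    direction="backward",
).rename(columns={"Close": "price"})

print(merged)
```

The `tolerance` argument of `merge_asof` can cap how old a matched price is allowed to be.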


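## 2. Sentiment indicators

Futures positioning is commonly used to gauge market sentiment. There are two main approaches:

1. Compute the non-commercial net position and standardize it, for example as a 52-week rolling z-score.
2. Compute non-commercial long positions as a share of total positions.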
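
Written out, the two indicators look as follows. The symbols are introduced here for illustration: $L_t$ and $S_t$ are the non-commercial long and short positions in week $t$, and $\mu_t^{(n)}$ and $\sigma_t^{(n)}$ are the rolling mean and standard deviation of the net position over the latest $n$ weeks. The ratio uses long plus short as the denominator, which is how this notebook's code computes it.

```latex
\begin{aligned}
\mathrm{net}_t &= L_t - S_t \\
z_t &= \frac{\mathrm{net}_t - \mu_t^{(n)}}{\sigma_t^{(n)}} \\
r_t &= \frac{L_t}{L_t + S_t}
\end{aligned}
```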

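Taking method 1 as an example: when the rolling z-score of the non-commercial net position rises above 2, speculators are heavily net long, sentiment may be overly optimistic, and over the medium to long term the price may top out and start to fall. When the z-score drops below -2, speculators are heavily net short, sentiment may be overly pessimistic, and the price may bottom out and start to rise.

The positioning data is therefore read with a contrarian trading logic.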
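
The contrarian reading can be sketched as a small function that turns a net-position series into -1, 0, or +1. The function name `contrarian_signal`, the default parameters, and the toy series are assumptions made for this example; the notebook itself plots the z-score rather than generating signals.

```python
import pandas as pd


def contrarian_signal(
    net: pd.Series, window: int = 52, threshold: float = 2.0
) -> pd.Series:
    """-1 when z > threshold (crowded long), +1 when z < -threshold, else 0."""
    z = (net - net.rolling(window).mean()) / net.rolling(window).std()
    signal = pd.Series(0, index=net.index)
    signal[z > threshold] = -1  # overly optimistic: lean bearish
    signal[z < -threshold] = 1  # overly pessimistic: lean bullish
    return signal


# Toy data: a quiet series followed by a sudden jump in net longs
net = pd.Series([0, 1, 0, 1, 0, 1, 0, 1, 50], dtype=float)
signal = contrarian_signal(net, window=8)
print(signal.tolist())
```

One detail worth keeping in mind: with the sample standard deviation that pandas uses by default, a raw z-score inside a window of n points can never exceed (n - 1) / sqrt(n) in absolute value, so short windows cap how extreme the score can get.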

### 2.1 Rolling z-score

In [8]:
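def peak(series: pd.Series) -> pd.Series:
    """
    Find all local highs in a time series.
    A local high satisfies series(t-1) < series(t) and series(t+1) < series(t).

    Args:
        series: pd.Series - the input time series
    Returns:
        pd.Series - a series of the same length as the input; 1 marks a local high, 0 any other point
    """
    length = len(series)

    if length < 3:
        # Too short to contain an interior point: return all zeros on the same index
        return pd.Series(0, index=series.index, dtype=int)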

    peak_mask = np.zeros(length, dtype=int)

    for i in range(1, length - 1):
        if series.iloc[i - 1] < series.iloc[i] and series.iloc[i + 1] < series.iloc[i]:
            peak_mask[i] = 1

    return pd.Series(peak_mask, index=series.index)


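def valley(series: pd.Series) -> pd.Series:
    """
    Find all local lows in a time series.
    A local low satisfies series(t-1) > series(t) and series(t+1) > series(t).

    Args:
        series: pd.Series - the input time series
    Returns:
        pd.Series - a series of the same length as the input; 1 marks a local low, 0 any other point
    """
    length = len(series)

    if length < 3:
        # Too short to contain an interior point: return all zeros on the same index
        return pd.Series(0, index=series.index, dtype=int)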

    valley_mask = np.zeros(length, dtype=int)

    for i in range(1, length - 1):
        if series.iloc[i - 1] > series.iloc[i] and series.iloc[i + 1] > series.iloc[i]:
            valley_mask[i] = 1

    return pd.Series(valley_mask, index=series.index)
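
The two loops above can also be written with shifted comparisons. The sketch below is an equivalent formulation under the same strict-inequality definition; the `_vectorized` names are hypothetical and the sample series is made up.

```python
import pandas as pd


def peak_vectorized(series: pd.Series) -> pd.Series:
    """1 where a value is strictly greater than both neighbours, else 0."""
    is_peak = (series > series.shift(1)) & (series > series.shift(-1))
    return is_peak.astype(int)


def valley_vectorized(series: pd.Series) -> pd.Series:
    """1 where a value is strictly smaller than both neighbours, else 0."""
    is_valley = (series < series.shift(1)) & (series < series.shift(-1))
    return is_valley.astype(int)


s = pd.Series([1.0, 3.0, 2.0, 0.5, 4.0, 4.0, 1.0])
print(peak_vectorized(s).tolist())
print(valley_vectorized(s).tolist())
```

Comparisons against the missing neighbour at either end evaluate to False, so the endpoints are never flagged, and a flat top such as the repeated 4.0 is not counted as a peak.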

In [30]:
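# Rolling window used for the z-score
# In testing, windows of 10-20 weeks marked local highs and lows most accurately, but this may be the result of overfitting
zscore_period = 12
smooth_period = 4

# Compute the non-commercial net position and its z-score
zscores = df.loc[:, ["non_comm_long", "non_comm_short", "price"]].copy()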
zscores["non_comm_net"] = zscores["non_comm_long"] - zscores["non_comm_short"]
zscores["non_comm_net_zscore"] = (
    zscores["non_comm_net"] - zscores["non_comm_net"].rolling(zscore_period).mean()
) / zscores["non_comm_net"].rolling(zscore_period).std()

# Smooth the rolling z-score
zscores["non_comm_net_zscore_smooth"] = lowpass_filter(
    zscores["non_comm_net_zscore"].dropna(), smooth_period
)

# Find local highs and lows of the smoothed z-score
zscores["peak"] = peak(zscores["non_comm_net_zscore_smooth"])
zscores["valley"] = valley(zscores["non_comm_net_zscore_smooth"])

# Drop rows with missing values
zscores.dropna(inplace=True)

zscores.round(2)

| date | non_comm_long | non_comm_short | price | non_comm_net | non_comm_net_zscore | non_comm_net_zscore_smooth | peak | valley |
|---|---|---|---|---|---|---|---|---|
| 2000-03-21 | 1961 | 1279 | 105.81 | 682 | -0.07 | -0.07 | 0 | 0 |
| 2000-03-28 | 1818 | 1919 | 105.53 | -101 | -0.71 | -0.71 | 0 | 0 |
| 2000-04-04 | 1322 | 881 | 105.17 | 441 | -0.94 | -1.20 | 0 | 1 |
| 2000-04-11 | 1814 | 897 | 105.94 | 917 | -0.08 | -1.09 | 0 | 0 |
| 2000-04-18 | 2529 | 1517 | 106.63 | 1012 | 0.12 | -0.57 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-12-31 | 26598 | 19721 | 108.49 | 6877 | 1.65 | 2.16 | 1 | 0 |
| 2025-01-07 | 27958 | 19313 | 108.54 | 8645 | 1.64 | 2.11 | 0 | 0 |
| 2025-01-14 | 29512 | 16783 | 109.27 | 12729 | 1.88 | 2.10 | 0 | 0 |
| 2025-01-21 | 28811 | 13939 | 108.06 | 14872 | 1.74 | 2.08 | 0 | 0 |


Plot the non-commercial long, short, and net positions.

In [21]:
sample = zscores.loc["2020":].copy()

fig = px.line(
    sample,
    x=sample.index,
    y=["non_comm_long", "non_comm_short", "non_comm_net"],
    title="Non-Commercial Positions",
)

fig.for_each_trace(lambda t: t.update(name=t.name.capitalize().replace("_", " ")))

fig.update_layout(
    width=1000,
    height=600,
    xaxis=dict(title="Date"),
    yaxis=dict(title="# of Contracts"),
    legend=dict(
        orientation="h", xanchor="center", yanchor="bottom", x=0.5, y=1.02, title=""
    ),
)

fig.show()

Plot the rolling z-score to gauge market sentiment

In [48]:
# Build the sample
sample = zscores.loc["2020":].copy()

# Use make_subplots to create a figure with two y-axes
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add the sentiment indicator trace
fig.add_trace(
    go.Scatter(
        x=sample.index,
        y=sample["non_comm_net_zscore_smooth"],
        name=f"Sentiment (Smoothed Rolling {zscore_period}-week Z-score)",
        line=dict(color="rgba(64, 144, 248, 0.8)", width=2),
        fill="tonexty",
        fillcolor="rgba(64, 144, 248, 0.1)",
    ),
    secondary_y=False,
)

# Add the U.S. Dollar Index trace
fig.add_trace(
    go.Scatter(
        x=sample.index,
        y=sample["price"],
        name="US Dollar Index",
        line=dict(color="rgba(46, 189, 89, 0.8)", width=2),
    ),
    secondary_y=True,
)

# Add horizontal lines at the extreme sentiment levels
for level in [-2, -1, 1, 2]:
    fig.add_hline(
        y=level,
        line=dict(color="rgba(128, 128, 128, 0.3)", width=1, dash="dot"),
        secondary_y=False,
    )

# Mark local highs above 2 and local lows below -2 of the smoothed z-score
peak_dates = sample.query("peak == 1 & non_comm_net_zscore_smooth > 2").index
valley_dates = sample.query("valley == 1 & non_comm_net_zscore_smooth < -2").index

fig.add_trace(
    go.Scatter(
        x=peak_dates,
        y=sample.loc[peak_dates, "non_comm_net_zscore_smooth"] * 1.05,
        mode="markers",
        name="Peaks",
        marker=dict(
            symbol="triangle-down",
            size=8,
            color="red",
            line=dict(width=1, color="darkred"),
        ),
        hovertemplate="Date: %{x}<br>Z-score: %{y:.2f}<extra></extra>",
    ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(
        x=valley_dates,
        y=sample.loc[valley_dates, "non_comm_net_zscore_smooth"] * 1.05,
        mode="markers",
        name="Valleys",
        marker=dict(
            symbol="triangle-up",
            size=8,
            color="green",
            line=dict(width=1, color="darkgreen"),
        ),
        hovertemplate="Date: %{x}<br>Z-score: %{y:.2f}<extra></extra>",
    ),
    secondary_y=False,
)


# Helper that builds a time range around each date
def create_time_ranges(dates, window_days=10):
    ranges = []
    for date in dates:
        start = date - pd.Timedelta(days=window_days)
        end = date + pd.Timedelta(days=window_days)
        ranges.append((start, end))
    return ranges


# Shade the background around the highs
for start, end in create_time_ranges(peak_dates):
    fig.add_vrect(
        x0=start,
        x1=end,
        fillcolor="red",
        opacity=0.1,
        layer="below",
        line_width=0,
    )

# Shade the background around the lows
for start, end in create_time_ranges(valley_dates):
    fig.add_vrect(
        x0=start,
        x1=end,
        fillcolor="green",
        opacity=0.1,
        layer="below",
        line_width=0,
    )

# Update the layout
fig.update_layout(
    title=dict(
        text="Sentiment Indicator vs US Dollar Index",
        x=0.5,
        y=0.95,
        xanchor="center",
        font=dict(size=20),
    ),
    width=1000,
    height=600,
    template="plotly_white",
    showlegend=True,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=0.5),
    hovermode="x unified",
)

# Update the axes
fig.update_xaxes(title="Date", gridcolor="rgba(128, 128, 128, 0.1)", showgrid=True)

fig.update_yaxes(
    title="Sentiment Z-score",
    secondary_y=False,
    gridcolor="rgba(128, 128, 128, 0.1)",
    showgrid=True,
)

fig.update_yaxes(
    title="US Dollar Index",
    secondary_y=True,
    gridcolor="rgba(128, 128, 128, 0.1)",
    showgrid=True,
)

# Show the figure
fig.show()

### 2.2 Long position ratio

In [35]:
smooth_period = 4

ratios = df.loc[:, ["non_comm_long", "non_comm_short", "price"]].copy()
ratios["long_ratio"] = ratios["non_comm_long"] / (
    ratios["non_comm_long"] + ratios["non_comm_short"]
)
ratios["long_ratio_smooth"] = lowpass_filter(ratios["long_ratio"], smooth_period)
ratios["peak"] = peak(ratios["long_ratio_smooth"])

ratios

| date | non_comm_long | non_comm_short | price | long_ratio | long_ratio_smooth | peak |
|---|---|---|---|---|---|---|
| 2000-01-04 | 2203 | 1967 | 100.410004 | 0.528297 | 0.528297 | 0 |
| 2000-01-11 | 3652 | 6236 | 100.559998 | 0.369337 | 0.369337 | 0 |
| 2000-01-18 | 2932 | 1675 | 101.860001 | 0.636423 | 0.363753 | 0 |
| 2000-01-25 | 2801 | 1521 | 102.410004 | 0.648080 | 0.484350 | 0 |
| 2000-02-01 | 2594 | 1831 | 104.919998 | 0.586215 | 0.534955 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 2024-12-31 | 26598 | 19721 | 108.489998 | 0.574235 | 0.571775 | 0 |
| 2025-01-07 | 27958 | 19313 | 108.540001 | 0.591441 | 0.590112 | 0 |
| 2025-01-14 | 29512 | 16783 | 109.269997 | 0.637477 | 0.618316 | 0 |
| 2025-01-21 | 28811 | 13939 | 108.059998 | 0.673942 | 0.657553 | 0 |


In [36]:
sample = ratios.loc["2001":]

fig = go.Figure()

# Long ratio
fig.add_trace(
    go.Scatter(
        x=sample.index,
        y=sample["long_ratio_smooth"],
        name="Smoothed Long Ratio",
        yaxis="y1",
        line=dict(color="blue"),
    )
)

# Contract price
fig.add_trace(
    go.Scatter(
        x=sample.index,
        y=sample["price"],
        name="US Dollar Index",
        yaxis="y2",
        line=dict(color="green"),
    )
)

# Add vertical lines at local highs of the long ratio above 0.8
peak_dates = sample.query("peak == 1 & long_ratio_smooth > 0.8").index
for date in peak_dates:
    fig.add_vline(x=date, line=dict(color="red", width=1, dash="dot"))

fig.update_layout(
    title="Non-Commercial Long Ratio",
    width=1000,
    height=600,
    xaxis=dict(title="Date"),
    yaxis=dict(title="Ratio"),
    yaxis2=dict(title="US Dollar Index", overlaying="y", side="right"),
    legend=dict(orientation="h", xanchor="center", yanchor="bottom", x=0.5, y=1.02),
)

fig.show()

# Traders in Financial Futures report

In [29]:
cot_file_path = "~/Downloads/cot_ttf_futures_only.csv"

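# Keep contract codes as strings so leading zeros survive
cot = pd.read_csv(cot_file_path, dtype={"CFTC_Contract_Market_Code": str})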

In [30]:
code = "098662"

fields = {
    "Report_Date_as_YYYY_MM_DD": "date",
    "CFTC_Contract_Market_Code": "code",
    "Commodity Name": "commodity",
    "Dealer_Positions_Long_All": "dealer_long",
    "Dealer_Positions_Short_All": "dealer_short",
    "Asset_Mgr_Positions_Long_All": "asset_mgr_long",
    "Asset_Mgr_Positions_Short_All": "asset_mgr_short",
    "Lev_Money_Positions_Long_All": "lev_money_long",
    "Lev_Money_Positions_Short_All": "lev_money_short",
    "Other_Rept_Positions_Long_All": "other_rept_long",
    "Other_Rept_Positions_Short_All": "other_rept_short",
    "NonRept_Positions_Long_All": "non_rept_long",
    "NonRept_Positions_Short_All": "non_rept_short",
}

clean_cot = (
    cot.query("CFTC_Contract_Market_Code == @code")
    .loc[:, list(fields.keys())]
    .rename(columns=fields)
    .assign(date=lambda x: pd.to_datetime(x["date"], format="%m/%d/%Y %I:%M:%S %p"))
    .set_index("date")
    .sort_index()
)

Plot the long positions of every trader category

In [31]:
fig = px.line(
    clean_cot,
    x=clean_cot.index,
    y=[
        "dealer_long",
        "asset_mgr_long",
        "lev_money_long",
        "other_rept_long",
        "non_rept_long",
    ],
    title="Long Positions",
)

fig.update_layout(
    width=1000,
    height=600,
    xaxis=dict(title="Date"),
    yaxis=dict(title="# of Contracts"),
    legend=dict(
        orientation="h", xanchor="center", yanchor="bottom", x=0.5, y=1.02, title=""
    ),
)

fig.show()

Plot the short positions of every trader category

In [32]:
fig = px.line(
    clean_cot,
    x=clean_cot.index,
    y=[
        "dealer_short",
        "asset_mgr_short",
        "lev_money_short",
        "other_rept_short",
        "non_rept_short",
    ],
    title="Short Positions",
)

fig.update_layout(
    width=1000,
    height=600,
    xaxis=dict(title="Date"),
    yaxis=dict(title="# of Contracts"),
    legend=dict(
        orientation="h", xanchor="center", yanchor="bottom", x=0.5, y=1.02, title=""
    ),
)

fig.show()

Long position ratio

In [33]:
df = clean_cot.copy()

df["dealer_long_ratio"] = df["dealer_long"] / (df["dealer_long"] + df["dealer_short"])
df["asset_mgr_long_ratio"] = df["asset_mgr_long"] / (
    df["asset_mgr_long"] + df["asset_mgr_short"]
)
df["lev_money_long_ratio"] = df["lev_money_long"] / (
    df["lev_money_long"] + df["lev_money_short"]
)
df["other_rept_long_ratio"] = df["other_rept_long"] / (
    df["other_rept_long"] + df["other_rept_short"]
)
df["non_rept_long_ratio"] = df["non_rept_long"] / (
    df["non_rept_long"] + df["non_rept_short"]
)
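
The five ratio columns above follow one pattern, so they could also be produced in a loop. This is a sketch of that refactor, assuming the `<group>_long` and `<group>_short` column naming used in this notebook. The helper name `add_long_ratios` and the sample numbers are made up.

```python
import pandas as pd


def add_long_ratios(positions: pd.DataFrame, groups: list[str]) -> pd.DataFrame:
    """Add a <group>_long_ratio column for every trader group."""
    out = positions.copy()
    for group in groups:
        total = out[f"{group}_long"] + out[f"{group}_short"]
        # A zero total gives 0/0, which pandas stores as NaN
        out[f"{group}_long_ratio"] = out[f"{group}_long"] / total
    return out


# Made-up positions for two of the five groups
positions = pd.DataFrame(
    {
        "dealer_long": [30, 0],
        "dealer_short": [10, 0],
        "lev_money_long": [20, 50],
        "lev_money_short": [60, 50],
    }
)
ratios = add_long_ratios(positions, ["dealer", "lev_money"])
print(ratios[["dealer_long_ratio", "lev_money_long_ratio"]])
```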

In [34]:
columns = [
    "dealer_long_ratio",
    "asset_mgr_long_ratio",
    "lev_money_long_ratio",
    "other_rept_long_ratio",
    "non_rept_long_ratio",
]

fig = make_subplots(
    rows=5, cols=1, shared_xaxes=True, subplot_titles=columns, vertical_spacing=0.04
)

for i, column in enumerate(columns, 1):
    fig.add_trace(
        go.Scatter(
            x=df.index,
            y=df[column],
            name=column.replace("_", " ").capitalize(),
        ),
        row=i,
        col=1,
    )

fig.update_layout(width=1000, height=800, title="Long Ratio", showlegend=False)

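Of the two indicators, the rolling z-score captures extreme market sentiment more effectively, and a rolling window of around 12 weeks performed best.

The long ratio also picks up extreme sentiment, but less well; it appears better suited to identifying market trends, which is still to be tested.

TODO
- Test whether the long ratio can identify market trends, using a vectorized backtest
- Look for bottoms in the rolling z-score
- Compute the correlation between the U.S. Dollar Index and Bitcoin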