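# CSI 1000 All-Factor Composite Evaluation Summary (Top 10)

This notebook summarizes the batch analysis of **all available factors** in the CSI 1000 stock universe and lists the top 10 factors by composite score.

**Analysis batch date**: 2026-02-14
**Return convention**: `close2close_next` (next-day close to T+2 close)
**Number of quantile groups**: 5
**Analysis period**: 2026-01-01 to 2026-02-10

**Top 10 factors (by composite score)**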
1. `micro_price` (high_frequency) - score=0.9354
2. `effective_spread` (high_frequency) - score=0.8958
3. `bid_ask_spread` (high_frequency) - score=0.8750
4. `trade_imbalance` (high_frequency) - score=0.8042
5. `order_imbalance` (high_frequency) - score=0.7906
6. `depth_imbalance` (high_frequency) - score=0.7719
7. `pe_ttm` (fundamental) - score=0.5521
8. `trade_flow_intensity` (high_frequency) - score=0.5062
9. `circ_mv` (fundamental) - score=0.4854
10. `vwap_deviation` (high_frequency) - score=0.4792
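
As a minimal sketch of the `close2close_next` return convention named in the header: assuming it means that a factor observed on day T is evaluated against the return from the close of T+1 to the close of T+2, the forward return could be computed from a wide close-price table as below. The function name, the wide `close` layout (dates as rows, tickers as columns), and the sample prices are illustrative assumptions, not taken from the project code.

```python
import pandas as pd


def close2close_next(close: pd.DataFrame) -> pd.DataFrame:
    """Forward return aligned to day T: close[T+2] / close[T+1] - 1.

    Assumption: `close` is indexed by trading date (ascending) with one
    column per ticker. The last two rows have no forward return and stay NaN.
    """
    return close.shift(-2) / close.shift(-1) - 1.0


# Hypothetical prices for a single made-up ticker.
close = pd.DataFrame(
    {"STOCK_A": [10.0, 11.0, 12.1, 12.1]},
    index=pd.date_range("2026-01-05", periods=4, freq="B"),
)
fwd = close2close_next(close)
print(fwd)
```

Skipping the T to T+1 leg means a signal that is only known after the close on day T is not credited with a return it could not have traded.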


In [None]:
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8')
ROOT = Path('..') if Path.cwd().name == 'notebook' else Path('.')
REPORT_DIR = ROOT / 'factor_analysis_results' / 'reports'
ALL_PATH = REPORT_DIR / 'zz1000_all_factors_analysis_20260214.csv'
SCORED_PATH = REPORT_DIR / 'zz1000_factor_scored_20260214.csv'
TOP10_PATH = REPORT_DIR / 'zz1000_factor_top10_20260214.csv'
ALL_PATH, SCORED_PATH, TOP10_PATH

In [None]:
all_df = pd.read_csv(ALL_PATH)
scored_df = pd.read_csv(SCORED_PATH)
top10_df = pd.read_csv(TOP10_PATH)

print('all factors rows:', len(all_df))
print('scored factors rows:', len(scored_df))
print('top10 rows:', len(top10_df))
print('source distribution in all results:')
print(all_df['source'].value_counts())

In [None]:
# Show the top 10 factors by composite score
cols = ['factor','source','composite_score','ic_mean','ic_ir','ls_sharpe','ls_total_return','ls_win_rate','ic_count','ls_count']
top10_df[cols]

In [None]:
# Figure 1: Top 10 composite scores
# Sort ascending so the highest score ends up at the top of the horizontal bar chart
plot_df = top10_df.sort_values('composite_score', ascending=True).copy()
labels = plot_df['factor'] + ' (' + plot_df['source'] + ')'
plt.figure(figsize=(10, 6))
plt.barh(labels, plot_df['composite_score'], color='#0f5c9a')
plt.title('Top 10 Factors by Composite Score')
plt.xlabel('Composite Score')
plt.tight_layout()
plt.show()

In [None]:
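# Figure 2: key metrics of the top 10 factors (normalized)
mcols = ['ic_mean','ic_ir','ls_sharpe','ls_total_return','ls_win_rate']
m = top10_df[mcols].copy()
# Use absolute values so that IC metrics are compared by magnitude, regardless of sign
m['ic_mean'] = m['ic_mean'].abs()
m['ic_ir'] = m['ic_ir'].abs()
# Min-max scale each metric to [0, 1] within the top 10; the small epsilon avoids division by zero
m_norm = (m - m.min()) / (m.max() - m.min() + 1e-12)
m_norm.index = top10_df['factor']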

plt.figure(figsize=(12, 6))
for c in m_norm.columns:
    plt.plot(m_norm.index, m_norm[c], marker='o', label=c)
plt.xticks(rotation=35, ha='right')
plt.title('Top 10 Factors: Key Metrics (0-1 normalized)')
plt.ylabel('Normalized Value')
plt.legend()
plt.grid(alpha=0.25)
plt.tight_layout()
plt.show()

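## Composite scoring method

Each metric is first converted to a percentile rank (higher is better), and the ranks are then combined in a weighted sum (the weights sum to 1.0):

- `|ic_mean|`: weight 0.25
- `|ic_ir|`: weight 0.20
- `ic_positive_ratio`: weight 0.10
- `ls_sharpe`: weight 0.20
- `ls_total_return`: weight 0.15
- `ls_win_rate`: weight 0.10

A filter is also applied: only factors with `ic_count >= 3` and `ls_count > 0` are kept.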
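
The scoring described above can be sketched roughly as follows. This is a hypothetical re-implementation for illustration, not the project's actual script: the function name `composite_score`, the `abs_*` helper columns, the choice to filter before ranking, and pandas' default tie handling in `rank(pct=True)` are all assumptions. The input column names follow the ones used elsewhere in this notebook, and `ic_positive_ratio` is assumed to be a column of the scored results.

```python
import pandas as pd

# Weights as listed above; the abs_* keys are helper names used only in this sketch.
WEIGHTS = {
    "abs_ic_mean": 0.25,
    "abs_ic_ir": 0.20,
    "ic_positive_ratio": 0.10,
    "ls_sharpe": 0.20,
    "ls_total_return": 0.15,
    "ls_win_rate": 0.10,
}


def composite_score(df: pd.DataFrame, min_ic_count: int = 3) -> pd.DataFrame:
    """Filter factors, percentile-rank each metric, and take the weighted sum."""
    # Assumption: the filter is applied before ranking.
    kept = df[(df["ic_count"] >= min_ic_count) & (df["ls_count"] > 0)].copy()
    metrics = pd.DataFrame({
        "abs_ic_mean": kept["ic_mean"].abs(),
        "abs_ic_ir": kept["ic_ir"].abs(),
        "ic_positive_ratio": kept["ic_positive_ratio"],
        "ls_sharpe": kept["ls_sharpe"],
        "ls_total_return": kept["ls_total_return"],
        "ls_win_rate": kept["ls_win_rate"],
    })
    # Percentile rank per metric: larger values get ranks closer to 1.
    ranks = metrics.rank(pct=True)
    # Multiply each rank column by its weight, then sum across metrics.
    kept["composite_score"] = ranks.mul(pd.Series(WEIGHTS), axis=1).sum(axis=1)
    return kept.sort_values("composite_score", ascending=False)


# Made-up demo data; factor "c" is dropped because ic_count < 3.
demo = pd.DataFrame({
    "factor": ["a", "b", "c"],
    "ic_mean": [0.05, -0.02, 0.10],
    "ic_ir": [0.5, -0.2, 1.0],
    "ic_positive_ratio": [0.6, 0.4, 0.7],
    "ls_sharpe": [1.0, 0.2, 2.0],
    "ls_total_return": [0.03, 0.01, 0.05],
    "ls_win_rate": [0.55, 0.45, 0.60],
    "ic_count": [20, 20, 2],
    "ls_count": [20, 20, 20],
})
scored_demo = composite_score(demo)
print(scored_demo[["factor", "composite_score"]])
```

Because every input is a percentile rank in (0, 1] and the weights sum to 1.0, the resulting score is also bounded by 1, which is consistent with the Top 10 scores reported at the top of this notebook.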

In [None]:
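# Optional: to rerun the full all-factor analysis, run the command below (slow)
# !python scripts/run_full_factor_ranking.py
print('This notebook reads pre-generated result files by default; to regenerate them, rerun the batch analysis with the project script.')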