### Step 1: Import libraries and initialize Qlib

This cell imports the required Python libraries and initializes Qlib. `qlib.init()` connects Qlib to your local data store.

In [29]:
# Cell 1: imports and initialization
import qlib
import pandas as pd
from qlib.config import REG_CN
from qlib.contrib.data.handler import Alpha158

# Assumes your data lives in the default path. If not, use qlib.init(provider_uri="your/path")
# region=REG_CN tells Qlib that we are analysing the China A-share market
qlib.init(region=REG_CN)

print("Qlib initialized.")

[21409:MainThread](2025-08-30 10:39:32,630) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[21409:MainThread](2025-08-30 10:39:32,633) INFO - qlib.Initialization - [__init__.py:79] - qlib successfully initialized based on client settings.
[21409:MainThread](2025-08-30 10:39:32,634) INFO - qlib.Initialization - [__init__.py:81] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/huhao/.qlib/qlib_data/cn_data')}


Qlib initialized.
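
If initialization succeeds but later steps return no data, the data directory itself is worth checking. The sketch below is not part of Qlib: `missing_qlib_subdirs` is a hypothetical helper, and the three sub-directory names are an assumption based on the layout of the standard Qlib data bundle.

```python
import os
from pathlib import Path
from typing import List

# Assumed layout of a Qlib data bundle; adjust if your bundle differs.
REQUIRED_SUBDIRS = ("calendars", "instruments", "features")


def missing_qlib_subdirs(provider_uri: str) -> List[str]:
    """Return the expected sub-directories that are absent under provider_uri."""
    root = Path(os.path.expanduser(provider_uri))
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


# The path below is the default location reported in the log above.
missing = missing_qlib_subdirs("~/.qlib/qlib_data/cn_data")
if missing:
    print(f"Data directory looks incomplete, missing: {missing}")
else:
    print("Data directory looks complete.")
```

An empty list suggests the directory can be passed to `qlib.init(provider_uri=...)` as is.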


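### Step 2: Configure and create the Alpha158 data handler

Here we define the study period and the stock universe, then create an instance of the `Alpha158` handler. This instance, `h`, is the entry point for all the data we fetch in later steps.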

In [31]:
# Cell 2: configure and create the data handler
data_handler_config = {
    "start_time": "2017-01-01",
    "end_time": "2023-12-31",
    "fit_start_time": "2017-01-01",
    "fit_end_time": "2023-12-31",
    "instruments": "csi300",  # you can switch to csi500 or another stock universe
}

print("Initializing the Alpha158 data handler...")
# On the first run Qlib has to compute all 158 factors, which can take up to a few minutes
h = Alpha158(**data_handler_config)
print("Handler initialized.")

Initializing the Alpha158 data handler...


[21409:MainThread](2025-08-30 10:40:26,339) INFO - qlib.timer - [log.py:127] - Time cost: 16.103s | Loading data Done
[21409:MainThread](2025-08-30 10:40:26,397) INFO - qlib.timer - [log.py:127] - Time cost: 0.026s | DropnaLabel Done
[21409:MainThread](2025-08-30 10:40:26,631) INFO - qlib.timer - [log.py:127] - Time cost: 0.233s | CSZScoreNorm Done
[21409:MainThread](2025-08-30 10:40:26,632) INFO - qlib.timer - [log.py:127] - Time cost: 0.292s | fit & process data Done
[21409:MainThread](2025-08-30 10:40:26,633) INFO - qlib.timer - [log.py:127] - Time cost: 16.396s | Init data Done


Handler initialized.
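
The configuration above sets the fit window (`fit_start_time` to `fit_end_time`) equal to the full data range. A small builder can make that relationship explicit and catch typos in the dates before the slow handler construction starts. This is only a sketch: `make_handler_config` is a hypothetical helper, and the rule that the fit window must lie inside the data range is our own sanity check, not something Qlib is known to enforce.

```python
from datetime import date
from typing import Dict, Optional


def make_handler_config(start: str, end: str, instruments: str = "csi300",
                        fit_start: Optional[str] = None,
                        fit_end: Optional[str] = None) -> Dict[str, str]:
    """Build an Alpha158-style config dict; the fit window defaults to the full range."""
    fit_start = fit_start or start
    fit_end = fit_end or end
    # date.fromisoformat rejects malformed dates such as "2017-13-01".
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    fs, fe = date.fromisoformat(fit_start), date.fromisoformat(fit_end)
    if not (s <= fs <= fe <= e):
        raise ValueError("fit window must lie inside [start_time, end_time]")
    return {
        "start_time": start,
        "end_time": end,
        "fit_start_time": fit_start,
        "fit_end_time": fit_end,
        "instruments": instruments,
    }


# Reproduces the dictionary used in the cell above.
config = make_handler_config("2017-01-01", "2023-12-31")
print(config)
```

The resulting dictionary can be unpacked into the handler in the same way as before, `Alpha158(**config)`.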


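### Step 3: Fetch the factor (feature) and label data

Using the handler `h` created in the previous step, we fetch all the factors and the label used for evaluation (the forward return).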

In [32]:
# Cell 3: fetch factors and labels
print("Fetching factor data from the handler...")
features_df = h.fetch(col_set="feature")

print("Fetching label data from the handler...")
labels_df = h.fetch(col_set="label")

print("\nFactor data preview:")
display(features_df.head())

print("\nLabel data preview:")
display(labels_df.head())

print(f"\nData shape: factors {features_df.shape}, labels {labels_df.shape}")

Fetching factor data from the handler...
Fetching label data from the handler...

Factor data preview:


| datetime | instrument | KMID | KLEN | KMID2 | KUP | KUP2 | KLOW | KLOW2 | KSFT | KSFT2 | OPEN0 | ... | VSUMN5 | VSUMN10 | VSUMN20 | VSUMN30 | VSUMN60 | VSUMD5 | VSUMD10 | VSUMD20 | VSUMD30 | VSUMD60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-01-03 | SH600000 | 0.005558 | 0.016657 | 0.333653 | 0.008637 | 0.518537 | 0.002462 | 0.14781 | -0.000618 | -0.037073 | 0.994473 | ... | 0.439778 | 0.410348 | 0.563252 | 0.509626 | 0.502747 | 0.120444 | 0.179303 | -0.126504 | -0.019252 | -0.005494 |
| 2017-01-03 | SH600008 | 0.009756 | 0.012195 | 0.799995 | 0.002439 | 0.200005 | 0.0 | 0.0 | 0.007317 | 0.59999 | 0.990338 | ... | 0.558911 | 0.502497 | 0.546016 | 0.530332 | 0.516062 | -0.117821 | -0.004993 | -0.092033 | -0.060664 | -0.032123 |
| 2017-01-03 | SH600009 | 0.003766 | 0.014313 | 0.263155 | 0.004896 | 0.34211 | 0.00565 | 0.394736 | 0.00452 | 0.315781 | 0.996248 | ... | 0.388259 | 0.520077 | 0.55403 | 0.509252 | 0.499738 | 0.223482 | -0.040153 | -0.10806 | -0.018503 | 0.000524 |
| 2017-01-03 | SH600010 | 0.003532 | 0.014134 | 0.249907 | 0.010601 | 0.750093 | 0.0 | 0.0 | -0.007069 | -0.500186 | 0.99648 | ... | 0.257201 | 0.328895 | 0.513838 | 0.505262 | 0.488866 | 0.485598 | 0.342211 | -0.027676 | -0.010524 | 0.022268 |
| 2017-01-03 | SH600015 | 0.022957 | 0.033976 | 0.675675 | 0.010101 | 0.297312 | 0.000918 | 0.027013 | 0.013773 | 0.405377 | 0.977559 | ... | 0.273855 | 0.373894 | 0.531979 | 0.528549 | 0.499011 | 0.452289 | 0.252213 | -0.063958 | -0.057097 | 0.001978 |



Label data preview:


| datetime | instrument | LABEL0 |
|---|---|---|
| 2017-01-03 | SH600000 | -0.001831 |
| 2017-01-03 | SH600008 | -0.002398 |
| 2017-01-03 | SH600009 | 0.001493 |
| 2017-01-03 | SH600010 | 0.00352 |
| 2017-01-03 | SH600015 | -0.007142 |



Data shape: factors (273300, 158), labels (273300, 1)
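
To make the `LABEL0` column concrete, here is a toy calculation of a forward return. It assumes Alpha158's default label expression, which to our knowledge is `Ref($close, -2) / Ref($close, -1) - 1`, that is, the return from the next day's close to the close of the day after. The prices are made up and `forward_return_label` is a hypothetical helper, not a Qlib function.

```python
from typing import List, Optional


def forward_return_label(closes: List[float]) -> List[Optional[float]]:
    """label[t] = close[t+2] / close[t+1] - 1; None where future prices are missing."""
    labels: List[Optional[float]] = []
    for t in range(len(closes)):
        if t + 2 < len(closes):
            labels.append(closes[t + 2] / closes[t + 1] - 1)
        else:
            labels.append(None)  # the last two days have no complete future window
    return labels


closes = [10.0, 10.5, 10.29, 10.8]  # made-up closing prices for four days
labels = forward_return_label(closes)
print(labels)
```

Under this definition the label on day `t` uses only prices after day `t`, which is why the last two entries are undefined.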


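### Step 4: Compute IC and ICIR for every factor

This is the core computation. For each factor we compute the daily **Spearman rank correlation (Rank IC)** between the factor and the label across that day's cross-section of stocks, then aggregate the daily series into the IC mean and the ICIR (the IC mean divided by the IC standard deviation).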
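
As a worked illustration of what a single day's Rank IC means, the sketch below computes a Spearman correlation by hand: rank both series (ties share their average rank), then take the Pearson correlation of the ranks. It uses only the standard library and made-up numbers, it assumes neither input is constant, and it is not how pandas computes the value internally.

```python
from statistics import mean
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor: List[float], label: List[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(factor), average_ranks(label))


# One hypothetical cross-section: four stocks on a single day.
factor_values = [0.1, 0.4, 0.2, 0.3]
next_returns = [0.00, 0.03, -0.01, 0.01]
print(rank_ic(factor_values, next_returns))
```

Repeating this for every trading day gives the daily IC series whose mean and standard deviation are aggregated in this step.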

In [33]:
# Cell 4: compute IC and ICIR in a loop

# Merge factors and labels into one frame for easier processing
# The default label column is named LABEL0
label_name = labels_df.columns[0]
all_data = pd.concat([features_df, labels_df], axis=1)

# List of all factor names
factor_names = features_df.columns

ic_records = []
print(f"Computing IC for {len(factor_names)} factors, please wait...")

# Group the data by date
grouped = all_data.groupby('datetime')

# Loop over the factors
for factor_name in factor_names:
    # Use apply to compute the rank correlation on each cross-section (each day)
    daily_ic = grouped.apply(
        lambda df: df[factor_name].corr(df[label_name], method='spearman')
    )
    
    # IC mean and IC standard deviation
    mean_ic = daily_ic.mean()
    std_ic = daily_ic.std()
    
    # Information ratio (ICIR); falls back to 0 when the denominator is zero or NaN
    icir = mean_ic / std_ic if std_ic != 0 and pd.notna(std_ic) else 0
    
    ic_records.append({
        "Factor": factor_name,
        "IC Mean": mean_ic,
        "IC Std": std_ic,
        "ICIR": icir,
    })

# Convert the list of records to a DataFrame
ic_df = pd.DataFrame(ic_records)

print("\nIC computed for all factors!")
display(ic_df.head())

Computing IC for 158 factors, please wait...


  return spearmanr(a, b)[0]



IC computed for all factors!


|  | Factor | IC Mean | IC Std | ICIR |
|---|---|---|---|---|
| 0 | KMID | -0.015946 | 0.162731 | -0.09799 |
| 1 | KLEN | -0.012875 | 0.178771 | -0.072019 |
| 2 | KMID2 | -0.013311 | 0.151049 | -0.088122 |
| 3 | KUP | -0.008987 | 0.129795 | -0.069236 |
| 4 | KUP2 | -0.004212 | 0.105381 | -0.03997 |
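
The aggregation can also be written without a Python-level loop, assuming the daily ICs have first been collected into one DataFrame with a row per date and a column per factor (the loop above does not build such a frame as written). `summarize_ic` is a hypothetical helper and the numbers are toy values. Unlike the cell above, it reports NaN rather than 0 when the standard deviation is zero or missing, so that undefined ratios are not mistaken for weak factors.

```python
import pandas as pd


def summarize_ic(daily_ic: pd.DataFrame) -> pd.DataFrame:
    """daily_ic has one row per date and one column per factor."""
    ic_mean = daily_ic.mean()
    ic_std = daily_ic.std()  # sample standard deviation (ddof=1), the pandas default
    icir = ic_mean / ic_std.where(ic_std > 0)  # NaN when the std is zero or missing
    summary = pd.DataFrame({"IC Mean": ic_mean, "IC Std": ic_std, "ICIR": icir})
    return summary.rename_axis("Factor").reset_index()


# Toy daily ICs for two hypothetical factors over two dates.
toy_daily_ic = pd.DataFrame(
    {"F1": [0.01, 0.03], "F2": [0.02, 0.02]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
)
toy_summary = summarize_ic(toy_daily_ic)
print(toy_summary)
```

The output has the same columns as `ic_df`, so the sorting in the next step would apply to it unchanged.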


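### Step 5: Sort and display the results

Finally, we sort the results from the previous step to find the best-performing factors. We look at two rankings:

1.  **By ICIR**: combines a factor's effectiveness (IC Mean) with its stability (IC Std). Because the sort is descending on the signed value, this ranking favours positively predictive factors, and strong negative factors fall to the bottom.
2.  **By absolute IC mean**: measures how strong the predictive power is regardless of direction (positive or negative).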


In [34]:
# Cell 5: sort and display the final results

# Add a column for sorting by absolute IC
ic_df['Abs IC Mean'] = ic_df['IC Mean'].abs()

# 1. Sort by ICIR, descending
sorted_by_icir = ic_df.sort_values(by="ICIR", ascending=False).reset_index(drop=True)

# 2. Sort by absolute IC mean, descending
sorted_by_abs_ic = ic_df.sort_values(by="Abs IC Mean", ascending=False).reset_index(drop=True)

print("--- Factor effectiveness report (Alpha158) ---")

print("\n[Top 20 by ICIR (information ratio), descending]")
display(sorted_by_icir.head(20))

print("\n[Top 20 by absolute IC mean, descending]")
display(sorted_by_abs_ic.head(20))

--- Factor effectiveness report (Alpha158) ---

[Top 20 by ICIR (information ratio), descending]


|  | Factor | IC Mean | IC Std | ICIR | Abs IC Mean |
|---|---|---|---|---|---|
| 0 | CNTN5 | 0.022244 | 0.134494 | 0.165393 | 0.022244 |
| 1 | ROC5 | 0.029219 | 0.183151 | 0.159537 | 0.029219 |
| 2 | IMIN5 | 0.019895 | 0.131917 | 0.150815 | 0.019895 |
| 3 | MIN5 | 0.024367 | 0.166307 | 0.146521 | 0.024367 |
| 4 | MA5 | 0.025581 | 0.176412 | 0.145006 | 0.025581 |
| 5 | LOW0 | 0.022003 | 0.154952 | 0.141996 | 0.022003 |
| 6 | QTLD5 | 0.023709 | 0.16837 | 0.140813 | 0.023709 |
| 7 | QTLD10 | 0.024673 | 0.177975 | 0.138631 | 0.024673 |
| 8 | MA10 | 0.025957 | 0.18985 | 0.136725 | 0.025957 |
| 9 | QTLU5 | 0.023212 | 0.173357 | 0.133897 | 0.023212 |



[Top 20 by absolute IC mean, descending]


|  | Factor | IC Mean | IC Std | ICIR | Abs IC Mean |
|---|---|---|---|---|---|
| 0 | BETA5 | -0.030234 | 0.177871 | -0.169976 | 0.030234 |
| 1 | ROC5 | 0.029219 | 0.183151 | 0.159537 | 0.029219 |
| 2 | MA10 | 0.025957 | 0.18985 | 0.136725 | 0.025957 |
| 3 | MA5 | 0.025581 | 0.176412 | 0.145006 | 0.025581 |
| 4 | QTLD10 | 0.024673 | 0.177975 | 0.138631 | 0.024673 |
| 5 | RESI60 | -0.024435 | 0.179999 | -0.13575 | 0.024435 |
| 6 | MIN5 | 0.024367 | 0.166307 | 0.146521 | 0.024367 |
| 7 | RANK5 | -0.024164 | 0.151519 | -0.159477 | 0.024164 |
| 8 | CORD5 | -0.023945 | 0.12882 | -0.185879 | 0.023945 |
| 9 | QTLD5 | 0.023709 | 0.16837 | 0.140813 | 0.023709 |
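
The two tables above show that some negatively signed factors have a larger ICIR magnitude than the best positive one. One possible third view, not part of the original analysis, ranks by the absolute value of ICIR and keeps the sign as a separate column. The sketch uses only the standard library; `rank_by_abs_icir` is a hypothetical helper, and the five ICIR values are copied from the tables above.

```python
from typing import Dict, List

# (Factor, ICIR) pairs copied from the result tables above.
icir_by_factor: Dict[str, float] = {
    "CNTN5": 0.165393,
    "ROC5": 0.159537,
    "BETA5": -0.169976,
    "CORD5": -0.185879,
    "RANK5": -0.159477,
}


def rank_by_abs_icir(icirs: Dict[str, float]) -> List[dict]:
    """Sort factors by |ICIR|, descending, and record the sign separately."""
    rows = [
        {"Factor": name, "ICIR": value,
         "Direction": "positive" if value > 0 else "negative"}
        for name, value in icirs.items()
    ]
    return sorted(rows, key=lambda row: abs(row["ICIR"]), reverse=True)


ranked = rank_by_abs_icir(icir_by_factor)
for row in ranked:
    print(row["Factor"], row["ICIR"], row["Direction"])
```

On these five factors, CORD5 and BETA5 rank ahead of CNTN5, which tops the signed ICIR table.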
