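## Getting the files for feature extraction

Two batch-processing modes are provided:
1. Directory mode: extract features from every `.png` or `.jpg` file in the specified directory.
2. File mode: the samples to process are listed in a text file, one sample path per line.

You can also skip both modes and list the files to process by hand at the end of the cell.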
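
The two batch modes described above can be folded into one small helper. This is only a sketch: `collect_samples` and its parameters are hypothetical names and are not part of `onekey_algo`. It sorts the directory listing so the sample order is reproducible, and it matches extensions case-insensitively.

```python
import os
from typing import List, Optional, Sequence


def collect_samples(directory: Optional[str] = None,
                    list_file: Optional[str] = None,
                    extensions: Sequence[str] = ('.png', '.jpg')) -> List[str]:
    """Hypothetical helper: build the sample list from a directory or a list file."""
    if directory is not None:
        directory = os.path.expanduser(directory)
        # Directory mode: keep only image files, in sorted order.
        return [os.path.join(directory, name)
                for name in sorted(os.listdir(directory))
                if name.lower().endswith(tuple(extensions))]
    if list_file is not None:
        with open(list_file, encoding='utf-8') as f:
            # File mode: one sample path per line, blank lines skipped.
            return [line.strip() for line in f if line.strip()]
    raise ValueError('Provide either directory or list_file')
```

Calling `collect_samples(directory=mydir)` would take the place of the list comprehension in the cell below, and `collect_samples(list_file=test_file)` would take the place of the commented-out file mode.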

In [1]:
import os
# Directory mode
mydir = r'Y:\20241231-ChenYeJi\vnet_crop'
# mydir = r'C:\Users\onekey\Project\OnekeyDS\CT\full'
directory = os.path.expanduser(mydir)
test_samples = [os.path.join(directory, p) for p in os.listdir(directory) if p.endswith(('.png', '.jpg'))]

# File mode
# test_file = ''
# with open(test_file) as f:
#     test_samples = [l.strip() for l in f.readlines()]

# Custom mode
# test_samples = ['path2jpg']
test_samples

['Y:\\20241231-ChenYeJi\\vnet_crop\\1010.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1021.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1022.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1025.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1030.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1033.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1040.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1051.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1052.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1053.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1061.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1063.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1069.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1075.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1078.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1085.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1092.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1103.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_crop\\1107.nii.png',
 'Y:\\20241231-ChenYeJi\\vnet_c

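## Choosing the feature to extract

Use a keyword (the module name) to select which layer's features to extract.

### Supported model names

Set the `model_name` variable in the code to one of the model names below.

| **Model family** | **Model name** |
| ------------ | ------------------------------------------------------------ |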
| AlexNet      | alexnet                                                      |
| VGG          | vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19 |
| ResNet       | resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, wide_resnet50_2, wide_resnet101_2 |
| DenseNet     | densenet121, densenet169, densenet201, densenet161           |
| Inception    | googlenet, inception_v3                                      |
| SqueezeNet   | squeezenet1_0, squeezenet1_1                                 |
| ShuffleNetV2 | shufflenet_v2_x2_0, shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5 |
| MobileNet    | mobilenet_v2, mobilenet_v3_large, mobilenet_v3_small         |
| MNASNet      | mnasnet0_5, mnasnet0_75, mnasnet1_0, mnasnet1_3              |

In [3]:
from onekey_algo.custom.components.comp2 import extract, print_feature_hook, reg_hook_on_module, \
    init_from_model, init_from_onekey

model, transformer, device = init_from_onekey(r'Y:\20241231-ChenYeJi\vnet_models_full\8\resnet101\viz')
for n, m in model.named_modules():
    print('Feature name:', n, "|| Module:", m)

Feature name:  || Module: ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kerne

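## Extracting features

The name printed after `Feature name:` is the feature name to extract, for example `layer3.0.conv2`. Deep learning features are usually taken from the last layer before the classifier, for example `avgpool`.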

In [4]:
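import os
from functools import partial

feature_name = 'avgpool'
# Create the output directory if it is missing; open() would otherwise fail.
os.makedirs('features', exist_ok=True)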
with open('features/feature.csv', 'w') as outfile:
    hook = partial(print_feature_hook, fp=outfile)
    find_num = reg_hook_on_module(feature_name, model, hook)
    results = extract(test_samples, model, transformer, device, fp=outfile)
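
The internals of `reg_hook_on_module` and `print_feature_hook` are not shown here. As a rough sketch of the general idea, the example below uses only PyTorch's own `register_forward_hook` to capture the output of a module selected by name. `TinyNet` and `capture_features` are hypothetical names invented for illustration, and the toy network merely stands in for a real backbone.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Toy stand-in for a real backbone; the layer names are illustrative only."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.avgpool(x)
        return self.fc(torch.flatten(x, 1))


def capture_features(model, feature_name, batch):
    """Return the flattened output of the module named `feature_name`."""
    captured = []
    module = dict(model.named_modules())[feature_name]
    # A forward hook is called with (module, inputs, output) after the module runs.
    handle = module.register_forward_hook(
        lambda mod, inputs, output: captured.append(output.detach().flatten(1)))
    try:
        model.eval()
        with torch.no_grad():
            model(batch)
    finally:
        handle.remove()  # Detach the hook so later forward passes are unaffected.
    return captured[0]


pooled = capture_features(TinyNet(), 'avgpool', torch.randn(4, 3, 32, 32))
print(pooled.shape)  # torch.Size([4, 8])
```

Each row of `pooled` is one sample's feature vector. For a `resnet101` hooked at `avgpool`, the same flattening yields 2048 values per sample, which matches the width of the feature table read in the next section.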

## Reading the data

In [5]:
import pandas as pd
features = pd.read_csv('features/feature.csv', header=None)
features.columns=['ID'] + list(features.columns[1:])
features.head()

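|   | ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 2039 | 2040 | 2041 | 2042 | 2043 | 2044 | 2045 | 2046 | 2047 | 2048 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1010.nii.png | 0.595 | 0.352 | 0.628 | 0.686 | 0.136 | 0.244 | 0.896 | 0.355 | 0.353 | ... | 0.625 | 0.597 | 0.56 | 1.103 | 0.196 | 1.1 | 0.319 | 0.184 | 1.006 | 0.301 |
| 1 | 1021.nii.png | 0.557 | 0.161 | 0.918 | 0.41 | 0.156 | 1.004 | 0.418 | 0.36 | 1.127 | ... | 1.406 | 0.134 | 0.247 | 0.177 | 0.186 | 0.147 | 0.182 | 0.885 | 0.335 | 0.504 |
| 2 | 1022.nii.png | 0.5 | 0.577 | 1.06 | 0.267 | 0.958 | 0.418 | 0.425 | 1.388 | 0.233 | ... | 0.37 | 3.28 | 0.816 | 0.713 | 1.359 | 0.299 | 0.741 | 0.117 | 0.111 | 0.797 |
| 3 | 1025.nii.png | 1.136 | 0.292 | 0.44 | 0.284 | 0.137 | 0.272 | 0.613 | 0.383 | 1.129 | ... | 0.633 | 0.112 | 0.332 | 0.238 | 0.39 | 0.084 | 0.303 | 0.659 | 0.278 | 0.243 |
| 4 | 1030.nii.png | 1.336 | 0.798 | 1.0 | 0.093 | 0.204 | 0.658 | 0.524 | 1.374 | 1.305 | ... | 0.954 | 1.602 | 0.872 | 0.477 | 0.342 | 0.617 | 0.837 | 0.26 | 0.862 | 0.19 |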
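
Before going further it can help to sanity-check the loaded table. The sketch below repeats the column renaming from the cell above on two made-up rows, then checks for duplicate IDs and missing values. `check_feature_table` is a hypothetical helper and the sample values are invented for illustration.

```python
import io

import pandas as pd


def check_feature_table(df: pd.DataFrame, id_col: str = 'ID'):
    """Hypothetical check: return (n_samples, n_features) after basic validation."""
    if not df[id_col].is_unique:
        raise ValueError('duplicate sample IDs')
    values = df.drop(columns=id_col)
    if values.isna().any().any():
        raise ValueError('missing feature values')
    return values.shape


# Two made-up rows standing in for features/feature.csv: an ID followed by values.
raw = io.StringIO("a.png,0.1,0.2,0.3\nb.png,0.4,0.5,0.6\n")
demo = pd.read_csv(raw, header=None)
# Column 0 holds the file name; the remaining columns keep their integer labels.
demo.columns = ['ID'] + list(demo.columns[1:])

print(check_feature_table(demo))  # (2, 3)
```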


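### Compressing deep features

Compress the deep learning features to a lower dimension. Note that the target dimension must be smaller than the number of samples.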

```python
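from typing import List, Union

import pandas as pd


def compress_df_feature(features: pd.DataFrame, dim: int, not_compress: Union[str, List[str]] = None,
                        prefix='') -> pd.DataFrame:
    """
    Compress deep learning features.
    Args:
        features: DataFrame of features.
        dim: Target dimension; this value must be smaller than the number of samples.
        not_compress: Column or columns to leave uncompressed.
        prefix: Prefix applied to all feature names.

    Returns:
        The compressed feature DataFrame.
    """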
```

In [6]:
from onekey_algo.custom.components.comp1 import compress_df_feature

cm_features = compress_df_feature(features=features, dim=256, prefix='DL_', not_compress='ID')
cm_features.to_csv('features/compress_features.csv', header=True, index=False)
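
The method used inside `compress_df_feature` is not stated here. The sketch below assumes a PCA-style projection, purely to illustrate why the target dimension must stay below the sample count: after centering, `n` samples span at most `n - 1` directions. `pca_compress_sketch`, its parameters, and the `DL_0, DL_1, ...` column naming are all hypothetical and should not be read as the `onekey_algo` implementation.

```python
import numpy as np
import pandas as pd


def pca_compress_sketch(df: pd.DataFrame, dim: int, id_col: str = 'ID',
                        prefix: str = 'DL_') -> pd.DataFrame:
    """Hypothetical PCA-style compression; not the onekey_algo implementation."""
    values = df.drop(columns=id_col).to_numpy(dtype=float)
    n_samples, n_features = values.shape
    if dim >= n_samples or dim > n_features:
        raise ValueError(
            f'dim={dim} must be < n_samples={n_samples} and <= n_features={n_features}')
    centered = values - values.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:dim].T
    out = pd.DataFrame(projected, columns=[f'{prefix}{i}' for i in range(dim)])
    out.insert(0, id_col, df[id_col].to_numpy())  # Keep the ID column uncompressed.
    return out


# Random stand-in data: 10 samples with 32 features each.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(10, 32)))
demo.insert(0, 'ID', [f'{i}.png' for i in range(10)])
small = pca_compress_sketch(demo, dim=4)
print(small.shape)  # (10, 5)
```

Under this assumption, `dim=256` as used in the cell above would require more than 256 samples in `features`.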