# Computing semantic features for test-set images

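Extract the output of an intermediate layer of a PyTorch-trained image classification model and use it as the semantic feature of the input image.

Compute the semantic features of every image in the test set, reduce them to two and three dimensions with t-SNE and UMAP, and visualize the result.

Use the visualizations to analyze semantic distances between classes, anomalous samples, fine-grained classification, and the structure of the high-dimensional data.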

同济子豪兄: https://space.bilibili.com/1900783

[Cloud GPU environment used to run the code](https://featurize.cn/?s=d7ce99f842414bfcaea5662a97581bd1): GPU RTX 3060, CUDA v11.2

## Import packages

In [1]:
from tqdm import tqdm

import pandas as pd
import numpy as np

import torch

import cv2
from PIL import Image

# Suppress warning messages
import warnings
warnings.filterwarnings("ignore")

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('device', device)

device cuda:0


## Image preprocessing

In [2]:
from torchvision import transforms

# # Training-set preprocessing: random resized crop, augmentation, convert to Tensor, normalize
# train_transform = transforms.Compose([transforms.RandomResizedCrop(224),
#                                       transforms.RandomHorizontalFlip(),
#                                       transforms.ToTensor(),
#                                       transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
#                                      ])

# Test-set preprocessing (RCTN): Resize, CenterCrop, ToTensor, Normalize
test_transform = transforms.Compose([transforms.Resize(256),
                                     transforms.CenterCrop(224),
                                     transforms.ToTensor(),
                                     transforms.Normalize(
                                         mean=[0.485, 0.456, 0.406], 
                                         std=[0.229, 0.224, 0.225])
                                    ])
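
To make the `ToTensor` and `Normalize` steps concrete, here is a minimal NumPy sketch of the arithmetic they perform. The random image is a hypothetical stand-in for a photo that has already been resized and cropped to 224x224, and the sketch only mirrors the per-channel formula; it is not torchvision's implementation.

```python
import numpy as np

# ImageNet channel statistics, as used in test_transform above
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Hypothetical 224x224 RGB image, uint8 pixels in HWC layout
rng = np.random.default_rng(0)
img_uint8 = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# ToTensor: HWC uint8 in [0, 255] -> CHW float32 in [0, 1]
chw = img_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0

# Normalize: per-channel (x - mean) / std
normalized = (chw - mean[:, None, None]) / std[:, None, None]

print(normalized.shape, normalized.dtype)
```

After this step each channel is roughly zero-centred, which is the input distribution an ImageNet-pretrained backbone expects.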

## Load the trained model

In [3]:
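# map_location lets a checkpoint saved on a GPU load on a CPU-only machine as well
model = torch.load('checkpoints/fruit30_pytorch_20220814.pth', map_location=device)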
model = model.eval().to(device)
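
Calling `torch.load` on a checkpoint that stores a whole model object unpickles the module itself rather than only its weights. The sketch below shows that round trip on a hypothetical toy `nn.Sequential`, not on the fruit30 checkpoint. The `weights_only=False` argument is an assumption about the installed version: it is available from PyTorch 1.13 onward, and recent releases that default to weights-only loading need it to unpickle a full module. Only use it on checkpoints you trust.

```python
import os
import tempfile

import torch
from torch import nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in for the trained classifier
toy_model = nn.Sequential(nn.Linear(4, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, 'toy_model.pth')  # hypothetical file name
    torch.save(toy_model, ckpt_path)  # pickles the whole module, not just a state_dict
    # weights_only=False allows unpickling a full module object
    loaded = torch.load(ckpt_path, map_location=device, weights_only=False)

# Switch to inference mode and move to the target device
loaded = loaded.eval().to(device)
print(type(loaded).__name__, loaded.training)
```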

## Extract an intermediate-layer output as the semantic feature

In [4]:
from torchvision.models.feature_extraction import create_feature_extractor

In [5]:
model_trunc = create_feature_extractor(model, return_nodes={'avgpool': 'semantic_feature'})

## Compute the semantic feature of a single image

In [6]:
img_path = 'fruit30_split/val/菠萝/105.jpg'
img_pil = Image.open(img_path).convert('RGB')  # force 3 channels (handles grayscale and RGBA files)
input_img = test_transform(img_pil)  # preprocess
input_img = input_img.unsqueeze(0).to(device)  # add a batch dimension
# Forward pass; returns a dict holding the output of the requested intermediate layer
with torch.no_grad():
    features = model_trunc(input_img)

In [7]:
features['semantic_feature'].squeeze().cpu().numpy().shape

(512,)

In [None]:
features['semantic_feature'].squeeze().cpu().numpy()
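
The `(512,)` shape comes from `squeeze()` dropping every size-1 axis of the pooled output. A small NumPy sketch of that behaviour follows; the zero arrays are hypothetical stand-ins for a pooled output of shape `(batch, 512, 1, 1)`, which is an assumption about the layer's layout.

```python
import numpy as np

# Stand-ins for a pooled output: (batch, channels, 1, 1)
single = np.zeros((1, 512, 1, 1), dtype=np.float32)
batch = np.zeros((8, 512, 1, 1), dtype=np.float32)

# squeeze() removes every size-1 axis, including the batch axis when batch == 1
print(single.squeeze().shape)  # (512,)
print(batch.squeeze().shape)   # (8, 512)

# reshape keeps the batch axis regardless of batch size
flat_single = single.reshape(single.shape[0], -1)
print(flat_single.shape)       # (1, 512)
```

For one image at a time `squeeze()` is convenient; if the code is later changed to process batches, an explicit reshape or flatten avoids losing the batch axis when a batch happens to contain a single image.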

## Load the test-set classification results

In [10]:
df = pd.read_csv('测试集预测结果.csv')

In [11]:
df.head()

Unnamed: 0,图像路径,标注类别ID,标注类别名称,top-1-预测ID,top-1-预测名称,top-2-预测ID,top-2-预测名称,top-3-预测ID,top-3-预测名称,top-n预测正确,...,草莓-预测置信度,荔枝-预测置信度,菠萝-预测置信度,葡萄-白-预测置信度,葡萄-红-预测置信度,西瓜-预测置信度,西红柿-预测置信度,车厘子-预测置信度,香蕉-预测置信度,黄瓜-预测置信度
0,fruit30_split/val/哈密瓜/106.jpg,0,哈密瓜,4,柚子,5,柠檬,7,梨,False,...,1.815084e-07,1e-06,3.243423e-06,1.1e-05,6e-06,0.000116,0.0001286697,4.142584e-07,5e-06,6.217669e-07
1,fruit30_split/val/哈密瓜/109.jpg,0,哈密瓜,6,桂圆,0,哈密瓜,8,椰子,True,...,7.804896e-08,1e-06,9.750311e-07,0.001511,4.3e-05,0.000157,6.638699e-07,3.048453e-06,3.2e-05,2.386899e-06
2,fruit30_split/val/哈密瓜/114.jpg,0,哈密瓜,0,哈密瓜,26,西红柿,23,葡萄-白,True,...,0.00933481,0.007176,0.001038816,0.037528,0.034992,0.001578,0.265402,0.0001620361,0.005669,0.001115545
3,fruit30_split/val/哈密瓜/116.jpg,0,哈密瓜,0,哈密瓜,16,芒果,4,柚子,True,...,3.197652e-05,0.000254,6.003276e-05,0.001584,3e-06,0.00028,0.0007256652,2.260151e-07,0.021936,0.0003845498
4,fruit30_split/val/哈密瓜/118.png,0,哈密瓜,4,柚子,11,猕猴桃,23,葡萄-白,False,...,0.0007075434,6.8e-05,7.408392e-05,0.115253,0.000762,0.0004,0.00289347,2.952121e-08,0.000335,0.0004361433
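
As a quick illustration of how these columns relate, the sketch below rebuilds the five rows shown above, keeping only three of their columns, and derives accuracy figures from them. The column names come from the table; the tiny `toy_df` is a stand-in for the full CSV, so the numbers describe these five rows only.

```python
import pandas as pd

# The five rows shown by df.head(), reduced to the columns needed here
toy_df = pd.DataFrame({
    '标注类别ID': [0, 0, 0, 0, 0],
    'top-1-预测ID': [4, 6, 0, 0, 4],
    'top-n预测正确': [False, True, True, True, False],
})

# Top-1 accuracy: the label matches the highest-confidence prediction
top1_acc = (toy_df['标注类别ID'] == toy_df['top-1-预测ID']).mean()

# Top-n accuracy: the label appears among the top-n predictions
topn_acc = toy_df['top-n预测正确'].mean()

print(top1_acc, topn_acc)
```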


## Compute the semantic feature of every test-set image

In [12]:
encoding_array = []
img_path_list = []

with torch.no_grad():  # inference only, so skip building the autograd graph
    for img_path in tqdm(df['图像路径']):
        img_path_list.append(img_path)
        img_pil = Image.open(img_path).convert('RGB')
        input_img = test_transform(img_pil).unsqueeze(0).to(device)  # preprocess and add a batch dimension
        # Forward pass; take the avgpool output as the semantic feature
        feature = model_trunc(input_img)['semantic_feature'].squeeze().cpu().numpy()
        encoding_array.append(feature)
encoding_array = np.array(encoding_array)

100%|██████████| 1078/1078 [01:22<00:00, 13.13it/s]


In [13]:
encoding_array.shape

(1078, 512)
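
One simple way to compare classes in this feature space is the cosine similarity between per-class mean vectors. The sketch below is an illustration under stated assumptions rather than the analysis used in this notebook: the random `features` and the `labels` array are hypothetical stand-ins for `encoding_array` and the label column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 6 feature vectors of width 512 and their class IDs
features = rng.normal(size=(6, 512)).astype(np.float32)
labels = np.array([0, 0, 1, 1, 2, 2])

# Mean feature vector (centroid) per class
class_ids = np.unique(labels)
centroids = np.stack([features[labels == c].mean(axis=0) for c in class_ids])

# L2-normalise the rows; the matrix product then gives pairwise cosine similarity
unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
cos_sim = unit @ unit.T

print(np.round(cos_sim, 3))
```

Entry `[i, j]` is the similarity between classes `i` and `j`; values near 1 suggest the two classes sit close together in feature space.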

## Save to a local .npy file

In [14]:
# Save the feature matrix as a local .npy file
np.save('测试集语义特征.npy', encoding_array)
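
A saved `.npy` file can be read back with `np.load`, which restores both shape and dtype. The round trip below is a minimal sketch that uses a temporary directory and a hypothetical file name; the random `demo_array` stands in for `encoding_array`, and its float32 dtype is an assumption.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in with the same shape as encoding_array (dtype assumed float32)
demo_array = np.random.default_rng(0).normal(size=(1078, 512)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'demo_features.npy')  # hypothetical file name
    np.save(npy_path, demo_array)
    restored = np.load(npy_path)

print(restored.shape, restored.dtype)
```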