# Regular Competition: PALM Pathologic Myopia Prediction

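**Task overview**

The PALM pathologic myopia prediction competition focuses on researching and developing algorithms for diagnosing pathologic myopia. Its goal is to evaluate and compare automated algorithms for detecting pathologic myopia on a common dataset of retinal fundus images. The specific task is to classify the provided images as pathologic myopia or non-pathologic myopia color fundus photographs, where the non-pathologic class covers both normal fundus and high myopia fundus photographs.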
    
![](https://ai-studio-static-online.cdn.bcebos.com/049a11b2a0d6459bbea817529d74be64b69ab7357c8f4990bc56a3b830aa6def)


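**Data overview**

For this competition, Zhongshan Ophthalmic Center, Sun Yat-sen University provides 800 color fundus photographs annotated with pathologic myopia classification labels for training, plus another 400 annotated images that the platform uses to test models. The gold-standard classification comes from clinical reports and is based not only on the fundus photographs but also on OCT, visual field tests, and other examinations.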

**Labels**

    Non-pathologic: 0
    Pathologic    : 1

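**Training dataset**

Folder name: Train

The Train folder contains a fundus_image folder and a Classification.xlsx file. All files in fundus_image are color fundus photographs with a resolution of 1444×1444 or 2124×2056, named like N0001.jpg, H0001.jpg, P0001.jpg, and V0001.jpg. Classification.xlsx records whether each fundus image is pathologic myopia: 1 if it is, 0 if it is not.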

**Test dataset**

Folder name: PALM-Testing400-Images

The folder contains 400 color fundus photographs, named like T0001.jpg.


In [None]:
# Unzip the data
!unzip -oq /home/aistudio/data/data171620/data.zip

In [None]:
!mv '常规赛：PALM病理性近视预测' dataset

In [3]:
# Imports
import paddle
import numpy as np
from paddle.vision import transforms
from paddle import optimizer as optim
from paddle.io import Dataset
import matplotlib.pylab as plt
import os
import pandas as pd
from PIL import Image
import warnings
warnings.filterwarnings("ignore")
print("paddle's version:", paddle.__version__)



paddle's version: 2.3.2


In [1]:
# Install PaddleX
! pip install paddlex -i https://mirror.baidu.com/pypi/simple

Looking in indexes: https://mirror.baidu.com/pypi/simple


In [2]:
# Split the labelled data into a training set and a validation set
import pandas as pd
import random

train_excel_file = 'dataset/Train/Classification.xlsx'
pd_list = pd.read_excel(train_excel_file)

pd_list_length = len(pd_list)
# Shuffle the rows
pd_list = pd_list.sample(frac=1)
# First 90% of the rows for training, remaining 10% for validation
offset = int(pd_list_length * 0.9)
train_list = pd_list[:offset]
eval_list = pd_list[offset:]
train_list.to_csv("train_list.txt", index=None, header=None, sep='\t')
eval_list.to_csv("eval_list.txt", index=None, header=None, sep='\t')
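
The split above shuffles all rows together, so the ratio of pathologic to non-pathologic images in the 80-image validation set can drift from the ratio in the full data. A minimal sketch of a stratified alternative follows; the helper `stratified_split`, the seed, and the column names in the toy frame are assumptions for illustration, not taken from Classification.xlsx.

```python
import pandas as pd


def stratified_split(df, label_col, train_frac=0.9, seed=42):
    """Split each class separately so both subsets keep the class ratio."""
    train_parts, eval_parts = [], []
    for _, group in df.groupby(label_col):
        # Shuffle inside the class, then cut at train_frac.
        group = group.sample(frac=1, random_state=seed)
        cut = int(len(group) * train_frac)
        train_parts.append(group.iloc[:cut])
        eval_parts.append(group.iloc[cut:])
    # Shuffle again so the classes are interleaved in the output files.
    train_df = pd.concat(train_parts).sample(frac=1, random_state=seed)
    eval_df = pd.concat(eval_parts).sample(frac=1, random_state=seed)
    return train_df, eval_df


# Toy stand-in for Classification.xlsx; these column names are hypothetical.
demo = pd.DataFrame({
    "imgName": [f"X{i:04d}.jpg" for i in range(1, 21)],
    "Label": [0] * 10 + [1] * 10,
})
train_df, eval_df = stratified_split(demo, "Label")
print(len(train_df), len(eval_df))
```

The two returned frames can be written out with the same `to_csv(..., index=None, header=None, sep='\t')` calls used in the cell above.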

In [4]:
# Use GPU 0 (without a GPU, the model is still trained on the CPU after running this cell)
import matplotlib
matplotlib.use('Agg') 
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

[10-15 22:08:01 MainThread @utils.py:79] WRN paddlepaddle version: 2.3.2. The dynamic graph version of PARL is under development, not fully tested and supported




In [5]:
import paddlex as pdx
from paddlex import transforms as T

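# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md

train_transforms = T.Compose([
    T.RandomCrop(crop_size=1440),  # randomly crop the image
    #T.CenterCrop(crop_size=1440),  # crop a square of the target size around the image centre
    T.RandomHorizontalFlip(0.6),  # flip horizontally with the given probability
    T.RandomVerticalFlip(0.2),  # flip vertically with the given probability
    T.Normalize()])  # normalize the image

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=1440),  # resize so that the short side equals short_size
    T.CenterCrop(crop_size=1440),
    T.Normalize()])  # normalize the image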

In [6]:
# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/Train/fundus_image',
    file_list='train_list.txt',
    label_list='labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/Train/fundus_image',
    file_list='eval_list.txt',
    label_list='labels.txt',
    transforms=eval_transforms)

2022-10-15 22:08:09 [INFO]	Starting to read file list from dataset...
2022-10-15 22:08:09 [INFO]	720 samples in file train_list.txt
2022-10-15 22:08:09 [INFO]	Starting to read file list from dataset...
2022-10-15 22:08:09 [INFO]	80 samples in file eval_list.txt
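
Both dataset definitions above read `labels.txt`, which no cell in this notebook creates. A minimal sketch of producing it follows; it would need to run before the dataset cell. The class names are an assumption inferred from the prediction output later in the notebook, where `category_id` 0 is 非病理性近视眼底 and `category_id` 1 is 病理性近视眼底.

```python
# Assumed class names, copied from the prediction output shown later in this
# notebook. The line order defines the class ids: first line -> 0, second -> 1.
class_names = ["非病理性近视眼底", "病理性近视眼底"]

with open("labels.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(class_names) + "\n")
```

Keeping the non-pathologic class on the first line keeps the ids consistent with the 0/1 labels in Classification.xlsx.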


In [7]:
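# Initialize the model and train it
# Training metrics can be viewed with VisualDL, see https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/visualdl.md
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_small(num_classes=num_classes)

# Note: this optimizer is never passed to model.train() below,
# so training runs with the default optimizer.
optimizer = optim.AdamW(learning_rate=3e-4, parameters=model.parameters())

#model.prepare(
# optimizer=optimizer,  # optimizer
# loss=paddle.nn.CrossEntropyLoss(),  # loss function
# metrics=paddle.metric.Accuracy()  # evaluation metric
#) 



# Early stopping: stop once the monitored 'loss' has not improved for 6 epochs (patience=6) and keep the best model.
# Note: this callback is never passed to model.train() below, so it has no effect on this run.
early_stop = paddle.callbacks.EarlyStopping(
                                            'loss',
                                            mode='auto',
                                            patience=6,
                                            verbose=1,
                                            min_delta=0,
                                            baseline=None,
                                            save_best_model=True)


# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/models/classification.md
# Parameter descriptions and tuning notes: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/parameters.md
model.train(
    num_epochs=10,  # number of training epochs
    train_dataset=train_dataset,  # training dataset
    train_batch_size=64,  # training batch size, also used as the validation batch size; default 64 (previously 8)
    eval_dataset=eval_dataset,  # validation dataset
    lr_decay_epochs=[4, 6, 8],  # epochs at which the default optimizer decays the learning rate
    lr_decay_gamma=0.1,  # learning-rate decay factor of the default optimizer; default 0.1
    learning_rate=0.025,  # initial learning rate of the default optimizer; default 0.025
    save_dir='output/mobilenetv3_small_1015_2',
    use_vdl=True)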

2022-10-15 22:08:50 [INFO]	Loading pretrained model from output/mobilenetv3_small_1015_2/pretrain/MobileNetV3_small_x1_0_pretrained.pdparams
2022-10-15 22:08:50 [INFO]	There are 212/214 variables loaded into MobileNetV3_small_x1_0.
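
To make the effect of `learning_rate`, `lr_decay_epochs`, and `lr_decay_gamma` concrete, the sketch below tabulates the learning rate per epoch under plain step decay. It assumes the rate is multiplied by gamma at the start of each listed epoch (counting epochs from 0); the helper `piecewise_lr` is hypothetical and only approximates what PaddleX does internally, which works in steps rather than epochs.

```python
def piecewise_lr(epoch, base_lr=0.025, decay_epochs=(4, 6, 8), gamma=0.1):
    """Learning rate for a 0-based epoch index under step decay."""
    # Count how many decay boundaries have been reached so far.
    num_decays = sum(epoch >= boundary for boundary in decay_epochs)
    return base_lr * gamma ** num_decays


for epoch in range(10):
    print(f"epoch {epoch}: lr = {piecewise_lr(epoch):.6f}")
```

Under these assumptions the last two of the ten epochs run at roughly 1/1000 of the initial rate, so most of the learning happens in the first four epochs.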


# Model prediction
Load the model file you want to use and run prediction on the test data.

In [None]:
# Single-image prediction test
import paddlex as pdx
model = pdx.load_model('output/mobilenetv3_small_1015_1/epoch_9')
image_name = 'dataset/PALM-Testing400-Images/T0001.jpg'
result = model.predict(image_name, topk=2)
print("Predict Result:", result)


image_name = 'dataset/PALM-Testing400-Images/T0002.jpg'
result = model.predict(image_name, topk=2)
print("Predict Result:", result)

2022-10-15 21:31:26 [INFO]	Model[MobileNetV3_small] loaded.
Predict Result: [{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9992028}, {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.0007972635}]
Predict Result: [{'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.97810304}, {'category_id': 1, 'category': '病理性近视眼底', 'score': 0.021896938}]


In [None]:
# Build val_list, the list of test images to predict
val_list=[]
for i in range(1,401,1):
# for i in range(1,201,1):
    filename='T'+ str(i).zfill(4)+'.jpg'
    # print(filename)
    val_list.append(filename+'\n')

with open('val_list.txt','w') as f:
    f.writelines(val_list)
    
val_list=[]
with open('val_list.txt', 'r') as f:
    for line in f:
        line='dataset/PALM-Testing400-Images/'+line
        val_list.append(line.split('\n')[0])
        # print(line.split('\n')[0])

print(len(val_list))
print(val_list)
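
The cell above writes the file names to val_list.txt and reads them back to build the paths. When only the path list is needed, it can be built in one step. A minimal sketch, where the helper `build_test_paths` is a hypothetical name:

```python
def build_test_paths(image_dir, count, prefix="T", width=4, ext=".jpg"):
    """Return paths like <image_dir>/T0001.jpg ... for indices 1..count."""
    return [f"{image_dir}/{prefix}{i:0{width}d}{ext}" for i in range(1, count + 1)]


val_paths = build_test_paths("dataset/PALM-Testing400-Images", 400)
print(len(val_paths), val_paths[0], val_paths[-1])
```

The later cells still read val_list.txt for the file-name column, so this sketch complements the cell above rather than replacing it.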


In [None]:
# Batch prediction
import paddlex as pdx

result_list=[]
model = pdx.load_model('output/mobilenetv3_small_1015_1/best_model')
for image_name in val_list:
    result = model.predict(image_name, topk=2)
    result_list.append(result)
    print("Predict Result:", result)

2022-10-15 21:31:39 [INFO]	Model[MobileNetV3_small] loaded.
Predict Result: [{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9998215}, {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.00017851777}]
Predict Result: [{'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.991299}, {'category_id': 1, 'category': '病理性近视眼底', 'score': 0.008700976}]
Predict Result: [{'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.99845064}, {'category_id': 1, 'category': '病理性近视眼底', 'score': 0.0015493007}]
Predict Result: [{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9914307}, {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.008569262}]
Predict Result: [{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9996407}, {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.00035926126}]
Predict Result: [{'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.97740763}, {'category_id': 1, 'category': '病理性近视眼底', 'score': 0.022592394}]
Predict Result: [{'category_id': 1, 'category': '病理性近视眼底',

In [None]:
# Inspect the results
item = result_list[0]
print(item)
print(item[0]['category_id'],item[0]['score'])
print(item[1]['category_id'],item[1]['score'])

[{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9998215}, {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.00017851777}]
1 0.9998215
0 0.00017851777


# Build the pandas DataFrame

In [None]:
# Score column: the predicted probability of class 1 (pathologic myopia)
pd_B=[]
for item in result_list:
    # print(item)
    if item[0]['category_id']==1:
        pd_B.append(item[0]['score'])
    else:
        pd_B.append(item[1]['score'])
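
The loop above relies on each result holding exactly two entries. A slightly more defensive sketch looks up the class-1 entry by `category_id`, wherever it sits in the top-k list; `pm_risk` is a hypothetical helper name, and the sample values are copied from the output printed earlier.

```python
def pm_risk(result, positive_id=1):
    """Return the score of the pathologic-myopia class from one prediction."""
    for entry in result:
        if entry["category_id"] == positive_id:
            return float(entry["score"])
    raise ValueError(f"category_id {positive_id} not found in prediction")


# Two predictions copied from the notebook output above.
sample_a = [{'category_id': 1, 'category': '病理性近视眼底', 'score': 0.9998215},
            {'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.00017851777}]
sample_b = [{'category_id': 0, 'category': '非病理性近视眼底', 'score': 0.991299},
            {'category_id': 1, 'category': '病理性近视眼底', 'score': 0.008700976}]
print(pm_risk(sample_a), pm_risk(sample_b))
```

With this helper the score column could be built as `[pm_risk(item) for item in result_list]`.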

In [None]:

# File-name column
pd_A=[]
with open('val_list.txt', 'r') as f:
    for line in f:
        pd_A.append(line.split('\n')[0])
        # print(line.split('\n')[0])

In [None]:

# Build the pandas DataFrame
import pandas as pd
df= pd.DataFrame({'FileName': pd_A, 'PM Risk':pd_B})

In [None]:
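# Save as the submission file (create the output directory first if it does not exist)
import os
os.makedirs("1015-结果1", exist_ok=True)
df.to_csv("1015-结果1/Classification_Results.csv", index=None)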
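
Before zipping, a quick sanity check on the submission table can catch a missing row or an out-of-range score. The sketch below assumes the format this notebook builds: 400 rows with columns `FileName` and `PM Risk`, and scores in [0, 1]. The function `check_submission` is hypothetical and not part of any competition tooling.

```python
import pandas as pd


def check_submission(df, expected_rows=400):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if list(df.columns) != ["FileName", "PM Risk"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems  # the remaining checks need these columns
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if df["FileName"].duplicated().any():
        problems.append("duplicate file names")
    if not df["PM Risk"].between(0.0, 1.0).all():
        problems.append("scores outside [0, 1] or missing")
    return problems


# Synthetic frame standing in for the real predictions.
demo = pd.DataFrame({
    "FileName": [f"T{i:04d}.jpg" for i in range(1, 401)],
    "PM Risk": [0.5] * 400,
})
print(check_submission(demo))
```

On the real data the call would be `check_submission(df)`, with any reported problem resolved before the file is zipped.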

In [None]:
!zip -qj Classification_Results.zip 1015-结果1/Classification_Results.csv

# Summary

MobileNetV3 is a lightweight network; next month, a larger network such as Inception_v3 is worth trying.

## References

### paddle vision:

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/vision/Overview_cn.html

### Paddle image augmentation:

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/vision/transforms/adjust_brightness_cn.html

### k-fold validation:

*Deep Learning with Python* (Chinese edition: 《Python 深度学习》, Turing book series)

### ResNet paper:

https://arxiv.org/abs/1512.03385

### Papers related to pathologic myopia prediction:

https://aistudio.baidu.com/aistudio/competition/detail/85