Available on AI Studio: [here](https://aistudio.baidu.com/aistudio/projectdetail/5056816?contributionType=1)
# VisualDL 2.0 Use Case: Visualizing Eye-Disease Classification Training

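This project uses the eye-disease classification dataset [iChallenge-PM](https://ai.baidu.com/broad/introduction) to show how to visualize and analyze model training with VisualDL, PaddlePaddle's visualization tool.

VisualDL is a visualization and analysis tool for deep learning models. It presents trends in training parameters, model structure, data samples, and high-dimensional data distributions through a rich set of charts, helping users understand the training process and model structure and tune models efficiently. For details, see [GitHub](https://github.com/PaddlePaddle/VisualDL) and the [VisualDL user guide](https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/components/README.md).

iChallenge-PM is a medical dataset on Pathologic Myopia (PM) provided for the iChallenge competition, which was jointly organized by Baidu Brain and the Zhongshan Ophthalmic Center of Sun Yat-sen University. It contains fundus images from 1,200 subjects, with 400 images each for training, validation, and testing. The sections below describe how to use VisualDL to:

- Create a log file
- Visualize training parameters in real time
- Compare training parameters across several experiments
- Visualize intermediate states of the training data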

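## Steps for using VisualDL 2.0

- Create a `LogWriter` and set the directory where results are stored, for example './log'
- Insert logging statements into the training loop to write results to the log file
- Copy the project URL and replace 'notebooks' and everything after it with 'visualdl', for example:
	* change 'https://aistudio.baidu.com/bdvgpu/user/220109/502834/notebooks/502834.ipynb?redirects=1' to 'https://aistudio.baidu.com/bdvgpu/user/220109/502834/visualdl'
- Open the resulting URL in a browser to launch VisualDL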
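
The URL rewrite in the third step can be sketched in a few lines of Python. This is only an illustration of the rule described above; the helper name `to_visualdl_url` is hypothetical and not part of VisualDL or AI Studio.

```python
def to_visualdl_url(notebook_url: str) -> str:
    """Replace '/notebooks' and everything after it with '/visualdl'."""
    head, sep, _ = notebook_url.partition("/notebooks")
    if not sep:
        # The rule only applies to URLs that contain a 'notebooks' segment
        raise ValueError("URL does not contain '/notebooks'")
    return head + "/visualdl"


url = "https://aistudio.baidu.com/bdvgpu/user/220109/502834/notebooks/502834.ipynb?redirects=1"
print(to_visualdl_url(url))
# https://aistudio.baidu.com/bdvgpu/user/220109/502834/visualdl
```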


**Note: VisualDL 2.0 requires Paddle 1.8 or later.**

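### Preparing the dataset
The /home/aistudio/data/data19065 directory contains the following three files, which are unzipped into /home/aistudio/work/palm.
- training.zip: training images and labels
- validation.zip: validation images
- valid_gt.zip: validation labels


------
**Note**:

After unzipping valid_gt.zip, the file PM_Label_and_Fovea_Location.xlsx under /home/aistudio/work/palm/PALM-Validation-GT/ must be converted to CSV. For the code in this section it has already been converted to labels.csv.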

------



In [None]:
# Run this cell on first use to unzip the dataset
# If the files are already unzipped, this cell can be skipped (-o overwrites existing files)
!unzip -o -q -d /home/aistudio/work/palm /home/aistudio/data/data19469/training.zip
%cd /home/aistudio/work/palm/PALM-Training400/
!unzip -o -q PALM-Training400.zip
!unzip -o -q -d /home/aistudio/work/palm /home/aistudio/data/data19469/validation.zip
!unzip -o -q -d /home/aistudio/work/palm /home/aistudio/data/data19469/valid_gt.zip

^C
/home/aistudio/work/palm/PALM-Training400


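### Viewing the dataset with VisualDL

iChallenge-PM contains fundus images of patients with pathologic myopia as well as images of patients without it. Files are named as follows:

- Pathologic myopia (PM): file names start with P

- Non-pathologic myopia (non-PM):

  * High myopia: file names start with H
  
  * Normal eyes: file names start with N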
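
The naming rule above maps directly to a binary label, with PM as the positive class (1) and non-PM as the negative class (0), as this project does. A minimal sketch follows; the function name `label_from_filename` is hypothetical and used only for illustration.

```python
def label_from_filename(name: str) -> int:
    """Return 1 for pathologic myopia (P*), 0 for high myopia (H*) or normal (N*)."""
    prefix = name[:1]
    if prefix == "P":
        return 1
    if prefix in ("H", "N"):
        return 0
    raise ValueError("Unexpected file name: " + name)


for name in ["P0095.jpg", "H0001.jpg", "N0012.jpg"]:
    print(name, label_from_filename(name))
```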

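Images of patients with pathologic myopia are treated as positive samples with label 1, and images of non-pathologic patients as negative samples with label 0. We pick two images from the dataset, extract features with LeNet, build a classifier that separates positive and negative samples, and display the images in VisualDL. The code is shown below: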



In [None]:
!pip install --upgrade pillow==9.1.0  -i https://mirror.baidu.com/pypi/simple

In [None]:
!pip install --upgrade visualdl  -i https://mirror.baidu.com/pypi/simple

In [12]:
!pip show pillow

Name: Pillow
Version: 9.1.0
Summary: Python Imaging Library (Fork)
Home-page: https://python-pillow.org
Author: Alex Clark (PIL Fork Author)
Author-email: aclark@python-pillow.org
License: HPND
Location: /home/aistudio/.data/webide/pip/lib/python3.7/site-packages
Requires: 
Required-by: imageio, paddlehub, paddlepaddle-gpu, sahi, streamlit, tb-paddle, visualdl


In [1]:
import numpy as np
from PIL import Image
from visualdl import LogWriter


def random_crop(img):
    """Return a random 100x100 crop of the image at path `img`."""
    img = Image.open(img)
    w, h = img.size
    random_w = np.random.randint(0, w - 100)
    random_h = np.random.randint(0, h - 100)
    # Crop the region and return it as an HWC array
    r = img.crop((random_w, random_h, random_w + 100, random_h + 100))
    return np.asarray(r)


if __name__ == '__main__':
    # Initialize a writer
    with LogWriter(logdir="./log/image_test/train") as writer:
        for step in range(6):
            # Add one image record
            writer.add_image(tag="eye",
                             img=random_crop("/home/aistudio/work/palm/PALM-Training400/PALM-Training400/H0001.jpg"),
                             step=step)

In [2]:
import numpy as np
from PIL import Image
from visualdl import LogWriter
import os

# Make sure the working directory is '/home/aistudio'
os.chdir('/home/aistudio')

# Create a LogWriter that stores the image data under `./log/train`
log_writer = LogWriter("./log/train")

# Load the images to display
img1 = Image.open('work/palm/PALM-Training400/PALM-Training400/N0012.jpg')
img2 = Image.open('work/palm/PALM-Training400/PALM-Training400/P0095.jpg')

# Convert the images to arrays
img_n1=np.asarray(img1)
img_n2=np.asarray(img2)

# Write the image data to the log file
# (the tags read "image samples/positive sample" and "image samples/negative sample")
log_writer.add_image(tag='图像样本/正样本',img=img_n2, step=5)
log_writer.add_image(tag='图像样本/负样本',img=img_n1, step=5)

Launch the VisualDL panel to view the image data:


![](https://ai-studio-static-online.cdn.bcebos.com/5275ceb8083d4b2aa8e3f2bdc510a1625d6613da1c40412e81a62d0c7fca41e4)





### Defining the data readers

Images are read from disk with OpenCV, resized to $224\times224$, and their pixel values are rescaled to $[-1, 1]$. The code is shown below:



In [3]:
import cv2
import random
import numpy as np
import os

# Preprocess an image that has been read in
def transform_img(img):
    # Resize the image to 224x224
    img = cv2.resize(img, (224, 224))
    # The image is read in [H, W, C] layout;
    # transpose it to [C, H, W]
    img = np.transpose(img, (2,0,1))
    img = img.astype('float32')
    # Rescale the values to [-1.0, 1.0]
    img = img / 255.
    img = img * 2.0 - 1.0
    return img

# Define the training-set data reader
def data_loader(datadir, batch_size=10, mode = 'train'):
    # List the files under datadir; every file will be read
    filenames = os.listdir(datadir)
    def reader():
        if mode == 'train':
            # Shuffle the data order during training
            random.shuffle(filenames)
        batch_imgs = []
        batch_labels = []
        for name in filenames:
            filepath = os.path.join(datadir, name)
            img = cv2.imread(filepath)
            img = transform_img(img)
            if name[0] == 'H' or name[0] == 'N':
                # File names starting with H are high myopia, N are normal eyes
                # Neither is pathologic, so both are negative samples with label 0
                label = 0
            elif name[0] == 'P':
                # File names starting with P are pathologic myopia: positive samples with label 1
                label = 1
            else:
                raise ValueError('Unexpected file name: ' + name)
            # Append each sample to the batch lists as it is read
            batch_imgs.append(img)
            batch_labels.append(label)
            if len(batch_imgs) == batch_size:
                # Once the lists hold batch_size samples,
                # yield them as one mini-batch from the generator
                imgs_array = np.array(batch_imgs).astype('float32')
                labels_array = np.array(batch_labels).astype('float32').reshape(-1, 1)
                yield imgs_array, labels_array
                batch_imgs = []
                batch_labels = []

        if len(batch_imgs) > 0:
            # Pack the remaining samples (fewer than batch_size) into a final mini-batch
            imgs_array = np.array(batch_imgs).astype('float32')
            labels_array = np.array(batch_labels).astype('float32').reshape(-1, 1)
            yield imgs_array, labels_array

    return reader

# Define the validation-set data reader
def valid_data_loader(datadir, csvfile, batch_size=10, mode='valid'):
    # The training reader derives labels from file names; the validation reader takes them from csvfile
    # Inspect the unzipped validation labels to see what csvfile contains
    # Each row of csvfile describes one sample:
    # column 1 is the image id, column 2 the file name, column 3 the label,
    # columns 4 and 5 are the fovea coordinates, which are unrelated to classification
    # ID,imgName,Label,Fovea_X,Fovea_Y
    # 1,V0001.jpg,0,1157.74,1019.87
    # 2,V0002.jpg,1,1285.82,1080.47
    # Open the csvfile with the validation labels and read its contents
    with open(csvfile) as f:
        filelists = f.readlines()
    def reader():
        batch_imgs = []
        batch_labels = []
        for line in filelists[1:]:
            # print('valid_data_loader:',line)
            line = line.strip().split(',')
            name = line[1]
            label = int(line[2])
            # Load the image by file name and preprocess it
            filepath = os.path.join(datadir, name)
            img = cv2.imread(filepath)
            img = transform_img(img)
            # Append each sample to the batch lists as it is read
            batch_imgs.append(img)
            batch_labels.append(label)
            if len(batch_imgs) == batch_size:
                # Once the lists hold batch_size samples,
                # yield them as one mini-batch from the generator
                imgs_array = np.array(batch_imgs).astype('float32')
                labels_array = np.array(batch_labels).astype('float32').reshape(-1, 1)
                yield imgs_array, labels_array
                batch_imgs = []
                batch_labels = []

        if len(batch_imgs) > 0:
            # Pack the remaining samples (fewer than batch_size) into a final mini-batch
            imgs_array = np.array(batch_imgs).astype('float32')
            labels_array = np.array(batch_labels).astype('float32').reshape(-1, 1)
            yield imgs_array, labels_array

    return reader
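
As an illustration of the label file layout described in `valid_data_loader`, the sketch below parses the same columns with Python's standard `csv` module. The sample rows are the ones quoted in the comments above; the function name `read_labels` is hypothetical.

```python
import csv
import io

# Sample rows in the labels.csv layout: ID,imgName,Label,Fovea_X,Fovea_Y
SAMPLE = (
    "ID,imgName,Label,Fovea_X,Fovea_Y\n"
    "1,V0001.jpg,0,1157.74,1019.87\n"
    "2,V0002.jpg,1,1285.82,1080.47\n"
)


def read_labels(fileobj):
    """Return (file name, label) pairs; the fovea columns are ignored."""
    reader = csv.DictReader(fileobj)
    return [(row["imgName"], int(row["Label"])) for row in reader]


print(read_labels(io.StringIO(SAMPLE)))
# [('V0001.jpg', 0), ('V0002.jpg', 1)]
```

Reading by column name rather than by position keeps the code working if the column order in the file ever changes.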

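## Experiment 1: Eye-disease classification with LeNet

- The LeNet implementation is shown below:

**Note: the `forward` function must collect the output of every layer in a list so that each layer's output can later be written to the log file for visualization.**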

In [4]:
# Import the required packages
import paddle
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, Linear

# Define the LeNet network structure
class LeNet(fluid.dygraph.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()

        # Convolution and pooling blocks: each convolution uses a sigmoid activation,
        # and the first two are each followed by 2x2 max pooling
        self.conv1 = Conv2D(num_channels=3, num_filters=6, filter_size=5, act='sigmoid')
        self.pool1 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv2 = Conv2D(num_channels=6, num_filters=16, filter_size=5, act='sigmoid')
        self.pool2 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        # Third convolution layer
        self.conv3 = Conv2D(num_channels=16, num_filters=120, filter_size=4, act='sigmoid')
        # Fully connected layers: the first has 64 output units, the second has one output per class
        self.fc1 = Linear(input_dim=300000, output_dim=64, act='sigmoid')
        self.fc2 = Linear(input_dim=64, output_dim=num_classes)
    # Forward pass: return every layer's output so it can be written to the VisualDL log and visualized
    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.pool1(x1)
        x3 = self.conv2(x2)
        x4 = self.pool2(x3)
        x5 = self.conv3(x4)
        x6 = fluid.layers.reshape(x5, [x5.shape[0], -1])
        x7 = self.fc1(x6)
        x8 = self.fc2(x7)
        conv=[x,x1,x2,x3,x4,x5,x6,x7,x8]
        return x8,conv

In [17]:
# Check the data shapes
DATADIR = '/home/aistudio/work/palm/PALM-Training400/PALM-Training400'
train_loader = data_loader(DATADIR, 
                           batch_size=10, mode='train')
data_reader = train_loader()
data = next(data_reader)
data[0].shape, data[1].shape

((10, 3, 224, 224), (10, 1))
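
The input shape above also explains why `fc1` in `LeNet` uses `input_dim=300000`. The short check below walks a $224\times224$ input through the layer parameters from the class definition, assuming the standard output-size formula for convolution and pooling; the helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=5)            # conv1 -> 220
size = conv_out(size, kernel=2, stride=2)  # pool1 -> 110
size = conv_out(size, kernel=5)            # conv2 -> 106
size = conv_out(size, kernel=2, stride=2)  # pool2 -> 53
size = conv_out(size, kernel=4)            # conv3 -> 50

flattened = 120 * size * size              # 120 output channels from conv3
print(size, flattened)
# 50 300000
```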

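### Training the model and visualizing training parameters and data samples with VisualDL
- Create a log file for LeNet so that its training parameters can be compared with other models:

log_writer = LogWriter("./log/lenet")

- Insert logging statements into the training loop to show how accuracy and loss change:

log_writer.add_scalar(tag='acc', step=iter, value=acc.numpy())

log_writer.add_scalar(tag='loss', step=iter, value=avg_loss.numpy())

- When designing the forward pass, store each layer's output in a list named 'conv' so that it can be written to the log file later

- Insert logging statements into the training loop to show each layer's output for the input image

log_writer.add_image(tag='input_lenet/conv_1', img=conv[0].numpy(), step=batch_id)

**Note: use the same tag in every experiment so that the experiments can be compared.**

#### The complete training and visualization code: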

In [5]:
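def convert_out_img(_img):
    # Map a tensor with values in [-1, 1] back to pixel values in [0, 255]
    _img = _img.numpy() + 1
    # A single CHW image is transposed to HWC; NCHW batches are left unchanged
    if len(_img.shape) == 3:
        _img = np.transpose(_img, (1, 2, 0))
    _img = _img * 127.5
    return _img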
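
`convert_out_img` undoes the scaling applied in `transform_img`. The NumPy-only sketch below shows the round trip on a few pixel values; the names `normalize` and `denormalize` are hypothetical stand-ins for the two scaling steps, not functions from this project.

```python
import numpy as np


def normalize(img_uint8):
    """Scale pixel values from [0, 255] to [-1, 1], as transform_img does."""
    return img_uint8.astype("float32") / 255.0 * 2.0 - 1.0


def denormalize(img_float):
    """Scale values from [-1, 1] back to [0, 255], as convert_out_img does."""
    return (img_float + 1.0) * 127.5


pixels = np.array([0, 128, 255], dtype="uint8")
scaled = normalize(pixels)
restored = denormalize(scaled)

print(scaled.min(), scaled.max())
print(np.allclose(restored, pixels, atol=1e-3))
```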

In [6]:
# Classify eye-disease images with LeNet

import os
import random
import paddle
import paddle.fluid as fluid
import numpy as np
import datetime

# Create the log file that stores the LeNet training results
date_str=datetime.datetime.strftime(datetime.datetime.now(),'%Y%m%d_%H%M%S')
log_dir=os.path.join('./log/lenet')
log_writer = LogWriter(log_dir)

# Define the file paths
DATADIR = '/home/aistudio/work/palm/PALM-Training400/PALM-Training400'
DATADIR2 = '/home/aistudio/work/palm/PALM-Validation400'
CSVFILE = '/home/aistudio/labels.csv'
from matplotlib import pyplot as plt
# Define the training procedure
def train(model):
    print('train model')
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 5
        iter=0
        # Define the optimizer
        opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # Define the training and validation data readers
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            # if epoch>0:break
            for batch_id, data in enumerate(train_loader()):
                # if batch_id>0:break
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                pred = fluid.layers.sigmoid(logits)
                pred2 = pred * (-1.0) + 1.0
                pred = fluid.layers.concat([pred2, pred], axis=1)               
                # Tags follow the order of the list returned by forward: [x, conv1, pool1, conv2, pool2, conv3, ...]
                log_writer.add_image(tag='input_lenet/original', img=convert_out_img(conv[0]),dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_lenet/conv_1', img=convert_out_img(conv[1][:,:4]),dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_lenet/pool_1', img=convert_out_img(conv[2][:,:4]),dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_lenet/conv_2', img=convert_out_img(conv[3][:,:4]),dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_lenet/pool_2', img=convert_out_img(conv[4][:,:4]),dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_lenet/conv_3', img=convert_out_img(conv[5][:,:4]),dataformats="NCHW", step=batch_id)
                # log_writer.add_image(tag='input_lenet/reshape', img=conv[6].numpy(), step=batch_id)
                # log_writer.add_image(tag='input_lenet/fc1', img=conv[7].numpy(), step=batch_id)
                # log_writer.add_image(tag='input_lenet/fc2', img=conv[8].numpy(), step=batch_id)
                # Compute the accuracy
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Every 10 batches, store the current loss and accuracy as a new data point in the writer
                if batch_id % 10 == 0:
                    log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())
                    log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())

                    iter+=10
                    print("epoch: {}, batch_id: {}, loss is: {} acc {}".format(epoch, batch_id, avg_loss.numpy(),acc.numpy()))
                # Backpropagate, update the weights, and clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                # Debug shortcut: only the first validation batch is evaluated
                if batch_id>0:break
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                # Binary classification: the sigmoid output is split into two classes at a threshold of 0.5
                # Compute the predicted probability after the sigmoid, then the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Probability of the negative class (1 - pred)
                print('pred',pred.numpy())
                pred2 = pred * (-1.0) + 1.0
                print('pred2',pred2.numpy())
                # Concatenate the probabilities of the two classes along axis 1
                pred = fluid.layers.concat([pred2, pred], axis=1)
                print('concat后，pred',pred.numpy())
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'mnist')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'mnist')


# Define the evaluation procedure
def evaluation(model, params_file_path):
    with fluid.dygraph.guard():
        print('start evaluation .......')
        # Load the model parameters
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)

        model.eval()
        # Evaluate on the validation set defined above
        eval_loader = valid_data_loader(DATADIR2, CSVFILE)

        acc_set = []
        avg_loss_set = []
        for batch_id, data in enumerate(eval_loader()):
            x_data, y_data = data
            img = fluid.dygraph.to_variable(x_data)
            label = fluid.dygraph.to_variable(y_data)
            # Compute predictions and accuracy the same way as in train()
            logits, _ = model(img)
            pred = fluid.layers.sigmoid(logits)
            pred = fluid.layers.concat([pred * (-1.0) + 1.0, pred], axis=1)
            acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
            # Compute the loss
            loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
            avg_loss = fluid.layers.mean(loss)
            acc_set.append(float(acc.numpy()))
            avg_loss_set.append(float(avg_loss.numpy()))
        # Average accuracy and loss
        acc_val_mean = np.array(acc_set).mean()
        avg_loss_val_mean = np.array(avg_loss_set).mean()

        print('loss={}, acc={}'.format(avg_loss_val_mean, acc_val_mean))

# Define the LeNet network structure
class LeNet(fluid.dygraph.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()

        # Convolution and pooling blocks: each convolution uses a sigmoid activation,
        # and the first two are each followed by 2x2 max pooling
        self.conv1 = Conv2D(num_channels=3, num_filters=6, filter_size=5, act='sigmoid')
        self.pool1 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv2 = Conv2D(num_channels=6, num_filters=16, filter_size=5, act='sigmoid')
        self.pool2 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        # Third convolution layer
        self.conv3 = Conv2D(num_channels=16, num_filters=120, filter_size=4, act='sigmoid')
        # Fully connected layers: the first has 64 output units, the second has one output per class
        self.fc1 = Linear(input_dim=300000, output_dim=64, act='sigmoid')
        self.fc2 = Linear(input_dim=64, output_dim=num_classes)
    # Forward pass: return every layer's output so it can be written to the VisualDL log and visualized
    def forward(self, x):
        x1 = self.conv1(x)
        #print(x1.shape)
        x2 = self.pool1(x1)
        x3 = self.conv2(x2)
        x4 = self.pool2(x3)
        x5 = self.conv3(x4)
        x6 = fluid.layers.reshape(x5, [x5.shape[0], -1])
        x7 = self.fc1(x6)
        x8 = self.fc2(x7)
        conv=[x,x1,x2,x3,x4,x5,x6,x7,x8]
        return x8,conv

# Create the model and start training
with fluid.dygraph.guard():
    model = LeNet(num_classes=1)
print('model')
train(model)

2022-12-01 22:42:42,865-INFO: font search path ['/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/afm', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
2022-12-01 22:42:43,208-INFO: generated new fontManager
W1201 22:42:43.293185   222 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 11.2, Runtime API Version: 9.0
W1201 22:42:43.296937   222 device_context.cc:260] device: 0, cuDNN Version: 7.6.


model
train model
start training ... 
epoch: 0, batch_id: 0, loss is: [0.69579124] acc [0.5]
epoch: 0, batch_id: 10, loss is: [0.8092405] acc [0.5]


Premature end of JPEG file


epoch: 0, batch_id: 20, loss is: [0.6578455] acc [0.7]
epoch: 0, batch_id: 30, loss is: [0.6947677] acc [0.4]
pred [[0.5126238 ]
 [0.51271063]
 [0.51269174]
 [0.5125851 ]
 [0.51254845]
 [0.5125189 ]
 [0.5125436 ]
 [0.5127212 ]
 [0.5126487 ]
 [0.51259047]]
pred2 [[0.4873762 ]
 [0.48728937]
 [0.48730826]
 [0.4874149 ]
 [0.48745155]
 [0.48748112]
 [0.48745638]
 [0.48727882]
 [0.4873513 ]
 [0.48740953]]
concat后，pred [[0.4873762  0.5126238 ]
 [0.48728937 0.51271063]
 [0.48730826 0.51269174]
 [0.4874149  0.5125851 ]
 [0.48745155 0.51254845]
 [0.48748112 0.5125189 ]
 [0.48745638 0.5125436 ]
 [0.48727882 0.5127212 ]
 [0.4873513  0.5126487 ]
 [0.48740953 0.51259047]]
[validation] accuracy/loss: 0.30000001192092896/0.703454852104187
epoch: 1, batch_id: 0, loss is: [0.67821574] acc [0.8]
epoch: 1, batch_id: 10, loss is: [0.720326] acc [0.3]
epoch: 1, batch_id: 20, loss is: [0.6815664] acc [0.8]


Premature end of JPEG file


epoch: 1, batch_id: 30, loss is: [0.6856409] acc [0.6]
pred [[0.521613  ]
 [0.5216748 ]
 [0.52166367]
 [0.5215772 ]
 [0.5215587 ]
 [0.52154374]
 [0.52156126]
 [0.521678  ]
 [0.5216254 ]
 [0.5215876 ]]
pred2 [[0.478387  ]
 [0.4783252 ]
 [0.47833633]
 [0.47842282]
 [0.4784413 ]
 [0.47845626]
 [0.47843874]
 [0.47832203]
 [0.4783746 ]
 [0.4784124 ]]
concat后，pred [[0.478387   0.521613  ]
 [0.4783252  0.5216748 ]
 [0.47833633 0.52166367]
 [0.47842282 0.5215772 ]
 [0.4784413  0.5215587 ]
 [0.47845626 0.52154374]
 [0.47843874 0.52156126]
 [0.47832203 0.521678  ]
 [0.4783746  0.5216254 ]
 [0.4784124  0.5215876 ]]
[validation] accuracy/loss: 0.30000001192092896/0.711302638053894
epoch: 2, batch_id: 0, loss is: [0.6853463] acc [0.6]
epoch: 2, batch_id: 10, loss is: [0.6862491] acc [0.7]
epoch: 2, batch_id: 20, loss is: [0.6953396] acc [0.4]


Premature end of JPEG file


epoch: 2, batch_id: 30, loss is: [0.68917084] acc [0.6]
pred [[0.5495572 ]
 [0.5496324 ]
 [0.5496178 ]
 [0.5495182 ]
 [0.5494985 ]
 [0.5494904 ]
 [0.54950666]
 [0.5496312 ]
 [0.54956865]
 [0.5495304 ]]
pred2 [[0.4504428 ]
 [0.45036763]
 [0.45038217]
 [0.45048177]
 [0.4505015 ]
 [0.4505096 ]
 [0.45049334]
 [0.45036882]
 [0.45043135]
 [0.4504696 ]]
concat后，pred [[0.4504428  0.5495572 ]
 [0.45036763 0.5496324 ]
 [0.45038217 0.5496178 ]
 [0.45048177 0.5495182 ]
 [0.4505015  0.5494985 ]
 [0.4505096  0.5494904 ]
 [0.45049334 0.54950666]
 [0.45036882 0.5496312 ]
 [0.45043135 0.54956865]
 [0.4504696  0.5495304 ]]
[validation] accuracy/loss: 0.30000001192092896/0.7377703785896301
epoch: 3, batch_id: 0, loss is: [0.6383264] acc [0.8]
epoch: 3, batch_id: 10, loss is: [0.6733761] acc [0.6]


Premature end of JPEG file


epoch: 3, batch_id: 20, loss is: [0.75188196] acc [0.2]
epoch: 3, batch_id: 30, loss is: [0.70675117] acc [0.4]
pred [[0.52727354]
 [0.5273216 ]
 [0.5273153 ]
 [0.52725005]
 [0.5272394 ]
 [0.5272305 ]
 [0.52724147]
 [0.52731997]
 [0.52728534]
 [0.52725583]]
pred2 [[0.47272646]
 [0.47267842]
 [0.47268468]
 [0.47274995]
 [0.47276062]
 [0.4727695 ]
 [0.47275853]
 [0.47268003]
 [0.47271466]
 [0.47274417]]
concat后，pred [[0.47272646 0.52727354]
 [0.47267842 0.5273216 ]
 [0.47268468 0.5273153 ]
 [0.47274995 0.52725005]
 [0.47276062 0.5272394 ]
 [0.4727695  0.5272305 ]
 [0.47275853 0.52724147]
 [0.47268003 0.52731997]
 [0.47271466 0.52728534]
 [0.47274417 0.52725583]]
[validation] accuracy/loss: 0.30000001192092896/0.716422438621521
epoch: 4, batch_id: 0, loss is: [0.66182554] acc [0.8]
epoch: 4, batch_id: 10, loss is: [0.70763093] acc [0.3]
epoch: 4, batch_id: 20, loss is: [0.6931961] acc [0.5]


Premature end of JPEG file


epoch: 4, batch_id: 30, loss is: [0.69379306] acc [0.5]
pred [[0.5465    ]
 [0.54655546]
 [0.5465472 ]
 [0.546474  ]
 [0.5464621 ]
 [0.54645467]
 [0.54646605]
 [0.5465523 ]
 [0.5465122 ]
 [0.5464808 ]]
pred2 [[0.45349997]
 [0.45344454]
 [0.45345283]
 [0.45352602]
 [0.45353788]
 [0.45354533]
 [0.45353395]
 [0.4534477 ]
 [0.4534878 ]
 [0.45351923]]
concat后，pred [[0.45349997 0.5465    ]
 [0.45344454 0.54655546]
 [0.45345283 0.5465472 ]
 [0.45352602 0.546474  ]
 [0.45353788 0.5464621 ]
 [0.45354533 0.54645467]
 [0.45353395 0.54646605]
 [0.4534477  0.5465523 ]
 [0.4534878  0.5465122 ]
 [0.45351923 0.5464808 ]]
[validation] accuracy/loss: 0.30000001192092896/0.7347368001937866
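
The `pred`, `pred2`, and concatenated arrays printed above come from the two-column trick in the validation loop: the sigmoid output is the probability of class 1, `1 - pred` is the probability of class 0, and accuracy is computed on the concatenated pair. In the log every probability is slightly above 0.5, so all ten samples of that batch are assigned class 1. The NumPy sketch below reproduces the computation on made-up values; `two_class_accuracy` and the sample numbers are hypothetical and not taken from the run.

```python
import numpy as np


def two_class_accuracy(prob_pos, labels):
    """Top-1 accuracy for a binary classifier given P(class 1) per sample."""
    prob_pos = np.asarray(prob_pos, dtype="float32").reshape(-1, 1)
    # Column 0 holds P(class 0), column 1 holds P(class 1)
    probs = np.concatenate([1.0 - prob_pos, prob_pos], axis=1)
    predicted = probs.argmax(axis=1)
    return float((predicted == np.asarray(labels).reshape(-1)).mean())


# Illustrative values only
prob_pos = [0.5126, 0.5127, 0.48, 0.51]
labels = [1, 0, 0, 1]
print(two_class_accuracy(prob_pos, labels))
# 0.75
```

Taking the argmax over the two columns is equivalent to thresholding the sigmoid output at 0.5.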


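**When the run finishes, there are two options:**
- Launch the VisualDL panel right away to view the LeNet results
- Continue training the models below, then launch VisualDL after all models have been trained to compare their training results.

## Experiment 2: Eye-disease classification with AlexNet

- The AlexNet implementation is shown below: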



In [7]:
# -*- coding:utf-8 -*-

# Import the required packages
import paddle
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, Linear


# Define the AlexNet network structure
class AlexNet(fluid.dygraph.Layer):
    def __init__(self, num_classes=1):
        super(AlexNet, self).__init__()
        
        # Like LeNet, AlexNet uses convolution and pooling layers to extract image features
        # Unlike LeNet, the activation function is 'relu'
        self.conv1 = Conv2D(num_channels=3, num_filters=96, filter_size=11, stride=4, padding=5, act='relu')
        self.pool1 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv2 = Conv2D(num_channels=96, num_filters=256, filter_size=5, stride=1, padding=2, act='relu')
        self.pool2 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv3 = Conv2D(num_channels=256, num_filters=384, filter_size=3, stride=1, padding=1, act='relu')
        self.conv4 = Conv2D(num_channels=384, num_filters=384, filter_size=3, stride=1, padding=1, act='relu')
        self.conv5 = Conv2D(num_channels=384, num_filters=256, filter_size=3, stride=1, padding=1, act='relu')
        self.pool5 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')

        self.fc1 = Linear(input_dim=12544, output_dim=4096, act='relu')
        self.drop_ratio1 = 0.5
        self.fc2 = Linear(input_dim=4096, output_dim=4096, act='relu')
        self.drop_ratio2 = 0.5
        self.fc3 = Linear(input_dim=4096, output_dim=num_classes)

        
    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.pool1(x1)
        x3 = self.conv2(x2)
        x4 = self.pool2(x3)
        x5 = self.conv3(x4)
        x6 = self.conv4(x5)
        x7 = self.conv5(x6)
        x8 = self.pool5(x7)
        x9 = fluid.layers.reshape(x8, [x8.shape[0], -1])
        x10 = self.fc1(x9)
        # Apply dropout after the fully connected layer to reduce overfitting
        x10= fluid.layers.dropout(x10, self.drop_ratio1)
        x10 = self.fc2(x10)
        # Apply dropout after the fully connected layer to reduce overfitting
        x10 = fluid.layers.dropout(x10, self.drop_ratio2)
        x10 = self.fc3(x10)
        conv=[x,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10]
        return x10, conv
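
The value `input_dim=12544` for `fc1` follows from the layer parameters above. As a worked check, assume a $224\times224$ input and the usual output-size formula for a convolution or pooling layer with input size $i$, kernel $k$, stride $s$, and padding $p$:

$$
\begin{aligned}
o &= \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1 \\
\text{conv1 } (k=11,\ s=4,\ p=5)&: \left\lfloor \frac{224 + 10 - 11}{4} \right\rfloor + 1 = 56 \\
\text{pool1 } (k=2,\ s=2)&: \left\lfloor \frac{56 - 2}{2} \right\rfloor + 1 = 28 \\
\text{conv2 } (k=5,\ s=1,\ p=2)&: 28 + 4 - 5 + 1 = 28 \\
\text{pool2 } (k=2,\ s=2)&: 14 \\
\text{conv3, conv4, conv5 } (k=3,\ s=1,\ p=1)&: 14 + 2 - 3 + 1 = 14 \\
\text{pool5 } (k=2,\ s=2)&: 7 \\
\text{flattened size}&: 256 \times 7 \times 7 = 12544
\end{aligned}
$$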

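### Training the model and visualizing training parameters and data samples with VisualDL
- Create a log file for AlexNet so that its training parameters can be compared with other models:

log_writer = LogWriter("./log/alexnet")

- Insert logging statements into the training loop to show how accuracy and loss change:

log_writer.add_scalar(tag='acc', step=iter, value=acc.numpy())

log_writer.add_scalar(tag='loss', step=iter, value=avg_loss.numpy())

- When designing the forward pass, store each layer's output in a list named 'conv' so that it can be written to the log file later

- Insert logging statements into the training loop to show each layer's output for the input image

log_writer.add_image(tag='input_alexnet/conv_1', img=conv[0].numpy(), step=batch_id)

**Note: use the same tag in every experiment so that the experiments can be compared.**

#### The complete training and visualization code: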

In [8]:
# Create the log directory that stores the AlexNet results
log_writer = LogWriter("./log/alexnet")
# Define the training procedure
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 5
        iter=0
        # Define the optimizer
        opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # Define the training and validation data readers
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                pred = fluid.layers.sigmoid(logits)
                pred2 = pred * (-1.0) + 1.0
                pred = fluid.layers.concat([pred2, pred], axis=1)
                # Convert each layer's output to a numpy array and write it to the log file
                log_writer.add_image(tag='input_alexnet/original', img=convert_out_img(conv[0]), dataformats="NCHW",step=batch_id)
                log_writer.add_image(tag='input_alexnet/conv_1', img=convert_out_img(conv[1][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/pool_1', img=convert_out_img(conv[2][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/conv_2', img=convert_out_img(conv[3][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/pool_2', img=convert_out_img(conv[4][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/conv_3', img=convert_out_img(conv[5][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/conv_4', img=convert_out_img(conv[6][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_alexnet/conv_5', img=convert_out_img(conv[7][:,:4]), dataformats="NCHW", step=batch_id)
                # log_writer.add_image(tag='input_alexnet/pool_5', img=conv[8].numpy(), step=batch_id)
                # log_writer.add_image(tag='input_alexnet/reshape', img=conv[9].numpy(), step=batch_id)
                # log_writer.add_image(tag='input_alexnet/fc', img=conv[10].numpy(), step=batch_id)
                # Compute the accuracy
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Every 10 batches, store the current loss and accuracy as a new data point in the writer
                if batch_id % 10 == 0:
                    log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())
                    log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())
                    iter+=10
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                # Backpropagate, update the weights, and clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
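                # Run the forward pass to get the predictions
                logits,conv = model(img)
                # Binary classification: the sigmoid output is split into two classes at a threshold of 0.5
                # Compute the predicted probability after the sigmoid, then the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Probability of the negative class (1 - pred)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the probabilities of the two classes along axis 1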
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'mnist')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'mnist')

with fluid.dygraph.guard():
    model = AlexNet()

train(model)

start training ... 
epoch: 0, batch_id: 0, loss is: [0.7688733]
epoch: 0, batch_id: 10, loss is: [0.86218554]


Premature end of JPEG file


epoch: 0, batch_id: 20, loss is: [0.4846797]
epoch: 0, batch_id: 30, loss is: [0.7140993]
[validation] accuracy/loss: 0.7425000071525574/0.613825798034668
epoch: 1, batch_id: 0, loss is: [0.6335093]
epoch: 1, batch_id: 10, loss is: [0.48883742]
epoch: 1, batch_id: 20, loss is: [0.5731732]
epoch: 1, batch_id: 30, loss is: [0.42922032]


Premature end of JPEG file


[validation] accuracy/loss: 0.9125000238418579/0.3493860363960266
epoch: 2, batch_id: 0, loss is: [0.4302479]


Premature end of JPEG file


epoch: 2, batch_id: 10, loss is: [0.21872629]
epoch: 2, batch_id: 20, loss is: [0.48957294]
epoch: 2, batch_id: 30, loss is: [0.17584175]
[validation] accuracy/loss: 0.9100000262260437/0.2691851854324341
epoch: 3, batch_id: 0, loss is: [0.6128808]
epoch: 3, batch_id: 10, loss is: [0.20954657]
epoch: 3, batch_id: 20, loss is: [0.28138608]
epoch: 3, batch_id: 30, loss is: [0.12840594]


Premature end of JPEG file


[validation] accuracy/loss: 0.925000011920929/0.22186213731765747
epoch: 4, batch_id: 0, loss is: [0.23191528]


Premature end of JPEG file


epoch: 4, batch_id: 10, loss is: [0.35504922]
epoch: 4, batch_id: 20, loss is: [0.0974201]
epoch: 4, batch_id: 30, loss is: [0.4901902]
[validation] accuracy/loss: 0.9274999499320984/0.18138444423675537


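**When the run finishes, there are two options:**
- Launch the VisualDL panel right away to compare the accuracy and loss of AlexNet and LeNet
- Continue training the models below, then launch VisualDL after all models have been trained to compare their training results.

## Experiment 3: Eye-disease classification with VGG

- The VGG implementation is shown below: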

In [13]:
# -*- coding:utf-8 -*-

# VGG model code
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
from paddle.fluid.dygraph.base import to_variable

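# Define a VGG block: several conv layers followed by one 2x2 max-pooling layer
class vgg_block(fluid.dygraph.Layer):
    def __init__(self, num_convs, in_channels, out_channels):
        """
        num_convs: number of conv layers in the block
        in_channels: number of input channels of the first conv layer
        out_channels: number of output channels; all conv layers in the same VGG block share this value
        """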
        super(vgg_block, self).__init__()
        self.conv_list = []
        for i in range(num_convs):
            conv_layer = self.add_sublayer('conv_' + str(i), Conv2D(num_channels=in_channels, 
                                        num_filters=out_channels, filter_size=3, padding=1, act='relu'))
            self.conv_list.append(conv_layer)
            in_channels = out_channels
        self.pool = Pool2D(pool_stride=2, pool_size = 2, pool_type='max')
    def forward(self, x):
        for item in self.conv_list:
            x = item(x)
        return self.pool(x)

class VGG(fluid.dygraph.Layer):
    def __init__(self, conv_arch=((2, 64), 
                                (2, 128), (3, 256), (3, 512), (3, 512))):
        super(VGG, self).__init__()
        self.vgg_blocks=[]
        iter_id = 0
        # Add the vgg_blocks
        # There are 5 vgg_blocks; conv_arch gives the number of conv layers and output channels of each
        in_channels = [3, 64, 128, 256, 512, 512]
        for (num_convs, num_channels) in conv_arch:
            block = self.add_sublayer('block_' + str(iter_id), 
                    vgg_block(num_convs, in_channels=in_channels[iter_id], 
                              out_channels=num_channels))
            self.vgg_blocks.append(block)
            iter_id += 1
        self.fc1 = Linear(input_dim=512*7*7, output_dim=4096,
                      act='relu')
        self.drop1_ratio = 0.5
        self.fc2= Linear(input_dim=4096, output_dim=4096,
                      act='relu')
        self.drop2_ratio = 0.5
        self.fc3 = Linear(input_dim=4096, output_dim=1)
        
    def forward(self, input):
        x=input
        for item in self.vgg_blocks:
            x = item(x)
        x1 = fluid.layers.reshape(x, [x.shape[0], -1])
        x2 = fluid.layers.dropout(self.fc1(x1), self.drop1_ratio)
        x3 = fluid.layers.dropout(self.fc2(x2), self.drop2_ratio)
        x4 = self.fc3(x3)
        conv=[input,x,x1,x2,x3,x4]
        return x4,conv
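
The `input_dim=512*7*7` of `fc1` follows from the spatial size of the last block's output. The sketch below reproduces that arithmetic. It assumes a 224x224 input image (an assumption, since the data loader is not shown in this section), and `vgg_feature_size` is a hypothetical helper, not part of the notebook.

```python
def vgg_feature_size(input_size, num_blocks=5):
    """Spatial size after `num_blocks` VGG blocks.

    Inside a block, 3x3 convs with padding=1 keep the size unchanged;
    the 2x2 max pool with stride 2 at the end of each block halves it.
    """
    size = input_size
    for _ in range(num_blocks):
        size = size // 2
    return size


# Assumption: the images are resized to 224x224 before entering the network.
side = vgg_feature_size(224)
# The last block outputs 512 channels, so the flattened vector has 512 * side * side entries.
fc1_input_dim = 512 * side * side
print(side, fc1_input_dim)
```

If the input resolution changes, `input_dim` of `fc1` has to change with it.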

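### Train the model and visualize training parameters and data samples with VisualDL
- Create a log file for VGG so that its training parameters can be compared with those of the other models:

log_writer = LogWriter("./log/vgg")

- Insert plotting statements into the training loop to show how accuracy and loss change:

log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())

log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())

- When writing the network's forward pass, store the intermediate outputs in a list named 'conv' so that they can be written to the log file later.

- Insert plotting statements into the training loop to show the input image's output at the logged layers:

log_writer.add_image(tag='input_vgg/original', img=convert_out_img(conv[0][:,:4]), dataformats="NCHW", step=batch_id)

**Note: the experiments must use identical tags for the models to be compared on the same chart.**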
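
The training code in this notebook turns the single logit per image into two class probabilities before computing accuracy, and computes the loss directly from the logit. The NumPy sketch below mirrors that logic on made-up numbers. It is an illustration of the arithmetic, not the Paddle implementation, and the logits and labels are invented for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Made-up logits and labels for a batch of four images (shape [N, 1], as in the notebook).
logits = np.array([[2.0], [-1.0], [0.5], [-3.0]], dtype="float32")
labels = np.array([[1.0], [0.0], [0.0], [0.0]], dtype="float32")

# Probability of the positive class, then [P(class 0), P(class 1)] per sample.
p_pos = sigmoid(logits)
probs = np.concatenate([1.0 - p_pos, p_pos], axis=1)

# Top-1 accuracy: the predicted class is the column with the larger probability,
# which is the same as thresholding p_pos at 0.5.
pred_class = probs.argmax(axis=1)
accuracy = float((pred_class == labels[:, 0].astype("int64")).mean())

# Numerically stable sigmoid cross entropy computed from the logits:
# max(z, 0) - z * y + log(1 + exp(-|z|))
loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
avg_loss = float(loss.mean())
print(accuracy, avg_loss)
```

The third sample has a positive logit but label 0, so it is the one misclassified image in this batch.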

#### The complete training and visualization code:

In [14]:
# Create the log directory that stores the VGG results
log_writer = LogWriter("./log/vgg")
# Define the training procedure
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 5
        iter=0
        # Define the optimizer
        opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # Define the data loaders for training and validation
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                pred = fluid.layers.sigmoid(logits)
                pred2 = pred * (-1.0) + 1.0
                pred = fluid.layers.concat([pred2, pred], axis=1)
                # Convert the logged layer outputs to numpy arrays and write them to the log file
                log_writer.add_image(tag='input_vgg/original', img=convert_out_img(conv[0][:,:4]), dataformats="NCHW",step=batch_id)
                log_writer.add_image(tag='input_vgg/vgg_blocks', img=convert_out_img(conv[1][:,:4]), dataformats="NCHW",step=batch_id)
                # log_writer.add_image(tag='input_vgg/fc1', img=convert_out_img(conv[2][:,:4]), dataformats="NCHW",step=batch_id)
                # log_writer.add_image(tag='input_vgg/fc2', img=convert_out_img(conv[3][:,:4]), dataformats="NCHW",step=batch_id)
                # log_writer.add_image(tag='input_vgg/fc3', img=convert_out_img(conv[4][:,:4]), dataformats="NCHW",step=batch_id)
                # Compute the accuracy
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Every 10 batches, write the current loss and accuracy to the log writer as a new data point
                if batch_id % 10 == 0:
                    log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())
                    log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())
                    iter+=10
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                # Back-propagate, update the weights, and clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
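                # Run the forward pass to get the predictions
                logits,conv = model(img)
                # Binary classification: the sigmoid output is split into two classes at a threshold of 0.5
                # Compute the predicted probability with sigmoid, then compute the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Probability of the negative class (1 - pred)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the probabilities of the two classes along axis 1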
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'mnist')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'mnist')

with fluid.dygraph.guard():
    model = VGG()

train(model)

start training ... 
epoch: 0, batch_id: 0, loss is: [0.752526]


Premature end of JPEG file


epoch: 0, batch_id: 10, loss is: [0.82166547]
epoch: 0, batch_id: 20, loss is: [0.7532598]
epoch: 0, batch_id: 30, loss is: [0.5577036]
[validation] accuracy/loss: 0.5275000333786011/0.66121506690979
epoch: 1, batch_id: 0, loss is: [0.72483516]
epoch: 1, batch_id: 10, loss is: [0.64986324]


Premature end of JPEG file


epoch: 1, batch_id: 20, loss is: [0.61736953]
epoch: 1, batch_id: 30, loss is: [0.62518173]
[validation] accuracy/loss: 0.9200000762939453/0.40356606245040894
epoch: 2, batch_id: 0, loss is: [0.47684214]
epoch: 2, batch_id: 10, loss is: [0.5038073]


Premature end of JPEG file


epoch: 2, batch_id: 20, loss is: [0.35800314]
epoch: 2, batch_id: 30, loss is: [0.33573705]
[validation] accuracy/loss: 0.9200000762939453/0.25573059916496277
epoch: 3, batch_id: 0, loss is: [0.31720468]
epoch: 3, batch_id: 10, loss is: [0.1300174]
epoch: 3, batch_id: 20, loss is: [0.37148193]


Premature end of JPEG file


epoch: 3, batch_id: 30, loss is: [0.29787353]
[validation] accuracy/loss: 0.9275000691413879/0.21195261180400848
epoch: 4, batch_id: 0, loss is: [0.17336412]


Premature end of JPEG file


epoch: 4, batch_id: 10, loss is: [0.2628278]
epoch: 4, batch_id: 20, loss is: [0.30194184]
epoch: 4, batch_id: 30, loss is: [0.63926095]
[validation] accuracy/loss: 0.9125000238418579/0.2694101631641388


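**After the run finishes, you have two options:**
- Launch the VisualDL panel right away to compare the accuracy and loss of AlexNet, LeNet and VGG.
- Continue training the models below, and launch VisualDL once all of them are trained to compare their training results.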



## Visualizing the fourth experiment -- eye disease classification with GoogLeNet

- The Inception block is implemented as follows:

In [15]:
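class Inception(fluid.dygraph.Layer):
    def __init__(self, c0, c1, c2, c3, c4, **kwargs):
        '''
        Implementation of the Inception block.
        
        c0: number of input channels of the block (integer)
        c1: output channels of the 1x1 conv on branch 1 in figure (b) (integer)
        c2: output channels of the convs on branch 2 in figure (b) (tuple or list);
               c2[0] is for the 1x1 conv, c2[1] for the 3x3 conv
        c3: output channels of the convs on branch 3 in figure (b) (tuple or list);
               c3[0] is for the 1x1 conv, c3[1] for the 5x5 conv
        c4: output channels of the 1x1 conv on branch 4 in figure (b) (integer)
        '''
        super(Inception, self).__init__()
        # Create the operations used on each branch of the Inception block
        # Conv2D in dygraph mode requires num_channels, so the block takes its input channel count c0
        self.p1_1 = Conv2D(num_channels=c0, num_filters=c1, 
                           filter_size=1, act='relu')
        self.p2_1 = Conv2D(num_channels=c0, num_filters=c2[0], 
                           filter_size=1, act='relu')
        self.p2_2 = Conv2D(num_channels=c2[0], num_filters=c2[1], 
                           filter_size=3, padding=1, act='relu')
        self.p3_1 = Conv2D(num_channels=c0, num_filters=c3[0], 
                           filter_size=1, act='relu')
        self.p3_2 = Conv2D(num_channels=c3[0], num_filters=c3[1], 
                           filter_size=5, padding=2, act='relu')
        self.p4_1 = Pool2D(pool_size=3, 
                           pool_stride=1,  pool_padding=1, 
                           pool_type='max')
        self.p4_2 = Conv2D(num_channels=c0, num_filters=c4, 
                           filter_size=1, act='relu')

    def forward(self, x):
        # Branch 1: a single 1x1 conv
        p1 = self.p1_1(x)
        # Branch 2: 1x1 conv + 3x3 conv
        p2 = self.p2_2(self.p2_1(x))
        # Branch 3: 1x1 conv + 5x5 conv
        p3 = self.p3_2(self.p3_1(x))
        # Branch 4: max pooling + 1x1 conv
        p4 = self.p4_2(self.p4_1(x))
        # Concatenate the feature maps of all branches along the channel axis as the block output
        return fluid.layers.concat([p1, p2, p3, p4], axis=1)  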

- GoogLeNet is implemented as follows:

In [16]:
# -*- coding:utf-8 -*-

# GoogLeNet model code
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
from paddle.fluid.dygraph.base import to_variable

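# Define the Inception block
class Inception(fluid.dygraph.Layer):
    def __init__(self, c0,c1, c2, c3, c4, **kwargs):
        '''
        Implementation of the Inception block.
        
        c0: number of input channels of the block (integer)
        c1: output channels of the 1x1 conv on branch 1 in figure (b) (integer)
        c2: output channels of the convs on branch 2 in figure (b) (tuple or list);
               c2[0] is for the 1x1 conv, c2[1] for the 3x3 conv
        c3: output channels of the convs on branch 3 in figure (b) (tuple or list);
               c3[0] is for the 1x1 conv, c3[1] for the 5x5 conv
        c4: output channels of the 1x1 conv on branch 4 in figure (b) (integer)
        '''
        super(Inception, self).__init__()
        # Create the operations used on each branch of the Inception block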
        self.p1_1 = Conv2D(num_channels=c0, num_filters=c1, 
                           filter_size=1, act='relu')
        self.p2_1 = Conv2D(num_channels=c0, num_filters=c2[0], 
                           filter_size=1, act='relu')
        self.p2_2 = Conv2D(num_channels=c2[0], num_filters=c2[1], 
                           filter_size=3, padding=1, act='relu')
        self.p3_1 = Conv2D(num_channels=c0, num_filters=c3[0], 
                           filter_size=1, act='relu')
        self.p3_2 = Conv2D(num_channels=c3[0], num_filters=c3[1], 
                           filter_size=5, padding=2, act='relu')
        self.p4_1 = Pool2D(pool_size=3, 
                           pool_stride=1,  pool_padding=1, 
                           pool_type='max')
        self.p4_2 = Conv2D(num_channels=c0, num_filters=c4, 
                           filter_size=1, act='relu')

    def forward(self, x):
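        # Branch 1: a single 1x1 conv
        p1 = self.p1_1(x)
        # Branch 2: 1x1 conv + 3x3 conv
        p2 = self.p2_2(self.p2_1(x))
        # Branch 3: 1x1 conv + 5x5 conv
        p3 = self.p3_2(self.p3_1(x))
        # Branch 4: max pooling + 1x1 conv
        p4 = self.p4_2(self.p4_1(x))
        # Concatenate the feature maps of all branches along the channel axis as the block output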
        return fluid.layers.concat([p1, p2, p3, p4], axis=1)  
    
class GoogLeNet(fluid.dygraph.Layer):
    def __init__(self):
        super(GoogLeNet, self).__init__()
        # GoogLeNet consists of five modules, each followed by a pooling layer
        # Module 1: one conv layer
        self.conv1 = Conv2D(num_channels=3, num_filters=64, filter_size=7, 
                            padding=3, act='relu')
        # 3x3 max pooling
        self.pool1 = Pool2D(pool_size=3, pool_stride=2,  
                            pool_padding=1, pool_type='max')
        # Module 2: two conv layers
        self.conv2_1 = Conv2D(num_channels=64, num_filters=64, 
                              filter_size=1, act='relu')
        self.conv2_2 = Conv2D(num_channels=64, num_filters=192, 
                              filter_size=3, padding=1, act='relu')
        # 3x3 max pooling
        self.pool2 = Pool2D(pool_size=3, pool_stride=2,  
                            pool_padding=1, pool_type='max')
        # Module 3: two Inception blocks
        self.block3_1 = Inception(192, 64, (96, 128), (16, 32), 32)
        self.block3_2 = Inception(256, 128, (128, 192), (32, 96), 64)
        # 3x3 max pooling
        self.pool3 = Pool2D(pool_size=3, pool_stride=2,  
                               pool_padding=1, pool_type='max')
        # Module 4: five Inception blocks
        self.block4_1 = Inception(480, 192, (96, 208), (16, 48), 64)
        self.block4_2 = Inception(512, 160, (112, 224), (24, 64), 64)
        self.block4_3 = Inception(512, 128, (128, 256), (24, 64), 64)
        self.block4_4 = Inception(512, 112, (144, 288), (32, 64), 64)
        self.block4_5 = Inception(528, 256, (160, 320), (32, 128), 128)
        # 3x3 max pooling
        self.pool4 = Pool2D(pool_size=3, pool_stride=2,  
                               pool_padding=1, pool_type='max')
        # Module 5: two Inception blocks
        self.block5_1 = Inception(832, 256, (160, 320), (32, 128), 128)
        self.block5_2 = Inception(832, 384, (192, 384), (48, 128), 128)
        # Global pooling: with global_pooling=True the window covers the whole feature map, so pool_stride has no effect
        self.pool5 = Pool2D(pool_stride=1, 
                               global_pooling=True, pool_type='avg')
        self.fc = Linear(input_dim=1024, output_dim=1, act=None)

    def forward(self, x):
        x1 = self.pool1(self.conv1(x))
        x2 = self.pool2(self.conv2_2(self.conv2_1(x1)))
        x3 = self.pool3(self.block3_2(self.block3_1(x2)))
        x4 = self.block4_3(self.block4_2(self.block4_1(x3)))
        x5 = self.pool4(self.block4_5(self.block4_4(x4)))
        x6 = self.pool5(self.block5_2(self.block5_1(x5)))
        x7 = fluid.layers.reshape(x6, [x6.shape[0], -1])
        x8 = self.fc(x7)
        conv=[x,x1,x2,x3,x4,x5,x6,x7,x8]
        return x8,conv
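
The channel arguments of the nine Inception blocks are chained: because the four branches are concatenated along the channel axis, each block outputs `c1 + c2[1] + c3[1] + c4` channels, and that sum must equal the `c0` of the next block. The sketch below checks this for the configuration used in `GoogLeNet` above. `inception_out_channels` is a hypothetical helper written for this check only.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Output channels of an Inception block: the four branch outputs are concatenated."""
    return c1 + c2[1] + c3[1] + c4


# (c0, c1, c2, c3, c4) for block3_1 ... block5_2, copied from the GoogLeNet class.
blocks = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

channels = 192  # output channels of conv2_2, the layer feeding block3_1
for c0, c1, c2, c3, c4 in blocks:
    # The declared input channels must match what the previous layer produces.
    assert c0 == channels, "expected {} input channels, got {}".format(channels, c0)
    channels = inception_out_channels(c1, c2, c3, c4)

# Pooling layers do not change the channel count, so this is the input_dim of the final Linear layer.
print(channels)
```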


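### Train the model and visualize training parameters and data samples with VisualDL
- Create a log file for GoogLeNet so that its training parameters can be compared with those of the other models:

log_writer = LogWriter("./log/googlenet")

- Insert plotting statements into the training loop to show how accuracy and loss change:

log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())

log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())

- When writing the network's forward pass, store the intermediate outputs in a list named 'conv' so that they can be written to the log file later.

- Insert plotting statements into the training loop to show the input image's output at the logged layers:

log_writer.add_image(tag='input_googlenet/pool_1', img=convert_out_img(conv[1][:,:4]), dataformats="NCHW", step=batch_id)

**Note: the experiments must use identical tags for the models to be compared on the same chart.**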
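
Feature maps are floating-point tensors with an arbitrary value range, so they need to be rescaled before they can be shown as images. The notebook does this with its own `convert_out_img` helper, defined earlier. The sketch below shows one common way to do such a rescaling with min-max normalization. `feature_map_to_uint8` is a hypothetical stand-in and may differ from what `convert_out_img` actually does.

```python
import numpy as np


def feature_map_to_uint8(feature_map):
    """Rescale an array to the 0..255 range and cast it to uint8 (min-max normalization)."""
    fm = np.asarray(feature_map, dtype="float32")
    lo, hi = fm.min(), fm.max()
    if hi - lo < 1e-12:
        # A constant feature map carries no contrast; return a black image.
        return np.zeros(fm.shape, dtype="uint8")
    scaled = (fm - lo) / (hi - lo) * 255.0
    return scaled.astype("uint8")


# Random stand-in for a layer output in NCHW layout: 10 images, 64 channels, 56x56 pixels.
batch = np.random.RandomState(0).randn(10, 64, 56, 56).astype("float32")
# As in the notebook, keep only the first four channels of every image.
img = feature_map_to_uint8(batch[:, :4])
print(img.shape, img.dtype, img.min(), img.max())
```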

#### The complete training and visualization code:

In [19]:
# Create the log directory that stores the GoogLeNet results
log_writer = LogWriter("./log/googlenet")
# Define the training procedure
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 5
        iter=0
        # Define the optimizer
        opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # Define the data loaders for training and validation
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                pred = fluid.layers.sigmoid(logits)
                pred2 = pred * (-1.0) + 1.0
                pred = fluid.layers.concat([pred2, pred], axis=1)
                # Convert the logged layer outputs to numpy arrays and write them to the log file
                log_writer.add_image(tag='input_googlenet/original', img=convert_out_img(conv[0]), dataformats="NCHW",step=batch_id)
                log_writer.add_image(tag='input_googlenet/pool_1', img=convert_out_img(conv[1][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_googlenet/pool_2', img=convert_out_img(conv[2][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_googlenet/pool_3', img=convert_out_img(conv[3][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_googlenet/block4_3', img=convert_out_img(conv[4][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_googlenet/pool_4', img=convert_out_img(conv[5][:,:4]), dataformats="NCHW", step=batch_id)
                log_writer.add_image(tag='input_googlenet/pool_5', img=convert_out_img(conv[6][:,:4]), dataformats="NCHW", step=batch_id)
                # log_writer.add_image(tag='input_googlenet/reshape', img=convert_out_img(conv[7][:,:4]), dataformats="NCHW", step=batch_id)
                # log_writer.add_image(tag='input_googlenet/fc', img=convert_out_img(conv[8][:,:4]), dataformats="NCHW", step=batch_id)
                # Compute the accuracy
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Every 10 batches, write the current loss and accuracy to the log writer as a new data point
                if batch_id % 10 == 0:
                    log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())
                    log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())
                    iter+=10
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                # Back-propagate, update the weights, and clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
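                # Run the forward pass to get the predictions
                logits,conv = model(img)
                # Binary classification: the sigmoid output is split into two classes at a threshold of 0.5
                # Compute the predicted probability with sigmoid, then compute the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Probability of the negative class (1 - pred)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the probabilities of the two classes along axis 1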
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'mnist')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'mnist')


with fluid.dygraph.guard():
    model = GoogLeNet()

train(model)

start training ... 
epoch: 0, batch_id: 0, loss is: [0.75418067]
epoch: 0, batch_id: 10, loss is: [0.7037188]


Premature end of JPEG file


epoch: 0, batch_id: 20, loss is: [0.6713807]
epoch: 0, batch_id: 30, loss is: [0.65968716]
[validation] accuracy/loss: 0.550000011920929/0.6173955798149109
epoch: 1, batch_id: 0, loss is: [0.5912618]
epoch: 1, batch_id: 10, loss is: [0.63442963]
epoch: 1, batch_id: 20, loss is: [0.884045]
epoch: 1, batch_id: 30, loss is: [0.6811533]


Premature end of JPEG file


[validation] accuracy/loss: 0.8575000762939453/0.43541648983955383


Premature end of JPEG file


epoch: 2, batch_id: 0, loss is: [0.45489442]
epoch: 2, batch_id: 10, loss is: [0.38237202]
epoch: 2, batch_id: 20, loss is: [0.53658324]
epoch: 2, batch_id: 30, loss is: [1.0337268]
[validation] accuracy/loss: 0.8700000047683716/0.5773341655731201
epoch: 3, batch_id: 0, loss is: [0.5977422]
epoch: 3, batch_id: 10, loss is: [0.7324745]
epoch: 3, batch_id: 20, loss is: [0.592078]


Premature end of JPEG file


epoch: 3, batch_id: 30, loss is: [0.33997554]
[validation] accuracy/loss: 0.9125000238418579/0.3259984850883484
epoch: 4, batch_id: 0, loss is: [0.17838205]


Premature end of JPEG file


epoch: 4, batch_id: 10, loss is: [0.21453923]
epoch: 4, batch_id: 20, loss is: [0.24832763]
epoch: 4, batch_id: 30, loss is: [0.6171415]
[validation] accuracy/loss: 0.8324999809265137/0.41865235567092896


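**After the run finishes, you have two options:**
- Launch the VisualDL panel right away to compare the accuracy and loss of AlexNet, LeNet, VGG and GoogLeNet.
- Continue training the model below, and launch VisualDL once all models are trained to compare their training results.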

## Visualizing the fifth experiment -- eye disease classification with ResNet

- ResNet-50 is implemented as follows:

In [21]:
# -*- coding:utf-8 -*-

# ResNet model code
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
from paddle.fluid.dygraph.base import to_variable

# ResNet places a BatchNorm layer after each conv layer to improve numerical stability
# Define the conv + batch-norm block
class ConvBNLayer(fluid.dygraph.Layer):
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 stride=1,
                 groups=1,
                 act=None):
        """
        
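        num_channels: number of input channels of the conv layer
        num_filters: number of output channels of the conv layer
        filter_size: size of the conv kernel
        stride: stride of the conv layer
        groups: number of groups for grouped convolution; the default groups=1 means no grouping
        act: activation type; the default act=None means no activation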
        """
        super(ConvBNLayer, self).__init__()

        # Create the conv layer
        self._conv = Conv2D(
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            bias_attr=False)

        # Create the BatchNorm layer
        self._batch_norm = BatchNorm(num_filters, act=act)

    def forward(self, inputs):
        y = self._conv(inputs)
        y = self._batch_norm(y)
        return y

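# Define the residual block
# Each residual block applies three convolutions to its input and then adds the input back through a shortcut connection
# If the output of the third convolution has a different shape from the input, a 1x1 convolution is applied to the input to match the shapes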
class BottleneckBlock(fluid.dygraph.Layer):
    def __init__(self,
                 num_channels,
                 num_filters,
                 stride,
                 shortcut=True):
        super(BottleneckBlock, self).__init__()
        # First conv layer, 1x1
        self.conv0 = ConvBNLayer(
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=1,
            act='relu')
        # Second conv layer, 3x3
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act='relu')
        # Third conv layer, 1x1, with 4 times as many output channels
        self.conv2 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters * 4,
            filter_size=1,
            act=None)

        # shortcut=True when the output of conv2 has the same shape as the input of this residual block
        # Otherwise shortcut=False, and a 1x1 conv is applied to the input so that its shape matches conv2
        if not shortcut:
            self.short = ConvBNLayer(
                num_channels=num_channels,
                num_filters=num_filters * 4,
                filter_size=1,
                stride=stride)

        self.shortcut = shortcut

        self._num_channels_out = num_filters * 4

    def forward(self, inputs):
        y = self.conv0(inputs)
        conv1 = self.conv1(y)
        conv2 = self.conv2(conv1)

        # If shortcut=True, add inputs directly to the output of conv2
        # Otherwise, apply one convolution to inputs first so that its shape matches the output of conv2
        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)

        y = fluid.layers.elementwise_add(x=short, y=conv2)
        layer_helper = LayerHelper(self.full_name(), act='relu')
        return layer_helper.append_activation(y)

# Define the ResNet model
class ResNet(fluid.dygraph.Layer):
    def __init__(self, layers=50, class_dim=1):
        """
        
        layers: network depth, one of 50, 101 or 152
        class_dim: number of output units of the classifier
        """
        super(ResNet, self).__init__()
        self.layers = layers
        supported_layers = [50, 101, 152]
        assert layers in supported_layers, \
            "supported layers are {} but input layer is {}".format(supported_layers, layers)

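        if layers == 50:
            # ResNet50: modules 2 to 5 contain 3, 4, 6 and 3 residual blocks respectively
            depth = [3, 4, 6, 3]
        elif layers == 101:
            # ResNet101: modules 2 to 5 contain 3, 4, 23 and 3 residual blocks respectively
            depth = [3, 4, 23, 3]
        elif layers == 152:
            # ResNet152: modules 2 to 5 contain 3, 8, 36 and 3 residual blocks respectively
            depth = [3, 8, 36, 3]
        
        # Output channels of the convs used in the residual blocks
        num_filters = [64, 128, 256, 512]

        # First ResNet module: one 7x7 conv followed by one max-pooling layer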
        self.conv = ConvBNLayer(
            num_channels=3,
            num_filters=64,
            filter_size=7,
            stride=2,
            act='relu')
        self.pool2d_max = Pool2D(
            pool_size=3,
            pool_stride=2,
            pool_padding=1,
            pool_type='max')

        # ResNet modules 2 to 5: c2, c3, c4, c5
        self.bottleneck_block_list = []
        num_channels = 64
        for block in range(len(depth)):
            shortcut = False
            for i in range(depth[block]):
                bottleneck_block = self.add_sublayer(
                    'bb_%d_%d' % (block, i),
                    BottleneckBlock(
                        num_channels=num_channels,
                        num_filters=num_filters[block],
                        stride=2 if i == 0 and block != 0 else 1, # c3, c4 and c5 use stride=2 in their first residual block; all other residual blocks use stride=1
                        shortcut=shortcut))
                num_channels = bottleneck_block._num_channels_out
                self.bottleneck_block_list.append(bottleneck_block)
                shortcut = True

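        # Apply global pooling to the output feature map of c5
        self.pool2d_avg = Pool2D(pool_size=7, pool_type='avg', global_pooling=True)

        # stdv is the bound of the uniform distribution used to initialize the fully connected layer
        import math
        stdv = 1.0 / math.sqrt(2048 * 1.0)
        
        # Create the fully connected layer; its output size is class_dim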
        self.out = Linear(input_dim=2048, output_dim=class_dim,
                      param_attr=fluid.param_attr.ParamAttr(
                          initializer=fluid.initializer.Uniform(-stdv, stdv)))

        
    def forward(self, inputs):
        y = self.conv(inputs)
        y = self.pool2d_max(y)
        for bottleneck_block in self.bottleneck_block_list:
            y = bottleneck_block(y)
        y1 = self.pool2d_avg(y)
        y2 = fluid.layers.reshape(y1, [y1.shape[0], -1])
        y3 = self.out(y2)
        conv=[inputs,y,y1,y2,y3]
        return y3,conv
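
The `depth` lists explain the names ResNet-50, ResNet-101 and ResNet-152: counting the stem convolution, the three convolutions of every bottleneck block, and the final fully connected layer gives exactly those numbers (the 1x1 shortcut convolutions are not counted by convention). The sketch below reproduces the count and the `input_dim=2048` of the last layer. The helper names are hypothetical.

```python
# Residual blocks per module (c2..c5), as in the ResNet class.
DEPTHS = {50: [3, 4, 6, 3], 101: [3, 4, 23, 3], 152: [3, 8, 36, 3]}


def count_weight_layers(depth):
    """1 stem conv + 3 convs per bottleneck block + 1 fully connected layer."""
    return 1 + 3 * sum(depth) + 1


def final_channels(num_filters=(64, 128, 256, 512)):
    """Each bottleneck block ends with a 1x1 conv that has num_filters * 4 output channels."""
    return num_filters[-1] * 4


for name, depth in sorted(DEPTHS.items()):
    print(name, count_weight_layers(depth))
print(final_channels())
```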



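### Train the model and visualize training parameters and data samples with VisualDL
- Create a log file for ResNet so that its training parameters can be compared with those of the other models:

log_writer = LogWriter("./log/resnet")

- Insert plotting statements into the training loop to show how accuracy and loss change:

log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())

log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())

- When writing the network's forward pass, store the intermediate outputs in a list named 'conv' so that they can be written to the log file later.

- Insert plotting statements into the training loop to show the input image's output at the logged layers:

log_writer.add_image(tag='input_resnet/original', img=convert_out_img(conv[0]), dataformats="NCHW", step=batch_id)

**Note: the experiments must use identical tags for the models to be compared on the same chart.**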

#### The complete training and visualization code:

In [24]:
# Create the log directory that stores the ResNet results
log_writer = LogWriter("./log/resnet")
# Define the training procedure
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 5
        iter=0
        # Define the optimizer
        opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # Define the data loaders for training and validation
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits,conv = model(img)
                pred = fluid.layers.sigmoid(logits)
                pred2 = pred * (-1.0) + 1.0
                pred = fluid.layers.concat([pred2, pred], axis=1)
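                # Convert the logged layer outputs to numpy arrays and write them to the log file
                log_writer.add_image(tag='input_resnet/original', img=convert_out_img(conv[0]),dataformats="NCHW",step=batch_id)
                # Note: conv[1] is the output of the last residual block, i.e. the feature map fed into pool2d_avg
                log_writer.add_image(tag='input_resnet/pool2d_avg', img=convert_out_img(conv[1][:,:4]),dataformats="NCHW",step=batch_id)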
                # log_writer.add_image(tag='input_resnet/reshape', img=convert_out_img(conv[2][:,:4]),dataformats="NCHW", step=batch_id)
                # log_writer.add_image(tag='input_resnet/output', img=convert_out_img(conv[3][:,:4]),dataformats="NCHW", step=batch_id)
                # Compute the accuracy
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Every 10 batches, write the current loss and accuracy to the log writer as a new data point
                if batch_id % 10 == 0:
                    log_writer.add_scalar(tag='train/acc', step=iter, value=acc.numpy())
                    log_writer.add_scalar(tag='train/loss', step=iter, value=avg_loss.numpy())
                    iter+=10
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                # Back-propagate, update the weights, and clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
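                # Run the forward pass to get the predictions
                logits,conv = model(img)
                # Binary classification: the sigmoid output is split into two classes at a threshold of 0.5
                # Compute the predicted probability with sigmoid, then compute the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)
                # Probability of the negative class (1 - pred)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the probabilities of the two classes along axis 1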
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'mnist')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'mnist')
with fluid.dygraph.guard():
    model = ResNet()

train(model)

start training ... 
epoch: 0, batch_id: 0, loss is: [0.6848939]
epoch: 0, batch_id: 10, loss is: [0.722348]
epoch: 0, batch_id: 20, loss is: [0.69279873]


Premature end of JPEG file


epoch: 0, batch_id: 30, loss is: [0.6632378]
[validation] accuracy/loss: 0.7400000095367432/0.5463799834251404
epoch: 1, batch_id: 0, loss is: [0.6896794]
epoch: 1, batch_id: 10, loss is: [0.5407681]


Premature end of JPEG file


epoch: 1, batch_id: 20, loss is: [0.41758567]
epoch: 1, batch_id: 30, loss is: [0.4623827]
[validation] accuracy/loss: 0.875/0.3386470675468445
epoch: 2, batch_id: 0, loss is: [0.2210343]
epoch: 2, batch_id: 10, loss is: [0.304076]
epoch: 2, batch_id: 20, loss is: [0.2317107]


Premature end of JPEG file


epoch: 2, batch_id: 30, loss is: [0.24433315]
[validation] accuracy/loss: 0.9475000500679016/0.19909973442554474
epoch: 3, batch_id: 0, loss is: [0.26255694]
epoch: 3, batch_id: 10, loss is: [0.7900493]


Premature end of JPEG file


epoch: 3, batch_id: 20, loss is: [0.1531486]
epoch: 3, batch_id: 30, loss is: [0.25372422]
[validation] accuracy/loss: 0.9225000143051147/0.23160284757614136
epoch: 4, batch_id: 0, loss is: [0.21286654]
epoch: 4, batch_id: 10, loss is: [0.25417072]
epoch: 4, batch_id: 20, loss is: [0.24430172]


Premature end of JPEG file


epoch: 4, batch_id: 30, loss is: [0.358006]
[validation] accuracy/loss: 0.8899999856948853/0.2840989828109741
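
The validation loop above turns the single sigmoid output into two-class probabilities before calling `fluid.layers.accuracy`. The following NumPy sketch reproduces that step outside Paddle so the arithmetic is easy to check. The function name `binary_accuracy` and the sample logits and labels are illustrative assumptions, not part of the original project.

```python
import numpy as np


def binary_accuracy(logits, labels):
    """Top-1 accuracy of a single-logit binary classifier, computed the two-class way."""
    prob_pos = 1.0 / (1.0 + np.exp(-logits))               # sigmoid, shape (N, 1)
    prob_neg = 1.0 - prob_pos                              # same as pred * (-1.0) + 1.0
    probs = np.concatenate([prob_neg, prob_pos], axis=1)   # shape (N, 2)
    predicted = probs.argmax(axis=1)                       # class with the higher probability
    targets = labels.reshape(-1).astype("int64")
    return float((predicted == targets).mean())


# Made-up sample batch: four logits and their ground-truth labels
logits = np.array([[2.0], [-1.0], [0.5], [-3.0]], dtype="float32")
labels = np.array([[1.0], [0.0], [0.0], [0.0]], dtype="float32")
print(binary_accuracy(logits, labels))  # 3 of 4 predictions match, so 0.75
```

Taking the arg-max over the two concatenated columns is equivalent to thresholding the sigmoid output at 0.5.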


**All models are now trained. Launch VisualDL to compare their training metrics.**

Compare the accuracy and loss of the five models:

![](https://ai-studio-static-online.cdn.bcebos.com/671fac7e91e84d8a947db56ed70f9e7d78e5c248d4a4467e839975f1b74c99f2)


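From the comparison we can see that:

- LeNet's loss barely decreases; the model does not converge.
- AlexNet's loss decreases effectively; after 5 epochs of training, accuracy on the training set reaches about 92%.
- VGG's loss decreases effectively; after 5 epochs of training, accuracy on the training set reaches about 94%.
- GoogLeNet's loss decreases effectively; after 5 epochs of training, accuracy on the training set reaches about 96%.
- ResNet's loss decreases effectively; after 5 epochs of training, accuracy on the training set reaches about 98%.

Apart from LeNet, which is not suited to classifying large images, the loss of every other model drops significantly on this dataset. Interested readers can further tune hyperparameters such as the learning rate and the number of training epochs, record them with VisualDL, and observe how they affect model accuracy.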
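
One possible way to organize such a hyperparameter study is to give each run its own log subdirectory and keep the tags identical, so that VisualDL can overlay the curves. The sketch below assumes a hypothetical `train_fn(lr, epochs)` generator that yields `(step, acc, loss)` tuples and a `make_writer` factory that returns a writer object. The helper names and the directory naming scheme are assumptions for illustration only.

```python
import itertools
import os


def run_log_dir(root, model_name, lr, epochs):
    """Build a per-run log directory, e.g. log/resnet_lr0.001_ep5 (naming is an assumption)."""
    return os.path.join(root, "{}_lr{}_ep{}".format(model_name, lr, epochs))


def log_metrics(writer, step, acc, loss):
    """Write accuracy and loss under the same tags as the training loop."""
    writer.add_scalar(tag="train/acc", step=step, value=acc)
    writer.add_scalar(tag="train/loss", step=step, value=loss)


def sweep(make_writer, train_fn, model_name, learning_rates, epoch_counts, root="./log"):
    """Run train_fn once per (lr, epochs) pair, logging each run to its own directory."""
    log_dirs = []
    for lr, epochs in itertools.product(learning_rates, epoch_counts):
        log_dir = run_log_dir(root, model_name, lr, epochs)
        writer = make_writer(log_dir)
        try:
            # train_fn is hypothetical: it yields (step, acc, loss) while training
            for step, acc, loss in train_fn(lr, epochs):
                log_metrics(writer, step, acc, loss)
        finally:
            writer.close()
        log_dirs.append(log_dir)
    return log_dirs
```

With VisualDL installed, `make_writer` could be `lambda log_dir: LogWriter(logdir=log_dir)` after `from visualdl import LogWriter`. Because every run writes to the same two tags, the runs should appear as separate curves in the same charts.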

**Launch VisualDL to see how the input images change during training**

- View the image data at different training iterations

![](https://ai-studio-static-online.cdn.bcebos.com/4e78a9f53f3c4256beabdf13f66739d343db636cd1274b939f9f3694d34437cb)

- View the output of the input image after each network layer:
    * Input image after LeNet's first pooling layer (left) and second convolutional layer (right):
      ![](https://ai-studio-static-online.cdn.bcebos.com/d051e48e98534560af58d897b16c3b2e36de282b7a8d410687d4b41b5a09ba20)

    * Input image after AlexNet's first pooling layer (left) and second convolutional layer (right):
      ![](https://ai-studio-static-online.cdn.bcebos.com/7379218091a24f4284eeaebc76c9348ce602538695344305815d1475b212ef1d)

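By observing how the images change during training, we can see directly how each network layer affects the image, which in turn helps us improve the model architecture.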
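
Before an intermediate feature map can be written with `add_image()`, it has to become an image array. Here is a minimal sketch, assuming the layer output for one sample is a `(C, H, W)` float array and that the writer accepts an `(H, W, 3)` `uint8` image. `feature_map_to_image` and the tag used in the note below are hypothetical names.

```python
import numpy as np


def feature_map_to_image(feature_map, channel=0):
    """Convert one channel of a (C, H, W) feature map into an (H, W, 3) uint8 image."""
    fmap = np.asarray(feature_map, dtype="float32")[channel]
    lo, hi = fmap.min(), fmap.max()
    scale = (hi - lo) if hi > lo else 1.0   # avoid division by zero on constant maps
    gray = ((fmap - lo) / scale * 255.0).astype("uint8")
    # Repeat the grayscale channel three times to obtain an RGB-shaped array
    return np.stack([gray, gray, gray], axis=-1)


# Made-up feature map with 2 channels of size 2x2
example = np.arange(8, dtype="float32").reshape(2, 2, 2)
image = feature_map_to_image(example, channel=1)
print(image.shape, image.dtype)
```

The result could then be passed to the writer, for example `log_writer.add_image(tag='lenet/conv2', img=image, step=step)`. Min-max scaling per channel is one design choice among several; it maximizes contrast but makes brightness incomparable across steps.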

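# Summary

This project showed how to use VisualDL to visualize training metrics and image data, and how to compare the training results of five classic image classification models on an eye-disease screening dataset.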

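- Create a log file: use `LogWriter()` and specify the log path.
- Visualize training metrics in real time: use `add_scalar()` and specify the metric's tag name and value.
- Compare multiple experiments: create one log subdirectory per experiment, such as
    * 'log/lenet'
    * 'log/alexnet'
    * 'log/vgg'
    * 'log/googlenet'
    * 'log/resnet'

  Then, when writing data, use the same tag name for the same kind of metric in every experiment.

- Visualize image data during training:
    * Export the output image of each network layer into a `list`, then write it to the log file for visualization.
    * Use `add_image()` and specify the image name, the image data (as an array), and the current iteration step.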


For details, see the [user guide](https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/components/README.md).

You are welcome to [join the official VisualDL QQ group](https://jq.qq.com/?_wv=1027&k=TyzyVT4C): 1045783368

<img src="https://ai-studio-static-online.cdn.bcebos.com/830248838e5c4d159b2496d6f4cc2eaa65377984f69a4cd086729e37f75f0954" width = "200" height = "100" align=center />

