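# PaddlePaddle Learning Competition: Personal Loan Default Prediction - Building a Minimal Loan Default Predictor with Paddle - 3rd-Place Solution, August 2022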

## Several model variants written and validated, based on the official example

Reference example:
User: 笠雨聆月
Project: https://aistudio.baidu.com/aistudio/projectdetail/3555696

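## Task description
The task is to use an existing batch of credit data, drawn from a customer group that differs slightly from the target group, to help build a risk-control model for the target business. The two datasets share a large number of fields and only a handful of users. Participants are encouraged to use transfer learning to capture how users' basic information relates to default behavior across different businesses, and so to predict user defaults for the new business.

## Data description
The training set contains 10,000 records with the following attributes: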

'loan_id', 'user_id', 'total_loan', 'year_of_loan', 'interest',
       'monthly_payment', 'class', 'employer_type', 'industry', 'work_year',
       'house_exist', 'censor_status', 'issue_date', 'use', 'post_code',
       'region', 'debt_loan_ratio', 'del_in_18month', 'scoring_low',
       'scoring_high', 'known_outstanding_loan', 'known_dero',
       'pub_dero_bankrup', 'recircle_b', 'recircle_u', 'initial_list_status',
       'app_type', 'earlies_credit_mon', 'title', 'policy_code', 'f0', 'f1',
       'f2', 'f3', 'f4', 'early_return', 'early_return_amount',
       'early_return_amount_3mon', 'isDefault'
       
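Here 'isDefault' is the classification target. Nearly 1/3 of the variables are categorical and nearly 2/3 are numeric; the remaining variables carry time information and similar fields.

Only 1,683 of the 10,000 samples are positive and the rest are negative, so the classes are imbalanced.

## Approach

1. To handle the class imbalance, the loss function is weighted: negative samples get weight 0.2 and positive samples get weight 1.0.
2. Continuous numeric variables are standardized with their mean and standard deviation (z-score normalization).
3. Categorical variables are re-encoded inside the network, i.e. an extra fully connected layer is added to imitate an embedding.
4. This solution simply discards the region code, time information and similar fields.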
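
As a minimal sketch of point 1, the snippet below computes a class-weighted cross-entropy in plain Python. It assumes the weighted mean is normalized by the sum of the sample weights and that softmax is applied inside the loss, which is believed to match the defaults of paddle.nn.functional.cross_entropy. The logits and the helper names are hypothetical and only illustrate the arithmetic.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Weighted mean of per-sample cross-entropy.

    Assumption: the mean is normalized by the sum of the sample weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(batch_logits, labels):
        prob = softmax(logits)[label]
        w = class_weights[label]
        weighted_sum += -w * math.log(prob)
        weight_total += w
    return weighted_sum / weight_total

# Class balance reported above: 1,683 positives out of 10,000 samples.
n_pos, n_total = 1683, 10000
n_neg = n_total - n_pos
pos_to_neg = n_pos / n_neg  # about 0.202, close to the 0.2 weight used for negatives

# Hypothetical logits for three negatives and one positive.
batch_logits = [[2.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
labels = [0, 0, 0, 1]

plain = weighted_cross_entropy(batch_logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(batch_logits, labels, [0.2, 1.0])
print(round(pos_to_neg, 3), round(plain, 4), round(weighted, 4))
```

With the negative weight at 0.2, the single positive sample contributes most of the loss, which is the intended effect on an imbalanced batch.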

# Code

In [1]:
# Inspect the samples
import pandas as pd
train_df=pd.read_csv('data/data130186/train_public.csv')
# Display train_df
# train_df.columns # show the column names
train_df

| | loan_id | user_id | total_loan | year_of_loan | interest | monthly_payment | class | employer_type | industry | work_year | ... | policy_code | f0 | f1 | f2 | f3 | f4 | early_return | early_return_amount | early_return_amount_3mon | isDefault |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1040418 | 240418 | 31818.18182 | 3 | 11.466 | 1174.91 | C | 政府机构 | 金融业 | 3 years | ... | 1 | 1.0 | 0.0 | 4.0 | 5.0 | 4.0 | 3 | 9927 | 0.0 | 0 |
| 1 | 1025197 | 225197 | 28000.00000 | 5 | 16.841 | 670.69 | C | 政府机构 | 金融业 | 10+ years | ... | 1 | 7.0 | 0.0 | 4.0 | 45.0 | 22.0 | 0 | 0 | 0.0 | 0 |
| 2 | 1009360 | 209360 | 17272.72727 | 3 | 8.900 | 603.32 | A | 政府机构 | 公共服务、社会组织 | 10+ years | ... | 1 | 6.0 | 0.0 | 6.0 | 28.0 | 19.0 | 0 | 0 | 0.0 | 0 |
| 3 | 1039708 | 239708 | 20000.00000 | 3 | 4.788 | 602.30 | A | 世界五百强 | 文化和体育业 | 6 years | ... | 1 | 5.0 | 0.0 | 10.0 | 15.0 | 9.0 | 0 | 0 | 0.0 | 0 |
| 4 | 1027483 | 227483 | 15272.72727 | 3 | 12.790 | 470.31 | C | 政府机构 | 信息传输、软件和信息技术服务业 | < 1 year | ... | 1 | 10.0 | 0.0 | 6.0 | 15.0 | 4.0 | 0 | 0 | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 1028093 | 228093 | 17727.27273 | 3 | 15.037 | 510.27 | B | 普通企业 | 建筑业 | 7 years | ... | 1 | 4.0 | 0.0 | 4.0 | 11.0 | 7.0 | 2 | 5287 | 0.0 | 0 |
| 9996 | 1043911 | 243911 | 13636.36364 | 3 | 6.534 | 464.95 | A | 政府机构 | 农、林、牧、渔业 | 2 years | ... | 1 | 2.0 | 0.0 | 2.0 | 7.0 | 6.0 | 3 | 7182 | 0.0 | 0 |
| 9997 | 1023503 | 223503 | 24818.18182 | 3 | 14.421 | 708.69 | B | 普通企业 | 信息传输、软件和信息技术服务业 | 10+ years | ... | 1 | 6.0 | 0.0 | 5.0 | 15.0 | 11.0 | 1 | 8540 | 2562.0 | 0 |
| 9998 | 1024616 | 224616 | 20000.00000 | 3 | 18.450 | 727.58 | D | 政府机构 | 农、林、牧、渔业 | 10+ years | ... | 1 | 7.0 | 0.0 | 5.0 | 17.0 | 10.0 | 2 | 6161 | 616.1 | 0 |


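## Building the dataset

Reading the data through Paddle's paddle.io.Dataset means that shuffle and drop_last do not have to be handled by hand, but the effect is essentially the same as writing a loader that yields batches directly.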
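
For comparison, a hand-written loader of the kind mentioned above might look like the sketch below. The function name batch_loader and its parameters are hypothetical; it only mirrors the shuffle and drop_last options and is not how Paddle's DataLoader is implemented.

```python
import random

def batch_loader(samples, batch_size, shuffle=True, drop_last=False, seed=None):
    """Yield lists of samples, handling shuffle and drop_last by hand."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        if drop_last and len(batch_idx) < batch_size:
            break  # discard the final incomplete batch
        yield [samples[i] for i in batch_idx]

samples = list(range(10))
kept = list(batch_loader(samples, batch_size=4, shuffle=False, drop_last=False))
dropped = list(batch_loader(samples, batch_size=4, shuffle=False, drop_last=True))
print(len(kept), len(dropped))  # 3 2
```

The Dataset-based approach leaves this bookkeeping to the framework, so the class only has to implement item access and length.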

In [2]:
import paddle
import numpy as np
import pandas as pd

class MyDateset(paddle.io.Dataset):
    # csv_dir is the path of the data to read; standard_csv_dir is the path of the file used to compute the means and standard deviations for normalization
    def __init__(self,csv_dir,standard_csv_dir='data/data130186/train_public.csv',mode = 'train'):
        super(MyDateset, self).__init__()

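        # Read the data
        self.df = pd.read_csv(csv_dir)
        
        # Compute the mean and standard deviation of each numeric column
        # (numeric_only=True is required by recent pandas versions, which raise on text columns)
        st_df = pd.read_csv(standard_csv_dir)
        self.mean_df = st_df.mean(numeric_only=True)
        self.std_df = st_df.std(numeric_only=True)

        # Debug output: print the column means
        print(self.mean_df)

        # List the numeric variables, the categorical variables and the unused variables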
        self.num_item = ['total_loan', 'year_of_loan', 'interest','monthly_payment',
        'debt_loan_ratio', 'del_in_18month', 'scoring_low','scoring_high', 'known_outstanding_loan', 'known_dero','pub_dero_bankrup', 'recircle_b', 'recircle_u', 
        'f0', 'f1','f2', 'f3', 'f4', 'early_return', 'early_return_amount','early_return_amount_3mon']
        self.un_num_item = ['class','employer_type','industry','work_year','house_exist', 'censor_status',
        'use',
        'initial_list_status','app_type',
        'policy_code']
        self.un_use_item = ['loan_id', 'user_id',
        'issue_date', 
        'post_code', 'region',
        'earlies_credit_mon','title']

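        # Build a lookup table that maps each categorical value/string to an integer index.
        # sorted() keeps the indices stable across runs; the iteration order of a set of strings can change between sessions.
        un_num_item_list = {}
        for item in self.un_num_item:
            un_num_item_list[item]=sorted(st_df[item].dropna().unique().tolist())
        self.un_num_item_list = un_num_item_list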

        self.mode = mode

    def __getitem__(self, index):
        data=[]

        # Normalize; a missing value is replaced by a raw 0 before normalization
        for item in self.num_item:
            if np.isnan(self.df[item][index]):
                data.append((0-self.mean_df[item])/self.std_df[item])
            else:
                data.append((self.df[item][index]-self.mean_df[item])/self.std_df[item])
        
        emb_data = []

        # Map each categorical value to its integer index (-1 for unseen or missing values)
        for item in self.un_num_item:
            try:
                if self.df[item][index] not in self.un_num_item_list[item]:
                    emb_data.append(-1)
                else:
                    emb_data.append(self.un_num_item_list[item].index(self.df[item][index]))
            except Exception:
                emb_data.append(-1)

        data = paddle.to_tensor(data).astype('float32')
        emb_data = paddle.to_tensor(emb_data).astype('float32')

        # In train mode the label is isDefault; otherwise return the loan_id so the sample can be identified
        if self.mode == 'train':
            label = self.df['isDefault'][index]
        else:
            label = self.df['loan_id'][index]

        label = np.array(label).astype('int64')
        return data,emb_data,label

    def __len__(self):
        return len(self.df)
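
The two preprocessing steps inside the class can be illustrated on toy data. The sketch below is plain Python with hypothetical column values and helper names; it assumes the sample standard deviation (ddof=1, the pandas default) and uses a dict lookup in place of the list search in the class. Note that a missing value is replaced by a raw 0 before normalization, so it lands far from the mean rather than at it.

```python
import math

def mean_std(values):
    """Mean and sample standard deviation (ddof=1), skipping NaN entries."""
    clean = [v for v in values if not math.isnan(v)]
    mean = sum(clean) / len(clean)
    var = sum((v - mean) ** 2 for v in clean) / (len(clean) - 1)
    return mean, math.sqrt(var)

def normalize(value, mean, std):
    """Z-score a value; a missing value is treated as a raw 0, as in the class above."""
    raw = 0.0 if math.isnan(value) else value
    return (raw - mean) / std

def build_vocab(values):
    """Map each distinct category to a stable integer index (sorted order)."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}

def encode(value, vocab):
    """Return the category index, or -1 for unseen categories."""
    return vocab.get(value, -1)

# Hypothetical column values, for illustration only.
interest = [10.0, 12.0, float("nan"), 14.0]
mean, std = mean_std(interest)
normalized = [normalize(v, mean, std) for v in interest]
print(normalized)  # [-1.0, 0.0, -6.0, 1.0]

grades = ["C", "A", "B", "A"]
vocab = build_vocab(grades)
encoded = [encode(g, vocab) for g in ["A", "C", "Z"]]
print(encoded)  # [0, 2, -1]
```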

In [3]:
dataset=MyDateset('data/data130186/train_public.csv')
[data,emb_data,label] = dataset[0]
print(data.shape)
print(emb_data.shape)
print(label)


loan_id                     1.025210e+06
user_id                     2.252096e+05
total_loan                  1.440213e+04
year_of_loan                3.479600e+00
interest                    1.322278e+01
monthly_payment             4.369604e+02
house_exist                 6.122000e-01
censor_status               1.014600e+00
use                         1.762600e+00
post_code                   2.575191e+02
region                      1.631990e+01
debt_loan_ratio             1.753217e+01
del_in_18month              3.116000e-01
scoring_low                 6.641156e+02
scoring_high                7.744483e+02
known_outstanding_loan      1.164500e+01
known_dero                  2.264000e-01
pub_dero_bankrup            1.389973e-01
recircle_b                  1.654830e+04
recircle_u                  5.362262e+01
initial_list_status         4.141000e-01
app_type                    2.000000e-02
title                       1.808202e+03
policy_code                 1.000000e+00
f0              

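## Building the network

The categorical variables go through two fully connected layers that generate an embedding.
The numeric data and the text (categorical) data each use their own pair of fully connected layers.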

In [7]:
class MyNet(paddle.nn.Layer):
    def __init__(self):
        super(MyNet,self).__init__()
        self.fc = paddle.nn.Linear(in_features=21, out_features=512)
        self.fc2 = paddle.nn.Linear(in_features=512,out_features=1024)

        self.emb1 = paddle.nn.Linear(in_features=10,out_features=512)
        self.emb2 = paddle.nn.Linear(in_features=512,out_features=1024)

        self.out = paddle.nn.Linear(in_features=2048,out_features=2)

    def forward(self,data,emb_data):
        x = self.fc(data)
        x = self.fc2(x)

        emb = self.emb1(emb_data)
        emb = self.emb2(emb)

        x = paddle.concat([x,emb],axis=-1)

        x = self.out(x)
        
        x = paddle.nn.functional.sigmoid(x)
        return x
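
To make the phrase "imitate an embedding" concrete, the sketch below contrasts a true embedding lookup (a one-hot vector multiplied by a table) with a linear layer applied to the raw category index, which is closer to what MyNet does. The table values and helper names are hypothetical and biases are ignored; this is an illustration, not part of the submitted solution.

```python
def matvec(weight, vector):
    """Compute vector @ weight for a weight of shape [in_features, out_features]."""
    out_features = len(weight[0])
    return [sum(vector[i] * weight[i][j] for i in range(len(vector)))
            for j in range(out_features)]

def one_hot(index, size):
    """One-hot encode an integer index as a list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical embedding table: 3 categories, 2-dimensional embeddings.
table = [[0.1, 0.2],
         [0.5, -0.3],
         [-0.4, 0.9]]

# An embedding lookup equals a one-hot vector multiplied by the table.
lookup = table[2]
via_one_hot = matvec(table, one_hot(2, 3))

# A linear layer on the raw index only rescales one weight row,
# so category 2 is forced to be exactly twice category 1.
raw_weight = [[0.5, -0.3]]  # shape [1, 2]
raw_1 = matvec(raw_weight, [1.0])
raw_2 = matvec(raw_weight, [2.0])
print(lookup, via_one_hot, raw_1, raw_2)
```

Feeding raw indices therefore imposes an ordering on the categories; a one-hot input or a lookup table would let each category learn its own vector.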

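## Building a CNN
A CNN is applied to the categorical variables as well, and the outputs are concatenated to produce the embedding.
Each input goes through 1-D convolutions; pooling is not used, out of concern that too much information would be lost.

In [5]:
# CNN model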
class MyNetCNN(paddle.nn.Layer):
    def __init__(self):
        super(MyNetCNN,self).__init__()
        self.data_conv1d1= paddle.nn.Conv1D(in_channels=1,out_channels=30,kernel_size=2,padding=0)
        self.data_maxpool1 = paddle.nn.MaxPool1D(kernel_size=2,stride=1)
        self.data_conv1d2 = paddle.nn.Conv1D(in_channels=30,out_channels=60,kernel_size=2,padding=0)
        self.data_maxpool2 = paddle.nn.MaxPool1D(kernel_size=2,stride=1)

        self.out = paddle.nn.Linear(in_features=1620,out_features=2)

    def forward(self,data,emb_data):
        
        x = self.data_conv1d1(data)
        #x = self.data_maxpool1(x)
        x = self.data_conv1d2(x)
        #x = self.data_maxpool2(x)

        emb_x = self.data_conv1d1(emb_data)
        #emb_x = self.data_maxpool1(emb_x)
        emb_x = self.data_conv1d2(emb_x)
        #emb_x = self.data_maxpool2(emb_x)

        x = paddle.reshape(x,[x.shape[0],-1])
        emb_x = paddle.reshape(emb_x,[emb_x.shape[0],-1])

        x = paddle.concat([x,emb_x],axis=-1)
        x = self.out(x)
        x = paddle.nn.functional.sigmoid(x)
        return x
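
The in_features=1620 of the output layer follows from the convolution shapes. The sketch below applies the standard 1-D convolution output-length formula to the 21 numeric and 10 categorical inputs; the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Standard output-length formula for a 1-D convolution."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def flattened_features(length, out_channels=60):
    """Features left after the two kernel_size=2 convolutions in MyNetCNN."""
    after_conv1 = conv1d_out_len(length, kernel_size=2)
    after_conv2 = conv1d_out_len(after_conv1, kernel_size=2)
    return out_channels * after_conv2

numeric = flattened_features(21)      # 21 numeric inputs
categorical = flattened_features(10)  # 10 categorical inputs
print(numeric, categorical, numeric + categorical)  # 1140 480 1620
```

If the commented-out pooling layers were enabled, each would shorten the sequence by one more step and the Linear input size would have to change accordingly.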

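## Building a more complex network to improve accuracy
The categorical variables go through a deeper fully connected model before the outputs are concatenated into the embedding.
The numeric data and the text data each go through three fully connected layers.
The concatenated output then goes through two more fully connected layers.
Dropout and activation functions can also be tried; in testing they performed worse, so they are commented out in forward.


In [4]:
# Build a more complex network to improve accuracy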
class MyNet2(paddle.nn.Layer):
    def __init__(self):
        super(MyNet2,self).__init__()
        self.fc1 = paddle.nn.Linear(in_features=21, out_features=128)
        self.Sigmoid1 = paddle.nn.Sigmoid()
        self.dropout1 = paddle.nn.Dropout(p=0.2)
        self.fc2 = paddle.nn.Linear(in_features=128, out_features=256)
        self.Sigmoid2 = paddle.nn.Sigmoid()
        self.dropout2 = paddle.nn.Dropout(p=0.3)
        self.fc3 = paddle.nn.Linear(in_features=256, out_features=512)
        self.Sigmoid3 = paddle.nn.Sigmoid()
        self.dropout3 = paddle.nn.Dropout(p=0.4)

        self.emb1 = paddle.nn.Linear(in_features=10,out_features=128)
        self.emb_Sigmoid1 = paddle.nn.Sigmoid()
        self.emb_dropout1 = paddle.nn.Dropout(p=0.2)
        self.emb2 = paddle.nn.Linear(in_features=128,out_features=256)
        self.emb_Sigmoid2 = paddle.nn.Sigmoid()
        self.emb_dropout2 = paddle.nn.Dropout(p=0.2)
        self.emb3 = paddle.nn.Linear(in_features=256,out_features=512)
        self.emb_Sigmoid3 = paddle.nn.Sigmoid()
        self.emb_dropout3 = paddle.nn.Dropout(p=0.2)


        self.con_fc1 = paddle.nn.Linear(in_features=1024, out_features=1024)
        self.con_Sigmoid1 = paddle.nn.Sigmoid()
        self.con_dropout1 = paddle.nn.Dropout(p=0.2)
        self.con_fc2 = paddle.nn.Linear(in_features=1024, out_features=2048)
        self.con_Sigmoid2 = paddle.nn.Sigmoid()
        self.con_dropout2 = paddle.nn.Dropout(p=0.2)
        

        self.out = paddle.nn.Linear(in_features=2048,out_features=2)

    def forward(self,data,emb_data):
        x = self.fc1(data)
        #x = self.dropout1(x)
        #x = self.Sigmoid1(x)
        x = self.fc2(x)
        #x = self.dropout2(x)
        #x = self.Sigmoid2(x)
        
        x = self.fc3(x)
        #x = self.dropout3(x)
        #x = self.Sigmoid3(x)
        

        emb = self.emb1(emb_data)
        #emb = self.emb_dropout1(emb)
        #emb = self.emb_Sigmoid1(emb)
        
        emb = self.emb2(emb)
        #emb = self.emb_dropout2(emb)
        #emb = self.emb_Sigmoid2(emb)

        emb = self.emb3(emb)
        #emb = self.emb_Sigmoid3(emb)

        x = paddle.concat([x,emb],axis=-1)

        x = self.con_fc1(x)
        #x = self.con_dropout1(x)
        #x = self.con_Sigmoid1(x)
        x = self.con_fc2(x)
        #x = self.con_dropout2(x)
        #x = self.con_Sigmoid2(x)

        x = self.out(x)
        
        x = paddle.nn.functional.sigmoid(x)
        return x

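## Building a more complex network on merged inputs to improve accuracy
The categorical variables are first concatenated with the numeric ones, then passed through four fully connected layers to produce the embedding.


In [27]:
# Concatenate the inputs and build a more complex network to improve accuracy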
class MyNet3(paddle.nn.Layer):
    def __init__(self):
        super(MyNet3,self).__init__()
        self.fc1 = paddle.nn.Linear(in_features=31, out_features=64)
        self.Sigmoid1 = paddle.nn.Sigmoid()
        self.dropout1 = paddle.nn.Dropout(p=0.2)
        self.fc2 = paddle.nn.Linear(in_features=64, out_features=256)
        self.Sigmoid2 = paddle.nn.Sigmoid()
        self.dropout2 = paddle.nn.Dropout(p=0.3)
        self.fc3 = paddle.nn.Linear(in_features=256, out_features=512)
        self.Sigmoid3 = paddle.nn.Sigmoid()
        self.dropout3 = paddle.nn.Dropout(p=0.2)

        self.fc4 = paddle.nn.Linear(in_features=512, out_features=1024)
        self.Sigmoid4 = paddle.nn.Sigmoid()
        self.dropout4 = paddle.nn.Dropout(p=0.2)

        self.out = paddle.nn.Linear(in_features=1024,out_features=2)

    def forward(self,data,emb_data):

        x = paddle.concat([data,emb_data],axis=-1)

        x = self.fc1(x)
        #x = self.dropout1(x)
        #x = self.Sigmoid1(x)
        x = self.fc2(x)
        #x = self.dropout2(x)
        #x = self.Sigmoid2(x)
        
        x = self.fc3(x)
        #x = self.dropout3(x)
        #x = self.Sigmoid3(x)
        
        x = self.fc4(x)
        #x = self.Sigmoid4(x)
        
        x = self.out(x)
        
        x = paddle.nn.functional.sigmoid(x)
        return x

## Building the data loader

In [5]:
# Build the data loader
train_dataset=MyDateset('data/data130186/train_public.csv')

train_dataloader = paddle.io.DataLoader(
    train_dataset,
    batch_size=1000,
    shuffle=True,
    drop_last=False)


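## Training, tracking the minimum loss and the maximum AUC
Whenever the current loss falls below the minimum loss so far, the model is saved and the minimum loss is updated.
Whenever the current AUC exceeds the maximum AUC so far, the model is saved and the maximum AUC is updated.
The AUC metric is never reset in the loop, so the reported value is accumulated over all batches seen so far.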


In [6]:
# Build the model
isCNN = 0
#model = MyNetCNN()
#isCNN = 1
model = MyNet2()
# model_dict = paddle.load('model.pdparams')
# model.set_dict(model_dict)
model.train()
max_epoch=100
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

# Training
now_step=0
# Lowest loss so far (initial threshold)
minLoss = 0.48
# Highest AUC so far (initial threshold)
maxAuc = 0.835
m = paddle.metric.Auc()
for epoch in range(max_epoch):
    for step, data in enumerate(train_dataloader):
        now_step+=1

        data,emb_data, label = data
        if isCNN == 1:
            data = data.reshape([data.shape[0],1,data.shape[1]])
            emb_data = emb_data.reshape([emb_data.shape[0],1,emb_data.shape[1]])
        pre = model(data,emb_data)
        
        #print(pre.shape)
        loss = paddle.nn.functional.cross_entropy(pre,label,weight=paddle.to_tensor([0.2,1.0]),reduction='mean')
        # loss = paddle.mean(loss)
        loss.backward()
        opt.step()
        opt.clear_gradients()
        
        if now_step%1==0:
            if minLoss > loss.mean():
                minLoss = loss.mean().numpy()
                paddle.save(model.state_dict(), 'model/modelMinLoss_mynet_{}.pdparams'.format(minLoss))

            m.update(preds=pre, labels=label)
            resAccumulate = m.accumulate()
            if maxAuc < resAccumulate:
                maxAuc = resAccumulate
                paddle.save(model.state_dict(), 'model/modelmaxAuc_mynet_{}.pdparams'.format(maxAuc))
            print("epoch: {}, batch: {}, Auc is:{}, loss is: {}".format(epoch, step, resAccumulate,loss.mean().numpy()))
        

# Save the final model to modelM_mynet.pdparams
paddle.save(model.state_dict(), 'modelM_mynet.pdparams')



epoch: 0, batch: 0, Auc is:0.6220063629117022, loss is: [0.7062972]
epoch: 0, batch: 1, Auc is:0.6372079284215935, loss is: [0.69641393]
epoch: 0, batch: 2, Auc is:0.6176093354779898, loss is: [0.6831036]
epoch: 0, batch: 3, Auc is:0.6104863090261837, loss is: [0.68203586]
epoch: 0, batch: 4, Auc is:0.621920510274286, loss is: [0.68506503]
epoch: 0, batch: 5, Auc is:0.624742612194785, loss is: [0.67886966]
epoch: 0, batch: 6, Auc is:0.6349409128778151, loss is: [0.67931896]
epoch: 0, batch: 7, Auc is:0.6362777551026074, loss is: [0.67467475]
epoch: 0, batch: 8, Auc is:0.6460048140123917, loss is: [0.67041004]
epoch: 0, batch: 9, Auc is:0.6501510161342257, loss is: [0.66909844]
epoch: 1, batch: 0, Auc is:0.6601486754727016, loss is: [0.6603739]
epoch: 1, batch: 1, Auc is:0.665241589696695, loss is: [0.660295]
epoch: 1, batch: 2, Auc is:0.671893265593028, loss is: [0.66338736]
epoch: 1, batch: 3, Auc is:0.6756333032977954, loss is: [0.6572029]
epoch: 1, batch: 4, Auc is:0.679313540447437
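
The checkpointing rule used in the training cell can be isolated as in the sketch below. The function name track_best and the (loss, AUC) history are hypothetical; only the starting thresholds 0.48 and 0.835 come from the cell above. Instead of saving a file, the sketch records the steps at which a save would be triggered.

```python
def track_best(history, min_loss, max_auc):
    """Return the steps at which a new best loss or AUC would trigger a save."""
    loss_saves, auc_saves = [], []
    for step, (loss, auc) in enumerate(history):
        if loss < min_loss:
            min_loss = loss
            loss_saves.append(step)
        if auc > max_auc:
            max_auc = auc
            auc_saves.append(step)
    return loss_saves, auc_saves, min_loss, max_auc

# Hypothetical (loss, AUC) pairs, not taken from the real training log.
history = [(0.52, 0.80), (0.47, 0.83), (0.49, 0.84), (0.46, 0.836)]
loss_saves, auc_saves, best_loss, best_auc = track_best(
    history, min_loss=0.48, max_auc=0.835)
print(loss_saves, auc_saves, best_loss, best_auc)  # [1, 3] [2] 0.46 0.84
```

Starting from non-trivial thresholds means nothing is written to disk until the model is already reasonably good, at the cost of saving nothing if the thresholds are never reached.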

## Prediction

Load the saved model.

Then generate result.csv and submit it directly.

In [7]:
# Load the model and build the data loader
#model = MyNetCNN()
model = MyNet2()
# To use your own training result, replace the .pdparams path passed to load, e.g. model.pdparams
model_dict = paddle.load('model/modelMinLoss_mynet_[0.47419816].pdparams')
# model_dict = paddle.load('model.pdparams')
model.set_dict(model_dict)
model.eval()

test_dataset=MyDateset('data/data130187/test_public.csv',mode = 'test')
test_dataloader = paddle.io.DataLoader(
    test_dataset,
    batch_size=1,
    shuffle=False,
    drop_last=False)


In [8]:
# Save the results to result.csv
result = []
for step, data in enumerate(test_dataloader):
    data ,emb_data, loan_id = data
    if isCNN == 1:
        data = data.reshape([data.shape[0],1,data.shape[1]])
        emb_data = emb_data.reshape([emb_data.shape[0],1,emb_data.shape[1]])
    pre = model(data,emb_data)
    preRes = 0
    if pre[:,1].numpy()[0] >0.3:
        preRes = 1
    result.append([loan_id.numpy()[0], pre[:,1].numpy()[0]])
    # result.append([loan_id.numpy()[0], np.argmax(pre.numpy())])

pd.DataFrame(result,columns=['id','isDefault']).to_csv('result.csv',index=None)
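
The cell above writes the predicted probability rather than the thresholded preRes. Assuming the leaderboard score is AUC, as the result reported in the conclusion suggests, the sketch below shows why that choice matters: AUC depends on the ranking of the scores, and hard 0/1 labels throw part of that ranking away. The probabilities and labels are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true labels.
probs = [0.10, 0.25, 0.35, 0.40, 0.60, 0.90]
labels = [0, 0, 0, 1, 1, 1]

# Hard labels using the 0.3 cut-off from the cell above.
hard = [1 if p > 0.3 else 0 for p in probs]

auc_probs = auc(probs, labels)
auc_hard = auc(hard, labels)
print(auc_probs, round(auc_hard, 4))
```

In this toy case the probabilities rank every positive above every negative, while the thresholded labels tie one negative with the positives and lower the score.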

In [21]:
# Submit the result
!rm -rf submit.sh
!wget -O submit.sh http://ai-studio-static.bj.bcebos.com/script/submit.sh
!sh submit.sh /result.csv 


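# Conclusion

This project builds a classifier in a very plain way and reaches an AUC of 0.85984. On the original competition leaderboard many participants score 0.9, so the model still has room for improvement.

Possible improvements:

1. Change the network structure: the current network contains only 4 fully connected layers and one Sigmoid function, which is rather plain; adding layers and activation functions may help the model converge to higher accuracy.
2. Make use of the time information and similar fields.
3. Use strategies such as dropout.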
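
One way to see why activation functions matter for point 1: in the forward methods of MyNet2 and MyNet3 every intermediate activation is commented out, so the stacked Linear layers compose into a single linear map before the final sigmoid. The sketch below checks this numerically with hypothetical 2x2 weights, omitting biases for brevity, and shows that inserting a sigmoid between the layers breaks the equivalence.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sigmoid_rows(m):
    """Apply the logistic sigmoid element-wise."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]

# Hypothetical weights and input.
w1 = [[1.0, 2.0], [0.0, 1.0]]
w2 = [[1.0], [-1.0]]
x = [[3.0, 4.0]]

# Two stacked linear layers equal one layer with the product of the weights.
two_layers = matmul(matmul(x, w1), w2)
one_layer = matmul(x, matmul(w1, w2))

# With a non-linearity in between, the stack no longer collapses.
with_activation = matmul(sigmoid_rows(matmul(x, w1)), w2)
print(two_layers, one_layer, with_activation)
```

Under these assumptions, extra layers only add capacity once a non-linearity sits between them, which is consistent with the suggestion to revisit activation functions.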


