## Approach

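<font size=5>This project is a fairly basic binary classification task.
With 29 input features, a simple feed-forward fully connected neural network is a reasonable choice of model.
The overall workflow is as follows:</font>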
![](https://ai-studio-static-online.cdn.bcebos.com/e3d9c46c5c82413db44086d61fad24179a64567df7a74d788416ef78922f85ce)



<font size=5>See the code comments for details of the data processing and tuning strategy.</font>

## Import the required packages

In [1]:
import pandas as pd  # data analysis library
import paddle  # PaddlePaddle deep learning framework
import numpy as np  # scientific computing library
import seaborn as sns   # visualization library

# Read the CSV files
train_data = pd.read_csv('data/data137276/train.csv.zip')
test_data = pd.read_csv('data/data137276/test.csv.zip')

# Data preprocessing: drop the id and timecc columns
train_data = train_data.drop(['id', 'timecc'], axis=1)
test_data = test_data.drop(['id', 'timecc'], axis=1)



## Normalization

In [2]:
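# Normalize by dividing every element of a column by that column's maximum value.
# Column 0 of train_data is the label, so it is skipped; each set is scaled by its own column maximum.
for col in train_data.columns[1:]:
    train_data[col] /= train_data[col].max()
    test_data[col] /= test_data[col].max()
# Min-max or mean-variance (z-score) normalization would also work; the simple max-value method is used here.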
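The comment above mentions min-max normalization as an alternative. Below is a minimal sketch of that option, assuming the scaling statistics are fitted on the training set only and then reused for the test set so that both share the same scale. The helper names `fit_min_max` and `apply_min_max`, the toy column names, and the toy values are all hypothetical and are not part of the original notebook.

```python
import pandas as pd

def fit_min_max(train_df, columns):
    """Return per-column minimum and range, computed on the training set only."""
    col_min = train_df[columns].min()
    col_range = train_df[columns].max() - col_min
    # Guard against constant columns to avoid division by zero.
    col_range = col_range.replace(0, 1)
    return col_min, col_range

def apply_min_max(df, columns, col_min, col_range):
    """Scale the given columns with previously fitted statistics."""
    out = df.copy()
    out[columns] = (out[columns] - col_min) / col_range
    return out

# Toy frames standing in for train_data / test_data (made-up columns and values).
toy_train = pd.DataFrame({"win": [0, 1, 1],
                          "kills": [2.0, 4.0, 6.0],
                          "deaths": [1.0, 1.0, 1.0]})
toy_test = pd.DataFrame({"kills": [3.0, 8.0], "deaths": [1.0, 2.0]})

feature_cols = ["kills", "deaths"]
col_min, col_range = fit_min_max(toy_train, feature_cols)
scaled_train = apply_min_max(toy_train, feature_cols, col_min, col_range)
scaled_test = apply_min_max(toy_test, feature_cols, col_min, col_range)
```

With training-set statistics, test values can fall outside [0, 1] (here 8.0 maps to 1.5), which is expected. A z-score variant would follow the same fit-then-apply pattern with the mean and standard deviation.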

## Build the model

In [None]:
class Classifier(paddle.nn.Layer):
    def __init__(self):
        super(Classifier, self).__init__()
        
        # Three hidden layers with 100, 50, and 20 units
        self.fc1 = paddle.nn.Linear(in_features=29, out_features=100)
        self.fc2 = paddle.nn.Linear(in_features=100, out_features=50)
        self.fc3 = paddle.nn.Linear(in_features=50, out_features=20)
        self.fc4 = paddle.nn.Linear(in_features=20, out_features=1)
        self.relu = paddle.nn.ReLU()
    
    # Forward pass; returns a raw logit (no sigmoid is applied here)
    def forward(self, inputs):
        x = self.relu(self.fc1(inputs))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x
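To get a feel for how small this network is, the number of trainable parameters can be counted from the layer sizes alone. This is a plain-Python sketch, not part of the original notebook; the helper name `count_linear_params` is hypothetical, and it assumes each `Linear` layer has a weight matrix plus a bias vector.

```python
# Layer sizes taken from the Classifier definition: 29 -> 100 -> 50 -> 20 -> 1.
layer_sizes = [29, 100, 50, 20, 1]

def count_linear_params(sizes):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

total_params = count_linear_params(layer_sizes)
print(total_params)  # 9091
```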

## Model training

In [4]:
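# Create a model instance
model = Classifier()

# Switch the model to training mode
model.train()

# Define the optimizer
# Use stochastic gradient descent with a learning rate of 0.01
# Other optimization algorithms could also be used here
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

# Loss function
# BCEWithLogitsLoss is binary cross-entropy applied to raw logits; a two-class cross-entropy loss could also be used
loss_fn = paddle.nn.BCEWithLogitsLoss()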
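Because the model outputs raw logits, the loss has to combine a sigmoid with binary cross-entropy. The NumPy sketch below illustrates that computation in a numerically stable form and checks it against the naive formula. It is an illustration of the idea, not Paddle's actual implementation; the function names and the toy values are assumptions.

```python
import numpy as np

def sigmoid(z):
    """Map logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed directly from logits (stable form)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # max(x, 0) - x * y + log(1 + exp(-|x|)) avoids overflow for large |x|.
    per_sample = (np.maximum(logits, 0) - logits * labels
                  + np.log1p(np.exp(-np.abs(logits))))
    return per_sample.mean()

# Made-up logits and labels for illustration.
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1.0, 0.0, 0.0, 0.0])

stable = bce_with_logits(logits, labels)

# Naive version: apply the sigmoid first, then the cross-entropy formula.
probs = sigmoid(logits)
naive = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)).mean()
```

The two values agree for moderate logits. The sketch also shows why the notebook thresholds predictions at 0: a logit above 0 is exactly a sigmoid probability above 0.5.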

W1104 12:34:47.715279  2027 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W1104 12:34:47.719049  2027 device_context.cc:465] device: 0, cuDNN Version: 7.6.


In [7]:
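EPOCH_NUM = 15   # number of passes over the training data; more epochs can be used
BATCH_SIZE = 120  # mini-batch size; across several trials, 120 gave good results

# Hold out the last 1000 rows as a validation set
training_data = train_data.iloc[:-1000,].values.astype(np.float32)
val_data = train_data.iloc[-1000:, ].values.astype(np.float32)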

for epoch_id in range(EPOCH_NUM):
    np.random.shuffle(training_data)
    
    # Split the training data into mini-batches
    mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
    
    
    for iter_id, mini_batch in enumerate(mini_batches):
        x = np.array(mini_batch[:, 1:]) # features of the current batch
        y = np.array(mini_batch[:, :1]) # labels of the current batch
        
        # Convert the NumPy arrays to Paddle dynamic-graph tensors
        features = paddle.to_tensor(x)
        y = paddle.to_tensor(y)
        
        # Forward pass
        predicts = model(features)
        
        # Compute the loss
        loss = loss_fn(predicts, y)
        avg_loss = paddle.mean(loss)
        if iter_id % 200 == 0:
            # A logit above 0 corresponds to a predicted probability above 0.5
            acc = (predicts > 0).astype(int).flatten() == y.flatten().astype(int)
            acc = acc.astype(float).mean()

            print("epoch: {}, iter: {}, loss is: {}, acc is {}".format(epoch_id, iter_id, avg_loss.numpy(), acc.numpy()))
        
        # Backward pass: compute the gradients of each layer's parameters
        avg_loss.backward()
        # Update the parameters by one step at the configured learning rate
        opt.step()
        # Clear the gradients before the next iteration
        opt.clear_grad()

epoch: 0, iter: 0, loss is: [0.70042956], acc is [0.48333333]
epoch: 0, iter: 200, loss is: [0.67738193], acc is [0.71666667]
epoch: 0, iter: 400, loss is: [0.6610948], acc is [0.68333333]
epoch: 0, iter: 600, loss is: [0.66419184], acc is [0.69166667]
epoch: 0, iter: 800, loss is: [0.6436514], acc is [0.7]
epoch: 0, iter: 1000, loss is: [0.6101005], acc is [0.73333333]
epoch: 0, iter: 1200, loss is: [0.56124663], acc is [0.80833333]
epoch: 0, iter: 1400, loss is: [0.5477344], acc is [0.79166667]
epoch: 1, iter: 0, loss is: [0.5378845], acc is [0.78333333]
epoch: 1, iter: 200, loss is: [0.5364874], acc is [0.75]
epoch: 1, iter: 400, loss is: [0.4506206], acc is [0.80833333]
epoch: 1, iter: 600, loss is: [0.46172494], acc is [0.78333333]
epoch: 1, iter: 800, loss is: [0.4224408], acc is [0.79166667]
epoch: 1, iter: 1000, loss is: [0.47177482], acc is [0.8]
epoch: 1, iter: 1200, loss is: [0.44979936], acc is [0.75833333]
epoch: 1, iter: 1400, loss is: [0.41606823], acc is [0.825]
epoch: 
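The training cell sets aside `val_data` but the accuracy it prints is measured on training batches only. A held-out check could look like the sketch below. The helper `accuracy_from_logits` is hypothetical and works on plain NumPy arrays; the toy logits and labels are made up, and the commented lines at the end show how it might be wired to the notebook's `model` and `val_data` (not run here).

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of samples where thresholding the logit at 0 matches the label."""
    logits = np.asarray(logits).reshape(-1)
    labels = np.asarray(labels).reshape(-1)
    predicted = (logits > 0).astype(np.int64)
    return float((predicted == labels.astype(np.int64)).mean())

# Toy arrays shaped like the model output and label column (made-up values).
toy_logits = np.array([[1.2], [-0.3], [0.8], [-2.0]])
toy_labels = np.array([[1.0], [0.0], [0.0], [0.0]])
val_acc = accuracy_from_logits(toy_logits, toy_labels)
print(val_acc)  # 0.75

# Possible use with the notebook's objects (assumption, not executed here):
# model.eval()
# val_logits = model(paddle.to_tensor(val_data[:, 1:])).numpy()
# print(accuracy_from_logits(val_logits, val_data[:, :1]))
# model.train()
```

Tracking this value once per epoch would give a clearer signal of overfitting than the per-batch training accuracy.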

## Model prediction

In [5]:
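model.eval()    # switch to evaluation mode for prediction
test_data = paddle.to_tensor(test_data.values.astype(np.float32))   # convert the test data to a tensor
test_predict = model(test_data)     # run the prediction
test_predict = (test_predict > 0).astype(int).flatten()  # a logit above 0 is classified as 1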

## Write the results to a CSV file and compress it

In [7]:
pd.DataFrame({'win':
              test_predict.numpy()
             }).to_csv('submission.csv', index=False)    # write to CSV without the index column

!zip submission.zip submission.csv  # compress

  adding: submission.csv (deflated 96%)
