根据之前的知识，我们现在来进行一些训练。<br>
In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” <br>
using passenger data (ie name, age, gender, socio-economic class, etc).<br>
<br>
Variable Notes<br>
<br>
pclass: A proxy for socio-economic status (SES)<br>
1st = Upper<br>
2nd = Middle<br>
3rd = Lower<br>
<br>
age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5<br>
<br>
sibsp: The dataset defines family relations in this way...<br>
<br>
Sibling = brother, sister, stepbrother, stepsister<br>
<br>
Spouse = husband, wife (mistresses and fiancés were ignored)<br>
<br>
parch: The dataset defines family relations in this way...<br>
<br>
Parent = mother, father<br>
<br>
Child = daughter, son, stepdaughter, stepson<br>
Some children travelled only with a nanny, therefore parch=0 for them.<br>

In [53]:
import torch
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from torch.utils.data import DataLoader,TensorDataset
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

#加载文件
train_df = pd.read_csv('./dataset/titanic/train.csv')
test_df = pd.read_csv('./dataset/titanic/test.csv')

#print(train_df.isnull().sum())
#print(test_df.isnull().sum())

#填补空缺
#inplace=true表示在原数据框内填写，不创建新副本。
train_df['Age'].fillna(train_df['Age'].mean(),inplace = True)
#.mode()是为了找众数，[0]是因为众数可能不止一个，取第一个数
train_df['Embarked'].fillna(train_df['Embarked'].mode()[0],inplace = True)
train_df['Fare'].fillna(train_df['Fare'].median(),inplace= True)
train_df['Cabin'].fillna('Unknown',inplace=True)
train_df['Name'].fillna('Unknown',inplace=True)
train_df['Ticket'].fillna('Unknown',inplace=True)

test_df['Age'].fillna(test_df['Age'].mean(),inplace=True)
test_df['Embarked'].fillna(test_df['Embarked'].mode()[0],inplace= True)
test_df['Fare'].fillna(test_df['Fare'].median(),inplace=True)
test_df['Cabin'].fillna('Unknown',inplace=True)
test_df['Name'].fillna('Unknown',inplace=True)
test_df['Ticket'].fillna('Unknown',inplace=True)

#Sex encode
#讲分类变量转化为数值
#.fit_transform先拟合数据，然后进行转换
#.transform将拟合好的编码器用于新的数据
labelencoder_sex = LabelEncoder()
train_df['Sex'] = labelencoder_sex.fit_transform(train_df['Sex'])
test_df['Sex'] = labelencoder_sex.transform(test_df['Sex'])

#Embarked one-hot encode
#pd.getdimmies将分类变量转化为onehot编码，columns指定需要编码的列
#drop_first=Ture删除第一个类别避免多重共线性
train_df = pd.get_dummies(train_df,columns=['Embarked'],drop_first=True)
test_df = pd.get_dummies(test_df,columns=['Embarked'],drop_first=True)


#title encode
train_df['Title'] = train_df['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())
test_df['Title'] = test_df['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())


for df in [train_df,test_df]:
    df['Title'] = df['Title'].replace(['Mlle','Ms'],'Miss')
    df['Title'] = df['Title'].replace(['Mme'],'Mrs')
    df['Title'] = df['Title'].replace(['Dr','Rev','Col','Major','jonkheer','Capt','Don','Sir','Lady','Countess','Dona'],'Rare')

labelencoder_title = LabelEncoder()
train_df['Title'] = labelencoder_title.fit_transform(train_df['Title'])
test_df['Title'] = labelencoder_title.transform(test_df['Title'])

#Carbin encode

train_df['Cabin'] = train_df['Cabin'].apply(lambda x: x[0])
test_df['Cabin'] = test_df['Cabin'].apply(lambda x : x[0])
labelencoder_carbin = LabelEncoder()
train_df['Cabin'] = labelencoder_carbin.fit_transform(train_df['Cabin'])
test_df['Cabin'] = labelencoder_carbin.transform(test_df['Cabin'])

#ticket encode

def process_ticket(ticket):
    #去掉ticket两边的空格，然后按照‘ ’（空格）区分，然后取最后一列。
    ticket = ticket.strip().split(' ')[-1]
    return 'X' if not ticket.isdigit() else ticket

train_df['Ticket'] = train_df['Ticket'].apply(process_ticket)
test_df['Ticket'] = test_df['Ticket'].apply(process_ticket)
combined_tickets = pd.concat([train_df['Ticket'], test_df['Ticket']], axis=0)
labelencoder_ticket = LabelEncoder()
labelencoder_ticket.fit(combined_tickets)
train_df['Ticket'] = labelencoder_ticket.transform(train_df['Ticket'])
test_df['Ticket'] = labelencoder_ticket.transform(test_df['Ticket'])

#choose feature and in-outputs
features = ['Pclass','Title','Sex','Age','SibSp','Parch','Ticket','Fare','Cabin','Embarked_S','Embarked_Q']
X_train = train_df[features]
Y_train = train_df['Survived']
X_test = test_df[features]


#标准化数据
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

#totensor
X_train_tensor = torch.tensor(X_train,dtype=torch.float32)
Y_train_tensor = torch.tensor(Y_train.values,dtype=torch.float32)
X_test_tensor = torch.tensor(X_test,dtype=torch.float32)



#loaddata
train_dataset = TensorDataset(X_train_tensor,Y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64,shuffle=True)



#model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = torch.nn.Linear(len(features),8)
        self.fc2 = torch.nn.Linear(8,6)
        self.fc3 = torch.nn.Linear(6,3)
        self.fc4 = torch.nn.Linear(3,1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self,x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x
model = Net()

criterion = torch.nn.BCELoss(reduction = 'mean')
optimizer =  optim.Adam(model.parameters(),lr=0.001)

def train(epoch):
    for batch_idx, data in enumerate (train_loader,0):
        inputs,target = data
        target = target.view(-1,1)

        outputs = model(inputs)
        loss = criterion(outputs,target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def test():
    model.eval()
    predictions=[]
    with torch.no_grad():
        for inputs in X_test_tensor:
            outputs = model(inputs).squeeze()
            predicted = torch.round(outputs).item()
            predictions.append(predicted)
    survived_count = sum(predictions)
    total_count = len(predictions)
    survival_rate = survived_count / total_count * 100
    print(f'Survival Rate: {survival_rate:.2f}%')
    for passenger_id, prediction in zip(test_df['PassengerId'], predictions):
        print('PassengerId: %d, Predicted: %d'% (passenger_id, int(prediction)))

for epoch in range(10):
    train(epoch)
test()


Survival Rate: 22.25%
PassengerId: 892, Predicted: 0
PassengerId: 893, Predicted: 0
PassengerId: 894, Predicted: 0
PassengerId: 895, Predicted: 0
PassengerId: 896, Predicted: 0
PassengerId: 897, Predicted: 0
PassengerId: 898, Predicted: 1
PassengerId: 899, Predicted: 0
PassengerId: 900, Predicted: 1
PassengerId: 901, Predicted: 0
PassengerId: 902, Predicted: 0
PassengerId: 903, Predicted: 0
PassengerId: 904, Predicted: 1
PassengerId: 905, Predicted: 0
PassengerId: 906, Predicted: 0
PassengerId: 907, Predicted: 1
PassengerId: 908, Predicted: 0
PassengerId: 909, Predicted: 0
PassengerId: 910, Predicted: 0
PassengerId: 911, Predicted: 0
PassengerId: 912, Predicted: 0
PassengerId: 913, Predicted: 0
PassengerId: 914, Predicted: 1
PassengerId: 915, Predicted: 0
PassengerId: 916, Predicted: 1
PassengerId: 917, Predicted: 0
PassengerId: 918, Predicted: 1
PassengerId: 919, Predicted: 0
PassengerId: 920, Predicted: 0
PassengerId: 921, Predicted: 0
PassengerId: 922, Predicted: 0
PassengerId: 923,

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  train_df['Age'].fillna(train_df['Age'].mean(),inplace = True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  train_df['Embarked'].fillna(train_df['Embarked'].mode()[0],inplace = True)
The behavior will change in pandas 3.0. This inplace method will never work because the interme