In [1]:
import torch
import torch.nn as nn
import numpy as np
torch.__version__

'1.0.0'

# 3.1 Logistic regression in practice
In this chapter, we work with structured data and use logistic regression to perform a simple classification on it.
## 3.1.1 Introduction to logistic regression
Logistic regression is a generalized linear model and has much in common with multiple linear regression. Both models are built on the linear form wx + b, where w and b are the parameters to be learned. The difference lies in the dependent variable: multiple linear regression uses wx + b directly as the output, that is, y = wx + b, whereas logistic regression passes wx + b through a function L to obtain a hidden state p = L(wx + b), and then decides the value of the dependent variable by comparing p with 1 - p. When L is the logistic function, the model is logistic regression; other choices of L give other generalized linear models.

Put simply, logistic regression applies a logistic function to the output of a linear regression.

Logistic regression is mainly used for binary classification. We covered the sigmoid function in the section on activation functions; it is the most common logistic function. Because its output is a probability value between 0 and 1, a sample is predicted as class 1 when the probability is greater than 0.5 and as class 0 when it is less than 0.5.
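
The prediction rule described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `sigmoid` and `predict`, and the weights, features, and bias passed to them, are made up for the example and do not come from the data set used later.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number z to a value in (0, 1)."""
    # Not hardened against overflow for very large negative z
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, b, threshold=0.5):
    """Return (p, label) for weights w, features x, and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear part: wx + b
    p = sigmoid(z)                                # hidden state p = L(wx + b)
    return p, int(p > threshold)                  # class 1 if p > 0.5, else class 0

# Illustrative values only: z = 0.8*1.0 - 0.4*2.0 + 0.1 = 0.1
p, label = predict(w=[0.8, -0.4], x=[1.0, 2.0], b=0.1)
print(round(p, 3), label)  # 0.525 1
```

Comparing p with 0.5 is the same as comparing p with 1 - p, which is the decision rule stated in the introduction.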

Let's use a public data set to illustrate.

## 3.1.2 UCI German Credit Data Set

UCI German Credit is the German credit data set from the UCI repository. It is available as both original data and numeric data.

The German Credit data set is used to predict a customer's tendency to default on a loan, based on personal bank loan information and overdue loan applications. It contains 1000 samples, each with 24 features.

Here we use the preprocessed numeric data directly for demonstration.

[Address](https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/)

## 3.2 Hands-on code
The german.data-numeric file we use here is the version already converted to numeric values, so we can read it directly with NumPy's loadtxt method.

In [2]:
data=np.loadtxt("german.data-numeric")

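After reading the data, we need to normalize it. Each feature column is standardized to zero mean and unit standard deviation; the label column is left unchanged.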

In [3]:
n,l=data.shape
for j in range(l-1):
    meanVal=np.mean(data[:,j])
    stdVal=np.std(data[:,j])
    data[:,j]=(data[:,j]-meanVal)/stdVal
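
The column-by-column loop above can also be written as one vectorized NumPy expression. The sketch below runs on a small synthetic array rather than the real file; the names `demo`, `features`, and `demo_std` are placeholders chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real file: 10 rows, 3 feature columns, 1 label column (1 or 2)
demo = np.hstack([rng.normal(5.0, 2.0, size=(10, 3)),
                  rng.integers(1, 3, size=(10, 1))])

features = demo[:, :-1]
# Broadcasting subtracts each column's mean and divides by each column's standard deviation
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# Reattach the untouched label column
demo_std = np.hstack([standardized, demo[:, -1:]])
```

After this step every feature column has mean 0 and standard deviation 1, which is what the loop produces as well.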

Shuffle the data

In [4]:
np.random.shuffle(data)
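
Shuffling permutes whole rows in place, so each sample keeps its own features and label. A minimal sketch with a seeded generator, assuming reproducible runs are wanted; the seed 42 and the toy array are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)     # fixed seed so the same order is produced on every run
toy = np.arange(12).reshape(6, 2)   # 6 rows; row k is [2k, 2k+1]

shuffled = toy.copy()
rng.shuffle(shuffled)               # permutes rows in place along the first axis

# Each row is still an intact [2k, 2k+1] pair; only the row order has changed
print(shuffled)
```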

Split the data into a training set and a test set. Since there is no validation set here, we use accuracy on the test set directly as the measure of model quality.

Split rule: 900 samples for training and 100 for testing.

In german.data-numeric, the first 24 columns are the 24 features and the last column is the label. The label is stored as 1 or 2, so we subtract 1 to obtain class indices 0 and 1 while separating the features from the labels.

In [5]:
train_data=data[:900,:l-1]
train_lab=data[:900,l-1]-1
test_data=data[900:,:l-1]
test_lab=data[900:,l-1]-1
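
To check that the slicing does what is intended, the same split can be wrapped in a small function and run on random data of the same shape. The helper `split_data` and the array `fake` are hypothetical names used only in this sketch.

```python
import numpy as np

def split_data(data, n_train):
    """Hypothetical helper: split rows into train/test and columns into features/labels."""
    features = data[:, :-1]
    labels = data[:, -1] - 1   # labels stored as 1/2 become class indices 0/1
    return features[:n_train], labels[:n_train], features[n_train:], labels[n_train:]

rng = np.random.default_rng(0)
# Random stand-in with the same layout as the real data: 1000 rows, 24 features, 1 label
fake = np.hstack([rng.normal(size=(1000, 24)),
                  rng.integers(1, 3, size=(1000, 1))])

train_x, train_y, test_x, test_y = split_data(fake, 900)
print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)
# (900, 24) (900,) (100, 24) (100,)
```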

Next we define the model, which is very simple.

In [6]:
class LR(nn.Module):
    def __init__(self):
        super(LR,self).__init__()
        self.fc=nn.Linear(24,2) # 24 input features, 2 output classes
    def forward(self,x):
        out=self.fc(x)
        out=torch.sigmoid(out)
        return out
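
The model above applies `torch.sigmoid` to its output, and that output is later passed to `nn.CrossEntropyLoss`, which applies a log-softmax to its input on its own. A commonly used alternative is to return the raw scores (logits) and let the loss handle the normalization. The sketch below shows that variant; the class name `LRLogits` and its constructor arguments are hypothetical, and this is not the model that produced the results in this section, so its loss values would differ.

```python
import torch
import torch.nn as nn

class LRLogits(nn.Module):
    """Hypothetical variant of LR that returns raw scores (logits)."""
    def __init__(self, in_features=24, n_classes=2):
        super().__init__()
        self.fc = nn.Linear(in_features, n_classes)

    def forward(self, x):
        # No sigmoid here: nn.CrossEntropyLoss applies log-softmax internally
        return self.fc(x)

model = LRLogits()
x = torch.randn(4, 24)                   # random stand-in for 4 samples
logits = model(x)
probs = torch.softmax(logits, dim=-1)    # convert logits to class probabilities when needed
```

The predicted class is the same whether it is taken from `logits` or from `probs`, because softmax preserves the ordering of the scores.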


A function to compute accuracy on the test set

In [7]:
def test(pred,lab):
    t=pred.max(-1)[1]==lab
    return torch.mean(t.float())
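
In the function above, `pred.max(-1)[1]` is the index of the largest score in each row, that is, the predicted class. An equivalent sketch using `argmax`, run on a tiny hand-made example; the name `accuracy` and the tensors `scores` and `labels` are chosen for illustration.

```python
import torch

def accuracy(pred, lab):
    # argmax over the last dimension picks the predicted class for each row
    return (pred.argmax(dim=-1) == lab).float().mean()

scores = torch.tensor([[0.9, 0.1],   # predicted class 0
                       [0.2, 0.8],   # predicted class 1
                       [0.6, 0.4],   # predicted class 0
                       [0.3, 0.7]])  # predicted class 1
labels = torch.tensor([0, 1, 1, 1])

print(accuracy(scores, labels).item())  # 0.75, since 3 of 4 predictions match
```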

Some settings for training

In [8]:
net=LR()
criterion=nn.CrossEntropyLoss() # Cross-entropy loss
optm=torch.optim.Adam(net.parameters()) # Adam optimizer with default settings
epochs=1000 # Train for 1000 epochs
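
`nn.CrossEntropyLoss` expects a float tensor of class scores with shape (N, C) and a long tensor of class indices with shape (N,), which is why the labels are converted with `.long()` during training. The sketch below checks the loss against a hand computation on two made-up samples; the values in `scores` and `target` are illustrative only.

```python
import torch
import torch.nn as nn

scores = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2]])   # shape (N, C): one row of class scores per sample
target = torch.tensor([0, 1])         # shape (N,): integer class indices (dtype long)

loss = nn.CrossEntropyLoss()(scores, target)

# Same value by hand: mean negative log-softmax score of the target class
log_probs = torch.log_softmax(scores, dim=-1)
manual = -log_probs[torch.arange(2), target].mean()

print(torch.allclose(loss, manual))  # True
```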


Let’s start training

In [9]:
for i in range(epochs):
    # Put the model in training mode
    net.train()
    # Inputs need to be converted to torch Tensors
    x=torch.from_numpy(train_data).float()
    y=torch.from_numpy(train_lab).long()
    y_hat=net(x)
    loss=criterion(y_hat,y) # Compute the loss
    optm.zero_grad() # Clear the gradients from the previous step
    loss.backward() # Backpropagation
    optm.step() # Update the parameters
    if (i+1)%100 == 0: # Print progress every 100 epochs
        # Put the model in evaluation mode
        net.eval()
        test_in=torch.from_numpy(test_data).float()
        test_l=torch.from_numpy(test_lab).long()
        test_out=net(test_in)
        # Use our test function to compute accuracy
        accu=test(test_out,test_l)
        print("Epoch:{},Loss:{:.4f},Accuracy:{:.2f}".format(i+1,loss.item(),accu))

Epoch:100,Loss:0.6313,Accuracy:0.76
Epoch:200,Loss:0.6065,Accuracy:0.79
Epoch:300,Loss:0.5909,Accuracy:0.80
Epoch:400,Loss:0.5801,Accuracy:0.81
Epoch:500,Loss:0.5720,Accuracy:0.82
Epoch:600,Loss:0.5657,Accuracy:0.81
Epoch:700,Loss:0.5606,Accuracy:0.81
Epoch:800,Loss:0.5563,Accuracy:0.81
Epoch:900,Loss:0.5527,Accuracy:0.81
Epoch:1000,Loss:0.5496,Accuracy:0.80
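
During evaluation no gradients are needed, so the forward pass on the test set can be wrapped in `torch.no_grad()` to avoid building the autograd graph. A minimal sketch, assuming random stand-in data and a bare `nn.Linear` layer in place of the trained model; the accuracy it computes is therefore meaningless and is only there to show the pattern.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(24, 2)           # stand-in for the trained model
x = torch.randn(100, 24)           # random stand-in for test_data
y = torch.randint(0, 2, (100,))    # random stand-in for test_lab

model.eval()
with torch.no_grad():              # no autograd graph is built inside this block
    out = model(x)
    acc = (out.argmax(dim=-1) == y).float().mean().item()

print(out.requires_grad)  # False
```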


Training is complete, and accuracy on the test set reaches about 80%.