# Logistic Regression

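Despite its name, logistic regression is used for classification, not regression. The main idea is to fit a regression formula to the classification boundary using the existing data, and then classify samples with it.
For details, see 逻辑回归.docx.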
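
The idea can be sketched in a few lines. This is only an illustration: the weights `w`, the bias `b`, and the sample points are made-up values, not parameters learned from the data in this notebook. A linear score is squashed into a probability by the sigmoid function, and thresholding that probability at 0.5 is the same as checking which side of the line `w·x + b = 0` a point falls on.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, chosen by hand for illustration only
w = np.array([1.0, -0.5])
b = 2.0

# Three made-up points, one per row
points = np.array([[0.0, 0.0], [0.0, 10.0], [1.0, 6.0]])

scores = points @ w + b             # linear score w.x + b for each point
probs = sigmoid(scores)             # probability of class 1
labels = (probs > 0.5).astype(int)  # class 1 only when the score is positive

print(scores)
print(labels)
```

A point lying exactly on the boundary has score 0 and probability 0.5, so with a strict `> 0.5` threshold it is assigned to class 0.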

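Main steps:
- Initialize the model parameters
- Learn the parameters by minimizing the loss function
- Make predictions with the learned parameters
- Analyze the results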

In [19]:
import numpy as np
import matplotlib.pyplot as plt

### Step 1: Prepare the model data

In [27]:

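def create_data_set():
    """
    Create the sample data.
    :return: X_train of shape (2, 12), Y_train of shape (1, 12),
             X_test of shape (2, 3), Y_test of shape (1, 3);
             each column is one sample
    """
    X_train = [[-0.017612, 14.053064], [-1.395634, 4.662541], [-0.752157, 6.538620], 
                [-1.322371, 7.152853], [0.423363, 11.054677],[0.406704,7.067335],
                [0.667394,12.741452],[-2.460150,6.866805],[0.569411,9.548755],
               [-1.693453,-0.557540],[1.985298,3.230619],[-1.78187,9.097953]]
    Y_train = [0,1,0,0,0,1,0,1,0,1,1,0]
    
    X_test = [[-0.346811,-1.678730], [-2.124484,2.672471], [1.217916,9.597015]]
    Y_test = [1,1,0]
    # Use plain arrays (mat was never imported) and put one sample per column,
    # the (n_features, m) layout that propagate() and predict() expect
    return (np.array(X_train).T, np.array(Y_train).reshape(1, -1),
            np.array(X_test).T, np.array(Y_test).reshape(1, -1))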

In [21]:
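def sigmoid(inX):
    # tanh is a rescaled sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, and is zero-centered.
    # Its range is (-1, 1), though, so it cannot serve as a probability in the
    # log terms of the cost; the plain sigmoid is used here.
    return 1.0 / (1 + np.exp(-inX))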
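
The relationship between the logistic sigmoid and tanh can be checked numerically. The sketch below uses its own helper name `logistic` (an illustrative name, not part of this notebook) and confirms that `2 * logistic(2z) - 1` equals `tanh(z)`, and that only the logistic output stays inside (0, 1), which is what the `log(A)` and `log(1 - A)` terms of the cross-entropy cost require.

```python
import numpy as np

def logistic(z):
    """Standard logistic sigmoid with range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)

# The rescaled sigmoid is exactly tanh
rescaled = 2.0 * logistic(2.0 * z) - 1.0
same_as_tanh = np.allclose(rescaled, np.tanh(z))

# Range check: logistic stays in (0, 1); the rescaled version goes negative
logistic_in_unit_interval = bool(np.all((logistic(z) > 0) & (logistic(z) < 1)))
rescaled_goes_negative = bool(rescaled.min() < 0)

print(same_as_tanh, logistic_in_unit_interval, rescaled_goes_negative)
```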

### Step 2: Initialize the parameters to zero

In [11]:
def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0
    return w, b

### Step 3: Loop to learn the parameters by gradient descent
    1. Take the input X
    2. Compute A = sigmoid(w.T*X + b) = (a1, a2, ..., am)
    3. Compute the loss function
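
Written out, the quantities computed in each iteration are roughly as follows. The notation matches the code in this section: $X$ has shape $(n, m)$ with one sample per column and $Y$ has shape $(1, m)$. The symbol $\alpha$ for the learning rate is notation added here for illustration.

```latex
\begin{aligned}
A &= \sigma(w^{T} X + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
J(w, b) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log a_i + (1 - y_i) \log (1 - a_i) \right] \\
\frac{\partial J}{\partial w} &= \frac{1}{m} X (A - Y)^{T}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a_i - y_i) \\
w &\leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
```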

#### 3.1 Define forward and backward propagation to get the cost and gradients

In [51]:

def propagate(w, b, X, Y):
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)  # compute activation, shape (1, m)
    # cross-entropy cost averaged over the m samples
    cost = -(1.0/m)*np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dz=A-Y
    dw = (1.0/m)*np.dot(X,dz.T)
    db = (1.0/m)*np.sum(dz)

    cost = np.squeeze(cost)
 
    grads = {"dw": dw,
             "db": db}
    return grads, cost
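
One way to gain confidence in gradient formulas like these is a finite-difference check. The sketch below is self-contained and does not call the notebook's functions: `cost_and_grads` is a hypothetical helper that recomputes the same cost and gradients, and the data are random values generated only for the check. It compares the analytic gradients against central differences of the cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Cross-entropy cost and analytic gradients; X is (n, m), Y is (1, m)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Random data used only for this check
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
Y = np.array([[0, 1, 0, 1, 1]], dtype=float)
w = rng.normal(size=(2, 1))
b = 0.3

_, dw, db = cost_and_grads(w, b, X, Y)

# Central differences: perturb one parameter at a time
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    c_plus = cost_and_grads(w + step, b, X, Y)[0]
    c_minus = cost_and_grads(w - step, b, X, Y)[0]
    dw_num[j, 0] = (c_plus - c_minus) / (2 * eps)
db_num = (cost_and_grads(w, b + eps, X, Y)[0]
          - cost_and_grads(w, b - eps, X, Y)[0]) / (2 * eps)

max_diff = max(np.max(np.abs(dw - dw_num)), abs(db - db_num))
print(max_diff)  # should be tiny if the analytic gradients are right
```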

#### 3.2 Use gradient descent to optimize w and b so that the cost is minimized


In [57]:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):

    costs = []

    for i in range(num_iterations):
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update rule
        w = w - learning_rate*dw
        b = b - learning_rate*db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
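
The same update loop can be watched on a tiny problem. The following is a minimal, self-contained sketch: the one-feature toy data and the learning rate of 0.1 are made up for illustration, and the loop is written inline instead of calling the functions in this notebook. It records the cost at every iteration so the decrease is visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data, made up for illustration: one feature, six samples, X is (1, m)
X = np.array([[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 1.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

w = np.zeros((1, 1))
b = 0.0
learning_rate = 0.1
costs = []

for i in range(200):
    A = sigmoid(np.dot(w.T, X) + b)
    costs.append(-np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m)
    dz = A - Y
    w = w - learning_rate * np.dot(X, dz.T) / m
    b = b - learning_rate * np.sum(dz) / m

# With zero parameters every probability is 0.5, so the first cost is log(2)
print(costs[0], costs[-1])
```

If the cost grows or oscillates on other data, a smaller learning rate is the first thing to try.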


### Step 4: Predict

In [14]:

def predict(w, b, X):
 
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" holding the predicted probability of class 1 for each sample
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):

        # Convert probabilities A[0,i] to actual predictions p[0,i]
        if A[0,i] > 0.5:
            Y_prediction[0,i] = 1
        else:
            Y_prediction[0,i] = 0


    return Y_prediction
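
The thresholding loop can also be written as a single vectorized comparison. The sketch below uses a made-up probability row `A` and checks that both forms give the same predictions.

```python
import numpy as np

# Made-up probabilities, shape (1, m)
A = np.array([[0.1, 0.5, 0.51, 0.99]])

# Loop version, same logic as the explicit if/else
Y_loop = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    Y_loop[0, i] = 1 if A[0, i] > 0.5 else 0

# Vectorized version: boolean mask converted to 0.0 / 1.0
Y_vec = (A > 0.5).astype(float)

print(Y_vec)  # [[0. 0. 1. 1.]]
```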

### Step 5: Combine all the functions into one model

In [15]:

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):

    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))


    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d

In [69]:
# m: number of samples, n: number of features
X_train, Y_train, X_test, Y_test = create_data_set()

model(X_train, Y_train, X_test, Y_test)

## Using scikit-learn

class sklearn.linear_model.LogisticRegression(penalty='l2', 
          dual=False, tol=0.0001, C=1.0, fit_intercept=True, 
          intercept_scaling=1, class_weight=None, 
          random_state=None, solver='liblinear', max_iter=100, 
          multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
          
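penalty='l2' : string, 'l1' or 'l2', default 'l2'. Specifies the norm used in the penalty (the regularization term).

dual=False : dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the 'liblinear' solver. When the number of samples is larger than the number of features, dual=False is usually preferred. Default False.

C=1.0 : the inverse of the regularization strength λ; must be a positive number, default 1. As with C in SVMs, smaller values mean stronger regularization.

fit_intercept=True : whether to include an intercept (bias) term. Default True.

intercept_scaling=1 : only useful when the solver is 'liblinear' and fit_intercept is set to True.

...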
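
The effect of `C` can be checked with a small experiment. The sketch below is illustrative only: it uses a synthetic dataset from `make_classification` (not the data in this notebook) and fits the default L2-penalized model at three values of `C`, recording the norm of the coefficient vector. Stronger regularization (smaller `C`) should shrink the coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, generated only for this illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(X, y)
    # L2 norm of the learned weights; smaller C should give a smaller norm
    norms[C] = float(np.linalg.norm(clf.coef_))
    print(C, norms[C])
```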

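Common methods of the LogisticRegression class
fit(X, y, sample_weight=None)
Fits the model, i.e. trains the LR classifier. X is the training samples and y is the corresponding label vector.
Returns the object itself (self).

fit_transform(X, y=None, **fit_params)
Combines fit and transform: fits first, then transforms. Returns X_new, a NumPy array. Note that current scikit-learn releases no longer provide this method on LogisticRegression.

predict(X)
Predicts, i.e. classifies, the samples; X is the test set. Returns an array.

predict_proba(X)
Outputs class probabilities. Returns the probability of each class, ordered by class label. For a multiclass problem with multi_class="multinomial", it gives the probability of the sample for each class.
Returns an array-like.

score(X, y, sample_weight=None)
Returns the mean accuracy on the given test set as a float.
In multi-label classification this is the subset accuracy, a harsh metric because the whole label set of each sample must be predicted correctly.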
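
How these methods relate to each other can be sketched on a small synthetic dataset (generated with `make_classification`, purely for illustration): each row of `predict_proba` sums to 1, `predict` returns the class with the highest probability, and `score` is the accuracy of `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic binary classification data, used only for this illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

pred = clf.predict(X)         # class labels, shape (200,)
proba = clf.predict_proba(X)  # one column per class, shape (200, 2)

# Each row is a probability distribution over the classes
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# Columns follow the order of clf.classes_
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# score() is the mean accuracy of predict()
acc = clf.score(X, y)
print(proba.shape, rows_sum_to_one, acc)
```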

In [59]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [60]:
cancer = load_breast_cancer()
print(cancer.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, f

In [63]:
X = cancer.data
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [64]:
model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)



0.956140350877193

In [65]:
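# 'l1' needs a solver that supports it; 'liblinear' is the default in the signature shown above
model2 = LogisticRegression(penalty='l1', solver='liblinear')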
model2.fit(X_train, y_train)
model2.score(X_test, y_test)



0.956140350877193
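
Although the two penalties can give similar accuracy, they tend to produce different coefficient vectors: an L1 penalty typically drives some coefficients exactly to zero. The sketch below is a self-contained illustration that repeats the same split and counts zero coefficients for both penalties. The `liblinear` solver is set explicitly here, and the variable names `l1` and `l2` are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Same data and solver, only the penalty differs
l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X_train, y_train)
l2 = LogisticRegression(penalty='l2', solver='liblinear').fit(X_train, y_train)

# Count coefficients that are exactly zero (30 features in total)
zeros_l1 = int(np.sum(l1.coef_ == 0))
zeros_l2 = int(np.sum(l2.coef_ == 0))
print("zero coefficients with l1:", zeros_l1)
print("zero coefficients with l2:", zeros_l2)
```

Features whose coefficient is zero do not contribute to the prediction, which is why L1 regularization is often used as a rough form of feature selection.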

In [67]:
prepro = model2.predict_proba(X_test)
print(prepro)

[[9.93780491e-01 6.21950927e-03]
 [2.69531453e-02 9.73046855e-01]
 [1.33340841e-03 9.98666592e-01]
 [1.48258938e-01 8.51741062e-01]
 [5.37251923e-05 9.99946275e-01]
 [2.24632247e-03 9.97753678e-01]
 [6.00957992e-03 9.93990420e-01]
 [9.94317364e-04 9.99005683e-01]
 [3.11987989e-02 9.68801201e-01]
 [1.30612216e-04 9.99869388e-01]
 [3.50587072e-01 6.49412928e-01]
 [1.40108350e-01 8.59891650e-01]
 [3.17754589e-03 9.96822454e-01]
 [7.47538261e-01 2.52461739e-01]
 [1.58297053e-01 8.41702947e-01]
 [9.94214371e-01 5.78562895e-03]
 [1.92716007e-02 9.80728399e-01]
 [9.99999999e-01 5.83426795e-10]
 [9.99043138e-01 9.56861542e-04]
 [1.00000000e+00 5.83130355e-13]
 [9.99978985e-01 2.10146155e-05]
 [9.30516928e-01 6.94830722e-02]
 [1.08558860e-03 9.98914411e-01]
 [8.26679714e-03 9.91733203e-01]
 [9.95566755e-01 4.43324490e-03]
 [7.17184114e-03 9.92828159e-01]
 [1.11211654e-03 9.98887883e-01]
 [8.12148287e-01 1.87851713e-01]
 [2.39105256e-03 9.97608947e-01]
 [1.00000000e+00 2.29600568e-11]
 [2.202765