# Logistic Regression

Hypothesis function: $h(x)=\frac{1}{1+e^{-X\theta}}=p(y=1 \mid X,\theta)=p$

Parameters: $\theta=(\theta_0,\dots,\theta_n)^{'}$, where $'$ denotes the transpose

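Likelihood function: $L(\theta)=\prod_{i=1}^{m}\left(p^{(i)}\right)^{y^{(i)}}\left(1-p^{(i)}\right)^{1-y^{(i)}}$, where $p^{(i)}=h(x^{(i)})$ is the predicted probability for sample $i$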

Cost function: $J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln h(x^{(i)})+(1-y^{(i)})\ln\left(1-h(x^{(i)})\right)\right]$
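
The cost function follows from the likelihood by taking logarithms. As a short sketch of that step, writing $p^{(i)}=h(x^{(i)})$ for the predicted probability of sample $i$ (notation used here only for readability):

$$
\ln L(\theta)=\sum_{i=1}^{m}\left[y^{(i)}\ln p^{(i)}+(1-y^{(i)})\ln\left(1-p^{(i)}\right)\right]
$$

$$
J(\theta)=-\frac{1}{m}\ln L(\theta)
$$

Because $\ln$ is increasing and $-\frac{1}{m}$ is a negative constant, maximizing $L(\theta)$ is equivalent to minimizing $J(\theta)$.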

Gradient: $\nabla J(\theta)=\frac{1}{m}X^{'}(h(x)-y)$
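
The formulas above can be sketched directly in NumPy. This is a minimal illustration, not the solver scikit-learn uses: the function names (`sigmoid`, `cost`, `gradient`, `fit_gradient_descent`), the learning rate, the iteration count, and the synthetic data are all assumptions made for this example, and plain gradient descent is swapped in for simplicity.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta); X must already contain a column of ones."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # keeps log() finite if h reaches exactly 0 or 1
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient(theta, X, y):
    """Gradient of J: (1/m) * X' (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def fit_gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Plain batch gradient descent starting from theta = 0 (illustrative settings)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta -= lr * gradient(theta, X, y)
    return theta

# Synthetic two-feature data, generated only for this example.
rng = np.random.default_rng(0)
m = 200
features = rng.normal(size=(m, 2))
X = np.column_stack([np.ones(m), features])  # column of ones for theta_0
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(m) < sigmoid(X @ true_theta)).astype(float)

theta_hat = fit_gradient_descent(X, y)
print("cost at theta = 0:  ", cost(np.zeros(3), X, y))
print("cost after fitting: ", cost(theta_hat, X, y))
```

At $\theta=0$ every prediction is 0.5, so the starting cost is $\ln 2$; the fitted cost should be lower. Comparing `gradient` against finite differences of `cost` is a quick way to check that the two formulas agree.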


In [1]:
import pandas as pd

from sklearn.linear_model import LogisticRegression

## Data

In [2]:
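# The Colab sample MNIST CSVs are assumed to have no header row
# (column 0 is the digit label, the remaining columns are pixels), hence header=None.
train = pd.read_csv('/content/sample_data/mnist_train_small.csv', header=None)
test = pd.read_csv('/content/sample_data/mnist_test.csv', header=None)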

In [3]:
# Keep only the digits 0 and 1 (label in column 0) to get a binary problem.
train_bin_cat = train[train.iloc[:, 0].isin([0, 1])]
test_bin_cat = test[test.iloc[:, 0].isin([0, 1])]

In [4]:
# Split into pixel features (columns 1 onward) and labels (column 0).
X_train, y_train = train_bin_cat.iloc[:, 1:], train_bin_cat.iloc[:, 0]
X_test, y_test = test_bin_cat.iloc[:, 1:], test_bin_cat.iloc[:, 0]
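
To make the filtering and splitting steps concrete, here is a small sketch on a hand-made frame with the same layout (label in column 0, features after it). The frame `toy` and its values are invented for illustration only.

```python
import pandas as pd

# Four fake samples: label in column 0, two "pixel" columns after it.
toy = pd.DataFrame([[0, 10, 11],
                    [7, 20, 21],
                    [1, 30, 31],
                    [3, 40, 41]])

# Boolean mask: True where the label is 0 or 1.
toy_bin = toy[toy.iloc[:, 0].isin([0, 1])]

# Features are every column except the first; labels are the first column.
X_toy, y_toy = toy_bin.iloc[:, 1:], toy_bin.iloc[:, 0]
print(X_toy.shape, y_toy.tolist())
```

Only the rows labelled 0 and 1 survive, and the original row index is preserved, which is harmless here because scikit-learn ignores the index.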

## Model

In [5]:
# Default settings: L2 penalty with C=1.0, lbfgs solver, max_iter=100 (see the repr below).
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
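
The fitted model connects back to the hypothesis function: for a binary problem with default settings, the probability of class 1 should equal the sigmoid of the linear score built from `coef_` and `intercept_`. The sketch below checks this on synthetic data from `make_classification`; the names `X_demo`, `y_demo`, and `demo_model` are introduced here for illustration. Note that the default estimator adds an L2 penalty (`penalty='l2'`, `C=1.0`), so it minimizes a regularized version of $J(\theta)$ rather than $J(\theta)$ itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, used only to keep the example self-contained.
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
demo_model = LogisticRegression().fit(X_demo, y_demo)

# Linear score X * theta, with the intercept playing the role of theta_0.
z = X_demo @ demo_model.coef_.ravel() + demo_model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is p(y = 1 | x).
p_sklearn = demo_model.predict_proba(X_demo)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```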

In [6]:
# Mean accuracy on the held-out 0/1 test digits.
model.score(X_test, y_test)

0.9990543735224586
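
For a classifier, `score` returns the mean accuracy, that is, the fraction of test samples whose predicted label matches the true label. A minimal sketch on synthetic data (the names `X_demo`, `demo_model`, and `manual_accuracy` are assumptions for this example, not part of the notebook above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data with a held-out split, for illustration only.
X_demo, y_demo = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

demo_model = LogisticRegression().fit(X_tr, y_tr)

# Accuracy computed by hand: share of correct predictions on the test split.
manual_accuracy = np.mean(demo_model.predict(X_te) == y_te)
print(manual_accuracy, demo_model.score(X_te, y_te))
```

The two printed numbers should agree. Accuracy alone can hide class imbalance, so a confusion matrix is a reasonable follow-up check.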