# 线性回归(Linear regression)
机器学习三要素, 模型 + 策略 + 算法;
## 模型
$x$ 表示一个输入样本的特征向量, 假设共有 $n$ 个 特征, 机器学习里我们默认向量是列向量, 这里就是一个 $n$ 维列向量; $w_i$ 就表示每一个特征 $x_i$ 对应的权重,构成一个权重列向量$w$; $b$ 表示偏置量;

标量公式:
$$\hat{y}_t = \sum_i{w_i*x_i} + b \qquad (1)$$

向量公式:
$$\hat{y}_t = w^Tx + b \qquad (2)$$


## 策略(损失函数)
$y^t$ 表示第 t 个样本标签值, $\hat{y}^t$ 是模型输出的预测值. 假设共有 m 个样本.由于房价预测这种预测值是实数空间,属于是回归问题,因此这里的损失函数选择常用的均方值误差MSE(Mean Squared Error);

标量公式:
$$L = \frac{1}{2m}\sum_t{(\hat{y}^t - y^t)^2} \qquad (3)$$

向量公式:
$$L = \frac{1}{2m}(\hat{y}-y)^T(\hat{y}-y) \qquad (4)$$

$$
\begin{equation}
y
=
\begin{bmatrix}
   y^1 \\
   y^2 \\
   \vdots \\
   y^t \\
   \vdots \\
   y^m
\end{bmatrix} \quad
\hat{y}
=
\begin{bmatrix}
   \hat{y}^1 \\
   \hat{y}^2 \\
   \vdots \\
   \hat{y}^t \\
   \vdots \\
   \hat{y}^m
\end{bmatrix}
\end{equation}
$$

多样本堆叠公式:
$$L = \frac{1}{2m}(Xw+B-Y)^T(Xw+B-Y) ,Y=y\qquad (5)$$

$$
\begin{equation}
Xw+B
=
\begin{bmatrix}
   x^1_1 & x^1_2 & \cdots & x^1_i & \cdots & x^1_n \\
   x^2_1 & x^2_2 & \cdots & x^2_i & \cdots & x^2_n \\
   \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
   x^m_1 & x^m_2 & \cdots & x^m_i & \cdots & x^m_n
\end{bmatrix}
\begin{bmatrix}
   w_1 \\
   w_2 \\
   \vdots \\
   w_i \\
   \vdots \\
   w_n
\end{bmatrix}
+
\begin{bmatrix}
   b \\
   b \\
   \vdots \\
   b \\
   \vdots \\
   b
\end{bmatrix}
\end{equation}
$$

## 算法(参数学习方法)
我们的学习目标就是让损失函数最小, 而参数 w 和 b 是未知的, 就是求出使得 L 最小的 w 和 b. 我们知道在微积分中求解一个凸函数的最值一般就是在导数为 0 的地方. 而要找到导数为 0 的点, 通常我们数学上直接解方程式子就可以求出来了. 另一种方式就是梯度下降法, 这是遵循数学上沿着梯度方向下降最快的定理. 而梯度就是我们的导数了. 其实对于线性回归问题, 对于$X^TX$ 是可逆矩阵的时候,是可以直接性解出来的,不需要进行训练, 其中的 $X^T$ 表示所有样本列向量 $x^t$ 在水平方向堆叠起来的矩阵. 我们采用 BGD 进行训练. 

梯度:
$$d_{w_i} = \frac{1}{m}\sum_t{(\hat{y^t}-y^t)x_i} \qquad (5)$$

向量公式:
$$d_{w} = $$


In [20]:
import math

def sigmoid(z):
    return 1.0 / (1 + math.exp(-z))

def tanh(z):
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    return max(0, z)

def leaky_relu(rate, z):
    return max(rate*z, z)
print(sigmoid(0), tanh(0), relu(0), leaky_relu(0.01, 0))

def d_sigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))

def d_tanh(z):
    return 1 - tanh(z)**2

def d_relu(z):
    return 0 if z <= 0 else 1
    
def d_leaky_relu(rate, z):
    return rate if z <= 0 else 1

(0.5, 0.0, 0, 0.0)


# 逻辑斯特回归(Logistic Regression)
逻辑斯特回归其实是分类算法而不是回归算法, 他和 Linear Regression 的区别就是, 在做完计算之后, 再套一层激活函数,比如 sigmoid. 

$$\sigma(z) = \frac{1}{1+e^{-z}}$$
![LR](./lr/lr-overview.png)

## 模型
$x$ 表示一个输入样本的特征向量, 假设共有 $n$ 个 特征, 机器学习里我们默认向量是列向量, 这里 $x$ 就是一个 $n$ 维列向量; $w_i$ 就表示每一个特征 $x_i$ 对应的权重,构成一个权重列向量$w$; $b$ 表示偏置量;$x_i^t$ 表示第 t 个样本的 i 分量;$a$ 表示输出值, 然后通过和阈值比较来判断属于哪一类, 一般是二分类.

标量公式:
$$z^t = \sum_i{w_i*x_i^t} + b \qquad (1)$$
$$a^t = \sigma(z^t)$$


向量公式:
$$z^t = w^Tx^t + b \qquad (2)$$
$$a^t = \sigma(z^t)$$

前向计算(forward compute):

In [21]:
import random
import math
m = 5
n = 3
b = 0.01
L = .0
z = [0]*m
a = [0]*m
w = [random.uniform(-1, 1) for _ in range(n)]
print(w)
x = [[1,2,3],
     [4,5,6],
     [7,8,9],
     [10,11,12],
     [13,14,15]]
y = [random.randint(0, 1) for _ in range(m)]
print(y)
def dot(u, v):
    n_u = len(u)
    n_v = len(v)
    assert n_u == n_v
    sum = 0
    for i in range(n_u):
        sum += u[i] * v[i]
    return sum
# 使用了双重的 python 循环
for t in range(m):
    z[t] = dot(w, x[t]) + b
    a[t] = sigmoid(z[t])
    L += -(y[t]*math.log(a[t]) + (1-y[t])*math.log(1-a[t]))
L/=m
print(L)

[-0.5252302562023774, 0.3339674909721524, 0.2687073716893833]
[1, 1, 0, 0, 0]
1.21776180878


In [22]:
import numpy as np
m = 5
n = 3
b = 0.01
L = .0
z = [0]*m
a = [0]*m
w = np.asarray(w).reshape((n,1))
assert w.shape == (n, 1)
x = np.asmatrix(x).T
assert x.shape == (n, m)
# 这种是 python 循环和 numpy 混用的垃圾代码
for t in range(m):
    z[t] = np.dot(w.T, x[:,t]) + b
    a[t] = sigmoid(z[t])
    L += -(y[t]*np.log(a[t]) + (1-y[t])*np.log(1-a[t]))
L /= m
print(L)

1.2177618087849436


In [23]:
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return max(0, z)

def leaky_relu(rate, z):
    return max(rate*z, z)

def d_sigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))

def d_tanh(z):
    return 1 - tanh(z)**2

def d_relu(z):
    dz = np.ones_like(z)
    dz[z<=0] = 0
    return dz
    
def d_leaky_relu(rate, z):
    dz = np.ones_like(z)
    dz[z<=0] = rate
    return dz

z = np.random.randn(3,4)
print z
d_relu(z)
d_leaky_relu(-2, z)

[[-0.83914449 -0.79119105 -1.17432944 -0.6069215 ]
 [-1.02534441 -0.0844406   0.93462087  1.30501676]
 [-0.34931739  1.22889803  0.25084471 -0.02991602]]


array([[-2., -2., -2., -2.],
       [-2., -2.,  1.,  1.],
       [-2.,  1.,  1., -2.]])

In [18]:
import numpy as np
m = 5
n = 3
B = 0.01
L = .0
x = [[1,2,3],
     [4,5,6],
     [7,8,9],
     [10,11,12],
     [13,14,15]]
Z = np.zeros((1,m))
A = np.zeros((1,m))
# mu, sigma = 0, 0.1
# w = np.random.normal(mu, sigma, n)
W = np.asarray(w).reshape((n,1))
assert W.shape == (n, 1)
print(W)
X = np.asmatrix(x).T
assert X.shape == (n, m)
print(X)
Y = np.asarray(y).reshape((m, 1))
assert Y.shape == (m, 1)
print(Y)

# 三行numpy 代码即可搞定前向传播 的numpy 优雅代码
Z = np.dot(W.T, X) + B
A = sigmoid(Z)
func = sigmoid
func.__name__
L = -(np.dot(np.log(A), Y) + np.dot(np.log(1-A), 1-Y))
L /= m
print(L)

[[-0.4133842 ]
 [-0.51969963]
 [ 0.76912475]]
[[ 1  4  7 10 13]
 [ 2  5  8 11 14]
 [ 3  6  9 12 15]]
[[0]
 [0]
 [0]
 [0]
 [0]]
[[0.69368283]]


## 策略(损失函数)
$y^t$ 表示第 t 个样本标签值, $a^t$ 是模型输出的预测值. 这里是分类问题,因此我们选择比较常用的交叉熵损失,也是极大似然估计损失; 注意,损失函数都是针对所有样本的评估;

标量公式:
$$L = -\frac{1}{m}\sum_t{y^t\log{a^t} + (1-y^t)\log(1-a^t)} \qquad (3)$$


## 算法(参数学习方法)
我们的学习目标就是让损失函数最小, 而参数 w 和 b 是未知的, 就是求出使得 L 最小的 w 和 b. 我们知道在微积分中求解一个凸函数的最值一般就是在导数为 0 的地方. 而要找到导数为 0 的点, 通常我们数学上采用梯度下降法, 这是遵循数学上沿着梯度方向下降最快的定理. 而梯度就是我们的导数了.这里我们采用 BGD 进行训练. 

梯度:

标量公式
$$
\begin{align*}
d_{w_i} &= \frac{1}{m}\sum_t{-(\frac{\partial L}{\partial a^t}\frac{\partial a^t}{\partial z^t}\frac{\partial z^t}{\partial w_i})} & \\
&= \frac{1}{m}\sum_t{-(\frac{y^t}{a^t} - \frac{1-y^t}{1-a^t})}a^t(1-a^t)x_i^t & \\
&= \frac{1}{m}\sum_t(a^t-y^t)x_i^t \qquad (5)
\end{align*}
$$

$$
\begin{align*}
d_{b} &= \frac{1}{m}\sum_t{-(\frac{\partial L}{\partial a^t}\frac{\partial a^t}{\partial z^t}\frac{\partial z^t}{\partial b})} & \\
&= \frac{1}{m}\sum_t{-(\frac{y^t}{a^t} - \frac{1-y^t}{1-a^t})}a^t(1-a^t) & \\
&= \frac{1}{m}\sum_t(a^t-y^t) \qquad (6)
\end{align*}
$$



In [9]:
import random
import math
m = 5
n = 3
b = 0.01
L = .0
z = [0]*m
a = [0]*m
d_w = [0]*n
d_b = random.uniform(0,1)
print(w)
x = [[1,2,3],
     [4,5,6],
     [7,8,9],
     [10,11,12],
     [13,14,15]]
print(y)
def dot(u, v):
    n_u = len(u)
    n_v = len(v)
    assert n_u == n_v
    sum = 0
    for i in range(n_u):
        sum += u[i] * v[i]
    return sum
# 使用了双重的 python 循环
for t in range(m):
    z[t] = dot(w, x[t]) + b
    a[t] = sigmoid(z[t])
    L += -(y[t]*math.log(a[t]) + (1-y[t])*math.log(1-a[t]))
    for i in range(n):
        d_w[i] += (a[t]-y[t])*x[t][i]
    d_b += (a[t]-y[t])
L /= m
for i in range(n):
    d_w[i] /= m
d_b /= m
print(L)
print(d_w)
print(d_b)

[[ 0.55695896]
 [-0.74966562]
 [ 0.31394363]]
[1, 1, 0, 0, 0]
1.08007605269212
[array([ 4.13352667]), array([ 4.40021933]), array([ 4.666912])]
[ 0.2884678]


向量公式:
$$
\begin{align*}
d_{w} &= \frac{1}{m}\sum_t{-(\frac{\partial L}{\partial a^t}\frac{\partial a^t}{\partial z^t}\frac{\partial z^t}{\partial w})} & \\
&= \frac{1}{m}\sum_t{-(\frac{y^t}{a^t} - \frac{1-y^t}{1-a^t})}a^t(1-a^t)x^t & \\
&= \frac{1}{m}\sum_t(a^t-y^t)x^t \qquad (7)
\end{align*}
$$

## 多样本的矩阵表示公式

### 模型
多样本堆叠公式:
$$
Z = W^TX + B \qquad (1)
$$

$$
\begin{equation}
W^TX+B
=
\begin{bmatrix}
   w_1 & w_2 & \cdots & w_n
\end{bmatrix}
\begin{bmatrix}
   x^1_1 & x^2_1 & \cdots & x^i_1 & \cdots & x^m_1 \\
   x^1_2 & x^2_2 & \cdots & x^i_2 & \cdots & x^m_2 \\
   \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
   x^1_n & x^2_n & \cdots & x^i_n & \cdots & x^m_n
\end{bmatrix}
+
\begin{bmatrix}
   b & b & \cdots & b
\end{bmatrix}
\end{equation}
$$

$$
\begin{align*}
A &=
\begin{bmatrix}
  a^1 & a^2 & \cdots & a^m  
\end{bmatrix} & \\
&=
\begin{bmatrix}
  \sigma(z^1) & \sigma(z^2) & \cdots & \sigma(z^m) 
\end{bmatrix} & \\
&=
\sigma(Z)
\end{align*}
$$

$X$ 是由 m 个样本输入列向量 $x^i$ 水平堆叠得到的矩阵. 注意这里的 $Z$ $A$ $B$ 都是行向量的.

使用向量堆叠, 现在的前向传播只需要一行代码,完全不需要显式的循环;

### 策略
多样本堆叠矩阵公式:
$$
L = -\frac{1}{m}{\log(A)*Y+ \log(1-A)*(1-Y)} \qquad (2)
$$

$$
\begin{equation}
Y
=
\begin{bmatrix}
   y^1 \\
   y^2 \\
   \vdots \\
   y^t \\
   \vdots \\
   y^m
\end{bmatrix} \quad
A
=
\begin{bmatrix}
   a^1 & a^2 & \cdots & a^t & \cdots a^m
\end{bmatrix}
\end{equation}
$$

### 算法(梯度更新)
多样本堆叠矩阵公式:

$$
\begin{align*}
d_{a} &= \frac{\partial L}{\partial A} & \\
&= -\frac{1}{m}(\frac{Y^T}{A}-\frac{1-Y^T}{1-A}) & \\
&= \frac{1}{m}{\frac{A-Y^T}{A(1-A)}} \qquad (3)
\end{align*}
$$
其中的 $A(1-A)$ 以及 $(A-Y^T)/A(1-A)$ 都是element-wise 的矩阵运算;

$$
\begin{align*}
d_{z} &= \frac{\partial L}{\partial A}\frac{\partial A}{\partial Z} & \\
&= d_{a}A(1-A) & \\
&= \frac{1}{m}(A-Y^T) \qquad (4)
\end{align*}
$$
消除后可以化简;

$$
\begin{align*}
d_{w} &= \frac{\partial L}{\partial A}\frac{\partial A}{\partial Z}\frac{\partial Z}{\partial W} & \\
&= Xd_z^T \qquad (5)
\end{align*}
$$
最后,为了保持得到的倒导数矩阵的维度和$W$ 保持一致,我们必须对链式法则求导的结果运算进行转置和交换,来保证维度的一致性;


In [19]:
import numpy as np

m = 5
n = 3
B = 0.01
L = .0
X = [[1,2,3],
     [4,5,6],
     [7,8,9],
     [10,11,12],
     [13,14,15]]
# mu, sigma = 0, 0.1
# w = np.random.normal(mu, sigma, n)
W = np.asarray(w).reshape((n,1))
assert W.shape == (n, 1)
print(W)
X = np.asmatrix(X).T
assert X.shape == (n, m)
print(X)
Y = np.asarray(y).reshape((m, 1))
assert Y.shape == (m, 1)
print(Y)

# 四行numpy 代码即可搞定前向传播 + 导数更新 的numpy 优雅代码
Z = np.dot(W.T, X) + B
A = sigmoid(Z)
print(A.shape)
L = -(np.dot(np.log(A), Y) + np.dot(np.log(1-A), 1-Y))
d_z = A - Y.T
d_w = np.dot(X, d_z.T)
d_b = np.sum(d_z.T)
L /= m
d_w /= m
d_b /= m
print(L)
print(d_w)
print(d_b)

[[-0.4133842 ]
 [-0.51969963]
 [ 0.76912475]]
[[ 1  4  7 10 13]
 [ 2  5  8 11 14]
 [ 3  6  9 12 15]]
[[0]
 [0]
 [0]
 [0]
 [0]]
(1, 5)
[[0.69368283]]
[[2.62430403]
 [3.09770262]
 [3.57110121]]
0.4733985918219136


# 浅层神经网络(shallow neural network)
单隐藏层的神经网络可以看成是以逻辑斯特回归为基础神经节点, 再多加一层再次向前传播而已. 隐藏层的每一个单元就可以看成是一个逻辑斯特回归,
![nn1](./nn/nn1.png)
你可以把每个隐藏层节点都是一个逻辑斯特回归.
![nn2](./nn/nn2.png)
下面用公式来表示一个前向传播的过程:
![nn3](./nn/nn3.png)
![nn4](./nn/nn4.png)
![nn-overview](./nn/nn-overview.png)

In [16]:
import numpy as np
m = 5
n_0 = 3
n_1 = 4
n_2 = 1
B1 = 0.01
B2 = 0.01
L = .0
x = [[1,2,3],
     [4,5,6],
     [7,8,9],
     [10,11,12],
     [13,14,15]]
mu, sigma = 0, 0.1
W1 = np.random.normal(mu, sigma, (n_1, n_0))
W2=  np.random.normal(mu, sigma, (n_2, n_1))
assert W1.shape == (n_1, n_0)
print(W1)
assert W2.shape == (n_2, n_1)
print(W2)
X = np.asarray(x).T
assert X.shape == (n_0, m)
print(X)
Y = np.random.randint(0,2, (m, 1))
assert Y.shape == (m, 1)
print(Y)

# forword
Z1 = np.dot(W1, X) + B1
A1 = sigmoid(Z1)
Z2 = np.dot(W2, A1) + B2
A2 = sigmoid(Z2)
L = -(np.dot(np.log(A2), Y) + np.dot(np.log(1-A2), 1-Y))
L/=m
print(L)

# backword

print(A2 * (1 - A2))
# d_A2 = 1/m *(A2-Y.T) / (A2*(1-A2)) 为了统一,这个 1/m 都放到最后除
d_A2 = (A2-Y.T) / (A2*(1-A2))
d_Z2 = d_A2 * d_sigmoid(Z2)
d_W2 = np.dot(d_Z2, A1.T)
d_B2 = d_Z2
d_A1 = np.dot(W2.T, d_Z2)
d_Z1 = d_A1 * d_sigmoid(Z1)
d_W1 = np.dot(d_Z1, X.T)
d_B1 = d_Z1

W2 -= 1/m * d_W2
B2 -= 1/m * d_B2
W1 -= 1/m * d_W1
B1 -= 1/m * d_B1

print(W2)
print(W1)


[[-0.04150382  0.10106335  0.00486217]
 [ 0.09194814 -0.16178741  0.07048542]
 [-0.0677176   0.17172843 -0.08451493]
 [-0.12691324 -0.02429931  0.17965536]]
[[-0.15748299 -0.24904317 -0.02985743 -0.06977265]]
[[ 1  4  7 10 13]
 [ 2  5  8 11 14]
 [ 3  6  9 12 15]]
[[0]
 [0]
 [0]
 [1]
 [0]]
[[ 0.62201688]]
[[ 0.24593593  0.24563482  0.24533314  0.2450359   0.24474764]]
[[-0.29576876 -0.36447309 -0.15138725 -0.21224172]]
[[-0.00682047  0.14438999  0.05683207]
 [ 0.15479425 -0.08451031  0.16219352]
 [-0.06026754  0.18090151 -0.07361884]
 [-0.1108408  -0.00440934  0.20336287]]


'sigmoid'

In [1]:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9, 10, 11, 12])
a = a.reshape((6,2))

In [2]:
b = np.array_split(a, 3)
b

[array([[1, 2],
        [3, 4]]), array([[5, 6],
        [7, 8]]), array([[ 9, 10],
        [11, 12]])]

In [22]:
c = np.concatenate((*b[0:2], *b[1:]))

In [26]:
a = np.array([[1, 2, 3, 4], 
                  [5, 6, 7, 8], 
                  [9, 10, 11, 12], 
                  [13, 14, 15, 16]])

In [69]:
y = np.array([0,0,1,1])
index = [i for i in range(len(y))]
s = a[index, y]
s


array([ 1,  5, 10, 14])

In [70]:
np.take(a, [1,2,3], axis=1)

array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12],
       [14, 15, 16]])

In [71]:
t = np.maximum(0, a - s.reshape((4,1)) + 1)
t

array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])

In [72]:
t[index, y] = 0
t

array([[0, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 0, 2, 3],
       [0, 0, 2, 3]])

In [73]:
x = np.array([6,7,8,9]).reshape((4,1))
x + np.zeros((4,4))
x

array([[6],
       [7],
       [8],
       [9]])

In [74]:
repeat = np.repeat(x, 4, axis=1)
repeat

array([[6, 6, 6, 6],
       [7, 7, 7, 7],
       [8, 8, 8, 8],
       [9, 9, 9, 9]])

In [80]:
w = t==0
np.sum(w, axis=1)

array([1, 1, 2, 2])

In [2]:
import json
import itertools
import random
from pprint import pprint
from ast import literal_eval as make_tuple
class JM:
    def __init__(self, do_shuffle=False):
        self.do_shuffle = do_shuffle
        
    def _parse_list(self, v): # for mlp
        
        for idx, vv in enumerate(v):
            if isinstance(vv, str) and vv.startswith('('):
                v[idx] = make_tuple(vv)
        return v

    def _parse_tasks(self, fn):
        with open(fn) as fp:
            tmp = json.load(fp)

        def get_par_comb(tmp, clf_name):
            all_par_vals = list(itertools.product(*[self._parse_list(vv)
                                                    for v in tmp['classifiers'][clf_name]
                                                    for vv in v.values()]))
            all_par_name = [vv for v in tmp['classifiers'][clf_name] for vv in v.keys()]
            return [{all_par_name[idx]: vv for idx, vv in enumerate(v)} for v in all_par_vals]

        result = [{v: vv} for v in tmp['classifiers'] for vv in get_par_comb(tmp, v)]
        for v in result:
            for vv in v.values():
                vv.update(tmp['common'])
        if self.do_shuffle:
            random.shuffle(result)
        return result
    
BASELINE_PATH = './baselines.json'
jm = JM()
parsed_jobs = jm._parse_tasks(BASELINE_PATH)
pprint(parsed_jobs)
print(len(parsed_jobs))

[{u'LinearSVC': {u'C': 1.0,
                 u'loss': u'hinge',
                 u'multi_class': u'ovr',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 1.0,
                 u'loss': u'hinge',
                 u'multi_class': u'crammer_singer',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 10.0,
                 u'loss': u'hinge',
                 u'multi_class': u'ovr',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 10.0,
                 u'loss': u'hinge',
                 u'multi_class': u'crammer_singer',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 100.0,
                 u'loss': u'hinge',
                 u'multi_class': u'ovr',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 100.0,
                 u'loss': u'hinge',
                 u'multi_class': u'crammer_singer',
                 u'penalty': u'l1'}},
 {u'LinearSVC': {u'C': 1.0,
                 u'loss': u'hinge',
                 u'multi_class': u'ovr'

In [101]:
d  = {"1":'asd', "2":"sda","3":"sad"}
for k in d:
    print(k)

1
2
3
