## 오차역전파 (Backpropagation)


### 오차역전파 알고리즘
- 학습 데이터로 정방향(forward) 연산을 통해 손실함수 값(loss)을 구함

- 각 layer별로 역전파학습을 위해 중간값을 저장

- 손실함수를 학습 파라미터(가중치, 편향)로 미분하여  
  마지막 layer로부터 앞으로 하나씩 연쇄법칙을 이용하여 미분
  각 layer를 통과할 때마다 저장된 값을 이용

- 오류(error)를 전달하면서 학습 파라미터를 조금씩 갱신


### 오차역전파 학습의 특징
- 손실함수를 통한 평가를 한 번만 하고, 연쇄법칙을 이용한 미분을 활용하기 때문에  
  학습 소요시간이 매우 단축!

- 미분을 위한 중간값을 모두 저장하기 때문에 메모리를 많이 사용


### 신경망 학습에 있어서 미분가능의 중요성
- 경사하강법(Gradient Descent)에서 손실 함수(cost function)의 최소값,  
  즉, 최적값을 찾기 위한 방법으로 미분을 활용

- 미분을 통해 손실 함수의 학습 매개변수(trainable parameter)를 갱신하여  
  모델의 가중치의 최적값을 찾는 과정

![](https://i.pinimg.com/originals/5d/13/20/5d1320c7b672710834e63b95a7c1037b.png)

<sub>출처: https://www.pinterest.co.kr/pin/424816177350692379/</sub>

### 합성함수의 미분 (연쇄법칙, chain rule)

## $\qquad \frac{d}{dx} [f(g(x))] = f^\prime(g(x))g^\prime(x)$  
 

- 여러 개 연속으로 사용가능  
  ## $ \quad \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \times \frac{\partial u}{\partial m} \times \frac{\partial m}{\partial n} \times \ ... \ \frac{\partial l}{\partial k} \times \frac{\partial k}{\partial g} \times \frac{\partial g}{\partial x} $
- 각각에 대해 편미분 적용가능

![](https://cdn-media-1.freecodecamp.org/images/1*_KMMFvRP5X9kC59brI0ykw.png)
<sub>출처: https://www.freecodecamp.org/news/demystifying-gradient-descent-and-backpropagation-via-logistic-regression-based-image-classification-9b5526c2ed46/</sub>

- **오차역전파의 직관적 이해**
  - 학습을 진행하면서, 즉 손실함수의 최소값(minimum)을 찾아가는 과정에서 가중치 또는 편향의 변화에 따라 얼마나 영향을 받는지 알 수 있음



#### 합성함수 미분(chain rule) 예제

![](https://miro.medium.com/max/1000/1*azqHvbrNsZ8AIZ7H75tbIQ.jpeg)

<sub>출처: https://medium.com/spidernitt/breaking-down-neural-networks-an-intuitive-approach-to-backpropagation-3b2ff958794c</sub>

  #### $\quad a=-1, \ b=3, \ c=4$,
  #### $\quad x = a + b, \ y = b + c, \ f = x * y \ 일 때$    



  ### $\quad \begin{matrix}\frac{\partial f}{\partial x} &=& y\ + \ x \ \frac{\partial y}{\partial x} \\ &=& (b \ + \ c) \ + \ (a \ +\ b)\ \times \ 0 \\  &=& 7 \end{matrix}$

  ### $\quad \begin{matrix}\frac{\partial f}{\partial y} &=& x\ + \ \frac{\partial x}{\partial y} \ y \\ &=& (a \ + \ b) \ + \ 0 \times (b \ +\ c) \\ &=& 2 \end{matrix}$

   <br>

  ### $ \quad \begin{matrix} \frac{\partial x}{\partial a} &=& 1 \ + \ a \ \frac{\partial b}{\partial a} \\ &=& 1 \end{matrix} $
  ### $ \quad \begin{matrix} \frac{\partial y}{\partial c} &=& \frac{\partial b}{\partial c}\ + 1 \\ &=& 1 \end{matrix} $
  
  <br>

  ### $ \quad \begin{matrix} \frac{\partial f}{\partial a} &=& \frac{\partial f}{\partial x} \times \frac{\partial x}{\partial a} \\ &=& y \times 1 \\  &=& 7 \times 1 = 7 \\ &=& 7  \end{matrix} $
    
  ### $ \quad \begin{matrix} \frac{\partial f}{\partial b} \\ &=& \frac{\partial x}{\partial b} \ y \ + \ x \ \frac{\partial y}{\partial b}  \\ &=& 1 \times 7 + 2 \times 1  \\ &=& 9 \end{matrix} $
  

  
  

### 덧셈, 곱셈 계층의 역전파
- 위 예제를 통해 아래 사항을 알 수 있음

  #### 1. $\quad z = x + y$ 일 때,
  ## $\frac{\partial z}{\partial x} = 1, \frac{\partial z}{\partial y} = 1 $

  #### 2. $\quad t = xy$ 일 때,
  ## $\frac{\partial t}{\partial x} = y, \frac{\partial t}{\partial y} = x$


In [None]:
from sklearn. n

In [1]:
class Mul:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        result = x * y
        return result

    def backward(self, dresult):
        dx = dresult * self.y
        dy = dresult * self.x
        return dx, dy


In [2]:
class Add:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        result = x + y
        return result

    def backward(self, dresult):
        dx = dresult * 1
        dy = dresult * 1
        return dx, dy


In [3]:
a, b, c = -1, 3, 4
x = Add()
y = Add()
f = Mul()

In [4]:
x_result = x.forward(a, b)
y_result = y.forward(b, c)

print(x_result)
print(y_result)
print(f.forward(x_result, y_result))

2
7
14


In [7]:
dresult = 1
dx_mul, dy_mul = f.backward(dresult)

da_add, db_add_1 = x.backward(dx_mul)
db_add_2, dc_add = y.backward(dy_mul)

print(dx_mul, dy_mul)
print(da_add)
print(db_add_1 + db_add_2)
print(dc_add)

7 2
7
9
2


![](https://miro.medium.com/max/2000/1*U3mVDYuvnaLhJzIFw_d5qQ.png)
<sub>출처: https://medium.com/spidernitt/breaking-down-neural-networks-an-intuitive-approach-to-backpropagation-3b2ff958794c</sub>

### 활성화 함수(Activation)에서의 역전파

#### 시그모이드(Sigmoid) 함수

![](https://media.geeksforgeeks.org/wp-content/uploads/20190911181329/Screenshot-2019-09-11-18.05.46.png)

<sub>출처: https://www.geeksforgeeks.org/implement-sigmoid-function-using-numpy/</sub>

- 수식 
  # $\quad y = \frac{1}{1 + e^{-x}} $일 때,

  ## $\quad \begin{matrix}y' &=& (\frac{1}{1 + e^{-x}})' \\ &=& \frac{-1}{(1 + e^{-x})^2}\ \times \ (-e^{-x}) \\ &=& \frac{1}{1 + e^{-x}} \ \times \ \frac{e^{-x}}{1 + e^{-x}} \\ &=& \frac{1}{1 + e^{-x}} \ \times \ (1 - \frac{1}{1 + e^{-x}}) \\ &=& y\ (1\ - \ y)\end{matrix}$

In [6]:
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + exp(-x))
        return out

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.dout


#### ReLU 함수

![](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2018/10/Line-Plot-of-Rectified-Linear-Activation-for-Negative-and-Positive-Inputs.png)

<sub>출처: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/</sub>


- 수식  

  ### $\qquad y= \begin{cases} x & (x \ge 0)  \\ 0 & (x < 0) \end{cases}$ 일 때,

  <br>

  ### $\qquad \frac{\partial y}{\partial x}= \begin{cases} 1 & (x \ge 0)  \\ 0 & (x < 0) \end{cases}$

In [7]:
class ReLU:
    def __init__(self):
        self.out = None

    def forward(self, x):
        self.mask = x < 0
        out = x.copy()
        out[x < 0] = 0
        return out

    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        return dx


### 행렬 연산에 대한 역전파

# $\qquad Y = X \bullet W + B$

#### 순전파(forward)
  
  - 형상(shape)을 맞춰줘야함
  - 앞서 봤던 곱셈, 덧셈 계층을 합친 형태

In [8]:
import numpy as np

X = np.random.rand(3)
W = np.random.rand(3, 2)
B = np.random.rand(2)

print(X.shape)
print(W.shape)
print(B.shape)

(3,)
(3, 2)
(2,)


In [9]:
Y = np.dot(X, W) + B
print(Y.shape)

(2,)


#### 역전파(1)

##  $\  Y = X \bullet W$
- $X :\ \ (2,\ )$

- $W :\ \ (2,\ 3)$

- $X \bullet W :\ \ (3,\ )$

- $\frac{\partial L}{\partial Y} :\ \ (3,\ )$

- $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}\bullet W^T ,\ (2,\ )$

- $\frac{\partial L}{\partial W} = X^T \bullet \frac{\partial L}{\partial Y} ,\ (2,\ 3)$



In [10]:
X = np.random.randn(2)
W = np.random.randn(2, 3)
Y = np.dot(X, W)

print("X\n{}".format(X))
print("W\n{}".format(W))
print("Y\n{}".format(Y))


X
[-0.23442751 -0.72206733]
W
[[-0.391033    0.32149019 -0.57136943]
 [-1.80593175 -0.50900601  1.03459522]]
Y
[ 1.39567321  0.29217046 -0.6131027 ]


In [11]:
dL_dY = np.random.randn(3)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.reshape(-1, 1), dL_dY.reshape(1, -1))

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))


dL_dY
[-1.51761032 -2.14494256 -0.04283224]
dL_dX
[-0.07166926  3.78817528]
dL_dW
[[0.35576961 0.50283355 0.01004105]
 [1.09581683 1.54879295 0.03092776]]


#### 역전파(2)

## $\ (2)\  Y = X \bullet W + B$
- $X, W$는 위와 동일

- $B: \ (3, )$

- $\frac{\partial L}{\partial B} = \frac{\partial L}{\partial Y}, \ (3,\ )$

In [12]:
X = np.random.randn(2)
W = np.random.randn(2, 3)
B = np.random.randn(3)
Y = np.dot(X, W) + B
print(Y)


[ 0.19460719  3.4046792  -1.89300541]


In [13]:
dL_dY = np.random.randn(3)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.reshape(-1, 1), dL_dY.reshape(1, -1))
dL_dB = dL_dY

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))
print("dL_dB\n{}".format(dL_dB))


dL_dY
[ 2.21233548 -0.5613641  -0.60824735]
dL_dX
[1.21140224 1.90864567]
dL_dW
[[ 3.31016834 -0.83993123 -0.91007947]
 [-2.23487957  0.5670845   0.6144455 ]]
dL_dB
[ 2.21233548 -0.5613641  -0.60824735]


#### 배치용 행렬 내적 계층
- N개의 데이터에 대해,  
# $\qquad Y = X \bullet W + B$

  - $X : \quad  (N,\ 3)$

  - $W : \quad  (3,\ 2)$

  - $B : \quad  (2,\ )$
  

In [14]:
X = np.random.rand(4, 3)
W = np.random.rand(3, 2)
B = np.random.rand(2)

print(X.shape)
print(W.shape)
print(B.shape)

print("X\n{}".format(X))
print("W\n{}".format(W))
print("Y\n{}".format(Y))


(4, 3)
(3, 2)
(2,)
X
[[0.22438278 0.53096824 0.38574017]
 [0.28256197 0.37497114 0.82503975]
 [0.16163145 0.26934977 0.51464912]
 [0.87995791 0.74512737 0.67521338]]
W
[[0.44427255 0.3275227 ]
 [0.71547745 0.83075951]
 [0.22690339 0.74021256]]
Y
[ 0.19460719  3.4046792  -1.89300541]


In [15]:
Y = np.dot(X, W) + B

print(Y.shape)
print("Y\n{}".format(Y))


(4, 2)
Y
[[0.86906879 0.96754027]
 [0.88298237 1.18217426]
 [0.68325786 0.82506577]
 [1.3792313  1.57444243]]


In [16]:
dL_dY = np.random.randn(4, 2)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.T, dL_dY)
dL_dB = np.sum(dL_dY, axis=0)

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))
print("dL_dB\n{}".format(dL_dB))


dL_dY
[[-0.3746253   0.89502433]
 [ 0.36901383  0.01416156]
 [ 0.39550306  1.3322131 ]
 [-0.12463498 -0.39932581]]
dL_dX
[[ 0.12670505  0.47551402  0.5775045 ]
 [ 0.16858095  0.27578593  0.09421306]
 [ 0.61204118  1.38972222  1.07586186]
 [-0.18616017 -0.42091723 -0.32386608]]
dL_dW
[[-0.025538    0.06876719]
 [-0.04688488  0.54182237]
 [ 0.27933315  0.77292286]]
dL_dB
[0.26525661 1.84207318]


In [17]:
class Layer:
    def __init__(self):
        self.W = np.random.randn(3, 2)
        self.b = np.random.randn(2)
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        return dx

In [18]:
np.random.seed(111)

layer = Layer()

In [19]:
X = np.random.rand(2, 3)
Y = layer.forward(X)

print(Y)

[[-0.60469438 -0.83761226]
 [-0.7345356  -0.5993055 ]]


In [20]:
dout = Y
dout_dx = layer.backward(dout)

print(dout_dx)

[[ 0.3637152  -0.60728509  0.86104877]
 [ 0.60252002 -0.88628947  0.85381569]]


### MNIST 분류 with 역전파


#### Modules Import

In [21]:
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("seaborn-v0_8-whitegrid")
from collections import OrderedDict
import tensorflow as tf




#### 데이터 로드

In [135]:
np.random.seed(42)
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
num_classes = 10


#### 데이터 전처리

In [136]:
X_train, X_test = X_train.reshape(-1, 28 * 28).astype(np.float32), X_test.reshape(
    -1, 28 * 28
).astype(np.float32)

X_train /= 0.255
X_test /= 0.255

y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]


In [137]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(60000, 784)
(60000, 10)
(10000, 784)
(10000, 10)


#### Hyper Parameters

In [138]:
epochs = 1000
learning_rate = 1e-3
batch_size = 100
train_size = X_train.shape[0]

#### Util Functions

In [139]:
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)


def mean_squared_error(pred_y, true_y):
    return np.mean(np.square(pred_y - true_y))


def cross_entropy_error(pred_y, true_y):
    if pred_y.ndim == 1:
        true_y = true_y.reshape(1, true_y.size)
        pred_y = pred_y.reshape(1, pred_y.size)

    if true_y.size == pred_y.size:
        true_y = true_y.argmax(axis=1)

    batch_size = pred_y.shape[0]
    return (
        -np.mean(
            np.log(pred_y[np.arange(batch_size), true_y[np.arange(batch_size)]]) + 1e-7
        )
        / batch_size
    )


def softmax_loss(X, true_y):
    pred_y = softmax(X)
    return cross_entropy_error(pred_y, true_y)


#### Util Classes

##### ReLU

In [140]:
class ReLU:
    def __init__(self):
        self.out = None

    def forward(self, x):
        self.mask = x < 0
        out = x.copy()
        out[x < 0] = 0
        return out

    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        return dx


##### Sigmoid

In [141]:
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        return out

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out


##### Layer

In [142]:
class Layer:
    def __init__(self, W, b):
        self.W = W
        self.b = b

        self.x = None
        self.origin_x_shape = None

        self.dL_dW = None
        self.dL_db = None

    def forward(self, x):
        self.origin_x_shape = x.shape

        x = x.reshape(x.shape[0], -1)
        self.x = x
        out = np.dot(self.x, self.W) + self.b

        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dL_dW = np.dot(self.x.T, dout)
        self.dL_db = np.sum(dout, axis=0)
        dx = dx.reshape(*self.origin_x_shape)
        return dx

#### Softmax

In [143]:
class Softmax:
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]

        if self.t.size == self.y.size:
            dx = (self.y - self.t) / batch_size
        else:
            dx = self.y.copy()
            dx[np.arange(batch_size), self.t] -= 1
            dx = dx / batch_size
        return dx

In [144]:
class MyModel:
    def __init__(self, input_size, hidden_size_list, output_size, activation="relu"):
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size_list = hidden_size_list
        self.hidden_layer_num = len(hidden_size_list)
        self.params = {}

        self.__init_weights(activation)

        activation_layer = {"sigmoid": Sigmoid, "relu": ReLU}
        self.layers = OrderedDict()
        for idx in range(1, self.hidden_layer_num + 1):
            self.layers["Layer" + str(idx)] = Layer(
                self.params["W" + str(idx)], self.params["b" + str(idx)]
            )
            self.layers["Activation_function" + str(idx)] = activation_layer[
                activation
            ]()

        idx = self.hidden_layer_num + 1

        self.layers["Layer" + str(idx)] = Layer(
            self.params["W" + str(idx)], self.params["b" + str(idx)]
        )

        self.last_layer = Softmax()

    def __init_weights(self, activation):
        weight_std = None
        all_size_list = [self.input_size] + self.hidden_size_list + [self.output_size]
        for idx in range(1, len(all_size_list)):
            if activation.lower() == "relu":
                weight_std = np.sqrt(2.0 / self.input_size)
            elif activation.lower() == "sigmoid":
                weight_std = np.sqrt(1.0 / self.input_size)

            self.params["W" + str(idx)] = weight_std * np.random.randn(
                all_size_list[idx - 1], all_size_list[idx]
            )
            self.params["b" + str(idx)] = np.random.randn(all_size_list[idx])

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, true_y):
        pred_y = self.predict(x)

        return self.last_layer.forward(pred_y, true_y)

    def accuracy(self, x, true_y):
        pred_y = self.predict(x)
        pred_y = np.argmax(pred_y, axis=1)

        if true_y.ndim != 1:
            true_y = np.argmax(true_y, axis=1)
        accuracy = np.sum(pred_y == true_y) / float(x.shape[0])
        return accuracy

    def gradient(self, x, t):
        self.loss(x, t)

        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads["W" + str(idx)] = self.layers["Layer" + str(idx)].dL_dW
            grads["b" + str(idx)] = self.layers["Layer" + str(idx)].dL_db
        return grads


#### 모델 생성 및 학습

In [145]:
model = MyModel(28 * 28, [100, 64, 32], 10, activation="relu")

In [146]:
train_lost_list = []
train_acc_list = []
test_acc_list = []


In [147]:
for epoch in range(epochs):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = X_train[batch_mask]
    y_batch = y_train[batch_mask]

    grad = model.gradient(x_batch, y_batch)

    for key in model.params.keys():
        model.params[key] -= learning_rate * grad[key]

    loss = model.loss(x_batch, y_batch)
    train_lost_list.append(loss)

    if epoch % 50 == 0:
        train_acc = model.accuracy(X_train, y_train)
        test_acc = model.accuracy(X_test, y_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(
            f"Epoch: {epoch + 1} Train Accuracy: {train_acc:.2f} Test Accuracy: {test_acc:.2f}"
        )


Epoch: 1 Train Accuracy: 0.13 Test Accuracy: 0.14
Epoch: 51 Train Accuracy: 0.62 Test Accuracy: 0.64
Epoch: 101 Train Accuracy: 0.71 Test Accuracy: 0.72
Epoch: 151 Train Accuracy: 0.76 Test Accuracy: 0.77
Epoch: 201 Train Accuracy: 0.79 Test Accuracy: 0.80
Epoch: 251 Train Accuracy: 0.80 Test Accuracy: 0.81
Epoch: 301 Train Accuracy: 0.82 Test Accuracy: 0.83
Epoch: 351 Train Accuracy: 0.83 Test Accuracy: 0.84
Epoch: 401 Train Accuracy: 0.84 Test Accuracy: 0.85
Epoch: 451 Train Accuracy: 0.85 Test Accuracy: 0.85
Epoch: 501 Train Accuracy: 0.86 Test Accuracy: 0.86
Epoch: 551 Train Accuracy: 0.86 Test Accuracy: 0.86
Epoch: 601 Train Accuracy: 0.87 Test Accuracy: 0.87
Epoch: 651 Train Accuracy: 0.87 Test Accuracy: 0.87
Epoch: 701 Train Accuracy: 0.87 Test Accuracy: 0.87
Epoch: 751 Train Accuracy: 0.88 Test Accuracy: 0.88
Epoch: 801 Train Accuracy: 0.88 Test Accuracy: 0.88
Epoch: 851 Train Accuracy: 0.88 Test Accuracy: 0.88
Epoch: 901 Train Accuracy: 0.89 Test Accuracy: 0.89
Epoch: 951 Trai

In [148]:
model.params


{'W1': array([[ 0.02508785, -0.0069834 ,  0.03271321, ...,  0.01318528,
          0.00025827, -0.01184844],
        [-0.07148702, -0.0212458 , -0.0173097 , ...,  0.00776429,
          0.00293998, -0.05772872],
        [ 0.01807099,  0.0283239 ,  0.05470235, ...,  0.01552097,
          0.04105574,  0.03180106],
        ...,
        [ 0.05401468, -0.00202092, -0.05087976, ..., -0.03479323,
          0.02310204,  0.07023103],
        [ 0.03119555, -0.04362645,  0.03101997, ...,  0.04102392,
         -0.0321798 ,  0.04195167],
        [ 0.06480688,  0.01539501,  0.02188217, ...,  0.0205104 ,
         -0.00721685,  0.09803838]]),
 'b1': array([-0.00932938, -1.05858811, -0.03443521, -1.34425762, -1.23381831,
         0.24164396, -1.32587764, -0.61663245, -1.16857995,  0.14029183,
        -0.71183808,  0.53211302,  0.45133199, -1.30435484,  0.92085465,
        -0.17156631, -1.22450551,  0.37291647,  1.76068778, -0.19642116,
        -1.64816245,  0.92283241,  1.57541118, -1.08022151, -0.795708