<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#When-can-be-used?" data-toc-modified-id="When-can-be-used?-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>When can be used?</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Generate-synthetic-data" data-toc-modified-id="Generate-synthetic-data-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Generate synthetic data</a></span></li><li><span><a href="#Linear-regression-via-stochastic-gradient" data-toc-modified-id="Linear-regression-via-stochastic-gradient-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Linear regression via stochastic gradient</a></span></li><li><span><a href="#Linear-regression-via-non-stochastic-gradient" data-toc-modified-id="Linear-regression-via-non-stochastic-gradient-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Linear regression via non-stochastic gradient</a></span></li><li><span><a href="#References" data-toc-modified-id="References-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>References</a></span></li></ul></div>

# Introduction
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-warning">
<font color=black>

**What?** A wrapper around `scipy.optimize` that can be called from Keras.

</font>
</div>

# When can be used?
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>


- It runs **full-batch** optimization instead of mini-batch stochastic gradient descent.
- It is applicable to the factorization of very sparse matrices, where stochastic gradient descent may fail to converge.

</font>
</div>

# Imports
<hr style = "border:2px solid black" ></hr>

In [1]:
from keras_opt import scipy_optimizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer
import numpy as np

# Generate synthetic data
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

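- Consider the linear system `Xw = y`, where `X` is a 10×4 matrix of uniform random features and `y` is generated from the known weights `w = [1, 2, 3, 4]`.
- The goal is to recover the weights `w` from `X` and `y`.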

</font>
</div>

In [7]:
np.random.seed(42)
X = np.random.uniform(size=40).reshape(10, 4)
y = np.dot(X, np.array([1, 2, 3, 4])[:, np.newaxis])

In [10]:
X.shape

(10, 4)
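
As a sanity check on the synthetic data, the sketch below solves the same system in closed form with `np.linalg.lstsq`. It is not part of keras-opt, and the name `w_true` is introduced here only for illustration. Because `y` is generated without noise, the least-squares solution should match the generating weights, which gives a reference point for the two optimizers compared later.

```python
import numpy as np

# Rebuild the same synthetic data as above (same seed, same shapes).
np.random.seed(42)
X = np.random.uniform(size=40).reshape(10, 4)
w_true = np.array([1, 2, 3, 4])[:, np.newaxis]  # illustrative name, shape (4, 1)
y = np.dot(X, w_true)

# Closed-form least-squares solution of X w = y.
w_hat, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("rank of X:", rank)
print("recovered weights:", w_hat.ravel())
```
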

# Linear regression via stochastic gradient
<hr style = "border:2px solid black" ></hr>

In [19]:
model = Sequential()
model.add(InputLayer(input_shape=(4,)))
model.add(Dense(1, use_bias=False))
model.compile(loss='mse', optimizer="adam")

In [20]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 1)                 4         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [28]:
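# With the Keras defaults (epochs=1, batch_size=32) and only 10 samples,
# each call to fit performs a single Adam update.
history = model.fit(X, y, verbose=1)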



In [29]:
model.trainable_weights

[<tf.Variable 'dense_3/kernel:0' shape=(4, 1) dtype=float32, numpy=
 array([[ 0.5509066 ],
        [ 0.15642364],
        [-0.4742774 ],
        [-0.0551316 ]], dtype=float32)>]

# Linear regression via non-stochastic gradient
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

- The default optimization method is `cg`, which stands for conjugate gradient.

</font>
</div>
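
To illustrate what a full-batch conjugate-gradient fit looks like outside Keras, here is a minimal sketch that calls `scipy.optimize.minimize` directly on the same data. The helper names `mse` and `mse_grad` are assumptions made for this example and are not keras-opt internals; the sketch only shows the kind of optimization being wrapped, not how the wrapper is implemented.

```python
import numpy as np
from scipy.optimize import minimize

# Same synthetic data as in the notebook, with y flattened to shape (10,).
np.random.seed(42)
X = np.random.uniform(size=40).reshape(10, 4)
w_true = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative name
y = X @ w_true


def mse(w):
    """Mean squared error over the full batch (hypothetical helper)."""
    residual = X @ w - y
    return np.mean(residual ** 2)


def mse_grad(w):
    """Gradient of the full-batch mean squared error with respect to w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)


# Every function and gradient evaluation uses all 10 samples at once.
result = minimize(mse, x0=np.zeros(4), jac=mse_grad, method="CG")

print("iterations:", result.nit)
print("weights:", result.x)
```

Each evaluation here sees the whole data set, which is the difference from the mini-batch updates used by optimizers such as Adam.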

In [12]:
model = Sequential()
model.add(InputLayer(input_shape=(4,)))
model.add(Dense(1, use_bias=False))
model.compile(loss='mse')

In [13]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 1)                 4         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [16]:
# Replace the default Keras train step with one that uses scipy.optimize
# to minimize the loss over the full batch, capped at 20 iterations.
model.train_function = scipy_optimizer.make_train_function(
    model, maxiter=20)
history = model.fit(X, y, verbose=1)

      0/Unknown - 0s 0s/step - loss: 1.5154e-10Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1


In [17]:
# Show weights.
model.trainable_weights

[<tf.Variable 'dense_1/kernel:0' shape=(4, 1) dtype=float32, numpy=
 array([[0.99998045],
        [2.0000176 ],
        [3.0000188 ],
        [3.9999766 ]], dtype=float32)>]

# References
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-warning">
<font color=black>

- https://github.com/pedro-r-marques/keras-opt

</font>
</div>