# Linear Regression with ``BigDL-Keras``

Now that we've implemented a whole neural network from scratch, using nothing but `numpy`, let's see how we can make the same model while doing a lot less work. 

Again, let's import some packages. For now we only need `numpy`; the ``BigDL-Keras`` classes are imported in the cells where they are first used.

In [2]:
from __future__ import print_function
import numpy as np

## Build the Dataset

Again we'll look at the problem of linear regression and stick with the same synthetic data. 

In [4]:
num_inputs = 2
num_outputs = 1
num_examples = 10000
input_mean = [0, 0]
input_variance = [[1, 0],[0, 1]]
noise_mean = 0
noise_variance = 0.01

def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2
    
X = np.random.multivariate_normal(input_mean, input_variance, num_examples)
X /= np.max(X)
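# Note: np.random.normal takes a standard deviation as its second argument,
# so the noise here has standard deviation 0.01 despite the variable name.
noise = np.random.normal(noise_mean, noise_variance, num_examples)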
Y = real_fn(X) + noise
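
As a quick sanity check on the synthetic data, the sketch below regenerates it and fits an ordinary least-squares solution with plain `numpy`. The fixed seed and the helper matrix `A` are assumptions added for illustration and are not part of the original notebook.

```python
import numpy as np

# Fixed seed so the check is reproducible (an assumption, not in the notebook).
rng = np.random.RandomState(0)
num_examples = 10000

# Same recipe as the cell above: 2-D standard normal inputs scaled by their max.
X = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], num_examples)
X /= np.max(X)
noise = rng.normal(0, 0.01, num_examples)
Y = 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2 + noise

# Append a column of ones so the intercept is fitted as a third coefficient.
A = np.hstack([X, np.ones((num_examples, 1))])
coef = np.linalg.lstsq(A, Y, rcond=None)[0]

# The fitted values should be close to the true parameters [2, -3.4, 4.2].
print(coef)
```

If the recovered coefficients are close to the true ones, the model we train below has a well-defined target to aim for.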

## Define the Model

When we implemented things from scratch, 
we had to individually allocate parameters 
and then compose them together as a model. 
While it's good to know how to do things from scratch, 
with `bigdl-keras`, we can just compose a network from predefined layers. 
For a linear model, the appropriate layer is called `Dense`. 
It's called a *dense* layer because every node in the input 
is connected to every node in the subsequent layer. 
That description seems excessive 
because we only have one (non-input) layer here, 
and that layer only contains one node!
But in subsequent chapters we'll typically work 
with networks that have multiple outputs, 
so we might as well start thinking in terms of layers of nodes. 
Because a linear model consists of just a single `Dense` layer, we can instantiate it with one line.
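
To make the idea of a dense layer concrete, here is a minimal `numpy` sketch of the affine map such a layer computes. It is only an illustration, not BigDL's implementation; the function name `dense_forward` and the parameter values are hypothetical.

```python
import numpy as np

def dense_forward(X, W, b):
    """Affine map of a fully connected layer: every input feeds every output."""
    return X.dot(W) + b

# Hypothetical parameters, chosen to match the true function used for the data.
W = np.array([[2.0], [-3.4]])   # shape (num_inputs, num_outputs) = (2, 1)
b = np.array([4.2])             # shape (num_outputs,)

# A tiny batch of three examples with two features each.
X_batch = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.5, 0.5]])

out = dense_forward(X_batch, W, b)
print(out.shape)  # one output per example: (3, 1)
```

The leading dimension of the output is the batch size, which is why frameworks usually report layer shapes with an unspecified first entry.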

As in [the previous notebook](linear-regression-scratch.ipynb), 
we have an input dimension of 2 and an output dimension of 1. 
The most direct way to instantiate a ``Dense`` layer with these dimensions
is to specify the number of inputs and the number of outputs. You can also pass a name to the layer; let's call it "Linear". Finally, you can specify whether the layer adds a bias term. The default value of `bias` is `True`, but we pass it explicitly here to make the usage clear.

In [6]:
from bigdl.nn.keras.layer import Dense
dense = Dense(1, input_dim = 2, name="Linear", bias=True)

creating: createKerasDense


In [7]:
from bigdl.nn.keras.topology import Sequential
linear_model = Sequential()
linear_model.add(dense)

Using TensorFlow backend.


creating: createKerasSequential


<bigdl.nn.keras.topology.Sequential at 0x7fbd60796c50>

## Shape Inference
After creating a model, we can use `get_input_shape` and `get_output_shape` to
access its input and output shapes, each of which is a shape tuple. 
The first entry is `None`, representing the batch dimension. 
For a model with multiple inputs or outputs, a list of shape tuples is returned.

You can use `flattened_layers` to see which layers an unfamiliar model contains. 
It returns a list of the contained layers in order.

In [28]:
print(linear_model.get_input_shape())
print(linear_model.get_output_shape())
for layer in linear_model.flattened_layers(): print(layer)

(None, 2)
(None, 1)
Dense[Linear]


## Configure Training
You might have noticed that it was a bit more concise to express our model in BigDL-Keras. 
For example, we didn't have to individually allocate parameters. We also don't have to define our own loss function,
implement stochastic gradient descent or write a validation method.
Instead, we call `compile` to configure all of these in one place. 
The benefits of relying on BigDL's abstractions will grow substantially 
once we start working with much more complex models. 

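In this case, we specify MSE (mean squared error) as our loss, 
SGD (stochastic gradient descent) as our optimizer 
and Top1Accuracy as our validation method. 
Note that Top1Accuracy is a classification metric, so it is not very informative for a regression target; we include it here only to show how a metric is passed.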
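
For reference, the textbook definition of mean squared error can be sketched in a few lines of `numpy`. The function name and sample values are hypothetical, and BigDL's own criterion may differ in details such as how it averages over a batch.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and labels."""
    return np.mean((y_pred - y_true) ** 2)

# Hypothetical labels and predictions for three examples.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.5, 2.0])

# Squared errors are 0, 0.25 and 1, so the mean is 1.25 / 3.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```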

In [8]:
linear_model.compile(loss='mse',
                  optimizer='sgd',
                  metrics=['accuracy'])

creating: createDefault
creating: createSGD
creating: createMSECriterion
creating: createTop1Accuracy


## Execute Training
With all these pieces in place, training takes far less code than it did from scratch. In each epoch, the training process for our linear regression model repeats the following steps:
- Grab a mini-batch of inputs and feed it into the model
- Compare the predictions with the corresponding ground-truth labels to compute the loss and gradient
- Use the gradient to update the model's weights and biases
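
The steps above can be sketched as a plain `numpy` mini-batch SGD loop. This is only an illustration of the idea and not what BigDL runs internally; the seed, the zero initialization and the `learning_rate` value are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed for reproducibility (assumption)

# Regenerate the synthetic data from the dataset section.
X = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], 10000)
X /= np.max(X)
Y = 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2 + rng.normal(0, 0.01, 10000)

w = np.zeros(2)       # weights, initialized to zero (assumption)
b = 0.0               # bias
learning_rate = 0.1   # hypothetical value, not taken from BigDL's defaults
batch_size = 8

for epoch in range(10):
    order = rng.permutation(len(X))          # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]              # 1. grab a mini-batch
        error = xb.dot(w) + b - yb           # 2. prediction minus label
        grad_w = 2 * xb.T.dot(error) / len(idx)   # gradient of the batch MSE
        grad_b = 2 * error.mean()
        w -= learning_rate * grad_w          # 3. update weights and bias
        b -= learning_rate * grad_b
    loss = np.mean((X.dot(w) + b - Y) ** 2)  # full-data MSE after the epoch
    print("epoch %d, loss %.6f" % (epoch, loss))
```

After training, `w` and `b` should end up close to the true parameters used to generate the data.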

You don't need to initialize the weights and biases of the model yourself, since the `fit` method does that for you automatically. 
You only need to specify the batch size and the number of epochs. If an additional validation dataset is available, you can pass 
it to `fit` as well. We will try validation in later chapters, so we set `validation_data=None` for now. Setting `distributed=True` runs the training in distributed mode.

In [11]:
linear_model.fit(X, Y, batch_size = 8, nb_epoch=10, validation_data=None, distributed=True)

*Note: The loss and accuracy are printed in the terminal. We will cover performance visualization in later topics.*

In the terminal output you can see that the training process uses Spark's distributed computing to speed itself up. The loss should generally decrease as training proceeds, though not necessarily at every single iteration. Try changing the batch size and the number of epochs to see how the training behaves.
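
When experimenting with these two settings, it helps to know how many parameter updates a given combination implies. The sketch below is plain arithmetic under the assumption that each mini-batch triggers exactly one update; the helper `num_updates` is hypothetical, and BigDL's distributed optimizer may count iterations differently.

```python
num_examples = 10000

def num_updates(batch_size, nb_epoch):
    """Mini-batches per epoch (rounded up) times the number of epochs."""
    batches_per_epoch = (num_examples + batch_size - 1) // batch_size
    return batches_per_epoch * nb_epoch

# Larger batches mean fewer, less noisy updates for the same number of epochs.
for batch_size in (8, 64, 512):
    print("batch_size=%d -> %d updates" % (batch_size, num_updates(batch_size, 10)))
```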

## Conclusion 

As you can see, even for a simple example like linear regression, the ``BigDL-Keras`` API helps you write concise, clean code. Next, we'll repeat this exercise for multi-layer perceptrons, extending these lessons to deep neural networks and (comparatively) real datasets. 