# Linear Regression

In this tutorial, we show how to use BigDL to train a simple linear regression model. The first thing we need to do is to import the necessary packages and initialize the engine.

In [1]:
import com.intel.analytics.bigdl._
import com.intel.analytics.bigdl.utils.{Engine, T}
import com.intel.analytics.bigdl.dataset.Sample
import com.intel.analytics.bigdl.nn.{Sequential, Linear, MSECriterion}
import com.intel.analytics.bigdl.optim._
import com.intel.analytics.bigdl.tensor._
import com.intel.analytics.bigdl.numeric.NumericFloat

// Initialize the BigDL engine before building models or optimizers
Engine.init

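Next, we randomly generate a training set of `dataLen` samples. Each sample has `featuresDim` features drawn uniformly from [0, 1), and its label is 0.4 + 2 × (sum of the features).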

In [2]:
val featuresDim = 2
val dataLen = 100

def GetRandSample() = {
    val features = Tensor(featuresDim).rand(0, 1)
    val label = (0.4 + features.sum * 2).toFloat
    val sample = Sample[Float](features, label)
    sample
}

val rddTrain = sc.parallelize(0 until dataLen).map(_ => GetRandSample())
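To see the data-generating process without Spark or BigDL, here is a plain-Scala sketch. The helper names `genSample` and `genDataset` are hypothetical, and the fixed seed is an assumption added for reproducibility. Each feature is drawn uniformly from [0, 1) and the label uses the same noise-free function as `GetRandSample`, y = 0.4 + 2 × (x₁ + x₂).

```scala
import scala.util.Random

object SyntheticData {
  val featuresDim = 2

  // Hypothetical helper: one (features, label) pair, mirroring GetRandSample.
  // Features are uniform in [0, 1); the label is 0.4 + 2 * (sum of features).
  def genSample(rng: Random): (Array[Double], Double) = {
    val features = Array.fill(featuresDim)(rng.nextDouble())
    val label = 0.4 + 2.0 * features.sum
    (features, label)
  }

  // Hypothetical helper: a whole dataset drawn from a seeded generator.
  def genDataset(n: Int, seed: Long): Seq[(Array[Double], Double)] = {
    val rng = new Random(seed)
    Seq.fill(n)(genSample(rng))
  }

  def main(args: Array[String]): Unit = {
    val data = genDataset(100, seed = 42L)
    data.take(3).foreach { case (x, y) =>
      println(s"features=${x.mkString("[", ", ", "]")} label=$y")
    }
  }
}
```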

Then we specify the necessary parameters and construct a linear regression model using BigDL. Note that `batchSize` must be a multiple of the total number of cores you use. In this example it is set to 4, because the example was run on 4 cores.
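The batch-size rule is simple arithmetic: the batch is split evenly across all cores, so it must divide by the total core count. The sketch below uses a hypothetical helper `perCoreBatch` (not a BigDL API) that returns how many samples each core would handle, or `None` when the rule is violated. The single-node setup in `main` is an assumption; the tutorial only states that 4 cores were used.

```scala
object BatchSizeCheck {
  // Hypothetical helper (not a BigDL API): samples per core per iteration,
  // or None when batchSize is not a multiple of the total number of cores.
  def perCoreBatch(batchSize: Int, nodeNumber: Int, coresPerNode: Int): Option[Int] = {
    require(batchSize > 0 && nodeNumber > 0 && coresPerNode > 0, "all arguments must be positive")
    val totalCores = nodeNumber * coresPerNode
    if (batchSize % totalCores == 0) Some(batchSize / totalCores) else None
  }

  def main(args: Array[String]): Unit = {
    // Assumed setup: batch size 4 on a single node with 4 cores.
    println(perCoreBatch(batchSize = 4, nodeNumber = 1, coresPerNode = 4)) // Some(1)
    println(perCoreBatch(batchSize = 6, nodeNumber = 1, coresPerNode = 4)) // None
  }
}
```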

In [3]:
// Parameters
val learningRate = 0.2
val trainingEpochs = 5
val batchSize = 4
val nInput = featuresDim
val nOutput = 1 

def LinearRegression(nInput: Int, nOutput: Int) = {
    // Initialize a sequential container
    val model = Sequential()
    // Add a linear layer
    model.add(Linear(nInput, nOutput))
    model
}

val model = LinearRegression(nInput, nOutput)
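The model is a single fully connected layer, so its forward pass is y = Wx + b. The plain-Scala sketch below makes that computation explicit. It assumes the common convention of a weight matrix of shape nOutput × nInput and a bias of length nOutput. The function name `forward` is hypothetical and is not BigDL's implementation.

```scala
object LinearForward {
  // Plain-Scala sketch of a fully connected (linear) layer: y = W * x + b,
  // where W has shape nOutput x nInput and b has length nOutput.
  def forward(weight: Array[Array[Double]], bias: Array[Double], x: Array[Double]): Array[Double] = {
    require(weight.length == bias.length, "one bias per output")
    require(weight.forall(_.length == x.length), "each weight row must match the input size")
    weight.indices.map { o =>
      weight(o).indices.map(i => weight(o)(i) * x(i)).sum + bias(o)
    }.toArray
  }

  def main(args: Array[String]): Unit = {
    // With nInput = 2 and nOutput = 1, the parameters that reproduce the
    // training labels exactly are W = [[2, 2]] and b = [0.4].
    val y = forward(Array(Array(2.0, 2.0)), Array(0.4), Array(0.5, 0.25))
    println(y.mkString(", "))
  }
}
```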

Here we construct the optimizer for the linear regression problem. You can specify your own learning rate in the `SGD()` method, and you can replace `SGD()` with another optimization method such as `Adam()`. See the [optim package](https://github.com/intel-analytics/BigDL/tree/master/spark/dl/src/main/scala/com/intel/analytics/bigdl/optim) for the other available optimization methods.
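To make the role of the learning rate concrete, here is a plain-Scala sketch of mini-batch SGD on the same problem: a 2-input linear model trained on the mean squared error. This is not BigDL's implementation. It only illustrates the basic update θ ← θ − η∇L. BigDL's `SGD` also supports options such as momentum and weight decay, which this sketch omits. The object name, seeds, and epoch count are assumptions for illustration.

```scala
import scala.util.Random

object MiniBatchSgd {
  // Trains y_hat = w(0)*x(0) + w(1)*x(1) + b with mini-batch SGD on the mean
  // squared error and returns (weights, bias).
  def train(data: IndexedSeq[(Array[Double], Double)], learningRate: Double,
            batchSize: Int, epochs: Int, seed: Long): (Array[Double], Double) = {
    val rng = new Random(seed)
    val dim = data.head._1.length
    val w = Array.fill(dim)(0.0)
    var b = 0.0
    for (_ <- 1 to epochs) {
      // Visit the samples in a new random order every epoch.
      for (batch <- rng.shuffle(data).grouped(batchSize)) {
        val gradW = Array.fill(dim)(0.0)
        var gradB = 0.0
        for ((x, y) <- batch) {
          val pred = w.indices.map(i => w(i) * x(i)).sum + b
          // Derivative of (pred - y)^2 w.r.t. pred, averaged over the batch.
          val g = 2.0 * (pred - y) / batch.size
          for (i <- 0 until dim) gradW(i) += g * x(i)
          gradB += g
        }
        // Plain SGD step: move each parameter against its gradient.
        for (i <- 0 until dim) w(i) -= learningRate * gradW(i)
        b -= learningRate * gradB
      }
    }
    (w, b)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(0L)
    // Noise-free data from the tutorial's function: y = 0.4 + 2 * (x1 + x2).
    val data = IndexedSeq.fill(100) {
      val x = Array.fill(2)(rng.nextDouble())
      (x, 0.4 + 2.0 * x.sum)
    }
    val (w, b) = train(data, learningRate = 0.2, batchSize = 4, epochs = 200, seed = 1L)
    println(f"w = (${w(0)}%.4f, ${w(1)}%.4f), b = $b%.4f")
  }
}
```

With the tutorial's learning rate of 0.2 and batch size of 4, the updates stay stable on this data. A much larger learning rate can make them diverge.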

In [4]:
val optimizer = Optimizer(model = model, sampleRDD = rddTrain, criterion = MSECriterion[Float](), batchSize = batchSize)
optimizer.setOptimMethod(new SGD(learningRate=learningRate))
optimizer.setEndWhen(Trigger.maxEpoch(trainingEpochs))


In [5]:
// Start to train
val trainedModel = optimizer.optimize()

In [6]:
val predictResult = trainedModel.predict(rddTrain)
val p = predictResult.take(5).map(_.toTensor.valueAt(1)).mkString(",")
println("Predict result:")
println(p)

Predict result:
3.7649865,2.7541423,1.9586959,1.5578532,3.7649865


To evaluate the trained model, we build a small test set whose labels come from the same function as the training data. We then predict on it and print the _Mean Squared Error_ (MSE) between the predictions and the labels.

In [8]:
val totalLength = 10
// Random test features: one row per sample, featuresDim columns
val testFeatures = Tensor(totalLength, featuresDim).rand(0, 1)
// Ground-truth label of each row, using the training function: 0.4 + 2 * (sum of features)
val testLabels = (1 to totalLength).map(i => (0.4 + testFeatures(i).sum * 2).toFloat)
// Build the samples on the driver (tensor rows are 1-based) and distribute them
val testSamples = (1 to totalLength).map(i => Sample[Float](testFeatures(i).clone(), testLabels(i - 1)))
val testRDD = sc.parallelize(testSamples)

// Collect the predictions in order and keep the single output value of each
val testPredictions = trainedModel.predict(testRDD).collect().map(_.toTensor[Float].valueAt(1))

// Mean squared error over the test set
val mse = testPredictions.zip(testLabels).map { case (p, y) => (p - y) * (p - y) }.sum / totalLength
println(mse)
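For reference, the value computed in the evaluation cell is the mean squared error over the $n$ evaluated samples. Here $\hat{y}_i$ is the model's prediction for sample $i$ and $y_i$ is its label:

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
$$

For labels produced by the tutorial's noise-free function $y = 0.4 + 2(x_1 + x_2)$, the parameters $w = (2, 2)$ and $b = 0.4$ give an MSE of exactly 0. Lower values therefore indicate predictions closer to the true function.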


Finally, we stop the SparkContext.

In [9]:
sc.stop()