Commit 15c7b78

authored
add dllib user guide (#3212)
* add dllib user guide * add spark-submit-with-bigdl script * copy scripts to zip
1 parent 739ed78 commit 15c7b78

File tree: 5 files changed (+401 −2 lines changed)

docs/readthedocs/source/doc/DLlib/Overview/dllib.md

Lines changed: 312 additions & 2 deletions
@@ -13,14 +13,324 @@ It includes the functionalities of the [original BigDL](https://github.com/intel

### 2.1 Install

#### 2.1.1 **Download a pre-built library**
You can download the bigdl-dllib build from the [Release Page](../release.md).

#### 2.1.2 **Link with a release version**

Currently, dllib releases are hosted on Maven Central; here's an example of adding the dllib dependency to your own project:
```xml
<dependency>
    <groupId>com.intel.analytics.bigdl</groupId>
    <artifactId>bigdl-dllib-[spark_2.4.6|spark_3.1.2]</artifactId>
    <version>${BIGDL_DLLIB_VERSION}</version>
</dependency>
```
Please choose the suffix according to your Spark platform.

SBT developers can use
```sbt
libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-[spark_2.4.6|spark_3.1.2]" % "${BIGDL_DLLIB_VERSION}"
```
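For instance, a concrete dependency for a Spark 3.1.2 build might look like the line below; the `0.14.0` version is only an assumed example, so substitute the release you actually use:
```sbt
// Assumed example coordinates; replace the version with your chosen bigdl-dllib release.
libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_3.1.2" % "0.14.0"
```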
### 2.2 Run
#### 2.2.1 **Set Environment Variables**
Set **BIGDL_HOME** and **SPARK_HOME**:

* If you downloaded bigdl-dllib from the [Release Page](../release-download.md):
```bash
export SPARK_HOME=folder path where you extracted the Spark package
export BIGDL_HOME=folder path where you extracted the BigDL package
```

---
#### 2.2.2 **Use Interactive Spark Shell**
You can try bigdl-dllib easily using the Spark interactive shell. Run the command below to start a Spark shell with bigdl-dllib support:
```bash
${BIGDL_HOME}/bin/spark-shell-with-dllib.sh
```
You will see a welcome message that looks like the following:
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Spark context available as sc.
scala>
```

To use BigDL, you should first initialize the environment as below.
```scala
scala> import com.intel.analytics.bigdl.dllib.NNContext
import com.intel.analytics.bigdl.dllib.NNContext

scala> NNContext.initNNContext()
2021-10-25 10:12:36 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
2021-10-25 10:12:36 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@525c0f74
```

Once the environment is successfully initialized, you'll be able to play with the dllib APIs.
For instance, to experiment with the ````dllib.keras```` APIs in dllib, you may try the code below:
```scala
scala> import com.intel.analytics.bigdl.dllib.keras.layers._
scala> import com.intel.analytics.bigdl.numeric.NumericFloat
scala> import com.intel.analytics.bigdl.dllib.utils.Shape

scala> val seq = Sequential()
scala> val layer = ConvLSTM2D(32, 4, returnSequences = true, borderMode = "same",
     |   inputShape = Shape(8, 40, 40, 32))
scala> seq.add(layer)
```
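As another quick check, you could stack a couple of `Dense` layers in the same shell session; this is a minimal sketch and the layer sizes here are illustrative only:
```scala
scala> // illustrative layer sizes only
scala> val mlp = Sequential()
scala> mlp.add(Dense(10, activation = "relu", inputShape = Shape(4)))
scala> mlp.add(Dense(2, activation = "softmax"))
```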
---

#### 2.2.3 **Run as a Spark Program**
You can run a bigdl-dllib program, e.g., the [Image Inference](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/nnframes/imageInference) example, as a standard Spark program (running on either a local machine or a distributed cluster) as follows:

1. Download the pretrained Caffe model and prepare the images.

2. Run the following command (change the jar file name below if your download is not spark_2.4.3-0.14.0):
```bash
# Spark local mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
  --master local[2] \
  --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \
  ${BIGDL_HOME}/jars/bigdl-dllib-0.14.0-SNAPSHOT-jar-with-dependencies.jar \
  -f DATA_PATH \
  -b 4 \
  --numLayers 2 --vocab 100 --hidden 6 \
  --numSteps 3 --learningRate 0.005 -e 1 \
  --learningRateDecay 0.001 --keepProb 0.5

# Spark standalone mode
## ${SPARK_HOME}/sbin/start-master.sh
## check master URL from http://localhost:8080
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
  --master spark://... \
  --executor-cores cores_per_executor \
  --total-executor-cores total_cores_for_the_job \
  --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \
  ${BIGDL_HOME}/jars/bigdl-dllib-0.14.0-SNAPSHOT-jar-with-dependencies.jar \
  -f DATA_PATH \
  -b 4 \
  --numLayers 2 --vocab 100 --hidden 6 \
  --numSteps 3 --learningRate 0.005 -e 1 \
  --learningRateDecay 0.001 --keepProb 0.5

# Spark yarn client mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
  --master yarn \
  --deploy-mode client \
  --executor-cores cores_per_executor \
  --num-executors executors_number \
  --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \
  ${BIGDL_HOME}/jars/bigdl-dllib-0.14.0-SNAPSHOT-jar-with-dependencies.jar \
  -f DATA_PATH \
  -b 4 \
  --numLayers 2 --vocab 100 --hidden 6 \
  --numSteps 3 --learningRate 0.005 -e 1 \
  --learningRateDecay 0.001 --keepProb 0.5

# Spark yarn cluster mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores cores_per_executor \
  --num-executors executors_number \
  --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \
  ${BIGDL_HOME}/jars/bigdl-dllib-0.14.0-SNAPSHOT-jar-with-dependencies.jar \
  -f DATA_PATH \
  -b 4 \
  --numLayers 2 --vocab 100 --hidden 6 \
  --numSteps 3 --learningRate 0.005 -e 1 \
  --learningRateDecay 0.001 --keepProb 0.5
```
The parameters used in the above command are:

* -f: The path where you put your PTB data.
* -b: The mini-batch size. The mini-batch size is expected to be a multiple of the *total cores* used in the job. In this example, the mini-batch size is suggested to be set to *total cores * 4*.
* --learningRate: learning rate for adagrad
* --learningRateDecay: learning rate decay for adagrad
* --hidden: hidden size of the LSTM
* --vocabSize: vocabulary size, default 10000
* --numLayers: number of LSTM cells, default 2
* --numSteps: number of words per record in the language model
* --keepProb: the keep probability used for dropout

If you are running your own program, do remember to initialize the environment before calling other bigdl-dllib APIs, as shown below.
```scala
// Scala code example
import com.intel.analytics.bigdl.dllib.NNContext
NNContext.initNNContext()
```
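For example, a self-contained program that you submit via `spark-submit-with-dllib.sh` might be structured as follows; this is a minimal sketch, and the object name and the `initNNContext(appName)` overload are assumed here for illustration:
```scala
import com.intel.analytics.bigdl.dllib.NNContext
import org.apache.spark.SparkContext

// Hypothetical application object; replace with your own class passed to --class.
object MyDllibApp {
  def main(args: Array[String]): Unit = {
    // Initialize bigdl-dllib before using any other dllib API.
    val sc: SparkContext = NNContext.initNNContext("MyDllibApp")

    // ... build, train and evaluate your model with the dllib APIs here ...

    sc.stop()  // release the SparkContext when the job finishes
  }
}
```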
---

### 2.3 Get started
---

This section shows a single example of how to use dllib to build a deep learning application on Spark, using the Keras-style APIs.

---
#### **LeNet Model on MNIST using Keras-Style API**

This tutorial is an explanation of what is happening in the [lenet](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras) example.

A bigdl-dllib program starts with initialization as follows.
````scala
val conf = Engine.createSparkConf()
  .setAppName("Train Lenet on MNIST")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)
Engine.init
````
After the initialization, we need to:

1. Load the train and validation data by _**creating the [```DataSet```](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/feature/dataset/DataSet.scala)**_ and applying the transformers (e.g., ````SampleToGreyImg````, ````GreyImgNormalizer```` and ````GreyImgToBatch````):

````scala
val trainSet = (if (sc.isDefined) {
    DataSet.array(load(trainData, trainLabel), sc.get, param.nodeNumber)
  } else {
    DataSet.array(load(trainData, trainLabel))
  }) -> SampleToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(
  param.batchSize)

val validationSet = DataSet.array(load(validationData, validationLabel), sc) ->
  BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(
  param.batchSize)
````

2. We then define the LeNet model using the Keras-style API:
````scala
val input = Input(inputShape = Shape(28, 28, 1))
val reshape = Reshape(Array(1, 28, 28)).inputs(input)
val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape)
val pool1 = MaxPooling2D().inputs(conv1)
val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1)
val pool2 = MaxPooling2D().inputs(conv2)
val flatten = Flatten().inputs(pool2)
val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten)
val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1)
Model(input, fc2)
````

3. After that, we configure the learning process. Set the ````optimization method```` and the ````Criterion```` (which, given the input and target, computes the gradient per the given loss function):
````scala
model.compile(optimizer = optimMethod,
  loss = ClassNLLCriterion[Float](logProbAsInput = false),
  metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float]))
````

Finally we _**train the model**_ by calling ````model.fit````:
````scala
model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet)
````
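Once training completes, you would typically check the model against the validation set; the snippet below is a hedged sketch only, since the exact ````evaluate```` overloads may differ between bigdl-dllib versions:
````scala
// Illustrative only: evaluate the trained Keras-style model on the validation DataSet.
val results = model.evaluate(validationSet)
results.foreach(println)
````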
---

## 3. Python user guide

### 3.1 Install

#### 3.1.1 Official Release

Run the command below to install _bigdl-dllib_:

```bash
conda create -n my_env python=3.7
conda activate my_env
pip install bigdl-dllib
```

#### 3.1.2 Nightly build

You can install the latest nightly build of bigdl-dllib as follows:
```bash
pip install --pre --upgrade bigdl-dllib
```

### 3.2 Run

#### **3.2.1 Interactive Shell**

You may test whether the installation is successful using the interactive Python shell as follows:

* Type `python` in the command line to start a REPL.
* Try to run the example code below to verify the installation:

```python
from bigdl.dllib.utils.nncontext import *

sc = init_nncontext()  # Initialization of bigdl-dllib on the underlying cluster.
```

#### **3.2.2 Jupyter Notebook**

You can start a Jupyter notebook as you normally do using the following command, and run bigdl-dllib programs directly in the notebook:

```bash
jupyter notebook --notebook-dir=./ --ip=* --no-browser
```

#### **3.2.3 Python Script**

You can directly write bigdl-dllib programs in a Python file (e.g. script.py) and run it on the command line as a normal Python program:

```bash
python script.py
```
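For reference, a minimal `script.py` might look like the sketch below; the model-building part is up to your application, and only `init_nncontext()` is required up front:
```python
# script.py -- a minimal bigdl-dllib program (illustrative sketch)
from bigdl.dllib.utils.nncontext import *

# Initialize bigdl-dllib on the underlying cluster (or locally) before using other APIs.
sc = init_nncontext()

# ... build and train your model with the dllib APIs here ...

sc.stop()  # release the SparkContext when the job is done
```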
---
### 3.3 Get started
---

#### **Autograd Examples using the bigdl-dllib keras Python API**

This tutorial describes the [Autograd](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/autograd) example.

The example first does the initialization using `init_nncontext()`:
```python
sc = init_nncontext()
```

It then generates the input data X_, Y_:

```python
import numpy as np

data_len = 1000
X_ = np.random.uniform(0, 1, (1000, 2))
Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1])
```

It then defines the custom loss:

```python
def mean_absolute_error(y_true, y_pred):
    # `mean` and `abs` are the autograd ops provided by bigdl-dllib, not Python built-ins.
    result = mean(abs(y_true - y_pred), axis=1)
    return result
```

After that, the example creates the model as follows and sets the criterion to the custom loss:
```python
a = Input(shape=(2,))
b = Dense(1)(a)
# add_one_func is another user-defined function from the example, applied via Lambda.
c = Lambda(function=add_one_func)(b)
model = Model(input=a, output=c)

model.compile(optimizer=SGD(learningrate=1e-2),
              loss=mean_absolute_error)
```
Finally, the example trains the model by calling `model.fit`:

```python
model.fit(x=X_,
          y=Y_,
          batch_size=32,
          nb_epoch=int(options.nb_epoch),
          distributed=False)
```
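After training, you could sanity-check the fitted model by predicting on the generated inputs; this is a hedged sketch only, since the exact `predict` signature may vary across bigdl-dllib versions:
```python
# Illustrative only: obtain local (non-distributed) predictions for the training inputs.
y_pred = model.predict(X_, distributed=False)
```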

scala/assembly/src/main/assembly/assembly.xml

Lines changed: 7 additions & 0 deletions
@@ -15,6 +15,13 @@
      <include>spark-bigdl.conf</include>
    </includes>
  </fileSet>
  <fileSet>
    <outputDirectory>/bin</outputDirectory>
    <directory>${project.parent.basedir}/../scripts</directory>
    <includes>
      <include>*with-dllib.sh</include>
    </includes>
  </fileSet>
  <fileSet>
    <outputDirectory>/jars</outputDirectory>
    <directory>${project.parent.basedir}/dllib/target</directory>

scala/make-dist.sh

Lines changed: 3 additions & 0 deletions
@@ -65,10 +65,12 @@ if [ ! -d "$DIST_DIR" ]
then
   mkdir -p $DIST_DIR/lib
   mkdir -p $DIST_DIR/conf
   mkdir -p $DIST_DIR/bin
else
   rm -r $DIST_DIR
   mkdir -p $DIST_DIR/lib
   mkdir -p $DIST_DIR/conf
   mkdir -p $DIST_DIR/bin
fi

cp -r $BASEDIR/dllib/target/bigdl-dllib*-jar-with-dependencies.jar $DIST_DIR/lib
@@ -82,3 +84,4 @@ if [ -f $BASEDIR/friesian/target/bigdl-friesian*-python-api.zip ]; then
  cp -r $BASEDIR/friesian/target/bigdl-friesian*-python-api.zip $DIST_DIR/lib
fi
cp -r $BASEDIR/dllib/src/main/resources/spark-bigdl.conf $DIST_DIR/conf
cp -r $BASEDIR/../scripts/* $DIST_DIR/bin

scripts/spark-shell-with-dllib.sh

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
#!/bin/bash

# Check environment variables
if [ -z "${BIGDL_HOME}" ]; then
    echo "Please set BIGDL_HOME environment variable"
    exit 1
fi

if [ -z "${SPARK_HOME}" ]; then
    echo "Please set SPARK_HOME environment variable"
    exit 1
fi

# Set up paths
export BIGDL_JAR_NAME=`find ${BIGDL_HOME}/jars -name bigdl-dllib*jar-with-dependencies.jar`
export BIGDL_JAR="$BIGDL_JAR_NAME"
export BIGDL_CONF=${BIGDL_HOME}/conf/spark-bigdl.conf
echo $BIGDL_JAR

# Check files
if [ ! -f ${BIGDL_CONF} ]; then
    echo "Cannot find ${BIGDL_CONF}"
    exit 1
fi

if [ ! -f $BIGDL_JAR ]; then
    echo "Cannot find $BIGDL_JAR"
    exit 1
fi

${SPARK_HOME}/bin/spark-shell \
  --properties-file ${BIGDL_CONF} \
  --jars ${BIGDL_JAR} \
  --conf spark.driver.extraClassPath=${BIGDL_JAR} \
  --conf spark.executor.extraClassPath=${BIGDL_JAR} \
  $*
