Add Developer Guide for Python users (#1512)
* fix style for python

* add pip build doc

* update

* add IDE instructions

* minor
hkvision committed Jul 13, 2019
1 parent 94c27ac commit d7d1f13
Showing 6 changed files with 113 additions and 16 deletions.
63 changes: 63 additions & 0 deletions docs/docs/DeveloperGuide/python.md
@@ -0,0 +1,63 @@
This page provides general instructions and tips for Python developers who want to build and develop Analytics Zoo.

You are welcome to add customized functionality to Analytics Zoo to meet your own needs.
You are also highly encouraged to contribute extra features so that other community users can benefit as well.

---
## **Download Analytics Zoo Source Code**
Analytics Zoo source code is available at [GitHub](https://github.com/intel-analytics/analytics-zoo):

```bash
git clone https://github.com/intel-analytics/analytics-zoo.git
```

By default, `git clone` will download the development version of Analytics Zoo. If you want a release version, use `git checkout` to switch to the corresponding release tag, as shown below.
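A minimal sketch (the tag name `v0.5.1` is illustrative; run `git tag` to see which releases are actually available):

```bash
cd analytics-zoo
git tag              # list the available release tags
git checkout v0.5.1  # switch to a release version; tag name is an assumption
```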


---
## **Build whl package for pip install**
If you have modified some Python code and want to regenerate the [whl](https://pythonwheels.com/) package for pip install, run the following script:

```bash
bash analytics-zoo/pyzoo/dev/build.sh linux default
```

**Arguments:**

- The first argument is the __platform__ to build for: either 'linux' or 'mac'.
- The second argument is the analytics-zoo __version__ to build for. 'default' means the default version of the current branch. You can also specify a different version if you wish, e.g., '0.6.0.dev1'.
- You can also pass additional Maven profiles to the build, in particular to select the Spark and BigDL versions, as shown in the example after this list.
For example, if `pyspark==2.4.3` is a dependency, you need to add the profiles `-Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x` to build Analytics Zoo for Spark 2.4.3.
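Putting the pieces together, a full build command for Spark 2.4.3 would look like this (a sketch combining the script invocation and the profiles described above):

```bash
# Build the whl package against Spark 2.4.3 using the profiles described above
bash analytics-zoo/pyzoo/dev/build.sh linux default \
    -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x
```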


After running the above command, you will find a `whl` file under the folder `analytics-zoo/pyzoo/dist/`. You can then pip install it directly into your local Python environment:
```bash
pip install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl # for Python 2.7
pip3 install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl # for Python 3.5 and Python 3.6
```

See [here](../PythonUserGuide/install/#install-from-pip-for-local-usage) for more remarks related to pip install.

See [here](../PythonUserGuide/run/#run-after-pip-install) for more instructions on running Analytics Zoo after pip install.
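As a quick sanity check after pip install, a minimal sketch (assuming `init_nncontext` is importable from the top-level `zoo` package, following the same import style as the snippets later on this page):

```python
# Minimal verification that the installed package can create a SparkContext
from zoo import init_nncontext

sc = init_nncontext()
print(sc.version)  # should print the Spark version Analytics Zoo was built against
```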


---
## **Run in IDE**
You need to complete the following preparations before starting the Integrated Development Environment (IDE) in order to successfully run an Analytics Zoo Python program in the IDE:

- Build Analytics Zoo. See [here](../ScalaUserGuide/install/#build-with-script-recommended) for more instructions.
- Prepare the Spark environment by either setting the `SPARK_HOME` environment variable or running `pip install pyspark`. Note that the Spark version should match the one Analytics Zoo is built on.
- Set `BIGDL_CLASSPATH`:
```bash
export BIGDL_CLASSPATH=analytics-zoo/dist/lib/analytics-zoo-*-jar-with-dependencies.jar
```

- Prepare the BigDL Python environment by either downloading BigDL from [GitHub](https://github.com/intel-analytics/BigDL) or running `pip install bigdl`. Note that the BigDL version should match the one Analytics Zoo is built on.
- Add `pyzoo` and `spark-analytics-zoo.conf` to `PYTHONPATH`:
```bash
export PYTHONPATH=analytics-zoo/pyzoo:analytics-zoo/dist/conf/spark-analytics-zoo.conf:$PYTHONPATH
```
If you download BigDL from [GitHub](https://github.com/intel-analytics/BigDL), you also need to add `BigDL/pyspark` to `PYTHONPATH`:
```bash
export PYTHONPATH=BigDL/pyspark:$PYTHONPATH
```
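Putting the steps above together, a consolidated sketch of the environment setup (the paths are assumptions; adjust them to where you cloned and built the repositories):

```bash
# Spark: either point SPARK_HOME at a local Spark, or `pip install pyspark` instead
export SPARK_HOME=/path/to/spark

# Analytics Zoo jar built from source (see the build instructions referenced above)
export BIGDL_CLASSPATH=analytics-zoo/dist/lib/analytics-zoo-*-jar-with-dependencies.jar

# Python sources and Spark conf for Analytics Zoo
export PYTHONPATH=analytics-zoo/pyzoo:analytics-zoo/dist/conf/spark-analytics-zoo.conf:$PYTHONPATH

# Only needed if BigDL was downloaded from GitHub rather than pip installed
export PYTHONPATH=BigDL/pyspark:$PYTHONPATH
```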
13 changes: 6 additions & 7 deletions docs/docs/PythonUserGuide/install.md
@@ -24,34 +24,33 @@ sc = init_nncontext()
```

**Remarks:**

1. We have tested this package with pip 9.0.1. Run `pip install --upgrade pip` if necessary.
2. Pip install supports __Mac__ and __Linux__ platforms.
3. You need to install Java __>= JDK8__ before running Analytics Zoo, which is required by `pyspark`.
4. `pyspark==2.4.3`, `bigdl==0.8.0` and their dependencies will be installed automatically if they are not detected in the current Python environment.

---
## **Install from pip for yarn cluster**

You only need to follow these steps on your driver node, and only yarn-client mode is supported for now.

-1) Install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env(i.e in the name of "zoo")
+1) Install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env (e.g., named "zoo").

-2) Install Analytics-Zoo into the created conda-env
+2) Install Analytics-Zoo into the created conda-env.

```
source activate zoo
pip install analytics-zoo
```
3) Download JDK8 and set the environment variable `JAVA_HOME` (recommended).

- You can also install JDK via conda without setting `JAVA_HOME` manually:
`conda install -c anaconda openjdk=8.0.152`
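A quick way to confirm the JDK is picked up inside the conda env (a sketch; the exact version string will vary):

```bash
# Verify that JDK 8 is visible in the current environment
java -version      # should report a 1.8.0_* version
echo $JAVA_HOME    # should point at the JDK 8 installation if set manually
```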

-4) Start python and then execute the following code for verification.

-- Create a SparkContext on Yarn
+4) Start python and then execute the following code to create a SparkContext on Yarn for verification.

``` python

from zoo import init_spark_on_yarn

sc = init_spark_on_yarn(
...
9 changes: 4 additions & 5 deletions docs/docs/PythonUserGuide/run.md
@@ -49,14 +49,14 @@ export BIGDL_JARS=...
export BIGDL_PACKAGES=...
```

-## **Run on yarn after pip install
+---
+## **Run on yarn after pip install**

+You should use `init_spark_on_yarn` rather than `init_nncontext()` here to create a SparkContext on Yarn.

+Start python and then execute the following code:
-Caveat: You should use `init_spark_on_yarn` rather than `init_nncontext()` here.
-- Create a SparkContext on Yarn

``` python
-
from zoo import init_spark_on_yarn

sc = init_spark_on_yarn(
@@ -68,7 +68,6 @@ sc = init_spark_on_yarn(
driver_memory="2g",
driver_cores=4,
extra_executor_memory_for_ray="10g")
-
```
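For reference, a fuller sketch of this call. Only `driver_memory`, `driver_cores` and `extra_executor_memory_for_ray` are visible in this diff; the remaining parameter names and all values are assumptions based on the Analytics Zoo Python API of this era, so adjust them to your cluster:

```python
from zoo import init_spark_on_yarn

# Hypothetical full invocation; resource settings are illustrative
sc = init_spark_on_yarn(
    hadoop_conf="/path/to/hadoop/conf",   # assumption: directory containing yarn-site.xml
    conda_name="zoo",                     # assumption: the conda env created during install
    num_executor=2,                       # assumption
    executor_cores=4,                     # assumption
    executor_memory="8g",                 # assumption
    driver_memory="2g",                   # from the diff
    driver_cores=4,                       # from the diff
    extra_executor_memory_for_ray="10g")  # from the diff
```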

---
2 changes: 2 additions & 0 deletions docs/mkdocs.yml
@@ -25,6 +25,8 @@ pages:
- Install: ScalaUserGuide/install.md
- Run: ScalaUserGuide/run.md
- Examples: ScalaUserGuide/examples.md
+- Developer Guide:
+  - For Python Developers: DeveloperGuide/python.md
- Programming Guide:
- Pipeline APIs:
- DataFrame and ML Pipeline: ProgrammingGuide/nnframes.md
35 changes: 35 additions & 0 deletions pyzoo/dev/build.sh
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

#
# Copyright 2018 Analytics Zoo Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -e
RUN_SCRIPT_DIR=$(cd $(dirname $0) ; pwd)
echo $RUN_SCRIPT_DIR

if (( $# < 2)); then
echo "Usage: build.sh platform version mvn_parameters"
echo "Usage example: bash release.sh linux default"
echo "Usage example: bash release.sh linux 0.6.0.dev0"
echo "If needed, you can also add other profiles such as: -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
exit -1
fi

platform=$1
version=$2
profiles=${*:3}

bash ${RUN_SCRIPT_DIR}/release.sh ${platform} ${version} false ${profiles}
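For example, invoking this script (everything from the third argument onward is forwarded to `release.sh` as Maven parameters, with uploading hardcoded to `false`):

```bash
# Plain build with the default version for the current branch
bash pyzoo/dev/build.sh linux default

# Build for Spark 2.4.3 (the version string is illustrative)
bash pyzoo/dev/build.sh mac 0.6.0.dev0 -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x
```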
7 changes: 3 additions & 4 deletions pyzoo/dev/release/release.sh → pyzoo/dev/release.sh
@@ -19,15 +19,15 @@
set -e
RUN_SCRIPT_DIR=$(cd $(dirname $0) ; pwd)
echo $RUN_SCRIPT_DIR
-export ANALYTICS_ZOO_HOME="$(cd ${RUN_SCRIPT_DIR}/../../../; pwd)"
+export ANALYTICS_ZOO_HOME="$(cd ${RUN_SCRIPT_DIR}/../../; pwd)"
echo $ANALYTICS_ZOO_HOME
-ANALYTICS_ZOO_PYTHON_DIR="$(cd ${RUN_SCRIPT_DIR}/../../../pyzoo; pwd)"
+ANALYTICS_ZOO_PYTHON_DIR="$(cd ${RUN_SCRIPT_DIR}/../../pyzoo; pwd)"
echo $ANALYTICS_ZOO_PYTHON_DIR

if (( $# < 3)); then
echo "Usage: release.sh platform version upload mvn_parameters"
echo "Usage example: bash release.sh linux default true"
echo "Usage example: bash release.sh linux 0.6.0.dev0 true -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
echo "Usage example: bash release.sh linux 0.6.0.dev0 true"
echo "If needed, you can also add other profiles such as: -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
exit -1
fi
@@ -92,4 +92,3 @@ if [ ${upload} == true ]; then
echo "Command for uploading to pypi: $upload_command"
$upload_command
fi
