Update FAQ for how to use h2o4gpu with pygdf/mapd and update API to allow integer/raw pointer values (#655)
pseudotensor committed Jul 26, 2018
1 parent 1dad27d commit 81e0902
Showing 11 changed files with 2,889 additions and 669 deletions.
28 changes: 27 additions & 1 deletion DEVEL.md
@@ -134,17 +134,43 @@ git clone https://github.com/h2oai/xgboost
cd xgboost
git checkout h2oai
make -f Makefile2
pip install python-package/dist/xgboost-0.71-py3-none-any.whl --upgrade
pip install python-package/dist/xgboost*.whl --upgrade
```
Note: By default, the GPU NCCL version is installed using your local CUDA version.

If you fully understand the build, you can jump to the later steps of
"fullinstall", but when in doubt, always do a full "fullinstall".

## Re-builds:

"fullinstall" compiles and installs the entire package, which can take a while. If you only changed certain files, a more limited rebuild is enough:

If you only changed C++ files, run:
```
make cpp
```

If you only changed Python files, run:
```
make py
```

If you only changed how the Python files are packaged, run:
```
make install
```

If you changed C++ files and also want the Python package built and installed, run:
```
make build install # same as make cpp py install
```
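
The re-build rules above can be encoded as a small decision helper. This is a minimal sketch, not part of h2o4gpu: `choose_make_command` is a hypothetical name, and the file-suffix sets used to tell C++/CUDA sources from Python sources are assumptions about the source tree.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes mark C++/CUDA and Python sources; the real
# h2o4gpu tree may use others.
CPP_SUFFIXES = {".c", ".cc", ".cpp", ".cu", ".cuh", ".h", ".hpp"}
PY_SUFFIXES = {".py"}


def choose_make_command(changed_files, install_python=False, packaging_only=False):
    """Hypothetical helper: suggest a make invocation per the re-build notes."""
    if packaging_only:
        # Only how the Python files are packaged changed.
        return "make install"
    suffixes = {PurePosixPath(path).suffix for path in changed_files}
    cpp_changed = bool(suffixes & CPP_SUFFIXES)
    py_changed = bool(suffixes & PY_SUFFIXES)
    other_changed = bool(suffixes - CPP_SUFFIXES - PY_SUFFIXES)
    if other_changed or not (cpp_changed or py_changed):
        # Build scripts, configs, or nothing recognizable: when in doubt, fullinstall.
        return "make fullinstall"
    if cpp_changed and (py_changed or install_python):
        return "make build install"  # same as make cpp py install
    if cpp_changed:
        return "make cpp"
    return "make py"
```

For example, feeding it the file names printed by `git diff --name-only` returns `make cpp` when only `.cu` files changed. Anything the suffix sets do not recognize falls back to "fullinstall", matching the "when in doubt" advice.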

## Build flags and options:

To find a full list of used flags and options please refer to `make/config.mk`. Here are the most useful ones:

##### Debug mode

To build the code in debug mode, set `CMAKE_BUILD_TYPE=Debug` when building, e.g. `make fullinstall CMAKE_BUILD_TYPE=Debug`.
2 changes: 2 additions & 0 deletions Dockerfile-build
@@ -63,6 +63,8 @@ RUN yum install -y \
libpng-devel \
freetype-devel \
blas-devel \
epel-release \
zeromq-devel \
openblas-devel && \
wget https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-`arch`.sh && \
bash Miniconda3-${MINICONDA_VERSION}-Linux-`arch`.sh -b -p /opt/h2oai/h2o4gpu/python && \
2 changes: 2 additions & 0 deletions Dockerfile-runtime
@@ -21,6 +21,8 @@ RUN yum install -y \
make \
ncurses-devel \
zlib-devel \
epel-release \
zeromq-devel \
wget \
blas-devel \
openblas-devel \
150 changes: 150 additions & 0 deletions FAQ.md
@@ -190,3 +190,153 @@ pipeline.
[GTSVM](http://ttic.uchicago.edu/~cotter/projects/gtsvm/) has kernelized SVMs, and
[cuSVM](http://patternsonascreen.net/cuSVM.html) has SVMs for classification and regression.


### How can I use pygdf with h2o4gpu?

### How can I use pygdf with h2o4gpu inside the Driverless AI (DAI) environment?

1) Install the DAI CUDA 9.0 rpm/deb package
```
wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/rel-1.2.2-6/x86_64-centos7/dai_1.2.2_amd64.deb
sudo dpkg -i dai_1.2.2_amd64.deb
sudo apt-get update
sudo apt-get install ./dai_1.2.2_amd64.deb
```

2) Install the libgdf libraries:

```
mkdir gdf ; cd gdf
# Available builds are listed at: https://anaconda.org/gpuopenanalytics/libgdf/files
# Download the CUDA 9.0 build:
wget https://anaconda.org/gpuopenanalytics/libgdf/0.1.0a2.dev/download/linux-64/libgdf-0.1.0a2.dev-cuda9.0_67.tar.bz2
tar jxvf libgdf-0.1.0a2.dev-cuda9.0_67.tar.bz2
# This creates three folders in the current directory: lib, include and info. Ignore info; only lib and include are needed below.
sudo cp -a include/gdf /opt/h2oai/dai/python/include/
sudo cp -a lib/libgdf.so /opt/h2oai/dai/python/lib
sudo chmod a+rx /opt/h2oai/dai/python/lib/libgdf.so
```
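
For scripted setups, step 2 can be sketched in Python. This assumes the tarball contains the `lib/`, `include/` and `info/` folders described above and that the target prefix is `/opt/h2oai/dai/python`. `install_libgdf` is a hypothetical helper, and it needs Python 3.8+ (for `dirs_exist_ok`), so run it with an interpreter that can write to the prefix (e.g. under `sudo`) rather than DAI's Python 3.6.

```python
import shutil
import stat
import tarfile
import tempfile
from pathlib import Path

# Assumed default: the DAI Python prefix used in the shell commands above.
DAI_PREFIX = "/opt/h2oai/dai/python"


def install_libgdf(tarball, prefix=DAI_PREFIX):
    """Hypothetical helper: copy include/gdf and lib/libgdf.so into prefix."""
    prefix = Path(prefix)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tarball, "r:bz2") as tf:
            # Only lib/ and include/ are needed; info/ is skipped.
            members = [m for m in tf.getmembers()
                       if m.name.startswith(("lib/", "include/"))]
            # Use the safer "data" extraction filter where this Python has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tf.extractall(tmp, members=members, **kwargs)

        # Equivalent of: cp -a include/gdf <prefix>/include/
        shutil.copytree(Path(tmp, "include", "gdf"), prefix / "include" / "gdf",
                        dirs_exist_ok=True)

        # Equivalent of: cp -a lib/libgdf.so <prefix>/lib
        lib_dir = prefix / "lib"
        lib_dir.mkdir(parents=True, exist_ok=True)
        so_path = lib_dir / "libgdf.so"
        shutil.copy2(Path(tmp, "lib", "libgdf.so"), so_path)

    # Equivalent of: chmod a+rx <prefix>/lib/libgdf.so
    so_path.chmod(stat.S_IMODE(so_path.stat().st_mode) | 0o555)
    return so_path
```

Unlike `cp -a`, `shutil.copy2` follows a symlinked `libgdf.so` and copies the file it points to.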

3) Build and install the wheel files

```
# Make wheel files
git clone https://github.com/gpuopenanalytics/pygdf.git
# Build a Docker image for building pygdf
cd pygdf/ ; docker build -t pygdf .
# Start an interactive container with ~/tmpw mounted at /tmpw (the commands below, up to "exit", run inside it)
mkdir -p ~/tmpw ; chmod u+rwx ~/tmpw ; docker run -it -v ~/tmpw:/tmpw pygdf bash
# Activate the conda environment
cd /
source activate gdf
cd pygdf
python setup.py bdist_wheel
cp dist/*.whl /tmpw/
cd /libgdf/build
cmake ..
make install
make copy_python
python setup.py bdist_wheel
cp dist/*.whl /tmpw/
exit
# Back on the host, install the wheel files into the DAI Python
cd ~/tmpw/
sudo /opt/h2oai/dai/dai-env.sh python3.6 -m wheel install `ls libgdf_cffi*.whl` --force
sudo /opt/h2oai/dai/dai-env.sh python3.6 -m wheel install `ls pygdf-*.whl` --force
sudo chmod -R a+rx /opt/h2oai/dai/python/
```
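
The two `wheel install` commands rely on each `ls` glob matching exactly one file in `~/tmpw`. A minimal sketch that builds the same commands in the same order (libgdf_cffi first, then pygdf) and fails loudly when a pattern matches zero or several wheels; `wheel_install_commands` is a hypothetical helper, and the patterns and DAI paths are taken from the commands above.

```python
import glob
import os

# Assumed from the commands above: DAI's environment wrapper script.
DAI_ENV = "/opt/h2oai/dai/dai-env.sh"
# Install order matches the shell steps: libgdf_cffi, then pygdf.
WHEEL_PATTERNS = ("libgdf_cffi*.whl", "pygdf-*.whl")


def wheel_install_commands(wheel_dir):
    """Hypothetical helper: return one argv list per wheel, in install order."""
    commands = []
    for pattern in WHEEL_PATTERNS:
        matches = sorted(glob.glob(os.path.join(wheel_dir, pattern)))
        if len(matches) != 1:
            # With zero or several matches the shell version would misbehave.
            raise ValueError(f"expected one wheel for {pattern!r}, found {matches}")
        commands.append(["sudo", DAI_ENV, "python3.6", "-m", "wheel",
                         "install", matches[0], "--force"])
    return commands
```

Each returned list can be passed to `subprocess.run(argv, check=True)`, e.g. with `wheel_dir=os.path.expanduser("~/tmpw")`.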

4a) Install the MapD servers

```
# https://www.mapd.com/docs/v3.1.3/getting-started/installation/
sudo apt install default-jre-headless
export LD_LIBRARY_PATH=/usr/lib/jvm/default-java/jre/lib/amd64/server:$LD_LIBRARY_PATH # can also be added to your environment, e.g. at the end of ~/.bashrc
# Go to the page below and choose the install you need. The instructions are well laid out, but look out for these typos:
# https://www.mapd.com/platform/download-community/
# typo: sudo apt update sudo apt install mapd -> sudo apt update && sudo apt install mapd
# typo: cd $MAPD_PATH/systemd sudo ./install_mapd_systemd.sh -> cd $MAPD_PATH/systemd && sudo ./install_mapd_systemd.sh
# When running install_mapd_systemd.sh, press Enter to accept all defaults (root as the user that runs MapD), and ensure ~/.bashrc sets MAPD_USER and MAPD_GROUP to root
# At the "Activation" step, run "cd $MAPD_PATH ; sudo $MAPD_PATH/insert_sample_data" instead of just "sudo $MAPD_PATH/insert_sample_data"
```

or 4b) Install the MapD servers from the open-source repository: https://github.com/mapd/mapd-core

5) Install MapD support for Python (pyarrow and pymapd)

```
# See https://arrow.apache.org/docs/python/development.html#development (those instructions use conda)
sudo /opt/h2oai/dai/dai-env.sh conda install gxx_linux-64
sudo /opt/h2oai/dai/dai-env.sh conda install python=3.6.4
sudo chmod -R a+rx /opt/h2oai/dai/python
sudo /opt/h2oai/dai/dai-env.sh python3.6 -m pip install cython
sudo chmod -R a+rx /opt/h2oai/dai/python
sudo /opt/h2oai/dai/dai-env.sh python3.6 -m pip install pyarrow
sudo chmod -R a+rx /opt/h2oai/dai/python
sudo /opt/h2oai/dai/dai-env.sh python3.6 -m pip install pymapd # needs the arrow and arrow_python libraries; the Arrow page above explains building them from source, but only via conda
sudo chmod -R a+rx /opt/h2oai/dai/python
```

After a reboot, you do not need to start MapD again, since the servers are enabled to start automatically. If the MapD servers were disabled, start and re-enable them with:
```
cd $MAPD_PATH
sudo systemctl start mapd_server
sudo systemctl start mapd_web_server
sudo systemctl enable mapd_server
sudo systemctl enable mapd_web_server
```
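
To check whether the commands above are needed at all, a sketch can ask systemd for each service's state. `systemctl is-active` and `systemctl is-enabled` print the state (e.g. `active`/`inactive`, `enabled`/`disabled`) and exit non-zero when the unit is not active or enabled, so the exit code is deliberately not treated as an error. `mapd_service_status` and `needs_restart` are hypothetical helpers; the `run` parameter exists only so they can be exercised without systemd.

```python
import subprocess

# Service names from the systemctl commands above.
MAPD_SERVICES = ("mapd_server", "mapd_web_server")


def mapd_service_status(services=MAPD_SERVICES, run=subprocess.run):
    """Hypothetical helper: return {service: (active_state, enabled_state)}."""
    status = {}
    for name in services:
        states = []
        for verb in ("is-active", "is-enabled"):
            # Non-zero exit just means "not active"/"not enabled": no check=True.
            result = run(["systemctl", verb, name], capture_output=True, text=True)
            states.append(result.stdout.strip() or "unknown")
        status[name] = tuple(states)
    return status


def needs_restart(status):
    """Services that are not both active and enabled."""
    return [name for name, (active, enabled) in status.items()
            if active != "active" or enabled != "enabled"]
```

Any service listed by `needs_restart(mapd_service_status())` is one to start and enable again as shown above.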

6) Install the CUDA toolkit for conda (this should install CUDA 9.0)

```
sudo /opt/h2oai/dai/dai-env.sh conda install cudatoolkit
```

7) Smoke test

```
/opt/h2oai/dai/dai-env.sh python
# then, at the Python prompt:
import h2o4gpu
import pygdf
import pymapd
```
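
The same smoke test can be saved as a script (e.g. a hypothetical `smoke_test.py`) that reports every failing import at once instead of stopping at the first one. This is a minimal sketch; `check_imports` is a hypothetical helper, and it catches any exception because a missing CUDA or shared library can surface as an error other than `ImportError`.

```python
import importlib

# Modules the interactive smoke test above imports.
REQUIRED_MODULES = ("h2o4gpu", "pygdf", "pymapd")


def check_imports(modules=REQUIRED_MODULES):
    """Hypothetical helper: map each module to None (ok) or an error string."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or e.g. a shared-library load error
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for module, error in check_imports().items():
        print(f"{module}: {'ok' if error is None else error}")
```

Run it with `/opt/h2oai/dai/dai-env.sh python smoke_test.py` so the DAI Python is used.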

8) Import data test

```
cd ~/
git clone git@github.com:h2oai/gpuopenai.git
cp gpuopenai/pygdf/notebooks/ipums_easy.csv.gz .
cp gpuopenai/pygdf/notebooks/create_table_ipums_easy.txt .
gunzip ipums_easy.csv.gz
cd /opt/mapd/
sudo cp ~/ipums_easy.csv .
sudo /opt/h2oai/dai/dai-env.sh ./bin/mapdql
# use password: HyperInteractive
# Paste the entire contents of create_table_ipums_easy.txt into the interactive mapdql shell, but change the CLUSTER column type from INTEGER to DOUBLE to avoid an overflow issue. Then run:
COPY ipums_easy FROM './ipums_easy.csv';
# If any records are rejected, check /var/lib/mapd/data/mapd_log/mapd_server.INFO
```
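
The manual INTEGER to DOUBLE edit can be scripted before pasting. A minimal sketch, assuming the column is declared literally as `CLUSTER INTEGER` (in any letter case) in `create_table_ipums_easy.txt`; `patch_cluster_type` is a hypothetical helper, and it refuses to guess if it finds no such declaration or more than one.

```python
import re

# Assumption: the DDL declares the column as "CLUSTER INTEGER" (any case).
_CLUSTER_INT = re.compile(r"(\bCLUSTER\s+)INTEGER\b", re.IGNORECASE)


def patch_cluster_type(ddl):
    """Hypothetical helper: return the DDL with the CLUSTER column typed DOUBLE."""
    patched, count = _CLUSTER_INT.subn(r"\1DOUBLE", ddl)
    if count != 1:
        raise ValueError(f"expected one CLUSTER INTEGER column, found {count}")
    return patched
```

For example, write `patch_cluster_type(Path("create_table_ipums_easy.txt").read_text())` to a new file with `pathlib.Path.write_text` and paste that file's contents into mapdql as before.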

9) Install the remaining Python dependencies for the notebook

```
cd ~/h2o4gpu/src/interface_py
sudo /opt/h2oai/dai/dai-env.sh pip install -r requirements_runtime_demos.txt
sudo chmod -R a+rx /opt/h2oai/dai/python/
```

10) Notebook test

```
# Edit kernel.json so the Python interpreter (the first entry of "argv") is /opt/h2oai/dai/python/bin/python, and set "display_name" to "python (dai)" so this kernel is easy to find in Jupyter
emacs -nw ~/.local/share/jupyter/kernels/python3/kernel.json
cd ~/h2o4gpu/examples/py/goai/
/opt/h2oai/dai/dai-env.sh /opt/h2oai/dai/python/bin/jupyter notebook
# Open http://localhost:8888/notebooks/mapd_to_pygdf_to_h2oaiglm.ipynb
# From the Cell menu, select "Run All"
```
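
The kernel.json edit can also be done without an editor. A minimal sketch, assuming the file already exists at the path used above and follows the usual Jupyter kernel spec layout (an `argv` list whose first entry is the interpreter, plus `display_name`); `point_kernel_at_dai` is a hypothetical helper.

```python
import json
from pathlib import Path

# Paths and names from the step above; the kernel spec location may differ per install.
KERNEL_JSON = Path.home() / ".local/share/jupyter/kernels/python3/kernel.json"
DAI_PYTHON = "/opt/h2oai/dai/python/bin/python"


def point_kernel_at_dai(spec_path=KERNEL_JSON, python=DAI_PYTHON,
                        display_name="python (dai)"):
    """Hypothetical helper: make the kernel launch DAI's Python."""
    spec_path = Path(spec_path)
    spec = json.loads(spec_path.read_text())
    # argv[0] is the interpreter Jupyter starts for this kernel.
    spec["argv"][0] = python
    spec["display_name"] = display_name
    spec_path.write_text(json.dumps(spec, indent=1) + "\n")
    return spec
```

Other fields, such as the rest of `argv` and `language`, are left untouched.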

4 changes: 2 additions & 2 deletions examples/py/goai/GLM.ipynb
@@ -1529,7 +1529,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.6.4(dai)",
"language": "python",
"name": "python3"
},
@@ -1543,7 +1543,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.6.4"
}
},
"nbformat": 4,