Support for pyheavydb instead of pyomniscidb (#33)
Co-authored-by: Pearu Peterson <pearu.peterson@gmail.com>
tupui and pearu committed Apr 15, 2022
1 parent eb08ae2 commit 4de2f97
Showing 21 changed files with 159 additions and 147 deletions.
50 changes: 50 additions & 0 deletions README.md
@@ -0,0 +1,50 @@
[![PyPi package link](https://img.shields.io/pypi/v/heavyai?style=for-the-badge)](https://pypi.org/project/heavyai/)
[![Conda package link](https://img.shields.io/conda/vn/conda-forge/heavyai?style=for-the-badge)](https://anaconda.org/conda-forge/heavyai)


heavyai
=======

This package enables using common Python data science toolkits with
[HeavyDB](http://heavy.ai).
It brings DataFrame support on CPU and GPU, as well as support for Apache Arrow.
See the [documentation](http://heavyai.readthedocs.io/en/latest/?badge=latest)
for more.

Quick Install (CPU)
-------------------

Packages are available on conda-forge and PyPI:

```bash
# using conda-forge
conda install -c conda-forge heavyai

# using pip
pip install heavyai
```

Quick Install (GPU)
-------------------

We recommend creating a fresh conda environment with Python 3.8 or 3.9 when
installing heavyai with GPU capabilities.

To install heavyai for GPU DataFrame support (conda-only):

```bash
conda create -n heavyai-gpu -c rapidsai -c nvidia -c conda-forge -c defaults python cudf cudatoolkit heavyai
```

Note that `pyheavydb` needs to be installed into the environment with pip
until it is available on conda-forge.

```bash
conda activate heavyai-gpu
pip install pyheavydb
```

Documentation
-------------

Further documentation for heavyai usage is available at: http://heavyai.readthedocs.io/
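
As a quick orientation for the new README (not part of the commit itself), here is a minimal sketch of the DataFrame workflow it describes. The connection parameters are the documentation defaults used elsewhere in this diff, and the `flights_2008_10k` table comes from the docs' examples; adjust both for your own server.

```python
# Minimal sketch: connect to a local HeavyDB instance and read a query
# result into a pandas DataFrame. Credentials and host are the
# documentation defaults and may differ on your deployment.
import pandas as pd
from heavyai import connect

con = connect(
    user="admin",
    password="HyperInteractive",
    host="localhost",
    dbname="heavyai",
)

# heavyai connections are DB API 2.0-compliant, so pandas.read_sql works.
df = pd.read_sql(
    "SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100", con
)
print(df.head())
```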
39 changes: 0 additions & 39 deletions README.rst

This file was deleted.

3 changes: 0 additions & 3 deletions ci/Dockerfile

This file was deleted.

5 changes: 3 additions & 2 deletions ci/build-conda.sh
@@ -39,15 +39,16 @@ while [[ $# != 0 ]]; do
done

build_test_cpu() {
mamba env create -f environment.yml
mamba env create -f ci/environment.yml
conda activate heavyai-dev
pip install --no-deps -e .
pytest -sv tests/
}

build_test_gpu() {
mamba env create -f environment_gpu.yml
mamba env create -f ci/environment_gpu.yml
conda activate heavyai-gpu-dev
python -c "import cudf"
pip install --no-deps -e .
pytest -sv tests/
}
9 changes: 6 additions & 3 deletions environment.yml → ci/environment.yml
@@ -3,11 +3,11 @@ channels:
- conda-forge
- defaults
dependencies:
- pandas
- pyomniscidb
- python >=3.7.0
- pyarrow>=3.0.0
- thrift >=0.13
- sqlalchemy # 3.10 issue with one of its dependencies if installed with pip
- pandas
- python >=3.7.0
- geopandas
- shapely
- numpy
@@ -20,3 +20,6 @@ dependencies:
- pytest-mock
- sphinx
- sphinx_rtd_theme
- pip
- pip:
- pyheavydb
10 changes: 7 additions & 3 deletions environment_gpu.yml → ci/environment_gpu.yml
@@ -5,11 +5,12 @@ channels:
- conda-forge
- defaults
dependencies:
- cudf>=0.16
- cudatoolkit=11.0
- cudf
- cudatoolkit
- python >=3.7.0
- pyomniscidb
- pyarrow>=3.0.0=*cuda
- thrift >=0.13
- sqlalchemy # 3.10 issue with one of its dependencies if installed with pip
- pandas
- geopandas
- shapely
@@ -23,3 +24,6 @@ dependencies:
- pytest-mock
- sphinx
- sphinx_rtd_theme
- pip
- pip:
- pyheavydb
2 changes: 1 addition & 1 deletion docs/source/api.rst
@@ -9,5 +9,5 @@ API Reference
Exceptions
----------

.. automodule:: omnisci.exceptions
.. automodule:: heavyai.exceptions
:members: Error, InterfaceError, DatabaseError, OperationalError, IntegrityError, InternalError, ProgrammingError, NotSupportedError
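
For illustration (not part of the commit), a hedged sketch of catching the DB API 2.0 exception classes listed in the `:members:` line above. The table name `my_table` is hypothetical.

```python
# Sketch: handling heavyai's DB API 2.0 exceptions. The exception names
# come from the :members: list documented above; "my_table" is hypothetical.
from heavyai import connect
from heavyai.exceptions import DatabaseError, OperationalError

con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="heavyai")

try:
    with con.cursor() as c:
        c.execute("SELECT COUNT(*) FROM my_table")
        print(c.fetchone())
except OperationalError as exc:
    # e.g. the server is unreachable or the query cannot be executed
    print(f"operational error: {exc}")
except DatabaseError as exc:
    # broader catch-all for other database-related failures
    print(f"database error: {exc}")
```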
36 changes: 16 additions & 20 deletions docs/source/contributing.rst
@@ -16,7 +16,7 @@ Development Environment Setup
-----------------------------

heavyai is written in plain Python 3 (i.e. no Cython), and as such, doesn't require any specialized development
environment outside of installing the dependencies. However, we do suggest creating a new conda development enviornment
environment outside of installing the dependencies. However, we do suggest creating a new conda development environment
with the provided conda `environment.yml` file to ensure that your changes work without relying on unspecified system-level
Python packages.

@@ -35,7 +35,7 @@ CPU Environment
# clone heavyai repo
git clone https://github.com/heavyai/heavyai.git && cd heavyai
conda env create -f ./environment.yml
conda env create -f ci/environment.yml
# ensure you have activated the environment
conda activate heavyai-dev
@@ -50,7 +50,7 @@ GPU Environment
.. code-block:: shell
# from the heavyai project root
conda env create -f environment_gpu.yml
conda env create -f ci/environment_gpu.yml
# ensure you have activated the environment
conda activate heavyai-gpu-dev
@@ -138,7 +138,7 @@ installation instructions.
Updating Apache Thrift Bindings
-------------------------------

When the upstream `mapd-core`_ project updates its Apache Thrift definition file, the bindings shipped with
When the upstream `HeavyDB`_ project updates its Apache Thrift definition file, the bindings shipped with
``heavyai`` need to be regenerated. Note that the `heavydb` repository must be cloned locally.

.. code-block:: shell
@@ -150,7 +150,7 @@ When the upstream `mapd-core`_ project updates its Apache Thrift definition file
cd ./heavydb
# Use Thrift to generate the Python bindings
thrift -gen py -r omnisci.thrift
thrift -gen py -r heavy.thrift
# Copy the generated bindings to the heavyai root
cp -r ./gen-py/heavydb/* ../heavyai/heavydb/
@@ -171,7 +171,7 @@ you need to install sphinx and sphinx-rtd-theme into your development environmen
pip install sphinx sphinx-rtd-theme
Once you have sphinx installed, to build the documentation switch to the ``heavyai/docs`` directory and run ``make html``. This will update the documentation
in the ``heavyai/docs/build/html`` directory. From that directory, running ``python -m http.server`` will allow you to preview the site on ``localhost:8000``
in the ``heavyai/docs/build/html`` directory. From that directory, ``index.html`` can be opened
in the browser. Run ``make html`` each time you save a file to see the file changes in the documentation.

--------------------------------
@@ -182,24 +182,21 @@ heavyai doesn't currently follow a rigid release schedule; rather, when enough f
version to be released, or a sufficiently serious bug/issue is fixed, we will release a new version. heavyai is distributed via `PyPI`_
and `conda-forge`_.

Prior to submitting to PyPI and/or conda-forge, create a new `release tag`_ on GitHub (with notes), then run ``git pull`` to bring this tag to your
local heavyai repository folder.

****
PyPI
****

To publish to PyPI, we use the `twine`_ package via the CLI. twine only allows for submitting to PyPI by registered users
(currently, internal Heavy.AI employees):
To publish to PyPI, we use `flit`_ in the CI. When a new tag is pushed, the
package is built and published to PyPI. Make sure the version in
`pyproject.toml` matches the tag.

.. code-block:: shell
Authorized users can also publish a new version locally:

conda install twine
python setup.py sdist
twine upload dist/*
.. code-block:: shell
Publishing a package to PyPI is near instantaneous after runnning ``twine upload dist/*``. Before running ``twine upload``, be sure
the ``dist`` directory only has the current version of the package you are intending to upload.
conda install flit
flit build
flit publish
***********
conda-forge
@@ -212,7 +209,7 @@ nothing that needs to be done to speed this up, just be patient.
When the conda-forge bot opens a PR on the heavyai-feedstock repo, one of the feedstock maintainers needs to validate the correctness
of the PR, check the accuracy of the package versions on the `meta.yaml`_ recipe file, and then merge once the CI tests pass.

.. _mapd-core: https://github.com/omnisci/mapd-core
.. _HeavyDB: https://github.com/heavyai/heavydb
.. _Docker: https://hub.docker.com/u/omnisci
.. _CPU image: https://hub.docker.com/r/omnisci/core-os-cpu
.. _HeavyDB Core GPU-enabled: https://hub.docker.com/r/omnisci/core-os-cuda
@@ -224,6 +221,5 @@ of the PR, check the accuracy of the package versions on the `meta.yaml`_ recipe
.. _pull requests: https://github.com/heavyai/heavyai/pulls
.. _PyPI: https://pypi.org/project/heavyai/
.. _conda-forge: https://github.com/conda-forge/heavyai-feedstock
.. _release tag: https://github.com/heavyai/heavyai/releases
.. _twine: https://pypi.org/project/twine/
.. _flit: https://pypi.org/project/flit/
.. _meta.yaml: https://github.com/conda-forge/heavyai-feedstock/blob/main/recipe/meta.yaml
9 changes: 2 additions & 7 deletions docs/source/index.rst
@@ -1,20 +1,15 @@
.. heavyai documentation master file, created by
sphinx-quickstart on Fri Jun 23 12:29:54 2017.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
heavyai
=========

`heavyai` provides a python DB API 2.0-compliant `HeavyDB`_
interface (formerly MapD). In addition, it provides methods to get results in
interface (formerly OmniSci and MapD). In addition, it provides methods to get results in
the `Apache Arrow`_-based `cudf GPU DataFrame`_ format for efficient data interchange.

.. code-block:: python
>>> from heavyai import connect
>>> con = connect(user="admin", password="HyperInteractive", host="localhost",
... dbname="heavydb")
... dbname="heavyai")
>>> df = con.select_ipc_gpu("SELECT depdelay, arrdelay "
...                         "FROM flights_2008_10k "
...                         "LIMIT 100")
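
A complementary sketch for CPU-only setups, under an assumption: it relies on an Arrow-based `select_ipc` method as the CPU counterpart of the `select_ipc_gpu` call shown above (heavyai inherits this API from pymapd); verify the method exists in your installed version.

```python
# Hedged sketch: Arrow-backed CPU result fetch into a pandas DataFrame.
# select_ipc is assumed to be the CPU counterpart of select_ipc_gpu;
# check your heavyai version before relying on it.
from heavyai import connect

con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="heavyai")

df = con.select_ipc(
    "SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100"
)
print(df.dtypes)
```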
32 changes: 18 additions & 14 deletions docs/source/releasenotes.rst
@@ -3,9 +3,9 @@
Release Notes
=============

The release notes for pymapd are managed on the GitHub repository in the `Releases tab`_. Since pymapd
The release notes for heavyai are managed on the GitHub repository in the `Releases tab`_. Since heavyai
releases try to track new features in the main OmniSci Core project, it's highly recommended that you check
the Releases tab any time you install a new version of pymapd or upgrade OmniSci so that you understand any breaking
the Releases tab any time you install a new version of heavyai or upgrade OmniSci so that you understand any breaking
changes that may have been made during a new pymapd release.

Some notable breaking changes include:
@@ -17,6 +17,8 @@ Some notable breaking changes include:
======= ===============
Release Breaking Change
======= ===============
`1.0`_ Change dependency from `pyomniscidb` to `pyheavydb`
`0.30`_ New name `heavyai`
`0.17`_ Added preliminary support for Runtime User-Defined Functions
`0.15`_ Support for binary TLS Thrift connections
`0.14`_ Updated Thrift bindings to 4.8
@@ -37,15 +39,17 @@ Some notable breaking changes include:



.. _Releases tab: https://github.com/omnisci/pymapd/releases
.. _0.6: https://github.com/omnisci/pymapd/releases/tag/v0.6.0
.. _0.7: https://github.com/omnisci/pymapd/releases/tag/v0.7.0
.. _0.8: https://github.com/omnisci/pymapd/releases/tag/v0.8.0
.. _0.9: https://github.com/omnisci/pymapd/releases/tag/v0.9.0
.. _0.10: https://github.com/omnisci/pymapd/releases/tag/v0.10.0
.. _0.11: https://github.com/omnisci/pymapd/releases/tag/v0.11.0
.. _0.12: https://github.com/omnisci/pymapd/releases/tag/v0.12.0
.. _0.13: https://github.com/omnisci/pymapd/releases/tag/v0.13.0
.. _0.14: https://github.com/omnisci/pymapd/releases/tag/v0.14.0
.. _0.15: https://github.com/omnisci/pymapd/releases/tag/v0.15.0
.. _0.17: https://github.com/omnisci/pymapd/releases/tag/v0.17.0
.. _Releases tab: https://github.com/heavyai/heavyai/releases
.. _0.6: https://github.com/heavyai/heavyai/releases/tag/v0.6.0
.. _0.7: https://github.com/heavyai/heavyai/releases/tag/v0.7.0
.. _0.8: https://github.com/heavyai/heavyai/releases/tag/v0.8.0
.. _0.9: https://github.com/heavyai/heavyai/releases/tag/v0.9.0
.. _0.10: https://github.com/heavyai/heavyai/releases/tag/v0.10.0
.. _0.11: https://github.com/heavyai/heavyai/releases/tag/v0.11.0
.. _0.12: https://github.com/heavyai/heavyai/releases/tag/v0.12.0
.. _0.13: https://github.com/heavyai/heavyai/releases/tag/v0.13.0
.. _0.14: https://github.com/heavyai/heavyai/releases/tag/v0.14.0
.. _0.15: https://github.com/heavyai/heavyai/releases/tag/v0.15.0
.. _0.17: https://github.com/heavyai/heavyai/releases/tag/v0.17.0
.. _0.30: https://github.com/heavyai/heavyai/releases/tag/v0.30.0
.. _1.0: https://github.com/heavyai/heavyai/releases/tag/v1.0.0
10 changes: 5 additions & 5 deletions docs/source/usage.rst
@@ -74,16 +74,16 @@ To create a :class:`Connection` using the ``connect()`` method along with ``user
>>> from heavyai import connect
>>> con = connect(user="admin", password="HyperInteractive", host="localhost",
... dbname="heavydb")
... dbname="heavyai")
>>> con
Connection(mapd://admin:***@localhost:6274/heavydb?protocol=binary)
Connection(heavydb://admin:***@localhost:6274/heavyai?protocol=binary)
Alternatively, you can pass in a `SQLAlchemy`_-compliant connection string to
the ``connect()`` method:

.. code-block:: python
>>> uri = "mapd://admin:HyperInteractive@localhost:6274/heavydb?protocol=binary"
>>> uri = "heavydb://admin:HyperInteractive@localhost:6274/heavyai?protocol=binary"
>>> con = connect(uri=uri)
Connection(mapd://admin:***@localhost:6274/heavydb?protocol=binary)
@@ -171,7 +171,7 @@ install, ``pandas.read_sql()`` works everywhere):
>>> from heavyai import connect
>>> import pandas as pd
>>> con = connect(user="admin", password="HyperInteractive", host="localhost",
... dbname="heavydb")
... dbname="heavyai")
>>> df = pd.read_sql("SELECT depdelay, arrdelay FROM flights_2008_10k limit 100", con)
@@ -190,7 +190,7 @@ Or by using a context manager:

.. code-block:: python
>>> with con as c:
>>> with con.cursor() as c:
... print(c)
<heavyai.cursor.Cursor object at 0x1041f9630>
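
To round out the context-manager change above, a small sketch (not part of the commit) using the standard DB API 2.0 cursor calls that heavyai advertises supporting; the query and table come from the examples earlier in this diff.

```python
# Sketch: standard DB API 2.0 cursor usage with heavyai. Result row types
# depend on your schema; credentials/host are the documentation defaults.
from heavyai import connect

con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="heavyai")

with con.cursor() as c:
    c.execute("SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 5")
    for row in c.fetchall():
        print(row)
```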
