This package is an interface between HBase and Spark that moves data directly from the HBase Thrift API into Spark RDDs.
NOTE: THIS PACKAGE IS UNDER HEAVY DEVELOPMENT AND IS NOT MATURE BY ANY MEANS. BUGS AND CHANGES TO THE API SHOULD BE EXPECTED.
The current development environment is as follows:
- python 3.6.9
- happybase 1.2.0
- pyspark 3.2.0
The target development environment is as follows:
- python 2.7.5
- happybase 1.2.0
- (spark) 2.2.0.cloudera1
Currently, the dependency requirements declared by the package may be inconsistent. If issues persist, please replicate the development environment listed above.
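A quick way to compare a local setup against the development environment listed above is a small version check. The following is a minimal sketch, not part of this package: the names `EXPECTED`, `installed_versions`, and `mismatches` are hypothetical, and it assumes that happybase and pyspark expose a `__version__` attribute (packages that do not are reported as `None`).

```python
import sys

# Versions copied from the "current development environment" list above.
EXPECTED = {"python": "3.6.9", "happybase": "1.2.0", "pyspark": "3.2.0"}


def installed_versions():
    """Collect the versions actually present; missing packages map to None."""
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("happybase", "pyspark"):
        try:
            module = __import__(name)
            # Assumption: the package exposes __version__; fall back to None.
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


def mismatches(found, expected=EXPECTED):
    """Return {name: (found_version, expected_version)} for every difference."""
    return {
        name: (found.get(name), wanted)
        for name, wanted in expected.items()
        if found.get(name) != wanted
    }


if __name__ == "__main__":
    for name, (have, want) in sorted(mismatches(installed_versions()).items()):
        print("{}: found {}, expected {}".format(name, have, want))
```

An empty result from `mismatches` means the local versions match the list exactly. A mismatch is not necessarily a problem, only a starting point when debugging dependency issues.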
This package was created following the tutorial at https://packaging.python.org/tutorials/packaging-projects/
- pyproject.toml:
  - Determines dependencies for pip
- setup.cfg:
  - Static configuration for setuptools (package management)
To install the package, use pip:
    pip install hbspark
For usage documentation, please refer to the Read the Docs page, which includes an in-depth API reference.
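Independently of this package's own API, the general idea of moving rows from the HBase Thrift API into a Spark RDD can be sketched with happybase and pyspark directly. This is an illustration under stated assumptions, not the package's implementation: `flatten_row` and `table_to_rdd` are hypothetical helpers, the host and table names in the usage comment are placeholders, and the scan is collected on the driver, so it only suits tables that fit in driver memory.

```python
def flatten_row(row_key, columns, encoding="utf-8"):
    """Turn one happybase scan result into a plain dict of strings.

    happybase yields (row_key, {b"family:qualifier": b"value"}) pairs as bytes.
    """
    record = {"row_key": row_key.decode(encoding)}
    for column, value in columns.items():
        record[column.decode(encoding)] = value.decode(encoding)
    return record


def table_to_rdd(spark_context, host, table_name, row_prefix=None):
    """Scan an HBase table over Thrift and hand the rows to Spark.

    Hypothetical helper, not part of this package's documented API.
    """
    import happybase  # imported lazily so flatten_row works without it

    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        # The whole scan is materialized on the driver before parallelizing.
        rows = [
            flatten_row(key, data)
            for key, data in table.scan(row_prefix=row_prefix)
        ]
    finally:
        connection.close()
    return spark_context.parallelize(rows)


# Usage sketch (placeholder host and table name):
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   rdd = table_to_rdd(sc, "localhost", "my_table")
#   print(rdd.take(5))
```

Decoding to strings on the driver keeps the RDD elements simple to work with. For binary column values, the decoding step would need to be skipped or replaced.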