First commit into master (#38)
* initial upload

* dev branch workflow

* Update README.md

* starting to setup coverage

* flake err cleanup

* deleted more unused code

* can't find a good githubactions coverage

* can't find a good githubactions coverage

* bug fixes

* consolidating tests

* XGB Regressor is failing

* commiting lgbm regressor tests

* using params

* fixing lgbm max_depth bug

* better test output. TODO: fix the max_depth for lgbm and xgb to not fall through to None, need to compute

* adding failure case test. TODO: why does RF  not have extra_config in  regressor

* pinning to xgboost .90 for now

* refactoring tree's extra_config for xgb and lgbm

* fixing reversed param

* adding gbdt test file

* refactoring beam params functions

* making  all beam params  as numpy

* increasing coverege by shifting label starts  and by deleting unused model.infer_initial_types()

* extra config for rf reg

* flake8

* more error testing

* using onnxconverter types instead of copypaste

* more consolidation

* more test coverage

* first step in refactor

* cleaning up batch params

* adding beam++ to node size 1 test

* there is a bug, documenting

* renaming trees to match paper

* test

* adding precommit hooks

* README.md

* readme update

* commit hooks

* Fixing badge link to be relative

* notebook for demo

* notebook for demo

* notebook params change

* reveriting 2c95f48 and reopening issue #9; this solution is too clunky

* bumping pyt req

* Fix pytorch requirements

* Fix to brackets for alpha in xgboost

* Few minor fixes to comments in tests

* Removed unecessary regression tests

* Add binary classification tests for gemm, tree_trav and perf_tree_trav

* Fixes to whitespaces

* updating readme

* filling out contrib section

* expanding readme example so that (1) it actually runs (2) it actually does a thing

* cleaning notebook example

* Fix to typo and update to the requirements

* Fix to flake8 errors

* readme changes from this morning

* changes based on feedback

* Few edits to contributing

* Few edits in the README file

* fixing mailto: syntax

* Remove initial_types from the converter API

* Rename Skl2PyTorch container into HBPyTorch

* Add convert_xgboost and convert_lightgbm API

* Fix to spacing

* remove pandas check (for the moment)

* fix import

* Fix readme to use the new API

* removed common directory

* add some documentation

* renamed few things

* code refactoring for trees

* refactor lightgbm and xgboost by moving stuff into gbdt_commons

* done with a pass on gbdt after moving everything to _gbdt_common

* final refactoring of gbdt classes

* rename random forest stuff into decision tree

* major refactoring for tree implementations

* some renaming here and there

* minor fix

* Add test to validate that issue #7 is closed.

* import container stuff from onnx-common

* fix the parser to use the topology in onnx-common

* remove unnecessary files

* address first chunk of Karla's comments

* fix typo in calibration

* Another round of comments addressed

* fix typo

* these two lines seem unnecessary

* moving notebooks from broken branch

* adding notebooks with new API changes

* removing comment

* removed few unnecessary code and edited some documentation

* Update CONTRIBUTING.md

* remove . from git clone

* Final pass over non-converters files documentation / API

* add constants for converters

* simplify a bit the API by using extra_config for optional parameters

* Update CONTRIBUTING.md

* done with documentation over public classes , methods

* add contants and extra config management

* addressing Karla's comments

* pip install pdoc; pdoc --html hummingbird

* pdoc3, using overrides to get extra doc if we want it

* add few tests to check that we actually pick the correct implementation

* Update README.md

* Reformat doc

* add HB logo to readme file

* Add HB logo in doc

* add assertion on model being not None

Co-authored-by: Karla Saur <karla.saur@microsoft.com>
Co-authored-by: Matteo Interlandi <mainterl@microsoft.com>
3 people committed Apr 29, 2020
1 parent fb4e437 commit 290a4bd
Showing 55 changed files with 10,267 additions and 14 deletions.
5 changes: 5 additions & 0 deletions .coveragerc
@@ -0,0 +1,5 @@
[run]
branch = True
source = hummingbird
omit =
    *tests*
5 changes: 5 additions & 0 deletions .flake8
@@ -0,0 +1,5 @@
[flake8]
ignore = E203, E266, E501, W503, F403, F401, C901
max-line-length = 127
max-complexity = 10
select = B,C,E,F,W,T4,B9
49 changes: 49 additions & 0 deletions .github/workflows/pythonapp.yml
@@ -0,0 +1,49 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python application

on:
  push:
    branches:
      - master
      - develop

  pull_request:
    branches:
      - master
      - develop

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.7
      uses: actions/setup-python@v1
      with:
        python-version: 3.7
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Lint with flake8
      run: |
        pip install flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # The GitHub editor is 127 chars wide
        flake8 . --count --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pip install -r requirements.txt && pip install -e . && pip install pytest
        pytest
    - name: Coverage
      run: |
        pip install -r requirements.txt && pip install -e . && pip install coverage
        coverage run -m pytest tests
        MINIMUM=70
        SCORE=$(coverage report -m | tail -n 1 | awk '{print $NF}' | rev | cut -c2- | rev)
        if [ $SCORE -ge $MINIMUM ]; then echo "COVERAGE ($SCORE) OK"; else echo "WARNING: Coverage is $SCORE but should be at least $MINIMUM"; fi
80 changes: 80 additions & 0 deletions .gitignore
@@ -0,0 +1,80 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Jupyter Notebook
.ipynb_checkpoints

# Environments
.env
.venv
env
env/
venv
venv/
ENV/
env.bak/
venv.bak/

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json


# project specific
.vscode/*
configs/db/*.config
configs/github/*.token

17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,17 @@
repos:
  - repo: https://github.com/psf/black
    rev: stable
    hooks:
      - id: black
        language_version: python3.6
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v1.2.3
    hooks:
      - id: flake8
      - id: check-added-large-files
      - id: check-ast
      - id: check-byte-order-marker
      - id: check-merge-conflict
      - id: detect-private-key
      - id: trailing-whitespace
      - id: no-commit-to-branch
80 changes: 80 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,80 @@
# Contributing

## Welcome

If you are here, it means you are interested in helping us out. A hearty welcome and thank you! There are many ways you can contribute to Hummingbird:

* Offer PRs to fix bugs or implement new features;
* Give us feedback and bug reports regarding the software or the documentation;
* Improve our examples and documentation.

This project welcomes contributions and suggestions.

## Getting Started

Please join the community on Gitter *gitter badge*. Also please make sure to take a look at the project [roadmap](wiki/Roadmap-for-Upcoming-Features-and-Support).


### Pull requests
If you are new to GitHub, [here](https://help.github.com/categories/collaborating-with-issues-and-pull-requests/) is a detailed help source on getting involved with development on GitHub.

As a first-time contributor, you will be invited to sign the Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. You will only need to do this once across all repos using our CLA.

Your pull request needs to reference a filed issue. Please fill in the template that is populated for the pull request. Only pull requests addressing small typos can have no issues associated with them.

All commits in a pull request will be [squashed](https://github.blog/2016-04-01-squash-your-commits/) to a single commit with the original creator as author.

### Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Developing
The simplest setup is:
```
mkdir hummingbird
cd hummingbird
git clone https://github.com/microsoft/hummingbird.git .
pip install -e .
```

#### Pre-commit
This project uses [pre-commit](https://pre-commit.com/) hooks. Run `pip install pre-commit` if you don't already have it on your machine. Afterward, run `pre-commit install` to install pre-commit into your git hooks.

Before you commit, you can run the hooks with `pre-commit run --all-files`; you should see output such as:

```
black............................Passed
Flake8...........................Passed
...
Don't commit to branch...........Passed
```

If you have installed your pre-commit hooks successfully, you should see something like this if you try to commit something non-conformant:
```
$ git commit -m "testing"
black............................Failed
- hook id: black
- files were modified by this hook
reformatted hummingbird/convert.py
All done!
1 file reformatted.
```

#### Formatting
We generally apply all PEP 8 checks, except that the maximum line length is raised to 127 characters.

To do a quick check-up before commit, try:
```
flake8 . --count --max-complexity=10 --max-line-length=127 --statistics
```

#### Coverage

For coverage, we use [coverage.py](https://coverage.readthedocs.io/en/coverage-5.0.4/) in our GitHub Actions. Run `pip install coverage` if you don't already have it. Any code you commit should generally not reduce coverage significantly.

We strive to keep our test coverage at or above 70%, the minimum that the Coverage step of our GitHub Actions workflow checks for. To run all unit tests:
```
coverage run -m pytest tests
```
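
The Coverage step in the workflow takes the last line of `coverage report -m`, strips the trailing `%`, and compares the result with a minimum of 70. Below is a minimal Python sketch of the same check, for anyone who wants to reproduce it locally. The function name `total_coverage` and the sample report text are illustrative assumptions, not part of this repository, and the sketch assumes the default integer percentage format.

```python
import re

MINIMUM = 70  # same threshold as the workflow's Coverage step


def total_coverage(report: str) -> int:
    """Return the integer percentage at the end of the report's last line."""
    last_line = report.strip().splitlines()[-1]
    match = re.search(r"(\d+)%\s*$", last_line)
    if match is None:
        raise ValueError(f"no percentage found in: {last_line!r}")
    return int(match.group(1))


# Made-up sample in the shape of `coverage report` output (not real project data).
sample_report = """\
Name                 Stmts   Miss Branch BrPart  Cover
------------------------------------------------------
example/module.py       40      8     10      2    78%
------------------------------------------------------
TOTAL                   40      8     10      2    78%
"""

score = total_coverage(sample_report)
if score >= MINIMUM:
    print(f"COVERAGE ({score}) OK")
else:
    print(f"WARNING: Coverage is {score} but should be at least {MINIMUM}")
```

Like the shell version in the workflow, this only prints a warning when the score is below the minimum; it does not fail the run.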
89 changes: 75 additions & 14 deletions README.md
@@ -1,14 +1,75 @@

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
[![](https://i.imgur.com/0pp9lMS.png?1)](https://github.com/microsoft/hummingbird/)

# Hummingbird

![](https://github.com/microsoft/hummingbird/workflows/Python%20application/badge.svg?branch=develop)

## Introduction
*Hummingbird* converts trained traditional Machine Learning models into [PyTorch](https://pytorch.org/). Once in the PyTorch format, <!--you can further convert to [ONNX](https://github.com/onnx/onnx) or [TorchScript](https://pytorch.org/docs/stable/jit.html), and --> you can run the models on GPU for high performance native scoring. For full details, see [our paper](https://scnakandala.github.io/papers/TR_2020_Hummingbird.pdf).

Currently we support [these](https://github.com/microsoft/hummingbird/blob/develop/hummingbird/_supported_operators.py#L26) tree-based classifiers and regressors. These models include
[scikit-learn](https://scikit-learn.org/stable/) models such as Decision Trees and Random Forest, and also [LightGBM](https://github.com/Microsoft/LightGBM) and [XGBoost](https://github.com/dmlc/xgboost) Classifiers/Regressors.

## Installation

This was tested on Python 3.7 on a Linux machine.
```
mkdir hummingbird
cd hummingbird
git clone https://github.com/microsoft/hummingbird.git .
python setup.py install
```

## Examples

See the [notebooks](notebooks) section for examples that demonstrate use and speedups.

In general, the syntax is very similar to [skl2onnx](https://github.com/onnx/sklearn-onnx), as hummingbird started as a fork of that project.

```python
import torch
import numpy as np
import lightgbm as lgb
from hummingbird import convert_lightgbm

# Create some random data for binary classification
num_classes = 2
X = np.array(np.random.rand(100000, 28), dtype=np.float32)
y = np.random.randint(num_classes, size=100000)

# Create and train a model (LightGBM in this case)
model = lgb.LGBMClassifier()
model.fit(X, y)

# Use Hummingbird to convert the model to pytorch
pytorch_model = convert_lightgbm(model)

# Run Hummingbird on CPU
pytorch_model.to('cpu')
hb_cpu = pytorch_model(torch.from_numpy(X))

# Run Hummingbird on GPU
pytorch_model.to('cuda')
hb_gpu = pytorch_model(torch.from_numpy(X).to('cuda'))
```
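
The GPU lines in the example above assume that a CUDA device is available. A minimal sketch of choosing the device at run time is shown below; the variable name `device` is an illustrative assumption, and the commented lines only indicate how the example's own calls might be adapted.

```python
# Pick a device at run time instead of hard-coding 'cuda'.
try:
    import torch

    cuda_available = torch.cuda.is_available()
except ImportError:
    # PyTorch is not installed, so nothing can run on a GPU.
    cuda_available = False

device = "cuda" if cuda_available else "cpu"
print(f"Scoring would run on: {device}")

# With the example's objects in scope, the two calls would become:
#   pytorch_model.to(device)
#   hb_out = pytorch_model(torch.from_numpy(X).to(device))
```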

# Contributing

We welcome contributions! Please see the guide on [Contributing](CONTRIBUTING.md).

Also, see our [roadmap](wiki/Roadmap-for-Upcoming-Features-and-Support) of planned features.

# Community

Join our community! *gitter badge here*

For more formal enquiries, you can [contact us](mailto:hummingbird-dev@microsoft.com).

# Authors

* Supun Nakandala
* Matteo Interlandi
* Karla Saur

# License
[MIT License](LICENSE)
Binary file added doc/html/hummingbird-logo.png