452 changes: 124 additions & 328 deletions .gitignore

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,8 @@
MD013:
  code_blocks: false
  headers: false
  line_length: 120
  tables: false

MD046:
  style: fenced
69 changes: 69 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,69 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-ast
      - id: check-builtin-literals
      - id: check-docstring-first
      - id: check-merge-conflict
      - id: check-yaml
      - id: check-toml
      - id: debug-statements
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.3.1
    hooks:
      - id: pyupgrade
        args: ["--py37-plus"]
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        args: [--safe]
  - repo: https://github.com/asottile/blacken-docs
    rev: 1.13.0
    hooks:
      - id: blacken-docs
        additional_dependencies: [black==23.1]
  # - repo: https://github.com/pre-commit/pygrep-hooks
  #   rev: v1.10.0
  #   hooks:
  #     - id: rst-backticks
  - repo: https://github.com/tox-dev/pyproject-fmt
    rev: "0.9.2"
    hooks:
      - id: pyproject-fmt
  # - repo: https://github.com/PyCQA/flake8
  #   rev: 6.0.0
  #   hooks:
  #     - id: flake8
  #       additional_dependencies:
  #         - flake8-bugbear==23.3.12
  #         - flake8-comprehensions==3.11.1
  #         - flake8-pytest-style==1.7.2
  #         - flake8-spellcheck==0.28
  #         - flake8-unused-arguments==0.0.13
  #         - flake8-noqa==1.3.1
  #         - pep8-naming==0.13.3
  #         - flake8-pyproject==1.2.3
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: "v2.7.1"
    hooks:
      - id: prettier
        additional_dependencies:
          - prettier@2.7.1
          - "@prettier/plugin-xml@2.2"
        args: ["--print-width=120", "--prose-wrap=always"]
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.33.0
    hooks:
      - id: markdownlint
  - repo: meta
    hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes
15 changes: 15 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,15 @@
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3"
python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
sphinx:
  builder: html
  configuration: docs/conf.py
  fail_on_warning: true
83 changes: 59 additions & 24 deletions README.md
@@ -1,33 +1,68 @@
-# Project
-
-> This repo has been populated by an initial template to help get you started. Please
-> make sure to update the content to build a great experience for community-building.
-
-As the maintainer of this project, please make a few updates:
-
-- Improving this README.MD file to provide a great experience
-- Updating SUPPORT.MD with content about this project's support experience
-- Understanding the security reporting process in SECURITY.MD
-- Remove this section from the README
-
-## Contributing
-
-This project welcomes contributions and suggestions. Most contributions require you to agree to a
-Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
-
-When you submit a pull request, a CLA bot will automatically determine whether you need to provide
-a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
-provided by the bot. You will only need to do this once across all repos using our CLA.
-
-This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
-For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
-contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-
-## Trademarks
-
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
-trademarks or logos is subject to and must follow
-[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
-Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
+# Batch Inference Toolkit
+
+Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple users, executes the model, un-batches the output tensors, and returns each result to the corresponding user. This improves system throughput because of better cache locality. The entire process is transparent to developers.
+
+## Installation
+
+**Install from Pip** _(Coming Soon)_
+
+```bash
+python -m pip install batch-inference --upgrade
+```
+
+**Build and Install from Source** _(for developers)_
+
+```bash
+git clone https://github.com/microsoft/batch-inference.git
+cd batch-inference
+python -m pip install -e .[docs,testing]
+
+# if you want to format the code before commit
+pip install pre-commit
+pre-commit install
+
+# run unittests
+python -m unittest discover tests
+```
+
+## Example
+
+```python
+import threading
+import numpy as np
+from batch_inference import batching
+
+
+@batching(max_batch_size=32)
+class MyModel:
+    def __init__(self, k, n):
+        self.weights = np.random.randn(k, n).astype("f")
+
+    # x: [batch_size, m, k], self.weights: [k, n]
+    def predict_batch(self, x):
+        y = np.matmul(x, self.weights)
+        return y
+
+
+with MyModel.host(3, 3) as host:
+    def send_requests():
+        for _ in range(0, 10):
+            x = np.random.randn(1, 3, 3).astype("f")
+            y = host.predict(x)
+
+    threads = [threading.Thread(target=send_requests) for _ in range(0, 32)]
+    [th.start() for th in threads]
+    [th.join() for th in threads]
+```
+
+## Build the Docs
+
+Run the following commands and open `docs/_build/html/index.html` in a browser.
+
+```bash
+pip install sphinx myst-parser sphinx-rtd-theme sphinxemoji
+cd docs/
+
+make html        # for Linux
+.\make.bat html  # for Windows
+```
25 changes: 0 additions & 25 deletions SUPPORT.md

This file was deleted.

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
78 changes: 78 additions & 0 deletions docs/README.rst
@@ -0,0 +1,78 @@
=============================
Batch Inference Toolkit
=============================

Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple users, executes the model, un-batches the output tensors, and returns each result to the corresponding user. This improves system throughput because of better cache locality. The entire process is transparent to developers.

.. figure:: figures/batching_overview.png
   :width: 500
   :align: center
   :alt: How Batching Inference Works

Installation
============================

**Install from Pip** *(Coming Soon)*

.. code:: bash

   python -m pip install batch-inference --upgrade

**Build and Install from Source**

.. code:: bash

   git clone https://github.com/microsoft/batch-inference.git
   cd batch-inference
   python -m pip install -e .

   # if you want to format the code before commit
   pip install pre-commit
   pre-commit install

   # run unittests
   python -m unittest discover tests

Example
============================

.. code:: python

   import threading
   import numpy as np
   from batch_inference import batching


   @batching(max_batch_size=32)
   class MyModel:
       def __init__(self, k, n):
           self.weights = np.random.randn(k, n).astype("f")

       # x: [batch_size, m, k], self.weights: [k, n]
       def predict_batch(self, x):
           y = np.matmul(x, self.weights)
           return y


   with MyModel.host(3, 3) as host:
       def send_requests():
           for _ in range(0, 10):
               x = np.random.randn(1, 3, 3).astype("f")
               y = host.predict(x)

       threads = [threading.Thread(target=send_requests) for _ in range(0, 32)]
       [th.start() for th in threads]
       [th.join() for th in threads]

Build the Docs
=============================

Run the following commands and open ``docs/_build/html/index.html`` in a browser.

.. code:: bash

   pip install sphinx myst-parser sphinx-rtd-theme sphinxemoji
   cd docs/

   make html        # for Linux
   .\make.bat html  # for Windows
10 changes: 10 additions & 0 deletions docs/batcher/bucket_seq_batcher.rst
@@ -0,0 +1,10 @@
==========================
Bucket Sequence Batcher
==========================

Similar to ``SeqBatcher``, ``BucketSeqBatcher`` provides batching support for sequence inputs of varying lengths. The difference is that it groups sequences of similar lengths into the same batch, instead of putting all input sequences into a single batch, to reduce the padding cost. This is useful when sequence lengths differ significantly, i.e., when some sequences are short but others are very long.

The following example defines 4 buckets to accommodate sequences of different lengths: ``<=1024``, ``(1024, 2048]``, ``(2048, 4096]`` and ``>4096``. The ``BucketSeqBatcher`` will sort input sequences by length, put them into the corresponding buckets, and then batch sequences within the same bucket. For example, a sequence of length 2000 goes into the 2nd bucket, so it won't be batched with a sequence of length 500, which is in the 1st bucket.

.. literalinclude:: ./bucket_seq_batcher_example.py
:language: python
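
The bucket boundaries above behave as inclusive upper bounds. As an illustration of that routing rule only (not the library's actual implementation), the bucket index for a given sequence length can be computed with `bisect`:

```python
import bisect

# Illustration of the bucket routing described above; not library code.
# Boundaries act as inclusive upper bounds: <=1024, (1024, 2048], (2048, 4096], >4096.
def bucket_index(length, buckets=(1024, 2048, 4096)):
    return bisect.bisect_left(buckets, length)

assert bucket_index(500) == 0   # 1st bucket, <=1024
assert bucket_index(2000) == 1  # 2nd bucket, (1024, 2048]
assert bucket_index(5000) == 3  # last bucket, >4096
```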
22 changes: 22 additions & 0 deletions docs/batcher/bucket_seq_batcher_example.py
@@ -0,0 +1,22 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

from batch_inference import ModelHost
from batch_inference.batcher import BucketSeqBatcher


class MyModel:
    def __init__(self):
        pass

    # input: [batch_size, n]
    def predict_batch(self, seq):
        res = seq
        return res


model_host = ModelHost(
    MyModel,
    batcher=BucketSeqBatcher(padding_tokens=[0, 0], buckets=[1024, 2048, 4096]),
    max_batch_size=32,
)()
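
For context, a hypothetical way to exercise this host, assuming the instantiated host exposes the same `predict()` call shown in the README example (the `start()`/`stop()` lifecycle calls here are assumptions as well):

```python
import numpy as np

# Hypothetical usage sketch; predict() mirrors the README example,
# start()/stop() are assumed lifecycle calls.
model_host.start()
short_seq = np.zeros((1, 500), dtype=np.int64)   # would land in the <=1024 bucket
long_seq = np.zeros((1, 2000), dtype=np.int64)   # would land in the (1024, 2048] bucket
out = model_host.predict(short_seq)              # only batched with similar lengths
out = model_host.predict(long_seq)
model_host.stop()
```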
8 changes: 8 additions & 0 deletions docs/batcher/concat_batcher.rst
@@ -0,0 +1,8 @@
==========================
Concatenate Batcher
==========================

The ``ConcatBatcher`` simply concatenates input numpy arrays into larger ones. It requires the input arrays to have compatible shapes. No padding is performed.

.. literalinclude:: ./concat_batcher_example.py
:language: python
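
To make "compatible shapes" concrete, here is a plain-numpy illustration, independent of the library, of the batch-dimension concatenation described above (the axis-0 choice is an assumption based on the `[batch_size, ...]` shapes in the example):

```python
import numpy as np

# Two requests with batch sizes 1 and 2; all non-batch dimensions must match.
a = np.ones((1, 4, 3), dtype="f")
b = np.ones((2, 4, 3), dtype="f")

batched = np.concatenate([a, b], axis=0)  # shape (3, 4, 3); no padding required
assert batched.shape == (3, 4, 3)
```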
24 changes: 24 additions & 0 deletions docs/batcher/concat_batcher_example.py
@@ -0,0 +1,24 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np

from batch_inference import ModelHost
from batch_inference.batcher import ConcatBatcher


class MyModel:
    def __init__(self):
        self.op = np.matmul

    # x.shape: [batch_size, m, k], y.shape: [batch_size, k, n]
    def predict_batch(self, x, y):
        res = self.op(x, y)
        return res


model_host = ModelHost(
    MyModel,
    batcher=ConcatBatcher(),
    max_batch_size=32,
)()
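
And a hypothetical request against this host, under the same caveat as the bucketed example above (a `predict()` mirroring the README example is assumed):

```python
import numpy as np

# Hypothetical single request; concurrent requests would be concatenated
# along the batch dimension before predict_batch(x, y) runs.
x = np.random.randn(1, 4, 5).astype("f")  # [batch_size, m, k]
y = np.random.randn(1, 5, 2).astype("f")  # [batch_size, k, n]
res = model_host.predict(x, y)            # expected result shape: [1, 4, 2]
```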