452 changes: 124 additions & 328 deletions .gitignore

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,8 @@
MD013:
  code_blocks: false
  headers: false
  line_length: 120
  tables: false

MD046:
  style: fenced
69 changes: 69 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,69 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-ast
      - id: check-builtin-literals
      - id: check-docstring-first
      - id: check-merge-conflict
      - id: check-yaml
      - id: check-toml
      - id: debug-statements
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.3.1
    hooks:
      - id: pyupgrade
        args: ["--py37-plus"]
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        args: [--safe]
  - repo: https://github.com/asottile/blacken-docs
    rev: 1.13.0
    hooks:
      - id: blacken-docs
        additional_dependencies: [black==23.1]
  # - repo: https://github.com/pre-commit/pygrep-hooks
  #   rev: v1.10.0
  #   hooks:
  #     - id: rst-backticks
  - repo: https://github.com/tox-dev/pyproject-fmt
    rev: "0.9.2"
    hooks:
      - id: pyproject-fmt
  # - repo: https://github.com/PyCQA/flake8
  #   rev: 6.0.0
  #   hooks:
  #     - id: flake8
  #       additional_dependencies:
  #         - flake8-bugbear==23.3.12
  #         - flake8-comprehensions==3.11.1
  #         - flake8-pytest-style==1.7.2
  #         - flake8-spellcheck==0.28
  #         - flake8-unused-arguments==0.0.13
  #         - flake8-noqa==1.3.1
  #         - pep8-naming==0.13.3
  #         - flake8-pyproject==1.2.3
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: "v2.7.1"
    hooks:
      - id: prettier
        additional_dependencies:
          - prettier@2.7.1
          - "@prettier/plugin-xml@2.2"
        args: ["--print-width=120", "--prose-wrap=always"]
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.33.0
    hooks:
      - id: markdownlint
  - repo: meta
    hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes
15 changes: 15 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,15 @@
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3"
python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
sphinx:
  builder: html
  configuration: docs/conf.py
  fail_on_warning: true
83 changes: 59 additions & 24 deletions README.md
@@ -1,33 +1,68 @@
-# Project
-
-> This repo has been populated by an initial template to help get you started. Please
-> make sure to update the content to build a great experience for community-building.
-
-As the maintainer of this project, please make a few updates:
-
-- Improving this README.MD file to provide a great experience
-- Updating SUPPORT.MD with content about this project's support experience
-- Understanding the security reporting process in SECURITY.MD
-- Remove this section from the README
-
-## Contributing
-
-This project welcomes contributions and suggestions. Most contributions require you to agree to a
-Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
-
-When you submit a pull request, a CLA bot will automatically determine whether you need to provide
-a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
-provided by the bot. You will only need to do this once across all repos using our CLA.
-
-This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
-For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
-contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-
-## Trademarks
-
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
-trademarks or logos is subject to and must follow
-[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
-Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
+# Batch Inference Toolkit
+
+Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple users, executes the model, un-batches the output tensors, and returns each result to the corresponding user. This improves system throughput because of better cache locality. The entire process is transparent to developers.
+
+## Installation
+
+**Install from Pip** _(Coming Soon)_
+
+```bash
+python -m pip install batch-inference --upgrade
+```
+
+**Build and Install from Source** _(for developers)_
+
+```bash
+git clone https://github.com/microsoft/batch-inference.git
+cd batch-inference
+python -m pip install -e .[docs,testing]
+
+# if you want to format the code before commit
+pip install pre-commit
+pre-commit install
+
+# run unittests
+python -m unittest discover tests
+```
+
+## Example
+
+```python
+import threading
+import numpy as np
+from batch_inference import batching
+
+
+@batching(max_batch_size=32)
+class MyModel:
+    def __init__(self, k, n):
+        self.weights = np.random.randn(k, n).astype("f")
+
+    # x: [batch_size, m, k], self.weights: [k, n]
+    def predict_batch(self, x):
+        y = np.matmul(x, self.weights)
+        return y
+
+
+with MyModel.host(3, 3) as host:
+    def send_requests():
+        for _ in range(0, 10):
+            x = np.random.randn(1, 3, 3).astype("f")
+            y = host.predict(x)
+
+    threads = [threading.Thread(target=send_requests) for _ in range(0, 32)]
+    [th.start() for th in threads]
+    [th.join() for th in threads]
+```
+
+## Build the Docs
+
+Run the following commands and open `docs/_build/html/index.html` in a browser.
+
+```bash
+pip install sphinx myst-parser sphinx-rtd-theme sphinxemoji
+cd docs/
+
+make html        # for Linux
+.\make.bat html  # for Windows
+```
25 changes: 0 additions & 25 deletions SUPPORT.md

This file was deleted.

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
78 changes: 78 additions & 0 deletions docs/README.rst
@@ -0,0 +1,78 @@
=============================
Batch Inference Toolkit
=============================

Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple users, executes the model, un-batches the output tensors, and returns each result to the corresponding user. This improves system throughput because of better cache locality. The entire process is transparent to developers.

.. figure:: figures/batching_overview.png
   :width: 500
   :align: center
   :alt: How Batching Inference Works

Installation
============================

**Install from Pip** *(Coming Soon)*

.. code:: bash

   python -m pip install batch-inference --upgrade

**Build and Install from Source**

.. code:: bash

   git clone https://github.com/microsoft/batch-inference.git
   cd batch-inference
   python -m pip install -e .

   # if you want to format the code before commit
   pip install pre-commit
   pre-commit install

   # run unittests
   python -m unittest discover tests

Example
============================

.. code:: python

   import threading
   import numpy as np
   from batch_inference import batching


   @batching(max_batch_size=32)
   class MyModel:
       def __init__(self, k, n):
           self.weights = np.random.randn(k, n).astype("f")

       # x: [batch_size, m, k], self.weights: [k, n]
       def predict_batch(self, x):
           y = np.matmul(x, self.weights)
           return y


   with MyModel.host(3, 3) as host:
       def send_requests():
           for _ in range(0, 10):
               x = np.random.randn(1, 3, 3).astype("f")
               y = host.predict(x)

       threads = [threading.Thread(target=send_requests) for _ in range(0, 32)]
       [th.start() for th in threads]
       [th.join() for th in threads]

Build the Docs
=============================

Run the following commands and open ``docs/_build/html/index.html`` in a browser.

.. code:: bash

   pip install sphinx myst-parser sphinx-rtd-theme sphinxemoji
   cd docs/

   make html        # for Linux
   .\make.bat html  # for Windows
10 changes: 10 additions & 0 deletions docs/batcher/bucket_seq_batcher.rst
@@ -0,0 +1,10 @@
==========================
Bucket Sequence Batcher
==========================

Similar to ``SeqBatcher``, ``BucketSeqBatcher`` provides batching support for sequence inputs of varying lengths. The difference is that it groups sequences of similar lengths into the same batch, instead of putting all input sequences into a single batch, to reduce the padding cost. This is useful when sequence lengths differ significantly, i.e., when some sequences are short but others are very long.

The following example defines 4 buckets to accommodate sequences of different lengths: ``<=1024``, ``(1024, 2048]``, ``(2048, 4096]`` and ``>4096``. The ``BucketSeqBatcher`` will sort input sequences by length, put them into the corresponding buckets, and then batch sequences within the same bucket. For example, a sequence of length 2000 goes into the 2nd bucket, so it won't be batched with a sequence of length 500, which is in the 1st bucket.

.. literalinclude:: ./bucket_seq_batcher_example.py
:language: python
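
The bucket boundaries above behave as inclusive upper bounds. As an illustration of that routing rule only (not the library's actual implementation), the bucket index for a given sequence length can be computed with `bisect`:

```python
import bisect

# Illustration of the bucket routing described above; not library code.
# Boundaries act as inclusive upper bounds: <=1024, (1024, 2048], (2048, 4096], >4096.
def bucket_index(length, buckets=(1024, 2048, 4096)):
    return bisect.bisect_left(buckets, length)

assert bucket_index(500) == 0   # 1st bucket, <=1024
assert bucket_index(2000) == 1  # 2nd bucket, (1024, 2048]
assert bucket_index(5000) == 3  # last bucket, >4096
```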
22 changes: 22 additions & 0 deletions docs/batcher/bucket_seq_batcher_example.py
@@ -0,0 +1,22 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

from batch_inference import ModelHost
from batch_inference.batcher import BucketSeqBatcher


class MyModel:
    def __init__(self):
        pass

    # input: [batch_size, n]
    def predict_batch(self, seq):
        res = seq
        return res


model_host = ModelHost(
    MyModel,
    batcher=BucketSeqBatcher(padding_tokens=[0, 0], buckets=[1024, 2048, 4096]),
    max_batch_size=32,
)()
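
For context, a hypothetical way to exercise this host, assuming the instantiated host exposes the same `predict()` call shown in the README example (the `start()`/`stop()` lifecycle calls here are assumptions as well):

```python
import numpy as np

# Hypothetical usage sketch; predict() mirrors the README example,
# start()/stop() are assumed lifecycle calls.
model_host.start()
short_seq = np.zeros((1, 500), dtype=np.int64)   # would land in the <=1024 bucket
long_seq = np.zeros((1, 2000), dtype=np.int64)   # would land in the (1024, 2048] bucket
out = model_host.predict(short_seq)              # only batched with similar lengths
out = model_host.predict(long_seq)
model_host.stop()
```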
8 changes: 8 additions & 0 deletions docs/batcher/concat_batcher.rst
@@ -0,0 +1,8 @@
==========================
Concatenate Batcher
==========================

The ``ConcatBatcher`` simply concatenates input numpy arrays into larger ones. It requires the input arrays to have compatible shapes. No padding is performed.

.. literalinclude:: ./concat_batcher_example.py
:language: python
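
To make "compatible shapes" concrete, here is a plain-numpy illustration, independent of the library, of the batch-dimension concatenation described above (the axis-0 choice is an assumption based on the `[batch_size, ...]` shapes in the example):

```python
import numpy as np

# Two requests with batch sizes 1 and 2; all non-batch dimensions must match.
a = np.ones((1, 4, 3), dtype="f")
b = np.ones((2, 4, 3), dtype="f")

batched = np.concatenate([a, b], axis=0)  # shape (3, 4, 3); no padding required
assert batched.shape == (3, 4, 3)
```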
24 changes: 24 additions & 0 deletions docs/batcher/concat_batcher_example.py
@@ -0,0 +1,24 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np

from batch_inference import ModelHost
from batch_inference.batcher import ConcatBatcher


class MyModel:
    def __init__(self):
        self.op = np.matmul

    # x.shape: [batch_size, m, k], y.shape: [batch_size, k, n]
    def predict_batch(self, x, y):
        res = self.op(x, y)
        return res


model_host = ModelHost(
    MyModel,
    batcher=ConcatBatcher(),
    max_batch_size=32,
)()
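
And a hypothetical request against this host, under the same caveat as the bucketed example above (a `predict()` mirroring the README example is assumed):

```python
import numpy as np

# Hypothetical single request; concurrent requests would be concatenated
# along the batch dimension before predict_batch(x, y) runs.
x = np.random.randn(1, 4, 5).astype("f")  # [batch_size, m, k]
y = np.random.randn(1, 5, 2).astype("f")  # [batch_size, k, n]
res = model_host.predict(x, y)            # expected result shape: [1, 4, 2]
```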