
Feature: Automatic quantization using SeeDot (#88)

* Added SeeDot compiler

* Adding antlr jar file

* Added Predictor project

* Tested on cr-binary

* Updating .proj file

* Updating .gitignore

* Adding Streamer project

* Added code to read profile file

* Updated .pyproj file

* Fixed issue in visitBop2

* Fixed HLS codegen

* Refactored Predictor project

* Deleted files

* Few more deletionS

* Updated gitignore

* Added 2 line copyright prefix

* Updated SeeDot.Antlr and SeeDot.AST

* More changes to the structure of the repo

* Small fix

* More changes

* Minor changes'

* Minor

* More changes

* Minor

* Minor

* Major changes in IRBuilder.py

* Few more

* More changes

* More changes

* First iteration done. Tested on previous commit

* Minor

* Minor change in library.h

* Codegen change

* Minor

* Changes to msbuild path

* Supporting libsvm format

* Minor

* Updated the Predictor project to remove dirs

* Removed windows.h in Predictor

* Added Makefile

* Updated scripts to work with Makefile

* Added support to move intermediate files to scratch directory

* Dir restructuring

* Minor

* PEP8 formatting

* Minor

* Minor 2

* Removed Lenet

* Temp removing files

* New naming convention

* Removed irGen

* Tested on bonsai

* Removed Windows-specific files

* Removed HLS and verilog
:

* Removed workers parameter

* More comments

* Comments and cleaning

* Moved SeeDot

* Updated edgeml refs

* removed examples/seedot

* Deleting init.py

* Updated dir structure

* Rename

* Minor

* Removing requirements

* Updated README. Expects class IDs from 0

* Added support for zero index labels

* Updated README

* Minor

* Readme update:

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Minor change for Bonsai

* Update README.md

* Directory re-org

* Adding pre-loaded models

* Update README.md

* Update README.md

* Minor

* Update README.md

* Update README.md

* Create README.md

* Update README.md

* Update README.md

* Updated python package requirements
sridhargopinath authored and harsha-simhadri committed May 28, 2019
1 parent 6b19e4f commit 84bbb4628ff0590acfefc5939c377b880a24aa69
Showing with 10,159 additions and 0 deletions.
  1. +132 −0 Tools/SeeDot/README.md
  2. +94 −0 Tools/SeeDot/SeeDot.py
  3. +44 −0 Tools/SeeDot/seedot/Predictor/Makefile
  4. +143 −0 Tools/SeeDot/seedot/Predictor/bonsai_float.cpp
  5. +76 −0 Tools/SeeDot/seedot/Predictor/bonsai_float_model.h
  6. +18 −0 Tools/SeeDot/seedot/Predictor/datatypes.h
  7. +540 −0 Tools/SeeDot/seedot/Predictor/library.cpp
  8. +40 −0 Tools/SeeDot/seedot/Predictor/library.h
  9. +218 −0 Tools/SeeDot/seedot/Predictor/main.cpp
  10. +10 −0 Tools/SeeDot/seedot/Predictor/predictors.h
  11. +50 −0 Tools/SeeDot/seedot/Predictor/profile.cpp
  12. +13 −0 Tools/SeeDot/seedot/Predictor/profile.h
  13. +137 −0 Tools/SeeDot/seedot/Predictor/protonn_float.cpp
  14. +47 −0 Tools/SeeDot/seedot/Predictor/protonn_float_model.h
  15. +161 −0 Tools/SeeDot/seedot/Predictor/seedot_fixed.cpp
  16. +68 −0 Tools/SeeDot/seedot/Predictor/seedot_fixed_model.h
  17. +1 −0 Tools/SeeDot/seedot/arduino/README.md
  18. +163 −0 Tools/SeeDot/seedot/arduino/arduino.ino
  19. +36 −0 Tools/SeeDot/seedot/arduino/config.h
  20. +142 −0 Tools/SeeDot/seedot/arduino/floating-point/bonsai_float.cpp
  21. +75 −0 Tools/SeeDot/seedot/arduino/floating-point/protonn_float.cpp
  22. +542 −0 Tools/SeeDot/seedot/arduino/library.h
  23. +138 −0 Tools/SeeDot/seedot/arduino/model.h
  24. +91 −0 Tools/SeeDot/seedot/arduino/predict.cpp
  25. +8 −0 Tools/SeeDot/seedot/arduino/predict.h
  26. +48 −0 Tools/SeeDot/seedot/common.py
  27. +2 −0 Tools/SeeDot/seedot/compiler/__init__.py
  28. +2 −0 Tools/SeeDot/seedot/compiler/antlr/__init__.py
  29. +91 −0 Tools/SeeDot/seedot/compiler/antlr/seedot.g4
  30. +61 −0 Tools/SeeDot/seedot/compiler/antlr/seedot.tokens
  31. +196 −0 Tools/SeeDot/seedot/compiler/antlr/seedotLexer.py
  32. +61 −0 Tools/SeeDot/seedot/compiler/antlr/seedotLexer.tokens
  33. +1,065 −0 Tools/SeeDot/seedot/compiler/antlr/seedotParser.py
  34. +118 −0 Tools/SeeDot/seedot/compiler/antlr/seedotVisitor.py
  35. +2 −0 Tools/SeeDot/seedot/compiler/ast/__init__.py
  36. +146 −0 Tools/SeeDot/seedot/compiler/ast/ast.py
  37. +109 −0 Tools/SeeDot/seedot/compiler/ast/astBuilder.py
  38. +47 −0 Tools/SeeDot/seedot/compiler/ast/astVisitor.py
  39. +104 −0 Tools/SeeDot/seedot/compiler/ast/printAST.py
  40. +2 −0 Tools/SeeDot/seedot/compiler/codegen/__init__.py
  41. +193 −0 Tools/SeeDot/seedot/compiler/codegen/arduino.py
  42. +293 −0 Tools/SeeDot/seedot/compiler/codegen/codegenBase.py
  43. +103 −0 Tools/SeeDot/seedot/compiler/codegen/x86.py
  44. +83 −0 Tools/SeeDot/seedot/compiler/compiler.py
  45. +2 −0 Tools/SeeDot/seedot/compiler/converter/__init__.py
  46. +396 −0 Tools/SeeDot/seedot/compiler/converter/bonsai.py
  47. +58 −0 Tools/SeeDot/seedot/compiler/converter/converter.py
  48. +380 −0 Tools/SeeDot/seedot/compiler/converter/protonn.py
  49. +553 −0 Tools/SeeDot/seedot/compiler/converter/util.py
  50. +2 −0 Tools/SeeDot/seedot/compiler/ir/__init__.py
  51. +324 −0 Tools/SeeDot/seedot/compiler/ir/ir.py
  52. +1,571 −0 Tools/SeeDot/seedot/compiler/ir/irBuilder.py
  53. +180 −0 Tools/SeeDot/seedot/compiler/ir/irUtil.py
  54. +329 −0 Tools/SeeDot/seedot/compiler/type.py
  55. +404 −0 Tools/SeeDot/seedot/main.py
  56. +130 −0 Tools/SeeDot/seedot/predictor.py
  57. +92 −0 Tools/SeeDot/seedot/util.py
  58. +23 −0 Tools/SeeDot/seedot/writer.py
  59. +1 −0 tf/requirements-cpu.txt
  60. +1 −0 tf/requirements-gpu.txt
@@ -0,0 +1,132 @@
# SeeDot

SeeDot is an automatic quantization tool that generates efficient machine learning (ML) inference code for IoT devices.

### **Overview**

ML models are usually expressed in floating-point, and IoT devices typically lack hardware support for floating-point arithmetic. Hence, running such ML models on IoT devices involves simulating floating-point arithmetic in software, which is very inefficient. SeeDot addresses this issue by generating fixed-point code with only integer operations. To enable this, SeeDot takes as input trained floating-point models (like [Bonsai](https://github.com/microsoft/EdgeML/blob/master/docs/publications/Bonsai.pdf) or [ProtoNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/ProtoNN.pdf)) and generates efficient fixed-point code that can run on microcontrollers. The SeeDot compiler uses novel compilation techniques like automatically inferring certain parameters used in the fixed-point code, optimized exponentiation computation, etc. With these techniques, the generated fixed-point code has comparable classification accuracy and performs significantly faster than the floating-point code.
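As a rough illustration of what "fixed-point code with only integer operations" means, the sketch below quantizes a few floating-point values to integers and computes a dot product using integer multiplies and shifts. It is a minimal example under stated assumptions, not SeeDot's actual algorithm: the helper names and the choice of 8 fractional bits are hypothetical, and SeeDot infers such parameters automatically.

```python
# Minimal fixed-point sketch. All names and FRAC_BITS are illustrative
# assumptions; this is not the code that SeeDot generates.

def to_fixed(x, frac_bits):
    """Quantize a float to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float (for checking only)."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values using integer operations only."""
    # The raw product carries 2 * frac_bits fractional bits, so shift right
    # by frac_bits to return to the original scale.
    return (a * b) >> frac_bits


def fixed_dot(xs, ws, frac_bits):
    """Integer-only dot product of two fixed-point vectors."""
    return sum(fixed_mul(x, w, frac_bits) for x, w in zip(xs, ws))


FRAC_BITS = 8  # hypothetical scale chosen by hand for this example
weights = [0.5, -0.25, 0.125]
features = [1.0, 2.0, 4.0]

q_w = [to_fixed(w, FRAC_BITS) for w in weights]
q_x = [to_fixed(x, FRAC_BITS) for x in features]

exact = sum(w * x for w, x in zip(weights, features))
approx = to_float(fixed_dot(q_x, q_w, FRAC_BITS), FRAC_BITS)
print(exact, approx)
```

The values here are exactly representable with 8 fractional bits, so the two results agree. In general the fixed-point result differs slightly from the floating-point one, and choosing the scale well is what keeps that difference small.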

To know more about SeeDot, please refer to our publication [here](https://www.microsoft.com/en-us/research/publication/compiling-kb-sized-machine-learning-models-to-constrained-hardware/).

This document describes the tool usage with an example.

### **Software requirements**

1. [**Python 3**](https://www.python.org/) with the following packages:
- **[ANTLR4](http://www.antlr.org/)** (antlr4-python3-runtime; tested with version 4.7.2)
- **[Numpy](http://www.numpy.org/)** (tested with version 1.16.2)
- **[Scikit-learn](https://scikit-learn.org/)** (tested with version 0.20.3)
2. Linux packages:
- **[gcc](https://www.gnu.org/software/gcc/)** (tested with version 7.3.0)
- **[make](https://www.gnu.org/software/make/)** (tested with version 4.1)

### **Usage**

SeeDot is invoked through the **`SeeDot.py`** script. The arguments for the script are supplied as follows:

```
usage: SeeDot.py [-h] [-a] --train --test --model [--tempdir] [-o]
optional arguments:
  -h, --help     show this help message and exit
  -a , --algo    Algorithm to run ('bonsai' or 'protonn')
  --train        Training set file
  --test         Testing set file
  --model        Directory containing trained model (output from
                 Bonsai/ProtoNN trainer)
  --tempdir      Scratch directory for intermediate files
  -o , --outdir  Directory to output the generated Arduino sketch
```

An example invocation is as follows:
```
python SeeDot.py -a bonsai --train path/to/train.npy --test path/to/test.npy --model path/to/Bonsai/model
```

SeeDot expects the `train` and `test` data files in a specific format. Each data file should have the shape `[numberOfDataPoints, numberOfFeatures + 1]`, where the class label is in the first column. The tool currently supports the following formats for the data files: numpy arrays (.npy), tab-separated values (.tsv), comma-separated values (.csv), and libsvm (.txt).
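To make the expected layout concrete, the sketch below reads a tiny comma-separated sample with two data points and three features, and splits each row into its class label (first column) and its feature values. The helper name `split_labels_features` is hypothetical, and the sample assumes a header-less file; SeeDot's own file reader is not shown here and may behave differently.

```python
import csv
import io


def split_labels_features(rows):
    """Split rows of [label, f1, f2, ...] into labels and feature lists."""
    labels = [int(float(row[0])) for row in rows]
    features = [[float(v) for v in row[1:]] for row in rows]
    return labels, features


# Two data points, three features each: shape [2, 3 + 1].
# The label is in the first column and class IDs start from 0.
sample_csv = "0,0.1,0.2,0.3\n1,0.4,0.5,0.6\n"

rows = list(csv.reader(io.StringIO(sample_csv)))
labels, features = split_labels_features(rows)

print(labels)
print(len(features), len(features[0]))
```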

The path to the trained Bonsai/ProtoNN model is specified in the `--model` argument. After training, the learned parameters are stored in this directory in a specific format. For Bonsai, the learned parameters are `Z`, `W`, `V`, `T`, `Sigma`, `Mean`, and `Std`. For ProtoNN, the learned parameters are `W`, `B`, and `Z`. These parameters can be either numpy arrays (.npy) or plaintext files.
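Before invoking SeeDot, it can help to check that the model directory holds every learned parameter listed above. The sketch below does this under one assumption that is not confirmed by this document: each parameter is stored in a file whose name, without its extension, equals the parameter name (for example `W.npy` or `W`). The helper `missing_params` is hypothetical and is not part of SeeDot.

```python
import os
import tempfile

# Parameter names taken from the paragraph above.
EXPECTED_PARAMS = {
    "bonsai": ["Z", "W", "V", "T", "Sigma", "Mean", "Std"],
    "protonn": ["W", "B", "Z"],
}


def missing_params(model_dir, algo):
    """Return the expected parameter names with no matching file.

    Assumption: a file matches a parameter when its name without the
    extension equals the parameter name.
    """
    present = {os.path.splitext(name)[0] for name in os.listdir(model_dir)}
    return [p for p in EXPECTED_PARAMS[algo] if p not in present]


# Demonstration on a throwaway directory that lacks the Z parameter.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ("W.npy", "B.npy"):
        open(os.path.join(model_dir, name), "wb").close()
    result = missing_params(model_dir, "protonn")
    print(result)
```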

The `tempdir` directory is used to store the intermediate files generated by the compiler. The device-specific fixed-point code is stored in the `outdir` directory.


## Getting started: Quantizing ProtoNN on usps10

To help get started with SeeDot, we provide 1) a pre-loaded fixed-point model, and 2) instructions to generate fixed-point code for the ProtoNN predictor on the **[usps10](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/)** dataset. The process for generating fixed-point code for the Bonsai predictor is similar.

### Pre-loaded model

To make it easy to test the SeeDot-generated code, a ready-to-upload Arduino sketch is provided that can be run on an Arduino device without any changes. The sketch is located at `Tools/SeeDot/seedot/arduino/arduino.ino` and contains a pre-loaded ProtoNN model trained on the usps10 dataset. To upload the sketch to the device, skip steps 1-3 in the guide below and follow [Step 4: Prediction on the device](https://github.com/microsoft/EdgeML/tree/Feature/SeeDot/Tools/SeeDot#step-4-prediction-on-the-device).

### Generating fixed-point code

This process consists of four steps: 1) installing the EdgeML TensorFlow library, 2) training ProtoNN on usps10, 3) quantizing the trained model with SeeDot, and 4) performing prediction on the device.

#### **Step 1: Installing EdgeML TensorFlow library**

1. Clone the EdgeML repository and navigate to the right directory.
```
git clone https://github.com/Microsoft/EdgeML
cd EdgeML/tf/
```

2. Install the EdgeML library.
```
pip install -r requirements-cpu.txt
pip install -e .
```

#### **Step 2: Training ProtoNN on usps10**

1. Navigate to the ProtoNN examples directory.
```
cd examples/ProtoNN
```

2. Fetch usps10 data and create output directory.
```
python fetch_usps.py
python process_usps.py
mkdir usps10/output
```

3. Invoke ProtoNN trainer using the following command.
```
python protoNN_example.py --data-dir ./usps10 --projection-dim 25 --num-prototypes 55 --epochs 100 -sW 0.3 -o usps10/output
```
Training with these settings gives around 90.035% classification accuracy. The trained model is stored in the `output` directory.

More information on using the ProtoNN trainer can be found [here](https://github.com/Microsoft/EdgeML/tree/master/tf/examples/ProtoNN).

#### **Step 3: Quantizing with SeeDot**

1. Navigate to the SeeDot directory and create the output directory.
```
cd ../../../Tools/SeeDot
mkdir arduino
```

2. Invoke SeeDot using the following command.
```
python SeeDot.py -a protonn --train ../../tf/examples/ProtoNN/usps10/train.npy --test ../../tf/examples/ProtoNN/usps10/test.npy --model ../../tf/examples/ProtoNN/usps10/output -o arduino
```

The SeeDot-generated code gives around 89.985% classification accuracy, which is 0.05% lower than the floating-point code. The generated code is stored in the `arduino` folder, which contains the sketch along with two files: `model.h` and `predict.cpp`. `model.h` contains the quantized model and `predict.cpp` contains the inference code.

#### **Step 4: Prediction on the device**

Follow the steps below to perform prediction on the device, where the SeeDot-generated code is run on a single data point stored in the device's flash memory.

1. Open the Arduino sketch file located at `arduino/arduino.ino` in the [Arduino IDE](https://www.arduino.cc/en/main/software).
2. Connect the Arduino microcontroller to the computer and choose the correct board configuration.
3. Upload the sketch to the device.
4. Open the Serial Monitor and select the baud rate specified in the sketch (default is 115200) to monitor the output.
5. The average prediction time is computed every 100 iterations. On an Arduino Uno, the average prediction time is 35991 microseconds.

More device-specific details on extending the Arduino sketch for other use cases can be found in [`arduino/README.md`](https://github.com/microsoft/EdgeML/blob/Feature/SeeDot/Tools/SeeDot/seedot/arduino/README.md).


The above workflow has been tested on Arduino Uno and Arduino MKR1000. It is expected to work on other Arduino devices as well.


Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT license.
@@ -0,0 +1,94 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import argparse
import datetime
from distutils.dir_util import copy_tree
import os
import shutil
import operator
import tempfile
import traceback

import seedot.common as Common
from seedot.main import Main
import seedot.util as Util


class MainDriver:

    def parseArgs(self):
        parser = argparse.ArgumentParser()

        parser.add_argument("-a", "--algo", choices=Common.Algo.All,
                            metavar='', help="Algorithm to run ('bonsai' or 'protonn')")
        parser.add_argument("--train", required=True,
                            metavar='', help="Training set file")
        parser.add_argument("--test", required=True,
                            metavar='', help="Testing set file")
        parser.add_argument("--model", required=True, metavar='',
                            help="Directory containing trained model (output from Bonsai/ProtoNN trainer)")
        # parser.add_argument("-v", "--version", default=Common.Version.Fixed, choices=Common.Version.All, metavar='',
        #                     help="Datatype of the generated code (fixed-point or floating-point)")
        parser.add_argument("--tempdir", metavar='',
                            help="Scratch directory for intermediate files")
        parser.add_argument("-o", "--outdir", metavar='',
                            help="Directory to output the generated Arduino sketch")

        self.args = parser.parse_args()

        # Verify the input files and directory exists
        assert os.path.isfile(self.args.train), "Training set doesn't exist"
        assert os.path.isfile(self.args.test), "Testing set doesn't exist"
        assert os.path.isdir(self.args.model), "Model directory doesn't exist"

        if self.args.tempdir is not None:
            assert os.path.isdir(
                self.args.tempdir), "Scratch directory doesn't exist"
            Common.tempdir = self.args.tempdir
        else:
            Common.tempdir = os.path.join(tempfile.gettempdir(
            ), "SeeDot", datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
            os.makedirs(Common.tempdir, exist_ok=True)

        if self.args.outdir is not None:
            assert os.path.isdir(
                self.args.outdir), "Output directory doesn't exist"
            Common.outdir = self.args.outdir
        else:
            Common.outdir = os.path.join(Common.tempdir, "arduino")
            os.makedirs(Common.outdir, exist_ok=True)

    def checkMSBuildPath(self):
        found = False
        for path in Common.msbuildPathOptions:
            if os.path.isfile(path):
                found = True
                Common.msbuildPath = path

        if not found:
            raise Exception("Msbuild.exe not found at the following locations:\n%s\nPlease change the path and run again" % (
                Common.msbuildPathOptions))

    def run(self):
        if Util.windows():
            self.checkMSBuildPath()

        algo, version, trainingInput, testingInput, modelDir = self.args.algo, Common.Version.Fixed, self.args.train, self.args.test, self.args.model

        print("\n================================")
        print("Executing on %s for Arduino" % (algo))
        print("--------------------------------")
        print("Train file: %s" % (trainingInput))
        print("Test file: %s" % (testingInput))
        print("Model directory: %s" % (modelDir))
        print("================================\n")

        obj = Main(algo, version, Common.Target.Arduino,
                   trainingInput, testingInput, modelDir, None)
        obj.run()

if __name__ == "__main__":
    obj = MainDriver()
    obj.parseArgs()
    obj.run()
@@ -0,0 +1,44 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

CC=g++
CFLAGS= -Wall -p -g -fPIC -O3 -std=c++11

PREDICTOR_INCLUDES = bonsai_float_model.h \
	datatypes.h \
	library.h predictors.h \
	profile.h \
	protonn_float_model.h \
	seedot_fixed_model.h

PREDICTOR_OBJS = bonsai_float.o library.o \
	main.o profile.o \
	protonn_float.o \
	seedot_fixed.o

all: Predictor

Predictor: bonsai_float.o library.o main.o profile.o protonn_float.o seedot_fixed.o
	$(CC) -o $@ $^ $(CFLAGS)

bonsai_float.o: bonsai_float.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

library.o: library.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

main.o: main.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

profile.o: profile.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

protonn_float.o: protonn_float.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

seedot_fixed.o: seedot_fixed.cpp $(PREDICTOR_INCLUDES)
	$(CC) -c -o $@ $(CFLAGS) $<

clean:
	rm -f *.o
	rm -f Predictor
