EMDL

Embedded and mobile deep learning research notes

Docs

Paper

General

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys '16]
An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App '15]
CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]

Quantization

The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
Loss-aware Binarization of Deep Networks [ICLR'17]
Towards the Limit of Network Quantization [ICLR'17]
Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]

Pruning

Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
Pruning Filters for Efficient ConvNets [ICLR'17]
Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
Soft Weight-Sharing for Neural Network Compression [ICLR'17]
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
Dynamic Network Surgery for Efficient DNNs [NIPS'16]
Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]

Low Rank Approximation

Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
Convolutional neural networks with low-rank regularization [arXiv'15]
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]

Guide

Squeezing Deep Learning Into Mobile Phones
Deep Learning – Tutorial and Recent Trends
Efficient Convolutional Neural Network Inference on Mobile GPUs
Deep learning systems, UW course schedule (focused on systems design, not learning)

Code

General

ARM-software/ComputeLibrary: The ARM Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies, Intro
Apple CoreML
Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
Microsoft Embedded Learning Library

OpenCL, Vulkan, RenderScript

SaschaWillems/Vulkan: Examples and demos for the new Vulkan API
ARM-software/vulkan-sdk: ARM Vulkan SDK
alexhultman/libvc: Vulkan Compute for C++ (experimentation project)
Deep Learning in a Single File for Smart Devices — mxnet
TensorFlow Android Camera Demo
bwasti/AICamera: Demonstration of using Caffe2 inside an Android application.
mtmd/Mobile_ConvNet: RenderScript based implementation of Convolutional Neural Networks for Android phones
harvardnlp/nmt-android: Neural Machine Translation on Android
hollance/TensorFlow-iOS-Example: Source code for my blog post "Getting started with TensorFlow on iOS"

Tutorial

ARM® Mali™ GPU OpenCL Developer Guide, pdf
Optimal Compute on ARM Mali™ GPUs
GPU Compute for Mobile Devices
Compute for Mobile Devices: Performance focused
Hands On OpenCL
Adreno OpenCL Programming Guide
Better OpenCL Performance on Qualcomm Adreno GPU

Others

mil-tokyo/webdnn: Fastest DNN Execution Framework on Web Browser

Hardware

GPU

Bifrost GPU architecture and ARM Mali-G71 GPU
Midgard GPU Architecture, ARM Mali-T880 GPU
Mobile GPU market share

Driver

[Adreno] csarron/qcom_vendor_binaries: Common Proprietary Qualcomm Binaries
[Mali] Fevax/vendor_samsung_hero2ltexx: Blobs from s7 Edge G935F
