Final exam project for the CS492 Systems for Machine Learning course, Spring 2020, KAIST.
This is an implementation of the convolution operation using the following optimization strategies:
- Naive quantization: linear quantization using a uniform scale value (a minimal sketch follows this list).
- CPU parallelization: multithreading (pthreads) and AVX instructions.
- GPU parallelization: CUDA.
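As a reference for what uniform-scale linear quantization means here, below is a minimal C sketch of symmetric int8 quantization with a single scale per tensor. The function names and the symmetric (zero-point-free) scheme are illustrative assumptions, not the project's actual code.

```c
/* Minimal sketch of uniform-scale linear quantization, assuming a
 * symmetric scheme (zero-point = 0). Names are illustrative, not the
 * project's actual API. */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize float data to int8 using one scale for the whole tensor;
 * returns the scale so the caller can dequantize later. */
static float quantize_int8(const float *src, int8_t *dst, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(src[i]);
        if (a > max_abs) max_abs = a;
    }
    float scale = max_abs / 127.0f;   /* maps [-max_abs, max_abs] to [-127, 127] */
    if (scale == 0.0f) scale = 1.0f;  /* all-zero tensor: avoid division by zero */
    for (size_t i = 0; i < n; i++) {
        float q = roundf(src[i] / scale);
        if (q > 127.0f) q = 127.0f;
        if (q < -127.0f) q = -127.0f;
        dst[i] = (int8_t)q;
    }
    return scale;
}

/* Dequantize: x is approximately q * scale. */
static void dequantize_int8(const int8_t *src, float *dst, size_t n, float scale) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale;
}
```

The same scheme extends to 16- and 32-bit integers by replacing the 127 clamp with the corresponding maximum value.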
- data
  - group1 and group2:
    - Contain 3 different input and kernel tensors each
- tools
  - im2col and col2im implementations in C for the CPU and in CUDA for the GPU (an im2col sketch follows this list)
  - Matrix multiplication implementations in CUDA for the GPU
- src
  - `./src/conv2d.c`: Naive implementation of convolution in C.
  - `./src/conv2d_quantized.c`: Quantized version of the naive convolution using lower-precision arithmetic.
  - `./src/conv2d_avx.c`: CPU vectorization using AVX instructions and pthreads.
  - `./src/conv2d_cuda.c`: GPU vectorization using CUDA.
- report.pdf
  - Includes details and analysis of the implemented algorithms and their quantization errors
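For context on the im2col tool listed above, here is a minimal C sketch for a single image in HWC layout (matching the `(N, H, W, IC)` order described below), with stride 1 and no padding. It is an illustration of the general technique, not the repository's actual implementation, and all names are assumptions.

```c
/* Minimal im2col sketch: each output pixel's receptive field becomes
 * one row of a matrix, so convolution reduces to a matrix multiply.
 * Assumes HWC layout, stride 1, no padding. */
#include <stddef.h>

static void im2col_hwc(const float *img, float *col,
                       int H, int W, int IC, int KH, int KW) {
    int OH = H - KH + 1;   /* output height with stride 1, no padding */
    int OW = W - KW + 1;   /* output width */
    size_t row = 0;
    for (int oh = 0; oh < OH; oh++) {
        for (int ow = 0; ow < OW; ow++) {
            /* Copy the KH x KW x IC patch for output position (oh, ow). */
            for (int kh = 0; kh < KH; kh++)
                for (int kw = 0; kw < KW; kw++)
                    for (int c = 0; c < IC; c++)
                        col[row++] = img[((oh + kh) * W + (ow + kw)) * IC + c];
        }
    }
}
```

The resulting `(OH*OW) x (KH*KW*IC)` matrix, multiplied by the kernel reshaped to `(KH*KW*IC) x OC`, yields the convolution output, which is why the tools also include a CUDA matrix multiplication.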
`conv2d_*` programs take 2 binary files as input.

- Input tensor: the first 16 bytes are `(N, H, W, IC)`, where `N` is the batch size, `H` is the height, `W` is the width, and `IC` is the number of channels.
- Kernel tensor: the first 16 bytes are `(KH, KW, OC, IC)`, where `KH` is the kernel height, `KW` is the kernel width, `OC` is the number of output channels, and `IC` is the number of input channels.
They produce 1 binary file as output.

- Output tensor: the first 16 bytes are `(N, H, W, OC)`, where `N` is the batch size, `H` is the height, `W` is the width, and `OC` is the number of output channels.
For all binary files, the bytes following the first 16 are the raw tensor data, stored in the memory order implied by the dimension rule above.
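A minimal sketch of reading this format, assuming each header field is a 4-byte integer and the payload is 32-bit floats; the 16-byte header and the FP32 mode suggest this, but the README does not state it explicitly.

```c
/* Sketch: read an input tensor file's 16-byte (N, H, W, IC) header and
 * its float payload. Field widths and element type are assumptions. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s input.bin\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    int32_t dims[4];  /* (N, H, W, IC) for an input tensor */
    if (fread(dims, sizeof(int32_t), 4, f) != 4) {
        fprintf(stderr, "failed to read 16-byte header\n");
        fclose(f);
        return 1;
    }
    size_t count = (size_t)dims[0] * dims[1] * dims[2] * dims[3];
    printf("N=%d H=%d W=%d IC=%d -> %zu elements\n",
           dims[0], dims[1], dims[2], dims[3], count);

    /* The remaining bytes are the tensor data in N-major, IC-minor order. */
    float *data = malloc(count * sizeof(float));
    if (data && fread(data, sizeof(float), count, f) == count)
        printf("first element: %f\n", data[0]);
    free(data);
    fclose(f);
    return 0;
}
```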
In the `src/` directory:

```
$ make
```

`conv_vanilla`

```
$ ./conv_vanilla $(INPUT_BIN_PATH) $(OUTPUT_BIN_PATH)
```
`conv_cpu`

```
$ ./conv_cpu $(INPUT_BIN_PATH) $(OUTPUT_BIN_PATH) [8,16,32]
```

The third argument is mandatory and selects the quantization precision (bit width).
`conv_avx`

```
$ ./conv_avx $(INPUT_BIN_PATH) $(OUTPUT_BIN_PATH) [FP32/INT32/INT16]
```

The third argument is mandatory. For `FP32`, no quantization is applied. For `INT*`, quantization to an integer of the corresponding bit width is applied.
`conv_gpu`

```
$ ./conv_gpu $(INPUT_BIN_PATH) $(OUTPUT_BIN_PATH) [FP32/INT32/INT16]
```

Same as `conv_avx`.
The output is the Normalized Mean Squared Error introduced by the quantization operation, along with the result tensor written to `output_tensor.bin`.
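For reference, a common definition of NMSE is `sum((ref - quant)^2) / sum(ref^2)`; the report may use a slightly different normalization. A short C sketch under that assumption:

```c
/* Sketch of NMSE between a reference output and a quantized output,
 * using the common definition sum((x - y)^2) / sum(x^2). */
#include <stddef.h>

static double nmse(const float *ref, const float *quant, size_t n) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)quant[i];
        err  += d * d;
        norm += (double)ref[i] * (double)ref[i];
    }
    return norm > 0.0 ? err / norm : 0.0;
}
```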