Hyperdimensional Computing Library

Prerequisites

Python 3.6
NumPy
PyTorch 0.4.0

The packages can be installed with conda using the hdlib-env.yml file:

$ conda env create -f hdlib-env.yml -n hdlib-env
$ source activate hdlib-env

Authors

Michael Hersche - Initial work - MHersche
Sebastian Kurella - skurella
Tibor Schneider - tiborschneider

Optimizations

CPU (bit packing, circular buffers, Hamming distance LUT)
GPU with global memory
GPU with shared memory
GPU with thread-local memory
GPU with thread-local memory and batches (does not work well)
better batching
memory coalescing
memory bank alignment
clipping and inference moved to the GPU
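
The first item in the list (bit packing with a Hamming distance lookup table) can be illustrated with a short Python sketch. This is not the library's code: the function names, the 32-bit word size and the 8-bit table granularity are assumptions chosen for illustration only.

```python
# Illustrative sketch only: all names here are hypothetical, not taken from the library.
WORD_BITS = 32

# 256-entry lookup table: POPCOUNT_LUT[b] is the number of set bits in byte b.
POPCOUNT_LUT = [bin(b).count("1") for b in range(256)]


def pack_bits(bits):
    """Pack a list of 0/1 values into 32-bit words.

    Element w * 32 + i of the input ends up in bit i of word w.
    """
    words = [0] * ((len(bits) + WORD_BITS - 1) // WORD_BITS)
    for i, bit in enumerate(bits):
        if bit:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def hamming_distance(a_words, b_words):
    """Hamming distance of two packed vectors of equal length.

    XOR marks the differing bits; the LUT counts them one byte at a time.
    """
    distance = 0
    for a, b in zip(a_words, b_words):
        diff = a ^ b
        while diff:
            distance += POPCOUNT_LUT[diff & 0xFF]
            diff >>= 8
    return distance


a = pack_bits([1, 0, 1, 1] + [0] * 60)
b = pack_bits([1, 1, 0, 1] + [0] * 60)
print(hamming_distance(a, b))  # prints 2 (positions 1 and 2 differ)
```

Packing shrinks a binary HD vector by a factor of 32 compared to one word per element, and the table turns bit counting into a handful of array lookups per word.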

Measurements

Sources of parallelism

HD vector encoding of n-grams is an embarrassingly parallel problem. There are two major ways to split the computation in order to parallelise it:

The HD vector can be split into arbitrarily short chunks to encode a single n-gram in parallel
- simply rotating the whole HD vector for each feature slightly widens the dependency to neighbouring HD vector chunks
- this can be prevented by rotating each feature within its chunk, as long as the chunk has no fewer elements than the n-gram size
- processing a chunk only reads the (constant) HD vectors and inputs, and writes exclusively to its own part of the output
The input can be split into multiple chunks
- input lengths range from a few tens or hundreds of features during inference to millions of features during training
- each split is accompanied by an overlap of (n-gram size - 1) features
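
The overlap rule for splitting the input can be checked with a small Python sketch. The helper names are hypothetical and the even split with the remainder going to the last chunk is an assumption; the point is only that extending each chunk by (n-gram size - 1) features reproduces exactly the n-grams of the unsplit input.

```python
def ngrams(features, n):
    """All n-grams of a sequence, in order."""
    return [tuple(features[i:i + n]) for i in range(len(features) - n + 1)]


def split_with_overlap(features, n, parts):
    """Split `features` into `parts` chunks that together cover every n-gram once.

    Every chunk except the last is extended by n - 1 features into its
    right-hand neighbour; the last chunk takes whatever remains.
    """
    total = len(features) - (n - 1)  # number of n-grams in the whole input
    per_part = total // parts        # n-grams produced by each regular chunk
    chunks = []
    for p in range(parts):
        start = p * per_part
        if p == parts - 1:
            stop = len(features)
        else:
            stop = start + per_part + (n - 1)
        chunks.append(features[start:stop])
    return chunks


text = list("hyperdimensional")
n = 3
chunks = split_with_overlap(text, n, parts=4)

# Encoding the chunks independently yields the same n-grams as the full input.
merged = [gram for chunk in chunks for gram in ngrams(chunk, n)]
assert merged == ngrams(text, n)
```

Without the overlap, the n-grams that straddle a chunk boundary would be lost, which is why each split costs (n-gram size - 1) redundant reads.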

No input-data-parallelism

| D pack | characters in input sample --> | Block |
|--------|--------------------------------|-------|
| 0      |                                |       |  ^
| :      |                                |       |  |
| :      | 128 threads                    | 0     | 128
| :      |                                |       |  |
| 127    |                                |       |  v
|--------|--------------------------------|-------|
| 128    |                                |       |
| :      | 128 threads                    | 1     |
| 255    |                                |       |
|--------|--------------------------------|-------|
| 256    |                                |       |
| :      | 128 threads                    | 2     |
| 312    |                                |       |
|--------|--------------------------------|-------|
          <------------- n_x ------------>

`m`-input-data-parallelism

| D pack | characters in input sample --> | Block |
|--------|--------------------------------|-------|
| 0      |                |               |       |   ^
| :      |                |               |       |   |
| :      | 64 threads     | 64 threads    | 0     | 128 / m
| :      |                |               |       |   |
| 63     |                |               |       |   v
|--------|--------------------------------|-------|
| 64     |                |               |       |
| :      | 64 threads     | 64 threads    | 1     |
| 127    |                |               |       |
|--------|--------------------------------|-------|
| 128    |                |               |       |
| :      | 64 threads     | 64 threads    | 2     |
| 191    |                |               |       |
|--------|--------------------------------|-------|
| 192    |                |               |       |
| :      | 64 threads     | 64 threads    | 3     |
| 255    |                |               |       |
|--------|--------------------------------|-------|
| 256    |                |               |       |
| :      | 64 threads     | 64 threads    | 4     |
| 319    |                |               |       |
|--------|--------------------------------|-------|
          <--thread_n_x-->

          ^               ^
          |               |
       start_0         start_1
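
The block layout in the two diagrams can be reproduced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, 128 threads per block is taken from the diagrams, and the packed dimension of 313 words is read off the first diagram, whose last index is 312.

```python
def block_layout(d_packed, m, threads_per_block=128):
    """Return (packed words per block, number of blocks) for input division m."""
    words_per_block = threads_per_block // m
    n_blocks = (d_packed + words_per_block - 1) // words_per_block  # ceiling division
    return words_per_block, n_blocks


def block_word_range(block, d_packed, m, threads_per_block=128):
    """Inclusive range of packed-word indices that one block is responsible for."""
    words_per_block, _ = block_layout(d_packed, m, threads_per_block)
    first = block * words_per_block
    last = min(first + words_per_block, d_packed) - 1
    return first, last


D_PACKED = 313  # assumption: last packed index in the first diagram is 312

for m in (1, 2):
    words, blocks = block_layout(D_PACKED, m)
    print(m, words, blocks)
# prints "1 128 3" and then "2 64 5"
```

With m = 1 this gives the three blocks of the first diagram, and with m = 2 the five blocks of the second. The last block is only partly filled in both cases, so a kernel following this layout needs a bounds check on the packed index.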

Input dimension division m
HD vector dimension D (packed) for each thread block: 128 / m
Input intervals
- The first (ngramm - 1) samples read by each thread (two for ngramm = 3) do not produce an output, so neighbouring input ranges must overlap by that many samples to avoid losing results. The number of valid encoded n-grams (ngramm_sum_count = n_x - (ngramm - 1)) is divided equally amongst the threads.
- thread_n_x: Number of input features assigned to each thread, not counting the overlap
  - for threadIdx.y < m - 1: floor((n_x - (ngramm-1)) / m)
  - for threadIdx.y = m - 1: (n_x - (ngramm-1)) - (m - 1) * floor((n_x - (ngramm-1)) / m)
- Start-index for threads with threadIdx.y = y:
  - start_y = y * floor((n_x - (ngramm-1)) / m)
- End-index (inclusive) for threads with threadIdx.y = y:
  - for y < m - 1: end_y = (y + 1) * floor((n_x - (ngramm-1)) / m) + (ngramm - 2)
  - for y = m - 1: end_y = n_x - 1
Shared memory size (per block):
- item_lookup: (n_items * 128 / m) * 4 [bytes]
- ngramm_sum_buffer: ((32 * 128 / m) * m) * 4 [bytes] = (32 * 128) * 4 [bytes]
- Total (with n_items = 29):
  - sizeof(item_lookup) = 14848 / m [bytes]
  - sizeof(ngramm_sum_buffer) = 16384 [bytes]
  - sizeof(shared_memory) <= 31232 [bytes] < 65536 [bytes]
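
The interval and shared-memory formulas above can be evaluated with a short Python sketch. The function names are hypothetical, and two points are assumptions rather than facts from the library: the end index is treated as inclusive, and the last thread is assumed to read up to the final input feature so that it absorbs any remainder.

```python
def thread_intervals(n_x, ngramm, m):
    """Inclusive (start, end) input index range for each threadIdx.y in 0..m-1."""
    per_thread = (n_x - (ngramm - 1)) // m  # n-grams produced by a regular thread
    intervals = []
    for y in range(m):
        start = y * per_thread
        if y < m - 1:
            end = (y + 1) * per_thread + (ngramm - 2)
        else:
            end = n_x - 1  # assumption: the last thread takes the remainder
        intervals.append((start, end))
    return intervals


def shared_memory_bytes(n_items, m, threads_per_block=128, buffer_depth=32, word_bytes=4):
    """Return (item_lookup bytes, ngramm_sum_buffer bytes) per block."""
    item_lookup = n_items * threads_per_block // m * word_bytes
    ngramm_sum_buffer = buffer_depth * (threads_per_block // m) * m * word_bytes
    return item_lookup, ngramm_sum_buffer


print(thread_intervals(100, 3, 2))  # prints [(0, 50), (49, 99)]

# Each interval of length L yields L - (ngramm - 1) n-grams; together they
# must account for every valid n-gram of the input exactly once.
n_x, ngramm, m = 1000, 3, 4
produced = sum(end - start + 1 - (ngramm - 1)
               for start, end in thread_intervals(n_x, ngramm, m))
assert produced == n_x - (ngramm - 1)

# The per-block shared memory stays below the 65536-byte limit quoted above.
for m in (1, 2, 4):
    assert sum(shared_memory_bytes(29, m)) < 65536
```

In the printed example, neighbouring intervals share the two features at indices 49 and 50, which is the (ngramm - 1) overlap for ngramm = 3.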
