
RLE compression in CUDA

This is my implementation of a compression algorithm based on run-length encoding (RLE), written for NVIDIA cards with CUDA compute capability.

I based my work on:

While the compression algorithm is fairly involved, it goes as follows (a run is a sequence of repeated characters); a minimal CUDA sketch follows the list:

  1. Compute a mask over the input data: put 1 where a new run starts, 0 otherwise.
  2. Compute an inclusive prefix sum (scan) of the mask, which encodes the output location of every compressed pair.
  3. Compute a compacted mask, which stores the starting index of every run.
  4. Finally, compute the run lengths (the differences between consecutive elements of the compacted mask) and write each run's character to the output array (the character is found by indexing the input with the compacted-mask value).
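The sketch below illustrates these four steps, assuming the input is a plain byte array already resident on the device and using Thrust for the prefix sum. Kernel and variable names (`computeMask`, `compactMask`, `emitPairs`, and so on) are illustrative only and are not taken from this repository.

```cuda
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Step 1: mark the start of every run with 1, everything else with 0.
__global__ void computeMask(const unsigned char* in, int* mask, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        mask[i] = (i == 0 || in[i] != in[i - 1]) ? 1 : 0;
}

// Step 3: scatter each run's starting index to its compacted slot,
// which is (inclusive-scan value - 1) at the run start.
__global__ void compactMask(const int* mask, const int* scanned,
                            int* runStarts, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && mask[i])
        runStarts[scanned[i] - 1] = i;
}

// Step 4: emit (symbol, run length) pairs; the length of run k is the
// distance to the next run start (or to the end of the input).
__global__ void emitPairs(const unsigned char* in, const int* runStarts,
                          int numRuns, int n,
                          unsigned char* symbols, int* counts) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= numRuns) return;
    int start = runStarts[k];
    int end   = (k + 1 < numRuns) ? runStarts[k + 1] : n;
    symbols[k] = in[start];
    counts[k]  = end - start;
}

// Host-side driver; step 2 is the Thrust inclusive scan.
void compress(const thrust::device_vector<unsigned char>& in,
              thrust::device_vector<unsigned char>& symbols,
              thrust::device_vector<int>& counts) {
    int n = static_cast<int>(in.size());
    thrust::device_vector<int> mask(n), scanned(n);
    int block = 256, grid = (n + block - 1) / block;

    computeMask<<<grid, block>>>(thrust::raw_pointer_cast(in.data()),
                                 thrust::raw_pointer_cast(mask.data()), n);
    thrust::inclusive_scan(mask.begin(), mask.end(), scanned.begin());
    int numRuns = scanned.back();               // total number of runs

    thrust::device_vector<int> runStarts(numRuns);
    compactMask<<<grid, block>>>(thrust::raw_pointer_cast(mask.data()),
                                 thrust::raw_pointer_cast(scanned.data()),
                                 thrust::raw_pointer_cast(runStarts.data()), n);

    symbols.resize(numRuns);
    counts.resize(numRuns);
    int gridRuns = (numRuns + block - 1) / block;
    emitPairs<<<gridRuns, block>>>(thrust::raw_pointer_cast(in.data()),
                                   thrust::raw_pointer_cast(runStarts.data()),
                                   numRuns, n,
                                   thrust::raw_pointer_cast(symbols.data()),
                                   thrust::raw_pointer_cast(counts.data()));
}
```

The scatter in `compactMask` is the standard stream-compaction trick: the inclusive-scan value at a run start, minus one, is that run's index in the compacted output.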

The decompression algorithm is more straightforward:

  1. Compute an exclusive prefix sum of the run-length array; its elements point to where each run starts in the output.
  2. Reconstruct the original data; in my approach each thread is responsible for filling the output array for one run. (This can be inefficient for highly RLE-compressible data, where runs are long.) A sketch follows this list.
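Again a minimal sketch under the same assumptions (Thrust for the scan, illustrative names such as `expandRuns` and `decompress`). One thread expands one (symbol, count) pair, which is exactly where a single very long run serializes the work.

```cuda
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Step 2: one thread per run writes the run's symbol into the output,
// starting at the offset given by the exclusive scan of the run lengths.
__global__ void expandRuns(const unsigned char* symbols, const int* counts,
                           const int* offsets, int numRuns, unsigned char* out) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= numRuns) return;
    unsigned char s = symbols[k];
    int start = offsets[k];
    for (int j = 0; j < counts[k]; ++j)   // serial fill: a long run keeps
        out[start + j] = s;               // a single thread busy
}

void decompress(const thrust::device_vector<unsigned char>& symbols,
                const thrust::device_vector<int>& counts,
                thrust::device_vector<unsigned char>& out) {
    int numRuns = static_cast<int>(counts.size());
    thrust::device_vector<int> offsets(numRuns);
    // Step 1: exclusive prefix sum of the run lengths.
    thrust::exclusive_scan(counts.begin(), counts.end(), offsets.begin());
    int total = (numRuns > 0) ? offsets.back() + counts.back() : 0;
    out.resize(total);

    int block = 256, grid = (numRuns + block - 1) / block;
    expandRuns<<<grid, block>>>(thrust::raw_pointer_cast(symbols.data()),
                                thrust::raw_pointer_cast(counts.data()),
                                thrust::raw_pointer_cast(offsets.data()),
                                numRuns,
                                thrust::raw_pointer_cast(out.data()));
}
```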

Comparison with a simple sequential CPU RLE compression implementation (averages for a 256-colour 11520x6480 bitmap):

| Device | Compression time | Decompression time |
|--------|------------------|--------------------|
| CPU    | 2229 ms          | 7481 ms            |
| GPU    | 955 ms           | 320 ms             |

About

My first attempt at CUDA
