simdcomp vs roaringBitmap #22

sreekanth-cb · 2019-09-27T07:45:34Z

@lemire , thanks a lot for these wonderful libraries.

Just being curious about one thing here, how do these two libraries (simdcomp vs roaringBitmap) stand against each other in terms of encoding/decoding speed and compression performances when it comes to ordered numbers?

Though i have used roaringBitmaps, never benchmarked/verified the compression ratio there..
But while glancing over this simdcomp, end up running the go/test and realised that the compression ratio was ~32.

Another side note, is it would be great to have a sample for the variable length array encoding/decoding for the Go users in simdcomp.

thanks!

lemire · 2019-09-27T11:50:46Z

Thanks.

I don't think that it is comparable, at all. The simdcomp library is a low level library that supports some algorithms, but it does not provide an integrated data structure. It is up to you to build up a format, data structure.

Roaring bitmaps are a standard format, a set of data structures and so forth.

If you are interested in something like variable-length coding, I suggest you look up the streamvbyte library instead.

sreekanth-cb · 2019-09-28T02:42:50Z

Thanks for the quick reply, one more doubt.
As per the https://github.com/powturbo/TurboPFor#StreamVByte, streamvbyte isn't looking at the best position. Is it too old or streamvbyte changed after that?

I was checking simdcomp as it was doing very well against all others,
And would be eyeing to use the simdcomp for random sized array compressions from golang.

(sorry, as there are many materials out there, i may not be looking in the right order too)

lemire · 2019-09-28T14:54:43Z

You have looked at
https://github.com/lemire/simdcomp/blob/master/go/test.go

No doubt?

I am not sure I know what you mean by "a sample".

streamvbyte isn't looking at the best position. Is it too old or streamvbyte changed after that?

Please refer to https://github.com/lemire/streamvbyte and the paper at https://arxiv.org/abs/1709.08990

Stream VByte is extremely fast. I don't think that there is any byte-oriented codec out there that can match it.

sreekanth-cb · 2019-09-29T01:54:39Z

Thats cool,
I wasn't clear enough(wrongly used the words) on my ask for that go/test.go extension.
Currently it depicts the use of a 128 length block. And my request was for a cgo example for any random length like here -https://github.com/lemire/simdcomp/blob/master/example.c#L13

sreekanth-cb · 2019-09-29T02:08:09Z

I was playing with a few benchmark tests like below. (exploring cgo, so might be not the right way/ some mistakes there)
Both simdcomp and streamvbyte over cgo.

go test -benchmem -bench=.
goos: darwin
goarch: amd64
BenchmarkTestSIMDEncode-8 20000000 73.9 ns/op 0 B/op 0 allocs/op
BenchmarkTestSIMDDecode-8 20000000 74.7 ns/op 0 B/op 0 allocs/op
BenchmarkTestSvbEncode-8 10000000 123 ns/op 0 B/op 0 allocs/op
BenchmarkTestSvbDecode-8 20000000 96.5 ns/op 0 B/op 0 allocs/op

code

func BenchmarkTestSIMDEncode(t *testing.B) {
	CallSIMDEncode(t.N)
}

func BenchmarkTestSIMDDecode(t *testing.B) {
	CallSIMDDecode(t.N)
}

func BenchmarkTestSvbEncode(t *testing.B) {
	CallSvbEncode(t.N)
}

func BenchmarkTestSvbDecode(t *testing.B) {
	CallSvbDecode(t.N)
}

package svb

/*
#cgo LDFLAGS: -lsimdcomp
#include <simdcomp.h>
*/
import "C"

func CallSIMDEncode(n int) {
	var data [128]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(128); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	b := C.maxbits(&data[0])

	out := make([]C.__m128i, b)

	for i := 0; i < n; i++ {
		C.simdpackwithoutmask(&data[0], &out[0], b)
	}
}

func CallSIMDDecode(n int) {
	var data [128]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(128); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	b := C.maxbits(&data[0])
	out := make([]C.__m128i, b)
	C.simdpackwithoutmask(&data[0], &out[0], b)
	var recovereddata [128]C.uint32_t

	for i := 0; i < n; i++ {
		C.simdunpack(&out[0], &recovereddata[0], b)
	}
}

package svb

/*
#cgo LDFLAGS: -lstreamvbyte
#include <streamvbyte.h>
#include <stdlib.h>
*/
import "C"
import (
	"unsafe"
)

func CallSvbEncode(n int) {
	const count = C.uint32_t(128)
	var data [count]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(count); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	size := C.size_t((count+3)/4 + (count * 4))
	compressedbuffer := C.malloc(size * C.size_t(unsafe.Sizeof(C.uint8_t(0))))

	for i := 0; i < n; i++ {
		_ = C.streamvbyte_encode(&data[0], count, (*C.uint8_t)(compressedbuffer))

	}

	C.free(unsafe.Pointer(compressedbuffer))
}

func CallSvbDecode(n int) {
	const count = C.uint32_t(128)
	var data [count]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(count); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	size := C.size_t((count+3)/4 + (count * 4))
	compressedbuffer := C.malloc(C.size_t(size) * C.size_t(unsafe.Sizeof(uintptr(0))))
	_ = C.streamvbyte_encode(&data[0], count, (*C.uint8_t)(compressedbuffer))
	decodedbuffer := make([]C.uint32_t, count)

	for i := 0; i < n; i++ {
		_ = C.streamvbyte_decode((*C.uint8_t)(compressedbuffer), &decodedbuffer[0], count)
	}

	C.free(unsafe.Pointer(compressedbuffer))
}

lemire · 2019-09-29T16:06:21Z

Pull requests are invited!!!

It is a great idea to provide more examples, but I simply can't do it all... I rely on people like you to help out.

lemire closed this as completed Sep 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

simdcomp vs roaringBitmap #22

simdcomp vs roaringBitmap #22

sreekanth-cb commented Sep 27, 2019

lemire commented Sep 27, 2019

sreekanth-cb commented Sep 28, 2019

lemire commented Sep 28, 2019

sreekanth-cb commented Sep 29, 2019 •

edited

Loading

sreekanth-cb commented Sep 29, 2019

lemire commented Sep 29, 2019

simdcomp vs roaringBitmap #22

simdcomp vs roaringBitmap #22

Comments

sreekanth-cb commented Sep 27, 2019

lemire commented Sep 27, 2019

sreekanth-cb commented Sep 28, 2019

lemire commented Sep 28, 2019

sreekanth-cb commented Sep 29, 2019 • edited Loading

sreekanth-cb commented Sep 29, 2019

lemire commented Sep 29, 2019

sreekanth-cb commented Sep 29, 2019 •

edited

Loading