Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simdcomp vs roaringBitmap #22

Closed
sreekanth-cb opened this issue Sep 27, 2019 · 6 comments
Closed

simdcomp vs roaringBitmap #22

sreekanth-cb opened this issue Sep 27, 2019 · 6 comments

Comments

@sreekanth-cb
Copy link

@lemire , thanks a lot for these wonderful libraries.

Just being curious about one thing here, how do these two libraries (simdcomp vs roaringBitmap) stand against each other in terms of encoding/decoding speed and compression performances when it comes to ordered numbers?

Though i have used roaringBitmaps, never benchmarked/verified the compression ratio there..
But while glancing over this simdcomp, end up running the go/test and realised that the compression ratio was ~32.

Another side note, is it would be great to have a sample for the variable length array encoding/decoding for the Go users in simdcomp.

thanks!

@lemire
Copy link
Owner

lemire commented Sep 27, 2019

Thanks.

I don't think that it is comparable, at all. The simdcomp library is a low level library that supports some algorithms, but it does not provide an integrated data structure. It is up to you to build up a format, data structure.

Roaring bitmaps are a standard format, a set of data structures and so forth.

If you are interested in something like variable-length coding, I suggest you look up the streamvbyte library instead.

@lemire lemire closed this as completed Sep 27, 2019
@sreekanth-cb
Copy link
Author

Thanks for the quick reply, one more doubt.
As per the https://github.com/powturbo/TurboPFor#StreamVByte, streamvbyte isn't looking at the best position. Is it too old or streamvbyte changed after that?

I was checking simdcomp as it was doing very well against all others,
And would be eyeing to use the simdcomp for random sized array compressions from golang.

(sorry, as there are many materials out there, i may not be looking in the right order too)

@lemire
Copy link
Owner

lemire commented Sep 28, 2019

You have looked at
https://github.com/lemire/simdcomp/blob/master/go/test.go

No doubt?

I am not sure I know what you mean by "a sample".

streamvbyte isn't looking at the best position. Is it too old or streamvbyte changed after that?

Please refer to https://github.com/lemire/streamvbyte and the paper at https://arxiv.org/abs/1709.08990

Stream VByte is extremely fast. I don't think that there is any byte-oriented codec out there that can match it.

@sreekanth-cb
Copy link
Author

sreekanth-cb commented Sep 29, 2019

Thats cool,
I wasn't clear enough(wrongly used the words) on my ask for that go/test.go extension.
Currently it depicts the use of a 128 length block. And my request was for a cgo example for any random length like here -https://github.com/lemire/simdcomp/blob/master/example.c#L13

@sreekanth-cb
Copy link
Author

I was playing with a few benchmark tests like below. (exploring cgo, so might be not the right way/ some mistakes there)
Both simdcomp and streamvbyte over cgo.

go test -benchmem -bench=.
goos: darwin
goarch: amd64
BenchmarkTestSIMDEncode-8 20000000 73.9 ns/op 0 B/op 0 allocs/op
BenchmarkTestSIMDDecode-8 20000000 74.7 ns/op 0 B/op 0 allocs/op
BenchmarkTestSvbEncode-8 10000000 123 ns/op 0 B/op 0 allocs/op
BenchmarkTestSvbDecode-8 20000000 96.5 ns/op 0 B/op 0 allocs/op

code

func BenchmarkTestSIMDEncode(t *testing.B) {
	CallSIMDEncode(t.N)
}

func BenchmarkTestSIMDDecode(t *testing.B) {
	CallSIMDDecode(t.N)
}

func BenchmarkTestSvbEncode(t *testing.B) {
	CallSvbEncode(t.N)
}

func BenchmarkTestSvbDecode(t *testing.B) {
	CallSvbDecode(t.N)
}
package svb

/*
#cgo LDFLAGS: -lsimdcomp
#include <simdcomp.h>
*/
import "C"

func CallSIMDEncode(n int) {
	var data [128]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(128); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	b := C.maxbits(&data[0])

	out := make([]C.__m128i, b)

	for i := 0; i < n; i++ {
		C.simdpackwithoutmask(&data[0], &out[0], b)
	}
}

func CallSIMDDecode(n int) {
	var data [128]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(128); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	b := C.maxbits(&data[0])
	out := make([]C.__m128i, b)
	C.simdpackwithoutmask(&data[0], &out[0], b)
	var recovereddata [128]C.uint32_t

	for i := 0; i < n; i++ {
		C.simdunpack(&out[0], &recovereddata[0], b)
	}
}
package svb

/*
#cgo LDFLAGS: -lstreamvbyte
#include <streamvbyte.h>
#include <stdlib.h>
*/
import "C"
import (
	"unsafe"
)

func CallSvbEncode(n int) {
	const count = C.uint32_t(128)
	var data [count]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(count); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	size := C.size_t((count+3)/4 + (count * 4))
	compressedbuffer := C.malloc(size * C.size_t(unsafe.Sizeof(C.uint8_t(0))))

	for i := 0; i < n; i++ {
		_ = C.streamvbyte_encode(&data[0], count, (*C.uint8_t)(compressedbuffer))

	}

	C.free(unsafe.Pointer(compressedbuffer))
}

func CallSvbDecode(n int) {
	const count = C.uint32_t(128)
	var data [count]C.uint32_t
	for i := C.uint32_t(0); i < C.uint32_t(count); i++ {
		data[i] = C.uint32_t(i*1000 + i*100 + i)
	}
	size := C.size_t((count+3)/4 + (count * 4))
	compressedbuffer := C.malloc(C.size_t(size) * C.size_t(unsafe.Sizeof(uintptr(0))))
	_ = C.streamvbyte_encode(&data[0], count, (*C.uint8_t)(compressedbuffer))
	decodedbuffer := make([]C.uint32_t, count)

	for i := 0; i < n; i++ {
		_ = C.streamvbyte_decode((*C.uint8_t)(compressedbuffer), &decodedbuffer[0], count)
	}

	C.free(unsafe.Pointer(compressedbuffer))
}

@lemire
Copy link
Owner

lemire commented Sep 29, 2019

Pull requests are invited!!!

It is a great idea to provide more examples, but I simply can't do it all... I rely on people like you to help out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants