Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFFI bindings for murmur3 #8

Closed
thedrow opened this issue Jun 21, 2015 · 5 comments
Closed

CFFI bindings for murmur3 #8

thedrow opened this issue Jun 21, 2015 · 5 comments

Comments

@thedrow
Copy link

thedrow commented Jun 21, 2015

We need CFFI bindings for murmur3 in order for this library to work fast on PyPy.
It shouldn't be that hard since we're only using a few methods from that library.

@ewdurbin
Copy link
Owner

I'm not familiar with what is required to resolve this.

@thedrow do you have any examples or documentation for me to reference?

@ewdurbin
Copy link
Owner

@thedrow I've made a cursory step towards improving PyPy performance with 570dcbe

Speedups are approximately 3x-4x and bring PyPy performance in line with CPython/C-Extension performance, without having to dig into CFFI.

Additionally, it lays a little groundwork for initial support of Jython.

I'd appreciate any feedback.

@ewdurbin
Copy link
Owner

Just a rough estimate actually shows that PyPy with the pure python fallback is considerably faster than CPython/C-Extension variant.

$ cat bench.py 
import timeit

s = """
from clandestined import RendezvousHash
rendezvous = RendezvousHash()
"""

print(timeit.timeit('rendezvous.hash_function("6666")', setup=s, number=100000))
$ .tox/py27/bin/python bench.py 
0.363238096237
$ .tox/py34/bin/python bench.py 
0.3322792090184521
$ .tox/pypy/bin/pypy bench.py 
0.0770089626312
$ .tox/pypy3/bin/pypy3 bench.py 
0.08740401268005371

EDIT above results are flawed, the C-Extension is not loaded by the CPython interpreters.

@ewdurbin
Copy link
Owner

A quick and dirty comparison for Python 2.7.10 and PyPy 2.6.0 using Pure Python, CFFI, and native extensions.

build.py

import cffi

ffi = cffi.FFI()

ffi.cdef("""
uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed);
""")

lib = ffi.verify("""
#include <stdint.h>
#include <string.h>

uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
    static const uint32_t c1 = 0xcc9e2d51;
    static const uint32_t c2 = 0x1b873593;
    static const uint32_t r1 = 15;
    static const uint32_t r2 = 13;
    static const uint32_t m = 5;
    static const uint32_t n = 0xe6546b64;

    uint32_t hash = seed;

    const int nblocks = len / 4;
    const uint32_t *blocks = (const uint32_t *) key;
    int i;
    for (i = 0; i < nblocks; i++) {
        uint32_t k = blocks[i];
        k *= c1;
        k = (k << r1) | (k >> (32 - r1));
        k *= c2;

        hash ^= k;
        hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
    }

    const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
    uint32_t k1 = 0;

    switch (len & 3) {
    case 3:
        k1 ^= tail[2] << 16;
    case 2:
        k1 ^= tail[1] << 8;
    case 1:
        k1 ^= tail[0];

        k1 *= c1;
        k1 = (k1 << r1) | (k1 >> (32 - r1));
        k1 *= c2;
        hash ^= k1;
    }

    hash ^= len;
    hash ^= (hash >> 16);
    hash *= 0x85ebca6b;
    hash ^= (hash >> 13);
    hash *= 0xc2b2ae35;
    hash ^= (hash >> 16);

    return hash;
}
""")

def cffi_murmur3_32(data, seed=0):
    return lib.murmur3_32(data, len(data), seed)

def pure_murmur3_32(data, seed = 0):
    c1 = 0xcc9e2d51
    c2 = 0x1b873593

    length = len(data)
    h1 = seed
    roundedEnd = (length & 0xfffffffc)  # round down to 4 byte block
    for i in range(0, roundedEnd, 4):
      # little endian load order
      k1 = (ord(data[i]) & 0xff) | ((ord(data[i + 1]) & 0xff) << 8) | \
           ((ord(data[i + 2]) & 0xff) << 16) | (ord(data[i + 3]) << 24)
      k1 *= c1
      k1 = (k1 << 15) | ((k1 & 0xffffffff) >> 17) # ROTL32(k1,15)
      k1 *= c2

      h1 ^= k1
      h1 = (h1 << 13) | ((h1 & 0xffffffff) >> 19)  # ROTL32(h1,13)
      h1 = h1 * 5 + 0xe6546b64

    # tail
    k1 = 0

    val = length & 0x03
    if val == 3:
        k1 = (ord(data[roundedEnd + 2]) & 0xff) << 16
    # fallthrough
    if val in [2, 3]:
        k1 |= (ord(data[roundedEnd + 1]) & 0xff) << 8
    # fallthrough
    if val in [1, 2, 3]:
        k1 |= ord(data[roundedEnd]) & 0xff
        k1 *= c1
        k1 = (k1 << 15) | ((k1 & 0xffffffff) >> 17)  # ROTL32(k1,15)
        k1 *= c2
        h1 ^= k1

    # finalization
    h1 ^= length

    # fmix(h1)
    h1 ^= ((h1 & 0xffffffff) >> 16)
    h1 *= 0x85ebca6b
    h1 ^= ((h1 & 0xffffffff) >> 13)
    h1 *= 0xc2b2ae35
    h1 ^= ((h1 & 0xffffffff) >> 16)

    return h1 & 0xffffffff

bench.py


s = """
from build import cffi_murmur3_32
from build import pure_murmur3_32
import _murmur3
"""

import timeit

print "pure python: %s" % timeit.timeit("pure_murmur3_32('foo', 1)", setup=s, number=1000000)
print "cffi python: %s" % timeit.timeit("cffi_murmur3_32('foo', 1)",     setup=s, number=1000000)
print "cext python: %s" % timeit.timeit("_murmur3.murmur3_32('foo', 1)", setup=s, number=1000000)

Results

$ python-test/bin/python bench.py 
pure python: 2.60310292244
cffi python: 0.338982820511
cext python: 0.211312055588

$ pypy-test/bin/python bench.py 
pure python: 0.01256108284
cffi python: 0.113795042038
cext python: 3.43625211716

So with those results, it looks like falling back to Pure Python is the best of both worlds for PyPy rather than integrating CFFI.

@thedrow
Copy link
Author

thedrow commented Jun 29, 2015

Let's also try the new out of line method for binding C code with CFFI. See https://cffi.readthedocs.org/en/latest/cdef.html#upgrading-from-cffi-0-9-to-cffi-1-0
But overall it does look that the pure python implementation is better so that's really cool.
Let's enable the build for PyPy and PyPy3 and see what happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants