Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIs to serialize/deserialize data? #28

Open
anhldbk opened this issue Nov 23, 2017 · 10 comments
Open

APIs to serialize/deserialize data? #28

anhldbk opened this issue Nov 23, 2017 · 10 comments

Comments

@anhldbk
Copy link

anhldbk commented Nov 23, 2017

Thanks for this nice implementation of Cuckoo filter. Would you please let me know how to serialize/deserialize the filter's state?

@nextgens
Copy link

nextgens commented Dec 2, 2017

I guess it's a feature request more than anything else... but I concur; it'd be super-convenient if it was part of the API

@dnbaker
Copy link
Contributor

dnbaker commented Dec 3, 2017

I used some dummy structs and manual memcpy to do so in a project a while back. It's gross and bad form but works. Just make sure the layout's the same. (Until it's actually supported, of course.)

@nextgens
Copy link

nextgens commented Dec 4, 2017

Would you mind sharing the code ?

@dnbaker
Copy link
Contributor

dnbaker commented Dec 4, 2017

I can pull that out of a private repo. I’ll look it over tomorrow and see if a pull request is easier.

If you’re doing sequence analysis, which I’m guessing based on your name, I could release the kmer database code around it if it meets your needs.

@nextgens
Copy link

nextgens commented Dec 4, 2017

Thanks!

No, this time it's not for sequence analysis :)

@roniemartinez
Copy link

👍 to this! Serializing/deserializing seems more efficient than rebuilding the filter. @dnbaker hope it is possible to share the code.

@d4tocchini
Copy link

@dnbaker, @efficient

this is viable, yes? if so, something to point me to?

@d4tocchini
Copy link

assuming a surgically placed memcpy, but not feeling confident whereabout

@amiller-gh
Copy link

@dnbaker resurrecting an old thread :)

I'm also interested in serialization. Any chance you can dig up that old code, or do you have recommendations for another cuckoo filter library?

@dnbaker
Copy link
Contributor

dnbaker commented Oct 26, 2021

Here's the code I had. It's only valid for POD (std::is_trivial and std::is_standard_layout). Sorry I took a while to respond - I had to go find and extract it.

First, there's a couple of dummy structs which are defined, and namespacing:

namespace {
typedef struct {
    size_t index;
    uint32_t tag;
    bool used;
} SerialVictim;
struct dummytable {
    void *p;
    size_t n;
};
struct dummyfilter {
    void *p;
    size_t n;
    SerialVictim vc;
};
}

using namespace cuckoofilter;

Writing to disk:

size_t write_cuckoo_filter(const CuckooFilter<uint64_t, bits_per_item, SingleTable> *filter,
                           const char *path) {
    dummyfilter dfilter;
    dummytable tbl;
    static const size_t tags_per_bucket(4);
    const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
    static_assert(sizeof(dfilter) == sizeof(*filter), "Dummy table is the wrong size!\n");
    static_assert(sizeof(tbl) == sizeof(SingleTable<bits_per_item>), "Dummy table is the wrong size!\n");
    // Copy info to dummy structs.
    memcpy(&dfilter, filter, sizeof(*filter));
    memcpy(&tbl, dfilter.p, sizeof(tbl));

    FILE *fp(fopen(path, "wb"));
    const int used(dfilter.vc.used);
    size_t ret(fwrite(&dfilter.n, 1, sizeof(dfilter.n), fp));
    ret += fwrite(&dfilter.vc.index, 1, sizeof(dfilter.vc.index), fp);
    ret += fwrite(&dfilter.vc.tag, 1, sizeof(dfilter.vc.tag), fp);
    ret += fwrite(&used, 1, sizeof(used), fp);
    ret += fwrite(&tbl.n, 1, sizeof(tbl.n), fp);
    ret += fwrite(tbl.p, 1, bktsz * tbl.n, fp);
    fclose(fp);
    return ret;
}

Reading:

template <size_t bits_per_item>
CuckooFilter<uint64_t, bits_per_item, SingleTable> *
load_cuckoo_filter(const char *path) {
    FILE *fp(fopen(path, "rb"));
    const size_t tags_per_bucket(4uL);
    const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
    fseek(fp, 0, SEEK_END);
    const size_t fsize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    size_t num_bktz, num_items, index;
    uint32_t tag;
    int used;
    fscanf(fp, "%llu", &num_items);
    fscanf(fp, "%llu", &index);
    fscanf(fp, "%u", &tag);
    fscanf(fp, "%i", &used);
    fscanf(fp, "%llu", &num_bktz);
    size_t num_bktz_found((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) / bktsz);
    if((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) % bktsz) {
       fprintf(stderr, "Found %" PRIu64 " buckets. Expected %" PRIu64 ". fsize: %" PRIu64 ". bktsz: %" PRIu64 ". They be stealin' my bkt :'(....\n",
               num_bktz_found, num_bktz, fsize, bktsz);
       exit(EXIT_FAILURE);
    }
    void *buf = (uint8_t *)malloc(fsize - 1);
    size_t nread;
    if((nread = fread(buf, 1, num_bktz * bktsz, fp)) != num_bktz * bktsz) {
       fprintf(stderr, "Couldn't read buffer from file '%s' Bytes read: %" PRIu64 ". Expected: %" PRIu64 ".\n", path, nread, num_bktz * bktsz);
       exit(EXIT_FAILURE);
    }
    dummytable *table((dummytable *)malloc(sizeof(SingleTable<bits_per_item>)));
    *table = {buf, num_bktz};
    static_assert(sizeof(*table) == sizeof(uint64_t[2]), "Wrong size!\n");

    dummyfilter *ret((dummyfilter *)malloc(sizeof(dummyfilter)));
    *ret = {table, num_items, {index, tag, !!used}};
    fclose(fp);
    return (CuckooFilter<uint64_t, bits_per_item, SingleTable> *)(ret);
}

At destruction, you'll need to manually call your destructor (since it's a pointer), and std::free(ret->table).
It's pretty clunky, but it at least worked for my purposes. Let me know if you have any problems or need some help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants