
Memory Issue #31

Open
hyperxpro opened this issue Feb 8, 2024 · 5 comments

Comments

@hyperxpro

So I have been playing around with mmdbctl, doing a simple MMDB export to JSON and an import back to MMDB. However, the process took up to 46 GB of memory. Luckily my machine had plenty of memory, so the import succeeded, but systems with less memory may run into problems.

I did some digging into the source and found that mmdbctl holds all data in memory and writes it to the file in one go after processing is done. Can we do buffered writing and flush incrementally to prevent huge memory usage?

fmt.Fprintf(os.Stderr, "writing to %s (%v entries)\n", f.Out, entrycnt)

Task Manager: [screenshot: Screenshot 2024-02-08 161631]

MMDB file used: GeoLite2-City.mmdb

@UmanShahzad
Contributor

@hyperxpro mmdbctl uses the underlying mmdb-writer library for creating the actual MMDB, which doesn't support incremental writes. The MMDB format itself isn't very friendly to incremental writes, which is the root of the problem. It's probably not impossible, but it would require a really complex and sensitive algorithm using partial files to deal with the memory growth (similar to a large-scale merge-sort solution).

In practice, almost anyone producing a big enough MMDB file ends up having large RAM machines to do it, which is cheap these days, so little effort has been invested into optimizing that.

It could be something we do!

@hyperxpro
Author

hyperxpro commented Feb 8, 2024

Can we not write N bytes to the file, flush, advance the offset, and repeat the process? I am not sure whether it is entirely possible like this, but is there any workaround we can do at the I/O level to fix this?

@UmanShahzad
Contributor

@hyperxpro unfortunately, the memory and I/O model for MMDBs, and the relationship between them, is more complex than that.

See https://maxmind.github.io/MaxMind-DB for the actual MMDB spec

@jhg03a

jhg03a commented Feb 9, 2024

For reference, my custom MMDB takes around 800 GB of RAM and many hours to compile from an optimized ip_trie that takes a fraction of the time and memory to build and is thread-safe. From my shallow dive into it, a large part of the cost is that the mmdb-writer library uses reflection all over the place, since it was written before real generics were part of the language.
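To make the reflection point concrete, here is a toy comparison (not the mmdb-writer code) of summing a slice via `reflect` versus a generic function. The reflect path performs dynamic per-element `Value` operations, which is the kind of overhead pre-generics libraries pay to handle arbitrary value types:

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// sumReflect inspects each element dynamically, the way
// pre-generics libraries often had to handle arbitrary types.
func sumReflect(xs interface{}) int64 {
	v := reflect.ValueOf(xs)
	var total int64
	for i := 0; i < v.Len(); i++ {
		total += v.Index(i).Int()
	}
	return total
}

// sumGeneric is monomorphized by the compiler: a plain typed
// loop with no per-element dynamic dispatch.
func sumGeneric[T ~int64](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	xs := make([]int64, 1_000_000)
	for i := range xs {
		xs[i] = int64(i)
	}

	start := time.Now()
	r := sumReflect(xs)
	dReflect := time.Since(start)

	start = time.Now()
	g := sumGeneric(xs)
	dGeneric := time.Since(start)

	fmt.Println(r, g, r == g)
	fmt.Println("reflect:", dReflect, "generic:", dGeneric)
}
```

On a typical machine the reflect version is many times slower; the same per-element dynamic work, applied across an entire trie of map values, is a plausible contributor to the build times described above.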

@johnhtodd

Hitting the same problem here, doing the same thing: MMDB -> JSON -> MMDB. The JSON file has ~15M routes/prefixes. Not possible thus far on any of the systems we have here (512 GB is our max) to turn it back into an MMDB.
