Should we store geoip files uncompressed? #28782
FYI @elastic/es-core-infra.
@danielmitterdorfer Thank you for this analysis. I'm in favor of loading the geoip database as uncompressed files, so that we need less heap memory when the geoip plugin gets used.
Even if we change how the database files are loaded, the increase in native memory only occurs if geoip is actually used, and this memory is managed by the OS instead of the JVM.
To me this is a minor downside, as disk space is the resource we should worry about least, and it only increases the disk space used by Elasticsearch by 38.1 MB when the geoip plugin is installed.
This is just a small increase and doesn't outweigh the benefits of changing how the data files get loaded.
An alternative, or possibly in conjunction with this idea, would be to further investigate what we have discussed casually for some time now: re-encoding the data we actually use from the database in our own optimized format. From what I remember, we use only a small portion of the database fields. The work for this is doable; the real question is whether the MaxMind DB license allows it (assuming we keep adequate notice about the source of the data).
According to the site for the GeoLite2 database, the license is CC BY-SA 4.0. I also think it would be wise to double-check, but on the Creative Commons site it says:
We must give appropriate credit (which is IMHO nice behaviour anyway) and must redistribute under the same license. All this sounds to me (IANAL) as if it would be feasible to re-encode the data.
I think we should pursue both, but let's not make the immediate gains depend on the longer-term goal of improving encoding.
We discussed this in Fix-it Friday and want to pursue it. By default, we will load the files using the mmap approach and will add a system property in case some users explicitly still want to load the data on-heap.
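A minimal sketch of such an opt-out, assuming a boolean JVM system property that defaults to memory-mapping. The property name `example.geoip.load_db_on_heap` and the `Mode` enum are hypothetical illustrations, not the names used by Elasticsearch or by the MaxMind reader:

```java
public final class GeoIpLoadMode {
    // Hypothetical property name, for illustration only.
    static final String ON_HEAP_PROPERTY = "example.geoip.load_db_on_heap";

    // Hypothetical enum; the real reader has its own mode type.
    enum Mode { MEMORY_MAPPED, ON_HEAP }

    static Mode resolveMode() {
        // Default is memory-mapping; only an explicit "true" (case-insensitive)
        // switches to on-heap loading.
        String value = System.getProperty(ON_HEAP_PROPERTY, "false");
        return Boolean.parseBoolean(value.trim()) ? Mode.ON_HEAP : Mode.MEMORY_MAPPED;
    }

    public static void main(String[] args) {
        System.out.println(resolveMode());          // MEMORY_MAPPED unless the property is set
        System.setProperty(ON_HEAP_PROPERTY, "true");
        System.out.println(resolveMode());          // ON_HEAP
    }
}
```

Users would opt in with `-Dexample.geoip.load_db_on_heap=true`; everyone else gets the off-heap default without any configuration.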
With this commit we reduce heap usage of the ingest-geoip plugin by memory-mapping the database files. Previously, we stored these files gzip-compressed, but this meant that the data had to be loaded on the heap. Closes elastic#28782
Current Situation
The ingest-geoip plugin ships with three geoip files:
We load these files lazily to reduce memory usage when this feature is not required (despite the plugin being loaded).
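One way to express this kind of lazy loading in Java is a memoizing supplier that opens a database on first use and then caches it (double-checked locking on a `volatile` field). This is an illustrative sketch with a hypothetical `LazyDatabase` class and a string standing in for the real database reader, not the plugin's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public final class LazyLoadingDemo {

    /** Defers an expensive load until the first call to get(), then caches the result. */
    static final class LazyDatabase<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile T value; // stays null until the database is first needed

        LazyDatabase(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            T result = value;
            if (result == null) {             // fast path without locking
                synchronized (this) {
                    result = value;
                    if (result == null) {     // re-check under the lock
                        result = loader.get();
                        value = result;
                    }
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // Hypothetical stand-in for opening a GeoLite2 database file.
        LazyDatabase<String> city = new LazyDatabase<>(() -> {
            loads.incrementAndGet();
            return "city-db";
        });
        System.out.println("loads before first use: " + loads.get()); // 0
        city.get();
        city.get();
        System.out.println("loads after two lookups: " + loads.get()); // 1
    }
}
```

If no pipeline ever uses the geoip processor, `get()` is never called and no database memory is allocated at all.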
While the MaxMind DB reader can load data either on-heap or off-heap, we effectively have to load data on-heap because we provide an `InputStream` that decompresses the gzipped data on the fly. To allow loading the data off-heap, we need to provide a file (see the `builder.mode` parameter, which controls whether data is loaded on- or off-heap).
Discussion Item
Does it make sense to store these files uncompressed instead of gzip-compressed?
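The difference behind this question can be illustrated with plain JDK APIs rather than the MaxMind reader itself: a gzipped file can only be consumed as a stream, so its decompressed contents end up in a heap `byte[]`, while an uncompressed file can be memory-mapped with `FileChannel.map`. The class and method names below are hypothetical, and the payload is a stand-in for a real `.mmdb` file:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class HeapVsMmap {

    /** Gzipped source: the stream must be fully decompressed into a heap byte[]. */
    static byte[] loadGzippedOnHeap(Path gzFile) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            return in.readAllBytes(); // the whole database now lives on the Java heap
        }
    }

    /** Uncompressed source: the OS pages the file in on demand. */
    static MappedByteBuffer mapUncompressed(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "stand-in for an .mmdb file".getBytes(StandardCharsets.UTF_8);
        Path raw = Files.createTempFile("db", ".mmdb");
        Path gz = Files.createTempFile("db", ".mmdb.gz");
        raw.toFile().deleteOnExit();
        gz.toFile().deleteOnExit();
        Files.write(raw, payload);
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(payload);
        }

        byte[] onHeap = loadGzippedOnHeap(gz);
        MappedByteBuffer offHeap = mapUncompressed(raw);
        System.out.println(onHeap.length + " bytes on heap, " + offHeap.capacity() + " bytes mapped");
    }
}
```

Only the small `MappedByteBuffer` object is allocated on the heap; its contents are backed by OS-managed pages. That is the trade-off under discussion: less heap usage in exchange for storing the files uncompressed on disk.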
Consequences
Positive Consequences
Negative Consequences
The `config` directory on disk increases from 34.5 MB to 72.6 MB (… `ingest-geoip-6.2.2.zip`).
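The 38.1 MB figure quoted in the discussion is the difference between these two directory sizes; the uncompressed files take roughly twice the space of the gzipped ones:

```latex
\Delta = 72.6\,\text{MB} - 34.5\,\text{MB} = 38.1\,\text{MB}, \qquad \frac{72.6\,\text{MB}}{34.5\,\text{MB}} \approx 2.1
```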