Python vs C extension difference with large db #335

Closed
suola opened this issue Jan 4, 2024 · 5 comments · Fixed by #336

Comments

@suola

suola commented Jan 4, 2024

We have a custom-built database with the following details:

  • record size: 32 bits
  • node count: 542155119
  • file size: 4339498906 bytes

The database works with the pure Python implementation. The latest maxminddb Python library comes with a vendored copy of libmaxminddb, and we noticed that the database doesn't work with the C extension.

Opening the database succeeds, but querying an IP address results in:

maxminddb.errors.InvalidDatabaseError: Error looking up 192.0.2.0. The MaxMind DB file's search tree is corrupt

A slightly smaller database also works properly with the C extension, so maybe the database size is the problem?

@oschwald
Member

oschwald commented Jan 4, 2024

You are likely near the maximum size that can be supported with 32-bit records. The exact maximum size will depend on the size of the search tree and data section. What writer are you using to create the database?
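A rough way to check the limit described here, assuming the record encoding from the MaxMind DB format specification: a record that points into the data section stores `node_count + 16 + offset`, so every such value must fit in `record_size` bits. The helper names below are hypothetical, and the data section size in the example is only an upper bound derived from the figures in the report (file size minus the search tree minus the 16-byte separator, which still includes the metadata section).

```python
# Size in bytes of the all-zero separator between the search tree and the
# data section, per the MaxMind DB format specification.
DATA_SECTION_SEPARATOR = 16


def max_record_value(node_count: int, data_section_size: int) -> int:
    """Largest record value needed to point at the last byte of the data section.

    A record value v > node_count refers to data-section offset
    (v - node_count - 16), so offset data_section_size - 1 needs
    v = node_count + 16 + data_section_size - 1.
    """
    return node_count + DATA_SECTION_SEPARATOR + data_section_size - 1


def fits_in_record(node_count: int, data_section_size: int, record_size: int) -> bool:
    """Hypothetical check: do all record values fit in record_size-bit unsigned ints?"""
    return max_record_value(node_count, data_section_size) <= (1 << record_size) - 1


# Figures from the report; 32-bit records use 8 bytes per search tree node.
node_count = 542155119
data_section_upper_bound = 4339498906 - node_count * 8 - DATA_SECTION_SEPARATOR

print(fits_in_record(node_count, data_section_upper_bound, 32))  # True
```

Under these assumptions the largest record value stays far below 2^32, which suggests the record width itself is not what breaks this particular file.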

That said, if you have been using the database with the pure Python implementation and have not run into any errors, I would suspect that a 32-bit integer in libmaxminddb is overflowing. Could you confirm that you see the same behavior with the mmdblookup tool included in libmaxminddb? If so, I'll move this issue there. Also, is there any way you could share a copy of a database that exhibits the issue?
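One place a 32-bit overflow could plausibly come from: the MaxMind DB specification gives the search tree size in bytes as `((record_size * 2) / 8) * node_count`. With the numbers from the report, that product is larger than 2^32, so any code computing it in an unsigned 32-bit integer would wrap around. The sketch below emulates that wrap-around in Python by masking to 32 bits; it illustrates the arithmetic only and is not libmaxminddb's actual code.

```python
# Mask emulating arithmetic in a 32-bit unsigned C integer (uint32_t).
UINT32_MASK = 0xFFFFFFFF


def search_tree_size(node_count: int, record_size: int) -> int:
    """Search tree size in bytes: ((record_size * 2) / 8) * node_count, per the spec."""
    return (record_size * 2 // 8) * node_count


def search_tree_size_u32(node_count: int, record_size: int) -> int:
    """The same product as it would come out if computed in a uint32_t (wraps mod 2**32)."""
    return search_tree_size(node_count, record_size) & UINT32_MASK


node_count = 542155119  # from the report
print(search_tree_size(node_count, 32))      # 4337240952, larger than 2**32
print(search_tree_size_u32(node_count, 32))  # 42273656
```

With the wrapped value, the computed start of the data section (tree size plus 16) would land inside the search tree, which would be consistent with the "corrupt" errors reported in this issue, although the sketch does not show what libmaxminddb does internally.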

@suola
Author

suola commented Jan 4, 2024

> You are likely near the maximum size that can be supported with 32-bit records. The exact maximum size will depend on the size of the search tree and data section. What writer are you using to create the database?

The standard MaxMind Perl writer.

> That said, if you have been using the database with the pure Python implementation and have not run into any errors, I would suspect that a 32-bit integer in libmaxminddb is overflowing. Could you confirm that you see the same behavior with the mmdblookup tool included in libmaxminddb?

Yes, I get the same error:

➜  libmaxminddb-1.8.0 bin/mmdblookup --file db.mmdb --ip 1.2.4.5
Got an error looking up the entry data - The MaxMind DB file's data section contains bad data (unknown data type or corrupt data)

> Also, is there any way you could share a copy of a database that exhibits the issue?

I'll look into this and reply later.

@oschwald
Member

oschwald commented Jan 4, 2024

If you want to share the database with me privately, you can email me at goschwald@maxmind.com.

The Perl writer is no longer maintained, but I don't think it would produce a corrupt database if a maximum size was exceeded. I believe it would throw an exception when writing.

oschwald transferred this issue from maxmind/MaxMind-DB-Reader-python Jan 4, 2024

@suola
Author

suola commented Jan 5, 2024

I sent you an email regarding the db file.

We have successfully created the database with the Go writer and will switch from the Perl writer to it in the near future.

oschwald added a commit that referenced this issue Jan 8, 2024
Also, add guards to follow-up calculations to error if they overflow.

Closes #335.
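The guards described in this commit message can be sketched as checked arithmetic that raises an error instead of silently wrapping. This is an illustration in Python under stated assumptions, not the code from the fix: `checked_add`, `checked_mul`, and `data_section_start` are hypothetical names, and `bits` stands in for the width of the C integer type being emulated.

```python
def checked_add(a: int, b: int, bits: int) -> int:
    """Add unsigned ints, raising OverflowError instead of wrapping past 2**bits - 1."""
    limit = (1 << bits) - 1
    if a < 0 or b < 0:
        raise ValueError("operands must be unsigned")
    # Written as a > limit - b, the form used in C so the check itself cannot overflow.
    if a > limit - b:
        raise OverflowError(f"{a} + {b} does not fit in {bits} bits")
    return a + b


def checked_mul(a: int, b: int, bits: int) -> int:
    """Multiply unsigned ints, raising OverflowError instead of wrapping."""
    limit = (1 << bits) - 1
    if a < 0 or b < 0:
        raise ValueError("operands must be unsigned")
    if b != 0 and a > limit // b:
        raise OverflowError(f"{a} * {b} does not fit in {bits} bits")
    return a * b


def data_section_start(node_count: int, record_size: int, bits: int) -> int:
    """Hypothetical: byte offset of the data section (search tree size plus the
    16-byte separator), with every step checked at the given integer width."""
    tree_size = checked_mul(record_size * 2 // 8, node_count, bits)
    return checked_add(tree_size, 16, bits)


print(data_section_start(542155119, 32, 64))  # 4337240968
try:
    data_section_start(542155119, 32, 32)
except OverflowError as exc:
    print("rejected:", exc)
```

At 64 bits the reported database's offsets are computed normally, while at 32 bits the calculation is rejected up front rather than producing a wrapped offset.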
horgh closed this as completed in #336 Jan 9, 2024
@oschwald
Member

oschwald commented Jan 9, 2024

maxminddb 2.5.2 has been released with a fix for this.