Why not store IDs as BigInt? #57

Open
metabench opened this issue Nov 27, 2022 · 4 comments

Comments

@metabench

Once I obtain the records containing string ids (including the referenced nodes in the ways) I create new BigInt objects to replace their string representations.

Has any consideration been made of parsing them into BigInt values within osm-read?

Perhaps it would be a useful option to have if it were not to be done by default. Making it an option would avoid breaking changes for those who expect string values.
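For reference, the conversion I currently do after reading looks roughly like the sketch below. The helper name `withBigIntIds` is hypothetical, and the `id` and `nodeRefs` field names are assumptions about the shape of a way record, so adjust them to whatever the parser actually returns.

```javascript
// Hypothetical helper: returns a copy of a way-like record whose
// string ids have been replaced by BigInt values.
// The field names `id` and `nodeRefs` are assumptions about the record shape.
function withBigIntIds(way) {
  return {
    ...way,
    id: BigInt(way.id),
    nodeRefs: way.nodeRefs.map((ref) => BigInt(ref)),
  };
}

// Made-up sample record, not real OSM data.
const way = { id: '4294967297', nodeRefs: ['9007199254740993', '42'], tags: {} };
const converted = withBigIntIds(way);

console.log(typeof converted.id, converted.nodeRefs.length);
```

An opt-in flag would move this step inside the library without changing the default string output.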

@marook
Owner

marook commented Nov 28, 2022

I assume you create the BigInt by invoking it using the string id? For example: BigInt(id)

If this is the case, I'm not sure that adding this behavior as a feature flag to osm-read is worth the effort. People who need the id in a numeric representation can easily do the conversion themselves.

Are there any more benefits of parsing the id within osm-read which I have missed @metabench ?

@metabench
Author

The earlier an id is represented as a BigInt, the less time strings longer than 8 bytes need to be kept in memory. It's not a big efficiency difference.

Getting the data from osm-read in the most appropriate type is the largest advantage as far as I can tell. It would make programming against it easier and perhaps slightly more performant.

@metabench
Author

There would likely be less processing to do between the data that's stored in the protobuf and having usable output if it were parsed as BigInt. I don't know whether or not there is anything in the osm-read codebase that would make it difficult to do, such as relying on a schema or dependency which already parses them into strings.

@metabench
Author

Looking at various TODOs such as

// TODO we should test wheather adding 64bit numbers works fine with high values

There is no problem with integers of the size we get in OSM PBF files, such as for high node IDs. File positions beyond 2^32 are also fine.

"The Number.MAX_SAFE_INTEGER constant represents the maximum safe integer in JavaScript (2^53 – 1)." - MDN Web Docs.
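A quick way to check the safe-integer claim is the sketch below. The sample values are made up for illustration and are not taken from a real PBF file.

```javascript
// Number.MAX_SAFE_INTEGER is 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER;

// Hypothetical id and file position, both above 2^32 but far below 2^53.
const sampleId = 10000000000;
const filePos = 2 ** 32 + 5;

const idIsSafe = Number.isSafeInteger(sampleId);
const posIsSafe = Number.isSafeInteger(filePos);

// Addition stays exact while the result remains in the safe range.
const sum = sampleId + filePos;

// Beyond the safe range, distinct integers collapse onto the same double.
const collapsed = 2 ** 53 + 1 === 2 ** 53;

console.log(maxSafe, idIsSafe, posIsSafe, sum, collapsed);
```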

It's worth noting that the bits beyond the low 32 are lost when applying bitwise operators such as '>>>', because those operators convert their operands to 32-bit integers.
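The truncation is easy to reproduce. The sketch below uses a made-up value just above 2^32 and shows two alternatives that keep the high bits: plain arithmetic on Numbers, or a shift on a BigInt.

```javascript
// Made-up value just above 2^32.
const big = 2 ** 32 + 5; // 4294967301

// Bitwise operators first reduce the Number to 32 bits, dropping the high part.
const truncated = big >>> 0;
const shiftedWrong = big >>> 8;

// Arithmetic alternative to a right shift by 8 that keeps the high bits.
const shiftedArithmetic = Math.floor(big / 2 ** 8);

// BigInt supports >> (but not >>>) at full width.
const shiftedBigInt = BigInt(big) >> 8n;

console.log(truncated, shiftedWrong, shiftedArithmetic, shiftedBigInt);
```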

When representing these numbers in a TypedArray, 64 bit integer types should be used (signed or unsigned will work, but I go for unsigned when I am only supporting unsigned numbers).
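As a minimal sketch of the typed-array point, the snippet below packs string ids into a `BigUint64Array`. The id values are made up for illustration, and the unsigned type assumes no negative ids need to be stored.

```javascript
// Made-up string ids, including one above Number.MAX_SAFE_INTEGER.
const stringIds = ['1', '4294967297', '9007199254740993'];

// BigUint64Array elements are BigInt values, 8 bytes each.
const packed = BigUint64Array.from(stringIds.map((s) => BigInt(s)));

// Values round-trip exactly, including the one outside the safe Number range.
const roundTripped = Array.from(packed, (v) => v.toString());

console.log(packed.byteLength, roundTripped);
```

`BigInt64Array` is the signed counterpart and would be the choice if negative ids have to be supported.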
