Why not store IDs as BigInt? #57

metabench · 2022-11-27T23:18:50Z

Once I obtain the records containing string ids (including the node references in ways), I convert them to BigInt values, replacing their string representations.
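For reference, here is a minimal sketch of that conversion step. It assumes records shaped like `{ id: "123", nodeRefs: ["1", "2"] }`. The `nodeRefs` field name and the `withBigIntIds` helper are illustrative assumptions, not a description of osm-read's output.

```javascript
// Hypothetical helper: returns a copy of a record with string ids replaced by BigInt.
// Assumes records look like { id: "123", nodeRefs: ["1", "2"], ... };
// the field names are illustrative assumptions.
function withBigIntIds(record) {
  const converted = { ...record, id: BigInt(record.id) };
  if (Array.isArray(record.nodeRefs)) {
    // Ways reference nodes by id, so convert those as well.
    converted.nodeRefs = record.nodeRefs.map((ref) => BigInt(ref));
  }
  return converted;
}

const way = withBigIntIds({ id: "987654321", nodeRefs: ["10000000001", "10000000002"], tags: {} });
console.log(typeof way.id); // "bigint"
```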

Has any consideration been given to parsing them into BigInt values within osm-read?

If it shouldn't be the default, it could still be useful as an opt-in option. Making it optional would avoid a breaking change for users who expect string values.
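As a hypothetical sketch of how such an option could behave: `bigIntIds` and `wrapHandlers` are invented names, not part of osm-read's API, and `nodeRefs` is an assumed field name. The wrapper converts ids before they reach the node/way/relation handlers and returns the handlers unchanged when the flag is off.

```javascript
// Hypothetical: `bigIntIds` is NOT an existing osm-read option. This only sketches
// how an opt-in conversion could sit in front of the record handlers while the
// default behavior (string ids) stays untouched.
function wrapHandlers(handlers, { bigIntIds = false } = {}) {
  if (!bigIntIds) return handlers; // default: existing behavior, string ids

  const convert = (record) => {
    const out = { ...record, id: BigInt(record.id) };
    if (Array.isArray(record.nodeRefs)) {
      // Assumed field name for a way's referenced node ids.
      out.nodeRefs = record.nodeRefs.map((ref) => BigInt(ref));
    }
    return out;
  };

  const wrapped = { ...handlers };
  for (const kind of ["node", "way", "relation"]) {
    if (typeof handlers[kind] === "function") {
      wrapped[kind] = (record) => handlers[kind](convert(record));
    }
  }
  return wrapped;
}

// Usage: only callers that opt in receive BigInt ids.
const received = [];
const handlers = wrapHandlers({ way: (w) => received.push(w) }, { bigIntIds: true });
handlers.way({ id: "7", nodeRefs: ["1", "2"] });
console.log(typeof received[0].id); // "bigint"
```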

marook · 2022-11-28T10:12:18Z

I assume you create the BigInt by calling it with the string id? For example: BigInt(id)

If so, I'm not sure adding this behavior to osm-read behind a feature flag is worth the effort. People who need the id as a number can easily do the conversion themselves.

Are there any other benefits of parsing the id within osm-read that I have missed, @metabench?

metabench · 2022-11-28T14:24:52Z

The earlier an id is represented as a BigInt, the shorter the time that strings longer than 8 bytes need to be kept around. It's not a big efficiency difference, though.

As far as I can tell, the biggest advantage is getting the data from osm-read in the most appropriate type. It would make programming against it easier and maybe a bit more performant.

metabench · 2022-11-29T00:15:04Z

If the ids were parsed directly as BigInt, there would likely be less processing between the data stored in the protobuf and usable output. I don't know whether anything in the osm-read codebase would make this difficult, such as reliance on a schema or dependency that already parses them into strings.
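To illustrate what parsing straight into BigInt could involve, here is a sketch (not taken from osm-read) that decodes a packed, delta-coded `sint64` field, the encoding the OSM PBF format uses for way node refs and dense node ids, directly into BigInt values. It assumes the field's raw bytes have already been extracted as a `Uint8Array`; the function names are hypothetical.

```javascript
// Read one base-128 varint starting at `pos`; returns [value, nextPos].
function readVarintBigInt(bytes, pos) {
  let result = 0n;
  let shift = 0n;
  while (true) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    result |= BigInt(b & 0x7f) << shift; // low 7 bits carry data
    if ((b & 0x80) === 0) return [result, pos]; // high bit clear: last byte
    shift += 7n;
  }
}

// ZigZag decoding for sint64: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
function zigzagDecode(n) {
  return (n >> 1n) ^ -(n & 1n);
}

// Each stored value is the difference from the previous id, so keep a running sum.
function decodeDeltaSint64(bytes) {
  const values = [];
  let pos = 0;
  let current = 0n;
  while (pos < bytes.length) {
    let raw;
    [raw, pos] = readVarintBigInt(bytes, pos);
    current += zigzagDecode(raw);
    values.push(current);
  }
  return values;
}

// Deltas +100, +5, -2 encode to these bytes:
console.log(decodeDeltaSint64(Uint8Array.from([0xc8, 0x01, 0x0a, 0x03]))); // [ 100n, 105n, 103n ]
```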

metabench · 2022-12-31T14:50:20Z

Looking at various TODOs, such as this one in osm-read/lib/pbfParser.js (line 335 in 411aba2):

    // TODO we should test wheather adding 64bit numbers works fine with high values

Regarding that TODO: adding works fine for integers of the sizes found in OSM PBF files, such as high node IDs, and file positions beyond 2^32 are also fine.

"The Number.MAX_SAFE_INTEGER constant represents the maximum safe integer in JavaScript (2^53 – 1)." - MDN Web Docs.
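A quick illustration of that limit, plus a hypothetical `toSafeNumber` guard (not part of osm-read) for code that wants to keep using Numbers:

```javascript
// Every integer up to 2^53 - 1 is exactly representable as a Number.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(Number.isSafeInteger(2 ** 53)); // false

// Beyond the limit, neighbouring integers collapse onto the same double...
console.log(2 ** 53 + 1 === 2 ** 53); // true
// ...whereas BigInt arithmetic stays exact.
console.log(2n ** 53n + 1n === 2n ** 53n); // false

// Hypothetical guard: accept a decimal id string as a Number only if it is safe.
function toSafeNumber(idString) {
  if (!/^-?\d+$/.test(idString)) {
    throw new TypeError(`not an integer id: ${idString}`);
  }
  const n = Number(idString);
  if (!Number.isSafeInteger(n)) {
    throw new RangeError(`id ${idString} is not a safe integer; use BigInt`);
  }
  return n;
}
```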

It's worth noting that the bits beyond 32 are lost when applying bitwise operators such as '>>>' to Numbers.
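For example, with a file position just past 4 GiB (the value is made up for illustration):

```javascript
const offset = 2 ** 32 + 10; // a file position just past 4 GiB

// Bitwise operators convert their operands to 32-bit integers first:
console.log(offset >>> 0); // 10
console.log(offset >> 1); // 5

// Plain arithmetic keeps all bits while values stay below 2^53:
console.log(Math.floor(offset / 2)); // 2147483653
console.log(Math.floor(offset / 2 ** 32), offset % 2 ** 32); // 1 10

// BigInt shifts operate on the full value:
console.log(BigInt(offset) >> 1n); // 2147483653n
```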

When storing these numbers in a TypedArray, 64-bit integer types (BigInt64Array or BigUint64Array) should be used. Signed or unsigned will both work, but I go for unsigned when I only need to support non-negative numbers.
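A short sketch of that approach, using made-up sample ids:

```javascript
// Sample ids (made-up values), as strings.
const refs = ["10000000001", "10000000002", "10000000003"];

// 8 bytes per id, regardless of how many decimal digits it has.
const ids = new BigUint64Array(refs.length);
refs.forEach((ref, i) => {
  ids[i] = BigInt(ref);
});
console.log(ids.byteLength); // 24

// Elements are BigInt: assigning a Number throws a TypeError.
try {
  ids[0] = 1;
} catch (err) {
  console.log(err instanceof TypeError); // true
}

// Signed vs unsigned only matters for negative values (unsigned wraps modulo 2^64):
const signed = new BigInt64Array([-1n]);
const unsigned = new BigUint64Array([-1n]);
console.log(signed[0], unsigned[0]); // -1n 18446744073709551615n
```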
