Investigate cost of adding metadata #12
Comments
Hi @bdon. I'm a big fan of OSMExpress and the Protomaps extract service. At my company we have some internal tooling that relies on the object version number for caching. We don't need the changeset/user/timestamp. Would you consider adding version numbers to Protomaps extracts? Thanks. |
Are you working with a .osmx locally or just a .pbf extract? If an .osmx, is it a region or the whole planet? I'm wary of implementing this because it will probably double the total db size. Ideally, metadata is optional and you don't pay the storage cost for it if you don't use it, but I think this depends on migrating from capnproto to flatbuffers (#1) because of how empty fields are stored. |
We are just working with .pbf extracts for now. |
on an AWS |
The download server at http://protomaps.com/extracts now includes version and timestamp information. @invisiblefunnel, let me know if this is working for you; I'm working on the ecosystem around these tools, so I'm interested in what people are building! |
Thanks @bdon! This is great news. I'll take a look this week and reply back. |
I just grabbed an extract from Protomaps, loaded it into JOSM, fixed a road's name, and uploaded the change. This demonstrates that the extract had the required meta-data (version). I also manually verified that elements had However... I cannot use this as a source to change the shape of a road, since most of the way's nodes are tag-less, and you don't provide them with a Please reconsider including meta-data (or at least |
FWIW this is also a blocker for my use cases which rely on the ID and version to |
just to confirm - to make this work for your use cases only |
Yes, just the version is needed. We don't use timestamps at all. |
Confirmed, |
Changed location values from a 64-bit integer to a 96-bit struct that includes the version. AWS i3.xlarge: |
That seems pretty reasonable. For the augmented diff use case #17, version information is useful for the same reason as @invisiblefunnel mentioned above, it allows for unique identification of a particular node in order to match it to its metadata.
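To illustrate the identification point above, here is a minimal sketch of matching an element to its metadata by the pair (ID, version). Everything in it is hypothetical: `NodeMetadata`, `remember`, and `lookup` are illustration names, not part of OSMExpress or any augmented diff tooling.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class NodeMetadata(NamedTuple):
    """Hypothetical metadata record for one specific version of a node."""
    changeset: int
    user: str


# An ID alone is ambiguous once a node has been edited;
# (node_id, version) names exactly one historical state.
metadata_by_key: Dict[Tuple[int, int], NodeMetadata] = {}


def remember(node_id: int, version: int, meta: NodeMetadata) -> None:
    """Store metadata for one exact version of a node."""
    metadata_by_key[(node_id, version)] = meta


def lookup(node_id: int, version: int) -> Optional[NodeMetadata]:
    """Return the metadata for that exact version, or None if unknown."""
    return metadata_by_key.get((node_id, version))
```

The same key would also work as a cache key, which is the use described at the top of the thread.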
Can you describe this a bit more? |
Locations were previously stored as 64-bit integers. The records for the "Locations" table in the osmx file occupy contiguous pages of storage on disk, ordered by node ID. Adding a 32-bit version number increases the record size by 50%, so fewer records fit on a single disk page. The osmx design (by using LMDB) implements no application-level caching; it relies on the kernel to cache pages as they are retrieved from disk. The kernel automatically manages a pool of cached disk pages in RAM. Since the Locations table is now less dense, it's more likely when fetching Locations that you will need a page that has not been fetched yet or has been evicted from cache. This is just my performance hypothesis; I need to run some benchmarks to determine whether or not it makes any significant difference. |
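The density argument can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not the actual osmx layout: it assumes a 4096-byte page, ignores LMDB's page headers and keys, and models the new record as the old 64-bit value plus a 32-bit version with no padding. The names `PAGE_SIZE`, `OLD_RECORD`, `NEW_RECORD`, `records_per_page`, and `pages_needed` are all made up for the illustration.

```python
import struct

# Assumed page size in bytes; real LMDB pages also hold headers and keys.
PAGE_SIZE = 4096

# Hypothetical record layouts, little-endian, no padding.
OLD_RECORD = struct.Struct("<q")   # one 64-bit integer location
NEW_RECORD = struct.Struct("<qI")  # same 64 bits plus a 32-bit version


def records_per_page(record_size: int, page_size: int = PAGE_SIZE) -> int:
    """How many whole records fit in one page under these assumptions."""
    return page_size // record_size


def pages_needed(n_records: int, record_size: int,
                 page_size: int = PAGE_SIZE) -> int:
    """Pages required to hold n_records, rounding up."""
    per_page = records_per_page(record_size, page_size)
    return -(-n_records // per_page)  # ceiling division


if __name__ == "__main__":
    for name, layout in (("old", OLD_RECORD), ("new", NEW_RECORD)):
        print(name, layout.size, "bytes,",
              records_per_page(layout.size), "records per page,",
              pages_needed(1_000_000, layout.size), "pages per million records")
```

Under these assumptions a page holds 512 of the old records but only 341 of the new ones, so the same range of node IDs spans about 1.5 times as many pages. Whether that shows up in practice depends on the access pattern and on how much of the table stays in the kernel's page cache.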
Here's my test region:
first run on versionless planet: 943 seconds
versioned planet: 873 seconds
@blackboxlogic @invisiblefunnel new planet with versions is now online - can you try on https://protomaps.com/extracts ? |
Works perfectly for me. Many thanks @bdon. |
Every element has a version number, so the extracts are usable for editing. |
Yes, the data is stored but I intentionally am excluding it. That seems to be the convention for GDPR compliance. Is that needed for any of your applications? |
I definitely don't need it, but it could plausibly be useful*, and if you're storing it already then there isn't much to gain by withholding it. Other services handle GDPR by offering the "pii" only to OSM users who have signed in with OAuth, since they have agreed to the terms of service. That would, of course, complicate your service by involving OAuth. *Possible use-case: A vandal changes all buildings into parks, I want to remove all |
I have an auth system built which is separate from OSM OAuth. I could make PII only available to logged-in users. Can you describe your editing workflow in more detail? I'd like to include it in my SOTM talk, and I can mention your username if that's ok. |
Re: "Describe your workflow" Yes, "ok" to mention my username. |
Great, we can discuss over email. |
Possibly add one or all of:
We would ignore metadata for nodes that have no tags.