osmium time-filter - recreation of osc files based on history dump #19

Closed
mmd-osm opened this issue Nov 5, 2015 · 4 comments


mmd-osm commented Nov 5, 2015

The issue I'm looking at is to recreate .osc files based on an existing history dump. Instead of creating a huge .osc file based on several years of history, the idea came up to split the history file into smaller chunks and create .osc files based on them, say with 1 day or 1 week granularity. The following thread on the Overpass API dev list gives some more context, but that's optional reading.

I somehow got the impression that osmium time-filter could be used to extract all changes pertaining to a certain timeframe. I tried the following command:

osmium time-filter -o swiss_diff.osh.pbf switzerland-padded-softcut.osh.pbf 2015-01-01T00:00:00Z 2015-01-07T00:00:00Z

... followed by osmconvert swiss_diff.osh.pbf --out-osc.

Unfortunately, the resulting .osc file contains data outside of the specified timeframe:

<modify>
        <node id="172218" lat="46.265176" lon="6.1340822" version="8" timestamp="2010-09-03T20:53:20Z" changeset="5674130" uid="74847" user="Marc Mongenet"/>
        <node id="172219" lat="46.2635155" lon="6.1331579" version="9" timestamp="2012-04-29T11:03:44Z" changeset="11448508" uid="74847" user="Marc Mongenet"/>
        <node id="172220" lat="46.2615572" lon="6.1326461" version="9" timestamp="2012-04-29T11:03:44Z" changeset="11448508" uid="74847" user="Marc Mongenet"/>
        <node id="172221" lat="46.2597426" lon="6.1330302" version="8" timestamp="2012-04-29T11:03:44Z" changeset="11448508" uid="74847" user="Marc Mongenet"/>
</modify>

I'm not sure whether I'm misunderstanding what osmium time-filter is supposed to do on .osh.pbf files. I checked the man page, but couldn't figure out what "If both FROM-TIME and TO-TIME are given, the result will also have history data" means for my use case.

Original file location: http://planet.osm.ch/history/switzerland-padded-softcut.osh.pbf (521MB file size)

Member

joto commented Nov 6, 2015

Yes, a misunderstanding. What osmium time-filter does in the case with a time range is create a file that contains all the data needed for re-creating the state of the planet at any point in time in this time range. So it has all objects in there that were valid at the starting time even if they were created (or modified) before that start time.
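
The range behaviour described here can be pictured as an interval-overlap test. The following is a minimal sketch under stated assumptions, not the osmium-tool implementation: `VersionSpan`, `kOpenEnd` and `visible_in_range` are hypothetical names, timestamps are plain integers, and treating the range as half-open `[from, to)` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical marker for "this version is still the current one".
constexpr std::int64_t kOpenEnd = std::numeric_limits<std::int64_t>::max();

// Hypothetical model of one object version, valid in [start, end).
struct VersionSpan {
    std::int64_t start;  // timestamp of this version
    std::int64_t end;    // timestamp of the next version, or kOpenEnd
};

// True if the version was valid at some point inside [from, to).
// Note that v.start may lie long before `from`.
bool visible_in_range(const VersionSpan& v, std::int64_t from, std::int64_t to) {
    assert(from <= to);
    return v.start < to && v.end > from;
}
```

Under this model a node last edited in 2010 and never changed since passes the test for a 2015 range, which would explain the old timestamps in the output above.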

What you are trying to do can't be done with current osmium-tool, but it should be easy to do with a small libosmium program. Or you can call osmium time-filter twice, once with the first and once with the last timestamp and then use osmosis with the --derive-change option to create the change file. Just beware that this will not give you all the changes between those points in time, because for multiple intermediate changes only the last one will appear. If this is not good enough, you'll have to roll your own with libosmium (which you still might want to do because it will be much faster).

As a side note: you didn't have to use osmconvert above. Just tell osmium to create the change file and it will happily do that:
osmium time-filter -o swiss_diff.osc switzerland-padded-softcut.osh.pbf 2015-01-01T00:00:00Z 2015-01-07T00:00:00Z

Author

mmd-osm commented Nov 8, 2015

Thank you for your detailed reply. I have to say that I really misunderstood time-filter. One thing I don't quite get is the difference between the following two statements:

"is create a file that contains all the data needed for re-creating the state of the planet at any point in time in this time range."

vs.

"Just beware that this will not give you all the changes between those points in time, because for multiple intermediate changes only the last one will appear."

My question is basically where the restriction "for multiple intermediate changes only the last one will appear" comes from. Is it inherent to osmium time-filter, to the approach of calling it twice, or is it caused by osmosis?
Somehow this seems to contradict the first statement: to be able to recreate a planet at any point in the given time range, don't I have to basically include all versions in that time frame?

I experimented a bit with DiffObject and added a new method there, which I also called from osmium time-filter.

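        // Experimental addition to DiffObject: true only if this version starts
        // within [from, to) and its validity extends past `to`.
        bool is_only_between(const osmium::Timestamp& from, const osmium::Timestamp& to) const noexcept {
            return (from <= start_time() && start_time() < to) &&
                   (start_time() != end_time() && end_time() > from && to < end_time());
        }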

Unfortunately, I really only got the last version in the given timeframe.

So for the purpose of creating .osc files with all versions, which I want to feed into Overpass API, I really need to go ahead with the plain vanilla libosmium approach, as you suggested. As I have absolutely no experience with this library, this is going to be a bit of a learning curve.
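
For a change file that carries every version, the test would be on each version's own timestamp rather than on its validity span. A minimal sketch of that selection, assuming a simplified in-memory history; `ObjectVersion` and `versions_created_in` are hypothetical names and this does not use libosmium:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one version of an OSM object.
struct ObjectVersion {
    std::int64_t id;
    int version;
    std::int64_t timestamp;
};

// Keep every version whose own timestamp lies in [from, to),
// including intermediate versions that were later superseded.
std::vector<ObjectVersion> versions_created_in(const std::vector<ObjectVersion>& history,
                                               std::int64_t from, std::int64_t to) {
    std::vector<ObjectVersion> out;
    for (const auto& v : history) {
        if (from <= v.timestamp && v.timestamp < to) {
            out.push_back(v);
        }
    }
    return out;
}
```

With versions at timestamps 10, 20, 25 and 40 and a window of [15, 30), this keeps the versions at 20 and 25 and drops the other two.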

I wonder whether a tool that recreates .osc files from .osh.* files would be worth including in osmium-contrib?

Member

joto commented Nov 9, 2015

If you use time-filter with a range, you will get all changes between those times. But if you call time-filter twice, each time with a different point in time, and then create the diff, you will lose multiple changes. That is inherent to the "call it twice" approach.
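
A minimal sketch of why diffing two snapshots drops intermediate versions. It models the idea only and is neither osmium nor osmosis code: `Revision`, `snapshot_at` and `derive_change` are hypothetical names, timestamps are plain integers, and deletions are ignored.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for one version of an OSM object.
struct Revision {
    std::int64_t id;
    int version;
    std::int64_t timestamp;
};

using Snapshot = std::map<std::int64_t, Revision>;

// State at time t: the latest revision of each object with timestamp <= t.
// Assumes the history is sorted by id, then by version.
Snapshot snapshot_at(const std::vector<Revision>& history, std::int64_t t) {
    Snapshot snap;
    for (const auto& r : history) {
        if (r.timestamp <= t) {
            snap[r.id] = r;  // a later revision overwrites an earlier one
        }
    }
    return snap;
}

// Objects that are new or have a different version in `after`.
std::vector<Revision> derive_change(const Snapshot& before, const Snapshot& after) {
    std::vector<Revision> out;
    for (const auto& entry : after) {
        const auto it = before.find(entry.first);
        if (it == before.end() || it->second.version != entry.second.version) {
            out.push_back(entry.second);
        }
    }
    return out;
}
```

For an object with versions at timestamps 10, 20, 25 and 40, diffing the snapshots at 15 and 30 yields only version 3. Version 2 was also created inside that window, but neither snapshot contains it.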

Yes, this could be something for osmium-contrib.

joto closed this as completed Nov 17, 2015
Author

mmd-osm commented Aug 15, 2016

I tried the following quick hack to split a full history file into daily chunks of osh.pbf files based on each object's timestamp. Although this seemed to work once the limit on open files had been raised (ulimit -n 20000), it is certainly not an elegant way of doing it. Memory requirements were also quite high (~24GB). I'm not sure what the most idiomatic way of accomplishing the same with libosmium would be.

https://github.com/mmd-osm/libosmium/commit/6884726271018126c8eccaea01a6ea071fa438a0
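
The bucketing step on its own can be sketched as follows. This is an illustration under assumptions, not the code in the commit above: `TimedVersion`, `day_index` and `split_by_day` are hypothetical names, timestamps are taken to be non-negative seconds since the Unix epoch in UTC, and nothing here uses libosmium or writes files.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for one version of an OSM object.
struct TimedVersion {
    std::int64_t id;
    int version;
    std::int64_t timestamp;  // seconds since the Unix epoch, UTC
};

constexpr std::int64_t kSecondsPerDay = 86400;

// Number of whole UTC days since the epoch; used as the bucket key.
std::int64_t day_index(std::int64_t timestamp) {
    assert(timestamp >= 0);
    return timestamp / kSecondsPerDay;
}

// Group versions by the UTC day of their own timestamp.
std::map<std::int64_t, std::vector<TimedVersion>> split_by_day(
        const std::vector<TimedVersion>& history) {
    std::map<std::int64_t, std::vector<TimedVersion>> buckets;
    for (const auto& v : history) {
        buckets[day_index(v.timestamp)].push_back(v);
    }
    return buckets;
}
```

As a check on the key, 2015-01-01T00:00:00Z is 1420070400 seconds since the epoch and lands in bucket 16436. Holding every bucket in memory is for illustration only and does not address the open-file or memory costs mentioned above.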
