Reduce maximum changeset size to 10k changes #1259

Closed
@woodpeck
Contributor

10k changes ought to be enough for any normal mapping activity. Automatic edits and imports are controlled by scripts anyway, so it doesn't make a difference to them (they just have to adapt to the limit). The reason for my suggestion is that large changesets are increasingly difficult to handle (frequent timeouts when trying to load and process them). The changeset size limit is returned by the API in the "capabilities" request (http://api.openstreetmap.org/api/capabilities), i.e. client software that honours that information will automatically pick up the new limit.
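For illustration, here is a minimal Python sketch of how a client could honour that advertised limit; the capabilities endpoint and its maximum_elements attribute exist in API 0.6, but the surrounding code is only an example:

```python
import xml.etree.ElementTree as ET

import requests

# Fetch the server's capabilities document before uploading.
resp = requests.get("https://api.openstreetmap.org/api/capabilities", timeout=30)
resp.raise_for_status()
root = ET.fromstring(resp.content)

# The changeset size limit is advertised as
# <api><changesets maximum_elements="..."/></api> in the response.
limit = int(root.find("api/changesets").get("maximum_elements"))
print(f"Server allows at most {limit} changes per changeset")
```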

woodpeck Reduce maximum changeset size to 10k changes
f014010
@pnorman
Contributor
pnorman commented Jul 31, 2016

👍

Changesets over 10k cause a problem for uploads, downloads, analysis, and figuring out what was done in them. The API has gotten better at handling large changesets but changesets over a certain size will always be a problem.

Third-party analysis tools also have problems with changesets this large, tending to time out even if they don't need to download the changeset.

Uploads over 10k are prone to timeouts, so anyone creating them will tend to be uploading by chunks, which is not ideal because interrupted connections require manual cleanup of the result.

Best practices have long been to use changesets under 10k, and software like the redaction bot and the TIGER name completion code have picked sizes in the 500-1k range for the maximum changeset size.

A statistical analysis of changeset sizes I did a couple of years ago supports this too. The 99th percentile changeset size is 1.7k and the 99.5th percentile is 3.2k. These values have been decreasing; this year they are 1.4k and 2.4k respectively. The proposed limit would only affect 0.09% of changesets this year.

@planemad
planemad commented Aug 3, 2016

This would impact ongoing imports using JOSM. A ticket has been filed to auto-split large changesets: https://josm.openstreetmap.de/ticket/13265#ticket

@pnorman
Contributor
pnorman commented Aug 3, 2016

JOSM already auto-splits large changesets. Hopefully with this change we can get people out of the habit of using the horrible chunk upload.
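For illustration, the auto-split behaviour amounts to something like the sketch below (a hypothetical helper, not JOSM's actual code; real editors also need to keep dependent edits, such as a way and its new nodes, in the same changeset):

```python
def split_into_changesets(changes, max_elements=10_000):
    """Yield batches of edits no larger than the server-advertised limit."""
    for i in range(0, len(changes), max_elements):
        yield changes[i:i + max_elements]
```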

@maning maning referenced this pull request in osmlab/labuildings Aug 3, 2016
Closed

New OSM API to limit upload to 10k changes #106

@Zverik
Contributor
Zverik commented Aug 3, 2016

Let's do this. I suggest posting a warning to the talk@, dev@ and imports@ mailing lists, and merging the pull request in two weeks. We need confirmation from the OWG that they are okay with this.

@mmd-osm
mmd-osm commented Aug 3, 2016

Will the various revert scripts also be happy to process 50k changesets after the change, including posting comments like "Reverting changesets foo, part x/y" or the like?

@pnorman
Contributor
pnorman commented Aug 3, 2016

The revert scripts already have to be able to handle changesets larger than the current maximum size, as there are some 50k+1 changesets, and I believe they handle them okay.

This will make it easier to revert large changesets since we'll be able to reliably download them from the API.
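For context, a revert typically starts by fetching the changeset's osmChange document from the download endpoint; a sketch, with a made-up changeset id:

```python
import requests

# GET /api/0.6/changeset/<id>/download returns the osmChange document --
# the input a revert script works backwards from.
cs_id = 12345  # made-up id for illustration
resp = requests.get(
    f"https://api.openstreetmap.org/api/0.6/changeset/{cs_id}/download",
    timeout=300,
)
resp.raise_for_status()
osc_xml = resp.content  # smaller changesets make this download reliable
```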

@pnorman pnorman referenced this pull request in zerebubuth/openstreetmap-cgimap Aug 21, 2016
Merged

Add support for changesets #111

@SomeoneElseOSM

Just a comment to echo what @pnorman said above, and to say that we've just had another example where someone created "unrevertable (by JOSM) changesets" https://help.openstreetmap.org/questions/51646/how-to-revert-3-changesets-containing-136000-untagged-nodes?page=1&focusedAnswerId=51648#51648 .

@gravitystorm
Contributor

Personally I've no major objections to lowering the limit. It's an arbitrary number after all, so a different arbitrary number is fine.

One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

@zerebubuth
Contributor

> If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

One part of the problem is that the protocol has a design flaw; if the connection to the API server is broken, then it's very hard to tell what the state of the upload is, which can be frustrating.

For API 0.7 (or perhaps also back-ported to 0.6.x), my favourite of the many good solutions is to replace the single POST upload with two requests. The first would be a POST to .../upload/new, returning 202 with a Location: header pointing to a unique upload UUID; the second would PUT the osmChange data there. GETting that location would return the status of the upload. This means that an interrupted POST would leave an empty, but harmless, upload location, and an interrupted PUT could be checked to see whether the request was successful and whether it has been processed or not.
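A client-side sketch of that hypothetical two-step protocol follows; the /upload/new endpoint, the 202 response and the UUID location are all part of the proposal above, not of any released API:

```python
import requests

API = "https://api.openstreetmap.org/api/0.7"  # hypothetical future version
CS_ID = 12345  # made-up changeset id

# Step 1: reserve an upload slot. If this POST is interrupted, at worst an
# empty but harmless upload location is left behind on the server.
resp = requests.post(f"{API}/changeset/{CS_ID}/upload/new")
assert resp.status_code == 202
upload_url = resp.headers["Location"]  # unique per-upload UUID location

# Step 2: PUT the osmChange payload to the reserved location.
with open("changes.osc", "rb") as f:
    requests.put(upload_url, data=f, headers={"Content-Type": "text/xml"})

# If the PUT is interrupted, poll the same URL to learn whether the diff
# arrived and whether it has been processed yet.
status = requests.get(upload_url)
print(status.status_code, status.text)
```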

Apologies - that was pretty well off-topic for this issue. I agree with @gravitystorm and have no objection to lowering the limit, especially if we're pretty sure it'll affect <0.1% of uploads.

@pnorman
Contributor
pnorman commented Aug 25, 2016

> One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

The changeset limit is more about downloads failing and about keeping changesets to an understandable size. 10k changes in a single upload is still likely to fail. When I was doing coastline replacements, 2-5k was about the maximum upload chunk size.

I'd actually prefer a maximum changeset size that could reliably be uploaded in one chunk and represents the high end of what is a reasonable amount of work for a person to do in a reasonable amount of time, but that would be controversial.

zerebubuth/openstreetmap-cgimap#111 should help the downloads somewhat, but all the other reasons for decreasing the size remain.

@zerebubuth
Contributor

I'm afraid to say that zerebubuth/openstreetmap-cgimap#111 is just the beginning and doesn't yet handle changeset downloads (osmChange), only changeset metadata and discussions. I hope to be able to add osmChange downloads soon. If anyone wants to help then it would be appreciated; code, tests, docs, ad-hoc comparisons with the rails API - anything would be helpful.

@woodpeck
Contributor

Can we keep the various good ideas for the future in mind while at the same time papering over the cracks by applying this?

@Zverik
Contributor
Zverik commented Dec 7, 2016

So... Are we going to merge this?

@grischard

@tomhughes reducing the maximum changeset size to 10k changes would greatly help the work of the DWG. Would it please be possible to merge it?

@tomhughes
Member

Of course it's possible. The question is what is the correct way to decide on a change as significant as this...

@tomhughes
Member

So to summarise the discussion so far, it seems we have established that JOSM will adapt automatically.

Presumably iD and Potlatch are extremely unlikely to generate such large changesets.

What about Merkaartor? Various command line tools?

Other than that, it was suggested that this should be announced somewhere before implementing it, which has not been done as far as I know?

@grischard grischard referenced this pull request in openstreetmap/operations Jan 31, 2017
Closed

Reduce maximum changeset size to 10k changes #144

@Komzpa
Komzpa commented Jan 31, 2017

Is there any kind of histogram of the object count per changeset? That would let us see whether there is actually a major drop at 10,000 objects, and then a sudden rise at 50,000 objects.
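One way such a histogram could be built from the planet changeset dump; a sketch, where the file name and the num_changes attribute are assumptions about the dump format:

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter

# Count changesets in 1k-wide size buckets, capping the top bucket at 50k.
histogram = Counter()
with bz2.open("changesets-latest.osm.bz2", "rb") as f:
    for _, elem in ET.iterparse(f):
        if elem.tag == "changeset":
            size = int(elem.get("num_changes", "0"))
            histogram[min(size // 1000, 50)] += 1
            elem.clear()  # keep memory flat while streaming the dump

for bucket in sorted(histogram):
    print(f"{bucket * 1000:>6}+ changes: {histogram[bucket]}")
```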

If this exists only to fight API slowness, then I have to raise a concern about inefficiencies in the current API implementation, which turns each object into two 0.4 ms sleeps (2 × 0.4 ms × 50,000 = 40 s for any kind of API request); for details and a proof of concept see https://github.com/Komzpa/fastmap

@mmd-osm
mmd-osm commented Jan 31, 2017 edited

Re Merkaartor: the sources don't seem to include the terms "capabilities" or "maximum_elements", so the limit is either some hardcoded value or not checked at all. (@Krakonos)

Could this change perhaps be activated on the dev instance for testing?

@tomhughes tomhughes added a commit that closed this pull request Jan 31, 2017
woodpeck + tomhughes Reduce maximum changeset size to 10k changes
Ten thousand changes ought to be enough for any normal mapping activity.

Automatic edits and imports are controlled by scripts anyway so it doesn't
make a difference to them as they just have to adapt the limit.

The reason for my suggestion is that large changesets are becoming
increasingly difficult to handle due to frequent timeouts when trying
to load and process them.

The changeset size limit is returned by the API in the "capabilities" request
so client software that honours that information will automatically pick up
the new limit.

Closes #1259
ad2b4fe
@tomhughes tomhughes closed this in ad2b4fe Jan 31, 2017
@tomhughes
Member

Apparently I'm not going to get any peace until I merge this so consider it done.

@bhousel bhousel referenced this pull request in openstreetmap/iD Jan 31, 2017
Open

Honor max changeset elements limit #3810

@DaveF63
DaveF63 commented Jan 31, 2017

This proposal won't affect me but...

Are these 0.09% of edits bad data? If not, this appears to be cutting off OSM's nose to spite its face. As @gravitystorm points out, improving the code, if possible, would be a step forward, rather than the backward step of disallowing the addition of data.

Are these 10k+ uploads from specific contributors or sources?

@simonpoole simonpoole referenced this pull request in MarcusWolschon/osmeditor4android Jan 31, 2017
Closed

Enforce max num of changes in changeset #454

@Krakonos
Member
Krakonos commented Feb 1, 2017

I just checked, and it seems neither a hardcoded limit nor the capabilities check is implemented. I do have rewriting this code on my shortlist, but even my short list seems pretty long right now. I'll at least let users know this change happened.

@Krakonos Krakonos referenced this pull request in openstreetmap/merkaartor Feb 1, 2017
Open

Implement full OSM API #127

@Zverik
Contributor
Zverik commented Feb 1, 2017

Just a bit of trivia: of the 8.5 million changesets made in 2016, 6796 (0.08%) contained more than 10k objects, and 6191 of those were uploaded with JOSM.

@DaveF63, please read the comments. This limit won't affect mappers, since JOSM (and soon, other editors) can split changes into multiple changesets.

@DaveF63
DaveF63 commented Feb 1, 2017

@Zverik I did. Blocking 10k+ changesets is still on the table. iD has implemented it.
If JOSM, as claimed, already splits them, and other editors won't be affected, what is being used to upload these large changesets? To find a solution you first have to understand the problem.

@tomhughes tomhughes locked and limited conversation to collaborators Feb 1, 2017