
Reduce maximum changeset size to 10k changes #1259

Closed · wants to merge 1 commit into master

Conversation

woodpeck (Contributor) commented Jul 31, 2016

10k changes ought to be enough for any normal mapping activity. Automatic edits and imports are controlled by scripts anyway, so it doesn't make a difference to them (they just have to adapt to the limit). The reason for my suggestion is that large changesets are increasingly difficult to handle (frequent timeouts when trying to load and process them). The changeset size limit is returned by the API in the "capabilities" request (http://api.openstreetmap.org/api/capabilities), i.e. client software that honours that information will automatically pick up the new limit.
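The capabilities document mentioned above is easy to read with a few lines of standard-library code. A minimal sketch, assuming the response shape: the XML literal below is an abridged, illustrative stand-in for what a client would actually GET from /api/capabilities, using the `maximum_elements` attribute mentioned later in this thread.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative capabilities response; a real client would GET
# http://api.openstreetmap.org/api/capabilities instead of using a literal.
CAPABILITIES_XML = """\
<osm version="0.6" generator="OpenStreetMap server">
  <api>
    <version minimum="0.6" maximum="0.6"/>
    <changesets maximum_elements="10000"/>
  </api>
</osm>
"""

def max_changeset_elements(xml_text):
    """Read the server's changeset size limit out of a capabilities document."""
    root = ET.fromstring(xml_text)
    changesets = root.find("./api/changesets")
    return int(changesets.get("maximum_elements"))

print(max_changeset_elements(CAPABILITIES_XML))  # -> 10000
```

An editor that re-reads this value picks up a lowered limit without any code change, which is the point made above.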

pnorman (Contributor) commented Jul 31, 2016

👍

Changesets over 10k cause problems for uploads, downloads, analysis, and figuring out what was done in them. The API has gotten better at handling large changesets, but changesets over a certain size will always be a problem.

Third-party analysis tools also have problems with changesets this large, tending to time out even if they don't need to download the changeset.

Uploads over 10k are prone to timeouts, so anyone creating them will tend to be uploading by chunks, which is not ideal because interrupted connections require manual cleanup of the result.

Best practices have long been to use changesets under 10k, and software like the redaction bot and the TIGER name completion code have picked sizes in the 500-1k range for the maximum changeset size.

A statistical analysis of changeset sizes I did a couple of years ago supports this too. The 99th percentile changeset size is 1.7k and the 99.5th percentile is 3.2k. These values have been decreasing with this year being 1.4k and 2.4k. The proposed limit would only affect 0.09% of changesets this year.
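Percentile figures like these can be reproduced on any list of changeset sizes. A sketch under stated assumptions: the data here is synthetic (the log-normal parameters are invented for illustration, not fitted to the real changeset dump), and the percentile uses the simple nearest-rank method.

```python
from random import lognormvariate, seed

def percentile(values, p):
    """Nearest-rank percentile: the value p% of the way up the sorted list."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# Synthetic stand-in for real per-changeset object counts.
seed(42)
sizes = [int(lognormvariate(3.0, 1.5)) + 1 for _ in range(100_000)]

p99 = percentile(sizes, 99)
p995 = percentile(sizes, 99.5)
share_over_10k = sum(s > 10_000 for s in sizes) / len(sizes)
print(p99, p995, f"{share_over_10k:.2%}")
```

The real analysis would feed actual per-changeset object counts into `percentile` instead of the synthetic `sizes` list.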


planemad commented Aug 3, 2016

This would impact ongoing imports using JOSM. Filed a JOSM ticket to auto-split large changesets: https://josm.openstreetmap.de/ticket/13265#ticket


pnorman (Contributor) commented Aug 3, 2016

JOSM already auto-splits large changesets. Hopefully with this change we can get people out of the habit of using the horrible chunk upload.
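The auto-split behaviour is straightforward to picture. A sketch of batching an upload to respect the limit; the function name and the list-of-changes representation are illustrative, not JOSM's actual internals.

```python
def split_changes(changes, max_per_changeset=10_000):
    """Split a flat list of changes into changeset-sized batches.

    Illustrative sketch of the auto-split editors implement; the real
    chunking in JOSM also preserves create/modify/delete ordering.
    """
    return [changes[i:i + max_per_changeset]
            for i in range(0, len(changes), max_per_changeset)]

batches = split_changes(list(range(25_000)))
print([len(b) for b in batches])  # -> [10000, 10000, 5000]
```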


Zverik (Contributor) commented Aug 3, 2016

Let's do this. I suggest posting a warning to talk@, dev@ and imports@ mailing lists, and in two weeks merging the pull request. We need a confirmation from OWG that they are okay with this.


mmd-osm (Contributor) commented Aug 3, 2016

Are various revert scripts also happy to process 50k changesets after the change, including comments like "Reverting changesets foo, part x/y" or the like?


pnorman (Contributor) commented Aug 3, 2016

The revert scripts already have to be able to handle changesets larger than the current maximum size, as there are some 50k+1 changesets, and I believe they do this okay.

This will make it easier to revert large changesets since we'll be able to reliably download them from the API.


pnorman referenced this pull request Aug 21, 2016: Add support for changesets #111 (Merged)

SomeoneElseOSM commented Aug 22, 2016

Just a comment to echo what @pnorman said above, and to note that we've just had another example where someone created "unrevertable (by JOSM) changesets": https://help.openstreetmap.org/questions/51646/how-to-revert-3-changesets-containing-136000-untagged-nodes?page=1&focusedAnswerId=51648#51648


gravitystorm (Collaborator) commented Aug 25, 2016

Personally I've no major objections to lowering the limit. It's an arbitrary number after all, so a different arbitrary number is fine.

One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.


zerebubuth (Contributor) commented Aug 25, 2016

> If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

One part of the problem is that the protocol has a design flaw; if the connection to the API server is broken, then it's very hard to tell what the state of the upload is, which can be frustrating.

For API 0.7 (or perhaps also back-ported to 0.6.x), my favourite of the many good solutions is to replace the single POST upload with two requests. The first request would be a POST to .../upload/new, returning 202 with a Location: header pointing to a unique upload UUID, and the second would PUT the osmChange data there. GETting that location would return the status of the upload. This means an interrupted POST would leave an empty, but harmless, upload location, and an interrupted PUT could be checked to see whether the request was successful and whether it has been processed yet.
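To make the two-request flow concrete, here is a toy in-memory model of the proposed protocol; the endpoint paths, the 202 status, and the state transitions follow the description above, but the class and method names are illustrative, not any actual API.

```python
import uuid

class UploadEndpoint:
    """Toy model of the proposed two-step upload:
    POST .../upload/new -> 202 plus a Location for a fresh upload id;
    PUT to that location stores the osmChange payload;
    GET on it reports status, so an interrupted client can check safely."""

    def __init__(self):
        self.uploads = {}  # upload id -> {"state": ..., "payload": ...}

    def post_new(self):
        uid = str(uuid.uuid4())
        self.uploads[uid] = {"state": "created", "payload": None}
        return 202, "/upload/" + uid

    def put(self, uid, osmchange):
        self.uploads[uid].update(state="processed", payload=osmchange)
        return 200

    def get_status(self, uid):
        return self.uploads[uid]["state"]

api = UploadEndpoint()
status, location = api.post_new()
uid = location.rsplit("/", 1)[1]
print(api.get_status(uid))   # "created": an interrupted POST leaves only this
api.put(uid, "<osmChange/>")
print(api.get_status(uid))   # "processed": a resuming client sees it succeeded
```

The design choice being modelled: because the upload location is created before any data is sent, every failure mode leaves the client with something it can GET to learn what actually happened.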

Apologies - that was pretty well off-topic for this issue. I agree with @gravitystorm and have no objection to lowering the limit, especially if we're pretty sure it'll affect <0.1% of uploads.


pnorman (Contributor) commented Aug 25, 2016

> One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

The changeset limit is more about downloads failing and keeping changesets to an understandable size. 10k changes in a single upload is still likely to fail. When I was doing coastline replacements, 2-5k was about the maximum upload chunk size.

I'd actually prefer a maximum changeset size that could reliably be uploaded in one chunk and represents the high end of what is a reasonable amount of work for a person to do in a reasonable amount of time, but that would be controversial.

zerebubuth/openstreetmap-cgimap#111 should help the downloads somewhat, but all the other reasons for decreasing the size remain.


zerebubuth (Contributor) commented Aug 26, 2016

I'm afraid to say that zerebubuth/openstreetmap-cgimap#111 is just the beginning and doesn't yet handle changeset downloads (osmChange), only changeset metadata and discussions. I hope to be able to add osmChange downloads soon. If anyone wants to help, it would be appreciated: code, tests, docs, ad-hoc comparisons with the Rails API - anything would be helpful.


woodpeck (Contributor) commented Oct 18, 2016

Can we keep the various good ideas for the future in mind while at the same time papering over the cracks by applying this?


Zverik (Contributor) commented Dec 7, 2016

So... Are we going to merge this?


grischard (Contributor) commented Jan 31, 2017

@tomhughes reducing the maximum changeset size to 10k changes would greatly help the work of the DWG. Would it please be possible to merge it?


tomhughes (Member) commented Jan 31, 2017

Of course it's possible. The question is what is the correct way to decide on a change as significant as this...


tomhughes (Member) commented Jan 31, 2017

So to summarise the discussion so far, it seems we have decided JOSM will adapt automatically.

Presumably iD and Potlatch are extremely unlikely to generate such large changesets.

What about Merkaartor? Various command-line tools?

Other than that, it was suggested this should be announced somewhere before implementing it, which has not been done as far as I know.


Komzpa commented Jan 31, 2017

Is there any kind of histogram of object counts per changeset, so we can see whether there is actually a major drop at 10,000 objects and then a sudden spike at 50,000 objects?

If this exists only to fight API slowness, then I have to raise a concern about inefficiencies in the current API implementation, which turns each object into two 0.4 ms sleeps (2 × 0.4 ms × 50,000 = 40 s for any such API request); for details/PoC see https://github.com/Komzpa/fastmap


mmd-osm (Contributor) commented Jan 31, 2017

re Merkaartor: the sources don't seem to include the terms "capabilities" or "maximum_elements", so it's either some hardcoded value or not being checked at all. (@Krakonos)

Could this change perhaps be activated on the dev instance for testing?


tomhughes closed this in ad2b4fe Jan 31, 2017

tomhughes (Member) commented Jan 31, 2017

Apparently I'm not going to get any peace until I merge this so consider it done.


DaveF63 commented Jan 31, 2017

This proposal won't affect me but...

Are these 0.09% of edits bad data? If not, this appears to be cutting off OSM's nose to spite its face. As @gravitystorm points out, improving the code, if possible, would be a step forward, rather than the backward step of disallowing the addition of data.

Are these 10k+ uploads from specific contributors or sources?


Krakonos (Member) commented Feb 1, 2017

I just checked, and it seems neither a hardcoded limit nor the capabilities check is implemented. I do have rewriting this code on my shortlist, but even my short list seems pretty long right now. I'll at least let users know this change happened.


Krakonos referenced this pull request Feb 1, 2017: Implement full OSM API #127 (Open)

Zverik (Contributor) commented Feb 1, 2017

Just a bit of trivia: of the 8.5 million changesets made in 2016, 6796 (0.08%) contained more than 10k objects, and 6191 of those were uploaded with JOSM.

@DaveF63, please read the comments. This limit won't affect mappers, since JOSM (and soon, other editors) can split changes into multiple changesets.


DaveF63 commented Feb 1, 2017

@Zverik I did. Blocking 10k+ changesets is still on the table. iD has implemented it.
If JOSM, as claimed, already splits them, and other editors won't be affected, what is being used to upload? To find a solution you first have to understand the problem.


openstreetmap locked and limited conversation to collaborators Feb 1, 2017
