Reduce maximum changeset size to 10k changes #1259

Closed
wants to merge 1 commit into from

Conversation

woodpeck
Contributor

10k changes ought to be enough for any normal mapping activity. Automatic edits and imports are controlled by scripts anyway, so it doesn't make a difference to them (they just have to adapt to the limit). The reason for my suggestion is that large changesets are increasingly difficult to handle (frequent timeouts when trying to load and process them). The changeset size limit is returned by the API in the "capabilities" request (http://api.openstreetmap.org/api/capabilities), i.e. client software that honours that information will automatically pick up the new limit.

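For illustration, a minimal sketch of a client honouring that value instead of hardcoding it (assuming the usual capabilities XML layout, with the limit in the maximum_elements attribute of the changesets element):

```python
# Minimal sketch: read the changeset size limit from the capabilities
# endpoint rather than hardcoding it. Assumes the response layout
# <osm><api><changesets maximum_elements="..."/></api></osm>.
import urllib.request
import xml.etree.ElementTree as ET

CAPABILITIES_URL = "http://api.openstreetmap.org/api/capabilities"

def max_changeset_elements(url=CAPABILITIES_URL):
    with urllib.request.urlopen(url) as response:
        root = ET.fromstring(response.read())
    changesets = root.find("api/changesets")
    return int(changesets.attrib["maximum_elements"])

print(max_changeset_elements())  # would print 10000 once this change is deployed
```
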
@pnorman
Contributor

pnorman commented Jul 31, 2016

👍

Changesets over 10k cause a problem for uploads, downloads, analysis, and figuring out what was done in them. The API has gotten better at handling large changesets, but changesets over a certain size will always be a problem.

Third-party analysis tools also have problems with changesets this large, tending to time out even if they don't need to download the changeset.

Uploads over 10k are prone to timeouts, so anyone creating them will tend to be uploading in chunks, which is not ideal because interrupted connections require manual cleanup of the result.

Best practice has long been to use changesets under 10k, and software like the redaction bot and the TIGER name completion code has picked sizes in the 500-1k range for the maximum changeset size.

A statistical analysis of changeset sizes I did a couple of years ago supports this too. The 99th percentile changeset size was 1.7k and the 99.5th percentile 3.2k. These values have been decreasing, with this year's figures being 1.4k and 2.4k. The proposed limit would only affect 0.09% of changesets this year.
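
A sketch of how such percentiles might be computed, assuming a plain text file of per-changeset object counts (the file name here is hypothetical):

```python
# Sketch: percentile statistics over per-changeset object counts,
# e.g. extracted from the num_changes field of a changeset metadata dump.
import numpy as np

sizes = np.loadtxt("changeset_sizes_2016.txt", dtype=int)  # hypothetical input

for q in (99, 99.5):
    print(f"{q}th percentile: {np.percentile(sizes, q):.0f} changes")

print(f"Changesets over 10k: {(sizes > 10_000).mean() * 100:.2f}%")
```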

@planemad

planemad commented Aug 3, 2016

This would impact ongoing imports using JOSM. I've filed a ticket for auto-splitting large changesets: https://josm.openstreetmap.de/ticket/13265#ticket

@pnorman
Contributor

pnorman commented Aug 3, 2016

JOSM already auto-splits large changesets. Hopefully with this change we can get people out of the habit of using the horrible chunk upload.

@Zverik
Contributor

Zverik commented Aug 3, 2016

Let's do this. I suggest posting a warning to the talk@, dev@ and imports@ mailing lists, and merging the pull request in two weeks. We need confirmation from the OWG that they are okay with this.

@mmd-osm
Contributor

mmd-osm commented Aug 3, 2016

Are the various revert scripts also happy to process 50k changesets after the change, including writing comments like "Reverting changesets foo, part x/y" or the like?

@pnorman
Contributor

pnorman commented Aug 3, 2016

The revert scripts already have to be able to handle changesets larger than the current maximum size, as there are some 50k+1 changesets, and I believe they handle this okay.

This will make it easier to revert large changesets since we'll be able to reliably download them from the API.

@SomeoneElseOSM

Just a comment to echo what @pnorman said above, and to say that we've just had another example where someone created changesets that are "unrevertable (by JOSM)": https://help.openstreetmap.org/questions/51646/how-to-revert-3-changesets-containing-136000-untagged-nodes?page=1&focusedAnswerId=51648#51648

@gravitystorm
Collaborator

Personally I've no major objections to lowering the limit. It's an arbitrary number after all, so a different arbitrary number is fine.

One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

@zerebubuth
Contributor

> If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

One part of the problem is that the protocol has a design flaw; if the connection to the API server is broken, then it's very hard to tell what the state of the upload is, which can be frustrating.

For API 0.7 (or perhaps also back-ported to 0.6.x), my favourite of the many good solutions is to replace the single POST upload with two requests. The first request would be a POST to .../upload/new, returning 202 with a Location: header pointing to a unique upload UUID, and the second would PUT the osmChange data there. GETting that location would return the status of the upload. This means that an interrupted POST would leave an empty, but harmless, upload location, and an interrupted PUT could be checked to see whether the request was successful and whether it has been processed or not.
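
Purely as an illustration of that flow (every endpoint, status code and response below is hypothetical, not part of the current API):

```python
# Hypothetical sketch of the proposed two-step upload. POST creates an
# upload location, PUT sends the osmChange data, and GET lets a client
# check the outcome safely after an interrupted connection.
import requests

BASE = "https://api.openstreetmap.org/api/0.7"  # hypothetical API version

def upload_osmchange(changeset_id, osmchange_xml, auth):
    # Step 1: create an upload location. If this is interrupted, the
    # worst case is an empty, harmless location on the server.
    r = requests.post(f"{BASE}/changeset/{changeset_id}/upload/new", auth=auth)
    assert r.status_code == 202
    location = r.headers["Location"]  # unique upload UUID

    # Step 2: PUT the osmChange payload to that location.
    requests.put(location, data=osmchange_xml,
                 headers={"Content-Type": "text/xml"}, auth=auth)

    # After any interruption, GET the same location to learn whether the
    # upload arrived and whether it has been processed yet.
    return requests.get(location, auth=auth).text
```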

Apologies - that was pretty well off-topic for this issue. I agree with @gravitystorm and have no objection to lowering the limit, especially if we're pretty sure it'll affect <0.1% of uploads.

@pnorman
Contributor

pnorman commented Aug 25, 2016

> One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.

The changeset limit is more about downloads failing and keeping changesets to an understandable size. 10k changes in a single upload is still likely to fail; when I was doing coastline replacements, 2-5k was about the maximum workable upload chunk size.

I'd actually prefer a maximum changeset size that could reliably be uploaded in one chunk and represents the high end of what is a reasonable amount of work for a person to do in a reasonable amount of time, but that would be controversial.
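
As an aside, a minimal sketch of the client-side chunking this forces today (simplified: real tools must also keep created objects ahead of anything that references them, and clean up if a chunk fails):

```python
# Sketch: split pending osmChange actions into uploads of at most
# chunk_size elements; 2500 reflects the 2-5k range mentioned above.
def chunks(actions, chunk_size=2500):
    for start in range(0, len(actions), chunk_size):
        yield actions[start:start + chunk_size]

# for chunk in chunks(pending_actions):          # hypothetical helpers
#     upload(changeset_id, build_osmchange(chunk))
```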

zerebubuth/openstreetmap-cgimap#111 should help the downloads somewhat, but all the other reasons for decreasing the size remain.

@zerebubuth
Contributor

I'm afraid to say that zerebubuth/openstreetmap-cgimap#111 is just the beginning and doesn't yet handle changeset downloads (osmChange), only changeset metadata and discussions. I hope to be able to add osmChange downloads soon. If anyone wants to help, it would be appreciated: code, tests, docs, ad-hoc comparisons with the rails API - anything would be helpful.

@woodpeck
Contributor Author

Can we keep the various good ideas in mind for the future while, at the same time, papering over the cracks by applying this?

@Zverik
Contributor

Zverik commented Dec 7, 2016

So... Are we going to merge this?

@grischard
Contributor

@tomhughes reducing the maximum changeset size to 10k changes would greatly help the work of the DWG. Would it please be possible to merge it?

@tomhughes
Member

Of course it's possible. The question is what is the correct way to decide on a change as significant as this...

@tomhughes
Member

So to summarise the discussion so far, it seems we have established that JOSM will adapt automatically.

Presumably iD and Potlatch are extremely unlikely to generate such large changesets.

What about Merkaartor? Various command-line tools?

Other than that, it was suggested that this should be announced somewhere before implementing it, which has not been done as far as I know?

@Komzpa

Komzpa commented Jan 31, 2017

Is there any kind of histogram of object counts per changeset, so that we can see whether there is actually a major drop at 10,000 objects and then a sudden spike at 50,000 objects?

If this exists only to fight API slowness, then I have to raise a concern about inefficiencies in the current API implementation, which effectively turns each object into two 0.4 ms sleeps (2 × 0.4 ms × 50,000 = 40 s for any kind of API request); for details and a proof of concept, see https://github.com/Komzpa/fastmap
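
A sketch of how such a histogram could be produced, again assuming a hypothetical file of per-changeset object counts:

```python
# Sketch: log-scale histogram of changeset sizes, to check for a drop
# near 10k and a pile-up against the current 50k limit.
import numpy as np
import matplotlib.pyplot as plt

sizes = np.loadtxt("changeset_sizes_2016.txt", dtype=int)  # hypothetical input

bins = np.logspace(0, np.log10(50_000), 60)
plt.hist(sizes, bins=bins)
plt.xscale("log")
plt.xlabel("objects per changeset")
plt.ylabel("changesets")
plt.axvline(10_000, linestyle="--")  # proposed limit
plt.savefig("changeset_size_histogram.png")
```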

@mmd-osm
Contributor

mmd-osm commented Jan 31, 2017

Re Merkaartor: the sources don't seem to include the terms "capabilities" or "maximum_elements", so it's either some hardcoded value or not being checked at all. (@Krakonos)

The question would be whether this change could perhaps be activated on the dev instance for testing?

tomhughes closed this in ad2b4fe Jan 31, 2017
@tomhughes
Member

Apparently I'm not going to get any peace until I merge this, so consider it done.

@DaveF63

DaveF63 commented Jan 31, 2017

This proposal won't affect me but...

Are these 0.09% of edits bad data? If not, this appears to be cutting off OSM's nose to spite its face. As @gravitystorm points out, improving the code, if possible, would be a step forward, rather than the backward step of disallowing the addition of data.

Are these 10k+ uploads from specific contributors or sources?

@Krakonos
Member

Krakonos commented Feb 1, 2017

I just checked, and it seems that neither a hardcoded limit nor capabilities checking is implemented. I do have rewriting this code on my shortlist, but even my shortlist seems pretty long right now. I'll at least let users know this change has happened.

@Zverik
Contributor

Zverik commented Feb 1, 2017

Just a bit of trivia: of the 8.5 million changesets made in 2016, 6796 (0.08%) contained more than 10k objects, and 6191 of those were uploaded with JOSM.

@DaveF63, please read the comments. This limit won't affect mappers, since JOSM (and soon, other editors) can split changes into multiple changesets.

@DaveF63

DaveF63 commented Feb 1, 2017

@Zverik I did. Blocking 10k+ changesets is still on the table; iD has implemented it.
If JOSM, as claimed, already splits them, and other editors won't be affected, what is being used to upload them? To find a solution you first have to understand the problem.

openstreetmap locked and limited conversation to collaborators Feb 1, 2017