Reduce maximum changeset size to 10k changes #1259
Conversation
10k changes ought to be enough for any normal mapping activity. Automatic edits and imports are controlled by scripts anyway, so it doesn't make a difference to them (they just have to adapt to the limit). The reason for my suggestion is that large changesets are increasingly difficult to handle (frequent timeouts when trying to load and process them). The changeset size limit is returned by the API in the "capabilities" request (http://api.openstreetmap.org/api/capabilities), i.e. client software that honours that information will automatically pick up the new limit.
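For illustration, a minimal client-side sketch (Python, standard library only) of picking up the limit from the capabilities call. The element path and attribute name (`api/changesets`, `maximum_elements`) reflect the current API 0.6 response as I understand it and should be verified against the live output before relying on them.

```python
# Sketch: read the changeset size limit from the OSM capabilities call.
# Assumes the response contains <api><changesets maximum_elements="..."/>;
# verify the exact attribute name against the live API before use.
import urllib.request
import xml.etree.ElementTree as ET

CAPABILITIES_URL = "https://api.openstreetmap.org/api/capabilities"

def fetch_max_changeset_size(url=CAPABILITIES_URL):
    with urllib.request.urlopen(url, timeout=30) as response:
        root = ET.parse(response).getroot()
    changesets = root.find("api/changesets")
    if changesets is None:
        return None  # caller should fall back to a conservative default
    value = changesets.get("maximum_elements")
    return int(value) if value is not None else None

if __name__ == "__main__":
    print("maximum_elements:", fetch_max_changeset_size())
```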
👍 Changesets over 10k cause a problem for uploads, downloads, analysis, and figuring out what was done in them. The API has gotten better at handling large changesets, but changesets over a certain size will always be a problem. Third-party analysis tools also have problems with changesets this large, tending to time out even if they don't need to download the changeset. Uploads over 10k are prone to timeouts, so anyone creating them will tend to be uploading in chunks, which is not ideal because interrupted connections require manual cleanup of the result. Best practice has long been to use changesets under 10k, and software like the redaction bot and the TIGER name completion code have picked maximum changeset sizes in the 500-1k range. A statistical analysis of changeset sizes I did a couple of years ago supports this too: at the time the 99th percentile changeset size was 1.7k and the 99.5th percentile was 3.2k. These values have been decreasing, with this year's figures being 1.4k and 2.4k. The proposed limit would only affect 0.09% of changesets this year.
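As an aside on how figures like these can be reproduced: the public changeset metadata dump records a change count per changeset, so a percentile summary takes only a few lines. A rough sketch, assuming the per-changeset counts have already been extracted into a list; the nearest-rank percentile definition used here is my choice for the sketch, not necessarily the one used in the analysis above.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a list of changeset sizes."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(sizes, limit=10_000):
    """Percentiles and the share of changesets above a proposed limit."""
    return {
        "p99": percentile(sizes, 99),
        "p99.5": percentile(sizes, 99.5),
        "share_over_limit": sum(s > limit for s in sizes) / len(sizes),
    }

# Toy data only; the real figures require the full changeset dump.
print(summarize([9, 12, 45, 80, 300, 2_000, 11_500]))
```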
This would impact ongoing imports using JOSM. Ticketed to auto-split large changesets: https://josm.openstreetmap.de/ticket/13265#ticket
JOSM already auto-splits large changesets. Hopefully with this change we can get people out of the habit of using the horrible chunk upload.
Let's do this. I suggest posting a warning to the talk@, dev@ and imports@ mailing lists, and merging the pull request in two weeks. We need confirmation from OWG that they are okay with this.
Are the various revert scripts also happy to process 50k changesets after the change, including comments like "Reverting changesets foo, part x/y" or the like?
The revert scripts already have to be able to handle changesets larger than the current maximum size, as there are some changesets with 50k+1 elements, and I believe they do this okay. This will make it easier to revert large changesets, since we'll be able to reliably download them from the API.
Just a comment to echo what @pnorman said above, and to say that we've just had another example where someone created "unrevertable (by JOSM) changesets": https://help.openstreetmap.org/questions/51646/how-to-revert-3-changesets-containing-136000-untagged-nodes?page=1&focusedAnswerId=51648#51648
Personally I've no major objections to lowering the limit. It's an arbitrary number after all, so a different arbitrary number is fine. One minor point is that this feels like papering over the cracks. If the diff uploads are taking too long and timing out, we should find out the cause - either the code is inefficient (likely more complex for deletes than additions), or the databases are too busy. It could be better to solve those problems instead.
One part of the problem is that the protocol has a design flaw: if the connection to the API server is broken, then it's very hard to tell what the state of the upload is, which can be frustrating. For API 0.7 (or perhaps also back-ported to 0.6.x), my favourite of the many good solutions is to replace the single POST upload with two requests; the first request would be a POST to […]

Apologies - that was pretty well off-topic for this issue. I agree with @gravitystorm and have no objection to lowering the limit, especially if we're pretty sure it'll affect <0.1% of uploads.
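Purely to illustrate the two-request idea above (none of this exists in the current API; the endpoint paths and field names below are invented for the sketch): the first request submits the diff and returns a handle, and the second fetches the result, so a client whose connection dropped can re-ask the server what happened instead of reconciling state by hand.

```python
# Hypothetical two-phase upload -- illustration only, not a real API.
import requests  # third-party: pip install requests

API = "https://api.example.org/api/0.7"  # invented base URL

def upload_in_two_steps(changeset_id, osmchange_xml, auth):
    # Step 1: submit the diff; the server replies with a job handle
    # rather than applying and returning the diffResult in one shot.
    r = requests.post(f"{API}/changeset/{changeset_id}/uploads",
                      data=osmchange_xml, auth=auth, timeout=300)
    r.raise_for_status()
    job_id = r.json()["job_id"]  # invented field name

    # Step 2: fetch (or, after a broken connection, re-fetch) the result.
    r = requests.get(f"{API}/changeset/{changeset_id}/uploads/{job_id}",
                     auth=auth, timeout=300)
    r.raise_for_status()
    return r.text  # the diffResult document
```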
The changeset limit is more about downloads failing and keeping changesets to an understandable size. 10k changes in a single upload is still likely to fail. When I was doing coastline replacements, 2-5k was about the maximum upload chunk size. I'd actually prefer a maximum changeset size that could reliably be uploaded in one chunk and represents the high end of what is a reasonable amount of work for a person to do in a reasonable amount of time, but that would be controversial. zerebubuth/openstreetmap-cgimap#111 should help the downloads somewhat, but all the other reasons for decreasing the size remain.
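For readers unfamiliar with "uploading by chunks": the edit is split across several diff uploads to the same changeset, roughly as sketched below. This is a simplification; a real splitter must keep newly created objects ahead of the ways and relations that reference them, and `api.post` here stands in for whatever authenticated HTTP client the tool actually uses.

```python
# Simplified chunked upload: several diff uploads into one changeset.
# `fragments` is assumed to be a list of pre-serialised <create>/<modify>/
# <delete> blocks; dependency ordering is deliberately glossed over.
def chunks(fragments, chunk_size=2_000):
    for i in range(0, len(fragments), chunk_size):
        yield fragments[i:i + chunk_size]

def upload_in_chunks(api, changeset_id, fragments, chunk_size=2_000):
    for chunk in chunks(fragments, chunk_size):
        body = '<osmChange version="0.6">' + "".join(chunk) + "</osmChange>"
        # POST /api/0.6/changeset/{id}/upload is the diff-upload endpoint.
        # If the connection drops here, the client has to work out which
        # objects were actually applied -- the manual cleanup problem
        # described in the comments above.
        api.post(f"/api/0.6/changeset/{changeset_id}/upload", data=body)
```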
I'm afraid to say that zerebubuth/openstreetmap-cgimap#111 is just the beginning and doesn't yet handle changeset downloads (osmChange), only changeset metadata and discussions. I hope to be able to add osmChange downloads soon. If anyone wants to help then it would be appreciated; code, tests, docs, ad-hoc comparisons with the rails API - anything would be helpful.
Can we keep the various good ideas for the future in mind while at the same time papering over the cracks by applying this?
So... Are we going to merge this?
@tomhughes reducing the maximum changeset size to 10k changes would greatly help the work of the DWG. Would it please be possible to merge it?
Of course it's possible. The question is what is the correct way to decide on a change as significant as this...
So to summarise the discussion so far: it seems we have decided that JOSM will adapt automatically, and presumably iD and Potlatch are extremely unlikely to generate such large changesets. What about Merkaartor? Various command-line tools? Other than that, it was suggested this should be announced somewhere before implementing it, which has not been done as far as I know?
Is there any kind of histogram of the object count per changeset, so that we can see whether there is actually a major drop at 10,000 objects and then a sudden rise at 50,000? If this exists only to fight API slowness, then I have to raise the concern about inefficiencies in the current API implementation, which converts each object into two 0.4 ms sleeps (2 × 0.4 ms × 50,000 = 40 s for any kind of API request); for details/PoC see https://github.com/Komzpa/fastmap
Re Merkaartor: the sources don't seem to include the terms "capabilities" or "maximum_elements", so it's either some hardcoded value or not being checked at all (@Krakonos). The question would be whether this change could perhaps be activated on the dev instance for testing?
Apparently I'm not going to get any peace until I merge this, so consider it done.
This proposal won't affect me, but... Are these 0.09% of edits bad data? If not, this appears to be cutting off OSM's nose to spite its face. As @gravitystorm points out, improving the code, if possible, would be a step forward, rather than the backward step of disallowing the addition of data. Are these 10k+ uploads from specific contributors or sources?
I just checked and it seems neither a hardcoded limit nor capabilities are implemented. I do have rewriting this code on my shortlist, but even my short list seems pretty long right now. I'll at least let users know this change happened.
Just a bit of trivia: of the 8.5 million changesets made in 2016, 6796 (0.08%) contained more than 10k objects, and 6191 of these were uploaded with JOSM. @DaveF63, please read the comments. This limit won't affect mappers, since JOSM (and soon, other editors) can split changes into multiple changesets.
@Zverik I did. Blocking 10k+ changesets is still on the table. iD has implemented it.