The specific limits on keys and values are unclear. #3706
Comments
Code is actually running on Ubuntu 22.04 in production, and I'd prefer if we could check the assumption for that platform. We don't support Windows for cgimap, and have no plans to do so.
@mmd-osm I tried on your godbolt instance, https://cpp.godbolt.org/z/K1TbqbMhK and it shows that in your C++11 environment,
I should mention, we also have this test case: https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/test/test_parse_osmchange_input.cpp#L445-L468 using 😎 in UTF-8, which is represented by a 4 byte character sequence.
We're on C++17.
If this ticket is about cgimap then it's on the wrong repository... Let me see if I can move it.
No I can't sadly, so I'll have to close it here as it's not a rails issue,
Well, there's also a Rails validation, which I think goes by codepoints; the cgimap one seems, as a result of the environment, to go by codepoints as well.
To be clear, Rails will count characters, which I'm pretty sure means codepoints (what I think you mean by "scalar values"). I believe that is correct, and it is what cgimap is trying to do.
On a scale of 0-9, how unwelcome would a PR clarifying the language in comments around this be?
Are you asking for Rails port or CGImap? You can always send in PRs for CGImap, which is over at: https://github.com/zerebubuth/openstreetmap-cgimap
I'm thinking in both cases. If there are references to "characters" I would like to clarify that this is scalars/codepoints. The average person reading the word "character" is unlikely to sense that it's not a grapheme, or not a byte.
In #2025 I tried asking what the specific limitations on key and value sizes are. These limits are often quoted as "255 characters" or "255 chars", which is ambiguous. Some people interpret this as 255 UTF-16 code units, some interpret it as 255 UTF-8 bytes, some interpret it as 255 Unicode Scalar Values.
@mmd-osm points out that the function `unicode_strlen` in openstreetmap-cgimap is responsible for determining the length, which is checked against 255 in order to accept or reject a key or value. This raises issues because `unicode_strlen` relies on `std::mbsrtowcs`, which will give a length that is dependent on the type of `wchar_t`. My understanding is that `wchar_t` is UTF-16LE/`uint16_t` on Windows, but it is implementation-dependent, and can be a 32-bit quantity instead, representing a whole Unicode Scalar Value.
I would like to have a clear understanding of what specific limits are placed on the length and content of keys and values in changeset tags.