Show nonstandard element tags in error messages #3219
|
This is crashing ... I'll take a look and see if I can figure it out |
|
Looks like it's crashing on print_message(output, "%s", tag_name); |
|
If I comment out that line, it's crashing here: |
|
Okay so either there's some memory corruption happening and the vector's data or length are bad, or I've made a mistake in packing either a |
|
Something weird happened in this test run: https://github.com/sparklemotion/nokogiri/actions/runs/9487063381/job/26143269473#step:5:276 I can't reproduce it locally (even with the same test seed). Looking at the error message, the only reason I can think of for this to happen is if the output buffer got clobbered or corrupted somehow? But I don't see anything being reported by valgrind, either. I'm re-running the full test suite to see what happens. Any ideas what might be happening here? |
|
Holy cow, it's reproducible on x64-mingw32: https://github.com/sparklemotion/nokogiri/actions/runs/9487063381/job/26198041565 I'm genuinely mystified what's going on here. |
|
I'm not sure what could cause that. Is x64-mingw32 unusual in some way? |
|
@stevecheckoway I think it might be the only platform affected by this code in error.c? Just guessing at the moment, but this seems like the only reasonable difference:
#if _MSC_VER && _MSC_VER < 1900
if (bytes_written == -1) {
// vsnprintf returns -1 on older MSVC++ if there's not enough capacity,
// instead of returning the number of bytes that would've been written had
// there been enough. In this case, we'll double the buffer size and hope
// it fits when we retry (letting it fail and returning 0 if it doesn't),
// since there's no way to smartly resize the buffer.
gumbo_string_buffer_reserve(output->capacity * 2, output);
va_start(args, format);
int result = vsnprintf (
output->data + output->length,
remaining_capacity,
format,
args
);
va_end(args);
return result == -1 ? 0 : result;
}
#else
// -1 in standard C99 indicates an encoding error. Return 0 and do nothing.
if (bytes_written == -1) {
return 0;
}
#endif |
|
Huh. Okay, that code doesn't even look correct to me. Let me see if I can fix it. |
|
@stevecheckoway Hmm, that's not it. When I precompile for x64-mingw32, |
|
I'm going to spin up a windows VM locally and see if I can reproduce this so we have a better feedback loop to iterate/investigate. |
|
Okay, even if that ends up not being the culprit, it seems good to fix that anyway. |
|
OK, I can reproduce (including your most recent commit 7dea5a4): Digging in ... |
|
Diagnosis: vsnprintf is returning -1 on insufficient capacity (because it's a non-UCRT, i.e. old enough, platform), but I'll figure out how we can detect this at compile time and add it to the preprocessor conditional ... |
|
Grr. I have it patched so the safety-net behavior kicks in on the MSVCRT builds, but I've pushed the commit, maybe you can take a second look at what's going on there? |
3e38cf0 to
7b92599
|
Just a note that I rebased onto |
7b92599 to
9302f2c
|
Ah, I figured it out: you need to call with a |
Standards-compliant vsnprintf implementations return the number of characters they would have written if the buffer were large enough, not including the null terminator. Some older versions of Visual Studio return -1 when the buffer is not large enough. For those versions, we need to call vsnprintf a second time with a count of 0 to get the number of characters that would have been written.
In both the C99 and old Visual Studio cases, we need to:
1. Ensure the buffer is large enough to hold the string, including the null terminator; and
2. Increase the buffer capacity sufficiently to make repeated calls to print_message take linear rather than quadratic time.
so that we can tell if vsnprintf is returning -1 because there's insufficient capacity or because there's an error.
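For anyone reading along later, here is a minimal sketch of the approach described above. It is not the code from this PR: `StrBuf` and `strbuf_printf` are hypothetical stand-ins for the parser's string buffer and print_message, and the starting capacity of 16 is an arbitrary assumption. It only illustrates the measuring call with a count of 0 and the geometric growth.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser's string buffer (not gumbo's type). */
typedef struct {
  char* data;
  size_t length;   /* bytes in use, excluding the null terminator */
  size_t capacity; /* bytes allocated */
} StrBuf;

/* Appends formatted text to buf. Returns the number of bytes appended,
 * or 0 if formatting or allocation fails. */
static size_t strbuf_printf(StrBuf* buf, const char* format, ...) {
  va_list args;
  assert(buf->length <= buf->capacity);
  size_t remaining = buf->capacity - buf->length;

  va_start(args, format);
  int needed = vsnprintf(buf->data + buf->length, remaining, format, args);
  va_end(args);

  if (needed < 0) {
    /* Ambiguous: C99 runtimes report an error this way, but old MSVC
     * runtimes also return -1 on truncation. A call with a count of 0
     * only measures, so a second negative result is a real error. */
    va_start(args, format);
    needed = vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (needed < 0) {
      return 0;
    }
  }

  if ((size_t) needed >= remaining) {
    /* Too small (the terminator needs one extra byte). Double the capacity
     * until it fits so repeated appends stay linear overall. */
    size_t required = buf->length + (size_t) needed + 1;
    size_t new_capacity = buf->capacity ? buf->capacity : 16;
    while (new_capacity < required) {
      new_capacity *= 2;
    }
    char* new_data = realloc(buf->data, new_capacity);
    if (!new_data) {
      return 0;
    }
    buf->data = new_data;
    buf->capacity = new_capacity;

    /* Retry now that the buffer is known to be large enough. */
    va_start(args, format);
    needed = vsnprintf(buf->data + buf->length,
                       buf->capacity - buf->length, format, args);
    va_end(args);
    if (needed < 0) {
      return 0;
    }
  }

  buf->length += (size_t) needed;
  return (size_t) needed;
}
```

Note that the va_list is restarted before every vsnprintf call, since its state is indeterminate after being passed to a v* function.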
9302f2c to
837d700
|
Nice work tracking that down! I've squashed some of the early flailing commits. Once this goes green again, I'll merge. |

What problem is this PR intended to solve?
Error messages do not currently show nonstandard element names. This PR makes them appear.
Have you included adequate test coverage?
Yes.
Does this change affect the behavior of either the C or the Java implementations?
It changes the behavior of the gumbo parser.