Skip to content

Show nonstandard element tags in error messages#3219

Merged
stevecheckoway merged 3 commits into
sparklemotion:mainfrom
stevecheckoway:nonstandard-tags-in-errors
Jun 14, 2024
Merged

Show nonstandard element tags in error messages#3219
stevecheckoway merged 3 commits into
sparklemotion:mainfrom
stevecheckoway:nonstandard-tags-in-errors

Conversation

@stevecheckoway
Copy link
Copy Markdown
Contributor

What problem is this PR intended to solve?

Error messages do not currently show nonstandard element names. This remedies this situation.

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

It changes the behavior of the gumbo parser.

@flavorjones
Copy link
Copy Markdown
Member

This is crashing ... I'll take a look and see if I can figure it out

@flavorjones
Copy link
Copy Markdown
Member

flavorjones commented Jun 7, 2024

Looks like it's crashing on error.c:106:

print_message(output, "%s", tag_name);

@flavorjones
Copy link
Copy Markdown
Member

If I comment out that line, it's crashing here:

Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
Download failed: Invalid argument.  Continuing without source file ./malloc/./malloc/malloc.c.
0x00007ffff7aadd45 in __GI___libc_free (mem=0x587d1cf0) at ./malloc/malloc.c:3375
warning: 3375   ./malloc/malloc.c: No such file or directory
(gdb) where 10
#0  0x00007ffff7aadd45 in __GI___libc_free (mem=0x587d1cf0) at ./malloc/malloc.c:3375
#1  0x00007fffdb36a069 in gumbo_free (ptr=<optimized out>) at util.c:43
#2  0x00007fffdb35929e in gumbo_error_destroy (error=0x5555587bd3b0) at error.c:633
#3  0x00007fffdb362e51 in gumbo_destroy_output (output=0x5555587d1760) at parser.c:4915
#4  0x00007fffdb2480b0 in parse_cleanup (parse_args=140737488335248) at ../../../../ext/nokogiri/gumbo.c:284
#5  0x00005555555a3916 in rb_ensure (b_proc=b_proc@entry=0x7fffdb248dd0 <parse_continue>, data1=data1@entry=140737488335248,
    e_proc=e_proc@entry=0x7fffdb2480a0 <parse_cleanup>, data2=data2@entry=140737488335248) at eval.c:1017
#6  0x00007fffdb248995 in noko_gumbo_s_parse (argc=<optimized out>, argv=<optimized out>, _self=<optimized out>) at ../../../../ext/nokogiri/gumbo.c:351
#7  0x0000555555786099 in vm_call_cfunc_with_frame_ (stack_bottom=<optimized out>, argv=<optimized out>, argc=<optimized out>, calling=<optimized out>,
    reg_cfp=<optimized out>, ec=<optimized out>) at /home/flavorjones/code/oss/ruby/vm_insnhelper.c:3490
#8  vm_call_cfunc_with_frame (calling=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /home/flavorjones/code/oss/ruby/vm_insnhelper.c:3518
#9  vm_call_cfunc_other (ec=0x555555b2f540, reg_cfp=0x7ffff7dafa28, calling=<optimized out>) at /home/flavorjones/code/oss/ruby/vm_insnhelper.c:3544

@stevecheckoway
Copy link
Copy Markdown
Contributor Author

Okay so either there's some memory corruption happening and the vector's data or length are bad, or I've made a mistake in packing either a GumboTag or a char * into the void *. I'm betting on the latter.

Comment thread test/html5/test_errors.rb Outdated
@flavorjones
Copy link
Copy Markdown
Member

Something weird happened in this test run:

https://github.com/sparklemotion/nokogiri/actions/runs/9487063381/job/26143269473#step:5:276

  1) Failure:
TestHtml5Serialize#test_nonstandard_elements_in_errors [test/html5/test_errors.rb:10]:
Expected /Start tag 'foo'/ to match "1:8: ERROR: html.\n".

  2) Failure:
TestHtml5Serialize#test_nonstandard_elements_in_tag_stack [test/html5/test_errors.rb:16]:
Expected /Currently open tags: html, foo, table/ to match "1:13: ERROR: html.\n".

I can't reproduce it locally (even with the same test seed). Looking at the error message, the only reason I can think of for this to happen is if the output buffer got clobbered or corrupted somehow? But I don't see anything being reported by valgrind, either. I'm re-running the full test suite to see what happens. Any ideas what might be happening here?

@flavorjones
Copy link
Copy Markdown
Member

Holy cow, it's reproducible on x64-mingw32:

https://github.com/sparklemotion/nokogiri/actions/runs/9487063381/job/26198041565

I'm genuinely mystified what's going on here.

@stevecheckoway
Copy link
Copy Markdown
Contributor Author

I'm not sure what could cause that. Is x64-mingw32 unusual in some way?

@flavorjones
Copy link
Copy Markdown
Member

@stevecheckoway I think it might be the only platform affected by this code in error.c? Just guessing at the moment, but this seems like the only reasonable difference

#if _MSC_VER && _MSC_VER < 1900
  if (bytes_written == -1) {
    // vsnprintf returns -1 on older MSVC++ if there's not enough capacity,
    // instead of returning the number of bytes that would've been written had
    // there been enough. In this case, we'll double the buffer size and hope
    // it fits when we retry (letting it fail and returning 0 if it doesn't),
    // since there's no way to smartly resize the buffer.
    gumbo_string_buffer_reserve(output->capacity * 2, output);
    va_start(args, format);
    int result = vsnprintf (
      output->data + output->length,
      remaining_capacity,
      format,
      args
    );
    va_end(args);
    return result == -1 ? 0 : result;
  }
#else
  // -1 in standard C99 indicates an encoding error. Return 0 and do nothing.
  if (bytes_written == -1) {
    return 0;
  }
#endif

@stevecheckoway
Copy link
Copy Markdown
Contributor Author

Huh. Okay, that code doesn't even look correct to me. Let me see if I can fix it.

@flavorjones
Copy link
Copy Markdown
Member

@stevecheckoway Hmm, that's not it. When I precompile for x64-mingw32, _MSC_VER is not defined.

@flavorjones
Copy link
Copy Markdown
Member

I'm going to spin up a windows VM locally and see if I can reproduce this so we have a better feedback loop to iterate/investigate.

@stevecheckoway
Copy link
Copy Markdown
Contributor Author

Okay, even if that ends up not being the culprit, it seems good to fix that anyway.

@flavorjones
Copy link
Copy Markdown
Member

OK, I can reproduce (including your most recent commit 7dea5a4):

image

Digging in ...

@flavorjones
Copy link
Copy Markdown
Member

Diagnosis: vsnprintf is returning -1 on insufficient capacity (because it's a non-UCRT (old enough) platform), but _MSC_VER isn't set in the rake-compiler-dock cross-compilation environment and so the safety net code is not present.

I'll figure out how we can indicate this during compiletime and add it to the cpp conditional ...

@flavorjones
Copy link
Copy Markdown
Member

Grr. I have it patched so the safety-net behavior kicks in on the MSVCRT builds, but vsnprintf is returning -1 even when count is 0.

I've pushed the commit, maybe you can take a second look at what's going on there?

@flavorjones flavorjones force-pushed the nonstandard-tags-in-errors branch 2 times, most recently from 3e38cf0 to 7b92599 Compare June 13, 2024 21:48
@flavorjones
Copy link
Copy Markdown
Member

Just a note that I rebased onto origin/main and force-pushed, you may want to make sure you're up-to-date.

@flavorjones flavorjones force-pushed the nonstandard-tags-in-errors branch from 7b92599 to 9302f2c Compare June 13, 2024 21:59
@flavorjones
Copy link
Copy Markdown
Member

Ah, I figured it out: you need to call with a buffer of NULL, also! Force-pushed. Should be green.

stevecheckoway and others added 3 commits June 13, 2024 20:45
Standards-compliant vsnprintf implementations return the number of
characters they would have written if the buffer was large enough, not
including the null terminator. Some older versions of Visual Studio
return -1 when the buffer is not large enough. For those versions, we
need to call vsnprintf a second time with a count of 0 to get the
number of characters that would have been written. In both the C99 and
old Visual Studio case, we need to
1. Ensure the buffer is large enough to hold the string, including the
   null terminator; and
2. Increase the buffer capacity sufficiently to make repeated calls
   to print_message take linear rather than quadratic time.
so that we can tell if vsnprintf is returning -1 because there's
insufficient capacity or because there's an error.
@stevecheckoway stevecheckoway force-pushed the nonstandard-tags-in-errors branch from 9302f2c to 837d700 Compare June 14, 2024 00:45
@stevecheckoway
Copy link
Copy Markdown
Contributor Author

Nice work tracking that down! I've squashed some of the early flailing commits. Once this goes green again, I'll merge.

@stevecheckoway stevecheckoway enabled auto-merge (rebase) June 14, 2024 00:48
@stevecheckoway stevecheckoway merged commit baa215a into sparklemotion:main Jun 14, 2024
@stevecheckoway stevecheckoway deleted the nonstandard-tags-in-errors branch June 14, 2024 03:50
@flavorjones flavorjones added this to the v1.17.0 milestone Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants