[bug] Regression in 1.13.0 in DocumentFragment#css? #2419
Comments
@CvX Thanks for reporting this! It's unrelated to #2418. This is an interesting failure -- and is most likely happening because nokogiri's ruby code is running against something besides the packaged/vendored libxml2 (with the patch Is it possible that you're pulling in a libxml2 library from somewhere else (an earlier version of nokogiri or libxml-ruby or the system libxml2)? Can you please reply with:
I can't tell much from the output of that actions run, unfortunately.
Hey! Good news, I can reproduce this in the
@CvX Ah, ok, so I apologize for my confusion above -- this is pretty easily reproducible once I got my local system into a good state.

Some background context

For context, Nokogiri implements CSS selector searches by translating them into the equivalent XPath query. So, for example:

#! /usr/bin/env ruby
require 'nokogiri'
css_parser = Nokogiri::CSS::Parser.new
css_parser.parse("a").first.to_xpath("//", Nokogiri::CSS::XPathVisitor.new)
# => "//a"

When you write a CSS selector that looks like "a/@href":

css_parser.parse("a/@href").first.to_xpath("//", Nokogiri::CSS::XPathVisitor.new)
# => "//a/@href"

What changed in v1.13.0

Nokogiri v1.13.0 improved HTML5 CSS selector query performance (by about 10X!) by introducing limited "wildcard namespace" functionality into libxml2, and converting CSS queries to use it. (See #2376 and #2403 if you're interested in the details.) With that new HTML5 CSS selector translation, the CSS query is converted into a different XPath query:

visitor = Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::ALWAYS,
doctype: Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5,
)
css_parser.parse("a").first.to_xpath("//", visitor)
# => "//*:a"
css_parser.parse("a/@href").first.to_xpath("//", visitor)
# => "//*:a/*:@href"

You probably recognize that last XPath query string from your application's error message.

The catch

OK, here's the catch: the CSS selectors your app is using are invalid, but happen to work. They've always been invalid, but Nokogiri just happened to not catch the invalid syntax and did the right thing in the past. The XPath query you want to generate is But
An immediate workaround (and, actually, my recommendation regardless) is to update this code:

fragments.css("a/@href", "img/@src", "source/@src", "track/@src", "video/@poster")

to simply use XPath:

fragments.xpath("//a/@href", "//img/@src", "//source/@src", "//track/@src", "//video/@poster")

You'll avoid the CSS translation step that way, and work around this particular issue until I can ship a backwards-compatible fix.

The fix

This use case is interesting! Although I now understand why it worked in previous versions, we had nothing in our test suite like it, and to be honest this mash-up of XPath and CSS wasn't part of my mental model. That said, I really like this syntax, and there's nothing stopping us from formally supporting it going forward. So, my punchlist looks like:

There's a bit of work here, so it may take me a day or two to ship a fix.
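To make the failure described above concrete, here is a toy sketch in plain Ruby. It is not Nokogiri's translator: both helper names are hypothetical, and splitting on "/" only stands in for the identifiers a real CSS parser would produce. The first helper prefixes every step with the wildcard namespace and reproduces the query string from the error; the second leaves a step starting with "@" unprefixed, which is an assumption about what a well-formed query would look like.

```ruby
# Toy illustration only: these helpers are hypothetical and are not part of
# Nokogiri. Each "/"-separated step stands in for a parsed identifier.

# Treats every step as an element name, so "@href" also receives the
# wildcard-namespace prefix, as in the query string from the error report.
def naive_html5_xpath(selector, prefix = "//")
  prefix + selector.split("/").map { |step| "*:#{step}" }.join("/")
end

# Recognizes a leading "@" as an XPath attribute step and leaves it alone.
# Assumption: the attribute step should carry no wildcard prefix.
def attribute_aware_html5_xpath(selector, prefix = "//")
  steps = selector.split("/").map do |step|
    step.start_with?("@") ? step : "*:#{step}"
  end
  prefix + steps.join("/")
end

puts naive_html5_xpath("a/@href")           # prints //*:a/*:@href
puts attribute_aware_html5_xpath("a/@href") # prints //*:a/@href
```

The point of the sketch is only that the attribute step needs its own handling; how Nokogiri's parser and visitor actually do that is outside what this toy shows.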
Thank you for the detailed write-up! 👏 I will take your recommendation. It was a bit weird in the first place to use this xpath-but-not-really syntax with the
@CvX Happy to help. Also, I want to point out a mistake I made above: the XPath version of that fragment search is

fragments.xpath(".//a/@href", ".//img/@src", ".//source/@src", ".//track/@src", ".//video/@poster")

That is, the XPath query prefix for "anywhere in this fragment" should be
Oh, fixing this is going to involve a healthy and needed refactoring of the CSS parser and the XPath AST visitor. I'll get to delete some code! 👏
A small curveball regarding the css -> xpath transition:

doc = Nokogiri::HTML5.fragment('<img src="/test.gif">')
fragment = doc.css('img[src]')
fragment.css("img/@src").length == 1
fragment.xpath(".//img/@src").length == 0
fragment.xpath("//img/@src").length == 0

…but as I was typing this, I think I just worked it out:

fragment.xpath(".//descendant-or-self::img/@src")

The fascinating world of XPath 😂

btw. I also found another almost-xpath
@CvX Yeah, searching within fragments is wonky and has edge cases with the search path like this. Sorry for your troubles. I've been tracking some of these issues at https://nokogiri.org/ROADMAP.html#documentfragment but also starting to work on a re-implementation at #2184.

Thanks for the note about
Huh, yet another difference!

doc = Nokogiri::HTML5.fragment(<<~HTML)
<p><img src="/test.gif"></p>
<p><a href="/test">x</a></p>
HTML
doc.css("a/@href", "img/@src").map(&:value)
# => ["/test.gif", "/test"]
doc.xpath(".//a/@href", ".//img/@src").map(&:value)
# => ["/test", "/test.gif"]

But that can also be worked around!

doc.xpath(".//a/@href|.//img/@src").map(&:value)
# => ["/test.gif", "/test"]

So my original query will look like this:

# Instead of:
# fragments.css("a/@href", "img/@src", "source/@src", "track/@src", "video/@poster")
fragments.xpath(".//descendant-or-self::a/@href|.//descendant-or-self::img/@src|.//descendant-or-self::source/@src|.//descendant-or-self::track/@src|.//descendant-or-self::video/@poster")

I'm starting to really appreciate the mixed syntax! 😅
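Writing that long union query by hand is error-prone, so here is a minimal sketch of building it from the original "element/@attribute" selectors. The helper name union_attribute_xpath is hypothetical and not part of Nokogiri. It only does string assembly and assumes each selector has exactly the "element/@attribute" shape used in this thread.

```ruby
# Hypothetical helper, not part of Nokogiri: turns selectors such as
# "a/@href" into one union XPath using the descendant-or-self axis, so a
# single query can match top-level and nested nodes in one pass.
def union_attribute_xpath(selectors)
  selectors.map do |selector|
    element, attribute = selector.split("/@", 2)
    if element.nil? || element.empty? || attribute.nil? || attribute.empty?
      raise ArgumentError, "expected element/@attribute, got #{selector.inspect}"
    end
    ".//descendant-or-self::#{element}/@#{attribute}"
  end.join("|")
end

selectors = ["a/@href", "img/@src", "source/@src", "track/@src", "video/@poster"]
puts union_attribute_xpath(selectors)
```

The resulting string could then be passed to fragments.xpath in place of the hand-written query above.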
Yeah, fragment searching is pretty bad, sorry. If you prefer, you could hold off on upgrading for a day and I should be able to get a patch release out.
This commit removes "@" from the IDENT token so that we can create a new grammar rule in the parser for XPath attributes. Fixes #2419
And officially document the XPath attribute extensions to CSS selector syntax that we support. See #2419 for context
PR with a proposed fix is at #2422
Waiting for the backport to go green at #2423 and then I'll cut v1.13.1, probably tomorrow morning.
v1.13.1 is out now: https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.1
The following code used to work before 1.13.0:
Now it raises:
Stack trace:
Additional context
You can see the failure in the wild here: https://github.com/discourse/discourse/runs/4768400926?check_suite_focus=true#step:25:23
I don't know if it's related to #2418