fix: CSS pseudo-classes that are invalid XPath function names raise
This is an alternative to #3197 (which was reverted) in which the
exception is raised from the XPathVisitor and not the CSS
Parser.

Semantically, a selector such as `div:-moz-drag-over` is valid CSS, and
so the Parser shouldn't raise. But it translates to invalid XPath, and so
it's the responsibility of the Visitor to raise.

Closes #3193.
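
The split of responsibilities described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the `Sketch` module, its `parse` and `to_xpath` methods, the `PseudoClass` struct, and `SelectorSyntaxError` are hypothetical stand-ins and not Nokogiri's real Parser or XPathVisitor. Only the leading-hyphen check and the `nokogiri:name(.)` output shape are taken from the diff.

```ruby
# Hypothetical stand-ins for illustration; NOT Nokogiri's actual classes.
module Sketch
  class SelectorSyntaxError < StandardError; end

  # Minimal AST node: an element name plus one pseudo-class name.
  PseudoClass = Struct.new(:element, :name)

  # "Parser": accepts any CSS identifier after the colon, including
  # vendor-prefixed names like "-moz-drag-over", because they are valid CSS.
  def self.parse(selector)
    match = /\A([a-z*][\w-]*):(-?[a-z_][\w-]*)\z/i.match(selector)
    raise SelectorSyntaxError, "unparseable selector '#{selector}'" unless match

    PseudoClass.new(match[1], match[2])
  end

  # "Visitor": translates the node to XPath and rejects names that cannot be
  # used as XPath function names (the same leading-hyphen check as the commit).
  def self.to_xpath(node)
    if node.name.start_with?("-")
      raise SelectorSyntaxError, "Invalid XPath function name '#{node.name}'"
    end

    "//#{node.element}[nokogiri:#{node.name}(.)]"
  end
end

ast = Sketch.parse("div:-moz-drag-over") # parsing succeeds: this is valid CSS
begin
  Sketch.to_xpath(ast)                   # translation is where the error surfaces
rescue Sketch::SelectorSyntaxError => e
  puts e.message # prints "Invalid XPath function name '-moz-drag-over'"
end
```

With this split, a caller that only needs the parsed selector is unaffected, and the error appears only when an XPath expression is actually requested.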
flavorjones committed Jun 11, 2024
1 parent adece3c commit b918ce1
Showing 3 changed files with 17 additions and 1 deletion.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -15,8 +15,9 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA

### Fixed

* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* `Node#clone`, `NodeSet#clone`, and `*::Document#clone` all properly copy the metaclass of the original as expected. Previously, `#clone` had been aliased to `#dup` for these classes (since v1.3.0 in 2009). [#316, #3117] @flavorjones
* CSS queries for pseudo-selectors that cannot be translated into XPath expressions now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath expression was evaluated and a hard-to-understand XPath error was raised by the query engine. [#3193] @flavorjones
* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* [CRuby] Update node GC lifecycle to avoid a potential memory leak with fragments in libxml 2.13.0 caused by changes in `xmlAddChild`. [#3156] @flavorjones


9 changes: 9 additions & 0 deletions lib/nokogiri/css/xpath_visitor.rb
@@ -128,6 +128,8 @@ def visit_function(node)
is_direct = node.value[1].value[0].nil? # e.g. "has(> a)", "has(~ a)", "has(+ a)"
".#{"//" unless is_direct}#{node.value[1].accept(self)}"
else
validate_xpath_function_name(node.value.first)

# xpath function call, let's marshal those arguments
args = ["."]
args += node.value[1..-1].map do |n|
@@ -207,6 +209,7 @@ def visit_pseudo_class(node)
when "parent" then "node()"
when "root" then "not(parent::*)"
else
validate_xpath_function_name(node.value.first)
"nokogiri:#{node.value.first}(.)"
end
end
@@ -270,6 +273,12 @@ def accept(node)

private

def validate_xpath_function_name(name)
  if name.start_with?("-")
    raise Nokogiri::CSS::SyntaxError, "Invalid XPath function name '#{name}'"
  end
end

def html5_element_name_needs_namespace_handling(node)
# if this is the wildcard selector "*", use it as normal
node.value.first != "*" &&
6 changes: 6 additions & 0 deletions test/css/test_xpath_visitor.rb
@@ -369,6 +369,12 @@ def assert_xpath(expecteds, asts)
assert_xpath("//*[not(@id='foo')]", parser.parse(":not(#foo)"))
assert_xpath("//*[count(preceding-sibling::*)=0]", parser.parse(":first-child"))
end

it "raises an exception for pseudo-classes that are not XPath Names" do
  # see https://github.com/sparklemotion/nokogiri/issues/3193
  assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over") }
  assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over()") }
end
end

describe "combinators" do
