Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Default custom XPath function handler #464

merged 0 commits into from Dec 12, 2012


None yet
3 participants

mbklein commented May 24, 2011

I've created an easy-to-decorate handler for custom XPath functions, and created code so that Node will use it in the absence of another handler:

  Nokogiri::XML::XPathFunctions.define(:regex) do |node_set, regex|
    node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ }

  # regex() is now available by default to all XPath expressions on all 
  # nodes of all Nokogiri::XML instances.
  node.xpath('.//title[regex(., "\w+")]')

A custom handler can still be passed to Node#xpath() to achieve the old behavior. To override the baked-in handler without defining a new handler or any new functions, pass nil.

This functionality would allow contributors to build up mixable, shareable libraries of XPath functions without requiring large changes to consumer code.


tenderlove commented May 25, 2011


I'm not 100% opposed to having a default object that we evaluate our context against, but I don't like this API. I would rather that people just do class_eval or extend against some singleton object, and allow them to define methods normally rather than writing our own def. For example:

# Class eval form
Nokogiri::XML::XPathFunctions.class_eval do
  def regex node_set, regexp
    node_set.find_all { |n| n['whatever'] =~ /#{regexp}/ }

# Module Extend
Nokogiri::XML::XPathFunctions.extend(Module.new {
  def regex node_set, regexp
    node_set.find_all { |n| n['whatever'] =~ /#{regexp}/ }

# Module Extend + Super
Nokogiri::XML::XPathFunctions.extend(Module.new {
  def regex node_set, regexp
    super(node_set, regexp + "hi mom!")

I think this would simplify the implementation and allow people to use OO constructs more easily.

The other concern I have with this is performance. If we have a default object that we eval against, we're going to be paying respond_to? method calls on every XPath call when we weren't doing that before. I'm not sure that it would impact performance much at all, but I think it's something we need to mind.

I'm luke warm on including this feature. If there are no performance issues, it's probably fine, but I'd like to hear from @flavorjones too.


mbklein commented May 25, 2011

Thanks for the feedback. For what it's worth, I used a contained, anonymously-classed handler for two reasons. One reason was so I could clear out all the existing methods (other than method_missing and send) from the handler to avoid name collisions (which, admittedly, is less of an issue with class_eval than with define_method). The other was to pave the way for optimization, if performance becomes an issue. By keeping the handler class nil until someone actually attaches a method to it, and by nilling it out again if the last custom method is removed, the performance hit is (by default) eliminated.

Both of those goals are still achievable using class_eval and extend instead of (or in addition to) define, though it would help to keep the XPathFunctions class as a utility wrapper around the actual handler class.

The first version of this code existed as a Node decorator and not a full integration. If you don't think the feature is right for Nokogiri's core, I'd be happy to split it back out into an add-on module.


flavorjones commented May 25, 2011

First, let me say that I am totally on board with improving the API around declaring custom xpath handlers.

Regarding performance: libxml2 is smart enough to not invoke our lookup unless a function is invoked in the xpath expression that is not a built-in function (thereby avoiding respond_to? overhead). So, I'm not worried about performance.

Which means that we don't have to worry about nil-ing the handler class. Which means that we could use a singleton object in which all xpath functions are looked-up, without a runtime penalty.

I'd also like to explicitly note that there's no reason we can't be backwards-compatible with this API: if the user chooses to implement his own class and pass an instance of it in as the last argument to #xpath, then we should lookup methods in that object; and use the singleton class by default (if no handler object is provided).

In summary:

  • agree API needs improvement
  • performance should not be affected any way we build it
  • I like @tenderlove's proposed API better than the one implemented in this pull request
  • I'd like very much to maintain backwards API compatibility

mbklein commented May 25, 2011

That's all good to know (especially the performance bit). Thanks. It simplifies the implementation quite a bit.

I'm happy to replace swap out my API for @tenderlove's. I'd like to clarify a couple things before I do:

  1. Backward compatibility was an explicit goal from the beginning, so no surprises or problems there. Passing another handler (or nil) overrides the default handler.
  2. extend vs. class_eval: XPathFunctions.extend(mod) adds mod's methods to the XPathFunctions class, while they actually need to end up on the instance of the class. Options:
    1. Document it and trust people to know that they need to XPathFunctions.class_eval { ... } but XPathFunctions.instance.extend(module). Leave it to them to figure out what they did wrong if they incorrectly call XPathFunctions.extend(module) instead.
    2. Same as (1), but add a warning to XPathFunctions.extend that lets them know that they probably want to extend the instance, not the class.
    3. Override XPathFunctions.extend to send the call to the singleton instance instead, which assumes that nobody is going to want to explicitly extend the XPathFunctions Class object itself.


@mbklein mbklein merged commit f57469c into sparklemotion:master Dec 12, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment