jruby improvements + weaker xpath caching #1597

kares · 2017-01-31T13:06:19Z

managed to run into some more common problems, such as cloning non-cloneable (Java) objects.
cloning is a somewhat ugly pattern in general, but removing it from the rest of the Ruby object cloning would need a justification.

further, I realized there were some xpath problems:

using a Ruby function which returns an Integer was not supported (resolves #1595, "xpath function returning an int fails under jruby")
xpath-context caching did not work for xpaths using binds or a custom function handler

the latter is now avoided (such expressions always create a new xpath context) - they are less common.
xpath(expr) still caches as before, but I realized the cache makes xpath() not thread-safe (see #1596)

UPDATE: also added a fix for #1440 (also resolves #1281 which is a duplicate - had the same cause)

flavorjones · 2017-02-10T00:01:55Z

Thank you for submitting this, @kares!

Can you please rebase this off current master? It doesn't have 92768e0 in it, so the CI pipeline is failing when testing against libxml system libraries. See https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/ruby-2.4-system-pr/builds/3 for specifics.

kares · 2017-02-10T07:05:06Z

rebased and all 💚 now ... thanks for the feedback.
hoping this will also get into 1.7.1, but take your time if you need it to review this.

numbers show that doc parsing is ~ 10% faster + xpath has gone 2x faster in some of the cases I tested.

flavorjones · 2017-02-10T07:09:58Z

Yep, about to merge! Just writing CHANGELOG entries now.

Thanks so much!

flavorjones · 2017-02-10T07:25:55Z

Merged manually.

kares added 29 commits February 10, 2017 05:53

review NamespaceContext impl - register is never null, use HashMap

64231c1

no need for a synchronized Hashtable here; size will be 4 99% of time
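
As an illustration of the point above, here is a minimal sketch of a `javax.xml.namespace.NamespaceContext` backed by a plain `HashMap`. The class name `SimpleNamespaceContext` and its `register` method are hypothetical stand-ins, not Nokogiri's actual implementation; the sketch assumes each context instance is used by one thread at a time, which is what makes the unsynchronized map acceptable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical stand-in for illustration only.
final class SimpleNamespaceContext implements NamespaceContext {
    // A plain HashMap: the map is tiny and confined to this instance,
    // so the per-call locking of Hashtable buys nothing.
    private final Map<String, String> prefixToUri = new HashMap<>(8);

    void register(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix is null");
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty namespace URI.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        Iterator<String> prefixes = getPrefixes(uri);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        // Linear scan is fine for a handful of entries.
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(uri)) found.add(e.getKey());
        }
        return found.iterator();
    }
}
```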

re-arrange + cleanup XmlXpathContext code - no need for Class.forName

3bd82e0

... loading reflectively makes no sense since the class is already loaded as it's part of the method's signature (return type)

avoid reflected JAXPExtensionsProvider instantiation

cf37374

was introduced before the xalan .jars were packed with the gem; it was only necessary due to changing Java versions - no longer the case!

fix a typo in user-key: CACHED_XPATH_CONTEXT

740ddb5

faster gsub-ing on xpath eval -> we avoid regexes and re-use buffer

4a3198e

there's always multiple prefixes - re-use buffer to save some garbage

022e15a
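
The two commits above describe replacing regex-based substitution with literal replacement on a shared buffer. A rough sketch of that general technique follows; the class `LiteralReplace`, its method names, and the `nokogiri:` prefix used in the test are assumptions for illustration, not the code from the commits. Note that the literal match is naive: it would also match a name that happens to be the tail of a longer function name.

```java
// Hypothetical helper for illustration only.
final class LiteralReplace {
    // Replaces every literal occurrence of `target` inside `buf`, in place,
    // without compiling a regex or allocating intermediate Strings.
    static StringBuilder replaceAll(StringBuilder buf, String target, String replacement) {
        if (target.isEmpty()) return buf;
        int from = 0;
        int at;
        while ((at = buf.indexOf(target, from)) >= 0) {
            buf.replace(at, at + target.length(), replacement);
            // Continue after the inserted text so it is never re-scanned.
            from = at + replacement.length();
        }
        return buf;
    }

    // Prefixes each listed function name, re-using one buffer for all names.
    static String prefixFunctions(String xpath, String prefix, String... names) {
        StringBuilder buf = new StringBuilder(xpath);
        for (String name : names) {
            replaceAll(buf, name + "(", prefix + name + "(");
        }
        return buf.toString();
    }
}
```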

newXPath() is completely redundant + rename internal method

7f703fd

refactor ReaderNode internals -> not Cloneables + common Hash method

e744f5e

... from JRuby's RubyHash API so that we avoid all is1_9() checks

get rid of ElementNode.attributeStrings -> can be computed on demand

ee286ad

get rid of XmlXPathContext cloning -> use the internal constructor

13c4939

... much safer to reason about AND REVEALED SOME INTERNAL ISSUES !

retrieve/instantiate xpath-context as it's used + less exception nesting

db795be

... also re-format code spacing to match rest of the file

pass down (xpath) expr as error.message to avoid regression

dbd2d82

re-invent the XPathContext without thread-safety issues

89a094f

... (temporarily) removed the existing caching mechanism

bring back xpath-context caching - but somehow more (thread) safely!

a8255f6

we now make sure to only cache xpaths without binds or custom handlers ... it has been disabled in previous commit(s)
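
The rule stated above (cache only when there are no binds and no custom handler) can be sketched as follows. This is an assumption-laden illustration, not the mechanism from the commit: `ContextCache`, `lookup`, and the `factory` parameter are hypothetical names, and the sketch uses a `ConcurrentHashMap`, which only makes the map itself safe. It assumes the cached objects are safe to share between callers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache for illustration only; C stands for whatever
// "xpath context" type is being cached.
final class ContextCache<C> {
    private final Map<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> factory; // must not return null

    ContextCache(Function<String, C> factory) {
        this.factory = factory;
    }

    C lookup(String expr, Object binds, Object handler) {
        // Expressions with variable bindings or a custom function handler
        // carry per-call state, so they always get a fresh context.
        if (binds != null || handler != null) {
            return factory.apply(expr);
        }
        // Plain expressions are keyed by their text and created at most once.
        return cache.computeIfAbsent(expr, factory);
    }
}
```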

dry-up XmlNodeSet creation code into a factory

165e276

cleanup xpath-function wrapper and make sure Integer returns work

66e23a1
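
For context on why Integer returns need care: XPath 1.0 has a single number type, a double. One plausible way for a function wrapper to accept integer results is to widen them before handing them back to the XPath engine. The sketch below shows that idea only; `XPathResults.normalize` is a hypothetical helper, not the code from this commit, and node-set results are deliberately left out.

```java
// Hypothetical helper for illustration only.
final class XPathResults {
    // Maps a value returned by a user-supplied function onto one of the
    // scalar XPath 1.0 types: number (Double), boolean, or string.
    static Object normalize(Object value) {
        if (value instanceof Double || value instanceof Boolean || value instanceof String) {
            return value;
        }
        if (value instanceof Number) {
            // Integer, Long, etc. are widened to the XPath number type.
            return Double.valueOf(((Number) value).doubleValue());
        }
        throw new IllegalArgumentException("unsupported XPath function result: " + value);
    }
}
```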

review XmlNodeSet impl - keep (internal) nodes lazily null

0124568

... since there's logic everywhere dealing with the case; also do NOT initialize nodes with newEmptyArray and then append to it

never executed - even if it was there's no need to set an empty array

ba6f584

refactor to a method which returns (non-null) nodes for XmlNodeSet

103f051

re-use XmlNodeSet (creation) factory method

d5a25aa

re-use the (static) UTF-8 Charset object

d86a5c3

use (non-synchronized) StringBuilder + retrieve internal buffer

517ad51

... to potentially save one piece of (char[]) array copy-ing

use a hash set for boolean attr name checking; make a static final

bc5b817
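
A small sketch of the pattern this commit title names: a membership check against a shared, immutable hash set instead of scanning a list or rebuilding a collection per call. The class name and the attribute names listed here are illustrative assumptions, not the set used in the commit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only.
final class BooleanAttributes {
    // Built once at class-load time and never mutated, so it can be shared
    // freely. The names below are an illustrative subset.
    private static final Set<String> NAMES = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "checked", "disabled", "selected", "readonly", "multiple")));

    // Constant-time lookup on average, with no per-call allocation.
    static boolean isBooleanAttribute(String name) {
        return NAMES.contains(name);
    }
}
```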

use CharSequence around escaping + whitespace check & canonical.

bd74672

(incl. previous commits) causes notably less garbage generation!

unused isWhitespace(IRubyObject) - still refactor with less converting

6cf8d01

prefer the isNil() check

f999c84

use implicit charset encode (which handles un-mappable chars)

e8bcea6

also faster due to re-using a thread-cached encoder (resolves sparklemotion#1440)
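
To show the difference this commit relies on: `Charset.encode` replaces unmappable characters with the charset's replacement bytes, whereas a fresh `CharsetEncoder` reports them by default and throws. The sketch below only demonstrates that documented JDK behavior; the class and method names are hypothetical and it is not the Nokogiri code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical demo class for illustration only.
final class EncodeDemo {
    // Charset.encode never throws on unmappable input; it substitutes the
    // charset's default replacement (for US-ASCII that is '?').
    static byte[] lenientEncode(CharSequence text, Charset charset) {
        ByteBuffer out = charset.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // A new encoder uses CodingErrorAction.REPORT by default, so the same
    // input raises a CharacterCodingException instead.
    static boolean strictEncodeFails(CharSequence text, Charset charset) {
        try {
            charset.newEncoder().encode(CharBuffer.wrap(text));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }
}
```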

can avoid yet another toString copy-ing when converting to encoding

137afdd

remove non-used fields/methods and degrade indent type to CharSequence

6b9c1bf

kares added 2 commits February 10, 2017 05:53

prefer using fix2int helper to convert Fixnum to int

c6b4648

correctly handle Charset.forName fails (null argument never happens)

377dde4

also use enum values instead of wrapping it to a set (to find encoding)
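
On the `Charset.forName` point: the JDK method signals failure by throwing `IllegalCharsetNameException` (malformed name) or `UnsupportedCharsetException` (well-formed but unknown name), so a lookup that should degrade gracefully has to catch both. A minimal sketch, assuming a caller-supplied fallback; `CharsetLookup.lookup` is a hypothetical helper, not the method touched by the commit, and it expects a non-null name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper for illustration only.
final class CharsetLookup {
    // Returns the named charset, or `fallback` when the name is malformed
    // or no such charset is available in this JVM.
    static Charset lookup(String name, Charset fallback) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }
}
```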

kares force-pushed the jruby-review-4x branch from 830e5ea to 377dde4 Compare February 10, 2017 04:54

flavorjones added the platform/jruby label Feb 10, 2017

This was referenced Feb 10, 2017

xpath function returning an int fails under jruby #1595

Closed

Runtime error when calling to_html #1440

Closed

RuntimeError when calling to_s with jruby #1281

Closed

flavorjones closed this Feb 10, 2017
