Skip to content

Loading…

Issue #1645: Performance improvement by reusing the ScriptFragment RegExp object #94

Closed
wants to merge 1 commit into from

2 participants

@sdumitriu

Done.

@savetheclocktower
Collaborator

Thanks for the PR.

My first instinct was that, rather than expose these as properties on Prototype, we should just cache them inside the closure in string.js, like so:

  var SCRIPT_FRAGMENT_ONE = new RegExp(Prototype.ScriptFragment, 'im');
  var SCRIPT_FRAGMENT_ALL = new RegExp(Prototype.ScriptFragment, 'img');

But then I realized just exactly how much we're boxed in:

  1. A user may reasonably expect to be able to change Prototype.ScriptFragment (to cover their own bizarre use case) and have everything work as expected.
  2. Even in the case where we expose ScriptFragmentOne and ScriptFragmentAll as in this PR, we've broken backward-compatibility, because if someone has changed Prototype.ScriptFragment in the past and had it Just Work, now the relevant methods are looking at different properties for how to behave.

Because of these issues (and probably others that I haven't yet thought of), unless this is a huge performance boost that can be demonstrated with benchmarks (i.e., at least 500%), I'm inclined to leave things the way they are. I can justify breaking backcompat for massive speedups, but not for micro-optimizations.

(The simplest alternative that has no side effects would be to have a cache object so that we could go...

  var SCRIPT_FRAGMENT_ONE_CACHE = {};
  var SCRIPT_FRAGMENT_ALL_CACHE = {};

  SCRIPT_FRAGMENT_ONE_CACHE[Prototype.ScriptFragment] = new RegExp(Prototype.ScriptFragment, 'im');
  SCRIPT_FRAGMENT_ALL_CACHE[Prototype.ScriptFragment] = new RegExp(Prototype.ScriptFragment, 'img');

...and then do lookups from that table later on, constructing a RegExp only if the cache check fails. That way, the user is able to change Prototype.ScriptFragment at any point and the make-regexp-from-string penalty is paid only once per unique script fragment. If this speeds up these methods by at least 100%, I'd be glad to do something like that.)

So if you're still interested in getting this in, can you link me to a jsperf test that shows how much faster these approaches would be over the status quo?

@sdumitriu
  1. A user may reasonably expect to be able to change Prototype.ScriptFragment

You mean at runtime, between calls of update()?

I wouldn't call it a huge improvement, (25% in my particular instance, but among many other pieces of code), and it only affects Chrome, so I don't think it's so important. I could go the other way and submit a bug report to V8 (or whoever is responsible for this) and ask them to reuse existing RegExp objects when the same expression is passed to the new RegExp constructor. I don't think it's normal for something as small as this to count for 1 whole second of performance.

On the other hand, I wouldn't consider Prototype.ScriptFragment real API, even if it was public and writable. It's not documented, only someone reading the source code could find it, and it isn't specified what would happen when changing its value (nor even that it's something supported). I've seen much bigger things change in the history of Prototype, in a non-backwards compatible way.

Update: I wrote the JSPerf test and it seems that while in Chrome the performance gain isn't that big (between 10 and 20% in several runs), Opera is severely affected by the constructor (went from 2ops/s to 500ops/s).

@savetheclocktower
Collaborator

You mean at runtime, between calls of update()?

Yes. I know how ridiculous that sounds. :)

On the other hand, I wouldn't consider Prototype.ScriptFragment real API, even if it was public and writable. It's not documented, only someone reading the source code could find it, and it isn't specified what would happen when changing its value (nor even that it's something supported).

I certainly don't consider it part of the "real" API, but I am sensitive to the fact that it's public. I'd also consider the whole Prototype namespace to be deprecated, though, which is why I'm reluctant to introduce new properties.

I've seen much bigger things change in the history of Prototype, in a non-backwards compatible way.

True, though I'm trying to do better than we've done in the past.

Opera is severely affected by the constructor (went from 2ops/s to 500ops/s).

Hmmm — the fact that they're so much faster than everyone else in that situation suggests they're doing something extra crazy, like caching the result of String#replace based on argument signature.

@savetheclocktower
Collaborator

Anyway, I'll think about this some more. Right now I'm leaning toward the cache-table approach because that'd get us the biggest bang for the buck, but it might be redundant in some browsers if they're doing their own internal caching of RegExp objects.

@sdumitriu

OK, I'll try to make a PR with your approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 8, 2013
  1. @sdumitriu
Showing with 5 additions and 5 deletions.
  1. +3 −5 src/prototype/lang/string.js
  2. +2 −0 src/prototype/prototype.js
View
8 src/prototype/lang/string.js
@@ -297,7 +297,7 @@ Object.extend(String.prototype, (function() {
* not sufficiently robust to prevent hack attacks.
**/
function stripScripts() {
- return this.replace(new RegExp(Prototype.ScriptFragment, 'img'), '');
+ return this.replace(Prototype.ScriptFragmentAll, '');
}
/**
@@ -331,10 +331,8 @@ Object.extend(String.prototype, (function() {
* // -> [4, undefined] (and displays 'hello world!' in the alert dialog)
**/
function extractScripts() {
- var matchAll = new RegExp(Prototype.ScriptFragment, 'img'),
- matchOne = new RegExp(Prototype.ScriptFragment, 'im');
- return (this.match(matchAll) || []).map(function(scriptTag) {
- return (scriptTag.match(matchOne) || ['', ''])[1];
+ return (this.match(Prototype.ScriptFragmentAll) || []).map(function(scriptTag) {
+ return (scriptTag.match(Prototype.ScriptFragmentOne) || ['', ''])[1];
});
}
View
2 src/prototype/prototype.js
@@ -140,6 +140,8 @@ var Prototype = {
},
ScriptFragment: '<script[^>]*>([\\S\\s]*?)<\/script\\s*>',
+ ScriptFragmentOne: /<script[^>]*>([\S\s]*?)<\/script\s*>/im,
+ ScriptFragmentAll: /<script[^>]*>([\S\s]*?)<\/script\s*>/img,
JSONFilter: /^\/\*-secure-([\s\S]*)\*\/\s*$/,
/**
Something went wrong with that request. Please try again.