Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Remove BOM in jQuery.trim. #897

Closed
wants to merge 1 commit into from

6 participants

@wwalser

In response to pull request #896, see discussion.

This adds 7 bytes

@jaubourg jaubourg commented on the diff
test/unit/core.js
@@ -278,6 +278,10 @@ test("trim", function() {
equal( jQuery.trim( null ), "", "Null" );
equal( jQuery.trim( 5 ), "5", "Number" );
equal( jQuery.trim( false ), "false", "Boolean" );
+
+ equal( jQuery.trim(" ").length, 0, "space should be trimmed");
+ equal( jQuery.trim("\xA0").length, 0, "nbsp should be trimmed");
+ equal( jQuery.trim("\uFEFF").length, 0, "zwsp should be trimmed");
@jaubourg Collaborator

Just curious as to the reasoning for not comparing to the empty string. If trim was to return an empty array when it encounters a BOM character, these tests would still pass.

Also, it would probably be nice to have tests against strings that would not end up empty, like "\xA0hello" and some other variations.

@wwalser
wwalser added a note

I suddenly understand why the typed language guys are all high and mighty. Yeah, I'll assert against the empty string.

Yup, I'll write an assertion or three that doesn't trim to zero length.

Thanks @jaubourg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jaubourg jaubourg commented on the diff
src/core.js
@@ -37,8 +37,8 @@ var
core_rnotwhite = /\S/,
core_rspace = /\s+/,
- // IE doesn't match non-breaking spaces with \s
- rtrim = core_rnotwhite.test("\xA0") ? (/^[\s\xA0]+|[\s\xA0]+$/g) : /^\s+|\s+$/g,
+ // IE doesn't match many whitespace characters with \s
+ rtrim = core_rnotwhite.test("\xA0") ? /^[\s\xA0\uFEFF]+|[\s\xA0\uFEFF]+$/g : /^\s+|\s+$/g,
@jaubourg Collaborator

Just curious, are we sure we don't have an implementation that will understand \xA0 as being a whitespace but not \uFEFF? Seems like a hell of an inference here.

We could probably use something like ( "\xA0\uFEFF".replace( core_rnotwhite, "" ) === "\xA0\uFEFF" ) to make sure (provided we make core_rnotwhite a global regexp). It's not perfect, but it protects us from false negatives (so to speak).

@gibson042 Collaborator

Better still would be core_rnotwhite.test("\uFEFF\xA0") ? /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g : /^\s+|\s+$/g

@jaubourg Collaborator

And @gibson042 strikes again. You're right, of course. My only excuse is that I didn't get enough sleep.

@wwalser
wwalser added a note

It's an inference based on our understanding of how regular expressions are implemented. If \s matches \xA0 then \s picks up unicode characters, if not then it doesn't. It's not a check for the specific characters that we want to trim as whitespace, it's feature detection.

"Does your \s match unicode characters?"
"Yes, awesome we'll use \s"
"No, too bad, we'll enumerate the unicode characters we want to trim one at a time."

@jaubourg Collaborator

Please, read your code carefully:

"Does your \s match \xA0 and \xA0 only?"
"Yes, awesome, let's assume it matches \uFEFF too and use \s"
"No, too bad, we'll enumerate all of the unicode characters we want to trim one at a time."

What happens if you have an implementation that matches \xA0 and not \uFEFF? It may not be likely but better safe than sorry: let's feature detect for real. Why the need for a weak inference like this?

@wwalser
wwalser added a note

I appreciate what you're getting at it just seems along the same lines of questioning whether test() and replace() do the same thing when encountering a regex containing \s. We know that it does and so after testing with test() we feel safe using replace(). Being that every browser where \s matches \xA0 also matches \uFEFF reality seemed to be on my side.

It wasn't a need, It was for the lack of need of a more strict inference. I had heard you guys were bending over backward trying to save bytes over the past few months.

I'll change the code.

@jaubourg Collaborator

Actually I'd expect the use of \xA0\uFEFF a third time would actually gain size, but I'll defer to @gibson042 on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@FagnerMartinsBrack

Since jQuery.trim uses a polyfill for String.trim it would be nice if it could match the full spec.
If this is used just internally for jQuery as @dmethvin said there's no reason to expose as an utility, right?

@timmywil
Collaborator

Let's not go overboard here. jQuery.trim is a perfectly useful public utility that happens to utilize native trim where it can. We don't need to drop it just because it doesn't trim all random unicode space characters in every browser. Let's be honest and make up statistics. 99.9% of users don't care about these characters. They care about " ".

@dmethvin
Owner

Since jQuery.trim uses a polyfill for String.trim

The documentation says nothing about this.

If this is used just internally for jQuery as @dmethvin said

I said " jQuery's internal utility methods are often intended for specific situations inside the library where the complex edge cases don't matter." It wasn't our intention to expose a polyfill of String.prototype.trim under the name of $.trim and be tied to a spec that requres things we don't need internally.

there's no reason to expose as an utility, right?

The only reason to continue to expose it is because the cow has left the barn, the cat is out of the bag, the condor has taken flight. We definitely would not have exposed it if we understood the level of bikeshedding it would induce.

@FagnerMartinsBrack

The documentation says nothing about this.
99.9% of users don't care about these characters

Sure, that was just a personal thought based in the source and in the fact that is a public api...

@wwalser

Sorry to have brought the bikeshed back to the forefront. I obviously can't speak for the entire community but if this were moved out of core and into a plugin that wouldn't seem inappropriate.

@jaubourg
Collaborator

@wwalser Handling BOM seems like a reasonable endeavor to me. It's the idea of a full-fledged trim polyfill for the sake of it that's a bit overkill.

So thanks for the PR ;)

@dmethvin dmethvin closed this in 2b5b4eb
@dmethvin
Owner

I updated the tests and landed this so it would get into 1.8.1, thanks!

@dmethvin dmethvin referenced this pull request from a commit
@dmethvin dmethvin Revert "Fix #12350. Remove BOM in jQuery.trim. Close gh-897."
This reverts commit 2b5b4eb.

String.prototype.trim doesn't trim BOM in Safari 5.0 so this won't work without additional feature detects.

http://swarm.jquery.org/result/165379
ac043b1
@dmethvin dmethvin reopened this
@dmethvin
Owner

Breaks in Safari 5.0. So now we not only need to detect a missing String.prototype.trim, but we need to detect a broken String.prototype.trim. I fear it will take some bytes to do this. http://swarm.jquery.org/result/165379

@gibson042
Collaborator

Wait, are you saying that Safari 5.0 recognizes BOM as whitespace in the regex engine but doesn't trim it?

@jaubourg
Collaborator

Seems like it... makes you wonder about this BOM/NBSP feature inference now, doesn't it? :P

@wwalser

Maybe. The new inference that if \s recognizes \xA0 it will also recognize \uFEFF may have failed or the existing inference that regexs \s recognition is the same as what's trimmed by prototype.trim might have failed. We can't actually tell from this test result.

It's not actually any of the inferences that are hurting us here. As Dave pointed out it's the existence of a String.prototype.trim which is broken that's hurting us.

    trim: core_trim ?
        function( text ) {
            return text == null ?
                "" :
                core_trim.call( text );
        } :

        // Otherwise use our own trimming functionality
        function( text ) {
            return text == null ?
                "" :
                text.toString().replace( rtrim, "" );
        },

core_trim is true. rtrim, the result of said inference is never used. String.prototype.trim exists and is not implemented to spec. Unfortunate, unsurprising, and true.

This would require additional bytes, scrap it.

Thanks guys! sorry it didn't work out.

@jaubourg
Collaborator

@wwalser the remark shouldn't be taken literally. What I'm saying is that, when you cannot even trust standard methods to be implemented correctly, you're probably better off not making any inference in your code, period.

It's not about the browsers that jQuery supports today, it's about the ones it will have to support tomorrow (FF & Chrome are very aggressive and very breaking in that department). Everytime we release a new version, we hope for it to be the latest in it's 1.x branch. If such is the case, then some sites will have this version for a looooong time (1.4.4 anyone?). Now, comes along a browser that does break a single assumption...

Feature detection should be as bullet proof as humanly possible (as in testing for the exact blank the workaround is supposed to fill), feature inference should be avoided at all cost, since it doesn't test for the blank itself and, thus, is just browser sniffing in disguise, even if it actually sniffs a large range of browsers: "this is true, and in every browser I know I can deduce this totally unrelated thing is true too so it will always be true no matter what".

I find it sad that we could actually feature detect here but just refuse to do so because "we know of no browser that would break the assumption" or "it saves 4 bytes". That doesn't sit well with me regarding how jQuery is supposed to endure a little past the few browser versions it supports right this instant.

And I know we cannot safeguard against all future bugs, but I sure hope we can properly handle the ones we fix today so that the code is adapted to variations on the same theme.

I hope I made my reasoning a little clearer.

@dmethvin
Owner

Landed gh-902 instead. @wwalser thanks for your work here, I hope it's a little clearer what we go through for even the simplest changes. :sweat: When we're supporting so many (broken) browsers, nothing is simple! We are hoping to have automatic Testswarm runs for pull requests soon, which would have identified the problem with Safari 5.0 before the code landed in master.

@dmethvin dmethvin closed this
@wwalser

Nice, thanks for the update.

@mescoda mescoda referenced this pull request from a commit in mescoda/jquery
@wwalser wwalser Fix #12350. Remove BOM in jQuery.trim. Close gh-897. 687cf1d
@mescoda mescoda referenced this pull request from a commit in mescoda/jquery
@dmethvin dmethvin Revert "Fix #12350. Remove BOM in jQuery.trim. Close gh-897."
This reverts commit 2b5b4eb.

String.prototype.trim doesn't trim BOM in Safari 5.0 so this won't work without additional feature detects.

http://swarm.jquery.org/result/165379
76ea003
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Aug 17, 2012
  1. @wwalser

    Remove BOM in jQuery.trim.

    wwalser authored
This page is out of date. Refresh to see the latest.
Showing with 7 additions and 3 deletions.
  1. +2 −2 src/core.js
  2. +5 −1 test/unit/core.js
View
4 src/core.js
@@ -37,8 +37,8 @@ var
core_rnotwhite = /\S/,
core_rspace = /\s+/,
- // IE doesn't match non-breaking spaces with \s
- rtrim = core_rnotwhite.test("\xA0") ? (/^[\s\xA0]+|[\s\xA0]+$/g) : /^\s+|\s+$/g,
+ // IE doesn't match many whitespace characters with \s
+ rtrim = core_rnotwhite.test("\xA0") ? /^[\s\xA0\uFEFF]+|[\s\xA0\uFEFF]+$/g : /^\s+|\s+$/g,
@jaubourg Collaborator

Just curious, are we sure we don't have an implementation that will understand \xA0 as being a whitespace but not \uFEFF? Seems like a hell of an inference here.

We could probably use something like ( "\xA0\uFEFF".replace( core_rnotwhite, "" ) === "\xA0\uFEFF" ) to make sure (provided we make core_rnotwhite a global regexp). It's not perfect, but it protects us from false negatives (so to speak).

@gibson042 Collaborator

Better still would be core_rnotwhite.test("\uFEFF\xA0") ? /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g : /^\s+|\s+$/g

@jaubourg Collaborator

And @gibson042 strikes again. You're right, of course. My only excuse is that I didn't get enough sleep.

@wwalser
wwalser added a note

It's an inference based on our understanding of how regular expressions are implemented. If \s matches \xA0 then \s picks up unicode characters, if not then it doesn't. It's not a check for the specific characters that we want to trim as whitespace, it's feature detection.

"Does your \s match unicode characters?"
"Yes, awesome we'll use \s"
"No, too bad, we'll enumerate the unicode characters we want to trim one at a time."

@jaubourg Collaborator

Please, read your code carefully:

"Does your \s match \xA0 and \xA0 only?"
"Yes, awesome, let's assume it matches \uFEFF too and use \s"
"No, too bad, we'll enumerate all of the unicode characters we want to trim one at a time."

What happens if you have an implementation that matches \xA0 and not \uFEFF? It may not be likely but better safe than sorry: let's feature detect for real. Why the need for a weak inference like this?

@wwalser
wwalser added a note

I appreciate what you're getting at it just seems along the same lines of questioning whether test() and replace() do the same thing when encountering a regex containing \s. We know that it does and so after testing with test() we feel safe using replace(). Being that every browser where \s matches \xA0 also matches \uFEFF reality seemed to be on my side.

It wasn't a need, It was for the lack of need of a more strict inference. I had heard you guys were bending over backward trying to save bytes over the past few months.

I'll change the code.

@jaubourg Collaborator

Actually I'd expect the use of \xA0\uFEFF a third time would actually gain size, but I'll defer to @gibson042 on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
// A simple way to check for HTML strings
// Prioritize #id over <tag> to avoid XSS via location.hash (#9521)
View
6 test/unit/core.js
@@ -264,7 +264,7 @@ test("noConflict", function() {
});
test("trim", function() {
- expect(9);
+ expect(12);
var nbsp = String.fromCharCode(160);
@@ -278,6 +278,10 @@ test("trim", function() {
equal( jQuery.trim( null ), "", "Null" );
equal( jQuery.trim( 5 ), "5", "Number" );
equal( jQuery.trim( false ), "false", "Boolean" );
+
+ equal( jQuery.trim(" ").length, 0, "space should be trimmed");
+ equal( jQuery.trim("\xA0").length, 0, "nbsp should be trimmed");
+ equal( jQuery.trim("\uFEFF").length, 0, "zwsp should be trimmed");
@jaubourg Collaborator

Just curious as to the reasoning for not comparing to the empty string. If trim was to return an empty array when it encounters a BOM character, these tests would still pass.

Also, it would probably be nice to have tests against strings that would not end up empty, like "\xA0hello" and some other variations.

@wwalser
wwalser added a note

I suddenly understand why the typed language guys are all high and mighty. Yeah, I'll assert against the empty string.

Yup, I'll write an assertion or three that doesn't trim to zero length.

Thanks @jaubourg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
});
test("type", function() {
Something went wrong with that request. Please try again.