Blast treats spaces between words as non-characters. #8

aaronleesmith · 2014-09-26T17:12:41Z

Blast treats spaces between words as non-characters, so the character delimiter skips them and its indexes no longer match character positions in the original string. Because of this, lining up character index data with Blast's generated indexes is very difficult.

There should be an option to treat whitespace as characters in order to maintain the actual index of characters from start to finish in a string of text.
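To make the mismatch concrete, here is a minimal sketch that does not use Blast at all. The helper `matchedCharacters` is hypothetical and only imitates the idea of numbering the characters a regex matches; it shows how the match number drifts away from the position in the string once a space is skipped.

```javascript
// Hypothetical helper (not part of Blast): walks a string and records every
// character accepted by a single-character regex, together with its position
// in the source string and its running match number.
function matchedCharacters(text, charRegex) {
    const matches = [];
    for (let i = 0; i < text.length; i++) {
        if (charRegex.test(text[i])) {
            matches.push({ char: text[i], sourceIndex: i, matchIndex: matches.length });
        }
    }
    return matches;
}

const sample = "ab cd";

// Skipping whitespace: "c" is at position 3 in the string but is match number 2.
const nonSpace = matchedCharacters(sample, /\S/);

// Matching every character: the match number always equals the string position.
const everyChar = matchedCharacters(sample, /[\s\S]/);

console.log(nonSpace[2]);   // { char: 'c', sourceIndex: 3, matchIndex: 2 }
console.log(everyChar[3]);  // { char: 'c', sourceIndex: 3, matchIndex: 3 }
```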

By the way... this could be handled simply (I think) by adding a "preserveWhitespace" option that would only apply to the character delimiter. Then, you could do this inside the delimiter creation section:

case "character":
    /* Matches every non-space character by default, or every character except line breaks when opts.preserveWhitespace is set. */
    /* Note: This is the slowest delimiter. However, its slowness is only noticeable when it's used on larger bodies of text (of over 500 characters) on <=IE8.
       (Run Blast with opts.debug=true to monitor execution times.) */
    if (opts.preserveWhitespace) {
        delimiterRegex = /(.)/;
    } else {
        delimiterRegex = /\S/;
    }
    break;

I don't know if "." is the best way to do this. Perhaps it should only preserve the actual space character.
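On the question of whether "." is the right choice, a quick comparison may help. The three candidate patterns below are only illustrations, not anything taken from Blast's source, and `countMatches` is a hypothetical helper. The sketch counts how many characters of a test string each pattern accepts.

```javascript
// Candidate single-character patterns (illustrative only).
const candidates = {
    dot: /./,                 // any character except line terminators
    nonSpaceOrSpace: /[\S ]/, // non-whitespace plus the literal space character
    anything: /[\s\S]/        // every character, including tabs and line breaks
};

// Hypothetical helper: counts the characters of `text` accepted by `regex`.
function countMatches(text, regex) {
    let count = 0;
    for (const ch of text) {
        if (regex.test(ch)) {
            count++;
        }
    }
    return count;
}

// Seven characters: a, space, b, tab, c, newline, d.
const probe = "a b\tc\nd";

console.log(countMatches(probe, candidates.dot));             // 6 (newline skipped)
console.log(countMatches(probe, candidates.nonSpaceOrSpace)); // 5 (tab and newline skipped)
console.log(countMatches(probe, candidates.anything));        // 7
```

If the goal is for indexes to match string positions exactly, only a pattern that accepts every character, line breaks included, keeps the count equal to the string length.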

As a side note, for this to render properly you would need to set white-space to pre, pre-line, or pre-wrap on the blasted element so that spaces wrapped in their own HTML elements do not collapse.

@julianshapiro, did you consider any of this during initial development?


julianshapiro · 2014-10-08T21:44:00Z

I've emailed you the revised version of Blast.js :)

1. What do you think of calling the delimiter "character-strict" instead of making this feature flaggable through an option? Can you think of a more appropriate delimiter name?
2. The behavior is as follows: All characters are matched, then all space characters are inline-styled to white-space: pre-line (as per your suggestion). What do you think of this implementation?
3. I've also changed Blast's overall behavior on how index numbers are treated for generateIndexID: they now start at 0 instead of 1.
4. Could you let me know if it works as intended? Any other feedback?
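The behavior described above (match every character, then inline-style the whitespace ones) can be sketched roughly as follows. This is not the revised Blast.js code, which is not shown in this thread; `wrapAllCharacters`, `escapeHtml`, and the `data-index` attribute are all hypothetical names chosen for illustration, and the sketch builds an HTML string rather than touching the DOM.

```javascript
// Hypothetical helper: escapes the characters that are unsafe inside HTML text.
function escapeHtml(ch) {
    return ch.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical sketch: wraps every character, whitespace included, in its own
// span with a 0-based index, and gives whitespace characters an inline
// white-space style so they are not collapsed when rendered.
function wrapAllCharacters(text) {
    return Array.from(text).map(function (ch, index) {
        const style = /\s/.test(ch) ? ' style="white-space: pre-line"' : "";
        return '<span data-index="' + index + '"' + style + ">" + escapeHtml(ch) + "</span>";
    }).join("");
}

console.log(wrapAllCharacters("a b"));
// <span data-index="0">a</span><span data-index="1" style="white-space: pre-line"> </span><span data-index="2">b</span>
```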

Thanks, man!

aaronleesmith · 2014-10-08T22:15:29Z

I see your email and will take a look soon. Let me speak to your questions above now.

1. "Strict" does have other meanings in JS (use strict) and programming in general. How about "character-precise"?
2. I think that's a fine way to do it.
3. I wonder what your original intent was for the index IDs. I can see this going either way. It is normal for programmers to expect index=0 to be the first character. However, in my experience with text analytics (which is what I'm utilizing your library for) the 'locations' of the analyzed characters start at 1. Of course you can't predict every use case, and since 0-based is the standard for software, that seems more like the way to go. I do wonder why your original implementation was 1-based. Was there some motivation for it?
4. Will update after I've tested it out.
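For callers whose external data uses 1-based locations, as in the text analytics case mentioned above, the adjustment to 0-based indexes is a one-line conversion. A minimal sketch, with hypothetical helper names that are not part of Blast:

```javascript
// Hypothetical helper: converts a 1-based location (as reported by some
// text analytics tools) to a 0-based index.
function toZeroBased(location) {
    if (!Number.isInteger(location) || location < 1) {
        throw new RangeError("1-based locations must be integers >= 1");
    }
    return location - 1;
}

// Hypothetical helper: converts a 0-based index back to a 1-based location.
function toOneBased(index) {
    if (!Number.isInteger(index) || index < 0) {
        throw new RangeError("0-based indexes must be integers >= 0");
    }
    return index + 1;
}

// Location 2 in "hello" is the letter "e", which sits at index 1.
console.log("hello"[toZeroBased(2)]); // e
```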

By the way, another product I work on uses Velocity under the hood and we see significant improvement of the animation quality. Thanks a ton for your hard work.

julianshapiro · 2014-10-08T22:50:45Z

Thanks for the kind words! My pleasure.

As for "-precise", I think that's a bit of an odd-sounding word despite it being contextually accurate. You make a good point about using "strict". What about just "all" as the delimiter name?

My motivation for starting at 1 was that the delimiter wasn't matching every character, so the number wasn't a true index of all matches relative to their original bodies of text. Further, these wrapper elements wind up being referred to in CSS sometimes, and it just seemed odd to mix 0-based counting with styling. But 0-based is what I should have done to begin with. I have no real good reason. Thanks for waxing poetic on this point for me.

Awaiting your test results. Thanks again.

julianshapiro · 2014-10-24T17:43:29Z

How'd this work out?

julianshapiro · 2015-02-25T04:11:44Z

Fixed. See the new release. There's now an "all" delimiter type.

julianshapiro added the enhancement label Sep 26, 2014

julianshapiro added the releasing soon label Oct 8, 2014

julianshapiro closed this as completed Feb 25, 2015
