Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blast treats spaces between words as non-characters. #8

Closed
aaronleesmith opened this issue Sep 26, 2014 · 5 comments
Closed

Blast treats spaces between words as non-characters. #8

aaronleesmith opened this issue Sep 26, 2014 · 5 comments

Comments

@aaronleesmith
Copy link

Blast treats spaces between words as non-characters. This causes the character delimiter to return improper indexes. Because of this, lining up character index data with blast's generated blast-indexes is very difficult.

There should be an option to treat whitespace as characters in order to maintain the actual index of characters from start to finish in a string of text.

By the way... this could be handled simply (I think) by adding a "preserveWhitespace" option that would only apply to characters. Then, you could do this inside the delimiter creation section:

case "character":
                /* Matches every non-space character. */
                /* Note: This is the slowest delimiter. However, its slowness is only noticeable when it's used on larger bodies of text (of over 500 characters) on <=IE8.
                   (Run Blast with opts.debug=true to monitor execution times.) */
                if (opts.preserveWhitespace)
                    delimiterRegex = /(.)/;
                else
                    delimiterRegex = /\S/;
                break;

I don't know if "." is the best way to do this. Perhaps it should only preserve the actual space character.

As a sidenote, to render properly you would need to give white-space: pre|pre-line|pre-wrap to the blasted element to make the spaces not collapse in their own HTML elements.

@julianshapiro , did you consider any of this during initial development?

@julianshapiro
Copy link
Owner

I've emailed you the revised version of Blast.js :)

  1. What do you think of calling the delimiter "character-strict" instead of making this feature flaggable through an option? Can you think of a more appropriate delimiter name?
  2. The behavior is as follows: All characters are matched then all space characters are inline-styled to white-space: pre-line (as per your suggestion). What do you think of this implementation?
  3. I've also changed Blast's overall behavior on how index numbers are treated for generateIndexID: they now start at 0 instead of 1.
  4. Could you let me know if it works as intended? Any other feedback?

Thanks, man!

@aaronleesmith
Copy link
Author

I see your email and will take a look soon. Let me speak to your questions above now.

  1. "Strict" does have other meanings in JS (use strict) and programming in general. How about "character-precise"?
  2. I think that's a fine way to do it.
  3. I wonder what your original intent was for the index IDs. I can see this going either way. It is normal for programmers to expect index=0 to be the first character. However, in my experience with text analytics (which is what I'm utilizing your library for) the 'locations' of the analyzed characters start at 1. Of course you can't predict every use case, and since 0-based is the standard for software, that seems more like the way to go. I do wonder why your original implementation was 1-based, was there some motivation for it?
  4. Will update after I've tested it out.

By the way, another product I work on uses Velocity under the hood and we see significant improvement of the animation quality. Thanks a ton for your hard work.

@julianshapiro
Copy link
Owner

Thanks for the kind words! My pleasure.

As for -precise, I think that's a bit of an odd sounding word despite it being contextually accurate. You make a good point about using strict. What about just all as the delimiter name?

My motivation for starting at 1-based was that it wasn't matching every character and therefore wasn't a true index of all matches relative to their original bodies of text. Further, these wrapper elements wind up being referred to in CSS sometimes, and it just seemed odd to mix 0th-based counting with styling. But it's what I should have done to begin with. I have no real good reason. Thanks for waxing poetic on this point for me.

Awaiting your test results. Thanks again.

@julianshapiro
Copy link
Owner

how'd this work out?

@julianshapiro
Copy link
Owner

Fixed. See new release. there's now an all delimiter type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants