String.indexOf() cannot match phrases with variable whitespace #7355

Closed
rpedela opened this issue May 25, 2016 · 2 comments · Fixed by #13261

@rpedela

rpedela commented May 25, 2016

String.indexOf() is being used for phrase search. There are two scenarios where this will fail because of whitespace:

  1. Justified text, very common in legal documents, which can result in a variable amount of whitespace between words.
  2. Spaces are currently not added between lines, so phrases that span multiple lines will not be found. See Improve Copy/Paste #5783 for more information.
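
Both failure modes can be reproduced with plain strings. The snippet below is only an illustrative sketch: the sample strings are made up for the example and do not come from an actual PDF text layer.

```javascript
// Scenario 1: justified text puts a variable number of spaces between words.
// (Sample strings are invented for illustration.)
var justified = 'the   party  of  the   first  part';
var phrase = 'party of the first part';
var justifiedResult = justified.indexOf(phrase);

// Scenario 2: lines are concatenated without a separating space,
// so a phrase that spans the line break is never found.
var lines = ['agreement between the', 'parties'];
var joined = lines.join('');
var spanningResult = joined.indexOf('the parties');

console.log(justifiedResult, spanningResult);
```

In both cases String.indexOf() reports no match, even though a reader would consider the phrase present.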

Even if 2 is fixed, we still have to deal with 1 for phrase search. I have written a JS function that is equivalent to String.indexOf() but ignores " " while matching. It also returns the end index, because the length of searchValue cannot be used to compute it: the number of spaces in text and in searchValue may differ. It could be modified to handle all whitespace characters, but I think that is unnecessary here since the text layer converts all whitespace to " ".

function indexOfIgnoreSpace(text, searchValue, fromIndex) {
  // Remove spaces from the search value; spaces in text are skipped while matching.
  var fragment = searchValue.split(' ').join('');

  if (!fromIndex || fromIndex < 0) {
    fromIndex = 0;
  }

  // An empty (or all-space) search value never matches.
  if (fragment.length === 0) {
    return null;
  }

  for (var i = fromIndex; i < text.length; i++) {
    // Do not start matching on whitespace.
    if (text[i] === ' ') {
      continue;
    }

    var index = 0;

    for (var j = i; j < text.length; j++) {
      // Skip spaces inside the candidate match.
      if (text[j] === ' ') {
        continue;
      }

      if (text[j] !== fragment[index]) {
        break;
      }

      index++;
      if (index === fragment.length) {
        // end is exclusive, so text.slice(begin, end) is the matched span.
        return {
          begin: i,
          end: j + 1
        };
      }
    }
  }

  return null;
}
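
For comparison, a similar result can be sketched with a regular expression built from the search phrase. This is an alternative technique, not the function above and not what PDF.js does; the names escapeRegExp and indexOfFlexibleSpace are hypothetical and exist only for this example. It assumes, as stated above, that the text layer contains only " " as whitespace.

```javascript
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical alternative: allow zero or more spaces between the words
// of searchValue. Returns { begin, end } (end exclusive) or null.
function indexOfFlexibleSpace(text, searchValue, fromIndex) {
  var words = searchValue.split(' ').filter(function (w) {
    return w.length > 0;
  });
  if (words.length === 0) {
    return null;
  }

  // ' *' also covers the case where no space was added between lines.
  var pattern = new RegExp(words.map(escapeRegExp).join(' *'), 'g');
  pattern.lastIndex = Math.max(fromIndex || 0, 0);

  var match = pattern.exec(text);
  if (!match) {
    return null;
  }
  return { begin: match.index, end: match.index + match[0].length };
}

var sample = 'the   party  of  the   first  part';
var hit = indexOfFlexibleSpace(sample, 'party of the first part', 0);
console.log(hit, hit && sample.slice(hit.begin, hit.end));
```

One behavioral difference: this sketch only tolerates spaces between the words of the search phrase, whereas the character-by-character loop above also matches when spaces appear in the middle of a word in the text.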
@luomancs

Hi,
where should I put your function in the code so that it resolves the space mismatch problem? Thank you.

@calixteman
Contributor

It's partially fixed thanks to af4dc55, which helps to detect EOL.
PR #13261 should fix the problem for good.
