Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: fix lastIndexOf and indexOf in various edge cases #6511

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented May 2, 2016

Checklist
  • tests and code linting passes
  • a test and/or benchmark is included
  • the commit message follows commit guidelines
Affected core subsystem(s)

buffer

Description of change

Return -1 in Buffer.lastIndexOf if the needle is longer than the haystack. The previous check only tested the corresponding condition for forward searches.

Fixes: #6510

@addaleax addaleax added the buffer Issues and PRs related to the buffer subsystem. label May 2, 2016
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label May 2, 2016
@jasnell
Copy link
Member

jasnell commented May 2, 2016

You beat me to it. Had the fix ready to push lol.
Can you check to make sure this works with ucs2, utf8, and binary encode strings also?

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

@jasnell … sorry. :-/ Sure.

@addaleax addaleax force-pushed the fix-buffer-lastindexof-needle-gt-haystack branch from fb4ec11 to 2d5f775 Compare May 2, 2016 02:27
@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

Updated it. CI: https://ci.nodejs.org/job/node-test-commit/3131/

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Having trouble getting to github from my laptop so I can't test your patch just yet. Can you give this one a try with your patch?

var b = Buffer.from([1,2]);
b.lastIndexOf('€a', 0, 'ucs2')

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

$ ./node
> var b = Buffer.from([1,2]);
undefined
> b.lastIndexOf('€a', 0, 'ucs2')
-1

And yep, without this patch the same code crashes Node.

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Ok. Might want to add a few tests with characters that definitely expand out to multi-bytes under utf8 also. Just to be extra safe.

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Looking good so far. @trevnorris ... PTAL

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

Hmmmm, you’re right – but the other way around: Multi-byte characters with UTF-8 are fine, but ASCII characters with UTF-16 fail because needle_length underestimates the needle length. That one’s also happening for indexOf and must have been around for a really long time…

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Yep, I was just noticing the same thing. In v5.11.0 the following dies:

bash-3.2$ node -v
v5.11.0
bash-3.2$ node
> b = Buffer.from('abc')
<Buffer 61 62 63>
> b.indexOf('abc','ucs2')
Assertion failed: (0 <= index && index < length_), function operator[], file ../src/string_search.h, line 56.
Abort trap: 6
bash-3.2$ 

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

Updated again, assuming StringBytes::Size is the right one for this (?).

@jasnell
Copy link
Member

jasnell commented May 2, 2016

+1 ... I'm going to look at this a bit more in the morning to find more ways of breaking this. It definitely looks like v5.x is having some related issues also so we'll need to look at backporting. I haven't tested v4 yet but I suspect it's an issue there also.

bash-3.2$ node -v
v5.11.0
bash-3.2$ node
> b = Buffer.from('abc')
<Buffer 61 62 63>
> Buffer.from('ab','ucs2')
<Buffer 61 00 62 00>
> b.indexOf('ab','ucs2')
2

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

Do that, please. That specific last one you posted would be fixed with this, too, but it seems to imply that there’s something more going wrong, doesn’t it? In any case, you have write perms to my repo if you want to add something here.

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Yeah, that's what I'm afraid of. Really appreciate you looking at this also

@addaleax addaleax changed the title buffer: fix lastIndexOf crash for overlong needle buffer: fix lastIndexOf and indexOf in various edge cases May 2, 2016
@addaleax addaleax force-pushed the fix-buffer-lastindexof-needle-gt-haystack branch from 26596a1 to dc4b946 Compare May 2, 2016 06:03
@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

@jasnell Heh, that last one you found is funny. Node v5.11.0:

Buffer.from("abc").indexOf("ab","ucs2") === 2
Buffer.from("abcd").indexOf("ab","ucs2") === -1
Buffer.from("abcde").indexOf("ab","ucs2") === 4
Buffer.from("abcdef").indexOf("ab","ucs2") === -1
Buffer.from("abcdefg").indexOf("ab","ucs2") === 6

This seems to be because the haystack_length is divided by 2 before being passed to SearchString and result is multiplied with 2 after that, but that doesn’t cancel out when the string length isn’t even.

Updating again with a fix for that, too, and reordered the commits so that the ones that may be eligible for backporting come first in this PR.

@addaleax addaleax force-pushed the fix-buffer-lastindexof-needle-gt-haystack branch from dc4b946 to 414877c Compare May 2, 2016 06:08
@jasnell
Copy link
Member

jasnell commented May 2, 2016

nice. well, at least it's consistently wrong :-) when I look at this more tomorrow I'm definitely going to test things out on v4 also. I think you're right that this has likely been a problem for quite some time.

@addaleax addaleax force-pushed the fix-buffer-lastindexof-needle-gt-haystack branch from 414877c to a06edb1 Compare May 2, 2016 14:53
@jasnell
Copy link
Member

jasnell commented May 2, 2016

@addaleax ... ok, ran this through a few paces and couldn't find anywhere else it breaks down. The abort on that ascii/ucs2 case definitely happens in v4 also so we'll need to backport at least part of this to both v5 and v4.

Also can confirm that the following fails in v4:

> (new Buffer("abc")).indexOf("ab","ucs2")
2

@jasnell
Copy link
Member

jasnell commented May 2, 2016

@addaleax
Copy link
Member Author

addaleax commented May 2, 2016

The first two commits here cherry-pick cleanly to the v4.x and v5.x branches, so yep, sounds like a plan.

@jasnell
Copy link
Member

jasnell commented May 2, 2016

Awesome. keep in mind tho that Buffer.from() does not exist in v4 so the test cases will need to be modified a bit there. Let's see what CI has to say and I definitely want @trevnorris to give it a look over when he gets a chance.

// Extended latin-1 characters are 2 bytes in Utf8.
const size_t needle_length =
enc == BINARY ? needle->Length() : needle->Utf8Length();
// Round down to the neares multiple of 2 in case of UCS2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/neares/nearest

MylesBorins pushed a commit that referenced this pull request May 16, 2016
Fix `buffer.indexOf` for the case that the haystack has odd length
and the needle is not found in it. `StringSearch()` would return
the length of the buffer in multiples of `sizeof(uint16_t)`, but
checking that against `haystack_length` would not work if the latter
one was odd.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
MylesBorins pushed a commit that referenced this pull request May 16, 2016
PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas pushed a commit that referenced this pull request May 17, 2016
Fix `buffer.lastIndexOf()` for the case that the first character
of the needle is contained in the haystack, but in a location that
makes it impossible to be part of a full match.

For example, when searching for 'abc' in 'abcdef', only the 'cdef'
part needs to be searched for 'c', because earlier locations can
be excluded by index calculations alone.

Previously, such a search would result in an assertion failure.

This applies only to Node.js v6, as `lastIndexOf` was added in it.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas pushed a commit that referenced this pull request May 17, 2016
Return -1 in `Buffer.lastIndexOf` if the needle is longer than the
haystack. The previous check only tested the corresponding
condition for forward searches.

This applies only to Node.js v6, as `lastIndexOf` was added in it.

Fixes: #6510
PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas pushed a commit that referenced this pull request May 17, 2016
Use `StringBytes::Size` to determine the needle string length
instead of assuming latin-1 or UTF-8.

Previously, `Buffer.indexOf` could fail with an assertion failure
when the needle's byte length, but not its character count,
exceeded the haystack's byte length.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas pushed a commit that referenced this pull request May 17, 2016
Fix `buffer.indexOf` for the case that the haystack has odd length
and the needle is not found in it. `StringSearch()` would return
the length of the buffer in multiples of `sizeof(uint16_t)`, but
checking that against `haystack_length` would not work if the latter
one was odd.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas pushed a commit that referenced this pull request May 17, 2016
PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
evanlucas added a commit that referenced this pull request May 17, 2016
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna
  Henningsen) [#6511](#6511)
- **child_process**: use /system/bin/sh on android (Ben Noordhuis)
  [#6745](#6745)
- **deps**:
  - upgrade npm to 3.8.9 (Rebecca Turner)
    [#6664](#6664)
  - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh)
    [#6572](#6572)
  - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé)
    [#6796](#6796)
  - Intl: ICU 57 bump (Steven R. Loomis)
    [#6088](#6088)
- **repl**:
  - copying tabs shouldn't trigger completion (Eugene Obrezkov)
    [#5958](#5958)
  - exports `Recoverable` (Blake Embrey)
    [#3488](#3488)
- **src**: add O_NOATIME constant (Rich Trott)
  [#6492](#6492)
- **src,module**: add --preserve-symlinks command line flag (James M
  Snell) [#6537](#6537)
- **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen)
  [#6683](#6683)
evanlucas added a commit that referenced this pull request May 17, 2016
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna
  Henningsen) [#6511](#6511)
- **child_process**: use /system/bin/sh on android (Ben Noordhuis)
  [#6745](#6745)
- **deps**:
  - upgrade npm to 3.8.9 (Rebecca Turner)
    [#6664](#6664)
  - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh)
    [#6572](#6572)
  - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé)
    [#6796](#6796)
  - Intl: ICU 57 bump (Steven R. Loomis)
    [#6088](#6088)
- **repl**:
  - copying tabs shouldn't trigger completion (Eugene Obrezkov)
    [#5958](#5958)
  - exports `Recoverable` (Blake Embrey)
    [#3488](#3488)
- **src**: add O_NOATIME constant (Rich Trott)
  [#6492](#6492)
- **src,module**: add --preserve-symlinks command line flag (James M
  Snell) [#6537](#6537)
- **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen)
  [#6683](#6683)

As of this release the 6.X line now includes 64-bit binaries for Linux
on Power Systems running in big endian mode in addition to the existing
64-bit binaries for running in little endian mode.

PR-URL: #6810
evanlucas added a commit that referenced this pull request May 17, 2016
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna
  Henningsen) [#6511](#6511)
- **child_process**: use /system/bin/sh on android (Ben Noordhuis)
  [#6745](#6745)
- **deps**:
  - upgrade npm to 3.8.9 (Rebecca Turner)
    [#6664](#6664)
  - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh)
    [#6572](#6572)
  - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé)
    [#6796](#6796)
  - Intl: ICU 57 bump (Steven R. Loomis)
    [#6088](#6088)
- **repl**:
  - copying tabs shouldn't trigger completion (Eugene Obrezkov)
    [#5958](#5958)
  - exports `Recoverable` (Blake Embrey)
    [#3488](#3488)
- **src**: add O_NOATIME constant (Rich Trott)
  [#6492](#6492)
- **src,module**: add --preserve-symlinks command line flag (James M
  Snell) [#6537](#6537)
- **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen)
  [#6683](#6683)

As of this release the 6.X line now includes 64-bit binaries for Linux
on Power Systems running in big endian mode in addition to the existing
64-bit binaries for running in little endian mode.

PR-URL: #6810
MylesBorins pushed a commit that referenced this pull request May 18, 2016
Use `StringBytes::Size` to determine the needle string length
instead of assuming latin-1 or UTF-8.

Previously, `Buffer.indexOf` could fail with an assertion failure
when the needle's byte length, but not its character count,
exceeded the haystack's byte length.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
MylesBorins pushed a commit that referenced this pull request May 18, 2016
Fix `buffer.indexOf` for the case that the haystack has odd length
and the needle is not found in it. `StringSearch()` would return
the length of the buffer in multiples of `sizeof(uint16_t)`, but
checking that against `haystack_length` would not work if the latter
one was odd.

PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
MylesBorins pushed a commit that referenced this pull request May 18, 2016
PR-URL: #6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@MylesBorins MylesBorins mentioned this pull request May 18, 2016
MylesBorins pushed a commit that referenced this pull request May 18, 2016
Notable changes:

* **buffer**:
  * Buffer no longer errors if you call lastIndexOf with a search term
    longer than the buffer (Anna Henningsen)
    #6511

* deps:
  * update npm to 2.15.5 (Rebecca Turner)
    #6663

* http:
  * Invalid status codes can no longer be sent. Limited to 3 digit
    numbers between 100 - 999 (Brian White)
    #6291
@keltheceo
Copy link

👍 good work

MylesBorins pushed a commit that referenced this pull request May 20, 2016
Notable changes:

* **buffer**:
  * Buffer no longer errors if you call lastIndexOf with a search term
    longer than the buffer (Anna Henningsen)
    #6511

* contextify:
  * Context objects are now properly garbage collected, this solves a
    problem some individuals were experiencing with extreme memory
    growth (Ali Ijaz Sheikh)
    #6871

* deps:
  * update npm to 2.15.5 (Rebecca Turner)
    #6663

* http:
  * Invalid status codes can no longer be sent. Limited to 3 digit
    numbers between 100 - 999 (Brian White)
    #6291
MylesBorins pushed a commit that referenced this pull request May 23, 2016
Notable changes:

* **buffer**:
  * Buffer no longer errors if you call lastIndexOf with a search term
    longer than the buffer (Anna Henningsen)
    #6511

* contextify:
  * Context objects are now properly garbage collected, this solves a
    problem some individuals were experiencing with extreme memory
    growth (Ali Ijaz Sheikh)
    #6871

* deps:
  * update npm to 2.15.5 (Rebecca Turner)
    #6663

* http:
  * Invalid status codes can no longer be sent. Limited to 3 digit
    numbers between 100 - 999 (Brian White)
    #6291
MylesBorins pushed a commit that referenced this pull request May 24, 2016
Notable changes:

* **buffer**:
  * Buffer no longer errors if you call lastIndexOf with a search term
    longer than the buffer (Anna Henningsen)
    #6511

* contextify:
  * Context objects are now properly garbage collected, this solves a
    problem some individuals were experiencing with extreme memory
    growth (Ali Ijaz Sheikh)
    #6871

* deps:
  * update npm to 2.15.5 (Rebecca Turner)
    #6663

* http:
  * Invalid status codes can no longer be sent. Limited to 3 digit
    numbers between 100 - 999 (Brian White)
    #6291
MylesBorins pushed a commit that referenced this pull request May 24, 2016
Notable changes:

* **buffer**:
  * Buffer no longer errors if you call lastIndexOf with a search term
    longer than the buffer (Anna Henningsen)
    #6511

* contextify:
  * Context objects are now properly garbage collected, this solves a
    problem some individuals were experiencing with extreme memory
    growth (Ali Ijaz Sheikh)
    #6871

* deps:
  * update npm to 2.15.5 (Rebecca Turner)
    #6663

* http:
  * Invalid status codes can no longer be sent. Limited to 3 digit
    numbers between 100 - 999 (Brian White)
    #6291
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants