Fix `maxBuffer` bug with `TextDecoder()` #105

ehmicky · 2023-08-16T21:40:02Z

If the stream contains a UTF-8 string that ends with a partial multibyte sequence, a final character \ufffd (�) is appended. This is the standard behavior of TextDecoder and other tools.
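As a minimal illustration of that behavior (a standalone sketch that uses only the standard TextEncoder and TextDecoder globals, not code from this PR), the replacement character only shows up on the final flush:

```javascript
// Encode a string whose last character takes several UTF-8 bytes.
const bytes = new TextEncoder().encode('abc€'); // '€' is 3 bytes: E2 82 AC
const truncated = bytes.slice(0, -1); // Drop the last byte of '€'

const textDecoder = new TextDecoder();
// With `{stream: true}`, the incomplete sequence is buffered, not emitted.
const decoded = textDecoder.decode(truncated, {stream: true});
// A final call without arguments flushes the buffered bytes as U+FFFD.
const finalChunk = textDecoder.decode();

console.log(JSON.stringify(decoded)); // "abc"
console.log(finalChunk === '\uFFFD'); // true
console.log((decoded + finalChunk).length); // 4
```

The decoded string grows by one character after the last chunk has been read, which is why a length check that only runs per chunk can miss it.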

At the moment, this final character is not checked by the maxBuffer logic, i.e. the returned string could be one character over maxBuffer once � is appended.

This PR fixes the bug, which required some refactoring.

ehmicky · 2023-08-16T21:44:00Z

source/contents.js

 		}

-		return finalize(contents, length, textDecoder);
+		appendFinalChunk({state, convertChunk, getSize, addChunk, getFinalChunk, maxBuffer});


The final chunk (�) must go through the same logic as the previous chunks, including the maxBuffer check.
This required two changes:

- Adding a getFinalChunk() method that returns either undefined (no final chunk) or the final chunk, with the right type.

- Extracting the appendChunk() logic so that it can be reused after getFinalChunk(). The body of the main for loop was moved into its own function, appendChunk(). For that to work, the stateful arguments had to be grouped into a state object, which in turn required refactoring some additional functions.
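The two changes above can be sketched roughly as follows. This is a simplified, synchronous approximation and not the actual source: it iterates over an array of chunks instead of a stream, and `getContents`, `stringMethods`, the shape of `state` and the local `MaxBufferError` class are assumptions made for illustration. Only the names `appendChunk`, `appendFinalChunk`, `getFinalChunk`, `convertChunk`, `getSize`, `addChunk`, `state` and `maxBuffer` come from the diff.

```javascript
// Stand-in error class for this sketch.
class MaxBufferError extends Error {
	name = 'MaxBufferError';
}

// Shared by regular chunks and the final chunk, so both hit the `maxBuffer` check.
const appendChunk = ({convertedChunk, state, getSize, addChunk, maxBuffer}) => {
	const newLength = state.length + getSize(convertedChunk);
	if (newLength > maxBuffer) {
		throw new MaxBufferError('maxBuffer exceeded');
	}

	state.contents = addChunk(convertedChunk, state);
	state.length = newLength;
};

// Appends the final chunk, if any, through the same code path.
const appendFinalChunk = ({state, getSize, addChunk, getFinalChunk, maxBuffer}) => {
	const convertedChunk = getFinalChunk(state);
	if (convertedChunk !== undefined) {
		appendChunk({convertedChunk, state, getSize, addChunk, maxBuffer});
	}
};

// Hypothetical string methods: stateful values live on the `state` object.
const stringMethods = {
	init: () => ({contents: '', length: 0, textDecoder: new TextDecoder()}),
	convertChunk: (chunk, {textDecoder}) => textDecoder.decode(chunk, {stream: true}),
	getSize: convertedChunk => convertedChunk.length,
	addChunk: (convertedChunk, {contents}) => contents + convertedChunk,
	getFinalChunk({textDecoder}) {
		// Flushing the decoder yields '\uFFFD' if a multibyte sequence was cut off.
		const finalChunk = textDecoder.decode();
		return finalChunk === '' ? undefined : finalChunk;
	},
};

// Hypothetical driver: the real function reads from a stream asynchronously.
const getContents = (chunks, {init, convertChunk, getSize, addChunk, getFinalChunk}, maxBuffer) => {
	const state = init();
	for (const chunk of chunks) {
		const convertedChunk = convertChunk(chunk, state);
		appendChunk({convertedChunk, state, getSize, addChunk, maxBuffer});
	}

	appendFinalChunk({state, getSize, addChunk, getFinalChunk, maxBuffer});
	return state.contents;
};
```

In this sketch, a truncated `'abc€'` decodes to `'abc\uFFFD'` when `maxBuffer` is 4 and throws when `maxBuffer` is 3, because the flushed � is now counted like any other chunk.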

ehmicky · 2023-08-16T21:44:26Z

test/string.js

@@ -112,6 +112,11 @@ test('get stream with truncated UTF-8 sequences', async t => {
 	t.is(result, `${multiByteString.slice(0, -1)}${INVALID_UTF8_MARKER}`);
 });

+test('handles truncated UTF-8 sequences over maxBuffer', async t => {


This test succeeds with the current PR, and fails without it.

ehmicky · 2023-08-16T21:45:05Z

source/array-buffer.js


 export async function getStreamAsArrayBuffer(stream, options) {
 	return getStreamContents(stream, arrayBufferMethods, options);
 }

-const initArrayBuffer = () => new Uint8Array(0);
+const initArrayBuffer = () => ({contents: new Uint8Array(0)});


The changes in this file and the following ones are related to refactoring stateful variables into a state object. See the comment on source/contents.js.

ehmicky · 2023-08-16T21:45:43Z

source/array-buffer.js

@@ -77,5 +77,6 @@ const arrayBufferMethods = {
 	},
 	getSize: getLengthProp,
 	addChunk: addArrayBufferChunk,
+	getFinalChunk: noop,


getFinalChunk() is only used by getString(). It is a noop otherwise.

sindresorhus · 2023-08-16T23:03:46Z

Feel free to do another release when you are ready. You have publish access on npm.

ehmicky · 2023-08-17T02:04:46Z

Sounds good! Will do after #106.

ehmicky · 2023-08-17T15:39:56Z

Done in 8.0.1.

Fix maxBuffer bug with TextDecoder()

547afe2

sindresorhus merged commit 7e191bb into sindresorhus:main Aug 16, 2023
3 checks passed

ehmicky deleted the max-buffer-string branch August 16, 2023 23:16
