Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace deps encoding with build in textdecoder #39

Closed
wants to merge 5 commits into from
Closed

replace deps encoding with build in textdecoder #39

wants to merge 5 commits into from

Conversation

Uzlopak
Copy link
Contributor

@Uzlopak Uzlopak commented Nov 29, 2021

We should discuss about this first. This replaces the jsencoding code with the built in textdecoder.

We have unfortunately no busboy benchmarks. Maybe we should add first some benchmarks before checking the validity of this PR.

I cache the instantiated TextDecoder in a Map because I expect that instantiating a TextDecoder is expensive and reusing it is cheap and it has a low memory footprint. Maybe my expectation is wrong?

Nice sideeffect, the test coverage increases massively. LOL

File               | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s                                             
-------------------|---------|----------|---------|---------|---------------------------------------------------------------
All files          |   93.15 |    85.52 |   95.38 |   93.03 |                                                               

Checklist

@kibertoad
Copy link
Member

I second the idea of having busboy benchmarks, we could do an informed decision here then.

lib/utils.js Outdated
/**
* https://datatracker.ietf.org/doc/html/rfc7578#section-4.7
*/
switch (destEncoding) {
Copy link
Member

@kibertoad kibertoad Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no default? what will happen if we don't resolve anything there?

edit: right, we will use the passed one

lib/utils.js Outdated
destEncoding = 'binary';
break;
}
if (text) {
Copy link
Member

@kibertoad kibertoad Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need destEncoding outside of this block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we get from outside if it is 7bit or 8bit. But I think I made a mistake because this is actually the textEncoding and not the destEncoding

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering that all tests still passed, it's a very alarming sign. we do need to improve our code coverage for this part of functionality ahead of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a huge task to do that, with little to no benefits. Or are there any nodejs unit tests for the builtin textdecoder, which we could reuse?

lib/utils.js Outdated Show resolved Hide resolved
lib/utils.js Outdated Show resolved Hide resolved
@kibertoad
Copy link
Member

I am very scared of regressing edge cases here, considering both the size of original implementation and low code coverage of it. Wonder if we could transitively increase code coverage of encoder via writing additional tests for busboy itself first.

@Uzlopak Uzlopak marked this pull request as draft November 29, 2021 23:49
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 30, 2021

I think that jsencoding dependency is actually an archaic TextDecoder Implementation.

But what i realized is, that if I output the error messages of the created try catch blocks, it actually throws errors in node 10 and 12 reporting that iso-8891-1, despite the unit test passes. Maybe because the files are utf-8 encoded?

    ✔ Multiple extended parameters (RFC 5987) with mixed charsets
RangeError [ERR_ENCODING_NOT_SUPPORTED]: The "iso-8859-1" encoding is not supported

So I think, the issue is, that the built in TextDecoder is depending on the built-in ICU.
https://nodejs.org/docs/latest-v14.x/api/util.html#util_encodings_supported_when_icu_is_disabled

So what we can do is, to try to get a built-in TextDecoder and if we throw we use the deps/encoding like before as fallback. Maybe if we split the encoders, and require them if necessary reduces the start up time of busboy. But only if it means, that we have some performance gains.

So we should postpone this PR till we have some busboy benchmarks.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 30, 2021

@kibertoad We could remove actually all encoders, as we dont need the encoders but only the decoders.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 30, 2021

I removed the encoders and extracted the big json array into encodings.json.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 30, 2021

I tested it with the benchmarks from #40 and it is surprisingly slower. So I would say, We should not use the built-in TextDecoder...

@Uzlopak Uzlopak closed this Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants