Option to prevent double encoding #78

Closed
diegomansua opened this issue Oct 31, 2022 · 3 comments

Comments

@diegomansua

diegomansua commented Oct 31, 2022

Hello,

First of all thanks to everyone that has made this lib possible.

This is not a bug report but rather a feature suggestion.

I'm using this lib to import data from a third party into an old database that only supports ISO-8859-1.

I was calling it as encode(<text>, {mode: 'nonAscii'}).

However, I ran into a problem: it turns out that the third party already uses entities for some characters, so I ended up with &amp;#39; wherever the input contained a &#39; entity, for example.

So I thought it would be nice to have a preventDoubleEncoding option (ideally with a better name) that skips encoding the ampersand whenever it is already part of an entity. E.g.:

encode('you & me', {mode: 'nonAscii', preventDoubleEncoding: true}); -> returns you &amp; me
encode('you & me', {mode: 'nonAscii', preventDoubleEncoding: false}); -> returns you &amp; me
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: true}); -> returns you &amp; me
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: false}); -> returns you &amp;amp; me
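To make the proposed behavior concrete, here is a minimal sketch of what such an option could do. Assumptions: encodeNoDouble is a hypothetical helper, not part of html-entities; it emits numeric references for non-ASCII characters instead of the library's named entities; and it treats any syntactically valid reference (&name;, &#NN;, &#xHH;) as already encoded without checking the name against the HTML entity list.

```javascript
// Hypothetical helper, NOT part of html-entities: encodes the HTML special
// characters and every non-ASCII character, but copies an "&" verbatim when
// it already starts a well-formed entity reference.

// Sticky (/y) regex: matches only at the position stored in lastIndex.
const ENTITY = /&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/y;

const SPECIAL = new Map([
  ['&', '&amp;'], ['<', '&lt;'], ['>', '&gt;'], ['"', '&quot;'], ["'", '&#39;'],
]);

function encodeNoDouble(text) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);       // full code point, handles astral chars
    const ch = String.fromCodePoint(cp);
    ENTITY.lastIndex = i;
    if (ch === '&' && ENTITY.test(text)) {
      out += '&';                         // already an entity: do not re-encode
    } else if (SPECIAL.has(ch)) {
      out += SPECIAL.get(ch);
    } else if (cp > 0x7f) {
      out += `&#${cp};`;                  // non-ASCII -> numeric reference
    } else {
      out += ch;
    }
    i += ch.length;                       // 1 or 2 UTF-16 units
  }
  return out;
}
```

With this sketch, encodeNoDouble('you & me') and encodeNoDouble('you &amp; me') both return 'you &amp; me', matching the examples above, while a bare ampersand such as the one in 'AT&T' is still encoded because it is not followed by a complete reference.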

@mdevils
Owner

mdevils commented Jun 5, 2023

Hello @diegomansua,

Sorry for the long delay in responding. Could PR #86 solve your problem?

@diegomansua
Author

@mdevils unless I'm doing something wrong, it doesn't seem to solve my problem. I checked out the PR branch, built it, and tried the following:

console.log(encode('you &amp; me', {mode: 'nonAsciiPrintableOnly'})); // prints 'you &amp; me' ✅
console.log(encode('you & me', {mode: 'nonAsciiPrintableOnly'})); // prints 'you & me' ❌ (expected 'you &amp; me')

I also tried with level: 'xml'.

Basically what I'd need is an option so that if an entity is already encoded (e.g. &amp;), it shouldn't encode it again (i.e. it should leave it as &amp; instead of doing &amp;amp;).

@mdevils
Owner

mdevils commented Jun 24, 2023

Hello @diegomansua.

I'm afraid you have a very specific use case.

I'd suggest using a combination of encode and decode, like so:

console.log(encode(decode('you &amp; me and you & me'), {mode: 'nonAscii'}));

Hope this helps.
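The decode-then-encode workaround works because decoding first turns every existing entity back into its character, so the encoder only sees raw text and encodes each character exactly once. The sketch below illustrates the idea without depending on the package. Assumption: decodeBasic and encodeNonAscii are hypothetical stand-ins that know only a handful of named entities, whereas the library's own decode covers the full HTML entity table.

```javascript
// Hypothetical stand-ins for html-entities' decode/encode, reduced to a few
// named entities so the sketch runs without the package.
const NAMED = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
]);

function decodeBasic(text) {
  return text.replace(/&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/g, (match, body) => {
    if (body[0] === '#') {
      const hex = body[1] === 'x' || body[1] === 'X';
      const cp = hex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
    // Unknown names are left as-is in this reduced sketch.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

const SPECIAL = new Map([
  ['&', '&amp;'], ['<', '&lt;'], ['>', '&gt;'], ['"', '&quot;'], ["'", '&#39;'],
]);

function encodeNonAscii(text) {
  let out = '';
  for (const ch of text) {                // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (SPECIAL.has(ch)) out += SPECIAL.get(ch);
    else if (cp > 0x7f) out += `&#${cp};`;
    else out += ch;
  }
  return out;
}

// Decode first, then encode: existing entities are never encoded twice.
const normalize = (text) => encodeNonAscii(decodeBasic(text));
```

Here normalize('you &amp; me and you & me') yields 'you &amp; me and you &amp; me'. A side effect of the round trip is that entities come out in one uniform form (for example, &#x27; becomes &#39;), and applying normalize to its own output returns the same string.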

mdevils closed this as completed Jun 24, 2023