Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

\r\n is being normalized to \n in the decrypted file content #752

Closed
dwjft opened this issue Aug 10, 2018 · 10 comments · Fixed by #772
Closed

\r\n is being normalized to \n in the decrypted file content #752

dwjft opened this issue Aug 10, 2018 · 10 comments · Fixed by #772

Comments

@dwjft
Copy link

dwjft commented Aug 10, 2018

If I'm decrypting a file, I want the original file contents, not a normalized version of the content. I can't imagine why that would be an agreed upon pattern for the library.

Anyway, the use-case I'm having trouble with is a file is based on a fixed-width file format.

Say for example someone sends me an encrypted file that is sending me strings of n bytes in length, and new lines are not a guarantee of a line ending.

For example, a file may look like this:

THIS IS 16 BYTES                             \r\n
THIS IS THE NEXT \r\n
16 BYTES .  \r\n
THIS IS THE THIRD SET OF BYTES \r\n

Obviously the above is not an accurate representation of a fixed length set of strings, but it illustrates the point perfectly. By normalizing the \r\n character to \n you are effectively shortening a line by 1 character and invalidating the entire schema of the fixed-width file.

I found the following two issues for reference:
#636
#211

@tomholub
Copy link
Contributor

tomholub commented Aug 10, 2018

Does this still happen when you do:

let m = openpgp.message.read(...);
let decrypted = await m.decrypt(...);
let data = decrypted.getLiteralData();

This will use binary representation, and should not fiddle with your data.

@dwjft
Copy link
Author

dwjft commented Aug 10, 2018

I am using .readArmored which has no getLiteralData() function on the return result.

The file itself isn't armored but I get a different error when trying to just use .read which reads .input.subarray is not a function. I pre-armorize non-armored content because I am never sure whether it is armored or not.

@dwjft
Copy link
Author

dwjft commented Aug 10, 2018

Okay so reading through the source code I found this getLiteralData and what it does.

Apparently if you pass format: 'binary' to the .decrypt function, it will return the literal data.

Can we get this documented somewhere?

Thanks.

@tomholub
Copy link
Contributor

tomholub commented Aug 10, 2018

I see a little bit in the readme, this one works for wrapped openpgp.decrypt function:

options = {
    message: openpgp.message.read(encrypted), // parse encrypted bytes
    passwords: ['secret stuff'],              // decrypt with password
    format: 'binary'                          // output as Uint8Array
};

openpgp.decrypt(options).then(function(plaintext) {
    return plaintext.data // Uint8Array([0x01, 0x01, 0x01])
});

In general, my expierence with OpenPGP.js is that the openpgp.decrypt(...) function adds a tad-bit too much magic, so I'd personally recommend using message.decrypt(...) where you get to tightly control the whats and hows.

I think improvements to docs are always welcome.

@tomholub
Copy link
Contributor

tomholub commented Aug 10, 2018

To be fair, it doesn't look immediately obvious from the readme that binary format is somehow treated differently than the default. It makes sense if you're used to it, but maybe we could make it more obvious.

@tomholub
Copy link
Contributor

Just to add, read and readArmored should result in the same Message object, they only differ in how the data is read initially - unless I'm mistaken.

@bartbutler
Copy link
Member

So, the rationale here is that if you are treating text data, the implication is that you want the input and output to be Javascript strings. Javascript strings have \n as newline and are in UCS-2. OpenPGP canonical text format is UTF-8, network normal line endings (\r\n). So this conversion to and from is performed automatically.

Binary mode is provided for exactly the purpose you want, to ensure no formatting or processing to the data whatsoever. But a general purpose Javascript OpenPGP library absolutely needs to be able to interface with standard Javascript strings to be useful, and that is what 'text' mode, or default mode, does. I hope that helps explain why the library does what it does.

@tomholub
Copy link
Contributor

That does make sense. I guess OP was displeased that it wasn't obvious from documentation. So the question is whether the docs need to be improved or not. If not, the issue can be closed, else we could talk about how to improve it.

One way to force API users to think about this would be to make it a requirement to supply format to decrypt(), but that would be for v4, if ever, if even desired.

@bartbutler
Copy link
Member

More clear documentation is always better. As I think the main use for the library is text, I probably wouldn't require the format parameter to be specified, but I don't really care.

@tomholub
Copy link
Contributor

If I get around to it, I'll submit a PR that improves the JSDoc anywhere format is used. Current form:

format | String | (optional) return data format either as 'utf8' or 'binary'

Updated doc should mention how to get the data results back from the return value, as this differs based on format used, and that utf8 format may alter the literal data, treating it as a string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants