Fix to support parsing of "quoted-printable" encoded property values #31

Merged: 4 commits, Aug 23, 2019

@mlandes
Contributor

mlandes commented Aug 21, 2019

Example of a valid vCard property:

NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar foobar foobar foobar fo=
obar foobar foobar foobar foobar=0Afoobar foobar foobar foobar foobar fooba=
r=0Afoobar foobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar=
 foobar foobar foobar foobar foobar

With "visible" EOL characters ...:

NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar foobar foobar foobar fo=\n
obar foobar foobar foobar foobar=0Afoobar foobar foobar foobar foobar fooba=\n
r=0Afoobar foobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar=\n
 foobar foobar foobar foobar foobar\r\n

... or as string:

"NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar foobar foobar foobar fo=\nobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar foobar fooba=\nr=0Afoobar foobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar=\n foobar foobar foobar foobar foobar\r\n"

@jhermsmeier
Owner

The lines of the note in your example seem to be incorrectly folded: the subsequent lines are missing a leading whitespace character. The following, with folded lines, parses correctly:

NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar foobar foobar foobar fo=\r\n
 obar foobar foobar foobar foobar=0Afoobar foobar foobar foobar foobar fooba=\r\n
 r=0Afoobar foobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar=\r\n
 foobar foobar foobar foobar foobar\r\n
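
For reference, unfolding as described in RFC 6350 section 3.2 can be sketched as a single replacement. This is a minimal illustration on a plain (not quoted-printable) value; `unfold` is a hypothetical helper name, not the vcf module's API.

```javascript
// Hypothetical helper: join physical lines into logical lines by removing
// every CRLF that is immediately followed by one space or horizontal tab.
function unfold(text) {
  return text.replace(/\r\n[ \t]/g, "");
}

const folded = "NOTE:foobar foo\r\n bar\r\n";

console.log(JSON.stringify(unfold(folded)));
// → "NOTE:foobar foobar\r\n"
```

A bare `\n` without a preceding `\r` is not touched by this replacement, so it remains part of the value.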

@jhermsmeier
Owner

@mlandes how did you run into this issue? Is it a specific program / platform / whatever that outputs QP-encoded notes like that?

@mlandes
Contributor Author

mlandes commented Aug 22, 2019

I use a web service by ABBYY for OCR of business cards, which outputs vCards with a Quoted-Printable-encoded note property in this form:

NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar foobar foobar foobar fo=\nobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar foobar fooba=\nr=0Afoobar foobar foobar foobar foobar foobar=0Afoobar foobar foobar foobar=\n foobar foobar foobar foobar foobar\r\n

According to the RFC, this is correct syntax, because only the sequence \r\n (CRLF) is a valid property delimiter, and folding is defined as:

a CRLF immediately followed by a single white space character (space (U+0020) or horizontal tab (U+0009)). The folded line MUST contain at least one character.

See: https://tools.ietf.org/html/rfc6350#section-3.2

In Quoted-Printable encoding, a separate sequence, =\n, is used as encoding-specific folding.
This is compatible with the RFC's folding of vCards:
as far as the RFC is concerned, =\n is just a normal sequence of characters that neither folds nor delimits the vCard property.
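
The point above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the vcf module's implementation: `splitProperties` and `decodeQuotedPrintable` are hypothetical helpers, unfolding is skipped for brevity, the input is assumed to be ASCII with UTF-8 content, and `=` followed by LF or CRLF is treated as a soft line break.

```javascript
// Hypothetical helpers for illustration; not the vcf module's implementation.

// Only CRLF delimits properties, so a bare LF stays inside the value.
function splitProperties(text) {
  return text.split("\r\n").filter((line) => line.length > 0);
}

// Assumes ASCII input and UTF-8 content. Soft line breaks ("=" followed by
// LF or CRLF) are dropped, then each "=XX" escape becomes one byte.
function decodeQuotedPrintable(value) {
  const binary = value
    .replace(/=\r?\n/g, "")
    .replace(/=([0-9A-Fa-f]{2})/g, (_, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    );
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const raw =
  "NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar fo=\nobar=0Afoobar\r\n" +
  "TEL:123\r\n";

const [note] = splitProperties(raw);
const value = note.slice(note.indexOf(":") + 1);
console.log(JSON.stringify(decodeQuotedPrintable(value)));
// → "foobar foobar\nfoobar"
```

Because the split only looks for CRLF, the `=\n` inside the note never ends the property; it is removed later, at the decoding step.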

@jhermsmeier
Owner

I see – the formatting of the examples in your initial comment tripped me up there. Would you mind adding a test to go along with this, so we don't break it again in the future?

@mlandes
Contributor Author

mlandes commented Aug 23, 2019

done

@jhermsmeier
Owner

Neat, thanks!

@jhermsmeier merged commit 5edb471 into jhermsmeier:master on Aug 23, 2019
@jhermsmeier
Owner

Published in vcf@2.0.5

@@ -31,6 +32,12 @@ suite( 'vCard', function() {
assert.deepEqual( card.get( 'tel' ).type, [ 'voice', 'home' ] )
})

test( 'should parse vCard property values containing isolated \\n without delimiting, e.g. used in quoted-printable encoding (issue #31)', function() {
Contributor Author

Note: I meant pull request #31, not issue #31 ...

Owner

Yeah, no worries – issue and PR numbers don't overlap on GitHub, so it'll be found :)

@mlandes
Contributor Author

mlandes commented Aug 27, 2019

@jhermsmeier

Hi again,
after testing some real-world vCards, which I extracted from QR codes on business cards,
it seems that the world does not always stick to the RFC as intended: vCards are often generated incorrectly, using \n instead of the correct \r\n as the delimiter.

Therefore I had to use a small workaround to support these "common-law" vcards:

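import vCard from "vcf";

// Workaround for vCards that use bare LF instead of CRLF as the delimiter.
function createFromVcfString(vcfString) {
    // Only normalize input that contains no CRLF at all; input that already
    // uses CRLF (possibly with "=\n" inside a value) is left untouched.
    if (!vcfString.includes("\r\n")) {
        vcfString = vcfString.replace(/\n/g, "\r\n");
    }
    return new vCard().parse(vcfString);
}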

Maybe this should be added to your module, so that this pull request does not break these common but incorrectly generated vCards.

@jhermsmeier
Owner

jhermsmeier commented Aug 27, 2019

Well, that's why the line-splitting regex was /\r?\n/g before – but obviously I forgot about that, and didn't have a test for either.

Thinking about it though, it'll be best to stick to the specification. Supporting LF-only vCards would also break support for QP-encoded newlines. And users can still work around this by replacing LF with CRLF in their input, should they need to parse malformed vcf data.
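
To illustrate the trade-off, the sketch below compares the two splitting strategies on a shortened version of the note from this thread. It uses plain string splitting rather than the module's actual parser, and `splitLenient` and `splitStrict` are hypothetical names that only mimic the two regexes mentioned above.

```javascript
// Hypothetical helpers; they only mimic the two line-splitting regexes
// discussed above and are not the vcf module's parser.
const splitLenient = (text) => text.split(/\r?\n/).filter(Boolean);
const splitStrict = (text) => text.split(/\r\n/).filter(Boolean);

const card =
  "NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=utf-8:foobar fo=\n" +
  "obar foobar\r\n" +
  "TEL:123\r\n";

// Lenient splitting cuts the note in two at the "=\n" inside its value.
console.log(splitLenient(card).length); // → 3

// Strict CRLF splitting keeps the note as a single property.
console.log(splitStrict(card).length); // → 2
```

With the lenient regex, the second half of the note (`obar foobar`) would be handed to the parser as if it were a property of its own.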
