Parse_MIME does not properly deal with Unicode #353

jengelh · 2018-08-22T17:00:27Z

#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <libical/vcc.h>

static void handle_n(VObject *v)
{
        VObjectIterator tt;
        printf("N:\n");
        for (initPropIterator(&tt, v); moreIteration(&tt); ) {
                VObject *vv = nextVObject(&tt);
                int type = vObjectValueType(vv);
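                if (type == VCVT_STRINGZ) {
                        /* narrow strings are not printed in this test */
                } else if (type == VCVT_USTRINGZ) {
                        const wchar_t *s = vObjectUStringZValue(vv);
                        while (*s != L'\0')
                                printf(" U+%04x", (unsigned int)*s++);
                        printf("\n");
                } else {
                        /* other value types are ignored */
                }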
        }
}

int main(void)
{
        char utf8string[] =
                "BEGIN:VCARD\n"
                "VERSION:3.0\n"
                "N:\xd0\x91\xd0\x9d;\xd0\x95\n" /* some russian */
                "N;ENCODING=QUOTED-PRINTABLE:=C3=A4=C3=B6;=C3=BC=C3=9F\n" /* some umlauts */
                "N;ENCODING=QUOTED-PRINTABLE:=c3=a4=c3=b6;=c3=bC=c3=9f\n" /* this causes a hang in libical */
                "TEL;TYPE=WORK:+7 1\n"
                "UID:040000008200E00074C5B7101A82E0080000000080C48A8CC733D401000000000000000001000000DE02EE07D0274B29A0031412BD51B565\n"
                "REV:2018-08-14T12:08:45Z\n"
                "END:VCARD\n";
        VObject *v = Parse_MIME(utf8string, strlen(utf8string));
        if (v == NULL) {
                printf("error in vcf\n");
                return 1;
        }
        VObjectIterator t;
        for (initPropIterator(&t, v); moreIteration(&t); ) {
                v = nextVObject(&t);
                const char *name = vObjectName(v);
                if (strcmp(name, VCNameProp) == 0)
                        handle_n(v);
        }
        return 0;
}

Observed behavior

The program hangs. If I remove the offending line (the quoted-printable N line with lowercase hex digits), it outputs:

N:
 U+00d0 U+0091 U+00d0 U+009d
 U+00d0 U+0095
N:
 U+00c3 U+00a4 U+00c3 U+00b6 U+003b U+00c3 U+00bc U+00c3 U+009f

Expected behavior

N:
 U+0411 U+041D
 U+0415
N:
 U+00E4 U+00F6
 U+00FC U+00DF

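RFC 6350 §3.1 suggests UTF-8 is supposed to be the default, but it looks more like Parse_MIME is partially 8-bit preserving, to the point that it erroneously upconverts every input byte to its own wchar_t instead of decoding UTF-8 sequences.
libical also seems to ignore the ; field separator while in QP mode, so both fields end up in a single value (note the U+003b in the observed output).
What am I missing?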
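
For reference, the expected code points follow from ordinary UTF-8 decoding of the input bytes. The following is a minimal sketch of such a decoder; `utf8_decode` is a hypothetical helper written for illustration and is not part of libical or of any fix proposed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s (at most len bytes available).
 * On success, stores the code point in *cp and returns the number of
 * bytes consumed. Returns 0 if the sequence is truncated or malformed.
 * Hypothetical helper for illustration only; not a libical function.
 */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes (index = n) */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                    /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        n = 4; c = s[0] & 0x07;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }

    if (len < n)
        return 0;                         /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* continuation bytes are 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong encodings, surrogates, and values above U+10FFFF */
    if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}
```

Fed the bytes `D0 91` from the first N line, this yields U+0411 rather than the two separate values U+00d0 and U+0091 seen in the observed output.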

winterz · 2018-08-23T15:04:58Z

Q: "What am I missing?"
A: Probably nothing. The vcal parsing code is ancient and barely maintained.
I'm curious which branch you are using.

Of course we don't want a hang. Not good.

jengelh · 2018-08-23T21:34:59Z

Versions 3.0.3 and 3.0.4.

winterz · 2018-08-26T16:53:44Z

this patch fixes the hang:

diff --git a/src/libicalvcal/vcc.c b/src/libicalvcal/vcc.c
index d47bc099..178f34be 100644
--- a/src/libicalvcal/vcc.c
+++ b/src/libicalvcal/vcc.c
@@ -1144,6 +1144,8 @@ static char* lexGetQuotedPrintable()
                     c = c * 16 + next[i] - '0';
                 else if (next[i] >= 'A' && next[i] <= 'F')
                     c = c * 16 + next[i] - 'A' + 10;
+                else if (next[i] >= 'a' && next[i] <= 'f')
+                    c = c * 16 + next[i] - 'a' + 10;
                 else
                     break;
                 }
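
To make the effect of accepting lowercase hex digits concrete, here is a standalone sketch of quoted-printable decoding. It is not the libical lexer: `hex_value` and `qp_decode` are hypothetical names used only for this illustration, and the sketch handles just `=XX` escapes and soft line breaks.

```c
#include <stddef.h>

/* Return the value (0..15) of one hex digit in either case, or -1. */
static int hex_value(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (ch >= 'A' && ch <= 'F')
        return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    return -1;
}

/*
 * Decode NUL-terminated quoted-printable text from src into dst.
 * Returns the number of bytes written, or -1 on a malformed "=XX"
 * escape or when dst_size is too small. The input position advances
 * on every iteration, so the loop always terminates.
 * Hypothetical helper for illustration only; not a libical function.
 */
static long qp_decode(const char *src, unsigned char *dst, size_t dst_size)
{
    size_t out = 0;

    while (*src != '\0') {
        unsigned char byte;

        if (*src != '=') {
            byte = (unsigned char)*src++;      /* literal character */
        } else if (src[1] == '\n') {
            src += 2;                          /* soft line break "=\n" */
            continue;
        } else if (src[1] == '\r' && src[2] == '\n') {
            src += 3;                          /* soft line break "=\r\n" */
            continue;
        } else {
            int hi = hex_value((unsigned char)src[1]);
            /* only look at src[2] once src[1] is known to be a hex digit */
            int lo = hi < 0 ? -1 : hex_value((unsigned char)src[2]);
            if (lo < 0)
                return -1;                     /* malformed escape */
            byte = (unsigned char)(hi * 16 + lo);
            src += 3;
        }
        if (out >= dst_size)
            return -1;                         /* output buffer too small */
        dst[out++] = byte;
    }
    return (long)out;
}
```

With this sketch, `=c3=a4` and `=C3=A4` decode to the same two bytes, and an invalid escape is reported as an error instead of being retried.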

jengelh · 2018-09-17T20:25:06Z

"N;ENCODING=BASE64:w6TDtg==;w7zDnw==\n"

also produces an infinite loop :-(
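
For context, the two Base64 fields in that line carry the same umlaut bytes as the quoted-printable example. A minimal decoding sketch follows; `b64_value` and `b64_decode` are hypothetical helpers for illustration, not libical functions, and the sketch assumes an ASCII-compatible character set.

```c
#include <stddef.h>

/* Map one Base64 alphabet character to its 6-bit value, or -1.
 * Assumes an ASCII-compatible execution character set. */
static int b64_value(int ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
}

/*
 * Decode up to n characters of src as Base64 into dst, stopping at the
 * first '=' padding character. Returns the number of bytes written, or
 * -1 on an invalid character or when dst_size is too small.
 * Hypothetical helper for illustration only; not a libical function.
 */
static long b64_decode(const char *src, size_t n,
                       unsigned char *dst, size_t dst_size)
{
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of pending bits held in acc */
    size_t i, out = 0;

    for (i = 0; i < n && src[i] != '='; i++) {
        int v = b64_value((unsigned char)src[i]);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (out >= dst_size)
                return -1;
            dst[out++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1u << bits) - 1u;   /* keep only the leftover bits */
        }
    }
    return (long)out;
}
```

Decoding the field `w6TDtg==` on its own gives the bytes `C3 A4 C3 B6`, and `w7zDnw==` gives `C3 BC C3 9F`, which are the UTF-8 encodings of the same four characters listed under "Expected behavior".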

gitsnuit · 2018-10-17T14:09:00Z

@winterz Any update?

winterz · 2018-10-17T14:39:39Z

Apologies for being so slow about this. I've been trying to recover after Hurricane Florence last month and have had no time. I do recall being a bit reluctant to add a hard dependency for Windows.

jengelh mentioned this issue Sep 17, 2018

Fix UTF-8 and QP VCF input decoding #354

Closed

winterz added this to the 3.1 milestone May 10, 2019
