Content line parameter parsing doesn't support quoted parameter values #185

Closed
praichor opened this issue Jul 13, 2019 · 6 comments · Fixed by #194

praichor commented Jul 13, 2019

The function ics.parse.ContentLine.parse doesn't support quoted parameter values as specified in RFC 5545 (https://tools.ietf.org/html/rfc5545#page-8).
This causes trouble with, for example, some calendars exported from Outlook.
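
To illustrate the problem, here is a minimal sketch of what goes wrong when a content line is split on delimiters without regard to quoting. The helper `naive_parse` is hypothetical and written only for this illustration; it is not the library's actual implementation. RFC 5545 allows `:`, `;` and `,` inside a double-quoted parameter value, so plain splitting either keeps the quotes or cuts the line in the wrong place.

```python
def naive_parse(line):
    """Hypothetical naive parser: split on the first ':' and then on ';'."""
    head, value = line.split(":", 1)
    name, *raw_params = head.split(";")
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key] = val.split(",")
    return name, params, value


# Case 1: the quotes stay part of the parameter value.
name, params, value = naive_parse(
    'DTSTART;TZID="W. Europe Standard Time":20190104T000000'
)
quoted_tzid = params["TZID"][0]
print(repr(quoted_tzid))

# Case 2: a quoted value containing ',' and ':' is split in the wrong places.
broken = naive_parse('ATTENDEE;CN="Doe, John: Sales":mailto:john@example.com')
print(broken)
```

In the first case the TZID keeps its surrounding quotation marks; in the second, both the parameter value and the property value are cut apart.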

A possible solution that worked for me:

@classmethod
def parse(cls, line):
    idx = 0
    # Parse key and parameters:
    while idx < len(line) and line[idx] != ':':
        # Parse key:
        while idx < len(line) and line[idx] not in ':;':
            idx += 1
        name = line[:idx]
        params = {}
        # Parse parameters:
        while idx < len(line) and line[idx] == ';':
            idx += 1
            paramname = ''
            # Parse parameter name:
            while idx < len(line) and line[idx] != '=':
                paramname += line[idx]
                idx += 1
            # Parse parameter values:
            if idx >= len(line):
                raise ParseError("No '=' in line '{}'".format(line))
            idx += 1
            paramvals = []
            while True:
                # Parse a single parameter value:
                paramval = ''
                if idx < len(line) and line[idx] == '"':
                    idx += 1
                    while idx < len(line) and line[idx] != '"':
                        paramval += line[idx]
                        idx += 1
                    idx += 1
                else:
                    while idx < len(line) and line[idx] not in ':;,':
                        paramval += line[idx]
                        idx += 1
                paramvals.append(paramval)
                if idx >= len(line) or line[idx] != ',':
                    break
                idx += 1
            params[paramname] = paramvals
    # Parse value:
    if idx >= len(line) or line[idx] != ':':
        raise ParseError("No ':' in line '{}'".format(line))
    idx += 1
    value = line[idx:].strip()
    return cls(name, params, value)

@C4ptainCrunch
Member

Hi @praichor!
Thank you for reporting this issue. Could you supply a minimal file that I could use to reproduce it?

@praichor
Author

praichor commented Jul 23, 2019

Hi @C4ptainCrunch,
By the way, thanks for your great work! It is indispensable for importing external calendars into my calendar server.
Reproduction script:

not_working_input = """
BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 16.0 MIMEDIR//EN
VERSION:2.0
METHOD:PUBLISH
X-CALSTART:20190103T230000Z
X-CALEND:20191219T230000Z
X-WR-RELCALID:{0000002E-0702-2B5E-B8A1-8E52149384CE}
X-WR-CALNAME:JPL2019
BEGIN:VTIMEZONE
TZID:W. Europe Standard Time
BEGIN:STANDARD
DTSTART:16011028T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:16010325T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Test
CLASS:PRIVATE
CREATED:20181209T181307Z
DTEND;TZID="W. Europe Standard Time":20190107T000000
DTSTAMP:20181209T182521Z
DTSTART;TZID="W. Europe Standard Time":20190104T000000
LAST-MODIFIED:20181209T181307Z
LOCATION:Test
PRIORITY:5
SEQUENCE:0
SUMMARY:Test
TRANSP:TRANSPARENT
UID:AAAAAA7yqOF4pKxDnjNcS1lZa4oHAP1tU+LVLl5KjQnKvp6K2xgAAAAAAj8AANs0z1XS86d
	DlPhJenbJRzoAAAAAngAAAA==
X-MICROSOFT-CDO-BUSYSTATUS:FREE
X-MICROSOFT-CDO-IMPORTANCE:1
END:VEVENT
END:VCALENDAR
"""

working_input = not_working_input.replace('"W. Europe Standard Time"', 'W. Europe Standard Time')

import ics

# Time zone of the event is correctly linked to the time zone definition VTIMEZONE:
working_cal = ics.Calendar(working_input)
print(list(working_cal.events)[0].begin)

# Time zone of the event is not linked to the time zone definition VTIMEZONE,
# because the quotation marks around the time zone in the event are not removed
# as they should be, and therefore the time zone lookup fails:
not_working_cal = ics.Calendar(not_working_input)
print(list(not_working_cal.events)[0].begin)
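
The failure mode described in the comments above can be sketched without the library. The dictionary and the `dequote` helper below are assumptions made for illustration, not part of ics; they only show why a TZID that still carries its quotation marks cannot match the name under which the VTIMEZONE was registered.

```python
# Stand-in for the time zones collected from the VTIMEZONE blocks.
timezones = {"W. Europe Standard Time": "<VTIMEZONE definition>"}

# TZID parameter value as it arrives when the quotes are not removed.
raw_tzid = '"W. Europe Standard Time"'
found_raw = timezones.get(raw_tzid)


def dequote(value):
    """Hypothetical helper: drop one pair of surrounding double quotes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


found_clean = timezones.get(dequote(raw_tzid))
print(found_raw, found_clean)
```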

@C4ptainCrunch
Member

Also see #193

@praichor
Author

Also see #193

What about the following to also cover #193? Adding

                        if line[idx] == '\\':
                            idx += 1

in two places. The full updated function:

@classmethod
def parse(cls, line):
    idx = 0
    # Parse key and parameters:
    while idx < len(line) and line[idx] != ':':
        # Parse key:
        while idx < len(line) and line[idx] not in ':;':
            idx += 1
        name = line[:idx]
        params = {}
        # Parse parameters:
        while idx < len(line) and line[idx] == ';':
            idx += 1
            paramname = ''
            # Parse parameter name:
            while idx < len(line) and line[idx] != '=':
                paramname += line[idx]
                idx += 1
            # Parse parameter values:
            if idx >= len(line):
                raise ParseError("No '=' in line '{}'".format(line))
            idx += 1
            paramvals = []
            while True:
                # Parse a single parameter value:
                paramval = ''
                if idx < len(line) and line[idx] == '"':
                    idx += 1
                    while idx < len(line) and line[idx] != '"':
                        if line[idx] == '\\':
                            idx += 1
                        paramval += line[idx]
                        idx += 1
                    idx += 1
                else:
                    while idx < len(line) and line[idx] not in ':;,':
                        if line[idx] == '\\':
                            idx += 1
                        paramval += line[idx]
                        idx += 1
                paramvals.append(paramval)
                if idx >= len(line) or line[idx] != ',':
                    break
                idx += 1
            params[paramname] = paramvals
    # Parse value:
    if idx >= len(line) or line[idx] != ':':
        raise ParseError("No ':' in line '{}'".format(line))
    idx += 1
    value = line[idx:].strip()
    return cls(name, params, value)
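
The effect of the two added lines can be seen in isolation with a small helper. `read_until` is a hypothetical name used only for this sketch; it mirrors the inner loops of the function above, with one extra assumption: a backslash at the very end of the line is kept literally instead of running past the end of the string.

```python
def read_until(line, idx, stop_chars):
    """Collect characters from line[idx:] up to the first unescaped stop character.

    Returns the collected text and the index of the stop character
    (or len(line) if none was found).
    """
    out = []
    while idx < len(line) and line[idx] not in stop_chars:
        if line[idx] == "\\" and idx + 1 < len(line):
            idx += 1  # skip the backslash, keep the next character literally
        out.append(line[idx])
        idx += 1
    return "".join(out), idx


# Unquoted value: an escaped ';' no longer terminates the value.
unquoted = read_until(r"a\;b:value", 0, ":;,")

# Quoted value: an escaped '"' no longer closes the quoted string.
quoted = read_until(r'say \"hi\" now" rest', 0, '"')

print(unquoted, quoted)
```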

@C4ptainCrunch
Member

I'm working on a branch that uses a formal grammar (see grammar.ebnf) that solves both issues.

This is still an experiment, but working with a real parser looks like the way to go.
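
As a rough illustration of what a grammar-driven approach looks like, the sketch below transcribes the content line rules from RFC 5545 section 3.1 into regular expressions. This is not the grammar.ebnf from the branch and it does not use a PEG parser; it is a simplified stand-in (control characters are not excluded, and backslash escapes from #193 are not handled), and `parse_line` is a hypothetical name.

```python
import re

# RFC 5545 section 3.1 rules, simplified for illustration.
NAME = r"[A-Za-z0-9-]+"      # name / param-name
QUOTED = r'"[^"]*"'          # quoted-string
PARAMTEXT = r'[^";:,]*'      # paramtext
PARAM_VALUE = rf"(?:{QUOTED}|{PARAMTEXT})"
PARAM_VALUES = rf"{PARAM_VALUE}(?:,{PARAM_VALUE})*"

# contentline = name *(";" param) ":" value
CONTENT_LINE = re.compile(
    rf"(?P<name>{NAME})(?P<params>(?:;{NAME}={PARAM_VALUES})*):(?P<value>.*)"
)
PARAM = re.compile(rf";(?P<pname>{NAME})=(?P<pvalues>{PARAM_VALUES})")
VALUE = re.compile(rf"(?:^|,)({QUOTED}|{PARAMTEXT})")


def parse_line(line):
    """Return (name, params, value) for one unfolded content line."""
    match = CONTENT_LINE.fullmatch(line)
    if match is None:
        raise ValueError("Not a valid content line: {!r}".format(line))
    params = {}
    for param in PARAM.finditer(match.group("params")):
        values = []
        for value in VALUE.finditer(param.group("pvalues")):
            text = value.group(1)
            # Remove the surrounding quotes of a quoted-string.
            values.append(text[1:-1] if text.startswith('"') else text)
        params[param.group("pname")] = values
    return match.group("name"), params, match.group("value")


tzid = parse_line('DTSTART;TZID="W. Europe Standard Time":20190104T000000')
attendee = parse_line(
    'ATTENDEE;CN="Doe, John";MEMBER="mailto:a@example.com","mailto:b@example.com"'
    ":mailto:john@example.com"
)
print(tzid)
print(attendee)
```

Stating the rules once and deriving the parser from them keeps the quoting logic in one place, which is the main appeal of using a grammar over ad hoc string splitting.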

C4ptainCrunch added a commit that referenced this issue Aug 17, 2019
Add 竜 TatSu as a dependency.
This enables us to have a real PEG parser and not a combination of
regexes and string splitting.

Fix parsing of quoted values as well as escaped semicolons
This fixes #185 and fixes #193

Note: Adding TatSu might have made the parser significantly slower in some cases.

@praichor
Author

Using Tatsu looks like a good choice!
Thanks for the fix!
