Add support for more Unicode characters
While parsing an iCalendar (ICS) export from Google, I found a few Unicode code points in event `DESCRIPTION`s that broke the parser. The three I observed directly:

- U+0009 Horizontal Tab - https://unicode.org/cldr/utility/character.jsp?a=0009
- U+200B Zero Width Space - https://unicode.org/cldr/utility/character.jsp?a=200B
- U+00AD Soft Hyphen - https://unicode.org/cldr/utility/character.jsp?a=00ad

As I understand RFC5545[0], these characters should be supported. However, `unicode.IsGraphic` returns false for all of them.

This may be a hacky way to support them, but I think accepting all non-control Unicode characters is closer to the spec without being too verbose or complex.

---

Rationale for support:

    contentline = name *(";" param ) ":" value CRLF
    value = *VALUE-CHAR
    VALUE-CHAR = WSP / %x21-7E / NON-US-ASCII
    NON-US-ASCII = UTF8-2 / UTF8-3 / UTF8-4

- U+0009 is matched by WSP (via HTAB), defined in RFC5234 https://tools.ietf.org/html/rfc5234
- U+00AD and U+200B are matched by UTF8-2 and UTF8-3 respectively, defined in RFC3629 https://tools.ietf.org/html/rfc3629