add BADCHARSET support #9

nevans · 2020-11-09T03:24:06Z

Although RFC3501 defined the charset as astring, the RFC3501 Errata (and RFC4466 too) updates charset to be atom / quoted.

I was having a hard time extracting this one from my codebase without creating lots of merge conflicts. So instead of making it a narrowly self-contained PR, I built it on top of #6 and #8. The changes unique to this PR are contained in 41c2aca; everything before that commit belongs to those other two PRs.

This could lead to unexpected bugs if `@next_token` was filled using one parser and lookahead is called with an incompatible parser, so I've made that an error. All matches that set lex_state should be immediately preceded by a "match" or "shift_token". Probably better would be to push/pop on a `lex_state` stack.
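A minimal sketch of the kind of guard described above, assuming "incompatible parser" means the lex state that was active when the token was buffered. `MiniParser`, `@token_state`, and the whitespace-free tokenizer are hypothetical names for illustration, not the actual `Net::IMAP::ResponseParser` code:

```ruby
# Hypothetical, simplified parser; NOT the Net::IMAP::ResponseParser source.
class MiniParser
  Token = Struct.new(:symbol, :value)
  class ParseError < StandardError; end

  attr_accessor :lex_state

  def initialize(words)
    @words = words.dup
    @lex_state = :beg
    @token = nil        # buffered lookahead token
    @token_state = nil  # lex state that produced the buffered token (assumed name)
  end

  # Returns the buffered token, lexing one first if the buffer is empty.
  # Raises if the buffered token was lexed under a different lex state.
  def lookahead
    if @token
      unless @token_state == @lex_state
        raise ParseError, "token was lexed in #{@token_state.inspect}, " \
                          "but lex_state is now #{@lex_state.inspect}"
      end
      return @token
    end
    @token_state = @lex_state
    @token = next_token
  end

  # Discards the buffered token so the next lookahead lexes afresh.
  def shift_token
    @token = nil
  end

  private

  # Stand-in lexer: every word is an ATOM, end of input is EOF.
  def next_token
    word = @words.shift
    Token.new(word ? :ATOM : :EOF, word)
  end
end
```

In this sketch, changing `lex_state` while a token is still buffered makes the next `lookahead` raise, which is why a state change needs to come right after the buffered token has been consumed.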

These are mostly useful when violating the RFCs, in order to work around servers that violate the RFCs.

This encapsulates the `@lex_state` change.

Specifically, "text" is allowed to begin with "[" or "=". Disallowing that was an older RFC2060 restriction. (RFC 2060 was written in 1996. RFC 3501 in 2003) :)

`accept` can be used to replace `lookahead` + check `token.symbol` + `shift_token`. It's not being used here, but has been separated out into its own commit to reduce merge conflicts from multiple branches which use it.
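As a rough illustration of that pattern, here is a self-contained sketch. `TokenStream` and its array-backed token buffer are assumptions made for the example; only the names `lookahead`, `shift_token`, and `accept` come from the description above:

```ruby
# Hypothetical token stream used only to illustrate the `accept` idea.
class TokenStream
  Token = Struct.new(:symbol, :value)

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Peeks at the current token without consuming it.
  def lookahead
    @tokens[@pos] || Token.new(:EOF, nil)
  end

  # Consumes the current token.
  def shift_token
    @pos += 1
  end

  # If the current token's symbol is one of +symbols+, consumes and returns it.
  # Otherwise returns nil and consumes nothing.
  def accept(*symbols)
    token = lookahead
    return unless symbols.include?(token.symbol)
    shift_token
    token
  end
end
```

With a helper like this, a three-step "peek, compare, shift" sequence collapses into a single conditional such as `if (token = stream.accept(:SPACE))`.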

`astring_chars` roughly matches the RFC2060 definition of `atom`, and is used by RFC3501's `astring`. `atom` matches the RFC3501 definition. Although nothing in the parser currently uses `atom`, future commits will update the parser to use it wherever RFC3501, RFC4466, etc. use it. Made a helper method, `combine_adjacent`, which is used by both `atom` and `astring_chars` to combine adjacent tokens into a single string. It would probably be better to update the lexer regexps, possibly using negative lookahead assertions, so that the lexer returns a single token.
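The idea behind `combine_adjacent` can be sketched as follows. The signature, the return shape, and the token symbols in the usage example are assumptions for illustration; the real helper works against the parser's internal token buffer rather than a plain array:

```ruby
Token = Struct.new(:symbol, :value)

# Hypothetical stand-alone version of the helper described above.
# Joins the values of the leading run of tokens whose symbols are in +symbols+.
# Returns [combined_string, remaining_tokens]; combined_string is nil
# when the first token does not match.
def combine_adjacent(tokens, *symbols)
  run = tokens.take_while { |t| symbols.include?(t.symbol) }
  return [nil, tokens] if run.empty?
  [run.map(&:value).join, tokens.drop(run.size)]
end

# Example: a lexer that splits "UTF-8" into an ATOM and a NUMBER token
# (an assumed tokenization) still yields one string for the caller.
tokens = [
  Token.new(:ATOM, "UTF-"),
  Token.new(:NUMBER, "8"),
  Token.new(:SPACE, " "),
]
combined, rest = combine_adjacent(tokens, :ATOM, :NUMBER)
```

Here `combined` is the joined string and `rest` starts at the first token that did not match, so the caller can continue parsing from there.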

Always returns an array, even if empty.
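To make the "always an array" behaviour concrete, here is a string-level sketch based on the RFC 3501 grammar `"BADCHARSET" [SP "(" charset *(SP charset) ")"]` with `charset = atom / quoted` from the errata. The method name `parse_badcharset_code` and the regexp approach are assumptions for the example; the real parser works on lexer tokens instead:

```ruby
# Hypothetical helper, not the net-imap implementation.
# Returns nil if +code+ is not a BADCHARSET response code; otherwise returns
# an array of charset names, which is empty when the optional list is absent.
# Limitation of this sketch: a quoted charset containing ")" is not handled.
def parse_badcharset_code(code)
  m = /\ABADCHARSET(?: \((?<list>[^)]*)\))?\z/.match(code)
  return nil unless m
  return [] unless m[:list]

  # Each charset is either a quoted string or a bare atom.
  m[:list].scan(/"((?:\\.|[^"\\])*)"|([^\s"()]+)/).map do |quoted, atom|
    quoted ? quoted.gsub(/\\(.)/, '\1') : atom
  end
end
```

Returning `[]` rather than `nil` for a bare `BADCHARSET` means callers can iterate over the result without a nil check.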

shugo · 2021-03-09T09:27:56Z

@nevans I've merged it. Thank you!

nevans mentioned this pull request Nov 14, 2020

Support for IMAP4rev2 and modern extensions #12

Open

nevans force-pushed the BADCHARSET-parsing-RFC3501-errata branch from 03955e7 to 41c2aca Compare December 7, 2020 09:19

nevans and others added 7 commits December 8, 2020 23:25

add accept_space(s) helpers

72a13f9

These are mostly useful when violating the RFCs, in order to work around servers that violate the RFCs.

Add ResponseParser#text. Use in text_response.

f7661a6

This encapsulates the `@lex_state` change.

Fix resp_text to match RFC 3501

d3566cb

Specifically, "text" is allowed to begin with "[" or "=". Disallowing that was an older RFC2060 restriction. (RFC 2060 was written in 1996. RFC 3501 in 2003) :)

Add Net::IMAP::ResponseParser#accept method

79f3d3e

`accept` can be used to replace `lookahead` + check `token.symbol` + `shift_token`. It's not being used here, but has been separated out into its own commit to reduce merge conflicts from multiple branches which use it.

Net::IMAP: parse BADCHARSET charsets

dcee03e

Always returns an array, even if empty.

nevans force-pushed the BADCHARSET-parsing-RFC3501-errata branch from 41c2aca to dcee03e Compare December 10, 2020 23:21

Merge branch 'master' into BADCHARSET-parsing-RFC3501-errata

ff55916

shugo merged commit 13dffe8 into ruby:master Mar 9, 2021

nevans deleted the BADCHARSET-parsing-RFC3501-errata branch March 19, 2021 18:16

nevans added the IMAP4rev1 Requirement for IMAP4rev1, RFC3501 label Feb 12, 2023

nevans mentioned this pull request Feb 12, 2023

Update parser grammar to match RFC3501 and RFC4466 #50

Open

8 tasks
