add BADCHARSET support #9

Merged 8 commits on Mar 9, 2021

@nevans nevans commented Nov 9, 2020

Although RFC3501 defined `charset` as `astring`, the RFC3501 Errata (and RFC4466 too) update `charset` to be `atom / quoted`.
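
To illustrate the grammar change, here is a minimal standalone sketch of reading a `BADCHARSET` response code with `charset = atom / quoted`. It is not the parser code from this PR: `parse_badcharset`, `ATOM_RE`, and `QUOTED_RE` are hypothetical names chosen for the example, and the regexps only approximate the RFC3501 character classes.

```ruby
require "strscan"

# Hypothetical sketch, NOT the net-imap parser.
# ATOM-CHAR: any CHAR except atom-specials: ( ) { SP CTL % * " \ ]
ATOM_RE   = /[^\x00-\x20\x7f(){%*"\\\]]+/
# quoted: DQUOTE *QUOTED-CHAR DQUOTE, where \" and \\ are the only escapes
QUOTED_RE = /"((?:[^"\\\r\n]|\\["\\])*)"/

# Parses the body of a response code such as: BADCHARSET (US-ASCII "UTF-8")
# Returns an array of charset names ([] when no list is present).
def parse_badcharset(code)
  s = StringScanner.new(code)
  s.scan(/BADCHARSET/i) or raise ArgumentError, "not a BADCHARSET code"
  return [] if s.eos?
  s.scan(/ \(/) or raise ArgumentError, "expected ' (' at #{s.pos}"
  charsets = []
  loop do
    if s.scan(QUOTED_RE)
      charsets << s[1].gsub(/\\(["\\])/, '\1') # unescape \" and \\
    elsif (atom = s.scan(ATOM_RE))
      charsets << atom
    else
      raise ArgumentError, "expected charset at #{s.pos}"
    end
    break unless s.scan(/ /)
  end
  s.scan(/\)/) && s.eos? or raise ArgumentError, "expected ')' at #{s.pos}"
  charsets
end

parse_badcharset('BADCHARSET (US-ASCII "UTF-8")') # => ["US-ASCII", "UTF-8"]
```

Under the older `astring` definition a literal (`{n}` plus CRLF) would also have been legal here; with `atom / quoted` the sketch only has to handle the two forms shown.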

I was having a hard time extracting this one from my codebase without creating lots of merge conflicts. So instead of making it a narrowly self-contained PR, I built it on top of #6 and #8. The entire unique PR is contained in 41c2aca and everything before that commit belongs to those other two PRs.

nevans and others added 7 commits December 8, 2020 23:25
This could lead to unexpected bugs if `@next_token` was filled using one
parser and lookahead is called with an incompatible parser, so I've made
that an error.  All matches that set lex_state should be immediately
preceded by a "match" or "shift_token".

Probably better would be to push/pop on a `lex_state` stack.
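
A rough sketch of the push/pop idea, combined with the "mismatch is an error" check described above. Everything here is hypothetical (`SketchParser`, `LEXERS`, `with_lex_state`, `LexStateError`) and the two lexers are far simpler than the real IMAP ones; it only shows the shape of the technique.

```ruby
require "strscan"

# Hypothetical sketch, not the net-imap implementation.
class SketchParser
  LexStateError = Class.new(StandardError)

  # One list of [symbol, regexp] pairs per lex state (toy grammars).
  LEXERS = {
    data: [[:space, / /], [:lpar, /\(/], [:rpar, /\)/], [:atom, /[^ ()]+/]],
    text: [[:text, /[^\r\n]+/]],
  }.freeze

  def initialize(str)
    @scanner = StringScanner.new(str)
    @lex_states = [:data] # stack; the last element is the current state
    @token = nil
    @token_state = nil    # lex state that produced the cached @token
  end

  def lex_state
    @lex_states.last
  end

  # Pushes +state+ for the duration of the block and always pops it again.
  def with_lex_state(state)
    @lex_states.push(state)
    begin
      yield
    ensure
      @lex_states.pop
    end
  end

  # Returns the cached token, lexing one if needed. A token cached under a
  # different lex state is an error rather than a silent mis-parse.
  def lookahead
    if @token
      return @token if @token_state == lex_state
      raise LexStateError, "token lexed as #{@token_state}, wanted #{lex_state}"
    end
    symbol, re = LEXERS.fetch(lex_state).find { |_, r| @scanner.check(r) }
    raise ArgumentError, "no token at #{@scanner.pos}" unless symbol
    @token_state = lex_state
    @token = [symbol, @scanner.scan(re)]
  end

  # Discards the cached token so the next lookahead lexes a fresh one.
  def shift_token
    @token = @token_state = nil
  end
end

parser = SketchParser.new("OK hello [world")
parser.lookahead                                  # => [:atom, "OK"]
parser.shift_token
parser.lookahead                                  # => [:space, " "]
parser.shift_token
parser.with_lex_state(:text) { parser.lookahead } # => [:text, "hello [world"]
```

Calling `with_lex_state(:text) { lookahead }` without shifting the previously cached `:data` token first raises `LexStateError`, which is the situation the commit message says should be preceded by a "match" or "shift_token".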
These are mostly useful when violating the RFCs, in order to work around
servers that violate the RFCs.
This encapsulates the `@lex_state` change.
Specifically, "text" is allowed to begin with "[" or "=". Disallowing
that was an older RFC2060 restriction.

(RFC 2060 was written in 1996. RFC 3501 in 2003)  :)
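
As a small illustration of the difference, the two regexps below contrast the older restriction with the RFC3501 rule (`text = 1*TEXT-CHAR`, where `TEXT-CHAR` is any `CHAR` except CR and LF). The constant names are hypothetical and this is only a sketch of the character rules, not the lexer patterns from this PR.

```ruby
# Hypothetical sketch; constant names are made up for this example.
# TEXT-CHAR = %x01-7F except CR (\x0d) and LF (\x0a)
TEXT_CHARS = '[\x01-\x09\x0b\x0c\x0e-\x7f]'

# RFC3501: any non-empty run of TEXT-CHAR
TEXT_RFC3501 = /\A#{TEXT_CHARS}+\z/
# RFC2060-style restriction: same, but must not begin with "[" or "="
TEXT_RFC2060 = /\A(?![\[=])#{TEXT_CHARS}+\z/

["plain text", "[looks like a code", "=?utf-8?q?hi?="].each do |text|
  puts format("%-22s 3501=%-5s 2060=%s",
              text.inspect,
              TEXT_RFC3501.match?(text),
              TEXT_RFC2060.match?(text))
end
```

All three strings match the RFC3501 pattern; only the first matches the restricted one.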
`accept` can be used to replace `lookahead` + check `token.symbol` +
`shift_token`.  It's not being used here, but has been separated out
into its own commit to reduce merge conflicts from multiple branches
which use it.
`astring_chars` roughly matches the RFC2060 definition of `atom`, and is
used by RFC3501's `astring`.  `atom` matches the RFC3501 definition.
Although nothing in the parser currently uses `atom`, future commits
will update the parser to use it wherever RFC3501, RFC4466, etc. use it.

Made a helper method, `combine_adjacent`, which is used by both `atom`
and `astring_chars` to combine adjacent tokens into a single string.

It would probably be better to update the lexer regexps, possibly
using negative lookahead assertions, so that it returns a single token.
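
The two approaches can be sketched side by side. This is an illustration under assumptions, not the code from the commit: the `Tok` struct, the array-based signature of `combine_adjacent`, and the regexp constants are all hypothetical, and the single-token patterns use negated character classes rather than lookahead assertions.

```ruby
# Hypothetical sketch; names and signatures are assumptions for illustration.
Tok = Struct.new(:symbol, :value)

# Joins the values of the leading run of tokens whose symbol is in +symbols+.
# Returns [combined_string, remaining_tokens]; raises if the run is empty.
def combine_adjacent(tokens, *symbols)
  run = tokens.take_while { |t| symbols.include?(t.symbol) }
  raise ArgumentError, "expected one of #{symbols.inspect}" if run.empty?
  [run.map(&:value).join, tokens.drop(run.size)]
end

# A lexer that splits on "]" yields three tokens for the astring "foo]bar".
tokens = [Tok.new(:ATOM, "foo"), Tok.new(:RBRA, "]"),
          Tok.new(:ATOM, "bar"), Tok.new(:SPACE, " ")]
combine_adjacent(tokens, :ATOM, :RBRA).first # => "foo]bar"

# Alternative: let one regexp return the whole run as a single token.
# ATOM-CHAR excludes ( ) { SP CTL % * " \ ] ; ASTRING-CHAR also allows "]".
ATOM_CHARS_RE    = /\A[^\x00-\x20\x7f(){%*"\\\]]+/
ASTRING_CHARS_RE = /\A[^\x00-\x20\x7f(){%*"\\]+/

"foo]bar rest"[ATOM_CHARS_RE]    # => "foo"
"foo]bar rest"[ASTRING_CHARS_RE] # => "foo]bar"
```

The only difference between the two patterns is whether `]` (the `resp-specials` character) is excluded, which matches the distinction between RFC3501's `ATOM-CHAR` and `ASTRING-CHAR`.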
Always returns an array, even if empty.
@nevans nevans force-pushed the BADCHARSET-parsing-RFC3501-errata branch from 41c2aca to dcee03e Compare December 10, 2020 23:21
@shugo shugo merged commit 13dffe8 into ruby:master Mar 9, 2021
shugo commented Mar 9, 2021

@nevans I've merged it. Thank you!

@nevans nevans deleted the BADCHARSET-parsing-RFC3501-errata branch March 19, 2021 18:16
@nevans nevans added the IMAP4rev1 Requirement for IMAP4rev1, RFC3501 label Feb 12, 2023