Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve mailbox parsing using chumsky #839

Merged
merged 31 commits into from Feb 20, 2023
Merged

Conversation

soywod
Copy link
Contributor

@soywod soywod commented Dec 10, 2022

Since #827 #832 and #838, I thought about improving the parsing using https://github.com/zesterer/chumsky. This lib helps to build solid parsers and to be as close as possible to what RFCs propose.

Here (in this draft PR) a preview of what it could look like, is it sth you would be interested in or not?

@paolobarbolini
Copy link
Member

Thanks for all of the contributions you've been sending lately.
I'm a bit worried about all of the allocations, and I'll have to learn to read chumsky's style, but overall looks good 👍

@soywod
Copy link
Contributor Author

soywod commented Dec 17, 2022

I'm a bit worried about all of the allocations, and I'll have to learn to read chumsky's style

I totally understand, that's why: please think well if you really want to add those changes to your lib and let me know.

My 2 cents opinion: using parser combinators is losing a bit of performance for a more maintainable/readable/less buggy program, and I find it really efficient for parsing RFCs: you just need to go through sections, develop tiny parsers, test them, and then compose them together.

If you do not want to add those changes directly inside your lib, I can also create a separated lib (in case you accept using chumsky).

Think well and let me know!

@paolobarbolini
Copy link
Member

I think the performance penalty is worth it for for as you said maintainability and readability and I'd be happy to have it in lettre

@soywod
Copy link
Contributor Author

soywod commented Dec 17, 2022

Awesome 🎉 I let you know when the PR is ready.

@soywod
Copy link
Contributor Author

soywod commented Dec 17, 2022

I saw a test in the code base:

#[test]
fn format_single_with_utf8_name() {
    let from = vec!["Кайо <kayo@example.com>".parse().unwrap()];

    let mut headers = Headers::new();
    headers.set(From(from.into()));

    assert_eq!(
        headers.to_string(),
        "From: =?utf-8?b?0JrQsNC50L4=?= <kayo@example.com>\r\n"
    );
}

and from the RFC the name part of a mailbox cannot contain non-ASCII chars, so parsing Кайо <kayo@example.com> leads to an error. My question is: (1) do you need to parse real mailbox according to the RFC or (2) do you need to parse pseudo-mailbox that can be converted into RFC compliant mailbox? If I want to make all actual tests pass we need the option (2).

EDIT: I will go for the (2) and see how it behaves.

src/address/types.rs Outdated Show resolved Hide resolved
src/message/header/mailbox.rs Outdated Show resolved Hide resolved
@soywod soywod marked this pull request as ready for review December 17, 2022 19:17
@soywod soywod changed the title [POC] Improve parsing using chumsky Improve mailbox parsing using chumsky Dec 17, 2022
@soywod
Copy link
Contributor Author

soywod commented Dec 17, 2022

I guess the PR is ready, I left some comments for you to get more context.

Copy link
Contributor

@kevincox kevincox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see the use of a proper parser builder. I haven't reviewed the actual parsing code closely but have some meta comments.

Cargo.toml Outdated Show resolved Hide resolved
src/address/types.rs Outdated Show resolved Hide resolved
src/address/types.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers.rs Outdated Show resolved Hide resolved
src/message/mailbox/serde.rs Show resolved Hide resolved
src/message/mailbox/parsers.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers.rs Outdated Show resolved Hide resolved
@soywod
Copy link
Contributor Author

soywod commented Jan 6, 2023

@kevincox thanks a lot for your review, I will reply whenever I can. I let you know.

@kevincox
Copy link
Contributor

kevincox commented Jan 6, 2023

No rush :)

@soywod
Copy link
Contributor Author

soywod commented Jan 8, 2023

@kevincox ready for another review whenever you can 🙂

Copy link
Contributor

@kevincox kevincox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is looking really good. I think the only real blocking issue is how flexible we want to be with parsing mailboxes.

src/message/header/mailbox.rs Show resolved Hide resolved
src/message/mailbox/parsers/rfc2234.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers/rfc2234.rs Show resolved Hide resolved
src/message/mailbox/parsers/rfc2234.rs Show resolved Hide resolved
src/message/mailbox/parsers/rfc2822.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers/rfc2822.rs Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
benches/mailbox_parsing.rs Outdated Show resolved Hide resolved
@paolobarbolini
Copy link
Member

Thanks for all of the work you're doing on this @soywod and @kevincox.

I went through the code and the conversations and this is an awesome change ❤️

@soywod
Copy link
Contributor Author

soywod commented Jan 14, 2023

@kevincox @paolobarbolini any other changes in mind? If not then I guess it is ready for merge 🙂

@kevincox
Copy link
Contributor

Did you push the reduction of Vec allocation? #839 (comment) I don't see these changes.

@paolobarbolini
Copy link
Member

I'm preparing a 0.10.2 release at #853, after which we'll also make a 0.11 pre-release with this

@soywod
Copy link
Contributor Author

soywod commented Jan 24, 2023

Did you push the reduction of Vec allocation? #839 (comment) I don't see these changes.

Not yet indeed, I will do those changes during this week.

I'm preparing a 0.10.2 release at #853, after which we'll also make a 0.11 pre-release with this

Awesome 🙂

@soywod
Copy link
Contributor Author

soywod commented Jan 28, 2023

I removed parsers related to comments (I guess lettre mailboxes should never contain comments) and simplified white space folding, the perfs gain is insane:

$ cargo bench --bench mailbox_parsing 
   Compiling lettre v0.10.1 (/home/soywod/code/lettre)
    Finished bench [optimized] target(s) in 3.65s
     Running benches/mailbox_parsing.rs (target/release/deps/mailbox_parsing-382c53ca32910006)
parse single mailbox    time:   [4.0927 µs 4.1017 µs 4.1130 µs]
                        change: [-81.903% -81.857% -81.812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

parse multiple mailboxes
                        time:   [11.352 µs 11.372 µs 11.390 µs]
                        change: [-76.088% -76.020% -75.958%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

@soywod
Copy link
Contributor Author

soywod commented Jan 28, 2023

Ready for the last review!

examples/smtp_selfsigned.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers/rfc2822.rs Outdated Show resolved Hide resolved
src/message/mailbox/parsers/rfc2822.rs Outdated Show resolved Hide resolved
@soywod
Copy link
Contributor Author

soywod commented Jan 28, 2023

Ready for merge then 🎉 thanks a lot @kevincox for you help!

@paolobarbolini
Copy link
Member

I released 0.10.2. I'll give another look at the PR and then we can proceed.

@paolobarbolini
Copy link
Member

BTW I did RUSTFLAGS="--cfg lettre_ignore_tls_mismatch" cargo clippy --all-features --all-targets --fix on master so merging the latest changes should cleanup the diff.

@soywod
Copy link
Contributor Author

soywod commented Jan 29, 2023

BTW I did RUSTFLAGS="--cfg lettre_ignore_tls_mismatch" cargo clippy --all-features --all-targets --fix on master so merging the latest changes should cleanup the diff.

Done, indeed the diff reduced and the CI seems to pass!

@soywod
Copy link
Contributor Author

soywod commented Feb 5, 2023

Just a question. I plan to extract this piece of code in an external lib (because I have the same need in another crate and I do not want to include all lettre), would you be interested to use it or to keep the ownership of the code?

Forget about it, plans changed!

@soywod
Copy link
Contributor Author

soywod commented Feb 7, 2023

I released 0.10.2. I'll give another look at the PR and then we can proceed.

Any news about this?

@paolobarbolini
Copy link
Member

I'm about to release 0.10.3 just so I don't leave things sitting on master for too long and then together with #861 I'll merge this and either immediately or in the coming weeks release it as 0.11.0-alpha.1

@paolobarbolini paolobarbolini merged commit 5f37b66 into lettre:master Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants