Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add structured results for all values using imap-proto #58

Merged
merged 8 commits into from
Feb 9, 2018
Merged

Add structured results for all values using imap-proto #58

merged 8 commits into from
Feb 9, 2018

Conversation

jonhoo
Copy link
Collaborator

@jonhoo jonhoo commented Nov 12, 2017

This PR modifies all methods on Client that return data from the server so that they return it in a structured format instead of as raw, multi-line strings. The resulting interface is much less error-prone, and saves clients from doing parsing themselves, as outlined in #28. It does this by using imap-proto to parse the server responses and then constructing appropriate types.

This PR does not yet provide a non-UTF-8 API (#54), as that will depend on the resolution to djc/imap-proto#5. It does, however, fix #28.


One particularly important thing introduced in this PR is the ZeroCopy type (in src/types/mod.rs), and it's a little finicky, so it deserves some special attention. imap-proto returns values that point into the original bytes that were parsed. This allows it to not copy strings and other large values unnecessarily. However, it also means that it's tricky to expose those values outside of a Client (since the lifetime of the returned reference would be ill-defined). ZeroCopy is a way around that, and operates similar to the owning-ref crate. It stores a Vec<u8>, and its parsed output, inside a single struct, and then derefs to the parsed struct. This is safe as long as the parsed struct never gives out references that outlive &self. Essentially, all accessor methods must be of the form:

fn get<'a>(&'a self) -> T + 'a

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.6%) to 61.348% when pulling 22e9fe5 on jonhoo:structured into 4332d09 on mattnenterprise:master.

@sanmai-NL
Copy link
Contributor

Great work. I'll test and review your branch too.

src/error.rs Outdated
&String::from(self.description()),
&data.join("\n")
),
Error::NoResponse(ref data) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two patterns can be merged Error::NoResponse(ref data) | Error::BadResponse(ref data).

status => Err((status, None)),
})
}
IResult::Done(..) => None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IResult::Done(..) | IResult::Incomplete(..) =>

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 17, 2017

I'm testing this PR against a real-world Exchange server. The imap-proto parsing seems to choke with Error(Alt) on a message, produced by the basic example code. Emulated by this test:

#[test]
fn parse_response() {
    use imap_proto::parse_response;
    let resp = parse_response(b"* 2 FETCH (BODY[TEXT] {6620}\r\n").unwrap();
    eprintln!("{:?}", resp);
}

Update:
It's a response to the command a4 FETCH 2 body[text]

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 62.007% when pulling c2ec0a1 on jonhoo:structured into 4332d09 on mattnenterprise:master.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 17, 2017

@sanmai-NL hmm, that looks like an issue for https://github.com/djc/imap-proto, no?

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 17, 2017

@jonhoo: Yeah I know. I think maybe this upstream issue results in a functional regression for this crate. Let's investigate before merging. Chances are the upstream issue can be resolved before landing this PR, in cooperation with @djc.

@djc
Copy link

djc commented Nov 17, 2017

That looks like an unbalanced parenthesis, or is that just the start of the message? What is the parser result you are expecting? It definitely doesn't look like a well-formed message.

@djc
Copy link

djc commented Nov 17, 2017

Ah, never mind, this just happens because BODY[TEXT] isn't implemented yet in the parser; otherwise it would return Needed, though, because this is still an incomplete message.

@djc
Copy link

djc commented Nov 18, 2017

djc/imap-proto@5b9dc9b adds support for BODY[TEXT] (and a number of other BODY variations).

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 18, 2017

@sanmai-NL can you confirm that the updated imap-proto fixes your issue?

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 18, 2017

Not yet unfortunately. There's progress though:

>> " * 2 FETCH (BODY[TEXT] {6620} " (as UTF-8 lossy); " [42, 32, 50, 32, 70, 69, 84, 67, 72, 32, 40, 66, 79, 68, 89, 91, 84, 69, 88, 84, 93, 32, 123, 54, 54, 50, 48, 125, 13, 10] " (as bytes)
>> " --_d999ac00-c320-4940-9907-c91bb78a97a4_ " (as UTF-8 lossy); " [45, 45, 95, 100, 57, 57, 57, 97, 99, 48, 48, 45, 99, 51, 50, 48, 45, 52, 57, 52, 48, 45, 57, 57, 48, 55, 45, 99, 57, 49, 98, 98, 55, 56, 97, 57, 55, 97, 52, 95, 13, 10] " (as bytes)
Error Fetching email 2: Unable to parse status response
>> " Content-Transfer-Encoding: quoted-printable " (as UTF-8 lossy); " [67, 111, 110, 116, 101, 110, 116, 45, 84, 114, 97, 110, 115, 102, 101, 114, 45, 69, 110, 99, 111, 100, 105, 110, 103, 58, 32, 113, 117, 111, 116, 101, 100, 45, 112, 114, 105, 110, 116, 97, 98, 108, 101, 13, 10] " (as bytes)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parse(Invalid([67, 111, 110, 116, 101, 110, 116, 45, 84, 114, 97, 110, 115, 102, 101, 114, 45, 69, 110, 99, 111, 100, 105, 110, 103, 58, 32, 113, 117, 111, 116, 101, 100, 45, 112, 114, 105, 110, 116, 97, 98, 108, 101, 13, 10]))', /checkout/src/libcore/result.rs:906:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.

It seems this can be resolved by not letting the IMAP protocol parser process actual data returned for a successful FETCH command.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 18, 2017

@sanmai-NL I'm not sure I understand what you mean -- can you elaborate?

@sanmai-NL
Copy link
Contributor

The IMAP protocol parser cannot parse "Content-Transfer-Encoding: quoted-printable". That seems logical, since that's actual data not protocol messages.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 18, 2017

Ahh, I see, it's trying to parse the contents of the BODY. Yeah, that doesn't seem right. I'm guessing this is another one for @djc then?

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 18, 2017

No, that's not the case. client::read_response_onto() must be adapted to not keep parsing once it gets an untagged FETCH response, in this case * 2 FETCH (BODY[TEXT] {6620}. So when a Response::Fetch comes in, rust-imap should break out of response parsing. It may then parse the e-mail and yield it, or yield the raw e-mail bytes if that is deemed out of scope.

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 18, 2017

You can use @staktrace's mailparse or @djc's email-parser, or @mikedilger's email-format. I believe mailparse is currently best maintained most feature complete, but I haven't done a thorough review and don't know about the ideas of these maintainers to perhaps join forces in some way, and not duplicate e-mail parsing efforts as well. 💯

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 18, 2017

@sanmai-NL that doesn't seem right -- read_response_onto is supposed to read an entire server response until its completion result line (Response::Done), including all untagged lines in between. This is so that the entire response can be passed to a function like parse_fetches (in src/parse.rs) which parses and extracts the individual FETCH results. BODY should just be a regular attribute given as an untagged response to FETCH according to §6.4.5, and shouldn't prevent correct parsing at all. parse_fetches also shouldn't choke on attributes it doesn't know about, it should just ignore them (see the match inside there).

The one place I could see this error stemming from is if imap_proto does not return Response::Fetch for some entry parsed as the result of a FETCH, but that seems incorrect. RFC3501 specifically stipulates that the response to a FETCH command should be a sequence of untagged FETCH responses.

I don't think this library should be responsible for parsing message responses. Instead, just the raw bytes of the body should be returned. Currently, this is only exposed through RFC822 responses (under the RFC822 attribute for a FETCH response), but parse_fetches could easily be augmented to also include data passed as BODY.

@sanmai-NL
Copy link
Contributor

@jonhoo: Yes I like your take on it better. It is indeed the responsibility of imap-proto to handle the e-mail data as part of the response.

Also agree that it's better if imap-proto would not parse the e-mails, at least as long as there isn't one golden standard e-mail message parsing crate.

@djc
Copy link

djc commented Nov 19, 2017

It seems like you're talking past each other. It seems to me neither imap-proto nor rust-imap should try to process or parse the raw byte data as returned by RFC822*, BODY[*], etc. So, we should probably just return [u8] slices to the user, who can then handle the parsing. As far as imap-proto is concerned, that's already what's happening; and from the messages above, it doesn't seem like imap-proto is implicated in the failure mode you're seeing.

@sanmai-NL
Copy link
Contributor

@djc, imap-proto returns a parse error on the e-mail though.

@djc
Copy link

djc commented Nov 19, 2017

@sanmai-NL imap-proto currently only provides a stateless parser, by which I mean it can only parse a full IMAP server response. It's not quite clear to me what that transcript is supposed to mean, but it looks like something is giving imap-proto different parts of an IMAP response in different parser calls -- that maybe a rust-imap bug in this PR, then?

@sanmai-NL
Copy link
Contributor

@djc, yeah, that's what it means. @jonhoo: As @djc suggests there's still some work to do around src/client.rs#L495.

@djc
Copy link

djc commented Nov 19, 2017

You should probably not rely on line endings anymore. Instead, keep a buffer and (a) grow it if the parser returns Incomplete, and (b) split it if the parser returns Done. BytesMut from the bytes crate should help doing this efficiently.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 19, 2017

I believe this should be fixed in 5a3ee10.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 62.338% when pulling 5a3ee10 on jonhoo:structured into 4332d09 on mattnenterprise:master.

@sanmai-NL
Copy link
Contributor

In my test everything works. 🙂

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 19, 2017

@mattnenterprise I believe this is now in the state where it can be reviewed and merged. It does change the API a fair bit, so it will require a major version bump, but I think it's the right direction to take the interface in.

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 25, 2017

@jonhoo: I’m trying to review -- haven’t done that before. I just started a review but it appears I only get to see an old revision, in which parse_many is not unsafe. So how can I review against the latest revision? E.g. I was wondering why parse_many has to be unsafe?

I so far am not able to get the BODY[], headers plus text, using the rust-imap API with your PR branch. Only the IMAP response message.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 25, 2017

I think the easiest way to review is to go to the "Files changed" tab, and then start adding comments. When you add the first comment, you can select "start a review", and then you can use the "Review changes" button in the top right to approve/reject the change at the end..

parse_many has to be unsafe because of ZeroCopy. ZeroCopy contains a Vec<u8>, and gives a &'static [u8] to the inner datastructure. However, that ref isn't really 'static; it is the same lifetime as the ZeroCopy struct itself, but there is no way to express that in Rust currently. Using ZeroCopy for a type T is only safe if T only has methods that return owned values, or values whose lifetime is tied to the lifetime of self. Since parse_many is generic over T, it is up to the caller to ensure that this is true for its chosen T.

I'm not sure what you mean by the "IMAP response message"? The PR in its current form does not actually parse out BODY. If you look at the Fetch struct, it only contains the message number, flags, UID, and RFC822 message contents. Any other attributes are ignored. Adding more attributes should be straightforward by adding an arm to https://github.com/mattnenterprise/rust-imap/pull/58/files#diff-258c0555e6f9c4bc4fba3a551d81638aR96

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 25, 2017

@jonhoo:
RFC 3501 says a FETCH can return a BODY as RFC 2822. Note that RFC 2822 supersedes RFC 822, let's talk about and refer in code to RFC 2822 to reduce confusion among contributors new to IMAP. See RFC 3501, 1.3:

IMAP was originally developed for the older [RFC-822] standard, and as a consequence several fetch items in IMAP incorporate "RFC822" in their name. With the exception of RFC822.SIZE, there are more modern replacements; for example, the modern version of RFC822.HEADER is BODY.PEEK[HEADER]. In all cases, "RFC822" should be interpreted as a reference to the updated [RFC-2822] standard.

What I meant is that it appears as though the IMAP response is being parsed and returned, yet not the BODY (e-mail, etc. whatever was requested in the command) that follows.

You pointed out the Fetch::rfc822 method to get what I was looking for. A problem is rfc822() consistently returns None in my tests. The command FETCH 1 BODY[] returns:

Ok([Fetch { message: 1, flags: [], uid: None, rfc822: None }])

Update: The seems to be due to a difference in FETCH response for BODY[] vs RFC822, see RFC 3502, 6.4.5, RFC822.

Another problem is that the rfc288 fields/method aren't documented in imap-proto not in rust-imap. The lack of documentation makes understanding what's going on and moving forward together a lot harder.

@sanmai-NL
Copy link
Contributor

sanmai-NL commented Nov 25, 2017

@jonhoo:

So I take it that with this PR applied you got an error as expected?

Confirmed: Err(BadResponse("Command Argument Error. 11"))

@jonhoo
Copy link
Collaborator Author

jonhoo commented Nov 25, 2017

@sanmai-NL the code (and Fetch::rfc822) operate specifically on the contents of the RFC822 attribute (which is what the original code also did). I agree this is probably not what it should do (we should just switch to BODY altogether), but that's a topic for another PR. The change should be straightforward I think.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Dec 5, 2017

ping @mattnenterprise again

Cargo.toml Outdated
@@ -28,6 +28,9 @@ path = "src/lib.rs"
native-tls = "0.1"
regex = "0.2"
bufstream = "0.1"
#imap-proto = "0.1"
imap-proto = { git = "https://github.com/djc/imap-proto.git" }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this use the git repo for this dependency instead of the one at crates.io ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the required changes from imap-proto have since been released. Removed in e687e85

@coveralls
Copy link

coveralls commented Jan 20, 2018

Coverage Status

Coverage decreased (-3.7%) to 85.877% when pulling 29fa1cf on jonhoo:structured into dc21eae on mattnenterprise:master.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Jan 23, 2018

ping @mattnenterprise

@mattnenterprise
Copy link
Owner

mattnenterprise commented Feb 8, 2018

@jonhoo Why did the coverage go to 0% ? Also this looks good to merge once that gets figured out.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Feb 9, 2018

Coverage went to 0% because of a Travis CI change: travis-ci/travis-ci#9061. The fix is to add sudo: required, as I did in dc21eae. However, I only made that change after I last pushed to this PR. I'll rebase and push, which should trigger the coverage check again.

Do note that this significantly changes the API, so a major version bump would be needed.

@mattnenterprise mattnenterprise merged commit 590b80e into mattnenterprise:master Feb 9, 2018
@mattnenterprise
Copy link
Owner

I have bumped it from 0.6.0 -> 0.7.0 as we are still on major version zero. If we believe the API should be considered stable now then we can always bump up to 1.0.0.

@jonhoo
Copy link
Collaborator Author

jonhoo commented Feb 9, 2018

@mattnenterprise I don't know that the README example works any more after merging this. You may want to look into cargo-readme, as it lets the example in the README be a doctest in the library docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More involved parsing
5 participants