
mox corrupting 822 email part 0.0.9 #117

Closed
haraldrudell opened this issue Jan 15, 2024 · 8 comments

Comments

@haraldrudell

I have a 32,262-byte message that mox is corrupting by inserting an extra U+000D
The insertion is at index 12,973
The corruption is in the html mime part. There are lots of lines, but the message is always corrupted at the same place
— at the end of a particular line, a CR is converted to CR CR

version: 0.0.8

with this bug mox cannot be used
an imap server corrupting messages is not OK

— I was able to verify that on the socket sending to mox, the extra CR is not present: mox is receiving good data
— In mox's file-system store, the extra CR is present: mox is storing BAD data after receiving good data
— on retrieving the message from mox, the length is correct but the inserted CR causes the final U+000A to be lost

mox should not be tampering with or filtering the opaque rfc822 bytes
if mox gets corrupt rfc822 in, it should send the same corrupt rfc822 out

— I would suggest implementing sha256 hashing to detect corrupt data
— this would detect software bugs or file-system errors
— use a type holding both the byte slice and the hash, say the first 8 bytes as a fixed-size array, so they always travel together (see the sketch after this list)
— if the server reads corrupt data from disk, it should stop serving requests
— I have also seen other people use Go scanners intended for text on their sockets; that may not work for imap
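
A minimal sketch of what such a checksummed message value could look like; the package, type, and function names here are hypothetical and not part of mox:

package integrity

import (
    "bytes"
    "crypto/sha256"
    "fmt"
)

// CheckedMessage bundles the opaque rfc822 bytes with a truncated
// sha256 digest so the two always travel together.
type CheckedMessage struct {
    Data []byte  // opaque rfc822 bytes, never modified
    Sum  [8]byte // first 8 bytes of sha256(Data), computed at ingest
}

// NewCheckedMessage hashes the message once, when it is received.
func NewCheckedMessage(data []byte) CheckedMessage {
    full := sha256.Sum256(data)
    m := CheckedMessage{Data: data}
    copy(m.Sum[:], full[:8])
    return m
}

// Verify recomputes the digest; a mismatch indicates a software bug or
// file-system corruption, and the server should stop serving requests.
func (m CheckedMessage) Verify() error {
    full := sha256.Sum256(m.Data)
    if !bytes.Equal(m.Sum[:], full[:8]) {
        return fmt.Errorf("rfc822 data corrupted: sha256 prefix mismatch")
    }
    return nil
}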

@haraldrudell
Author

I found this on precompiled mox 0.0.8 on Linux amd64

It is not present in a compiled-from-source 0.0.9 arm64 macOS build at 1d9e80f

@haraldrudell
Author

haraldrudell commented Feb 4, 2024

This still exists after upgrading to the precompiled 0.0.9 linux-amd64 build
— for some emails 32 KiB or larger, often around 100 KiB, mox modifies exactly one line ending from crlf to crcrlf
— a particular mail fails every time; it settles on a line, and after that it is always the same line
— when it happens, socket taps show that the crcrlf sequence does not appear on the socket to mox
— the crcrlf sequence can be found in files in mox's msg directory
— crcr cannot appear in imap or email, and it does not exist in Linux/macOS text files
— with 0.0.8 the same length was returned; in 0.0.9, length+1 is returned for APPEND/UID FETCH
— it does not appear when a Go test client using Write/ReadBytes/ReadFull is used
— with the earlier 32 KiB email it happened for a non-compliant 32 KiB non-synchronizing literal
— for the larger 100 KiB email it happens with synchronizing literals

So far it only happens over a TLS connection across a network segment to Linux/amd64

A key issue is that it happens at a line ending in some mime part 64 KiB or more into the email
— why would any code dig through a literal for a line ending?
the email bytes are an opaque byte slice
additionally, the cr is inserted, pushing subsequent bytes back, which is an unusual, low-performance operation for a Go slice

Because it is always at a line ending, this is unlikely to be a socket or networking issue;
it more likely has to do with timing.
Also, the client sends the whole literal in one uninterrupted thread-safe Write invocation.
So the corruption is likely in some odd processing on the mox server
between receive and write to disk
— a particular clue is that the cr is INSERTED
— a bug sharing a Go byte slice's underlying array would overwrite, not insert
— the second clue is that it is ALWAYS AT A LINE ENDING
— a random problem would happen anywhere in the bytes, not always at a line ending
so it is line-wise processing and merging of byte slices inside a huge opaque byte slice that mox should not be tampering with (a small illustration follows)
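
To illustrate the distinction (ordinary Go, nothing to do with mox internals): a slice sharing a backing array lets buggy code overwrite bytes in place without changing the length, whereas inserting a byte forces an allocation and a copy of everything after it:

package main

import "fmt"

func main() {
    msg := []byte("a\r\nb\r\n")

    // Overwrite: a slice sharing the backing array changes bytes in place.
    shared := msg[:3]
    shared[0] = 'X' // msg now starts with 'X'; the length is unchanged

    // Insert: adding a byte means building a new slice and copying the tail.
    i := 1 // position of the '\r' after 'X'
    grown := append(msg[:i+1:i+1], append([]byte{'\r'}, msg[i+1:]...)...)

    fmt.Printf("%q len=%d\n", msg, len(msg))     // "X\r\nb\r\n" len=6
    fmt.Printf("%q len=%d\n", grown, len(grown)) // "X\r\r\nb\r\n" len=7
}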

The msg directory can be examined by

cd ~mox/…/msg
find . -type f -exec cat -v {} \; | fgrep "^M^M"
=20=20=20=20=20=20=20=20=20=20=20=20^M^M

one can also examine files:

cat -eutv 1,835.txt | less -N
 1 Delivered-To: joe@example.com^M$

in less type a search:
/\^M\^M [return]
1746 =20=20=20=20=20=20=20=20=20=20=20=20^M^M$

@haraldrudell haraldrudell reopened this Feb 4, 2024
@haraldrudell
Author

haraldrudell commented Feb 4, 2024

Actually, the client used, which is the least garbage among all the Go imap garbage out there,
sends via bufio.Writer, which may send in chunks of 64 KiB. This is not the most efficient: it involves copying, futile effort, and fragmented writes

if the delay between two such chunks is longer than the time to receive a chunk, then mox will receive the chunks separately

possibly, mox is unable to handle receiving a literal across multiple read invocations, and that reception somehow involves newlines, so a cr is erroneously inserted. Possibly. (A sketch of the chunking follows.)
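
A small sketch of that chunking behavior; the writeRecorder type, the 4 KiB buffer, and the command line are made up for illustration, the real client reportedly buffered around 64 KiB:

package main

import (
    "bufio"
    "bytes"
    "fmt"
)

// writeRecorder stands in for the client-side socket and records the
// size of each Write that actually reaches it.
type writeRecorder struct{ sizes []int }

func (w *writeRecorder) Write(p []byte) (int, error) {
    w.sizes = append(w.sizes, len(p))
    return len(p), nil
}

func main() {
    var sock writeRecorder
    bw := bufio.NewWriterSize(&sock, 4096)

    // An APPEND command line followed by a large literal, as a client would send it.
    bw.WriteString("a1 APPEND INBOX {10000}\r\n")
    bw.Write(bytes.Repeat([]byte("x"), 10000))
    bw.Flush()

    // The literal crosses the buffer boundary, so it reaches the socket in
    // more than one Write, and the server may see it across multiple Reads.
    fmt.Println(sock.sizes) // [4096 5929] with this buffer size
}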

The changed bug behavior in 0.0.9 changes the byte-slice length
An assertion could be implemented that checks (a sketch follows this list):
— whether the length of the processed literal is longer than expected
— whether the literal ends with a lone cr that has lost its lf, the 0.0.8 bug behavior
— mox should accept corrupt rfc822; however, if mox itself loses the final lf through a software deficiency, such a check is worthwhile until the issue is resolved
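
A sketch of the length assertion, using the identifiers that appear in the cmdAppend excerpt quoted further down (mw, size, xserverErrorf, errIO); the trailing lone-cr check would additionally need a look at the last byte written:

// after the literal has been copied into the message writer
if mw.Size > size {
    // the literal grew while being written: a spurious byte was inserted
    xserverErrorf("literal grew from %d to %d bytes while writing (%w)", size, mw.Size, errIO)
}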

Update:
after changing the client to do a single uninterrupted unbuffered thread-safe Write for each literal, the problem remains
— mox may still receive the literal in separate chunks when it exceeds the 1,500-byte MTU or the 64 KiB maximum TCP packet length, which the failing email does

running against a temporary-storage mox via bidirectional stream does not fail either

@haraldrudell
Author

haraldrudell commented Feb 5, 2024

TOTAL BUG

server.go:2729: the literal has just been read to file with the correct length, msize
server.go:2778: about to write to the store, the length is +1 in m.Size

the bug is message.NewWriter, which parses the literal and inserts \r. In the LITERAL? That's a NONO
clearly, message.Writer.Write invocations can split in the middle of "\r\n", so dumb code thinks it sees a lone lf (a sketch of that failure mode follows)
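
A toy model of that failure mode; this is not the actual message.Writer code, just a hypothetical fixupWriter that repairs bare \n into \r\n while its memory of the previous chunk's last byte is stale:

package main

import (
    "bytes"
    "fmt"
)

// fixupWriter repairs bare \n into \r\n. Its memory of the last byte of
// the previous chunk is not refreshed between body chunks, which is the
// kind of bookkeeping gap suspected here.
type fixupWriter struct {
    out  bytes.Buffer
    last byte // last byte of the previous chunk, as the writer believes it to be
}

func (w *fixupWriter) Write(p []byte) (int, error) {
    for i, b := range p {
        prev := w.last
        if i > 0 {
            prev = p[i-1] // within one chunk the real previous byte is known
        }
        if b == '\n' && prev != '\r' {
            w.out.WriteByte('\r') // "repair" what looks like a lone lf
        }
        w.out.WriteByte(b)
    }
    // the bug: w.last is not refreshed here for body chunks
    return len(p), nil
}

func main() {
    body := []byte("first body line\r\nsecond body line\r\n")

    // Split so the chunk boundary falls between the \r and \n of a line ending,
    // as can happen when a large literal arrives in separate reads.
    i := bytes.IndexByte(body, '\n')

    w := &fixupWriter{last: '\n'} // stale state left over from the header section
    w.Write(body[:i])             // this chunk ends with \r
    w.Write(body[i:])             // this chunk starts with \n; the stale state inserts a spurious \r

    fmt.Printf("in %d bytes, out %d bytes, contains crcrlf: %v\n",
        len(body), w.out.Len(), bytes.Contains(w.out.Bytes(), []byte("\r\r\n")))
    // in 35 bytes, out 36 bytes, contains crcrlf: true
}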

@haraldrudell
Author

Here's a fix to the very serious, fatal, not-good-indeed email corruption bug
— root cause: message.NewWriter cannot handle chunking into multiple Write invocations
— more bad news: mox should NEVER NEVER tamper with the rfc822 field
— even more baddies: there's some byte sniffing to figure things out. In Go, if anyone wants to know what's in an email, the enmime library is used

diff --git a/imapserver/server.go b/imapserver/server.go
index 3b6dc77..e4f3cb4 100644
--- a/imapserver/server.go
+++ b/imapserver/server.go
@@ -2725,13 +2725,19 @@ func (c *conn) cmdAppend(tag, cmd string, p *parser) {
        }()
        defer c.xtrace(mlog.LevelTracedata)()
        mw := message.NewWriter(msgFile)
-       msize, err := io.Copy(mw, io.LimitReader(c.br, size))
+       var b = make([]byte, size)
+       _, err = io.ReadFull(io.LimitReader(c.br, size), b)
+       msize, e := mw.Write(b)
+       b = nil
+       if err == nil && e != nil {
+               err = e
+       }
        c.xtrace(mlog.LevelTrace) // Restore.
        if err != nil {
                // Cannot use xcheckf due to %w handling of errIO.
                panic(fmt.Errorf("reading literal message: %s (%w)", err, errIO))
        }
-       if msize != size {
+       if int64(msize) != size {
                xserverErrorf("read %d bytes for message, expected %d (%w)", msize, size, errIO)
        }
 
@@ -2772,7 +2778,7 @@ func (c *conn) cmdAppend(tag, cmd string, p *parser) {
                                Received:      tm,
                                Flags:         storeFlags,
                                Keywords:      keywords,
-                               Size:          mw.Size,
+                               Size:          size,
                        }
 
                        ok, maxSize, err := c.account.CanAddMessageSize(tx,  m.Size)

@haraldrudell haraldrudell changed the title mox corrupting 822 email part mox corrupting 822 email part 0.0.9 Feb 5, 2024
@mjl-
Owner

mjl- commented Feb 8, 2024

thanks for staying on this!

— root cause: message.NewWriter cannot handle chunking into multiple Write invocations

i suspect this is the real problem, and what needs fixing. the cr/lf accounting in message.Writer.Write may be lacking.

— more bad news: mox should NEVER NEVER tamper with the rfc822 field

mox requires all messages in the store to end with CRLF. in part because IMAP servers are required to serve messages that way, and in part because the message mime parser would become more complicated. i.e. the more you accept and store malformed messages, the more it ripples through other pieces of code, making them more complicated and leading to bugs, including security issues.

— even more baddies: there's some byte sniffing to figure things out. In Go, if anyone wants to know what's in an email, the enmime library is used

part of the reason mox has its own mime handling code: the parsed form of a message (including byte offsets to the parts) is stored in the message index database. it's all a bit more integrated than in most other software. question: which byte sniffing in mox do you think has issues, and which issues?

i'm going to dive into that message.Writer, and figure out where exactly the problem is.

mjl- added a commit that referenced this issue Feb 8, 2024
…riting a message

message.Writer.Write() adds missing \r's, but the buffer of "last bytes
written" was only being updated while writing the message headers, not while
writing the body. so for Write()'s in the body section (depending on
buffering), we were compensating based on the "last bytes written" as set
during the last write in the header section. that could cause a spurious \r to
be added when a Write starts with \n while the previous Write did properly
end with \r.

for issue #117, thanks haraldrudell for reporting and investigating
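
In terms of the toy model sketched earlier, the fix amounts to refreshing the cross-chunk state at the end of every Write, body chunks included; again a sketch, not the actual mox patch:

// corrected toy Write: remember the last byte of every chunk, so a body
// chunk starting with \n after a chunk that ended with \r is left alone
func (w *fixupWriter) Write(p []byte) (int, error) {
    for i, b := range p {
        prev := w.last
        if i > 0 {
            prev = p[i-1]
        }
        if b == '\n' && prev != '\r' {
            w.out.WriteByte('\r')
        }
        w.out.WriteByte(b)
    }
    if len(p) > 0 {
        w.last = p[len(p)-1] // refreshed for header and body chunks alike
    }
    return len(p), nil
}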
@mjl-
Owner

mjl- commented Feb 8, 2024

i reproduced the problem of the spurious \r and wrote a test that is only triggered without the fix.

@mjl-
Owner

mjl- commented Mar 9, 2024

v0.0.10 is coming soon; it has instructions for finding & fixing up messages that have the \r\r\n line endings. Closing this for now, we can reopen if needed.

@mjl- mjl- closed this as completed Mar 9, 2024