New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multipart text messages not returned fully by ->getMessageBody() #163

Closed
dorianfm opened this Issue Oct 13, 2017 · 7 comments

Comments

Projects
None yet
2 participants
@dorianfm
Copy link
Contributor

dorianfm commented Oct 13, 2017

I've noticed that if you have a message which has an inline attachment only the initial text of that email is returned, rather than all text parts concatenated together, and there is no straightforward way - that I can see - to get the concatenated body with the existing code.

For my purposes I've hacked Parser->getMessageBody() as follows:

    if (in_array($type, array_keys($mime_types))) {
        foreach ($this->parts as $partId => $part) {
            if ($this->getPart('content-type', $part) == $mime_types[$type]
                && $this->getPart('content-disposition', $part) != 'attachment'
                && !$this->partIdIsChildOfAnAttachment($partId)
                ) {
                $headers = $this->getPart('headers', $part);
                $encodingType = array_key_exists('content-transfer-encoding', $headers) ?
                $headers['content-transfer-encoding'] : '';
                if (is_array($encodingType)) {
                    $encodingType = $encodingType[0];
                }
                $undecoded_body = $this->decodeContentTransfer($this->getPartBody($part), $encodingType);
                $body .= $this->charset->decodeCharset($undecoded_body, $this->getPartCharset($part));
            }
        }
    } else {
        throw new Exception('Invalid type specified for getMessageBody(). "type" can either be text or html.');
    }

I.E. set it not to break after the first matching part and to concatenate onto $body.

which works for me, for text parts, and HTML emails I've tested, but not sure this is clean enough for a PR - it certainly passes the existing tests, but the change in how this works may be a deal breaker for some. Any thoughts much appreciated.

(PS. about to go on holiday for a fortnight so won't check back on this for a while, I'm not ignoring you)

@eXorus

This comment has been minimized.

Copy link
Member

eXorus commented Oct 14, 2017

Hello,

Thanks for your contribution, could you send me the raw email and give a link to the RFC that say that we need to concatenate the body ?

@eXorus eXorus added the question label Oct 14, 2017

@dorianfm

This comment has been minimized.

Copy link
Contributor

dorianfm commented Oct 24, 2017

I’m not aware of an RFC that explicitly says you need to concatenate the body, or one that says you should not concatenate the body - I'm not really familiar with the RFCs at all.

https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html does say that "A body part is NOT to be interpreted as actually being an RFC 822 message” which, from my interpretation, implies that the whole idea of their being a getMessageBody() method is possibly wrong, as there is no message body, only a pre-amble and multiple parts.

Looking at the source it seems that getMessageBody() and getAttachments() are filtering the content in different ways, and that perhaps there should be a complimentary method, eggetInlineParts($type = 'text') which returns an array of parts that are either 'text/plain' or 'text/html', similar to how getMesageBody($type ='text') works, but returning similarly to getAttachments(), allowing the user to then handle it the response more appropriately in their application.

If you think that's a better approach I'll look at creating some text cases and try to implement it and make a PR.

I attach a raw email example, which is an actual sent email which I've edited the headers on to hide personal information.

Multiple text blocks.txt

@eXorus

This comment has been minimized.

Copy link
Member

eXorus commented Oct 24, 2017

Thanks you.

I don't know yet if it's a good idea. The purpose of this lib is to be easy to understand, when I parse an email I only have one text version and on html version. If you want to have the technical view of the email you can use directly mailparse for that.

But I understand your reply and it's true that there is no sense for getMessageBody().

Maybe another approach will be to get all others text or html in attachment of the email. The first text or html is the main body and others are attachments.

Another solution could be to have
getMessageBody() that called getInlineParts()[0]
public method getInlineParts() if you need it.

@dorianfm

This comment has been minimized.

Copy link
Contributor

dorianfm commented Oct 24, 2017

Thanks for the feedback, I totally understand the purpose as digging into MailParse can be a bit daunting and a bit of a chore, hence why I chose your library and initially suggested the kludge above without thinking it through.

I like either of the solutions you suggest, though would tend towards the latter,

getMessageBody() that called getInlineParts()[0]
public method getInlineParts() if you need it.

As that would fit my use case better. I'll fork and give that a shot and you can see what you feel about merging it.

@eXorus

This comment has been minimized.

Copy link
Member

eXorus commented Oct 24, 2017

Ok let's do it, I will review your PR.

I love to have contribution :)

dorianfm added a commit to theusefularts/php-mime-mail-parser that referenced this issue Oct 24, 2017

@dorianfm

This comment has been minimized.

Copy link
Contributor

dorianfm commented Oct 26, 2017

hey @eXorus I've made the PR and been testing it in production (admittedly not very high load production!) and works well for me, and seems to pass all tests and work compatibly with the old interfaces. Let me know if I messed anything up!

eXorus added a commit that referenced this issue Nov 2, 2017

@eXorus

This comment has been minimized.

Copy link
Member

eXorus commented Nov 2, 2017

@dorianfm Thanks you so much for your help on this issue.

@eXorus eXorus closed this Nov 2, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment