Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JAMES-2997 Extraction of charset cary over in Attachment content type #3316

Closed
wants to merge 8 commits into from

Conversation

chibenwa
Copy link
Member

To retrace the issue:

#3061 (comment)

rouazana 2 days ago Member

why the charset is lost?

chibenwa 2 days ago Author Member

I think because the test no longer position it upon upload.

I can take extra time to check this.

#3061 (comment)

Please note that in previous code some tests where passing because "SetMessages was not returning the values actually stored"

This is due to the buggy shortcut of metadata attachment passing in SetMessageMethodTest

We need the MIMEMessageConverter & MessageParser to carry out charset information.

This logic leads to the code presented here.

#3061 (comment)

rouazana 12 hours ago Member

why only charset must be preserved? Have you a rfc reference somewhere?

chibenwa 3 hours ago Author Member

https://jmap.io/spec-core.html#uploading-binary-data

type: String The media type of the file (as specified in [@!RFC6838], Section 4.2) as set in the Content-Type header of the upload HTTP request.

Looks like it should be the over way around: only keep the mimeType and always discard other informations.

#3061 (comment)

Let's discuss the fundamentals of keeping the charset at all before takling your other comments.

(I would be in favor of merging existing work without JAMES-2997 MessageParser & MIMEMessageConverter should preserve charset )

Would you agree?

Here is the separate pull request ;-)

My position is:

  • close this pull request.
  • Charset should be discarded upon uploads

@rouazana
Copy link

Looks like it should be the over way around: only keep the mimeType and always discard other informations.

I don's see where you read this. This section of the RFC is linked to RFC 6838, which has a chapter about encoding (4.8).

Let's suppose you upload a text file with content type text, charset iso-8859-1, and then you download it with content type text, charset utf-8. What is supposed to happen?

Now I'm a bit worry about the merge of #3061 ...

Anyway if you cannot set a different download charset, you should at least be returned the charset that was use for uploading, else your interpretation of the file can vary...

@chibenwa
Copy link
Member Author

Now I'm a bit worry about the merge of #3061 ...

The merge don't change anything. It don't affect stored things. It just removed a lye (where SetMessages returns the attachment contentType intended to be used, and not the actual one).

@chibenwa
Copy link
Member Author

Actually we should be thankful: removing the lie allow us to see that issue :-)

@chibenwa
Copy link
Member Author

Let's suppose you upload a text file with content type text, charset iso-8859-1, and then you download it with content type text, charset utf-8. What is supposed to happen?

Then if charset option is valid in content-type field, let's merge this.

@chibenwa chibenwa changed the title JAMES-2997 Extraction of cherset cary over in Attachment content type JAMES-2997 Extraction of charset cary over in Attachment content type Apr 21, 2020
@chibenwa
Copy link
Member Author

Ok, it's clear to me that charset is part of the RFC definition of media type,n and that this change is legitimate.

@rouazana
Copy link

can you retrieve the comments I made on the other PR?

@chibenwa
Copy link
Member Author

can you retrieve the comments I made on the other PR?

#3061 (comment)

...n/java/org/apache/james/mailbox/store/mail/model/impl/MessageParser.java
Outdated
@@ -156,7 +160,22 @@ private ParsedAttachment retrieveAttachment(Entity entity) throws IOException {
    }

    private Optional<String> contentType(Optional<ContentTypeField> contentTypeField) {
        return contentTypeField.map(ContentTypeField::getMimeType);
        return contentTypeField.map(this::contentTypePreserveCharset);
@rouazana
rouazana 23 hours ago Member

why only charset must be preserved? Have you a rfc reference somewhere?

#3061 (comment)

mailbox/store/src/test/resources/eml/charset.eml
Outdated
LVVTOldpbGwgYmUgYmV0dGVyIGZvciBtZVwsIHRoeFxuDQpFTkQ6VkVWRU5UDQpFTkQ6VkNBTEVO
REFSDQo=
--_000_AM5P190MB03542A58E344F2C68475A951AFD20AM5P190MB0354EURP_--
@rouazana
rouazana 23 hours ago Member

what about having a test with different charsets in different attachments?

#3061 (comment)

...t/java/org/apache/james/jmap/draft/methods/MIMEMessageConverterTest.java
Outdated
@@ -645,6 +645,54 @@ void convertToMimeShouldAddAttachment() throws Exception {
                });
        }

        @Test
        void test() throws Exception {
@rouazana
rouazana 23 hours ago Member

name?

@chibenwa
Copy link
Member Author

why only charset must be preserved? Have you a rfc reference somewhere?

We can preserve all if you prefer.

Previous implementations where not preserving charset upon:
 - composing a message via JMAP
 - storing an attachment related to a mailbox

Note that JMAP attachment upload was preserving charset.

**master** is also affected by this issue but the problem was hidden as
SetMessages was returning the expected attachment metadata and not the
actual one.
@mbaechler
Copy link

  • storing an attachment related to a mailbox

What does it mean?

@chibenwa
Copy link
Member Author

storing an attachment related to a mailbox

What does it mean?

I miss context for explaining the quote. A ctrl+f on it did not highlight anything.

.body(createdPath + ".attachments", hasSize(2))
.extract().asString();

assertThatJson(json)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just assert the few paths we want to check rather than setting so many parameters to exclude things?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to do that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonpath + individual assertions

@rouazana
Copy link

why only charset must be preserved? Have you a rfc reference somewhere?

We can preserve all if you prefer.

That's not a question of preference. That's a question of rightness. We are implementing protocols, what is the right solution? Personally I don't know, that's why I ask.
As you carry this change, I ask you to search for the good solution, respecting standards. If you don't know and don't find the right information in the RFC, you can look at what other products are doing or ask on different mailing lists for help. But you should not care of my personal preference.

@chibenwa
Copy link
Member Author

[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [ERROR] src/main/java/org/apache/james/mailbox/store/mail/model/impl/MessageParser.java:[22,15] (imports) UnusedImports: Unused import - org.apache.james.mime4j.dom.field.ContentTypeField.PARAM_CHARSET.
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] 
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] <<< maven-source-plugin:3.1.0:jar (attach-sources) < generate-sources @ james-server-mailetcontainer-api <<<
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] 
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] 
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] --- maven-source-plugin:3.1.0:jar (attach-sources) @ james-server-mailetcontainer-api ---
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [INFO] Building jar: /james-project/server/mailet/mailetcontainer-api/target/james-server-mailetcontainer-api-3.6.0-SNAPSHOT-sources.jar
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [ERROR] src/main/java/org/apache/james/mailbox/store/mail/model/impl/MessageParser.java:[29,8] (imports) UnusedImports: Unused import - java.util.Map.
[489fbe6b9cb3d671920edfcc8628d2628d6eadd0] [ERROR] src/main/java/org/apache/james/mailbox/store/mail/model/impl/MessageParser.java:[44,8] (imports) UnusedImports: Unused import - org.apache.james.mime4j.field.Fields.

@chibenwa
Copy link
Member Author

[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280] [ERROR] Failures: 
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280] [ERROR]   CassandraGetMessagesMethodTest 
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280] Expecting:
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280]  <"image/jpeg; name="4037_014.jpg"">
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280] to be equal to:
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280]  <"image/jpeg">
[fcaa50f2e9dd9ade4a14c02c3abdcb3c6f0dd280] but was not.

Copy link

@mbaechler mbaechler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks ok (there's only the complex assertion that I would like to be fixed)

@chibenwa
Copy link
Member Author

(there's only the complex assertion that I would like to be fixed)

I had a look. I don't see easy wins

@chibenwa
Copy link
Member Author

DiscardProtocolTest.testRecord:84 » Timed out wait for be notified that read ...

test this please

@chibenwa chibenwa added the waiting_merge We are about to merge this! label Apr 24, 2020
@chibenwa
Copy link
Member Author

Merged

@chibenwa chibenwa closed this Apr 27, 2020
@chibenwa
Copy link
Member Author

RFC notes on this work:

https://jmap.io/spec-core.html#uploading-binary-data

Defines type as :

type: String The media type of the file (as specified in [@!RFC6838], Section 4.2) as set in the Content-Type header of the upload HTTP request.

(hence my initial mis-understanding)

https://tools.ietf.org/html/rfc6838 only defined registration procedures and is thus of little help.

https://tools.ietf.org/html/rfc2045#section-5 helps

   The purpose of the Content-Type field is to describe the data
   contained in the body fully enough that the receiving user agent can
   pick an appropriate agent or mechanism to present the data to the
   user, or otherwise deal with the data in an appropriate manner. The
   value in this field is called a media type.

Espacially the last part:

The
   value in this field is called a media type.

Thus the JMAP quote above means "use the full header"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting_merge We are about to merge this!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants