[MOB] JAMES-2997 Remove byte array from attachment #3061

Conversation

chibenwa
Member

Massive effort to port our MOB into a viable pull request (wooot!)

I did my best to clean up the (very) messy history inherited from the MOB but did not manage to re-enable tests as part of the primary work.

@chibenwa chibenwa added perf Contributes some performance enhencements requires-load-testing Running load testing to back these changes is required for adoption labels Jan 22, 2020
@chibenwa chibenwa added this to the Sprint 14 - Robusta beans milestone Jan 22, 2020
@chibenwa chibenwa self-assigned this Jan 22, 2020
@chibenwa chibenwa force-pushed the mob-programming-remove-byte-array-from-attachment branch 2 times, most recently from 9e30bde to ea3b883 Compare January 22, 2020 06:39
Member

@Arsnael Arsnael left a comment

Read it

@Arsnael
Member

Arsnael commented Jan 22, 2020

[ea3b88335f055649169f02ea3c5f0f3dd2958700] [ERROR] Failures: 
[ea3b88335f055649169f02ea3c5f0f3dd2958700] [ERROR]   IndexableMessageTest.hasAttachmentsShouldReturnTrueWhenPropertyIsPresentAndTrue:334 
[ea3b88335f055649169f02ea3c5f0f3dd2958700] Expecting:
[ea3b88335f055649169f02ea3c5f0f3dd2958700]  <false>
[ea3b88335f055649169f02ea3c5f0f3dd2958700] to be equal to:
[ea3b88335f055649169f02ea3c5f0f3dd2958700]  <true>
[ea3b88335f055649169f02ea3c5f0f3dd2958700] but was not.

@trantienduchn

It's too big. Can you split it into readable parts?

@chibenwa
Member Author

It's too big. Can you split it into readable parts?

Not sure how to do that: it comes as a "big ball of mud", and without disabling features it will be hard to get that done.

@rouazana rouazana left a comment

Overall nice, but:

  • I'm pretty sure you can extract some refactoring steps into other PRs
  • do you have some Gatling tests to see if it improves things?
    (I've not finished reviewing all the changes)

@@ -73,7 +73,7 @@ static CassandraMailboxManager createMailboxManager(CassandraMailboxSessionMappe
         SessionProviderImpl sessionProvider = new SessionProviderImpl(mock(Authenticator.class), mock(Authorizator.class));
 
         QuotaComponents quotaComponents = QuotaComponents.disabled(sessionProvider, mapperFactory);
-        MessageSearchIndex index = new SimpleMessageSearchIndex(mapperFactory, mapperFactory, new DefaultTextExtractor());
+        MessageSearchIndex index = new SimpleMessageSearchIndex(mapperFactory, mapperFactory, new DefaultTextExtractor(), null);
Member

why null ?

Member Author

To not bother building a complicated value.


import org.apache.commons.lang3.NotImplementedException;

public class SizeInputStream extends InputStream {

We can probably use ByteSource for this kind of use case.

Member Author

ByteSource is not applicable to every kind of InputStream (think JMAP attachment download).

Should we add a copy to a temporary file in the path?
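
To make the temporary-file idea concrete, here is a minimal JDK-only sketch. It is not the PR's `SizeInputStream`: the class name `SpooledContent` and its methods are hypothetical. It spools a one-shot `InputStream` to disk once, which gives the size and a stream that can be re-opened, which is roughly what a `ByteSource`-style API needs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of James: spools a one-shot stream to a temp file.
final class SpooledContent implements AutoCloseable {
    private final Path file;
    private final long size;

    private SpooledContent(Path file, long size) {
        this.file = file;
        this.size = size;
    }

    // Consumes the stream exactly once; Files.copy returns the number of bytes copied.
    static SpooledContent from(InputStream oneShot) throws IOException {
        Path tmp = Files.createTempFile("attachment-", ".bin");
        try {
            long size = Files.copy(oneShot, tmp, StandardCopyOption.REPLACE_EXISTING);
            return new SpooledContent(tmp, size);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    long size() {
        return size;
    }

    // Unlike the original stream, this can be called repeatedly.
    InputStream openStream() throws IOException {
        return Files.newInputStream(file);
    }

    // Removes the temporary file.
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

The cost of this approach is one extra disk write per attachment, which would need to be weighed against the memory saved by not holding a byte array.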


return mapperFactory.getMessageMapper(session).execute(() -> {
storeAttachment(message, messageAttachments, session);
List<MessageAttachment> attachments = storeAttachments(messageId, content, session);
MailboxMessage message = createMessage(internalDate, size, bodyStartOctet, content, flags, propertyBuilder, attachments);

I really think that it's not the responsibility of storeAttachments to generate the MessageAttachment ids. It makes everything much more complex than it should be.

Member Author

It's a common pattern for us to have the storage layer generate ids.

I don't really see alternatives regarding this...

We had the same problem with blob-store recently. We decided to give the responsibility to the client. It's IMO a very important design decision.
In the case of this PR it will make some things simpler.

I guess I agree. It has not been fixed?
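
The two options discussed in this thread can be sketched as follows. This is a hypothetical illustration, not James code: `AttachmentId`, `StoreGeneratedIds`, `ClientGeneratedIds` and `InMemoryStore` are made-up names. With client-generated ids, the caller can build the message's attachment list before anything is stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical value type standing in for an attachment identifier.
final class AttachmentId {
    final String value;

    AttachmentId(String value) {
        this.value = value;
    }

    static AttachmentId random() {
        return new AttachmentId(UUID.randomUUID().toString());
    }
}

// Option A: the storage layer picks the id and hands it back.
interface StoreGeneratedIds {
    AttachmentId store(byte[] content);
}

// Option B: the client picks the id up front and the store persists under it.
interface ClientGeneratedIds {
    void store(AttachmentId id, byte[] content);
}

// Toy in-memory store implementing both styles, for comparison only.
final class InMemoryStore implements StoreGeneratedIds, ClientGeneratedIds {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public AttachmentId store(byte[] content) {
        AttachmentId id = AttachmentId.random();
        blobs.put(id.value, content);
        return id;
    }

    @Override
    public void store(AttachmentId id, byte[] content) {
        blobs.put(id.value, content);
    }

    byte[] read(AttachmentId id) {
        return blobs.get(id.value);
    }
}
```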


public AttachmentWithBytes(Attachment attachment, byte[] bytes) {
this.attachment = attachment;
this.bytes = bytes;

how does it relate to the commit title?

@chibenwa chibenwa force-pushed the mob-programming-remove-byte-array-from-attachment branch from 2067ed0 to 8e383a3 Compare February 4, 2020 04:52
@chibenwa
Member Author

chibenwa commented Feb 4, 2020

(Force-pushed to try to get a 🍏 build... there was a rebase conflict.)

@Arsnael
Member

Arsnael commented Feb 6, 2020

tests are 🍏

[5cc9699085f400f2ee5ef28c64425c8f43b492b5] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-jar-plugin:3.1.2:jar (default-jar) on project apache-james-mpt-core: Execution default-jar of goal org.apache.maven.plugins:maven-jar-plugin:3.1.2:jar failed: Unable to load the mojo 'jar' (or one of its required components) from the plugin 'org.apache.maven.plugins:maven-jar-plugin:3.1.2': com.google.inject.ProvisionException: Unable to provision, see the following errors:

test this please

@chibenwa
Member Author

chibenwa commented Feb 6, 2020

test this please

@@ -35,7 +31,7 @@

     @FunctionalInterface
     interface RequireContent {
-        RequireName content(InputStream stream);
+        RequireName content(byte[] bytes);

The PR is named "Remove byte array from attachment" and here we put it back, apparently.
Did I miss something?

Member Author

@chibenwa chibenwa Feb 6, 2020

It is not an attachment stored in the mailbox-api but the result of parsing.

Different topic! We do not retrieve these bytes on every read!

Does that mean that parsing is no longer streamed, while storing is now streamed?

@Arsnael
Member

Arsnael commented Feb 7, 2020

[5cc9699085f400f2ee5ef28c64425c8f43b492b5] [ERROR] Errors: 
[5cc9699085f400f2ee5ef28c64425c8f43b492b5] [ERROR]   WebAdminUtilsTest.serverShouldBeAbletoStartConcurrently:38 » NotTerminated

test this please

@chibenwa
Member Author

chibenwa commented Feb 7, 2020

test this please

@chibenwa
Member Author

chibenwa commented Feb 7, 2020

:green: wooot!

@chibenwa
Member Author

Conclusion: using a provisioned workload, I notice a 60% decrease in p99 GetMessages response time.

Before

[image: av]

After

[image: ap]

@rouazana

Conclusion: using a provisioned workload, I notice a 60% decrease in p99 GetMessages response time.

Did you notice the big difference for ListMessages? It's really bad, no? (mean time 40ms -> 169ms)

@chibenwa
Member Author

Did you notice the big difference for ListMessages? It's really bad, no? (mean time 40ms -> 169ms)

I have no idea where it comes from, to be honest.

@rouazana

I have no idea where it comes from, to be honest.

It's really annoying to praise a 60% improvement on the p99 while not looking at a 4x performance decrease in median and mean!

Either you try again to reproduce your issue, or it's worthless to do performance testing at this level.
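
To illustrate how a tail percentile and the mean can move in opposite directions, here is a small self-contained sketch. The latency samples are invented for the example and are not the measurements from this PR; `LatencyStats` and the nearest-rank percentile definition are assumptions.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the load-testing tooling.
final class LatencyStats {
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Invented samples: 98 requests at `typical` ms plus two slow ones at `slow` ms.
    static long[] synthetic(long typical, long slow) {
        long[] samples = new long[100];
        Arrays.fill(samples, typical);
        samples[98] = slow;
        samples[99] = slow;
        return samples;
    }
}
```

With `synthetic(40, 1000)` as "before" and `synthetic(160, 400)` as "after", the p99 falls from 1000 ms to 400 ms while the mean rises from 59.2 ms to 164.8 ms, so both statistics need to be read together.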

@chibenwa
Member Author

Either you try again to reproduce your issue, or it's worthless to do performance testing at this level.

Agreed, I need to redo it; that's why I did not remove the label, btw.

@chibenwa chibenwa force-pushed the mob-programming-remove-byte-array-from-attachment branch from 28e9d85 to 974ac25 Compare February 28, 2020 03:43
@chibenwa chibenwa force-pushed the mob-programming-remove-byte-array-from-attachment branch from d111a90 to 147fb9b Compare April 21, 2020 04:21
@chibenwa
Member Author

Force-pushed to solve a conflict and remove the charset commit.

@chibenwa
Member Author

#3316 discusses fixing the charset issue (if needed) separately.

@chibenwa
Member Author

I still have a question regarding performance (which is the main point of this PR).
Looking at #3061 (comment) and #3061 (comment) the after results are very similar, but the before results are very different. Why?

Running perf tests on hosted VMs can trivially explain the difference between two runs separated in time (several hours difference between the two runs in the first one).

Do we have a plan for a perf testing platform?

Also, the results of run #2 are consistent with what we can expect from the changes performed (a p99 improvement of GetMessages only, but an important one). It can be trivially explained by the fact that we no longer load useless attachments.

Thus I have no concerns accepting the second one.

If you still have concerns, is my systematically running perf tests useful, or am I just wasting my time?

@rouazana

If you still have concerns, is my systematically running perf tests useful, or am I just wasting my time?

Do you think my goal is to make you waste time?

I see some (very) strange things, so I ask for an explanation.

If I look at the issue globally, either this PR is causing some performance troubles with getMessageList in this context, or we have a global performance issue on current master. I would like us to dig into the issue instead of grumbling about VMs.

*/
Pair<MessageMetaData, Optional<List<MessageAttachmentMetadata>>> appendMessageToStore(Mailbox mailbox, Date internalDate, int size, int bodyStartOctet, SharedInputStream content, Flags flags, PropertyBuilder propertyBuilder, MailboxSession session) throws MailboxException;

/**
* MessageStorer to parsing, storing and returning AttachmentMetadata
Member

Either remove the "to", or don't conjugate the verbs parse, store and return.

@rouazana rouazana left a comment

Some details now. It's globally OK for me even if I'm worried about the performance issue.

* If supported by the underlying implementation, this method will parse the messageContent to retrieve associated
* attachments and will store them.
*
* Otherwise an empty optional will be returned on the right side of the pair.
*/
Pair<MessageMetaData, Optional<List<MessageAttachmentMetadata>>> appendMessageToStore(Mailbox mailbox, Date internalDate, int size, int bodyStartOctet, SharedInputStream content, Flags flags, PropertyBuilder propertyBuilder, MailboxSession session) throws MailboxException;

If it's at the implementation level, why not Optional<Pair<MessageMetaData, List<MessageAttachmentMetadata>>>?

Member Author

You always have the message metadata but the underlying implementation might not return you attachment metadata (because it does not support it)
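
A minimal sketch of the distinction, using simplified stand-in types: `AppendResult` is hypothetical, and plain strings replace `MessageMetaData` and `MessageAttachmentMetadata`. The message metadata is always present; only the attachment list is optional.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for Pair<MessageMetaData, Optional<List<MessageAttachmentMetadata>>>.
final class AppendResult {
    final String messageMetaData;              // always present
    final Optional<List<String>> attachments;  // empty when the store does not handle attachments

    private AppendResult(String messageMetaData, Optional<List<String>> attachments) {
        this.messageMetaData = messageMetaData;
        this.attachments = attachments;
    }

    // A store that parses and stores attachments reports them, possibly as an empty list.
    static AppendResult withAttachments(String metaData, List<String> attachments) {
        return new AppendResult(metaData, Optional.of(List.copyOf(attachments)));
    }

    // A store without attachment support still returns the message metadata.
    static AppendResult withoutAttachmentSupport(String metaData) {
        return new AppendResult(metaData, Optional.empty());
    }
}
```

An `Optional<Pair<...>>` could not express the second case, because an empty optional would drop the message metadata as well.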

@chibenwa
Member Author

chibenwa commented Apr 21, 2020

If I look at the issue globally, either this PR is causing some performance troubles with getMessageList in this context, or we have a global performance issue on current master. I would like us to dig into the issue instead of grumbling about VMs.

#3266 (comment) adds significant performance gains by moving GetMessageList into reactive style

Before

[image: 79181996-e47e3980-7e37-11ea-9503-4f80885acc80]

After

[image: 79181989-e0521c00-7e37-11ea-839a-a2de5015a67a]

Looking at the perf test result that does not benefit from (amongst others) this improvement, the differences in performance can be explained by the different load (not doing 5 GetMessages).

[image: 74133916-2e6e5780-4c1c-11ea-801f-be75b4530727]

This seems consistent to me.

And we don't have a performance issue with GetMessageList on master, as #3266 (comment) proves.

@chibenwa
Member Author

Some details now. It's globally OK for me even if I'm worried about the performance issue.

Again, it's not exactly the same load as the other (recent) perf tests you have been seeing, and the latest test on this PR lacks most of the recent optimizations: I'm not worried.

@chibenwa chibenwa added the waiting_merge We are about to merge this! label Apr 21, 2020
@chibenwa
Member Author

Merged

@chibenwa chibenwa closed this Apr 22, 2020
Labels
perf Contributes some performance enhencements waiting_merge We are about to merge this!
6 participants