Optimize queries and indexes on posts table #13217
Added RootId column to two indexes. The query in getParentPosts is split into two queries; for big channels this can be ~1000x faster in MySQL, but it is a bit slower in PostgreSQL. Rewrote the query in GetPostsSince for MySQL; it is around 20% faster.

Benchmark results (/1 - 140k posts in channel, /2 - 800 posts):

benchmark                                                      old ns/op   new ns/op   delta
BenchmarkPosts/postgres/GetFlaggedPostsForTeam/1-16            12068006    12130615    +0.52%
BenchmarkPosts/postgres/GetFlaggedPostsForChannel/1-16         7334992     7388359     +0.73%
BenchmarkPosts/postgres/GetPosts(skipThreads=true)/1-16        1845547     1979362     +7.25%
BenchmarkPosts/postgres/GetPosts(skipThreads=false)/1-16       2260061     2595112     +14.82%
BenchmarkPosts/postgres/GetPostsSince(skipThreads=true)/1-16   38510212    40625368    +5.49%
BenchmarkPosts/postgres/GetPostsSince(skipThreads=false)/1-16  32389821    32581044    +0.59%
BenchmarkPosts/postgres/GetFlaggedPostsForTeam/2-16            1604215     1584941     -1.20%
BenchmarkPosts/postgres/GetFlaggedPostsForChannel/2-16         1278623     1277473     -0.09%
BenchmarkPosts/postgres/GetPosts(skipThreads=true)/2-16        1921049     1984581     +3.31%
BenchmarkPosts/postgres/GetPosts(skipThreads=false)/2-16       3478147     3000086     -13.74%
BenchmarkPosts/postgres/GetPostsSince(skipThreads=true)/2-16   4813332     5198276     +8.00%
BenchmarkPosts/postgres/GetPostsSince(skipThreads=false)/2-16  3475847     3816201     +9.79%
BenchmarkPosts/mysql/GetFlaggedPostsForTeam/1-16               9674132     9708361     +0.35%
BenchmarkPosts/mysql/GetFlaggedPostsForChannel/1-16            5780763     5818874     +0.66%
BenchmarkPosts/mysql/GetPosts(skipThreads=true)/1-16           2261194     2268826     +0.34%
BenchmarkPosts/mysql/GetPosts(skipThreads=false)/1-16          2371804023  3184120     -99.87%
BenchmarkPosts/mysql/GetPostsSince(skipThreads=true)/1-16      35552813    27709811    -22.06%
BenchmarkPosts/mysql/GetPostsSince(skipThreads=false)/1-16     28758400    22622865    -21.33%
BenchmarkPosts/mysql/GetFlaggedPostsForTeam/2-16               1174064     1205933     +2.71%
BenchmarkPosts/mysql/GetFlaggedPostsForChannel/2-16            1007026     1091551     +8.39%
BenchmarkPosts/mysql/GetPosts(skipThreads=true)/2-16           2274397     2408730     +5.91%
BenchmarkPosts/mysql/GetPosts(skipThreads=false)/2-16          7454395     2542741     -65.89%
BenchmarkPosts/mysql/GetPostsSince(skipThreads=true)/2-16      8879200     6843435     -22.93%
BenchmarkPosts/mysql/GetPostsSince(skipThreads=false)/2-16     6293932     5373276     -14.63%
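For readers who want a picture of the "two queries" idea in the description, here is a minimal sketch in Go. It is not the code from this PR: the `Post` struct, the helper names `collectRootIds` and `buildParentsQuery`, and the SQL text are all assumptions made for illustration. The idea shown is to gather the distinct root ids from an already-fetched page of posts, then fetch those parents with a separate `IN (...)` lookup instead of a join.

```go
package main

import (
	"fmt"
	"strings"
)

// Post is a pared-down, hypothetical stand-in for the real post model.
type Post struct {
	Id     string
	RootId string
}

// collectRootIds returns the distinct, non-empty RootIds of the given posts,
// in the order they are first seen.
func collectRootIds(posts []Post) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, p := range posts {
		if p.RootId != "" && !seen[p.RootId] {
			seen[p.RootId] = true
			ids = append(ids, p.RootId)
		}
	}
	return ids
}

// buildParentsQuery builds the second query: a lookup of the parent posts by id.
// The SQL is illustrative only; "?" placeholders follow the MySQL convention.
func buildParentsQuery(rootIds []string) (string, []interface{}) {
	if len(rootIds) == 0 {
		// No threads on this page, so the second query can be skipped entirely.
		return "", nil
	}
	placeholders := make([]string, len(rootIds))
	args := make([]interface{}, len(rootIds))
	for i, id := range rootIds {
		placeholders[i] = "?"
		args[i] = id
	}
	query := "SELECT * FROM Posts WHERE Id IN (" + strings.Join(placeholders, ", ") + ")"
	return query, args
}

func main() {
	// Pretend this page came back from the first query.
	page := []Post{
		{Id: "a"},
		{Id: "b", RootId: "r1"},
		{Id: "c", RootId: "r1"},
		{Id: "d", RootId: "r2"},
	}
	query, args := buildParentsQuery(collectRootIds(page))
	fmt.Println(query, args)
}
```

In this sketch a page with no replies produces no second query at all, which is one plausible reason a split like this behaves well when there are few threads.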
This issue has been automatically labelled "stale" because it hasn't had recent activity. /cc @jasonblais @hanzei
Sorry for the delay @Pomyk. I'm queuing your PR for dev review.
@Pomyk - looks great! Can you sync with master please?
@lieut-data - what do you think?
Synced. I wanted to check one more edge case - threads with a lot of responses.
With a big thread it's still possible for MySQL to choose a slow plan with index merge, but it's a rather small edge case. I have also found some other small optimizations for posts but will make another PR for that.
@Pomyk / @reflog, if needed, I can provide a more holistic review, but for now let me make sure to surface the history of optimizing this particular function: https://community.mattermost.com/core/pl/ezfwp364ct8bbewbji3ikpusrr and then https://community.mattermost.com/core/pl/n8bfq7wq77rkff8mgrrstqeuye. TL;DR: customers that eschew threads altogether ran into severe performance issues from my original changes. Not saying that will happen here, but just wanted to call out the context.
@lieut-data I've read that; it was one of the reasons I looked at that function. The proposed solution should work very well when there are few threads, as it loads root ids separately. The main downside is 5-20% slower execution on PostgreSQL due to two queries instead of one.
If that's the only problem we could probably leverage |
because new version using two queries is around 15% slower.
Restored old version of getParentsPosts for postgres. Updated benchmark results:
Overall looks great, thanks @Pomyk 🎉, just a couple of suggestions.
@grundleborg pointed out that changing indexes on the Posts table can take a long time for some clients, so it might not be a good idea to do it here. Both PostgreSQL and MySQL can add indexes concurrently, but that isn't used in Mattermost.
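As a rough illustration of what "adding indexes concurrently" looks like in each database, here is a small Go sketch that only builds the DDL strings. The helper `concurrentIndexDDL`, the index name, and the column list are hypothetical and not taken from this PR. The statements themselves use PostgreSQL's `CREATE INDEX CONCURRENTLY` and MySQL's InnoDB online DDL clauses (`ALGORITHM=INPLACE, LOCK=NONE`).

```go
package main

import "fmt"

// concurrentIndexDDL returns a statement that builds an index while still
// allowing writes to the table. All identifiers passed in are trusted
// constants in this sketch; it does no quoting or escaping.
func concurrentIndexDDL(driver, indexName, table, columns string) (string, error) {
	switch driver {
	case "postgres":
		// Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
		return fmt.Sprintf("CREATE INDEX CONCURRENTLY %s ON %s (%s)", indexName, table, columns), nil
	case "mysql":
		// InnoDB online DDL: build in place and allow concurrent reads and writes.
		return fmt.Sprintf("ALTER TABLE %s ADD INDEX %s (%s), ALGORITHM=INPLACE, LOCK=NONE", table, indexName, columns), nil
	default:
		return "", fmt.Errorf("unsupported driver %q", driver)
	}
}

func main() {
	// Index name and columns are made up for the example.
	for _, driver := range []string{"postgres", "mysql"} {
		ddl, err := concurrentIndexDDL(driver, "idx_posts_example", "Posts", "ChannelId, RootId")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ddl)
	}
}
```

Even with these options an index build on a very large table still takes time and I/O; the difference is that writes are not blocked for the duration of the build.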
Hello @Pomyk - sorry to jump in here. I just have a small request: could you please post the code used for the benchmark in the PR commit message, so that it remains there for posterity? Thanks!
@agnivade I have the code here: https://github.com/Pomyk/mattermost-server/commits/store_benchmarks
Thanks! I was wondering how we could include that code as part of this PR. I think it would be okay to include micro-benchmarks like these in the codebase itself so that future improvements can be easily checked. It can be done in a future PR of course; it doesn't need to block this one. But this change has some numbers included in it, and I'm afraid that if we lose track of how to reliably get those numbers again, somebody in the future might have a hard time. @streamer45 @lieut-data - thoughts?
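To make the suggestion about in-tree micro-benchmarks concrete, here is a minimal sketch of how nested Go sub-benchmarks can produce names shaped like the ones in the results table. It is not the benchmark code linked above: `getPosts` is a hypothetical placeholder that touches no database, and `caseName` and `postCounts` are invented for the example. In a real codebase `BenchmarkPosts` would live in a `_test.go` file and be run with `go test -bench`.

```go
package main

import (
	"fmt"
	"testing"
)

// postCounts mirrors the two channel sizes in the PR description (/1 and /2).
var postCounts = []int{140000, 800}

// caseName builds a sub-benchmark name in the style of the results table.
func caseName(method string, skipThreads bool, variant int) string {
	return fmt.Sprintf("%s(skipThreads=%t)/%d", method, skipThreads, variant)
}

// getPosts is a hypothetical placeholder for the store call under test.
// It only returns a number so the sketch runs without a database.
func getPosts(channelSize int, skipThreads bool) int {
	limit := 60
	if channelSize < limit {
		limit = channelSize
	}
	if skipThreads {
		return limit
	}
	return limit * 2
}

// BenchmarkPosts registers one sub-benchmark per driver, flag, and channel size.
func BenchmarkPosts(b *testing.B) {
	for _, driver := range []string{"postgres", "mysql"} {
		driver := driver
		b.Run(driver, func(b *testing.B) {
			for i, size := range postCounts {
				for _, skip := range []bool{true, false} {
					size, skip := size, skip
					b.Run(caseName("GetPosts", skip, i+1), func(b *testing.B) {
						for n := 0; n < b.N; n++ {
							getPosts(size, skip)
						}
					})
				}
			}
		})
	}
}

func main() {
	// Print the sub-benchmark names this layout would register.
	for _, driver := range []string{"postgres", "mysql"} {
		for i := range postCounts {
			for _, skip := range []bool{true, false} {
				fmt.Println("BenchmarkPosts/" + driver + "/" + caseName("GetPosts", skip, i+1))
			}
		}
	}
}
```

Saving the output of two runs to files and comparing them with a tool such as benchstat is one way to reproduce an old/new/delta table later.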
Here's a Jira ticket to review and test the changes for this PR. Thanks @Pomyk for the contribution! Note that the review may not occur until a few weeks from now.
Can you help with test steps / information for this PR please?
@lindy65 I don't think it's QA testable; it's a performance change that will be tested during load-testing.
Agreed. From a QA point of view I would probably only test that fetching posts/threads is working as expected since we are touching that part.
Thanks @reflog @streamer45 - passing this from QA's perspective then 👍
Thanks @Pomyk!