Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text search (with MySQL) is too slow #14569

Open
jonepet opened this issue May 14, 2020 · 4 comments
Open

Text search (with MySQL) is too slow #14569

jonepet opened this issue May 14, 2020 · 4 comments
Labels
Bug Report/Open Bug report/issue

Comments

@jonepet
Copy link

jonepet commented May 14, 2020

Summary

After importing about 600 000 messages from a Hipchat instance a simple word search is timing out, with database query running for 80+ seconds. This is way less than the expected 2,5M messages before migrating to Elasticsearch is recommended in the docs.

Steps to reproduce

Import a large number of chat history into a new instance using MySQL (5.7)
Try to run a simple word search

Expected behavior

Search results should be returned in a reasonable time

Observed behavior (that appears unintentional)

No search results are returned (times out after 30 seconds), database query still runs for 80+ seconds with 100% CPU

Possible fixes

Query run when searching runs for 80 seconds as-is, but if removing the replycount part it returns in less than 80ms,

`
SELECT
                                * ,(SELECT COUNT(Posts.Id) FROM Posts WHERE q2.RootId = '' AND Posts.RootId = q2.Id AND Posts.DeleteAt = 0) as ReplyCount
                        FROM
                                Posts q2
                        WHERE
                                DeleteAt = 0
                                AND Type NOT LIKE 'system_%'

                                AND ChannelId IN (
                                        SELECT
                                                Id
                                        FROM
                                                Channels,
                                                ChannelMembers
                                        WHERE
                                                Id = ChannelId
                                                        AND (TeamId = 'btbg9cfyd7b7pmrzy9ebx3j34o' OR TeamId = '')
                                                        AND UserId = 'ugzns7ixr3gdixmk7kar79nzny'
                                                        AND DeleteAt = 0

                                                        )

                                AND MATCH (Message) AGAINST ('+test' IN BOOLEAN MODE)
                                ORDER BY CreateAt DESC
                        LIMIT 100;
`
returns in 80+ seconds

Remove reply count part:

`SELECT
                                *
                        FROM
                                Posts q2
                        WHERE
                                DeleteAt = 0
                                AND Type NOT LIKE 'system_%'

                                AND ChannelId IN (
                                        SELECT
                                                Id
                                        FROM
                                                Channels,
                                                ChannelMembers
                                        WHERE
                                                Id = ChannelId
                                                        AND (TeamId = 'btbg9cfyd7b7pmrzy9ebx3j34o' OR TeamId = '')
                                                        AND UserId = 'ugzns7ixr3gdixmk7kar79nzny'
                                                        AND DeleteAt = 0

                                                        )

                                AND MATCH (Message) AGAINST ('+test' IN BOOLEAN MODE)
                                ORDER BY CreateAt DESC
                        LIMIT 100;`
and the query returns in <100ms, which means the query is not slow on the full text index search, but the searching for replies.

Possible fix: Add database index covering both "RootId" and "DeleteAt" on "Posts" table and the original query returns in <100ms

@agnivade
Copy link
Member

Thank you for filing the issue @jonepet. I have created https://mattermost.atlassian.net/browse/MM-25239 to track this. Feel free to subscribe to the ticket to be up to date with its progress.

@agnivade agnivade added the Bug Report/Open Bug report/issue label May 15, 2020
@amarjeetanandsingh
Copy link
Contributor

Hi @agnivade

If we have enough data in the MySQL docker (that we start in local for development) to reproduce this issue, I would like to investigate this issue if allowed.

Or maybe we can have access to some remote db with enough data?

@agnivade
Copy link
Member

Sure, please feel free to investigate and send a PR. Thanks a lot!

@amarjeetanandsingh
Copy link
Contributor

amarjeetanandsingh commented Jun 26, 2020

Hi @agnivade Could you please help me find this query in the codebase?

Also, could you please help me setup enough test data to test the execution time of this query?
Because select count(*) from mattermost_test.Posts; returns 0 , in the mysql database that we start with make run-server

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Report/Open Bug report/issue
Projects
None yet
Development

No branches or pull requests

3 participants