Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-52867] feat: add advisor warn log for Elasticsearch #23729

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

fmartingr
Copy link
Contributor

Summary

This commit addresses the addition of a new warning log line for server admins that advise the usage of Elasticsearch when the post count is too big and Elastic is not enabled.

Notes

This is merely a draft, requesting some feedback since this is my first contribution to the suite:

  • Made a single method for the logic since the codebase is pretty big and I wasn't sure where this should land. Suggestions appreciated.
  • Initially I used app.GetAnalytics(), but that performs more queries that I needed and I didn't wan't to modify that method (at least not without confirming that I should). Maybe we could cache the analytics?
  • This will add two COUNT queries daily. It shouldn't be much in terms of performance, and since the requirements are pretty specific it shouldn't pose an issue. If you have any concerns please let me know.
  • I tried to replicate the same conditions used in the webapp. Please let me know if that's appropriate or if I should get the data from somewhere else.

Ticket Link

https://mattermost.atlassian.net/browse/MM-52867

Release Note

feat: advise admins of elasticsearch usage in console periodically

@fmartingr fmartingr added the 2: Dev Review Requires review by a developer label Jun 13, 2023
@fmartingr fmartingr self-assigned this Jun 13, 2023
@mm-cloud-bot mm-cloud-bot added kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 13, 2023
Copy link
Member

@isacikgoz isacikgoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a great idea, I'd rather use a simple job mechanism like we do in various jobs. Eg:

func runSessionCleanupJob(s *Server) {

As a future improvement, we can use the admin notices instead. WDYT?

@fmartingr
Copy link
Contributor Author

Overall this is a great idea, I'd rather use a simple job mechanism like we do in various jobs. Eg: [...] As a future improvement, we can use the admin notices instead. WDYT?

Thanks for the input @isacikgoz, I will make the changes to fit the job system we have in place. I'll also take a look at notices while I'm on it.

Copy link
Member

@agarciamontoro agarciamontoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!

return
}

if postCount > 2000000 && userCount > 500 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why also the user count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server/channels/app/server.go Outdated Show resolved Hide resolved
Copy link
Member

@isacikgoz isacikgoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bit of changes needed :)

server/channels/app/server.go Outdated Show resolved Hide resolved
server/channels/app/server.go Outdated Show resolved Hide resolved
server/channels/app/server.go Outdated Show resolved Hide resolved
fmartingr and others added 3 commits June 21, 2023 15:00
Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com>
@fmartingr
Copy link
Contributor Author

Thanks for the reviews, addressed the above comments.

Copy link
Member

@agnivade agnivade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wanted to understand what the end goal here is. Nobody reads warning logs. It's not actionable at all.

A better way might be to have a big red banner in the system console asking admins to use elasticsearch.

Comment on lines 1246 to 1253
model.CreateRecurringTaskFromNextIntervalTime("Elasticsearch workspace optimization check (startup)", func() {
doElasticsearchWorkspaceOptimizationCheck(s)
}, time.Minute*10)

// Schedule recurring job
model.CreateRecurringTask("Elasticsearch workspace optimization check", func() {
doElasticsearchWorkspaceOptimizationCheck(s)
}, time.Hour*24)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is the right thing? This sets up the same job in two different frequencies, once every 10 mins, once every 24 hours. Any reason we are doing it this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, I though it worked in a different way. Let me take a look.

@agnivade
Copy link
Member

It shouldn't be much in terms of performance, and since the requirements are pretty specific it shouldn't pose an issue.

It does have an impact in performance. A full post count can take around 10s on large installations. That consumes a lot of DB CPU. It can be noticeable for large customers.

@fmartingr
Copy link
Contributor Author

fmartingr commented Jun 22, 2023

It does have an impact in performance. A full post count can take around 10s on large installations. That consumes a lot of DB CPU. It can be noticeable for large customers.

Yeah, what I meant is that this only happens if the customer has specific conditions (2m+ posts & 500+ users) and does not have elasticsearch enabled, so it's not like that this is a performance hit for everybody, and besides that, is once a day.

I also wanted to understand what the end goal here is. Nobody reads warning logs. It's not actionable at all.

The main idea here is that since this is a WARN level, the customers alert systems will trigger an alert upon seeing this every day.

@agnivade
Copy link
Member

Yeah, what I meant is that this only happens if the customer has specific conditions (2m+ posts & 500+ users) and does not have elasticsearch enabled, so it's not like that this is a performance hit for everybody, and besides that, is once a day.

Right. The thing is that if it's a small customer, then it's useless to them but we are running a query anyways. And if it's a large customer, it's a performance hit.

The main idea here is that since this is a WARN level, the customers alert systems will trigger an alert upon seeing this every day.

I want to confirm that we have data points on this and have confirmed this from multiple large customers that all of them have alert systems configured on WARN levels. AFAI am aware, large customers have alerts on system level health checks like CPU, mem etc. And they process their logs for compliance. I am not aware of any alerts set up from logs, but I might be wrong on that. Just wanted to confirm we aren't going ahead on assumptions.

I have two recommendations:

  1. Use the statistics table to get an estimate of the post count. It won't be super accurate but it's accurate to the right order of magnitude.
  2. Avoid this daily check, and move this to be a banner to be shown in system console when an admin logs in.

This way we avoid the perf hit and make this more visible to admins.

@fmartingr
Copy link
Contributor Author

@agnivade Valid points. Let me discuss your feedback with the team.

@fmartingr fmartingr marked this pull request as draft June 23, 2023 10:29
@isacikgoz isacikgoz removed their request for review September 11, 2023 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2: Dev Review Requires review by a developer kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants