
[Spike] Investigate Encrypted Saved Objects for Synthetic monitors #406

Closed
dominiqueclarke opened this issue Nov 22, 2021 · 8 comments
Labels: Team:Uptime (Label for the Uptime team)

@dominiqueclarke

Monitor configuration can often include sensitive data, including TLS configuration and authentication information. We should consider ways to secure this via saved objects in Kibana.

We should also consider the performance costs of decrypting these saved objects when syncing the configuration and sending it to the Synthetics Service via task manager.
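Kibana's encrypted saved objects plugin exposes a `registerType` setup contract that takes the saved object type name and the set of attributes to encrypt. Below is a minimal self-contained sketch of that registration; the stand-in `encryptedSavedObjects` object, the type name, and the attribute names are illustrative assumptions, not the actual synthetics registration:

```typescript
// Minimal sketch of registering a monitor type with Kibana's
// encryptedSavedObjects plugin. The shape mirrors the plugin's public
// setup contract; names here are illustrative, not the real registration.

interface EncryptedSavedObjectTypeRegistration {
  type: string;
  attributesToEncrypt: Set<string>;
}

// Stand-in for the plugin's setup contract so this sketch runs on its own.
const registrations: EncryptedSavedObjectTypeRegistration[] = [];
const encryptedSavedObjects = {
  registerType: (reg: EncryptedSavedObjectTypeRegistration): void => {
    registrations.push(reg);
  },
};

encryptedSavedObjects.registerType({
  type: 'synthetics-monitor',
  // Hypothetical sensitive HTTP monitor fields; the POC list had ~30 entries.
  attributesToEncrypt: new Set([
    'username',
    'password',
    'ssl_key',
    'ssl_key_passphrase',
    'request_headers',
  ]),
});
```

Every attribute named in `attributesToEncrypt` is encrypted at rest and must be explicitly decrypted on read, which is where the CPU and latency costs discussed below come from.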

@dominiqueclarke dominiqueclarke added Team:Uptime Label for the Uptime team v8.1.0 labels Dec 8, 2021
@andrewvc
Contributor

We should engage the Kibana security team for a review of our approach. We can also speak with the Fleet team, since there is prior art there.

@paulb-elastic

The source for the script should also be encrypted, in case it contains anything sensitive (e.g. website credentials).

@dominiqueclarke dominiqueclarke self-assigned this Feb 7, 2022
@dominiqueclarke
Author

Tested 500 regular monitors vs 500 encrypted monitors for load on Kibana.

I used HTTP monitors for these tests. HTTP monitors contain the largest number of fields that potentially need encryption, including request headers, response body checks, TLS settings, and more.

Control (Simple)
[Screenshots: Kibana CPU utilization during the control run]

Test (Encrypted)
[Screenshots: Kibana CPU utilization during the encrypted run]

As you can see, CPU utilization rises every 5 minutes in both tests, when our sync task executes, but it spikes significantly for encrypted saved objects. In addition to the significant CPU increase, encrypted saved objects also take a long time to decrypt. When scaling to 500 monitors, the decryption step alone can take over one minute.

[Screenshots: timings for the decryption step]

Due to the time it takes to decrypt encrypted monitors, we should remove our 1m task timeout. The default is 5m, but we could also consider making the timeout configurable for users who need to scale monitor management.

For the sync task, we may want to consider some type of pagination solution, perhaps scheduling staggered sync tasks in chunks of, for example, 100 monitors to limit the load on Kibana CPU at any given time.
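The staggered-chunk idea above can be sketched as follows. This is a hypothetical illustration, not an existing Kibana API: `chunk` and `syncChunked` are made-up names, and the batch size and delay are the example values from the comment:

```typescript
// Sketch of paginating the sync task: split monitors into chunks of 100 and
// process them sequentially with a pause between chunks, so decryption CPU
// spikes on Kibana are spread out instead of landing all at once.

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function syncChunked<T>(
  monitors: T[],
  syncBatch: (batch: T[]) => Promise<void>,
  chunkSize = 100,
  delayMs = 1000
): Promise<void> {
  for (const batch of chunk(monitors, chunkSize)) {
    await syncBatch(batch);
    // Stagger batches so per-chunk decryption load doesn't overlap.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

In the real task manager this could instead be modeled as separate staggered task instances per chunk; the trade-off is the same either way, longer total wall-clock time in exchange for a flatter CPU profile.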

@jportner

Here's a link to a Kibana PR where we've done some performance testing of encrypted saved objects: elastic/kibana#72420 (comment)

@dominiqueclarke
Author

One thing to note is that I did include a rather large list of fields to encrypt in my POC, upwards of 30, to be safe. We could likely see performance improvements if we determined only a smaller subset of fields actually needed to be encrypted. https://github.com/elastic/kibana/pull/125168/files#diff-0e2159135205dee5b6cef3d83eebf4767f94b3a664d68887a2489c77d7ad6574R39

@jportner

jportner commented Mar 2, 2022

We spoke offline in Slack, @azasypkin clarified that our benchmark indicates ~30ms to decrypt each saved object (with significant HTTP overhead), and he suggested that a best practice is to encrypt all saved object attributes in a single field to reduce overhead. Dominique indicated she would try that next week and get back to us for next steps.

In the meantime, it's clear we are lacking proper guidance on this topic, so I opened elastic/kibana#126696 to address that.

@dominiqueclarke
Author

@jportner @azasypkin Good news: nesting all fields under a single payload has brought the time to decrypt 500 synthetics monitors down to approximately 3,500 milliseconds on average. I haven't measured the load on Kibana CPU yet, but you can already plainly see that the issue is all but resolved.
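The single-payload technique described above amounts to serializing every sensitive field into one attribute before saving, so the plugin performs one encrypt/decrypt operation per object instead of one per field. A minimal sketch, assuming made-up helper names (`toSavedObject`, `fromSavedObject`) and illustrative field names rather than the actual synthetics schema:

```typescript
// Sketch: collapse all sensitive attributes into a single JSON-serialized
// `secrets` attribute. With ~30ms per decrypt call, one call per object
// instead of ~30 is the difference between >1 minute and ~3.5s for 500.

interface MonitorAttributes {
  name: string;
  urls: string;
  [key: string]: unknown;
}

// Illustrative list of sensitive field names.
const SECRET_FIELDS = ['ssl_key', 'ssl_key_passphrase', 'request_headers'];

function toSavedObject(attrs: MonitorAttributes) {
  const publicAttrs: Record<string, unknown> = {};
  const secrets: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(attrs)) {
    (SECRET_FIELDS.includes(key) ? secrets : publicAttrs)[key] = value;
  }
  // Only `secrets` would be listed in attributesToEncrypt.
  return { ...publicAttrs, secrets: JSON.stringify(secrets) };
}

function fromSavedObject(so: { secrets: string } & Record<string, unknown>) {
  const { secrets, ...publicAttrs } = so;
  return { ...publicAttrs, ...JSON.parse(secrets) };
}
```

One consequence of this design is that reading any single secret requires decrypting (and deserializing) the whole payload, which is an acceptable trade-off when the sync task always needs the full configuration anyway.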

@dominiqueclarke
Author

Closed by implementation elastic/kibana#127158
