-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation for hystrix dashboard support feature (issue #1244) #523
Conversation
docs/root/operations/admin.rst
Outdated
|
||
.. http:get:: /hystrix_event_stream | ||
|
||
Start streaming :ref:`event stream<https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events>` to hystrix dashboard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/hystrix/Hystrix/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your comments. Will update everything with new commit, also DCO was missing.. In any case, this depends on envoyproxy/envoy#2736 which currently is blocked by failing checks (taking care of this as well)
docs/root/operations/admin.rst
Outdated
|
||
The dashboard should set the stream source to this admin endpoint. | ||
|
||
Note on usage of Hystrix dashboard to visualize Envoy statistics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify this looks as expected in the doc output? You will need to fix the docs build.
docs/root/operations/admin.rst
Outdated
|
||
.. http:get:: /hystrix_event_stream | ||
|
||
Start streaming :ref:`event stream<https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events>` to hystrix dashboard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use external RST link syntax rather than internal here, this is what is breaking docs build.
docs/root/operations/admin.rst
Outdated
The dashboard should set the stream source to this admin endpoint. | ||
|
||
Note on usage of Hystrix dashboard to visualize Envoy statistics: | ||
Detalied description on Hystrix and its dashboard can be found :ref:\here<https://github.com/Netflix-Skunkworks/hystrix-dashboard>`. Not all the data presented in the Hystrix dashboard is relevant in Envoy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point on RST link syntax as above here.
docs/root/operations/admin.rst
Outdated
* "Bad request" does not exist in Envoy, and therefore is set to '0'. | ||
* Requests rejected as a result of exceeding max number of connections or queue size (in Envoy, referred as "short circuited") are presented as "Rejected", which better suits Hystrix's terminology. | ||
* Latency information is currently not availbale. | ||
* Window size is set to 10sec, with 10 buckets of 1sec each. Currently, the value is not configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: prefer " ", e.g. 10 seconds
, rather than 10sec
.
docs/root/operations/admin.rst
Outdated
* Hystrix's "Short circuit" is similar to Envoy's outlier detection, in that it is triggered by error responses. In Envoy the result of high error rate is the host being ejected from the load balancer pool, but it does not cause rejected requests, and therefore we set "Short circuited" to '0', and circuit breaker is set to "Forced Closed". | ||
* "Bad request" does not exist in Envoy, and therefore is set to '0'. | ||
* Requests rejected as a result of exceeding max number of connections or queue size (in Envoy, referred as "short circuited") are presented as "Rejected", which better suits Hystrix's terminology. | ||
* Latency information is currently not availbale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/availbale/available/. Please spell check.
7cde934
to
5ec6113
Compare
Signed-off-by: talis <talis@il.ibm.com>
5ec6113
to
46c7334
Compare
Signed-off-by: trabetti <talis@il.ibm.com>
a0ea55a
to
d2dc845
Compare
Signed-off-by: trabetti <talis@il.ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, couple more comments and we can merge.
docs/root/operations/admin.rst
Outdated
error responses. In Envoy the result of high error rate is the host being ejected from the load | ||
balancer pool, but it does not cause rejected requests, and therefore we set "Short circuited" | ||
to '0', and circuit breaker is set to "Forced Closed". | ||
* "Bad request" does not exist in Envoy, and therefore is set to '0'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer avoiding quoting numbers in these docs, so I'd just write set to 0.
.
docs/root/operations/admin.rst
Outdated
Not all the data presented in the Hystrix dashboard is relevant in Envoy. | ||
|
||
* Success, Failure, Timeout, Error rate, Number of host, are shown in the dashboard. | ||
* Hystrix's "Short circuit" is similar to Envoy's outlier detection, in that it is triggered by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we do *Short circuit*
rather than quoting in Envoy docs to call out distinguished terms, but that's mostly done for headers. I don't feel that strongly, just pointing it out.
Signed-off-by: trabetti <talis@il.ibm.com>
In offline discussion with @mrice32, once thing that came up is that we should probably think a bit about the security model here. The admin port is not secured today, e.g. it doesn't do auth or serve up via SSL. This is probably not ideal for general operational deployment if this is relied upon for stats serving. It's not ideal to add features to Envoy unless we have a sensible way for users to deploy them in a network topology that can be secured. This actually reminds me a lot of earlier conversations on how we handled Prometheus, see #210 and #220. CCing some of the folks who were involved there: @mattklein123 @douglas-reid @ramaraochavali |
c807743
to
e716087
Compare
Also note that currently, the hystrix dashboard is only intended to be run in trusted network, or behind authentication and authorization. https://github.com/Netflix-Skunkworks/hystrix-dashboard#security |
@trabetti Interesting. I think there's multiple security concerns here; how do we lock down the Envoy --> Hystrix path and how do we lock down Hystrix --> user, I think that link is talking only about the latter. It's worth just taking a quick poll to see if there are any thoughts from folks CCed on where we want to head with securing the non-data plane incoming ports on Envoy for pull based monitoring. FYI, this hasn't been an issue to date as all stats (other than the manual /stats on admin) is push based, so could make use of outgoing connections secured by normal gRPC SSL. |
@htuch I would suggest we defer discussion of admin security until we open the issue on adding admin security (from the security audit). I think so far there is a general assumption that there is no security on the admin endpoint and that it is accessed over a fully trusted network. That may not always be the case but I think that's a substantial design/development effort which we should track separately? |
@mattklein123 I have a concern, that since endpoints are part of a stable API that we don't want to break, we want to be more deliberate about which endpoints we open up to external systems that will depends on them (CC: @wora). We are now seeing a multiplication of systems depending on the admin console, see DataDog/integrations-core#1156 as well. Do we really want the admin console to be the point of integration for services that access Envoy via a pull model? Or should we have a gRPC serving entity that does mTLS for auth and SSL? The admin console has multiple roles conflated today, e.g. should the same person who can access stats also be able to |
Should we add an enable to this feature from config file, and set it to disabled as default? |
@htuch I think i'm really not that worried about admin security and back-compat because I think it's very straightforward to add-in optional security later by basically promoting the admin listener to a full listener with filters. This would mean it can support mTLS, a route table with inline RBAC filter (when we implement it soon), etc. We could also eventually allow splitting admin endpoints across multiple listeners/ports, etc. Given this I really don't think we need to worry about this right now and should just assume that admin is currently not secured. |
@mattklein123 I've opened envoyproxy/envoy#2763 to start the discussion on admin endpoint security futures. Can you post the above points there? With this in mind, I don't have any objections to merging this and the Hystrix endpoint. |
Sure will do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can wait for envoyproxy/envoy#2736 to merge before merging this in case there are any additional changes needed.
@trabetti this is not going to merge before we move data-plane-api over to envoy this weekend so I'm going to go ahead and close this. Please add these docs to your Envoy side PR once the merge happens. Thank you! |
Documentation for PR: envoyproxy/envoy#2736