
[Prometheus] Add new prometheus metrics and metrics endpoint #827

Merged
merged 1 commit into ory:master from dolbik:Prometheus on May 8, 2018

Conversation

@dolbik
Contributor

dolbik commented Apr 26, 2018

Signed-off-by: Dmitry Dolbik dolbik@gmail.com

This is a draft implementation to support Prometheus in Hydra (#669). Prometheus collects all the metrics the current telemetry does, and more. The current telemetry and Prometheus work together in this pull request.
Added two new Prometheus metrics:

  • Secure token service requests served per endpoint
  • Secure token service response time per endpoint

The resulting Prometheus output:

```
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
go_goroutines 16
go_memstats_alloc_bytes 3.479416e+06
go_memstats_alloc_bytes_total 3.479416e+06
go_memstats_buck_hash_sys_bytes 1.445704e+06
go_memstats_frees_total 1375
go_memstats_gc_sys_bytes 268288
go_memstats_heap_alloc_bytes 3.479416e+06
go_memstats_heap_idle_bytes 679936
go_memstats_heap_inuse_bytes 4.857856e+06
go_memstats_heap_objects 19294
go_memstats_heap_released_bytes_total 0
go_memstats_heap_sys_bytes 5.537792e+06
go_memstats_last_gc_time_seconds 0
go_memstats_lookups_total 12
go_memstats_mallocs_total 20669
go_memstats_mcache_inuse_bytes 13888
go_memstats_mcache_sys_bytes 16384
go_memstats_mspan_inuse_bytes 59280
go_memstats_mspan_sys_bytes 65536
go_memstats_next_gc_bytes 4.473924e+06
go_memstats_other_sys_bytes 1.028528e+06
go_memstats_stack_inuse_bytes 753664
go_memstats_stack_sys_bytes 753664
go_memstats_sys_bytes 9.115896e+06
sts_requests_total{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master"} 1
sts_requests_total{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.005"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.01"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.025"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.05"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.1"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.25"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="0.5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="1"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="2.5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="10"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master",le="+Inf"} 1
sts_response_time_seconds_sum{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master"} 0.001876666
sts_response_time_seconds_count{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/metrics",hash="undefined",version="dev-master"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.005"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.01"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.025"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.05"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.1"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.25"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="0.5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="1"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="2.5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="5"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="10"} 1
sts_response_time_seconds_bucket{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master",le="+Inf"} 1
sts_response_time_seconds_sum{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master"} 0.000721316
sts_response_time_seconds_count{buildTime="2018-04-26 06:44:37.592314 +0000 UTC",endpoint="/health/status",hash="undefined",version="dev-master"} 1
```

@dolbik dolbik force-pushed the dolbik:Prometheus branch from 6e4e9b8 to d667348 Apr 26, 2018

@aeneasr
Member

aeneasr left a comment

Thank you for the proposal. I think there's still some way to go, also because I'm quite unfamiliar with how Prometheus works. Can you recommend a crash course for me, so I can better judge the implementation here?

)

const (
HealthStatusPath = "/health/status"
MetricsStatusPath = "/health/metrics"


@aeneasr

aeneasr Apr 26, 2018

Member

This should probably be renamed to prometheus

@@ -100,10 +100,14 @@ func NewMetricsManager(issuerURL string, databaseURL string, l logrus.FieldLogge
salt: uuid.New(),
BuildTime: buildTime, BuildVersion: version, BuildHash: hash,
}

go mm.RegisterSegment()


@aeneasr

aeneasr Apr 26, 2018

Member

This is a side effect of NewMetricsHandler which might introduce some issues during testing. Not sure if I like this here.


@dolbik

dolbik Apr 26, 2018

Author Contributor

I'll move it back to its old place, since we decided to separate Prometheus from the telemetry flag.

@@ -106,9 +106,7 @@ func RunHost(c *config.Config) func(cmd *cobra.Command, args []string) {

if ok, _ := cmd.Flags().GetBool("disable-telemetry"); !ok && os.Getenv("DISABLE_TELEMETRY") != "1" {


@aeneasr

aeneasr Apr 26, 2018

Member

Not sure if we should disable prometheus if this is true. The idea of this is to allow people to disable sending reports to our servers. Prometheus runs locally and doesn't send anything to us


@dolbik

dolbik May 2, 2018

Author Contributor

@arekkas I left this part practically unchanged. Server reports stay as is. Prometheus middleware is always enabled https://github.com/ory/hydra/pull/827/files#diff-34e2a4fce852fbba1d689b617c6d4a94R114


@aeneasr

aeneasr May 2, 2018

Member

Oh Yeah! Sorry, didn't see that - perfect :)

},
[]string{"endpoint"},
),
ResponseTime: prometheus.NewHistogramVec(


@aeneasr

aeneasr Apr 26, 2018

Member

How do we decide which prometheus statistics to use here? Is sts_response_time_seconds standardized somewhere, or chosen arbitrarily?


@dolbik

dolbik Apr 26, 2018

Author Contributor

https://prometheus.io/docs/concepts/metric_types/

HistogramVec is used for request durations for example.

The name of the counter should follow https://prometheus.io/docs/practices/naming/ and we can change it according to the rules in that doc.


@aeneasr

aeneasr Apr 26, 2018

Member

The name of the counter should follow https://prometheus.io/docs/practices/naming/ and we can change it according to the rules in that doc.

Damn, that's what I feared. Naming is subject to the environment you're in, so hardcoding this is not a very good idea. Another issue is what metrics we collect - do we or do we not want to collect CPU metrics? And what about the thousands of other possible metrics?

I feel like implementing prometheus is like implementing proper analytics/tracking - the data being sent is really dependent on what you want to learn.


@aeneasr

aeneasr Apr 26, 2018

Member

Just as an example, you want to name this secure_token_service. I'd disagree and say to use the name of the product. Someone else might want oauth2_server here, or whatever. It's really hard to define something that is so dependent on the context it's being used in.


@dolbik

dolbik Apr 26, 2018

Author Contributor

I agree with you. A configurable value could be added for naming (for example CLUSTER_NAME, defaulting to hydra). In that case the metric name would be CLUSTER_NAME_response_time_seconds, for example.
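Not part of the PR; a hedged sketch (package, function, and the CLUSTER_NAME variable are hypothetical, taken from the comment above) of how such a configurable prefix could look using client_golang's Namespace option:

```go
package metrics

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
)

// newResponseTimeHistogram builds the histogram under a configurable
// namespace, so the exported name becomes <namespace>_response_time_seconds.
func newResponseTimeHistogram() *prometheus.HistogramVec {
	namespace := os.Getenv("CLUSTER_NAME") // hypothetical setting from the comment above
	if namespace == "" {
		namespace = "hydra" // proposed default
	}
	return prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Namespace: namespace,
			Name:      "response_time_seconds",
			Help:      "Response time per endpoint.",
		},
		[]string{"endpoint"},
	)
}
```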


@dolbik

dolbik Apr 26, 2018

Author Contributor

By default the Prometheus client gathers information about:

  • Number of goroutines that currently exist.
  • A summary of the GC invocation durations
  • Number of bytes allocated and still in use
  • Total number of bytes allocated, even if freed
  • Number of bytes obtained by system. Sum of all system allocations
  • Total number of pointer lookups
  • Total number of mallocs
  • Total number of frees
  • Number of heap bytes allocated and still in use
  • Number of heap bytes obtained from system
  • Number of heap bytes waiting to be used
  • Number of heap bytes that are in use
  • Total number of heap bytes released to OS
  • Number of allocated objects
  • Number of bytes in use by the stack allocator
  • Number of bytes obtained from system for stack allocator
  • Number of bytes in use by mspan structures
  • Number of bytes used for mspan structures obtained from system
  • Number of bytes in use by mcache structures
  • Number of bytes used for mcache structures obtained from system
  • Number of bytes used by the profiling bucket hash table
  • Number of bytes used for garbage collection system metadata
  • Number of bytes used for other system allocations
  • Number of heap bytes when next garbage collection will take place
  • Number of seconds since 1970 of last garbage collection
  • Total user and system CPU time spent in seconds
  • Number of open file descriptors
  • Maximum number of open file descriptors
  • Virtual memory size in bytes
  • Resident memory size in bytes
  • Start time of the process since unix epoch in seconds

All other metrics have to be added manually. That's why I added two metrics that users may need.
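For context, and not code from this PR: with the Go client, the default registry already ships the go_* and process_* collectors listed above, so exposing only the defaults is roughly the following sketch (the /health/metrics path comes from this PR; the port is illustrative):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry already carries the go_* and process_* collectors
	// listed above; promhttp.Handler() serves them in the Prometheus text format.
	http.Handle("/health/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":4444", nil)) // port is illustrative
}
```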


@aeneasr

aeneasr Apr 26, 2018

Member

Ah I see - maybe a good idea then would be to just send the default metrics for now?


@dolbik

dolbik Apr 26, 2018

Author Contributor

In this case the new middleware does not make sense. The new endpoint will be added under "health".

Should I leave the middleware as a skeleton for the future, or can it be deleted?


@dolbik

dolbik Apr 27, 2018

Author Contributor

I propose to leave an empty middleware for Prometheus with examples. It gives the ability to simply add custom metrics if needed.
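A minimal sketch of what such a skeleton could look like, using the negroni-style ServeHTTP(rw, r, next) signature seen elsewhere in this PR; the package and type names are illustrative and the commented metric calls are placeholders:

```go
package metrics

import "net/http"

// PrometheusMiddleware is kept as a skeleton, as proposed above: it wires the
// negroni-style hook but only carries example comments, so custom metrics can
// be added where needed.
type PrometheusMiddleware struct{}

func (m *PrometheusMiddleware) ServeHTTP(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc) {
	// Example: increment a per-endpoint request counter here, e.g.
	//   requestsTotal.WithLabelValues(r.URL.Path).Inc()
	next(rw, r)
	// Example: observe the per-endpoint response time here.
}
```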


@aeneasr

aeneasr Apr 27, 2018

Member

Yeah I think that makes sense - we could - in the future - also have the ability to load a plugin for handling the Prometheus middleware, which would make this a no-brainer to use.

}

func NewMetricsHandler(l logrus.FieldLogger, version, hash, buildTime string) *PrometheusHandler {
l.Info("Setting up Prometheus metrics")


@aeneasr

aeneasr Apr 26, 2018

Member

Factories should try and avoid side effects such as logging

}

func (pmm *PrometheusHandler) ServeHTTP(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc) {
defer func(start time.Time) {


@aeneasr

aeneasr Apr 26, 2018

Member

Not sure we need a defer here? Just put it behind next?
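For illustration only (the type and field names here are made up, not the PR's code), the two variants being discussed; one practical difference is that the deferred observation is still recorded if next panics:

```go
package metrics

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

type timingHandler struct {
	ResponseTime *prometheus.HistogramVec
}

// withDefer mirrors the PR's approach: the observation runs in a deferred
// function, so it is also recorded if next panics.
func (h *timingHandler) withDefer(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc) {
	defer func(start time.Time) {
		h.ResponseTime.WithLabelValues(r.URL.Path).Observe(time.Since(start).Seconds())
	}(time.Now())
	next(rw, r)
}

// afterNext is the suggested alternative: simply measure once next returns.
func (h *timingHandler) afterNext(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc) {
	start := time.Now()
	next(rw, r)
	h.ResponseTime.WithLabelValues(r.URL.Path).Observe(time.Since(start).Seconds())
}
```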


@dolbik dolbik force-pushed the dolbik:Prometheus branch 3 times, most recently from 888fd6b to 7c2cc77 Apr 26, 2018

"buildTime": buildTime,
},
}
return pm


@dolbik

dolbik Apr 27, 2018

Author Contributor

I propose to leave an empty middleware for Prometheus with examples. It gives the ability to simply add custom metrics if needed.


@aeneasr

aeneasr Apr 27, 2018

Member

Yeah, that makes sense!

@dolbik dolbik force-pushed the dolbik:Prometheus branch from a4ae33d to 3bf6a11 Apr 27, 2018

[Prometheus] Add new prometheus metrics and metrics endpoint
Signed-off-by: Dmitry Dolbik <dolbik@gmail.com>

@dolbik dolbik force-pushed the dolbik:Prometheus branch from 3bf6a11 to 80c3904 Apr 27, 2018

@dolbik


Contributor Author

dolbik commented Apr 27, 2018

@arekkas Looks like I have finished the code according to the discussion.

@dolbik


Contributor Author

dolbik commented May 2, 2018

@arekkas Are there any actions needed from my side?

@aeneasr


Member

aeneasr commented May 2, 2018

No, all good, I'm just extremely busy with #836 right now, I'll merge it soon

@aeneasr aeneasr merged commit ef94f98 into ory:master May 8, 2018

4 checks passed

ci/circleci: build-docker Your tests passed on CircleCI!
ci/circleci: format Your tests passed on CircleCI!
ci/circleci: swagger Your tests passed on CircleCI!
ci/circleci: test Your tests passed on CircleCI!
@aeneasr


Member

aeneasr commented May 8, 2018

Thank you for your contribution!

mgalagan-sugarcrm pushed a commit to sugarcrm/hydra that referenced this pull request May 14, 2018

IDM-410: Prometheus Metrics for the Secure token Service (#2)
* health: Adds new prometheus metrics and metrics endpoint (ory#827)

Signed-off-by: Dmitry Dolbik <dolbik@gmail.com>

* IDM-410 Prometheus Metrics for the Secure token Service

mgloystein pushed a commit to spotxchange/hydra that referenced this pull request May 14, 2018

Michael Gloystein
Add Prometheus endpoint
Based on ory#827

JIRA: WAP-1891

mgloystein added a commit to spotxchange/hydra that referenced this pull request May 14, 2018

Prometheous (#7)
* OData no longer defaulted, using header to cause it

* Simpler value

* Add Prometheus endpoint

Based on ory#827

JIRA: WAP-1891

@aeneasr aeneasr referenced this pull request May 24, 2018

Closed

Prometheus metrics #669

@aeneasr


Member

aeneasr commented Nov 18, 2018

@dolbik would you mind sharing how you set up Prometheus with Hydra? It would be excellent if you could create a Docker example here and open a PR for it :)
