Export stats about response time #38

Merged
merged 6 commits into from Aug 22, 2017

Conversation

Projects
None yet
@vitorarins
Contributor

vitorarins commented Oct 18, 2015

As of version 1.5.10 HAProxy has an averate response time that can be exported.
Fixes #37.
@brian-brazil @juliusv

@trompx

This comment has been minimized.

Show comment
Hide comment
@trompx

trompx Oct 19, 2015

Great, thanks for the PR, I was just about to try to do the same thing!

trompx commented Oct 19, 2015

Great, thanks for the PR, I was just about to try to do the same thing!

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Oct 19, 2015

Contributor

@trompx: you should see the comments on #37 about this. It looks like the statistics provided by HAProxy on response time doesn't necessarily represents the real value.

Contributor

vitorarins commented Oct 19, 2015

@trompx: you should see the comments on #37 about this. It looks like the statistics provided by HAProxy on response time doesn't necessarily represents the real value.

@lalarsson

This comment has been minimized.

Show comment
Hide comment
@lalarsson

lalarsson Jun 14, 2016

@vitorarins @trompx When can this PR be merged?

@vitorarins @trompx When can this PR be merged?

@beorn7

This comment has been minimized.

Show comment
Hide comment
@beorn7

beorn7 Jun 14, 2016

Member

@grobie Can you comment on this?

Member

beorn7 commented Jun 14, 2016

@grobie Can you comment on this?

@brian-brazil

This comment has been minimized.

Show comment
Hide comment
@brian-brazil

brian-brazil Jun 14, 2016

Member

#37 (comment) still applies here.

Member

brian-brazil commented Jun 14, 2016

#37 (comment) still applies here.

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Jun 14, 2016

Contributor

@brian-brazil Do you want me to close this PR?

Contributor

vitorarins commented Jun 14, 2016

@brian-brazil Do you want me to close this PR?

@davidkarlsen

This comment has been minimized.

Show comment
Hide comment
@davidkarlsen

davidkarlsen Oct 18, 2016

Could this be merged? Even if the status are not accurately what you'd want - it's still what haproxy exposes.

Could this be merged? Even if the status are not accurately what you'd want - it's still what haproxy exposes.

@beorn7

This comment has been minimized.

Show comment
Hide comment
@beorn7

beorn7 Jun 22, 2017

Member

@grobie Do you have state on this?

Member

beorn7 commented Jun 22, 2017

@grobie Do you have state on this?

@lambertjosh

This comment has been minimized.

Show comment
Hide comment
@lambertjosh

lambertjosh Jul 5, 2017

I think having this is better than not, even though for some use cases it may not be accurate.

I think having this is better than not, even though for some use cases it may not be accurate.

@SuperQ

This comment has been minimized.

Show comment
Hide comment
@SuperQ

SuperQ Jul 17, 2017

Member

@vitorarins, can you rebase and fix the merge conflicts?

Member

SuperQ commented Jul 17, 2017

@vitorarins, can you rebase and fix the merge conflicts?

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Jul 17, 2017

Contributor

👍

Contributor

vitorarins commented Jul 17, 2017

👍

@SuperQ SuperQ requested a review from juliusv Jul 18, 2017

haproxy_exporter.go
@@ -211,6 +213,10 @@ func NewExporter(uri string, selectedServerMetrics map[int]*prometheus.GaugeVec,
42: newBackendMetric("http_responses_total", "Total of HTTP responses.", prometheus.Labels{"code": "4xx"}),
43: newBackendMetric("http_responses_total", "Total of HTTP responses.", prometheus.Labels{"code": "5xx"}),
44: newBackendMetric("http_responses_total", "Total of HTTP responses.", prometheus.Labels{"code": "other"}),
+ 58: newBackendMetric("http_queue_time", "Avg. HTTP queue time for last 1024 successful connections.", nil),

This comment has been minimized.

@juliusv

juliusv Jul 18, 2017

Member

This should include a unit, and if I understand the metric correctly, should have a more apt name. Maybe something like:

http_average_queue_time_seconds

I don't know if it's worth including in the metric name the fact that it is an exponential moving average (do we know this for sure?), and over the last 1024 connections.

Same for the other new metrics here.

@juliusv

juliusv Jul 18, 2017

Member

This should include a unit, and if I understand the metric correctly, should have a more apt name. Maybe something like:

http_average_queue_time_seconds

I don't know if it's worth including in the metric name the fact that it is an exponential moving average (do we know this for sure?), and over the last 1024 connections.

Same for the other new metrics here.

This comment has been minimized.

@brian-brazil

brian-brazil Jul 18, 2017

Member

We generally don't mention that a metric is a exponentially moving average in the name, and I'm not aware of any naming conventions on that front.

@brian-brazil

brian-brazil Jul 18, 2017

Member

We generally don't mention that a metric is a exponentially moving average in the name, and I'm not aware of any naming conventions on that front.

This comment has been minimized.

@vitorarins

vitorarins Jul 19, 2017

Contributor

It has been a while since I've done this, but I think I got these from the HAProxy documentation at the time.

@brian-brazil @juliusv Should I change the names/descriptions here?

@vitorarins

vitorarins Jul 19, 2017

Contributor

It has been a while since I've done this, but I think I got these from the HAProxy documentation at the time.

@brian-brazil @juliusv Should I change the names/descriptions here?

This comment has been minimized.

@grobie

grobie Jul 19, 2017

Member

Yes, it should def. have the unit prefix and I also agree with Julius to include average in the name.

@grobie

grobie Jul 19, 2017

Member

Yes, it should def. have the unit prefix and I also agree with Julius to include average in the name.

This comment has been minimized.

@juliusv

juliusv Jul 19, 2017

Member

Yeah, maybe I'd put the average in the end though, how about: http_queue_time_average_seconds and similar for the other ones.

@juliusv

juliusv Jul 19, 2017

Member

Yeah, maybe I'd put the average in the end though, how about: http_queue_time_average_seconds and similar for the other ones.

@juliusv

This comment has been minimized.

Show comment
Hide comment
@juliusv

juliusv Jul 18, 2017

Member

Generally 👍

I realize this number is not perfect, but as long as that's the best that HAProxy exports, that's better than nothing. It also seems a lot of people are for this (both here and on the related issue #37).

@brian-brazil Do you uphold your concerns against this? (in that case, we may need to have a vote)

Member

juliusv commented Jul 18, 2017

Generally 👍

I realize this number is not perfect, but as long as that's the best that HAProxy exports, that's better than nothing. It also seems a lot of people are for this (both here and on the related issue #37).

@brian-brazil Do you uphold your concerns against this? (in that case, we may need to have a vote)

@brian-brazil

This comment has been minimized.

Show comment
Hide comment
@brian-brazil

brian-brazil Jul 18, 2017

Member

My technical concerns remain, however they're also from a time when I was more optimistic about improving metrics and I had intentions at one point of adding the requisite instrumentation to haproxy myself. In more recent times the general approach has been to accept such metrics, in cases where a more suitable form does not exist.

Member

brian-brazil commented Jul 18, 2017

My technical concerns remain, however they're also from a time when I was more optimistic about improving metrics and I had intentions at one point of adding the requisite instrumentation to haproxy myself. In more recent times the general approach has been to accept such metrics, in cases where a more suitable form does not exist.

@juliusv

This comment has been minimized.

Show comment
Hide comment
@juliusv

juliusv Jul 19, 2017

Member

@brian-brazil Cool, sounds good then. Then the only thing left here is the metric naming improvement.

Member

juliusv commented Jul 19, 2017

@brian-brazil Cool, sounds good then. Then the only thing left here is the metric naming improvement.

@grobie

I'm fine with exporting these metrics if they are clearly marked as average.

Though, I don't think there is a need to break compatibility to older HAProxy versions for this request. We can simply not export these metrics if the CSV result doesn't include them.

haproxy_exporter.go
- expectedCsvFieldCount = 52
+ // HAProxy 1.5.19
+ // pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
+ expectedCsvFieldCount = 62

This comment has been minimized.

@grobie

grobie Jul 19, 2017

Member

Instead of silently breaking compatibility with older HAProxy versions for this request, I think we should change the logic to only export the new metrics in case the CSV result includes these rows.

@grobie

grobie Jul 19, 2017

Member

Instead of silently breaking compatibility with older HAProxy versions for this request, I think we should change the logic to only export the new metrics in case the CSV result includes these rows.

This comment has been minimized.

@vitorarins

vitorarins Jul 20, 2017

Contributor

I've tried to follow the pattern here, but I can give it a try on not breaking compatibility. Do you or any other owner @juliusv @brian-brazil have any suggestions to do this elegantly?

@vitorarins

vitorarins Jul 20, 2017

Contributor

I've tried to follow the pattern here, but I can give it a try on not breaking compatibility. Do you or any other owner @juliusv @brian-brazil have any suggestions to do this elegantly?

This comment has been minimized.

@grobie

grobie Jul 20, 2017

Member

It should be pretty straight forward:

  • Remove expectedCsvFieldCount, the name doesn't make sense if we are able to handle different field counts

  • Change line 355 len(csvRow) < expectedCsvFieldCount to < 32, that's the minimum we need

  • Change the validation in exportCsvFields to sth. like this:

    valueStr, ok := csvRow[fieldIdx]
    if !ok || valueStr == "" {
      continue
    }
@grobie

grobie Jul 20, 2017

Member

It should be pretty straight forward:

  • Remove expectedCsvFieldCount, the name doesn't make sense if we are able to handle different field counts

  • Change line 355 len(csvRow) < expectedCsvFieldCount to < 32, that's the minimum we need

  • Change the validation in exportCsvFields to sth. like this:

    valueStr, ok := csvRow[fieldIdx]
    if !ok || valueStr == "" {
      continue
    }

This comment has been minimized.

@vitorarins

vitorarins Jul 20, 2017

Contributor

Awesome! Will work on that. Thank you so much!

@vitorarins

vitorarins Jul 20, 2017

Contributor

Awesome! Will work on that. Thank you so much!

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Jul 20, 2017

Contributor

@grobie I made some changes trying to achieve what you requested, but I am not sure is the right way.

There is no "found" for slices in Go, so the suggested code: valueStr, ok := csvRow[fieldIdx] didn't work 😞 .
So I made an adjustment checking for the length of csvRow. But what if someone uses a different version of HAProxy and the metrics get out of place? I don't know all about HAProxy metrics, but do you think that could happen?

Contributor

vitorarins commented Jul 20, 2017

@grobie I made some changes trying to achieve what you requested, but I am not sure is the right way.

There is no "found" for slices in Go, so the suggested code: valueStr, ok := csvRow[fieldIdx] didn't work 😞 .
So I made an adjustment checking for the length of csvRow. But what if someone uses a different version of HAProxy and the metrics get out of place? I don't know all about HAProxy metrics, but do you think that could happen?

@grobie

We should also add a test to ensure metrics will be exporter for a HAProxy version < 1.5.10, but simply without http_queue_time_average_seconds etc.

haproxy_exporter.go
- statusField = 17
+ // HAProxy 1.5.19
+ // pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
+ minimumCsvFieldCount = 32

This comment has been minimized.

@grobie

grobie Jul 21, 2017

Member

It's actually 33 if you look at line 367.

@grobie

grobie Jul 21, 2017

Member

It's actually 33 if you look at line 367.

@grobie

This comment has been minimized.

Show comment
Hide comment
@grobie

grobie Jul 21, 2017

Member

But what if someone uses a different version of HAProxy and the metrics get out of place? I don't know all about HAProxy metrics, but do you think that could happen?

I'm not quite sure what you mean here. The CSV format of HAProxy is append only, so it'll always be the same order. Older versions will simply have less fields and hit your break in the loop.

Member

grobie commented Jul 21, 2017

But what if someone uses a different version of HAProxy and the metrics get out of place? I don't know all about HAProxy metrics, but do you think that could happen?

I'm not quite sure what you mean here. The CSV format of HAProxy is append only, so it'll always be the same order. Older versions will simply have less fields and hit your break in the loop.

vitorarins added some commits Oct 18, 2015

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Aug 22, 2017

Contributor

@grobie I think I made all the changes requested. Can you take a look?

Thanks.

Contributor

vitorarins commented Aug 22, 2017

@grobie I think I made all the changes requested. Can you take a look?

Thanks.

@grobie

grobie approved these changes Aug 22, 2017

@grobie grobie merged commit 9fee681 into prometheus:master Aug 22, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@vitorarins vitorarins deleted the vitorarins:export-response-time branch Aug 22, 2017

@StephanErb

This comment has been minimized.

Show comment
Hide comment
@StephanErb

StephanErb Aug 30, 2017

@vitorarins @grobie HAProxy reports those timings as ms (via https://cbonte.github.io/haproxy-dconv/configuration-1.5.html):

 58. qtime [..BS]: the average queue time in ms over the 1024 last requests
 59. ctime [..BS]: the average connect time in ms over the 1024 last requests
 60. rtime [..BS]: the average response time in ms over the 1024 last requests

I don't see any transformation from mililseconds to seconds in the code. Am I missing somethings or are we exporting the metrics in the wrong unit here?

@vitorarins @grobie HAProxy reports those timings as ms (via https://cbonte.github.io/haproxy-dconv/configuration-1.5.html):

 58. qtime [..BS]: the average queue time in ms over the 1024 last requests
 59. ctime [..BS]: the average connect time in ms over the 1024 last requests
 60. rtime [..BS]: the average response time in ms over the 1024 last requests

I don't see any transformation from mililseconds to seconds in the code. Am I missing somethings or are we exporting the metrics in the wrong unit here?

@vitorarins

This comment has been minimized.

Show comment
Hide comment
@vitorarins

vitorarins Aug 30, 2017

Contributor

@StephanErb You are correct. Those metrics are in ms and there is no transformation, just the naming is wrong. I will work on a fix for the name.

Contributor

vitorarins commented Aug 30, 2017

@StephanErb You are correct. Those metrics are in ms and there is no transformation, just the naming is wrong. I will work on a fix for the name.

@grobie

This comment has been minimized.

Show comment
Hide comment
@grobie

grobie Aug 30, 2017

Member
Member

grobie commented Aug 30, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment