Skip to content

Latest commit

 

History

History
179 lines (152 loc) · 14.4 KB

4.springboot2配置prometheus监控.md

File metadata and controls

179 lines (152 loc) · 14.4 KB

背景介绍

SpringBoot2是2018年3.1号release的, 距离现在已经9个月左右了, 而且2.1.0已经release了, 所以2.0用在生产环境也没啥大问题了, 之前在springboot1.5版本的时候, 我写了一个切面注解, 开发人员只需要在需要监控的接口上加上注解就可以自动添加相关的监控, 比如QPS, 平均响应时间等. 最近有空看了一下springboot2版本的prometheus监控的添加.发现springboot真TMD好用, 企业开发必备.

springboot1.X的时候, 监控这块还比较弱, 配置prometheus的话还需要引入prometheus官方提供的依赖, 需要配置的挺多的, springboot升级到2版本以后, 监控这块得到了大大的增强, 支持的数据类型也跟prometheus一样多了.

  • springboot2新增了一个监控代理层, 这个代理层又可以支持多加监控服务, 比如prometheus, datadog等.
  • 支持的数据格式也多了Counter, Gauge, Histgram等都有, 对于一个服务的监控来说够用了.
  • SpringBoot会自动配置一个综合的registry, 并且会扫描classpath, 把支持的registry加进来, 比如你在POM中定义了micrometer-registry-prometheus的依赖, 它就会把prometheus的registry添加进来.Springboot会自动把加进来的registry作用于Metrics类上, 这样我们只需要引入相关的包, 然后直接使用相关的方法就行, 数据自动会吐出来.比如使用Metrics.Timer
  • 官方原生有切面注解的支持, 这一点我想说, 太TMD懂用户了.

添加方法

官方支持监控这块是一个独立的项目: Micrometer, 而且数据源也结合本身的监控组件来的, 所以需要添加如下的依赖:

        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

加上依赖后添加如下配置到application.properties中:

management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus

版本根据自己的需求来确定, 一般不建议使用最新的, 我遇到过使用最新的监控接口404的情况.

生产环境中, 一般我们对接口有3个纬度的性能监控:

  • QPS
rate(counter[10s])
sum(rate(counter[10s])) by (label)
# 这里注意10s, 一定要大于抓取的时间间隔, 比如我的抓取间隔是30秒, 这里最好计算1m内的.
  • 平均响应时间
rate(timer_sum[10s])/rate(timer_count[10s])
sum(rate(timer_sum[10s])) by (label)/sum(rate(timer_count[10s])) by (label)
  • 接口耗时的分位数, 比如95%的请求耗时在多少秒以内
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (job, le))

下面的代码中一个注解就可以获取以上所有性能指标:

    /**
     * @Timed使用这一个注解就可以计算QPS和平均响应时间
     * extraTags是添加额外的labels, percentile返回的是比如95%的接口相应时间在多少以内, histogram是是否输出bucket数据还是只输出
     * count+sum数据.
     * @return
     */
    @GetMapping("/test1/{id}")
    @Timed(value = "xcx", extraTags = {"name", "rocky"}, percentiles = {0.5, 0.95}, histogram = true)
    String index(@PathVariable String id, HttpServletRequest httpServletRequest) {
        System.out.println(httpServletRequest.getRequestURI());
        return id;
    }
    @GetMapping("/test2")
    String hello() {

        Timer.Sample customSample = Timer.start(Metrics.globalRegistry);
        ...耗时操作...
        customSample.stop(Metrics.globalRegistry.timer("tmp.timer.today", "stat", "ok"));
        return "OK";
    }
    
    //这里有一个点要注意, 访问/test1的时候, 后面的pathvariable不会被解析, 当时测试的时候是没有被解析的, 如果解析的话会出问题, 会生成大量的key, 内存会爆掉, 另外如果同时使用官方提供的注解和自定义的timer的时候, 注解一定要有value的值, 也就是起一个名字, 不然会报错.

通过以上简单的注解的使用就可以得到如下的数据, 访问http://127.0.0.1:8001/actuator/prometheus

# HELP http_server_requests_seconds  
# TYPE http_server_requests_seconds histogram
http_server_requests_seconds{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",quantile="0.5",} 0.003801088
http_server_requests_seconds{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",quantile="0.95",} 0.09633792
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.001",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.001048576",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.001398101",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.001747626",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.002097151",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.002446676",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.002796201",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.003145726",} 1.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.003495251",} 2.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.003844776",} 5.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.004194304",} 5.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.005592405",} 5.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.006990506",} 8.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.008388607",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.009786708",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.011184809",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.01258291",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.013981011",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.015379112",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.016777216",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.022369621",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.027962026",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.033554431",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.039146836",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.044739241",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.050331646",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.055924051",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.061516456",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.067108864",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.089478485",} 9.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.111848106",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.134217727",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.156587348",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.178956969",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.20132659",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.223696211",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.246065832",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.268435456",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.357913941",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.447392426",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.536870911",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.626349396",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.715827881",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.805306366",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.894784851",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="0.984263336",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="1.073741824",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="1.431655765",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="1.789569706",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="2.147483647",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="2.505397588",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="2.863311529",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="3.22122547",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="3.579139411",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="3.937053352",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="4.294967296",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="5.726623061",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="7.158278826",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="8.589934591",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="10.021590356",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="11.453246121",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="12.884901886",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="14.316557651",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="15.748213416",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="17.179869184",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="22.906492245",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="28.633115306",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="30.0",} 10.0
http_server_requests_seconds_bucket{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",le="+Inf",} 10.0
http_server_requests_seconds_count{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",} 10.0
http_server_requests_seconds_sum{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",} 0.139419531
# HELP http_server_requests_seconds_max  
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{exception="None",method="GET",name="rocky",status="200",uri="/test1/{id}",} 0.096155881

通过以上就可以配置prometheus抓取了, 配置完抓取以后

参考链接