Updated Prometheus metrics #11141

Merged
merged 3 commits into from
Jan 19, 2021

kerneltime
Contributor

@kerneltime kerneltime commented Dec 19, 2020

Description

Introduce the updated Prometheus metrics.
This change exposes two URLs for metrics collection:

  1. Cluster URL: reports only MinIO-generated metrics across the entire cluster.
  2. Node URL: reports Go and process-specific metrics on a per-node basis.
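
As a rough illustration of the two endpoints, here is a minimal Go sketch that builds both scrape URLs for a given server. The paths are copied from the curl commands that appear later in this conversation and may not match the paths in a released version; the helper name `metricsURL` and the `localhost:9000` endpoint are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// Paths as they appear in the curl commands in this conversation.
// Assumption: released versions may expose different paths.
const (
	clusterMetricsPath = "/minio/prometheus/metrics/cluster"
	nodeMetricsPath    = "/minio/prometheus/metrics/node"
)

// metricsURL (hypothetical helper) joins a server endpoint such as
// "http://localhost:9000" with one of the metrics paths above.
func metricsURL(endpoint, path string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("endpoint %q needs a scheme and host", endpoint)
	}
	u.Path = path
	return u.String(), nil
}

func main() {
	for _, p := range []string{clusterMetricsPath, nodeMetricsPath} {
		u, err := metricsURL("http://localhost:9000", p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(u)
	}
}
```

Per the description, the cluster URL could be scraped from any one node, while the node URL would need one scrape target per node.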

This change also cleans up the metric names, making them compliant with Prometheus best practices, and groups them more cleanly.

TODO:

  • Finish up the peer API to fetch cluster metrics
  • Documentation updates
  • Histogram for TTFB

If any metrics that are now part of the Node URL need to be part of the Cluster URL, they will be added and reported across the cluster.

Motivation and Context

  1. Internal representation of metrics generated by MinIO
  2. This allows Prometheus to scrape from any node, including when MinIO sits behind a load balancer
  3. Clean up the names and code organization

How to test this PR?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation needed
  • Unit tests needed

@kerneltime changed the title from "[WIP] Introduce the updated Prometheus metrics" to "[WIP] Updated Prometheus metrics" on Dec 19, 2020
@kerneltime
Contributor Author

Testing the values reported and documentation are pending.

@kerneltime changed the title from "[WIP] Updated Prometheus metrics" to "Updated Prometheus metrics" on Jan 5, 2021
@kerneltime kerneltime force-pushed the prom-v2 branch 3 times, most recently from 9a2066c to a8eb657 Compare January 5, 2021 22:27
@kerneltime
Contributor Author

kerneltime commented Jan 5, 2021

Need to account for changes in #11196 -> done

Contributor

@nitisht nitisht left a comment

initial comments, still testing

cmd/http-stats.go (outdated, resolved)
docs/metrics/prometheus/README.md (outdated, resolved)
@kerneltime kerneltime force-pushed the prom-v2 branch 2 times, most recently from d021b1b to b97e1b8 Compare January 7, 2021 07:01
cmd/metrics-v2_test.go (outdated, resolved)
cmd/metrics.go (outdated, resolved)
cmd/metrics.go (resolved)
Contributor

@nitisht nitisht left a comment

All cluster_capacity_* metrics report 0

Contributor

@nitisht nitisht left a comment

minio_bucket_replication_pending_bytes gets updated, but neither minio_bucket_replication_received_bytes nor minio_bucket_replication_sent_bytes is getting updated --> always reports 0

@nitisht
Contributor

nitisht commented Jan 11, 2021

@kerneltime could you please fix the conflict here

@kerneltime
Contributor Author

minio_bucket_replication_pending_bytes gets updated, but neither minio_bucket_replication_received_bytes nor minio_bucket_replication_sent_bytes is getting updated --> always reports 0

Fixed.

@minio minio deleted a comment from minio-trusted Jan 12, 2021
@kerneltime kerneltime force-pushed the prom-v2 branch 3 times, most recently from 4aa12f3 to 90c8804 Compare January 12, 2021 07:21
@minio minio deleted a comment from minio-trusted Jan 12, 2021
@minio minio deleted a comment from minio-trusted Jan 12, 2021
Contributor

@nitisht nitisht left a comment

Seems to be failing

$ curl localhost:9000/minio/prometheus/metrics/cluster
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AllAccessDisabled</Code><Message>All access to this bucket has been disabled.</Message><Resource>/minio/prometheus/metrics/cluster</Resource><RequestId></RequestId><HostId>09025b64-2459-4812-a523-acab49f8cc4d</HostId></Error>

$ curl localhost:9000/minio/prometheus/metrics/node
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AllAccessDisabled</Code><Message>All access to this bucket has been disabled.</Message><Resource>/minio/prometheus/metrics/node</Resource><RequestId></RequestId><HostId>09025b64-2459-4812-a523-acab49f8cc4d</HostId></Error>
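
For reference, a scraper-side check could tell this kind of S3-style XML error apart from a Prometheus text body. The following is a minimal Go sketch under that assumption; the `apiError` struct and the `checkMetricsBody` helper are hypothetical names, and the struct only mirrors fields visible in the response above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// apiError mirrors a subset of the fields visible in the XML error body above.
type apiError struct {
	XMLName  xml.Name `xml:"Error"`
	Code     string   `xml:"Code"`
	Message  string   `xml:"Message"`
	Resource string   `xml:"Resource"`
}

// sampleError is the response body from the cluster curl command above.
const sampleError = `<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AllAccessDisabled</Code><Message>All access to this bucket has been disabled.</Message><Resource>/minio/prometheus/metrics/cluster</Resource><RequestId></RequestId><HostId>09025b64-2459-4812-a523-acab49f8cc4d</HostId></Error>`

// checkMetricsBody (hypothetical helper) returns an error when body is an
// XML error document instead of Prometheus text, and nil otherwise.
func checkMetricsBody(body string) error {
	if !strings.HasPrefix(strings.TrimSpace(body), "<?xml") {
		return nil // not XML, so assume it is metrics text
	}
	var e apiError
	if err := xml.Unmarshal([]byte(body), &e); err != nil {
		return fmt.Errorf("unparseable XML body: %w", err)
	}
	return fmt.Errorf("%s: %s (%s)", e.Code, e.Message, e.Resource)
}

func main() {
	if err := checkMetricsBody(sampleError); err != nil {
		fmt.Println("scrape failed:", err)
	}
}
```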

@kerneltime
Contributor Author

Seems to be failing

$ curl localhost:9000/minio/prometheus/metrics/cluster
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AllAccessDisabled</Code><Message>All access to this bucket has been disabled.</Message><Resource>/minio/prometheus/metrics/cluster</Resource><RequestId></RequestId><HostId>09025b64-2459-4812-a523-acab49f8cc4d</HostId></Error>

$ curl localhost:9000/minio/prometheus/metrics/node
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AllAccessDisabled</Code><Message>All access to this bucket has been disabled.</Message><Resource>/minio/prometheus/metrics/node</Resource><RequestId></RequestId><HostId>09025b64-2459-4812-a523-acab49f8cc4d</HostId></Error>

It worked for me, will sync up with you to see what the issue is.

Contributor

@nitisht nitisht left a comment

Tested, changes work fine. Except this case:

on a single node setup with 4 drives i.e. ./minio server /tmp/data{1...4}, the fields minio_cluster_capacity_* report 0.

cmd/metrics-v2.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
@nitisht
Contributor

nitisht commented Jan 13, 2021

It worked for me, will sync up with you to see what the issue is.

apologies for confusion, this was working

docs/metrics/prometheus/list.md (outdated, resolved)
cmd/peer-rest-client.go (outdated, resolved)
cmd/peer-rest-client.go (outdated, resolved)
cmd/metrics-router.go (outdated, resolved)
@kerneltime
Contributor Author

kerneltime commented Jan 13, 2021

Tested, changes work fine. Except this case:

on a single node setup with 4 drives i.e. ./minio server /tmp/data{1...4}, the fields minio_cluster_capacity_* report 0.

This works for distributed setup. Not sure if we want to report this metric for single node deployment.

# HELP minio_cluster_capacity_raw_free_bytes Total free capacity online in the cluster.
# TYPE minio_cluster_capacity_raw_free_bytes gauge
minio_cluster_capacity_raw_free_bytes{server="127.0.0.1:9004"} 9.88841705472e+11
# HELP minio_cluster_capacity_raw_total_bytes Total capacity online in the cluster.
# TYPE minio_cluster_capacity_raw_total_bytes gauge
minio_cluster_capacity_raw_total_bytes{server="127.0.0.1:9004"} 1.50901174272e+12
# HELP minio_cluster_capacity_usable_free_bytes Total free usable capacity online in the cluster.
# TYPE minio_cluster_capacity_usable_free_bytes gauge
minio_cluster_capacity_usable_free_bytes{server="127.0.0.1:9004"} 4.94420852736e+11
# HELP minio_cluster_capacity_usable_total_bytes Total usable capacity online in the cluster.
# TYPE minio_cluster_capacity_usable_total_bytes gauge
minio_cluster_capacity_usable_total_bytes{server="127.0.0.1:9004"} 7.5450587136e+11
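
To sanity-check output like the above, here is a minimal Go sketch that parses the sample lines and compares usable to raw capacity. It is a simplified reader, not a full Prometheus text-format parser: it assumes no timestamps and no spaces inside label values, and it drops labels entirely. The `parseSamples` helper is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample holds the four sample lines from the output above.
const sample = `
# TYPE minio_cluster_capacity_raw_free_bytes gauge
minio_cluster_capacity_raw_free_bytes{server="127.0.0.1:9004"} 9.88841705472e+11
minio_cluster_capacity_raw_total_bytes{server="127.0.0.1:9004"} 1.50901174272e+12
minio_cluster_capacity_usable_free_bytes{server="127.0.0.1:9004"} 4.94420852736e+11
minio_cluster_capacity_usable_total_bytes{server="127.0.0.1:9004"} 7.5450587136e+11
`

// parseSamples (hypothetical helper) maps each metric name, with labels
// dropped, to its value. Comment lines and blank lines are skipped.
func parseSamples(text string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		name := strings.TrimSpace(line[:i])
		if j := strings.Index(name, "{"); j >= 0 {
			name = name[:j]
		}
		out[name] = v
	}
	return out, sc.Err()
}

func main() {
	m, err := parseSamples(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	raw := m["minio_cluster_capacity_raw_total_bytes"]
	usable := m["minio_cluster_capacity_usable_total_bytes"]
	fmt.Printf("usable/raw total: %.2f\n", usable/raw)
}
```

In this sample both usable values are exactly half of the corresponding raw values.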

@nitisht
Contributor

nitisht commented Jan 14, 2021

This works for distributed setup. Not sure if we want to report this metric for single node deployment.

Since it is a standard deployment pattern, IMO it would be nice to have. But we can look at this in a later PR.

Contributor

@nitisht nitisht left a comment

LGTM & Tested

@kerneltime kerneltime mentioned this pull request Jan 15, 2021
6 tasks
@harshavardhana
Member

@kerneltime please resolve the go.mod conflict; then we should be able to take the PR in.

cmd/disk-cache-stats.go (outdated, resolved)
cmd/metrics-router.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
cmd/metrics-v2.go (outdated, resolved)
cmd/notification.go (outdated, resolved)
cmd/notification.go (outdated, resolved)
cmd/notification.go (outdated, resolved)
@minio-trusted
Contributor

Mint Automation

Test Result
mint-large-bucket.sh ✔️
mint-fs.sh ✔️
mint-gateway-s3.sh ✔️
mint-erasure.sh ✔️
mint-dist-erasure.sh ✔️
mint-zoned.sh ✔️
mint-gateway-nas.sh ✔️
mint-gateway-azure.sh more...

11141-b4c1b3a/mint-gateway-azure.sh.log:

Running with
SERVER_ENDPOINT:      minio-c3.minio.io:32041
ACCESS_KEY:           minioazure
SECRET_KEY:           ***REDACTED***
ENABLE_HTTPS:         0
SERVER_REGION:        us-east-1
MINT_DATA_DIR:        /mint/data
MINT_MODE:            full
ENABLE_VIRTUAL_STYLE: 0

To get logs, run 'docker cp 243a060dd432:/mint/log /tmp/mint-logs'

(1/15) Running aws-sdk-go tests ... done in 9 seconds
(2/15) Running aws-sdk-java tests ... done in 2 seconds
(3/15) Running aws-sdk-php tests ... done in 2 minutes and 40 seconds
(4/15) Running aws-sdk-ruby tests ... done in 20 seconds
(5/15) Running awscli tests ... done in 2 minutes and 55 seconds
(6/15) Running healthcheck tests ... done in 0 seconds
(7/15) Running mc tests ... done in 4 minutes and 17 seconds
(8/15) Running minio-dotnet tests ... done in 1 minutes and 44 seconds
(9/15) Running minio-go tests ... done in 6 minutes and 28 seconds
(10/15) Running minio-java tests ... FAILED in 8 minutes and 55 seconds
{
  "name": "minio-java",
  "function": "putObject()",
  "args": "[user metadata]",
  "duration": 173,
  "status": "FAIL",
  "error": "error occurred\nErrorResponse(code = AuthenticationFailed, message = -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.10.0/azblob/zc_storage_error.go:42\n===== RESPONSE ERROR (ServiceCode=AuthenticationFailed) =====\nDescription=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:5903ad6a-501e-0146-0b0a-ee752d000000\nTime:2021-01-19T02:23:24.8010516Z, Details: \n   AuthenticationErrorDetail: The MAC signature found in the HTTP request 'JF7IlrN3EhbD0bvEJp1fW59+wlz6JH5jerqzsgQpzFU=' is not the same as any computed signature. Server used following string to sign: 'PUT\n\n\n128\n\napplication/xml\n\n\n\n\n\n\nx-ms-blob-cache-control:\nx-ms-blob-content-disposition:\nx-ms-blob-content-encoding:\nx-ms-blob-content-language:\nx-ms-blob-content-type:application/octet-stream\nx-ms-client-request-id:06ce40f2-c11e-4a30-6e99-1bb0a1ccf63b\nx-ms-date:Tue, 19 Jan 2021 02:23:24 GMT\nx-ms-meta-my_header1:a   b   c\nx-ms-meta-my_header2:\"a   b   c\"\nx-ms-meta-my_project:Project One\nx-ms-meta-my_unicode_tag:商å“�\nx-ms-version:2019-02-02\n/minioazure/minio-java-test-1q367ut/minio-java-test-3dce10l\ncomp:blocklist\ntimeout:1501'.\n   Code: AuthenticationFailed\n   PUT https://minioazure.blob.core.windows.net/minio-java-test-1q367ut/minio-java-test-3dce10l?comp=blocklist&timeout=1501\n   Authorization: REDACTED\n   Content-Length: [128]\n   Content-Type: [application/xml]\n   User-Agent: [APN/1.0 MinIO/1.0 MinIO/2021-01-19T01:49:38Z]\n   X-Ms-Blob-Cache-Control: []\n   X-Ms-Blob-Content-Disposition: []\n   X-Ms-Blob-Content-Encoding: []\n   X-Ms-Blob-Content-Language: []\n   X-Ms-Blob-Content-Type: [application/octet-stream]\n   X-Ms-Client-Request-Id: [06ce40f2-c11e-4a30-6e99-1bb0a1ccf63b]\n   X-Ms-Date: [Tue, 19 Jan 2021 02:23:24 GMT]\n   X-Ms-Meta-My_header1: [a   b   c]\n   X-Ms-Meta-My_header2: [\"a   b   c\"]\n   
X-Ms-Meta-My_project: [Project One]\n   X-Ms-Meta-My_unicode_tag: [商品]\n   X-Ms-Version: [2019-02-02]\n   --------------------------------------------------------------------------------\n   RESPONSE Status: 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\n   Content-Length: [1092]\n   Content-Type: [application/xml]\n   Date: [Tue, 19 Jan 2021 02:23:24 GMT]\n   Server: [Microsoft-HTTPAPI/2.0]\n   X-Ms-Error-Code: [AuthenticationFailed]\n   X-Ms-Request-Id: [5903ad6a-501e-0146-0b0a-ee752d000000]\n\n\n, bucketName = minio-java-test-1q367ut, objectName = minio-java-test-3dce10l, resource = /minio-java-test-1q367ut/minio-java-test-3dce10l, requestId = 165B80E32008A51E, hostId = 33981b9f-8ce5-4926-834d-6f1d2e74cf51)\nrequest={method=PUT, url=http://minio-c3.minio.io:32041/minio-java-test-1q367ut/minio-java-test-3dce10l, headers=x-amz-meta-My-Unicode-Tag: 商品\nx-amz-meta-My-Project: Project One\nx-amz-meta-My-header1: a   b   c\nx-amz-meta-My-Header2: \"a   b   c\"\nContent-Type: application/octet-stream\nHost: minio-c3.minio.io:32041\nAccept-Encoding: identity\nUser-Agent: MinIO (Linux; amd64) minio-java/8.0.3\nContent-MD5: A9oFTxee7YVcJ9fWsgQeKg==\nx-amz-content-sha256: 1ff7959f86334ddc5c188a5083268f600146328b2b6c5185e75bf7d9387d6b74\nx-amz-date: 20210119T022324Z\nAuthorization: AWS4-HMAC-SHA256 Credential=*REDACTED*/20210119/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date;x-amz-meta-my-header1;x-amz-meta-my-header2;x-amz-meta-my-project;x-amz-meta-my-unicode-tag, Signature=*REDACTED*\n}\nresponse={code=403, headers=Accept-Ranges: bytes\nContent-Length: 3086\nContent-Security-Policy: block-all-mixed-content\nContent-Type: application/xml\nServer: MinIO\nVary: Origin\nX-Amz-Request-Id: 165B80E32008A51E\nX-Xss-Protection: 1; mode=block\nDate: Tue, 19 Jan 2021 02:23:24 GMT\n}\n >>> [io.minio.MinioClient.execute(MinioClient.java:775), 
io.minio.MinioClient.putObject(MinioClient.java:4547), io.minio.MinioClient.putObject(MinioClient.java:2713), io.minio.MinioClient.putObject(MinioClient.java:2830), FunctionalTest.testPutObject(FunctionalTest.java:763), FunctionalTest.putObject(FunctionalTest.java:890), FunctionalTest.runObjectTests(FunctionalTest.java:3751), FunctionalTest.runTests(FunctionalTest.java:3783), FunctionalTest.main(FunctionalTest.java:3927)]"
}
(10/15) Running minio-js tests ... done in 2 minutes and 44 seconds
(11/15) Running minio-py tests ... done in 18 minutes and 36 seconds
(12/15) Running s3cmd tests ... done in 2 minutes and 21 seconds
(13/15) Running s3select tests ... done in 1 minutes and 0 seconds
(14/15) Running security tests ... done in 0 seconds

Executed 14 out of 15 tests successfully.

Deleting image on docker hub
Deleting image locally


4 participants