Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: capture disks when entire peer is offline #11697

Merged
merged 1 commit into from Mar 4, 2021

Conversation

harshavardhana
Copy link
Member

Description

fix: capture disks when entire peer is offline

Motivation and Context

currently when one of the peer is down, the
drives from that peer are reported as '0/0'
offline instead we should capture/filter the
drives from the peer and populate it appropriately
such that mc admin info displays correct info.

How to test this PR?

#!/usr/bin/env bash

export MINIO_ACCESS_KEY="minio"
export MINIO_SECRET_KEY="minio123"
export MINIO_ERASURE_SET_DRIVE_COUNT=4
export MINIO_PROMETHEUS_AUTH_TYPE=public
#rm -rf /tmp/disk{01..12}
export MINIO_ENDPOINTS="http://127.0.0.1:9001/home/harsha/disk01 http://127.0.0.1:9001/home/harsha/disk02 http://127.0.0.1:9002/home/harsha/disk03 http://127.0.0.1:9002/home/harsha/disk04 http://127.0.0.1:9001/home/harsha/disk05 http://127.0.0.1:9001/home/harsha/disk06 http://127.0.0.1:9002/home/harsha/disk07 http://127.0.0.1:9002/home/harsha/disk08"
export MINIO_KMS_AUTO_ENCRYPTION=off
for i in {01..02}; do
    minio server --address ":90${i}" &
done

and run mc admin info alias/ after taking down peer on port 9001

Correct values should be

mc admin info myminio-local2/
●  127.0.0.1:9001
   Uptime: offline
   Drives: 0/4 OK 

●  127.0.0.1:9002
   Uptime: 41 seconds 
   Version: 2021-03-04T02:19:32Z
   Network: 1/2 OK 
   Drives: 4/4 OK 

2.7 GiB Used, 2 Buckets, 4,089 Objects
4 drives online, 4 drives offline

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation updated
  • Unit tests added/updated

currently when one of the peer is down, the
drives from that peer are reported as '0/0'
offline instead we should capture/filter the
drives from the peer and populate it appropriately
such that `mc admin info` displays correct info.
@minio-trusted
Copy link
Contributor

Mint Automation

Test Result
mint-large-bucket.sh ✔️
mint-fs.sh ✔️
mint-gateway-s3.sh ✔️
mint-erasure.sh ✔️
mint-dist-erasure.sh ✔️
mint-zoned.sh ✔️
mint-gateway-nas.sh ✔️
mint-compress-encrypt-dist-erasure.sh more...

11697-c070d10/mint-compress-encrypt-dist-erasure.sh.log:

Running with
SERVER_ENDPOINT:      minio-dev7.minio.io:30886
ACCESS_KEY:           minio
SECRET_KEY:           ***REDACTED***
ENABLE_HTTPS:         0
SERVER_REGION:        us-east-1
MINT_DATA_DIR:        /mint/data
MINT_MODE:            full
ENABLE_VIRTUAL_STYLE: 0

To get logs, run 'docker cp c266364dd9a7:/mint/log /tmp/mint-logs'

(1/15) Running aws-sdk-go tests ... done in 2 seconds
(2/15) Running aws-sdk-java tests ... done in 1 seconds
(3/15) Running aws-sdk-php tests ... done in 44 seconds
(4/15) Running aws-sdk-ruby tests ... done in 4 seconds
(5/15) Running awscli tests ... FAILED in 32 seconds
{
  "name": "awscli",
  "duration": 2774,
  "function": "aws --endpoint-url http://minio-dev7.minio.io:30886 s3api copy-object --bucket awscli-mint-test-bucket-16472 --key datafile-1-kB-copy --copy-source awscli-mint-test-bucket-16472/datafile-1-kB\n",
  "status": "FAIL",
  "error": "Hash mismatch expected 084e1383b70fb0c51acc680fef370023, got ac57de7156d7fc25ac1a65f81fa3989b"
}
(5/15) Running healthcheck tests ... done in 0 seconds
(6/15) Running mc tests ... done in 49 seconds
(7/15) Running minio-dotnet tests ... done in 42 seconds
(8/15) Running minio-go tests ... FAILED in 2 minutes and 43 seconds
{
  "args": {
    "destination": {
      "Bucket": "minio-go-test-63e1ypi1tu2uwc35",
      "Object": "dstObject",
      "Encryption": {},
      "UserMetadata": null,
      "ReplaceMetadata": false,
      "UserTags": null,
      "ReplaceTags": false,
      "LegalHold": "",
      "Mode": "",
      "RetainUntilDate": "0001-01-01T00:00:00Z",
      "Size": 0,
      "Progress": null
    },
    "source": {
      "Bucket": "minio-go-test-63e1ypi1tu2uwc35",
      "Object": "srcObject",
      "VersionID": "",
      "MatchETag": "",
      "NoMatchETag": "",
      "MatchModifiedSince": "0001-01-01T00:00:00Z",
      "MatchUnmodifiedSince": "0001-01-01T00:00:00Z",
      "MatchRange": false,
      "Start": 0,
      "End": 0,
      "Encryption": null
    }
  },
  "duration": 4886,
  "error": "We encountered an internal error, please try again.: cause(s2: corrupt input)",
  "function": "CopyObject(destination, source)",
  "message": "GetObject failed",
  "name": "minio-go: testUnencryptedToSSES3CopyObject",
  "status": "FAIL"
}
(8/15) Running minio-java tests ... FAILED in 2 minutes and 0 seconds
{
  "name": "minio-java",
  "function": "copyObject()",
  "args": "[match etag]",
  "duration": 249,
  "status": "FAIL",
  "error": "error occurred\nErrorResponse(code = PreconditionFailed, message = At least one of the pre-conditions you specified did not hold, bucketName = minio-java-test-1f0ifan, objectName = minio-java-test-11bt1ue-copy, resource = /minio-java-test-1f0ifan/minio-java-test-11bt1ue-copy, requestId = 1669040F2739AE18, hostId = e69500b8-16c1-4cfb-ba39-e7b68795a191)\nrequest={method=PUT, url=http://minio-dev7.minio.io:30886/minio-java-test-1f0ifan/minio-java-test-11bt1ue-copy, headers=x-amz-copy-source-if-match: 71cff0a060f852067e443ad1e24ae26c-1\nx-amz-copy-source: /minio-java-test-3vgo9k1/minio-java-test-11bt1ue\nHost: minio-dev7.minio.io:30886\nAccept-Encoding: identity\nUser-Agent: MinIO (Linux; amd64) minio-java/8.0.3\nContent-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==\nx-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date: 20210304T025324Z\nAuthorization: AWS4-HMAC-SHA256 Credential=*REDACTED*/20210304/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-match;x-amz-date, Signature=*REDACTED*\n}\nresponse={code=412, headers=Accept-Ranges: bytes\nContent-Length: 418\nContent-Security-Policy: block-all-mixed-content\nContent-Type: application/xml\nETag: \"71cff0a060f852067e443ad1e24ae26c\"\nLast-Modified: Thu, 04 Mar 2021 02:53:24 GMT\nServer: MinIO\nVary: Origin\nX-Amz-Request-Id: 1669040F2739AE18\nX-Xss-Protection: 1; mode=block\nDate: Thu, 04 Mar 2021 02:53:24 GMT\n}\n >>> [io.minio.MinioClient.execute(MinioClient.java:775), io.minio.MinioClient.execute(MinioClient.java:563), io.minio.MinioClient.executePut(MinioClient.java:904), io.minio.MinioClient.copyObject(MinioClient.java:1232), FunctionalTest.testCopyObjectMatchETag(FunctionalTest.java:1850), FunctionalTest.copyObject(FunctionalTest.java:2016), FunctionalTest.runObjectTests(FunctionalTest.java:3757), FunctionalTest.runTests(FunctionalTest.java:3783), FunctionalTest.main(FunctionalTest.java:3927)]"
}
(8/15) Running minio-js tests ... done in 52 seconds
(9/15) Running minio-py tests ... done in 3 minutes and 23 seconds
(10/15) Running s3cmd tests ... FAILED in 5 seconds
{
  "name": "s3cmd",
  "duration": "3076",
  "function": "test_put_object_multipart",
  "status": "FAIL",
  "error": "WARNING: MD5 Sums don't match!\nWARNING: Retrying upload of /mint/data/datafile-65-MB\nWARNING: MD5 Sums don't match!\nWARNING: Retrying upload of /mint/data/datafile-65-MB\nWARNING: MD5 Sums don't match!\nWARNING: Retrying upload of /mint/data/datafile-65-MB\nWARNING: MD5 Sums don't match!\nWARNING: Retrying upload of /mint/data/datafile-65-MB\nWARNING: MD5 Sums don't match!\nWARNING: Retrying upload of /mint/data/datafile-65-MB\nWARNING: MD5 Sums don't match!\nWARNING: Too many failures. Giving up on '/mint/data/datafile-65-MB'\nERROR: \nUpload of '/mint/data/datafile-65-MB' part 1 failed. Use\n  /usr/local/bin/s3cmd abortmp s3://s3cmd-test-bucket-20861/s3cmd-test-object-13418 c1635e22-8fb3-4653-ba53-7d2e87cc4493\nto abort the upload, or\n  /usr/local/bin/s3cmd --upload-id c1635e22-8fb3-4653-ba53-7d2e87cc4493 put ...\nto continue the upload.\nERROR: Upload of '/mint/data/datafile-65-MB' failed too many times (Last reason: )"
}
(10/15) Running s3select tests ... done in 8 seconds
(11/15) Running security tests ... done in 0 seconds

Executed 11 out of 15 tests successfully.

Deleting image on docker hub
Deleting image locally

Copy link
Contributor

@poornas poornas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@vadmeste vadmeste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@harshavardhana harshavardhana merged commit 7865850 into minio:master Mar 4, 2021
@harshavardhana harshavardhana deleted the display-offline branch March 4, 2021 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants