
xl: Avoid multi-disks node to exit when one disk fails #12423

Merged
merged 2 commits into minio:master
Jun 5, 2021

Conversation

vadmeste
Member

@vadmeste vadmeste commented Jun 2, 2021

Description

It makes sense for a node which has multiple disks to start even when one
disk fails, for example by returning an i/o error. This commit makes this
fault tolerance available in this specific use case.

Motivation and Context

The cluster should still start when a disk is broken.

How to test this PR?

Contact me.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation updated
  • Unit tests added/updated

Member

@harshavardhana harshavardhana left a comment


why do we need this? @vadmeste

@vadmeste
Member Author

vadmeste commented Jun 2, 2021

why do we need this? @vadmeste

There is one report about a node which is not starting because of an input/output error, which is not good if the node has multiple disks. So this PR will ignore disks that return errors when trying to upgrade format.json and clean temporary directories.

@harshavardhana
Member

why do we need this? @vadmeste

There is one report about a node which is not starting because of an input/output error, which is not good if the node has multiple disks. So this PR will ignore disks that return errors when trying to upgrade format.json and clean temporary directories.

We already do that @vadmeste in the registerStorage()

@vadmeste
Member Author

vadmeste commented Jun 2, 2021

We already do that @vadmeste in the registerStorage()

Well yes, but that was not enough; there are some Go os calls that try to upgrade format.json and clean tmp directories before newXLStorage() is called. Those functions are formatErasureMigrateLocalEndpoints(endpoints) and formatErasureCleanupTmpLocalEndpoints(endpoints).

By the way, maybe it makes sense to move formatErasureMigrateLocalEndpoints and formatErasureCleanupTmpLocalEndpoints inside newXLStorage(); that looks like an easier fix.
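
As a rough illustration of what is being discussed here, a minimal sketch in Go (using hypothetical helpers migrateFormatJSON and cleanupTmpDir, not MinIO's actual functions) of startup logic that logs and skips a faulty drive during the format.json upgrade and tmp cleanup instead of exiting the node:

```go
package main

import (
	"errors"
	"log"
	"os"
	"path/filepath"
)

// migrateFormatJSON stands in for the real format.json upgrade step.
func migrateFormatJSON(diskPath string) error {
	_, err := os.Stat(filepath.Join(diskPath, ".minio.sys", "format.json"))
	return err
}

// cleanupTmpDir stands in for the real tmp directory cleanup step.
func cleanupTmpDir(diskPath string) error {
	return os.RemoveAll(filepath.Join(diskPath, ".minio.sys", "tmp"))
}

// prepareLocalDisks logs and skips faulty drives instead of exiting the node.
func prepareLocalDisks(diskPaths []string) {
	for _, diskPath := range diskPaths {
		if err := migrateFormatJSON(diskPath); err != nil && !errors.Is(err, os.ErrNotExist) {
			// A drive returning e.g. an I/O error no longer stops startup;
			// the remaining healthy drives are still brought online.
			log.Printf("skipping drive %s: %v", diskPath, err)
			continue
		}
		if err := cleanupTmpDir(diskPath); err != nil {
			log.Printf("tmp cleanup failed on %s: %v", diskPath, err)
		}
	}
}

func main() {
	prepareLocalDisks([]string{"/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"})
}
```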

@harshavardhana
Member

Well yes, but that was not enough; there are some Go os calls that try to upgrade format.json and clean tmp directories before newXLStorage() is called. Those functions are formatErasureMigrateLocalEndpoints(endpoints) and formatErasureCleanupTmpLocalEndpoints(endpoints).

By the way, maybe it makes sense to move formatErasureMigrateLocalEndpoints and formatErasureCleanupTmpLocalEndpoints inside newXLStorage(); that looks like an easier fix.

Yes, that can be done instead @vadmeste; adding another variable into Endpoints looks unclean.

@vadmeste vadmeste changed the title from "xl: Ignore initializing disks on problematic endpoints" to "xl: Avoid multi-disks node to exit when one disk fails" on Jun 2, 2021
@harshavardhana
Member

PTAL at the conflicts @vadmeste

@vadmeste
Member Author

vadmeste commented Jun 4, 2021

PTAL at the conflicts @vadmeste

Fixed

cmd/prepare-storage.go (outdated)
for _, diskPath := range diskPaths {
    j := &tierJournal{
        diskPath: diskPath,
    }
Contributor


the tier deletion journal is intentionally done centrally, as it is overkill to have it run on all disk paths. @krisis PTAL

Member


@vadmeste selecting the first available local disk on a server works to an extent. It's definitely an improvement. Additionally, we need to act on journals that may be present on other local disks, since a different set of disks may have been available locally during previous server restarts.

E.g.:

  1. On the first startup, when all disks are online, this approach would select the first local disk on each server.
  2. Imagine the first disk on the first server goes temporarily offline. On restart, we would select the second disk on this server.
  3. Now the first disk is back online. On a subsequent restart, we would pick this disk.

At this point we would have a journal on the second disk which would remain unattended. We need to address this too.
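
For illustration, a minimal sketch in Go (with a hypothetical journalPath layout, not MinIO's actual tier journal format) of the follow-up idea: scan every local disk for a leftover journal on startup so entries written while a different disk was the "first available" one are not left unattended:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// journalPath is where this sketch assumes a per-disk journal file would live.
func journalPath(diskPath string) string {
	return filepath.Join(diskPath, ".minio.sys", "tier-journal.bin")
}

// findPendingJournals checks every local disk, not just the first available one,
// so journals written during an earlier restart are not missed.
func findPendingJournals(diskPaths []string) []string {
	var pending []string
	for _, diskPath := range diskPaths {
		if _, err := os.Stat(journalPath(diskPath)); err == nil {
			pending = append(pending, diskPath)
		}
	}
	return pending
}

func main() {
	disks := []string{"/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"}
	for _, d := range findPendingJournals(disks) {
		fmt.Println("pending tier deletion journal on:", d)
	}
}
```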

Member


@vadmeste we could address the issue of possibly unattended tier deletion journals on other local disks in a separate PR.

Member Author


Yes, but I just thought this could not be any worse than what we have right now. We can revert this and address it more properly in another PR.

It makes sense for a node which has multiple disks to start even when one
disk fails, for example by returning an i/o error. This commit makes this
fault tolerance available in this specific use case.
Contributor

@poornas poornas left a comment


LGTM

@vadmeste
Member Author

vadmeste commented Jun 4, 2021

LGTM

@poornas I reverted the journal change; let me know if you see any issue with it.

@minio-trusted
Contributor

Mint Automation

Test Result
mint-large-bucket.sh ✔️
mint-fs.sh ✔️
mint-gateway-s3.sh ✔️
mint-erasure.sh ✔️
mint-dist-erasure.sh ✔️
mint-zoned.sh ✔️
mint-gateway-nas.sh ✔️
mint-compress-encrypt-dist-erasure.sh more...

12423-734a45b/mint-compress-encrypt-dist-erasure.sh.log:

Running with
SERVER_ENDPOINT:      minio-dev7.minio.io:31704
ACCESS_KEY:           minio
SECRET_KEY:           ***REDACTED***
ENABLE_HTTPS:         0
SERVER_REGION:        us-east-1
MINT_DATA_DIR:        /mint/data
MINT_MODE:            full
ENABLE_VIRTUAL_STYLE: 0

To get logs, run 'docker cp 1e96884854e8:/mint/log /tmp/mint-logs'

(1/15) Running aws-sdk-go tests ... done in 1 seconds
(2/15) Running aws-sdk-java tests ... done in 1 seconds
(3/15) Running aws-sdk-php tests ... done in 43 seconds
(4/15) Running aws-sdk-ruby tests ... done in 4 seconds
(5/15) Running awscli tests ... done in 2 minutes and 14 seconds
(6/15) Running healthcheck tests ... done in 0 seconds
(7/15) Running mc tests ... done in 1 minutes and 13 seconds
(8/15) Running minio-dotnet tests ... done in 45 seconds
(9/15) Running minio-go tests ... done in 1 minutes and 53 seconds
(10/15) Running minio-java tests ... FAILED in 1 minutes and 21 seconds
{
  "name": "minio-java",
  "function": "composeObject()",
  "args": "[single source with offset]",
  "duration": 34,
  "status": "FAIL",
  "error": "error occurred\nErrorResponse(code = InvalidArgument, message = Range specified is not valid for source object, bucketName = minio-java-test-qvduse, objectName = minio-java-test-36en505, resource = /minio-java-test-qvduse/minio-java-test-36en505, requestId = 16857D1C44CA16E4, hostId = f609277c-16b3-439d-8929-4b42f6aee275)\nrequest={method=PUT, url=http://minio-dev7.minio.io:31704/minio-java-test-qvduse/minio-java-test-36en505?uploadId=7e0a443c-4b87-44ee-a64e-eeb478321e4f&partNumber=1, headers=x-amz-copy-source: /minio-java-test-qvduse/minio-java-test-sjd6e6\nx-amz-copy-source-range: bytes=2048-1048575\nx-amz-copy-source-if-match: cb92d17a904ccec2e6e23b8bb66245fb\nHost: minio-dev7.minio.io:31704\nAccept-Encoding: identity\nUser-Agent: MinIO (Linux; amd64) minio-java/8.0.3\nContent-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==\nx-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date: 20210604T210641Z\nAuthorization: AWS4-HMAC-SHA256 Credential=*REDACTED*/20210604/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-match;x-amz-copy-source-range;x-amz-date, Signature=*REDACTED*\n}\nresponse={code=400, headers=Accept-Ranges: bytes\nContent-Length: 388\nContent-Security-Policy: block-all-mixed-content\nContent-Type: application/xml\nServer: MinIO\nVary: Origin\nX-Amz-Request-Id: 16857D1C44CA16E4\nX-Xss-Protection: 1; mode=block\nDate: Fri, 04 Jun 2021 21:06:41 GMT\n}\n >>> [io.minio.MinioClient.execute(MinioClient.java:775), io.minio.MinioClient.uploadPartCopy(MinioClient.java:4804), io.minio.MinioClient.composeObject(MinioClient.java:1431), FunctionalTest.testComposeObject(FunctionalTest.java:2120), FunctionalTest.composeObjectTests(FunctionalTest.java:2145), FunctionalTest.composeObject(FunctionalTest.java:2300), FunctionalTest.runObjectTests(FunctionalTest.java:3758), FunctionalTest.runTests(FunctionalTest.java:3783), FunctionalTest.main(FunctionalTest.java:3927)]"
}
(10/15) Running minio-js tests ... done in 47 seconds
(11/15) Running minio-py tests ... done in 2 minutes and 45 seconds
(12/15) Running s3cmd tests ... done in 17 seconds
(13/15) Running s3select tests ... done in 7 seconds
(14/15) Running security tests ... done in 0 seconds

Executed 14 out of 15 tests successfully.

Deleting image on docker hub
Deleting image locally

@harshavardhana harshavardhana merged commit 810af07 into minio:master Jun 5, 2021