fix: quorum calculation mistake with reduced parity #10186

Merged
1 commit merged on Aug 3, 2020

Conversation

harshavardhana (Member)

Description

fix: quorum calculation mistake with reduced parity

Motivation and Context

With reduced parity, our write quorum should be the same as the read quorum, but the code was still assuming `readQuorum+1` in all situations, which is not necessary.
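
To make the arithmetic concrete, here is a minimal sketch of how the quorums fall out of the data/parity split. It is illustrative only, not the code touched by this PR, and `objectQuorums` is a made-up name:

```go
package main

import "fmt"

// objectQuorums derives read/write quorums from an erasure-coding layout.
// With reduced parity (parityBlocks < dataBlocks) the write quorum equals
// the read quorum; only the half-and-half split keeps the extra +1, so two
// disjoint halves of the drives can never both reach write quorum.
func objectQuorums(dataBlocks, parityBlocks int) (readQuorum, writeQuorum int) {
	readQuorum = dataBlocks
	writeQuorum = dataBlocks
	if dataBlocks == parityBlocks {
		writeQuorum++
	}
	return readQuorum, writeQuorum
}

func main() {
	// 16 drives with the default EC:8 split -> read quorum 8, write quorum 9.
	fmt.Println(objectQuorums(8, 8))
	// 16 drives with reduced parity EC:4 -> both quorums are 12.
	fmt.Println(objectQuorums(12, 4))
}
```

The tie-breaking `+1` only matters when data and parity are split evenly; with reduced parity that situation cannot arise, so the two quorums coincide.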

How to test this PR?

version: '3.7'

# starts 8 docker containers running minio server instances. Each
# minio server's web interface will be accessible on the host at port
# 9001 through 9008. (A verification sketch follows the volume
# definitions at the end of this compose file.)
services:
  minio1:
    image: y4m4/minio:dev
    volumes:
      - data1-1:/data1
      - data1-2:/data2
    ports:
      - "9001:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio2:
    image: y4m4/minio:dev
    volumes:
      - data2-1:/data1
      - data2-2:/data2
    ports:
      - "9002:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio3:
    image: y4m4/minio:dev
    volumes:
      - data3-1:/data1
      - data3-2:/data2
    ports:
      - "9003:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio4:
    image: y4m4/minio:dev
    volumes:
      - data4-1:/data1
      - data4-2:/data2
    ports:
      - "9004:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio5:
    image: y4m4/minio:dev
    volumes:
      - data5-1:/data1
      - data5-2:/data2
    ports:
      - "9005:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio6:
    image: y4m4/minio:dev
    volumes:
      - data6-1:/data1
      - data6-2:/data2
    ports:
      - "9006:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio7:
    image: y4m4/minio:dev
    volumes:
      - data7-1:/data1
      - data7-2:/data2
    ports:
      - "9007:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio8:
    image: y4m4/minio:dev
    volumes:
      - data8-1:/data1
      - data8-2:/data2
    ports:
      - "9008:9000"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_STORAGE_CLASS_STANDARD: EC:4
    command: server http://minio{1...8}/data{1...2}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

## By default this config uses default local driver,
## For custom volumes replace with volume driver configuration.
volumes:
  data1-1:
  data1-2:
  data2-1:
  data2-2:
  data3-1:
  data3-2:
  data4-1:
  data4-2:
  data5-1:
  data5-2:
  data6-1:
  data6-2:
  data7-1:
  data7-2:
  data8-1:
  data8-2:
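
Once the cluster is up (`docker-compose up -d`), one way to exercise the reduced-parity path is to write an object while some drives are offline. The sketch below is not part of the original test instructions; it assumes the minio-go v7 SDK (any S3 client works), the endpoint and credentials from the compose file above, a single erasure set spanning all 16 drives, and a made-up bucket name `quorum-test`:

```go
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Connect through the host port mapped to minio1 in the compose file.
	client, err := minio.New("localhost:9001", &minio.Options{
		Creds:  credentials.NewStaticV4("minio", "minio123", ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	bucket := "quorum-test" // hypothetical bucket name
	exists, err := client.BucketExists(ctx, bucket)
	if err != nil {
		log.Fatal(err)
	}
	if !exists {
		if err := client.MakeBucket(ctx, bucket, minio.MakeBucketOptions{}); err != nil {
			log.Fatal(err)
		}
	}

	// With EC:4 over 16 drives, data=12 and parity=4, so the read quorum is 12.
	// Stop 2 of the 8 containers (4 drives), e.g. `docker-compose stop minio7 minio8`:
	// before this fix the write quorum was computed as readQuorum+1 = 13 and this
	// PUT would fail; with the fix it is 12 and the PUT should succeed.
	payload := []byte("quorum probe")
	_, err = client.PutObject(ctx, bucket, "probe.txt",
		bytes.NewReader(payload), int64(len(payload)), minio.PutObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("write succeeded with reduced parity quorum")
}
```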

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation needed
  • Unit tests needed

@kannappanr (Contributor) left a comment

LGTM

@minio-trusted (Contributor)

Mint Automation

Test Result
mint-xl.sh ✔️
mint-large-bucket.sh ✔️
mint-fs.sh ✔️
mint-dist-xl.sh ✔️
mint-gateway-s3.sh ✔️
mint-gateway-azure.sh ✔️
mint-gateway-nas.sh ✔️
mint-zoned.sh: see log below

10186-9ef829a/mint-zoned.sh.log:

Running with
SERVER_ENDPOINT:      minio-dev8.minio.io:32127
ACCESS_KEY:           minio
SECRET_KEY:           ***REDACTED***
ENABLE_HTTPS:         0
SERVER_REGION:        us-east-1
MINT_DATA_DIR:        /mint/data
MINT_MODE:            full
ENABLE_VIRTUAL_STYLE: 0

To get logs, run 'docker cp 275db7866948:/mint/log /tmp/mint-logs'

(1/15) Running aws-sdk-go tests ... done in 2 seconds
(2/15) Running aws-sdk-java tests ... done in 2 seconds
(3/15) Running aws-sdk-php tests ... done in 44 seconds
(4/15) Running aws-sdk-ruby tests ... done in 5 seconds
(5/15) Running awscli tests ... done in 2 minutes and 14 seconds
(6/15) Running healthcheck tests ... done in 0 seconds
(7/15) Running mc tests ... done in 1 minutes and 1 seconds
(8/15) Running minio-dotnet tests ... done in 46 seconds
(9/15) Running minio-go tests ... done in 1 minutes and 23 seconds
(10/15) Running minio-java tests ... done in 1 minutes and 16 seconds
(11/15) Running minio-js tests ... done in 52 seconds
(12/15) Running minio-py tests ... done in 2 minutes and 32 seconds
(13/15) Running s3cmd tests ... FAILED in 22 seconds
{
  "name": "s3cmd",
  "duration": "11813",
  "function": "test_sync_list_objects",
  "status": "FAIL",
  "error": "WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...\nERROR: S3 error: 404 (Not Found)"
}
(13/15) Running s3select tests ... done in 7 seconds
(14/15) Running security tests ... done in 0 seconds

Executed 14 out of 15 tests successfully.

Deleting image on docker hub
Deleting image locally

harshavardhana added a commit that referenced this pull request Aug 7, 2020
fix: quorum calculation mistake with reduced parity (#10186)

With reduced parity our write quorum should be same
as read quorum, but code was still assuming

```
readQuorum+1
```

In all situations which is not necessary.

---
fix: Pass context all the way down to the network call in lockers (#10161)

Context timeout might race on each other when timeouts are lower
i.e when two lock attempts happened very quickly on the same resource
and the servers were yet trying to establish quorum.

This situation can lead to locks held which wouldn't be unlocked
and subsequent lock attempts would fail.

This would require a complete server restart. A potential of this
issue happening is when server is booting up and we are trying
to hold a 'transaction.lock' in quick bursts of timeout.

---
allow server to start even with corrupted/faulty disks (#10175)

---
allow listing across drives