docs/testplan: add project cowbell testplan (#6001)
ihaid committed Jul 10, 2023
1 parent 97a89c3 commit 05f3074
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions docs/testplan/project-cowbell-testplan.md
@@ -0,0 +1,25 @@
# Mini Cowbell Testplan

## Background
We want to deploy the entire Storj stack in environments that run Kubernetes on 5 NUCs.

## Pre-condition
Satellites are configured for only 5 nodes, and the recommended RS scheme is [2,3,4,4], where:
- 2 is the number of required pieces to reconstitute the segment.
- 3 is the repair threshold, i.e. if a segment remains with only 3 healthy pieces, it will be repaired.
- 4 is the success threshold, i.e. the number of pieces required for a successful upload or repair.
- 4 is the number of total erasure-coded pieces that will be generated.
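
The four numbers above can be sketched as a small data structure. This is only an illustration: the `RSScheme` class, its field names, and the `status` labels are hypothetical and are not taken from the Storj codebase. It assumes a segment counts as lost when fewer than the required pieces remain and needs repair when the healthy count is at or below the repair threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSScheme:
    """Hypothetical container for the four RS numbers listed above."""

    required: int  # pieces needed to reconstitute a segment
    repair: int    # at or below this many healthy pieces, repair is triggered
    success: int   # pieces needed for a successful upload or repair
    total: int     # erasure-coded pieces generated per segment

    def __post_init__(self):
        # Assumed ordering constraint between the four thresholds.
        if not (0 < self.required <= self.repair <= self.success <= self.total):
            raise ValueError("expected required <= repair <= success <= total")

    def expansion_factor(self) -> float:
        # Stored bytes per original byte when all pieces are kept.
        return self.total / self.required

    def status(self, healthy: int) -> str:
        # Classify a segment by its number of healthy pieces.
        if healthy < self.required:
            return "lost"
        if healthy <= self.repair:
            return "needs repair"
        return "healthy"


cowbell = RSScheme(required=2, repair=3, success=4, total=4)
print(cowbell.expansion_factor())
print(cowbell.status(4), "/", cowbell.status(3), "/", cowbell.status(1))
```

Under these assumptions, the [2,3,4,4] scheme gives a 2x expansion factor, and a segment that drops from 4 to 3 healthy pieces is already at the repair threshold.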


| Test Scenario | Test Case | Description | Comments |
|---------------|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Upload | Upload with all nodes online | Every file is uploaded to 4 nodes with a 2x expansion factor, so one of the five nodes holds no piece of a given file. | Happy path scenario. |
| | Upload with one node offline | If one of the five nodes fails and goes offline, 80% of the stored data loses one erasure-coded piece. The health of these segments drops from 4 pieces to 3, which marks them for repair. With `overlay.node.online-window: 4h0m0s`, the offline node will still be selected for uploads for about 4 hours. | Uploads continue uninterrupted if the client uses the new refactored upload path: when the satellite, unaware that the node is already offline, selects it for an upload, the improved upload logic requests a new node from the satellite. If the client uses the old upload logic, uploads may fail if the satellite selects the offline node (20% chance). Once the satellite detects the offline node, all uploads will succeed. |
| Download | Download with one node offline | If one of the five nodes fails and goes offline, 80% of the stored data loses one erasure-coded piece. The health of these segments drops from 4 pieces to 3, which marks them for repair. With `overlay.node.online-window: 4h0m0s`, the offline node will still be selected for downloads for about 4 hours. | |
| Repair | Repair with 2 nodes disqualified | Disqualify 2 nodes so that repair downloads are still possible but no node is available for an upload. The repair should not consume download bandwidth and should error out early. Download bandwidth should be spent only when at least one node is available for an upload. | If two nodes go offline, there are remaining pieces in the worst case that cannot be repaired, which is de facto data loss if the offline nodes are damaged. |
| Audit | | Audits can't identify corrupted pieces with just the minimum number of pieces. Reputation should not increase. Audits should be able to identify corrupted pieces with minimum + 1 pieces. Reputation should decrease. | |
| Upgrades | Nodes restart for upgrades | No more than a single node should go offline for maintenance. Otherwise, normal operation of the network cannot be ensured. | Occasionally, nodes may need to restart due to software updates. This takes the node offline for some period of time. |
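
The percentages in the offline-node scenarios can be checked with a small enumeration. The sketch below is not part of the test plan itself. It assumes that each segment places its 4 pieces on 4 distinct nodes out of 5 and that every placement is equally likely. The function name `pieces_left_distribution` and the node numbering are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

NODES = 5         # storage nodes in the deployment
TOTAL_PIECES = 4  # pieces per segment in the [2,3,4,4] scheme


def pieces_left_distribution(offline):
    """Map remaining-piece count to the fraction of segments with that count.

    Enumerates every way to place TOTAL_PIECES pieces on distinct nodes and
    counts how many pieces survive when the nodes in `offline` are gone.
    """
    placements = list(combinations(range(NODES), TOTAL_PIECES))
    dist = {}
    for placement in placements:
        left = len(set(placement) - set(offline))
        dist[left] = dist.get(left, 0) + Fraction(1, len(placements))
    return dist


one_offline = pieces_left_distribution(offline={0})
two_offline = pieces_left_distribution(offline={0, 1})

for label, dist in (("one offline", one_offline), ("two offline", two_offline)):
    for left in sorted(dist):
        print(f"{label}: {left} pieces left in {dist[left]} of segments")
```

Under these assumptions, one offline node leaves 4/5 of segments with 3 pieces, which matches the 80% figure in the table. With two nodes offline, 3/5 of segments are left with only 2 pieces, which is exactly the number required to reconstitute a segment.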
