[Issue 17]
This issue is solved, but it remains open for improvement. Suggestions for this problem can be added to the open issue.
Scalability works well for ``repository-service-tuf-api`` because it scales
horizontally: multiple instances of the Server API can run side by side, all
sending requests to the Broker.
Scalability for ``repository-service-tuf-worker`` does not work: adding workers
does not increase throughput. The workers pick up tasks from the Broker in
arbitrary order, but because we use a lock, the tasks are still executed one at
a time.

.. uml::

   @startuml
   !pragma teoz true
   participant "Broker queue" as broker
   participant "repository-service-tuf-worker 1" as worker1
   participant "repository-service-tuf-worker 2" as worker2
   participant "repository-service-tuf-worker 3" as worker3
   participant "repository-service-tuf-worker 4" as worker4
   participant "repository-service-tuf-worker 5" as worker5
   rnote over broker
     task 01
     task 02
     task 03
     task 04
     task 05
     task 06
     task 07
     task 08
   endrnote
   broker o-> worker1
   note right #cyan: task 01
   &broker o-> worker2
   note right #cyan: task 02
   &broker o-> worker3
   note right #cyan: task 03
   &broker o-> worker4
   note right #cyan: task 04
   &broker o-> worker5
   note right #cyan: task 05
   worker1 --> worker1: run <back:cyan>task 01</back>
   & worker1 -> broker: finish <back:cyan>task 01</back>
   broker o-> worker1
   note right #cyan: task 06
   worker3 --> worker3: run <back:cyan>task 02</back>
   & worker3 -> broker: finish <back:cyan>task 02</back>
   broker o-> worker3
   note right #cyan: task 07
   worker2 --> worker2: run <back:cyan>task 03</back>
   & worker2 -> broker: finish <back:cyan>task 03</back>
   broker o-> worker2
   note right #cyan: task 08
   worker4 --> worker4: run <back:cyan>task 05</back>
   & worker4 -> broker: finish <back:cyan>task 05</back>
   worker5 --> worker5: run <back:cyan>task 04</back>
   & worker5 -> broker: finish <back:cyan>task 04</back>
   worker1 --> worker1: run <back:cyan>task 06</back>
   & worker1 -> broker: finish <back:cyan>task 06</back>
   worker3 --> worker3: run <back:cyan>task 07</back>
   & worker3 -> broker: finish <back:cyan>task 07</back>
   worker2 --> worker2: run <back:cyan>task 08</back>
   & worker2 -> broker: finish <back:cyan>task 08</back>
   @enduml

The problem is the process of writing the role metadata files.
For example, whenever you add an artifact to a delegated hash role (e.g.
``bins-e``), you need to write a new ``<version>.bins-e.json``, bump the
version in a new ``<version>.snapshot.json``, and update ``timestamp.json``.

.. uml::

   @startuml
   participant "Broker/Backend" as broker
   participant "add-target" as add-artifact
   participant "Storage Backend" as storage #Grey
   broker o-> add-artifact: [task 01] <consumer>
   add-artifact -> storage: loads latest bin-e.json
   add-artifact <-- storage: 3.bin-e.json
   add-artifact -> add-artifact: Add Artifact\nBump version
   add-artifact -> storage: writes 4.bin-e.json
   note right: 4.bin-e.json\n\tfile001
   add-artifact -> storage: loads latest Snapshot
   add-artifact <-- storage: 41.snapshot.json
   add-artifact -> add-artifact: Add <bin-e> meta\nbump version
   add-artifact -> storage: writes 42.snapshot.json
   note right: 4.bin-e.json\n\tfile001\n42.snapshot.json\n\t4.bin-e
   add-artifact -> storage: loads Timestamp
   add-artifact <-- storage: Timestamp.json (version 83)
   add-artifact -> add-artifact: Add 42.snapshot.json
   add-artifact -> storage: writes timestamp.json
   note right: 4.bin-e.json\n\tfile001\n42.snapshot.json\n\t4.bin-e\ntimestamp.json\n\t42.snapshot.json
   add-artifact -> broker: [task 01] <publish> result
   @enduml

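
The chain in the diagram is a read-modify-write over three files. The Python
sketch below is a simplified, hypothetical model: plain dictionaries stand in
for TUF metadata (it does not use the python-tuf ``Metadata`` API), signing and
expiration are omitted, and ``add_artifact``, ``latest_version`` and the
in-memory ``storage`` are illustrative names, not RSTUF functions.

.. code-block:: python

   import copy
   from typing import Any

   # Hypothetical in-memory storage backend: file name -> simplified metadata.
   Storage = dict[str, dict[str, Any]]


   def latest_version(storage: Storage, role: str) -> int:
       """Return the highest <version> among files named <version>.<role>.json."""
       return max(
           int(name.split(".", 1)[0])
           for name in storage
           if name.endswith(f".{role}.json") and name.split(".", 1)[0].isdigit()
       )


   def add_artifact(storage: Storage, bin_role: str, path: str, info: dict) -> None:
       # 1. Load the latest hash bin role, add the artifact, write a new version.
       bin_version = latest_version(storage, bin_role)
       bin_md = copy.deepcopy(storage[f"{bin_version}.{bin_role}.json"])
       bin_md["targets"][path] = info
       bin_md["version"] = bin_version + 1
       storage[f"{bin_md['version']}.{bin_role}.json"] = bin_md

       # 2. Load the latest snapshot, record the new bin version, write a new version.
       snap_version = latest_version(storage, "snapshot")
       snapshot = copy.deepcopy(storage[f"{snap_version}.snapshot.json"])
       snapshot["meta"][f"{bin_role}.json"] = {"version": bin_md["version"]}
       snapshot["version"] = snap_version + 1
       storage[f"{snapshot['version']}.snapshot.json"] = snapshot

       # 3. Load timestamp.json, point it at the new snapshot, bump its version.
       timestamp = copy.deepcopy(storage["timestamp.json"])
       timestamp["meta"]["snapshot.json"] = {"version": snapshot["version"]}
       timestamp["version"] += 1
       storage["timestamp.json"] = timestamp


   storage: Storage = {
       "3.bins-e.json": {"version": 3, "targets": {}},
       "41.snapshot.json": {"version": 41, "meta": {"bins-e.json": {"version": 3}}},
       "timestamp.json": {"version": 83, "meta": {"snapshot.json": {"version": 41}}},
   }
   add_artifact(storage, "bins-e", "file001", {"length": 0})
   print(storage["timestamp.json"])
   # {'version': 84, 'meta': {'snapshot.json': {'version': 42}}}

Every step reads the *latest* file before writing the next version, which is
what makes concurrent tasks unsafe.
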
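If you have hundreds or thousands of requests to add artifacts, you might have
multiple new ``<version>.bins-e.json`` files, each followed by a bump of
``snapshot`` and ``timestamp``. Each task loads the latest snapshot, modifies
it, and writes it back, so two concurrent tasks can read the same snapshot
version, and the last write silently discards the other task's change. This is
a race condition.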
Example: two tasks adding artifacts to ``bin-e`` and ``bin-p`` at the same time.

.. uml::

   @startuml
   participant "Broker/Backend" as broker
   participant "add-target" as add-artifact
   participant "Storage Backend" as storage #Grey
   broker o-[#Blue]> add-artifact: [task 01] <consumer>
   add-artifact -[#Blue]> storage: loads latest bin-e.json
   broker o-[#Green]> add-artifact: [task 02] <consumer>
   add-artifact -[#Green]> storage: loads latest bin-p.json
   add-artifact <[#Blue]-- storage: 3.bin-e.json
   add-artifact <[#Green]-- storage: 16.bin-p.json
   add-artifact -[#Blue]-> add-artifact: 3.bin-e.json\nAdd artifact\nBump version to 4
   add-artifact -[#Green]> add-artifact: 16.bin-p.json\nAdd artifact\nBump version to 17
   add-artifact -[#Blue]> storage: writes 4.bin-e.json
   add-artifact -[#Green]> storage: writes 17.bin-p.json
   note right: 4.bin-e.json\n\tfile001\n17.bin-p.json\n\tfile003\n\tfile005
   add-artifact -[#Blue]> storage: loads latest Snapshot
   add-artifact -[#Green]> storage: loads latest Snapshot
   add-artifact <[#Blue]-- storage: 41.snapshot.json
   add-artifact <[#Green]-- storage: 41.snapshot.json
   add-artifact -[#Blue]> add-artifact: Add <bin-e> meta\nbump version
   add-artifact -[#Green]> add-artifact: Add <bin-p> meta\nbump version
   add-artifact -[#Blue]> storage: writes 42.snapshot.json
   note right: 4.bin-e.json\n\tfile001\n17.bin-p.json\n\tfile003\n\tfile005\n42.snapshot.json\n\t4.bin-e
   add-artifact -[#Green]-> storage: writes 42.snapshot.json
   destroy storage
   note right #FFAAAA: 4.bin-e.json\n\tfile001\n17.bin-p.json\n\tfile003\n\tfile005\n42.snapshot.json\n\t17.bin-p\n\t**missing 4.bin-e**
   add-artifact -[#Blue]> storage: loads Timestamp
   add-artifact -[#Green]> storage: loads Timestamp
   add-artifact <[#Blue]-- storage: Timestamp.json (version 83)
   add-artifact -[#Blue]> add-artifact: Add 42.snapshot.json
   add-artifact -[#Blue]> storage: writes timestamp.json (version 84)
   note right #FFAAAA: 4.bin-e.json\n\tfile001\n17.bin-p.json\n\tfile003\n\tfile005\n42.snapshot.json\n\t17.bin-p\n\t**missing 4.bin-e**\ntimestamp.json\n\tversion 84\n\t42.snapshot
   add-artifact -[#Blue]> broker: [task 01] <publish> result
   add-artifact <[#Green]-- storage: Timestamp.json (version 84)
   add-artifact -[#Green]> add-artifact: Add 42.snapshot.json
   add-artifact -[#Green]> add-artifact: Bump version to 85
   add-artifact -[#Green]> storage: writes timestamp.json (version 85)
   note right #FFAAAA: 4.bin-e.json\n\tfile001\n17.bin-p.json\n\tfile003\n\tfile005\n42.snapshot.json\n\t17.bin-p\n\t**missing 4.bin-e**\ntimestamp.json\n\tversion 85\n\t42.snapshot
   add-artifact -[#Green]> broker: [task 02] <publish> result
   @enduml

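
The lost update in the diagram can be reproduced in a few lines of Python. In
this hypothetical sketch only the snapshot is modelled, two threads play the
two tasks, and a ``threading.Barrier`` forces both to read snapshot version 41
before either writes. Whichever thread writes last wins.

.. code-block:: python

   import copy
   import threading

   # Hypothetical shared storage; only the latest snapshot is modelled.
   store = {"snapshot": {"version": 41, "meta": {"bins-e.json": 3, "bins-p.json": 16}}}
   # Makes both tasks finish reading before either writes (the diagram's interleaving).
   both_loaded = threading.Barrier(2)


   def update_snapshot(role: str, new_version: int) -> None:
       snapshot = copy.deepcopy(store["snapshot"])  # both tasks read version 41
       both_loaded.wait()
       snapshot["meta"][role] = new_version
       snapshot["version"] += 1
       store["snapshot"] = snapshot  # last writer wins


   expected = {"bins-e.json": 4, "bins-p.json": 17}
   tasks = [threading.Thread(target=update_snapshot, args=item) for item in expected.items()]
   for task in tasks:
       task.start()
   for task in tasks:
       task.join()

   final = store["snapshot"]
   lost = [role for role, version in expected.items() if final["meta"][role] != version]
   print(final["version"])  # 42, although two updates were applied
   print(lost)  # exactly one role's update is missing

The outcome does not depend on thread scheduling: the final snapshot is always
version 42 and always misses exactly one of the two bin updates.
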
As a first optimization, we group all changes for the same delegated hash role
in a single task, avoiding writing several versions of that role's metadata.
However, every task still has to update ``snapshot`` and ``timestamp``, so the
race condition remains.
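
To illustrate why grouping helps but does not remove the problem, the sketch
below counts the metadata files a single task writes. It assumes, for
illustration only, that an ungrouped task writes one bin role, one snapshot and
one timestamp per artifact, while a grouped task writes each affected bin role
once, followed by a single snapshot and timestamp. ``group_by_role`` and
``metadata_writes`` are hypothetical helpers, and the artifact names are made up.

.. code-block:: python

   from collections import defaultdict


   def group_by_role(requests: list[tuple[str, str]]) -> dict[str, list[str]]:
       """Group (bin_role, artifact_path) requests by delegated hash role."""
       grouped: dict[str, list[str]] = defaultdict(list)
       for role, path in requests:
           grouped[role].append(path)
       return dict(grouped)


   def metadata_writes(requests: list[tuple[str, str]], grouped: bool) -> int:
       """Count metadata files written by one task (bin roles + snapshot + timestamp)."""
       if not grouped:
           # One bin version, one snapshot and one timestamp per artifact.
           return 3 * len(requests)
       # One new version per affected bin role, then one snapshot and one timestamp.
       return len(group_by_role(requests)) + 2


   requests = [
       ("bins-e", "file001"),
       ("bins-p", "file003"),
       ("bins-p", "file005"),
       ("bins-e", "file007"),
       ("bins-p", "file009"),
   ]
   print(group_by_role(requests))
   # {'bins-e': ['file001', 'file007'], 'bins-p': ['file003', 'file005', 'file009']}
   print(metadata_writes(requests, grouped=False), metadata_writes(requests, grouped=True))
   # 15 4

Grouping cuts the writes within a task, but ``snapshot`` and ``timestamp`` are
still rewritten by every task, so two concurrent tasks can still collide on them.
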
To avoid the race condition, we use a lock so that only one task writes
metadata at a time. The lock protects against the race condition, but it does
not solve scalability: even with dozens of ``repository-service-tuf-worker``
instances, the metadata writing process does not scale.

.. uml::

   @startuml
   !pragma teoz true
   participant "Broker/Backend" as broker
   participant "add-target" as add-artifact
   participant "Storage Backend" as storage #Grey
   broker o-[#Blue]> add-artifact: [task 01] <consumer>
   note left #Red: Lock
   add-artifact -[#Blue]> add-artifact: check lock
   broker o-[#Green]> add-artifact: [task 02] <consumer>
   add-artifact -[#Green]> add-artifact: check lock
   note left #Orange: Waiting unlock
   group "task 01" execution
   add-artifact -[#Blue]> storage: loads latest bin-e.json
   add-artifact <[#Blue]-- storage: 3.bin-e.json
   add-artifact -[#Blue]-> add-artifact: 3.bin-e.json\nAdd artifact\nBump version to 4
   add-artifact -[#Blue]> storage: writes 4.bin-e.json
   add-artifact -[#Blue]> storage: loads latest Snapshot
   add-artifact <[#Blue]-- storage: 41.snapshot.json
   add-artifact -[#Blue]> add-artifact: Add <bin-e> meta\nbump version
   add-artifact -[#Blue]> storage: writes 42.snapshot.json
   add-artifact -[#Blue]> storage: loads Timestamp
   add-artifact <[#Blue]-- storage: Timestamp.json (version 83)
   add-artifact -[#Blue]> add-artifact: Add 42.snapshot.json
   add-artifact -[#Blue]> storage: writes timestamp.json (version 84)
   note right: 4.bin-e.json\n\tfile001\n42.snapshot.json\n\t4.bin-e\ntimestamp.json (version 84)\n\t42.snapshot
   {finish_task01} add-artifact -[#Blue]> broker: [task 01] <publish> result
   note left #Cyan: Unlock
   end
   add-artifact -[#Green]> broker: [task 02] Lock
   note left #Red: Lock
   group "task 02" execution
   add-artifact -[#Green]> storage: loads latest bin-p.json
   add-artifact <[#Green]-- storage: 16.bin-p.json
   add-artifact -[#Green]> add-artifact: 16.bin-p.json\nAdd artifact\nBump version to 17
   add-artifact -[#Green]> storage: writes 17.bin-p.json
   add-artifact -[#Green]> storage: loads latest Snapshot
   add-artifact <[#Green]-- storage: 42.snapshot.json
   add-artifact -[#Green]> add-artifact: Add <bin-p> meta\nbump version
   add-artifact -[#Green]> storage: writes 43.snapshot.json
   add-artifact -[#Green]> storage: loads Timestamp
   add-artifact <[#Green]-- storage: Timestamp.json (version 84)
   add-artifact -[#Green]> add-artifact: Add 43.snapshot.json\nBump version to 85
   add-artifact -[#Green]> storage: writes timestamp.json (version 85)
   note right: 17.bin-p.json\n\tfile003\n\tfile005\n43.snapshot.json\n\t4.bin-e\n\t17.bin-p\ntimestamp.json (version 85)\n\t43.snapshot
   add-artifact -[#Green]> broker: [task 02] <publish> result
   note left #Cyan: Unlock
   end
   @enduml

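
Applied to the race condition shown earlier, a lock around the whole
read-modify-write lets both updates survive. In this sketch,
``threading.Lock`` is an in-process stand-in (an assumption) for a lock shared
by every worker process; a real multi-worker deployment needs a distributed
lock. The data and ``update_snapshot`` helper are hypothetical.

.. code-block:: python

   import copy
   import threading

   # Hypothetical shared storage; only the latest snapshot is modelled.
   store = {"snapshot": {"version": 41, "meta": {"bins-e.json": 3, "bins-p.json": 16}}}
   # Stand-in for a lock shared by all workers.
   metadata_lock = threading.Lock()


   def update_snapshot(role: str, new_version: int) -> None:
       # Only one task at a time may read, modify and write the snapshot.
       with metadata_lock:
           snapshot = copy.deepcopy(store["snapshot"])
           snapshot["meta"][role] = new_version
           snapshot["version"] += 1
           store["snapshot"] = snapshot


   tasks = [
       threading.Thread(target=update_snapshot, args=("bins-e.json", 4)),
       threading.Thread(target=update_snapshot, args=("bins-p.json", 17)),
   ]
   for task in tasks:
       task.start()
   for task in tasks:
       task.join()

   print(store["snapshot"])
   # {'version': 43, 'meta': {'bins-e.json': 4, 'bins-p.json': 17}}

The result is correct whichever task takes the lock first, but the two tasks
can no longer overlap.
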

.. uml::

   @startuml
   !pragma teoz true
   participant "Broker queue" as broker
   participant "repository-service-tuf-worker 1" as worker1
   participant "repository-service-tuf-worker 2" as worker2
   participant "repository-service-tuf-worker 3" as worker3
   participant "repository-service-tuf-worker 4" as worker4
   participant "repository-service-tuf-worker 5" as worker5
   rnote over broker
     task 01
     task 02
     task 03
     task 04
     task 05
     task 06
     task 07
     task 08
   endrnote
   broker o-> worker1
   note right #cyan: task 01
   &broker o-> worker2
   note right #cyan: task 02\ttask 06
   &broker o-> worker3
   note right #cyan: task 03\ttask 04
   &broker o-> worker4
   note right #cyan: task 05\ttask 07
   &broker o-> worker5
   note right #cyan: task 08
   worker1 --> worker1: run <back:cyan>task 01</back>
   & worker1 -> broker: finish <back:cyan>task 01</back>
   worker2 --> worker2: run <back:cyan>task 02</back>
   & worker2 --> worker2: run <back:cyan>task 06</back>
   & worker2 -> broker: finish <back:cyan>task 02\ttask 06</back>
   worker4 --> worker4: run <back:cyan>task 05</back>
   & worker4 -> broker: finish <back:cyan>task 05</back>
   &worker5 --> worker5: run <back:cyan>task 08</back>
   & worker5 -> broker: finish <back:cyan>task 08</back>
   worker3 --> worker3: run <back:cyan>task 03</back>
   & worker3 -> broker: finish <back:cyan>task 03</back>
   worker3 --> worker3: run <back:cyan>task 04</back>
   & worker3 -> broker: finish <back:cyan>task 04</back>
   worker4 --> worker4: run <back:cyan>task 07</back>
   & worker4 -> broker: finish <back:cyan>task 07</back>
   @enduml

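
The cost of the lock can be estimated with an idealized model. The hypothetical
``makespan`` function below assumes every task spends its entire duration
inside the lock and ignores broker overhead; without the lock, each task goes
to the first free worker. Under those assumptions, adding workers shortens the
unlocked run, while the locked run always equals the sum of all task durations.

.. code-block:: python

   import heapq


   def makespan(durations: list[float], workers: int, locked: bool) -> float:
       """Time to finish all tasks on ``workers`` workers (idealized model)."""
       if locked:
           # A global lock serializes the tasks, so they never overlap.
           return sum(durations)
       # Without the lock, each task starts on the worker that becomes free first.
       free_at = [0.0] * workers
       heapq.heapify(free_at)
       for duration in durations:
           start = heapq.heappop(free_at)
           heapq.heappush(free_at, start + duration)
       return max(free_at)


   tasks = [1.0] * 8  # eight tasks of one time unit each, as in the diagram
   for w in (1, 5, 12):
       print(w, makespan(tasks, w, locked=False), makespan(tasks, w, locked=True))
   # 1 8.0 8.0
   # 5 2.0 8.0
   # 12 1.0 8.0

In practice only the metadata-writing part of a task holds the lock, so real
workers gain a little from parallelism, but the write path itself stays serial.
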