task/runner: Several reliability improvements #95

Merged
victorges merged 20 commits into main from vg/chore/reliability-improvements
Dec 7, 2022

Conversation

victorges
Member

@victorges victorges commented Dec 7, 2022

This makes several reliability improvements towards the 12/16 launch milestone.

Included changes:

  • Handle 429 errors from the API and back off task execution
  • Make our panic handling logic as resilient as possible, using a different flow for sending events
  • Update go-api-client and livepeer-data for reliability improvements (better panic handling, strong consistency for assets/tasks, etc.)
  • Set up dead lettering for unprocessable messages received from Rabbit. The DLX argument is not mutable on an existing queue, so we will need to create new queues. I've also added "clean up old queue" logic to be used during the rollout: deploy, move messages from the old queues to the new queues, then manually delete the empty old queues
  • Publish all messages with a timeout separate from the surrounding context, and with mandatory: true (although I later realized that we need more logic in the AMQP producer to get the full benefit)

@victorges victorges requested a review from a team as a code owner December 7, 2022 00:33
@codecov

codecov bot commented Dec 7, 2022

Codecov Report

Merging #95 (7f5719d) into main (f782ca2) will decrease coverage by 33.19472%.
The diff coverage is 18.01802%.

❗ Current head 7f5719d differs from pull request most recent head 507638e. Consider uploading reports for the commit 507638e to get more accurate results

Impacted file tree graph

@@                 Coverage Diff                 @@
##                main        #95          +/-   ##
===================================================
- Coverage   38.64734%   5.45262%   -33.19472%     
===================================================
  Files              3         14          +11     
  Lines            207       1889        +1682     
===================================================
+ Hits              80        103          +23     
- Misses           119       1777        +1658     
- Partials           8          9           +1     
Impacted Files Coverage Δ
clients/catalyst.go 94.82759% <ø> (ø)
task/prepare.go 0.00000% <0.00000%> (ø)
task/transcode.go 1.29310% <0.00000%> (ø)
task/upload.go 0.00000% <0.00000%> (ø)
task/runner.go 5.34759% <19.80198%> (ø)
task/export.go 0.00000% <0.00000%> (ø)
task/paths.go 0.00000% <0.00000%> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@victorges victorges force-pushed the vg/chore/reliability-improvements branch from 8a4a313 to d425728 Compare December 7, 2022 11:21
@victorges victorges merged commit 66ef87d into main Dec 7, 2022
@victorges victorges deleted the vg/chore/reliability-improvements branch December 7, 2022 23:37