Skip to content

Add retry logic to create_run#35

Merged
j316chuck merged 7 commits intomainfrom
chuck/add_retry_to_create_run
Nov 14, 2024
Merged

Add retry logic to create_run#35
j316chuck merged 7 commits intomainfrom
chuck/add_retry_to_create_run

Conversation

@j316chuck
Copy link
Contributor

@j316chuck j316chuck commented Nov 13, 2024

Description

Add retry logic to create_run

Issues Fixed

Tests failed here because create_run timed out
https://github.com/mosaicml/composer/actions/runs/11452297889/job/31901019192#step:3:348

Adding a retry here should help

# Description
Add retry logic to create_run

# Issues Fixed
Tests failed here because create_run timed out 
https://github.com/mosaicml/composer/actions/runs/11452297889/job/31901019192#step:3:348

Adding a retry should help
@j316chuck j316chuck requested review from b-chu and snarayan21 and removed request for snarayan21 November 13, 2024 17:32
Copy link

@snarayan21 snarayan21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just increase timeout? i don't mind either way, up to @b-chu

@b-chu
Copy link
Contributor

b-chu commented Nov 13, 2024

I think it's better to increase timeout in create_run and no have retry logic. Should be ok to overload the run's max_duration and use that timeout for create_run as well

@j316chuck
Copy link
Contributor Author

j316chuck commented Nov 13, 2024

bumped timeout to 30 seconds. for corgi the retry def helped a lot for stability so i think we should keep wdyt? this new logic will max increase latency by 2 minutes which is an appropriate tradeoff?

@j316chuck j316chuck requested a review from snarayan21 November 13, 2024 19:17
@b-chu
Copy link
Contributor

b-chu commented Nov 13, 2024

Let's just do timeout for now and if run creation keeps timing out, we can add backoff later

@j316chuck
Copy link
Contributor Author

sounds good fixed

Copy link
Contributor

@b-chu b-chu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untested but Chuck said he'd fix if there's any issues

@j316chuck j316chuck merged commit 465f4c1 into main Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants