This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

[torchelastic][circleci] Fix etcd download path #1

Closed
wants to merge 1 commit into from

Conversation

kiukchung
Contributor

No description provided.

@kiukchung
Contributor Author

test PR, closing.

@kiukchung kiukchung closed this Nov 22, 2019
@kiukchung kiukchung deleted the kiuk-dev branch November 26, 2019 06:30
facebook-github-bot pushed a commit that referenced this pull request Dec 5, 2019
…nt, fix bug in petctl setup where None was being passed to cfn param, pump docker logs to CloudWatch

Summary:
1. Uses the Docker `awslogs` log driver to send Docker output to CloudWatch (see screenshots below).
2. As a result of #1, a log group called `torchelastic/$USER` is created in CloudWatch, with one log stream per worker called `$job_name/$instance_id`.
3. Fixes a bug in `petctl setup`: when no EFS or S3 bucket is specified, `None` was passed to the cfn template parameter, which expects a string and therefore raised a validation error.
4. Fixes an issue in the cfn template where the CloudWatch IAM managed policy was created with a fixed name, which prevented multiple stacks from being created in the same account.
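
For readers unfamiliar with the `awslogs` driver, the log group and stream naming described in items 1 and 2 could be expressed roughly as follows. This is only a sketch, not the code from the commit: the helper name `awslogs_docker_args` and its parameters are hypothetical, and only the `--log-driver` / `--log-opt` flags and the `awslogs-*` option names come from Docker itself.

```python
from typing import List


def awslogs_docker_args(user: str, job_name: str, instance_id: str, region: str) -> List[str]:
    """Build `docker run` flags that route container output to CloudWatch.

    Hypothetical helper for illustration; the naming scheme follows the
    commit summary (group: torchelastic/$USER, stream: $job_name/$instance_id).
    """
    return [
        "--log-driver=awslogs",
        "--log-opt", f"awslogs-region={region}",
        "--log-opt", f"awslogs-group=torchelastic/{user}",
        "--log-opt", f"awslogs-stream={job_name}/{instance_id}",
        # Ask Docker to create the log group if it does not exist yet
        # (assumes the instance role is allowed to call logs:CreateLogGroup).
        "--log-opt", "awslogs-create-group=true",
    ]
```

The returned list would be spliced into a `docker run` command line before the image name, so each worker instance writes to its own stream under the shared per-user group.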

#thanks Vinicius Reis for testing `petctl` and reporting bugs #3 and #4.
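
The failure mode behind bug #3 can be illustrated with a small sketch. It assumes the parameters are passed in the list-of-dicts shape that boto3's CloudFormation `create_stack` accepts, and it guesses that an empty string is an acceptable stand-in for an unset value; the actual fix in `petctl` may differ. The function name and the parameter keys below are hypothetical.

```python
from typing import Dict, List, Optional


def to_cfn_parameters(params: Dict[str, Optional[str]]) -> List[Dict[str, str]]:
    """Convert a name -> value mapping into CloudFormation-style parameters.

    CloudFormation parameter values must be strings, so None (for example,
    an unspecified S3 bucket) is mapped to "" instead of being passed through.
    """
    return [
        {"ParameterKey": key, "ParameterValue": "" if value is None else str(value)}
        for key, value in params.items()
    ]


# Hypothetical parameter names, for illustration only.
stack_params = to_cfn_parameters({"S3BucketName": None, "EFSFileSystemId": "fs-1234"})
```

Whether an empty string or omitting the parameter entirely is the right choice depends on how the template declares its defaults, which the commit summary does not state.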

{F223965947}
{F223965943}

Reviewed By: vreis

Differential Revision: D18826855

fbshipit-source-id: 2d75f607734135ab6d5301fc636501a38cfee9d9
facebook-github-bot pushed a commit that referenced this pull request Mar 18, 2020
Summary:
MVP first cut:
1. fault tolerance for agents
2. elasticity for agents
3. unittests for #1 and #2
4. defines APIs for key data objects and agent

All unittests passing.

Differential Revision: D20488549

fbshipit-source-id: 3965a9f5827c5fe4faa4c3ee6aa7533130605c04
fotstrt pushed a commit to eth-easl/elastic that referenced this pull request Feb 17, 2022