Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized Dockerfiles #66

Merged
merged 38 commits into from
Nov 11, 2021
Merged

Parameterized Dockerfiles #66

merged 38 commits into from
Nov 11, 2021

Conversation

ravi-mosaicml
Copy link
Contributor

@ravi-mosaicml ravi-mosaicml commented Nov 5, 2021

  1. Added a Dockerfile to build a pytorch image.
  2. Parameterized the Dockerfile for composer, which extends the Pytorch image
  3. Updated the Makefile to use the new parameterized Dockerfiles

1. Added timeout to the hparams for initialize_process_group. The default of 30 minutes was too long for failing tests, which prevents one from getting any meaningful log.
2. In cleanup(), using `os.killpg` to terminate DDP subprocesses instead of just kill().
3.  In cleanup(), attempting a `sigterm` before resorting to a `sigkill()` 5 seconds later. Graceful cleanup is preferred!
Fixed escape sequence
Tests were failing when composer was not installed with `-e`, causing the yamls to be not found and included
@ravi-mosaicml ravi-mosaicml changed the title WIP Ravi/docker matrix Docker Build Matrix Nov 6, 2021
@ravi-mosaicml ravi-mosaicml changed the base branch from dev to ravi/log_debug_in_tests November 6, 2021 00:45
@ravi-mosaicml ravi-mosaicml changed the base branch from ravi/log_debug_in_tests to ravi/ddp_enhancements November 6, 2021 00:46
Base automatically changed from ravi/ddp_enhancements to dev November 8, 2021 23:28
Copy link
Member

@bandish-shah bandish-shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few changes to address

docker/README.md Outdated Show resolved Hide resolved
docker/pytorch/Dockerfile Outdated Show resolved Hide resolved
docker/Makefile Show resolved Hide resolved
docker/pytorch/Dockerfile Outdated Show resolved Hide resolved
1. Documented Makefile
2. Removed build matrix (for now) from the docke readme
3. Fixed pillow-simd
4. Renamed user to mosaicml
5. Updated pytorch minimum version to 1.9
@ravi-mosaicml ravi-mosaicml changed the base branch from dev to ravi/support_py37 November 9, 2021 17:31
Copy link
Member

@bandish-shah bandish-shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for addressing the feedback!

@ravi-mosaicml ravi-mosaicml changed the title Docker Build Matrix Parameterized Dockerfiles Nov 10, 2021
Base automatically changed from ravi/support_py37 to dev November 11, 2021 16:45
@ravi-mosaicml ravi-mosaicml merged commit 4488c01 into dev Nov 11, 2021
@ravi-mosaicml ravi-mosaicml deleted the ravi/docker_matrix branch November 11, 2021 16:52
coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022
Added a Dockerfile to build a pytorch image.
Parameterized the Dockerfile for composer, which extends the Pytorch image
Updated the Makefile to use the new parameterized Dockerfiles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants