
Conversation

@fmassa (Member) commented Sep 4, 2019

This PR is currently in a very raw state, but it seems to work and the CUDA tests pass.

Clean-up is still required, but I'd like early feedback on the overall direction.

@fmassa requested review from ezyang and soumith, September 4, 2019 15:29
filters:
branches:
only:
- master
ezyang (Contributor):

You probably don't want to actually do this, as it will prevent you from easily doing binary builds by opening PRs.

fmassa (Member Author):

If I want only a subset of the builds to run, how would you recommend doing it? With an approach similar to PyTorch's should_run_job.sh?

ezyang (Contributor):

In a vacuum, what I would have expected to see is that by default there is some limited set of builds that run on master, but then there is an easy way to turn on all of the builds by editing the CircleCI config if you need it. That's what the {%- if True -%} block does today.

You can do a should_run_job.sh strategy, but if you enable too many builds by default on a PR, or too many builds which have exactly equal finish times, you'll hit GitHub rate limiting. We had exactly this problem. Another way to solve it is to use the parametrized jobs endpoint to trigger jobs yourself; ask @suo about it, he's been poking at it on the main pytorch/pytorch repo.
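To make the first suggestion concrete, here is a rough sketch of what that could look like in the CircleCI workflow; the job names are made up and this is not the actual torchvision config:

workflows:
  build:
    jobs:
      # A small, cheap job runs on every PR.
      - build_py3_cpu
      # Expensive CUDA / binary jobs are restricted to master by default;
      # deleting this filter (or flipping a template guard like the
      # {%- if True -%} block) turns them on for a PR when needed.
      - build_py3_cuda:
          filters:
            branches:
              only:
                - master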

stable"

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
@ezyang (Contributor) commented Sep 4, 2019:

PyTorch CI pins the docker-ce version and you should too. Whenever Docker releases a new version, it takes a day or so for NVIDIA to release updated nvidia-docker packages; your CI will be broken in the meantime if you don't pin.
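A minimal sketch of what pinning could look like in the install step; the version string below is only a placeholder, not a tested recommendation:

# Pin docker-ce so a new Docker release doesn't break CI before the
# matching nvidia-docker packages are published. Placeholder version.
DOCKER_VERSION="5:18.09.9~3-0~ubuntu-xenial"
sudo apt-get update
sudo apt-get install -y \
    docker-ce="${DOCKER_VERSION}" \
    docker-ce-cli="${DOCKER_VERSION}" \
    containerd.io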

set -e

curl -o conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh conda.sh -b
ezyang (Contributor):

If you're going to use the soumith/conda-cuda image (which I'm not sure I condone, but I guess it's fine for now), you shouldn't need to reinstall conda; it should already come with the image.
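A sketch of how the script could reuse the preinstalled conda instead of downloading Miniconda again; /opt/conda is only an assumption about where conda lives on that image:

# Reuse the conda that ships with the image when it's there.
if [ -x /opt/conda/bin/conda ]; then
    export PATH="/opt/conda/bin:$PATH"
else
    # Fall back to installing Miniconda if the image doesn't have conda.
    curl -o conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    sh conda.sh -b
    export PATH="$HOME/miniconda3/bin:$PATH"
fi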

fmassa (Member Author):

This is a very good point. I originally tried using a basic nvidia/cuda:10.0 image because I wanted fairly minimal dependencies (conda-cuda was huge, around 20 GB) and I wanted to understand a few more things myself.
In the end it was a bit more work, so I just switched to soumith/conda-cuda. I have now found where its original Dockerfile lives, so I will switch to using that and use the conda that comes with the image as well.
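For reference, a sketch of how the job could then run everything inside that prebuilt image on a GPU machine; the build/test script path is hypothetical:

# Pull the prebuilt image and run the build/test script inside it with the
# NVIDIA runtime (script path is a placeholder, not the actual layout).
docker pull soumith/conda-cuda
docker run --runtime=nvidia --rm -v "$PWD:/workspace/vision" soumith/conda-cuda \
    bash /workspace/vision/.circleci/build_and_test.sh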


/workspace/packaging/conda/switch_cuda_version.sh 10.0

conda install -y pytorch cudatoolkit=10.0 -c pytorch-nightly
ezyang (Contributor):

You'll be at the mercy of PyTorch not wantonly breaking vision if you do it this way ;)

fmassa (Member Author):

I know, but torchvision master currently requires PyTorch nightly (see e.g. #1262), so I think this is the simplest thing we can do?

conda install -y pytest mock scipy

# torchvision dependency on av
conda install -y av -c conda-forge
ezyang (Contributor):

Ideally, all of these dependencies should be preloaded on the Docker image, so you can go straight to build and test immediately.
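For example (a sketch, not the actual image definition), the test-time installs above could be moved into the image build, leaving only the PyTorch nightly to be installed per run since it changes daily:

# Baked into the CI image (e.g. in its Dockerfile) rather than run per job:
conda install -y pytest mock scipy
conda install -y av -c conda-forge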

@ezyang (Contributor) commented Sep 4, 2019

Looks basically reasonable!

@fmassa mentioned this pull request Sep 5, 2019
@fmassa (Member Author) commented Sep 9, 2019

Subsumed by #1298

@fmassa closed this Sep 9, 2019