mattip
Contributor

mattip commented May 20, 2020

Closes gh-37584. I think I need to do more to generate an image, but .circleci/README.md is vague on the details. The first commit originally reflowed and updated that document a bit; I have since dropped the changes to .circleci/README.md and will make them in a separate PR once this is merged.

@dr-ci

dr-ci bot commented May 20, 2020

💊 CI failures summary and remediations

As of commit e7a6aa3 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

This comment has been revised 48 times.

@mattip
Contributor Author

mattip commented May 21, 2020

@malfet I assume at some point I will need to update the cimodel with the information that conda is a valid compiler specifier?

ailzhang added the triaged label May 21, 2020
mattip force-pushed the conda-build branch 2 times, most recently from 8f761f7 to 37ce0fe May 28, 2020 07:58
@mattip
Contributor Author

mattip commented May 28, 2020

Rebased off master to clear the merge conflicts. @malfet I still don't understand how to get the proper hash to put into the ignore stanza, and it seems the actual docker image build is not being triggered.

@mattip
Contributor Author

mattip commented Jun 11, 2020

ping @malfet

@mattip
Contributor Author

mattip commented Jun 15, 2020

@seemethere is there a way to move this forward? I am stuck trying to generate a new Docker image for the CI run

@mattip
Contributor Author

mattip commented Jul 1, 2020

This PR is still stuck waiting for instructions on how to generate a new CI docker image.

@mattip
Contributor Author

mattip commented Jul 16, 2020

@ezyang is there something I can do to move this forward?

@ezyang
Contributor

ezyang commented Jul 17, 2020

Hi @mattip, a lot of us are busy right now because it is performance cycle at FB.

I don't have too much context on this diff, but if I understand correctly, on master, it's no longer necessary to manually delete the trigger strings, as this diff from @seemethere has landed: #40194. This means that the instructions at https://github.com/pytorch/pytorch/wiki/Docker-image-build-on-CircleCI are out of date.

All you have to do now is submit a PR with the new Docker changes, and CI will automatically build the new images and test against them, and if all is green and this PR is approved I can land it.

@mattip
Contributor Author

mattip commented Jul 19, 2020

> All you have to do now is submit a PR with the new Docker changes, and CI will automatically build the new images and test against them, and if all is green and this PR is approved I can land it.

I tried to add a pytorch-linux-bionic-py3.7-conda image, but I do not see it among the CircleCI runs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang
Contributor

ezyang commented Jul 19, 2020

I read over the CircleCI yaml, and it looks like I was mistaken and it's still necessary to delete the triggers, sorry :( I'm going to try pushing the necessary update to trigger the build. Also, it looks like you don't have the ECR repo set up, so I went ahead and did that for you.

For future reference, try to avoid force pushing to the branch. It destroys history on GitHub, which means I can't look at old builds and see if you previously had things set up correctly.

@ezyang
Contributor

ezyang commented Jul 19, 2020

@mattip
Contributor Author

mattip commented Jul 19, 2020

> try to avoid force pushing to the branch, it destroys history on GitHub

Sorry. So the preferred workflow is to merge master into the branch?

@ezyang
Contributor

ezyang commented Jul 19, 2020

Yes, just merge in master. When we land the diff we'll squash all history so the merges won't matter in the long run.

ezyang and others added 4 commits July 19, 2020 17:12
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@mattip
Contributor Author

mattip commented Jul 20, 2020

The image has been built and uploaded, so I added it to the data model in 14b8f86

@mattip
Contributor Author

mattip commented Jul 20, 2020

It seems to work; the test failures are the same as in the other builds.

@ezyang
Contributor

ezyang commented Jul 20, 2020

@mattip
Contributor Author

mattip commented Jul 20, 2020

I updated the tag from c966facd-bcb3-4b37-ba7d-f9b995ad78d9 to 8bdba785b1eac4d297d5f5930f979518012a56e0, which is what appears on http://docker.pytorch.org/pytorch.html. I hope that is correct; the format is different.
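
To make the "format is different" remark concrete: the first tag has the canonical UUID shape, while the second is a 40-character hexadecimal string. A minimal sketch for telling the two shapes apart follows. The function name `classify_tag` and the labels it returns are hypothetical and only for illustration; they are not part of the PyTorch CI scripts, and the sketch says nothing about which tag is the right one to use.

```python
import re
import uuid

# A 40-character lowercase hexadecimal string (the shape of the second tag).
HEX40 = re.compile(r"[0-9a-f]{40}")


def classify_tag(tag: str) -> str:
    """Return 'hex40', 'uuid', or 'unknown' based only on the tag's shape."""
    if HEX40.fullmatch(tag):
        return "hex40"
    try:
        parsed = uuid.UUID(tag)
    except ValueError:
        return "unknown"
    # uuid.UUID also accepts braces and unhyphenated input, so require the
    # canonical hyphenated form before calling the tag a UUID.
    return "uuid" if str(parsed) == tag.lower() else "unknown"


for tag in (
    "c966facd-bcb3-4b37-ba7d-f9b995ad78d9",
    "8bdba785b1eac4d297d5f5930f979518012a56e0",
):
    print(tag, "->", classify_tag(tag))
```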

@mattip
Contributor Author

mattip commented Jul 20, 2020

The lint quickcheck of .circleci/config.yml failed, but I fail to see why. There is no change if I call the regenerate.sh script locally.

@ezyang
Contributor

ezyang commented Jul 20, 2020

8bdba785b1eac4d297d5f5930f979518012a56e0 looks wrong, there is no pytorch-linux-bionic-py3.7-conda build. You want d368bea1-af47-48fd-a3b7-c41d1e7b4f1a

ezyang added 2 commits July 20, 2020 19:17
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@mattip
Contributor Author

mattip commented Jul 21, 2020

The pytorch_linux_xenial_py3_clang5_android_ndk_r19c CI runs are failing to find an image with the hash d368bea1-af47-48fd-a3b7-c41d1e7b4f1a. All the other images are finding that hash. When I look at the docker build CI run, I see that the other images succeeded but that one is still marked as failed. I have re-triggered it via the "rerun job with ssh" button, but then the job runs successfully, claiming that the image hash 8bdba785b1eac4d297d5f5930f979518012a56e0 already exists. And indeed, if I go to the docker image listing, I see an image for pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c with that hash.

When I look at some of the other successful docker builds, the "Check if image should be built" step also looks for 8bdba7.... On the docker image listing, I see both hashes: the 8bdba... one and the d368bea1-af47... one. What am I missing?

@ezyang
Contributor

ezyang commented Jul 21, 2020

> I have re-triggered it via the "rerun job with ssh" button, but then the job runs successfully, claiming that the image hash 8bdba785b1eac4d297d5f5930f979518012a56e0 already exists. And indeed, if I go to the docker image listing, I see an image for pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c with that hash.

There's a bug; the docker jobs are checking for the wrong hash. The bug was exacerbated by the fact that the streamlined docker build workflow was only partially landed (and part of it was reverted). I think we will need to work around the bug for now, probably by deleting the "should build image" test. cc @seemethere I don't have time to fix this today due to PSC but will get back on it tomorrow.

@ezyang
Contributor

ezyang commented Jul 22, 2020

OK, turns out I was wrong.

> 8bdba785b1eac4d297d5f5930f979518012a56e0 looks wrong, there is no pytorch-linux-bionic-py3.7-conda build. You want d368bea1-af47-48fd-a3b7-c41d1e7b4f1a

This is wrong: the image does exist, as per:

(/scratch/ezyang/pytorch-scratch2-env) ezyang@devfair040:/scratch/ezyang$ docker run -it 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-conda:8bdba785b1eac4d297d5f5930f979518012a56e0
Unable to find image '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-conda:8bdba785b1eac4d297d5f5930f979518012a56e0' locally
8bdba785b1eac4d297d5f5930f979518012a56e0: Pulling from pytorch/pytorch-linux-bionic-py3.7-conda

It isn't showing up in the index, probably because the index hasn't picked up the new ECR repo. So actually you were all fine.
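
For reference, the image name in that `docker run` command follows the usual `registry/repository:tag` layout. The sketch below only assembles and splits such a string; it does not contact ECR or check that the image exists. The helper names `image_reference` and `split_image_reference` are hypothetical, and the splitting helper assumes a tag is always present.

```python
def image_reference(registry: str, repository: str, tag: str) -> str:
    """Build a 'registry/repository:tag' string like the one pulled above."""
    return f"{registry}/{repository}:{tag}"


def split_image_reference(ref: str) -> tuple[str, str, str]:
    """Split 'registry/repository:tag' back into its three parts.

    Assumes the reference always carries a tag; a registry with a port
    number and no tag would be split incorrectly by this simple approach.
    """
    name, _, tag = ref.rpartition(":")
    registry, _, repository = name.partition("/")
    return registry, repository, tag


ref = image_reference(
    "308535385114.dkr.ecr.us-east-1.amazonaws.com",
    "pytorch/pytorch-linux-bionic-py3.7-conda",
    "8bdba785b1eac4d297d5f5930f979518012a56e0",
)
print(ref)
print(split_image_reference(ref))
```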

The real reason the quickcheck lint was failing is that master had been updated, and the quickcheck is computed after merging with master (kind of icky, but you would indeed have broken quickcheck if you had merged as is; it would have been nice if it mentioned this, though).

I'm going to revert back to your diffs and try to land.

@mattip
Contributor Author

mattip commented Jul 22, 2020

Thanks. I still think it is fragile to tag the images with the CIRCLE_WORKFLOW_ID, but hopefully that is orthogonal to this PR.

@ezyang
Contributor

ezyang commented Jul 22, 2020

I think in the half-constructed state we are in today, you basically can't use workflow ids, for the reasons you posted above.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in 9c7ca89.

@jakirkham

Woot! 🎉 Thanks Matti! 😄

seemethere added a commit that referenced this pull request Aug 3, 2020
This reverts commit 9c7ca89.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
facebook-github-bot pushed a commit that referenced this pull request Aug 3, 2020
Summary:
This reverts commit 9c7ca89.

Pull Request resolved: #42472

Reviewed By: ezyang, agolynski

Differential Revision: D22903382

Pulled By: seemethere

fbshipit-source-id: e2b01537bcdf6c50d967329833cb6450a75b8247
@rgommers
Collaborator

This PR was reverted, with the reason not documented and no further follow-up as far as I can tell. Right now there is again no CI job with conda compilers, and the build seems to be broken for at least some people (gh-47717).

@seemethere do you remember what happened here? Can we re-introduce a conda compiler CI job?

Labels

Merged, open source, triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Development

Successfully merging this pull request may close these issues.

Add a CI job using conda compilers

9 participants