Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH Action for docker builds #1724

Merged
merged 16 commits into from
Mar 2, 2021

Conversation

trsvchn
Copy link
Collaborator

@trsvchn trsvchn commented Mar 1, 2021

Related to #1644 and #1721

Description:

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

I think I finally got it how to bind particular step with certain dir update.

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

Nope)

@trsvchn trsvchn force-pushed the issue-1644-improve-docker-build branch 4 times, most recently from e052f9f to 6f69470 Compare March 1, 2021 19:53
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 1, 2021

Thanks for testing that, @trsvchn !
I also wonder if we couldn't solve permission issue of trigger build job using pull request target trick ?

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

@vfdev-5 probably, but we can verify it only after the merge, like we did for code-style

@trsvchn trsvchn force-pushed the issue-1644-improve-docker-build branch 2 times, most recently from 660067c to c5558f8 Compare March 1, 2021 20:19
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 1, 2021

@vfdev-5 probably, but we can verify it only after the merge, like we did for code-style

Sounds good! Let's do it like for code-style. Just to make sure, the permission issue there is related to probably skipped CIRCLE_TOKEN as it is on fork.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 1, 2021

I see that you are trying to find out if trigger next stops or not depending on modified. I did something like that here : https://github.com/pytorch/ignite/blob/master/.circleci/trigger_if_modified.sh . May help...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

@vfdev-5 probably, but we can verify it only after the merge, like we did for code-style

Sounds good! Let's do it like for code-style. Just to make sure, the permission issue there is related to probably skipped CIRCLE_TOKEN as it is on fork.

it's hard to say, we need to try. I don't know how user defined tokens interact with pr_target

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

I see that you are trying to find out if trigger next stops or not depending on modified. I did something like that here : https://github.com/pytorch/ignite/blob/master/.circleci/trigger_if_modified.sh . May help...

oh, thanks, I will check it!

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

@vfdev-5
Yeah I think the main part here is

FILES_MODIFIED=$(git diff --name-only origin/${base_branch}..HEAD | grep -i -E ${pattern})

I've already tried:

git diff-tree --no-commit-id --name-only -r HEAD

and

git diff --name-only HEAD^ HEAD

But it didn't help

I think it's because of checkout action and detached head checkout by default

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

going to try prepared solutions)
https://github.com/futuratrepadeira/changed-files

@trsvchn trsvchn force-pushed the issue-1644-improve-docker-build branch from ae3763a to 69f52e4 Compare March 1, 2021 21:03
@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

going to update some docker

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

Oh, yeah it works!

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

Interesting fact: changed-files action works only with pull_request events and pull_request_target

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 1, 2021

Great! Currently, the problem will be with GA that wont be able to build our large HVD images...
I think we can reuse that with more finegrained trigger circle ci ...

Interesting fact: changed-files action works only with PRs events
Yeah, that's what they say "GitHub action to export a PR's changed files"

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 1, 2021

@vfdev-5 so we are going to try to trigger Circle CI through pull_request_target events, right?

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @trsvchn !

@vfdev-5 vfdev-5 merged commit f18ddb3 into pytorch:master Mar 2, 2021
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

@trsvchn oh, I just realized that we are not doing correctly here now:
we have to build in sequence the images:

  • base -> [vision, nlp]
  • hvd-base -> [hvd-vision, hvd-nlp]
  • hvd-apex -> [hvd-apex-vision, hvd-apex-nlp]
  • ...

Otherwise, buildling for example nlp, it fetchs another base image...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

@vfdev-5 Oh, because of multi-stage builds, I have never tried it btw, how does it work?

so nlp and vision need base, right? they should have access to that built image, right?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

Well, actually, it is not because of mutli-stage but just, for example, nlp image uses FROM base:latest.
We can simply try to squash parallel builds of [base, vision, nlp] and related into a sequential and run

  • base
  • hvd-base
  • hvd-apex
  • msdp-apex

in parallel

they should have access to that built image, right?

Yes, they should have access to that build image.

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

Or, we can try smth like:

jobs:
  build-base:
    needs: setup
    # build base
    # upload image as artifact

  build-vision:
    needs: build-base
    # download image as artifact
    # build vision
...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

Because I'm not sure about space

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

Instead of uploading temp images, we can try if steps could reuse the same docker env between...

Yes, space question can be blocking ...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

https://github.community/t/whats-the-recommended-way-to-pass-a-docker-image-to-the-next-job-in-a-workflow/17225/3

As different jobs use different runners, so we can’t pass docker image between jobs.

Is it possible to share dockerfile between job1 and job4 in your scenario, and then build it in job4 again to generate docker image instead of pulling it from docker hub?

You can use the upload-artifact and download-artifact actions to share data between jobs in a workflow.

https://help.github.com/en/actions/automating-your-workflow-with-github-actions/persisting-workflow-data-using-artifacts#passing-data-between-jobs-in-a-workflow 99

Sorry to tell that there are no other methods to pass Docker image to next job in a workflow.

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

there is also https://github.community/t/whats-the-recommended-way-to-pass-a-docker-image-to-the-next-job-in-a-workflow/17225/4

Github provided Github Package Registry which could hold your docker image like docker hub registry. If you are using public repo, it is free. But there are some limits for private repo.

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

We can use Package Registr to store our docker images

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

I think we can pass docker env between steps in a single job, right ? Otherwise, yes, we can use GH Package registry.

jobs:
   build-hvd
   steps:
      - build-hvd-base
      ....
      - build-hvd-nlp
      ....
      - build-hvd-vision

but we can run out of memory...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

Yep, exactly! especially in hvd case

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

OK, let's try either :

  1. upload-artifact and download-artifact
  2. GH registry and clean up after the builds

Maybe, 2) will be a more adapted solution, I'm just afraid of it could used publicly (what we would like to avoid)

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

  • there is delete-artifact
    we can use it to cleanup, since I am not sure what will happen if we try to upload twice or smth

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

I am going to try this artifact experiment in my fork, to avoid massive artifact mess up here :)

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

Sounds great! Thanks @trsvchn !

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

Well, I've configured artifacts exchange between jobs, and it works fine for small images.

results: https://github.com/trsvchn/ignite/actions/runs/618227617

The problem is in the lack of space when making a tar archive for image, I'm talking about:

docker save -o image.tar image:latest

but we get:

2021-03-03T21:13:34.7276664Z ##[warning]You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 74 MB

I've inspected GH Actions runner space. This is default filesystem on start (before docker build):

Filesystem      Size  Used Avail
total           109G   69G   39G

I'm thinking to remove some packages to free up space for docker save.

@vfdev-5 @ydcjeff @sdesrozis What heavy packages can we remove w/o effect on docker?

Here is Top 50

 944.4M ghc-8.10.4
 924.8M ghc-9.0.1
 761.9M azure-cli
 471.9M google-cloud-sdk
 308.9M adoptopenjdk-11-hotspot
 282.0M libgl1-mesa-dri
 280.4M hhvm
 228.3M google-chrome-stable
 211.8M firefox
 208.5M dotnet-sdk-5.0
 194.4M adoptopenjdk-8-hotspot
 193.6M llvm-10-dev
 189.5M llvm-9-dev
 185.0M dotnet-sdk-3.1
 170.2M powershell
 163.1M llvm-8-dev
 135.2M moby-containerd
 120.5M snapd
 118.9M mysql-server-core-8.0
 114.5M moby-engine
 109.5M mono-devel
  81.0M podman
  79.3M libllvm11
  78.7M mongodb-org-server
  70.3M libllvm10
  69.4M dotnet-runtime-3.1
  68.9M moby-cli
  68.1M linux-azure-headers-5.4.0-1040
  66.7M dotnet-runtime-5.0
  66.1M libllvm9
  62.3M linux-modules-5.4.0-1040-azure
  61.8M mysql-client-core-8.0
  61.5M mongodb-org-mongos
  60.0M mono-llvm-tools
  59.8M libllvm8
  58.6M moby-buildx
  56.9M gcc-10
  55.3M ansible
  55.1M libclang-common-10-dev
  51.8M mecab-ipadic
  51.0M mongodb-org-shell
  49.1M containernetworking-plugins
  46.5M postgresql-13
  44.6M libclang-common-9-dev
  44.5M msbuild
  42.7M libclang-common-8-dev
  42.6M libclang-cpp10
  42.6M libicu-dev
  38.4M kubectl
  38.1M r-base-core

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

@trsvchn thanks for the update !
How about removing pytorch/pytorch:*-devel image before creating the archive ? This should save about 10Gb maybe.

I'm not sure if this is a good idea to remove system packages as github actions can rely on some of them ...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

@vfdev-5 Yes, I was thinking about it, but can we remove it? I've tried locally and got this

Error response from daemon: conflict: unable to delete 7554ac65eba5 (cannot be forced) - image has dependent child images

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

Yes, true, we can not remove it for msdp images, but should work for hvd.

Let me check if we can use github registry without exposing it publicly ...

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 3, 2021

From actions/runner-images#709 (comment)

As a possible workarounds, you can remove the part of pre-installed software in runtime. For example, this command will release 5+ of free space:

sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 3, 2021

We can try that as well

@trsvchn trsvchn mentioned this pull request Mar 4, 2021
3 tasks
@sdesrozis
Copy link
Contributor

@trsvchn @vfdev-5

Regarding the packages below

228.3M google-chrome-stable
 211.8M firefox
 208.5M dotnet-sdk-5.0
 193.6M llvm-10-dev
 189.5M llvm-9-dev
 185.0M dotnet-sdk-3.1
 170.2M powershell
 163.1M llvm-8-dev
 118.9M mysql-server-core-8.0
 109.5M mono-devel
  81.0M podman
  79.3M libllvm11
  78.7M mongodb-org-server
  70.3M libllvm10

do we need

  • firefox and google-chrome ?
  • dotnet ?
  • db servers like mongodb and sqll ?
  • podman ?
  • multiple llvm ?

Sorry if this is not relevant according to the current thread ....

@trsvchn
Copy link
Collaborator Author

trsvchn commented Mar 4, 2021

@sdesrozis Thanks for suggestions! But idea save/load docker image is not working, now we are building all 3 images in a one job and

using

sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"

frees enough space for that purpose

@harshitgaur2
Copy link

Hi, is this issue still not resolved. If not, can I get the issue note on the last updated work?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 4, 2022

@harshit2000 yes this PR is merged and fixed related issues, please check its description.

@harshitgaur2
Copy link

@vfdev-5 Sure, thanks for the info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants