Document how to use BuildKit options to reduce resources consumption #125

Closed
ghassanmas opened this issue May 16, 2023 · 9 comments
Labels
documentation Improvements or additions to documentation

Comments

@ghassanmas
Member

ghassanmas commented May 16, 2023

Context

As of the Palm release, tutor/tutor-mfe is expected to require BuildKit to be enabled in Docker. This is the case by default since Docker 23, where BuildKit is the default builder [1].

BuildKit adds extra features to tutor/tutor-mfe, mainly cache-related. However, one of its main features is the following [1]:

    Parallelize building independent build stages

This would consume a lot of resources in the case of tutor-mfe, given that it would run npm install and npm run build concurrently for each of the X MFEs enabled by tutor-mfe. This can lead to network-related errors for the former and high resource consumption for the latter.

Another concern: consider the case where the same machine that is used to deploy an Open edX instance is also used for building. It would be quite risky for a low-resource machine to build the image while also having tutor containers running, i.e. if the system crashes, it would affect the availability of the service.

Possible solutions:

Note: These are not mutually exclusive.

  1. Configure BuildKit to use fewer resources, as suggested by @regisb [2] [3]
  2. Make it optimal to use the BuildKit builder just when building tutor-mfe.
  3. Rethink the way MFEs are built/deployed, i.e..

Related issue/concern:

Also, in development mode, it has been observed that a developer typically needs to work on one specific MFE. However, tutor dev runs all MFEs in development mode by default, i.e. npm run start once for each enabled MFE. While this is a totally different issue, it's probably related.

Possible outcomes, at least before the Palm release

  • Document a possible way to use BuildKit on a low-resource machine, i.e. at least mention it in the release notes, and/or link to a page that details a recommended setup for a low-resource machine.

Footnotes

  1. Docs https://docs.docker.com/build/buildkit/#overview

  2. Docs https://docs.docker.com/build/buildkit/configure/#max-parallelism

  3. Slack thread https://openedx.slack.com/archives/CGE253B7V/p1684170597489729

@regisb
Contributor

regisb commented May 16, 2023

Can you please try the following solution?

  1. Create a buildkit.toml configuration file with the following contents:

    [worker.oci]
      max-parallelism = 2

  2. Create a builder that uses that configuration:

    docker buildx create --use --name=max2cpu --driver=docker-container --config=./buildkit.toml

  3. Build the mfe image:

    tutor images build mfe

This solution seems to work for me, in the sense that I see only two layers built simultaneously.

If this works for you as well, then I think that it should be the recommended approach, and we should add these instructions to the troubleshooting docs.

@ghassanmas
Member Author

Yes @regisb, it did what was expected: I can see that it only does two things at a time.

@arbrandes
Collaborator

I second @regisb's proposal. A nice and simple way to reduce resource usage.

@ghassanmas
Member Author

ghassanmas commented May 17, 2023

[Update]

I had to change it to use one worker, when testing on Mac M2 mini with 8GB of RAM:

While building, docker stats showed:

CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu           170.35%   868.6MiB / 7.765GiB   10.92%    470MB / 8.78MB    21.8MB / 3.01GB   46

CPU would fluctuate between 100% and 200%, I/O was around 800MB, and PIDs could reach up to 60. That was when using one worker.

Otherwise I would get a crash error, something like npm being killed.

To use one worker, I had to update the file mentioned above and then run this command:

    docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml

The only difference is the use of --node, because the name already exists.
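
For anyone who wants to script the "update the file" step, here is a minimal sketch. set_max_parallelism is a hypothetical helper name of my own, not something provided by Docker or Tutor, and it assumes the config file already contains a max-parallelism line like the one shown earlier in this thread.

```shell
# Sketch: change max-parallelism in an existing BuildKit config file.
# set_max_parallelism is a hypothetical helper, not part of Docker or Tutor.
set_max_parallelism() {
  config_file="$1"
  value="$2"
  tmp_file="$(mktemp)" || return 1
  # Rewrite only the max-parallelism line; every other line is copied as-is.
  # A temporary file is used because "sed -i" behaves differently on macOS and Linux.
  sed "s/^\([[:space:]]*max-parallelism[[:space:]]*=\).*/\1 $value/" "$config_file" > "$tmp_file" &&
    mv "$tmp_file" "$config_file"
}
```

For example, set_max_parallelism ./buildkit.toml 1 would switch the file from two workers to one. The builder still has to be created again with the updated file afterwards, for instance with the docker buildx create command used in this thread.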

@ghassanmas
Member Author

Also, after the build is done, docker stats shows:

CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu           0.00%     1.569GiB / 7.765GiB   20.21%    811MB / 13.8MB    1.33GB / 8.45GB   25

Why is it still running? It makes sense that CPU is at 0% because the build is done, but it still consumes a lot of RAM.

I am not sure what magic buildx/the builder does, but I had to stop it with docker stop 8c34f988213e.

The builder is initially listed as inactive by docker buildx ls and does not show up in docker ps or docker stats as a running container until a build command is executed. The problem, again, is that even after the build is done, the builder container keeps running. Maybe I had to wait for it to stop by itself; I couldn't find a relevant reference in the docs.
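
A minimal sketch of how the "build, then stop the builder" routine could be wrapped, so the builder container does not keep holding RAM after the build. build_then_stop and the DOCKER variable are hypothetical names introduced for this example. docker buildx stop is an existing buildx subcommand, used here in place of the docker stop <container id> call above.

```shell
# Sketch: run a build command, then stop the BuildKit builder so that its
# container does not keep holding memory after the build is done.
# build_then_stop and DOCKER are hypothetical names introduced for this example.
DOCKER="${DOCKER:-docker}"

build_then_stop() {
  builder="$1"
  shift
  # Run the build command passed as the remaining arguments.
  "$@"
  status=$?
  # Stop the builder whether or not the build succeeded.
  "$DOCKER" buildx stop "$builder"
  # Report the build's own exit status to the caller.
  return "$status"
}
```

Usage would look like build_then_stop max2cpu tutor images build mfe, assuming the builder is named max2cpu as in this thread.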

@regisb
Contributor

regisb commented May 22, 2023

I had to change it to use one worker, when testing on Mac M2 mini with 8GB of RAM

Running with just two workers exceeds your 8GB of RAM??? This would mean that building a single MFE requires 4GB of RAM? If this is true then we really need to rename MFE to macrofrontends.

Why is it still running? It makes sense that CPU is at 0% because the build is done, but it still consumes a lot of RAM.

Buildx is actually a process that runs inside a docker container -- as implied by the --driver=docker-container option you used. I suspect it's using some memory because it's doing garbage collection and other chores in the background. In my experience it's safe to remove the container, as it will automatically be restarted next time you run docker buildx.
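
The memory arithmetic earlier in this comment (two workers exceeding 8GB would imply roughly 4GB per MFE build) can be turned around to pick a max-parallelism value. This is only a sketch: suggest_parallelism is a hypothetical helper, and the per-build memory figure is an assumption you would have to measure yourself, e.g. with docker stats. It is not a number documented by Tutor or BuildKit.

```shell
# Sketch: how many concurrent builds fit in a given memory budget.
# Both arguments are in MiB. The per-build figure is an assumption to be
# measured on your own machine; suggest_parallelism is a hypothetical helper.
suggest_parallelism() {
  available_mib="$1"
  per_build_mib="$2"
  # Integer division rounds down.
  n=$(( available_mib / per_build_mib ))
  # Never suggest fewer than one worker.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

With 8192 MiB available and a hypothetical 4096 MiB per build this yields 2, which leaves no headroom for anything else running on the machine. That would be consistent with the builds above only succeeding with one worker.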

@arbrandes
Collaborator

we really need to rename MFE to macrofrontends.

No objections from me. ;P

@regisb changed the title from "Investigate BuildKit options to reduce resources consumption" to "Document how to use BuildKit options to reduce resources consumption" on May 30, 2023
@regisb added the documentation label on May 30, 2023
@regisb
Contributor

regisb commented May 30, 2023

I changed the title of the issue to reflect the decision proposed in my earlier comment.

@davidjoy

davidjoy commented Mar 14, 2024

For those who end up at this issue trying to solve the following error when running tutor dev launch:

ERROR: failed to solve: process "/bin/sh -c npm clean-install --no-audit --no-fund --registry=$NPM_REGISTRY" did not complete successfully: exit code: 137
Error: Command failed with status 1: docker buildx build --tag=docker.io/overhangio/openedx-mfe:17.0.0-nightly --output=type=docker --cache-from=type=registry,ref=docker.io/overhangio/openedx-mfe:17.0.0-nightly-cache /Users/david/Library/Application Support/tutor-nightly/env/plugins/mfe/build/mfe

The answer appears to be the parallelism situation described here. Documentation on how to fix it is now here:

https://github.com/overhangio/tutor-mfe?tab=readme-ov-file#mfe-development

Scroll down to the end of the "MFE Development" section, right before "Uninstall", and you'll find some steps to reduce the max-parallelism, which means the launch process will try to do way fewer things at once and, hopefully, succeed:

cat >buildkitd.toml <<EOF
[worker.oci]
  max-parallelism = 1
EOF
docker buildx create --use --name=singlecpu --config=./buildkitd.toml

(That file can be created anywhere, see the link for more details)
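
A note on the exit code: 137 in the error above. By shell convention, a status above 128 means the process was terminated by a signal, and 137 = 128 + 9, where signal 9 is SIGKILL. That is typically what you see when the kernel's out-of-memory killer ends a process, which fits the resource exhaustion discussed in this thread. A minimal sketch of the decoding (describe_exit_status is a hypothetical helper name):

```shell
# Sketch: decode a shell-style exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
# describe_exit_status is a hypothetical helper introduced for this example.
describe_exit_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    signal=$(( status - 128 ))
    echo "killed by signal $signal"
  else
    echo "exited with status $status"
  fi
}
```

Calling describe_exit_status 137 reports signal 9. This does not prove that memory was the cause, but it is a strong hint to try lowering max-parallelism first.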

4 participants