Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI stack for ML packages #31592

Merged
merged 38 commits into from
Oct 9, 2022
Merged

Add CI stack for ML packages #31592

merged 38 commits into from
Oct 9, 2022

Conversation

adamjstewart
Copy link
Member

@adamjstewart adamjstewart commented Jul 15, 2022

Basic stack of ML packages we would like to test and generate binaries for in CI.

Closes #31551

@tgamblin @scottwittenburg I said this when I was working on #27798, but this is the least intuitive and least user friendly thing I've ever done in Spack. I basically copy-n-pasted a gigantic file containing a million settings that have absolutely no meaning to me in the hopes that it works and I didn't make any mistakes. If there is even a single setting that is wrong in this file, someone is going to have to help me fix it, because I'm unaware of any documentation explaining what all of these mysterious keywords mean. Does this have to be this complicated? Can we have a file of defaults somewhere and I only need to override the ones I care about? I don't see why this should be any more than a simple environment file like:

spack:
  specs:
    - py-tensorflow
    - py-torch

@adamjstewart adamjstewart added the gitlab Issues related to gitlab integration label Jul 15, 2022
@adamjstewart
Copy link
Member Author

TODO: add ml tag to all packages we decide to consider ML libraries and test in CI.

@spackbot-app spackbot-app bot added the core PR affects Spack core functionality label Jul 16, 2022
@adamjstewart adamjstewart changed the title Add CI stack for ML packages [WIP] Add CI stack for ML packages Jul 16, 2022
@adamjstewart
Copy link
Member Author

@scottwittenburg any tips to get this pipeline to run?

Copy link
Contributor

@blue42u blue42u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Advice from a fellow traveler on this road (#31598). See comments below.

share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml Outdated Show resolved Hide resolved
share/spack/gitlab/cloud_pipelines/stacks/ml/spack.yaml Outdated Show resolved Hide resolved
share/spack/gitlab/cloud_pipelines/stacks/ml/spack.yaml Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Member Author

Still no luck.

@adamjstewart
Copy link
Member Author

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Jul 22, 2022

I've started that pipeline for you!

@adamjstewart
Copy link
Member Author

I think it's working! Like there are still concretization issues I need to resolve, but it gets far enough that it tries to concretize the environment!

@adamjstewart
Copy link
Member Author

@aweits can you investigate the build issues with JAX in CI?

@aweits
Copy link
Contributor

aweits commented Aug 3, 2022

@aweits can you investigate the build issues with JAX in CI?

replicated a build failure locally w/ bazel 5.2.0, builds fine with 4.1.0...

@adamjstewart
Copy link
Member Author

On what OS? I was hoping that Bazel 5 build failures would only be on macOS arm64...

@aweits
Copy link
Contributor

aweits commented Aug 4, 2022

On what OS? I was hoping that Bazel 5 build failures would only be on macOS arm64...

linux-rhel7-skylake_avx512

I think it's this:

bazelbuild/bazel#13811

@adamjstewart
Copy link
Member Author

Okay, so from reading that, it seems like the problem is at the package level, not the bazel level. So would the right approach be to lock down supported Bazel versions until those rules are updated?

@aweits
Copy link
Contributor

aweits commented Aug 4, 2022

Okay, so from reading that, it seems like the problem is at the package level, not the bazel level. So would the right approach be to lock down supported Bazel versions until those rules are updated?

That's quite probably the "right" way to do it - alternatively, the workaround in that thread seems to work:

diff --git a/var/spack/repos/builtin/packages/py-jaxlib/package.py b/var/spack/repos/builtin/packages/py-jaxlib/package.p
index eb543a2..3ad4399 100644
--- a/var/spack/repos/builtin/packages/py-jaxlib/package.py
+++ b/var/spack/repos/builtin/packages/py-jaxlib/package.py
@@ -67,3 +67,19 @@ def patch(self):
             "build/build_wheel.py",
             string=True,
         )
+
+        matchtext = '''load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")'''
+        newtext = '''
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+http_archive(
+    name = "build_bazel_rules_apple",
+    sha256 = "0052d452af7742c8f3a4e0929763388a66403de363775db7e90adecb2ba4944b",
+    urls = [
+        "https://github.com/bazelbuild/rules_apple/releases/download/0.31.3/rules_apple.0.31.3.tar.gz",
+    ],
+)
+'''
+        filter_file(matchtext,
+                    newtext,
+                    'WORKSPACE',
+                    string=True)

@adamjstewart
Copy link
Member Author

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Aug 7, 2022

I've started that pipeline for you!

@adamjstewart
Copy link
Member Author

I think this PR is finally ready to merge! All new pipelines are passing. Some caveats:

  1. Some existing pipelines are failing. I have no control over those. I can force merge if I need to.
  2. PyTorch does not yet support ROCm, JAX does not yet build properly. These issues can be solved in future PRs.
  3. There are 3 separate stacks for CPU, CUDA, and ROCm pipelines. 90% of the packages in those DAGs do not have +cuda/+rocm variants, and so are duplicated. I would love to be able to write a single pipeline with a matrix over all possible variant combinations, but that's not possible at the moment: Environments: matrix of optional variants #32893
  4. We would like to someday repeat these 3 stacks on several OS/arch combinations. Much like 3, this is not yet supported without duplicating the entire pipeline dozens of times. Would love to see support for this someday: Environments: matrix of OS/arch #32894

@tgamblin
Copy link
Member

tgamblin commented Oct 9, 2022

This looks really awesome to me -- thanks for all the work everyone. Once we merge, this should start getting tested on every PR. We may still have issues on develop until GitHub merge queue goes into broader beta later this year (then we can use it) but I think this goes a most of the way towards keeping these packages built!

@tgamblin tgamblin merged commit 01ede3c into spack:develop Oct 9, 2022
@adamjstewart adamjstewart deleted the ci/ml branch October 9, 2022 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core PR affects Spack core functionality gitlab Issues related to gitlab integration new-version update-package
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

CI and public binaries for ML libraries
9 participants