CI: 2023 07 14
Zack Galbreath edited this page Jul 14, 2023
- Aashish Chaudhary
- Alec Scott
- Dan LaManna
- Jacob Nesbitt
- Mike VanDenburgh
- Massimiliano Culpo
- Ryan Krattiger
- Scott Wittenburg
- Todd Gamblin
- Tamara Grimmett
- Zack Galbreath
- This week we upgraded EKS to v1.27 and Karpenter to v0.29. These upgrades went well, with few surprises and minimal downtime.
- Karpenter now supports Windows containers! We plan to test this functionality and hope to enable a Windows stack in our GitLab CI pipelines soon.
- Next week we plan to upgrade gitlab.spack.io to the latest patch release (v16.1.2).
- Ryan's PR to add more fine-grained timers to `spack install` is going well. We are hoping to merge it soon.
  - We are also working on ingesting this new data into OpenSearch and using it to publish new Grafana dashboards.
- Jake and Alec are going to meet next week to discuss strategies for storing CI metrics more centrally.
- cache.spack.io now shows results for our weekly snapshot mirrors.
- Scott opened PR #38866 to unconditionally run the `protected-publish` job in our protected pipelines. This will fix the problem where the top-level mirror is not always up-to-date with the results from the individual stack-specific mirrors.
  - Scott also discovered that many of the `no-binary-for-spec` failures we've seen lately may be due to a `DeleteOldObjects` lifecycle policy that was configured to delete objects from the PR mirror after 14 days. This has since been disabled.
- Ryan discovered that his pruning script was sometimes receiving incomplete results from the GitLab API. He's updating this script to directly query GitLab's database for the list of jobs to fetch instead.
- Alec will be working with a student this summer to investigate job scheduling & performance in our GitLab CI pipelines.
- Finish timing data PR and start working on subsequent dashboards
- Upgrade gitlab.spack.io to the latest patch release
- Migrate GitLab's minio volume from gp2 to gp3
- Manually run the pruning script for our develop buildcache and continue to work on automating this task.
- Investigate why our GitLab Sidekiq pods die and get restarted somewhat frequently. Perhaps increasing resource requests will reduce this error rate?
- Update gitlab.spack.io to use S3 and ElastiCache rather than MinIO and Redis.
- Update the sync script to merge topic branches against their base branch instead of assuming that it is always `develop` (necessary for release branch PRs).
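
The base-branch change to the sync script could look roughly like the sketch below. This is not the actual sync script: the function names `merge_target` and `merge_commands` are hypothetical, and the sketch assumes each PR is available as a GitHub-style pull request payload in which the target branch is stored under `base.ref` and the topic branch under `head.ref`. It also assumes both branches are reachable from a single remote, which would not hold for PRs opened from forks.

```python
def merge_target(pr: dict, default: str = "develop") -> str:
    """Return the branch a PR should be merged against.

    Assumes `pr` resembles a GitHub pull request payload, where the
    target branch name lives under pr["base"]["ref"]. Falls back to
    `default` only when that field is missing or empty.
    """
    return pr.get("base", {}).get("ref") or default


def merge_commands(pr: dict, remote: str = "origin") -> list[list[str]]:
    """Build git commands that merge a PR's topic branch into its base.

    The commands are returned as argument lists rather than executed,
    so the caller decides how (and whether) to run them.
    """
    base = merge_target(pr)
    head = pr["head"]["ref"]  # topic branch name (assumed field)
    return [
        # Fetch both branches so the remote-tracking refs are current.
        ["git", "fetch", remote, base, head],
        # Start from the PR's real base instead of a hard-coded branch.
        ["git", "checkout", "--detach", f"{remote}/{base}"],
        # Merge the topic branch on top of that base.
        ["git", "merge", "--no-edit", f"{remote}/{head}"],
    ]
```

Each returned argument list could be passed to `subprocess.run(cmd, check=True)`. For a PR that targets a release branch, the checkout step then starts from that release branch rather than from `develop`.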