[RHDHPAI-535] Address Images Missing Content by Jdubrick · Pull Request #34 · redhat-ai-dev/developer-images

Jdubrick · 2025-03-21T18:45:46Z

What does this PR do?:

This PR adds the missing content so that the images being built by CI are in proper working order. Prior to this the images were missing the actual model information, especially detr-resnet-101.

In addition to adding a Makefile and the necessary build info for detr-resnet-101 building, I also created a CONTRIBUTING.md file to better document how you can maintain these images. The helm-charts/application-gitops was previously marked as deprecated as well so I removed the config.env from it to avoid CI building new versions.

Because the detr-resnet-101 data is large I had to move it to manual building and validation, similar to the model-server/vllm workflow. This is a constraint set by GH because of LFS unfortunately.

Which issue(s) this PR fixes:

https://issues.redhat.com/browse/RHDHPAI-535

PR acceptance criteria:

Testing and documentation do not need to be complete in order for this PR to be approved. We just need to ensure tracking issues are opened and linked to this PR, if they are not in the PR scope due to various constraints.

Tested and Verified
Documentation (READMEs, Product Docs, Blogs, Education Modules, etc.)

How to test changes / Special notes to the reviewer:

You can perform the following to ensure all images are built properly:

Checkout this PR branch
Edit the config.env file for each container image and redirect it to your Quay account, tag it however you want.
Add a small change to the README for each container image so it triggers as a code change during CI
On your fork of developer-images you should make sure GH Actions are enabled and you have your Quay info stored in GH secrets (QUAY_USERNAME, QUAY_PASSWORD). See CI definition for more info.
Open and merge the contents of this PR to your forked main branch, this will trigger the CI and allow the build + push to occur to your Quay.
Confirm all the images are in your Quay as expected
Head over to your fork of ai-lab-template and change the image references in the env files to your newly built and hosted ones. All this info is stored here: https://github.com/redhat-ai-dev/ai-lab-template/tree/main/scripts/envs
Deploy these templates to RHDH (can replace Catalog in the rhdh-installer to do so) and create each template app, confirming they function as expected with the new images that were built by the CI.

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

Jdubrick · 2025-03-21T18:55:02Z

Update: Had to remove the CI building and pushing the detr-resnet-101 image because it required Git LFS, on my local fork this wasn't an issue but for this org it is flagging it as requiring additional resources (subscription to it on Git). Because of this I have moved the building and pushing of this image to being done manually similar to the vLLM.

thepetk

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!

I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

Jdubrick · 2025-03-25T13:57:50Z

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.
batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.
I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!

I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

thepetk · 2025-03-25T14:37:12Z

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.
batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.
I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940
I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.

However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?

Jdubrick · 2025-03-25T14:42:45Z

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.
batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.
I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940
I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk
I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.

However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?

I know I had to enable git lfs to be able to work with it, which I don't think is great as a requirement for others to work with a repo tbh. IMO it would be okay to add it to an ignore file and when a maintainer goes to update they will pull the new changes in - We have fixed versions in quay and these are just the files used to build them so we have historical records if that makes sense?

The Makefile used to pull the info and the script for converting into the proper format all came from ai-lab-recipes as well fwiw @thepetk

thepetk · 2025-03-25T14:55:19Z

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.
batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.
I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940
I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk
I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.
However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?
I know I had to enable git lfs to be able to work with it, which I don't think is great as a requirement for others to work with a repo tbh. IMO it would be okay to add it to an ignore file and when a maintainer goes to update they will pull the new changes in - We have fixed versions in quay and these are just the files used to build them so we have historical records if that makes sense?

The Makefile used to pull the info and the script for converting into the proper format all came from ai-lab-recipes as well fwiw @thepetk

Hmm I'm ok with adding it to ignore, my only question was how/if we can track which contents have created a speicifc image tag in quay.io and if we can fix a version of the model that is built locally, but I guess as long as this process is done manually we cannot be sure.

Jdubrick · 2025-03-25T15:00:43Z

Yeah it would be tough to track that because it has to be done manually, I think since it isn't updated very often we should be okay for now but it may be worth looking into having some sort of tracker implemented that you're required to update if you push or maybe storing compressed versions of the content that are labeled? @thepetk

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

thepetk

I think with the new changes is good to be merged. I've downloaded sucessfully all contents using the Makefile command and built locally the detr image

Jdubrick added 6 commits March 19, 2025 14:14

add new logic to support detr-resnet-101 building

7895246

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

update documentation and add new contributing help

c67aa27

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

remove config setup for application-gitops as it is deprecated

8d9e78b

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

update detr-resnet-101 containerfile to utilize COPY

f830872

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

add git lfs to checkout step

033ae4f

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

rearrange detr-resnet-101 containerfile

5e8b128

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

Jdubrick requested review from johnmcollier, maysunfaisal, michael-valdron, thepetk and yangcao77 March 21, 2025 18:45

remove git lfs due to constraints and move detr-resnet to manual

0277a97

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

thepetk reviewed Mar 25, 2025

View reviewed changes

Jdubrick added 2 commits March 25, 2025 12:08

add large detr-resnet-101 content to git ignore

4bcc677

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

update contributing for detr-resnet-101

7e7f1ea

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

thepetk approved these changes Mar 25, 2025

View reviewed changes

Jdubrick merged commit 1e5138a into redhat-ai-dev:main Mar 25, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RHDHPAI-535] Address Images Missing Content#34

[RHDHPAI-535] Address Images Missing Content#34
Jdubrick merged 9 commits intoredhat-ai-dev:mainfrom
Jdubrick:fix/address-failing-ci

Jdubrick commented Mar 21, 2025 •

edited

Loading

Uh oh!

Jdubrick commented Mar 21, 2025

Uh oh!

thepetk left a comment •

edited

Loading

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk commented Mar 25, 2025 •

edited

Loading

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk commented Mar 25, 2025

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Jdubrick commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?:

Which issue(s) this PR fixes:

PR acceptance criteria:

How to test changes / Special notes to the reviewer:

Uh oh!

Jdubrick commented Mar 21, 2025

Uh oh!

thepetk left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk commented Mar 25, 2025

Uh oh!

Jdubrick commented Mar 25, 2025

Uh oh!

thepetk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Jdubrick commented Mar 21, 2025 •

edited

Loading

thepetk left a comment •

edited

Loading

thepetk commented Mar 25, 2025 •

edited

Loading