Skip to content

[RHDHPAI-535] Address Images Missing Content#34

Merged
Jdubrick merged 9 commits intoredhat-ai-dev:mainfrom
Jdubrick:fix/address-failing-ci
Mar 25, 2025
Merged

[RHDHPAI-535] Address Images Missing Content#34
Jdubrick merged 9 commits intoredhat-ai-dev:mainfrom
Jdubrick:fix/address-failing-ci

Conversation

@Jdubrick
Copy link
Copy Markdown
Contributor

@Jdubrick Jdubrick commented Mar 21, 2025

What does this PR do?:

This PR adds the missing content so that the images being built by CI are in proper working order. Prior to this the images were missing the actual model information, especially detr-resnet-101.

In addition to adding a Makefile and the necessary build info for detr-resnet-101 building, I also created a CONTRIBUTING.md file to better document how you can maintain these images. The helm-charts/application-gitops was previously marked as deprecated as well so I removed the config.env from it to avoid CI building new versions.

Because the detr-resnet-101 data is large I had to move it to manual building and validation, similar to the model-server/vllm workflow. This is a constraint set by GH because of LFS unfortunately.

Which issue(s) this PR fixes:

https://issues.redhat.com/browse/RHDHPAI-535

PR acceptance criteria:

Testing and documentation do not need to be complete in order for this PR to be approved. We just need to ensure tracking issues are opened and linked to this PR, if they are not in the PR scope due to various constraints.

  • Tested and Verified

  • Documentation (READMEs, Product Docs, Blogs, Education Modules, etc.)

How to test changes / Special notes to the reviewer:

You can perform the following to ensure all images are built properly:

  1. Checkout this PR branch
  2. Edit the config.env file for each container image and redirect it to your Quay account, tag it however you want.
  3. Add a small change to the README for each container image so it triggers as a code change during CI
  4. On your fork of developer-images you should make sure GH Actions are enabled and you have your Quay info stored in GH secrets (QUAY_USERNAME, QUAY_PASSWORD). See CI definition for more info.
  5. Open and merge the contents of this PR to your forked main branch, this will trigger the CI and allow the build + push to occur to your Quay.
  6. Confirm all the images are in your Quay as expected
  7. Head over to your fork of ai-lab-template and change the image references in the env files to your newly built and hosted ones. All this info is stored here: https://github.com/redhat-ai-dev/ai-lab-template/tree/main/scripts/envs
  8. Deploy these templates to RHDH (can replace Catalog in the rhdh-installer to do so) and create each template app, confirming they function as expected with the new images that were built by the CI.

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@Jdubrick
Copy link
Copy Markdown
Contributor Author

Update: Had to remove the CI building and pushing the detr-resnet-101 image because it required Git LFS, on my local fork this wasn't an issue but for this org it is flagging it as requiring additional resources (subscription to it on Git). Because of this I have moved the building and pushing of this image to being done manually similar to the vLLM.

Copy link
Copy Markdown
Contributor

@thepetk thepetk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!

I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

@Jdubrick
Copy link
Copy Markdown
Contributor Author

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!

I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

@thepetk
Copy link
Copy Markdown
Contributor

thepetk commented Mar 25, 2025

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.

However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?

@Jdubrick
Copy link
Copy Markdown
Contributor Author

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.

However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?

I know I had to enable git lfs to be able to work with it, which I don't think is great as a requirement for others to work with a repo tbh. IMO it would be okay to add it to an ignore file and when a maintainer goes to update they will pull the new changes in - We have fixed versions in quay and these are just the files used to build them so we have historical records if that makes sense?

The Makefile used to pull the info and the script for converting into the proper format all came from ai-lab-recipes as well fwiw @thepetk

@thepetk
Copy link
Copy Markdown
Contributor

thepetk commented Mar 25, 2025

I tried to test the updates, but unfortunately I get an error too when I'm trying to clone/checkout the branch of your fork due to the large size of models.safetensors file.

batch response: This repository is over its data quota.
Account responsible for LFS bandwidth should purchase
more data packs to restore access.

I think if you have a template ready that you've used to do the testing for the object detection image it would be awesome (one with the image you've pushed manually)!
I see the action after you've merged the PR on your fork was successfull too: https://github.com/Jdubrick/ai-developer-images/actions/runs/13978075242/job/39136688940

I'm wondering if it would be best to include the contents of detr-resnet-101 to a .gitignore to avoid any issues with git lfs for those working with the repo, since it requires you to manually build and push anyways I think it would be fine? wdyt? @thepetk

I thought about it tbh. I haven't spent a tone of time on it (e.g I don't want to misslead you), however I'm thinking we need to be sure we use a fixed version. Meaning we need to be sure that the content's version inside the developer images is specific. Thinking that we include the contents here, we are at least sure we build every time the same image and not a newer one.
However this could be facilitate with many different ways (maybe a hash in your download function could help?). Also maybe we could use the code from the ai-lab-recipes?

I know I had to enable git lfs to be able to work with it, which I don't think is great as a requirement for others to work with a repo tbh. IMO it would be okay to add it to an ignore file and when a maintainer goes to update they will pull the new changes in - We have fixed versions in quay and these are just the files used to build them so we have historical records if that makes sense?

The Makefile used to pull the info and the script for converting into the proper format all came from ai-lab-recipes as well fwiw @thepetk

Hmm I'm ok with adding it to ignore, my only question was how/if we can track which contents have created a speicifc image tag in quay.io and if we can fix a version of the model that is built locally, but I guess as long as this process is done manually we cannot be sure.

@Jdubrick
Copy link
Copy Markdown
Contributor Author

Yeah it would be tough to track that because it has to be done manually, I think since it isn't updated very often we should be okay for now but it may be worth looking into having some sort of tracker implemented that you're required to update if you push or maybe storing compressed versions of the content that are labeled? @thepetk

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Copy link
Copy Markdown
Contributor

@thepetk thepetk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the new changes is good to be merged. I've downloaded sucessfully all contents using the Makefile command and built locally the detr image

@Jdubrick Jdubrick merged commit 1e5138a into redhat-ai-dev:main Mar 25, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants