Use GenAIComp base image to simplify Dockerfiles & reduce image sizes #1369

eero-t · 2025-01-08T16:41:22Z

Description

Current application Dockerfiles are largely copies of one another. I've gone through all of them (currently 19, initially 20) and found that, except for the "EdgeCraftRAG" Dockerfile.server file, the only unique part in each container is the small application Python file (plus the FFmpeg addition for "DocSum").

This changes all the other 18 Dockerfiles (for 15 apps) to use a common base image: opea-project/GenAIComps#1127

That should clearly speed up their builds and greatly reduce the resulting disk usage, as the new base image is shared between container images. IMHO the main benefit of removing the duplicated content will be the drastic simplification of the Dockerfiles, though.

Note: the image sizes shown by docker & crictl also include the sizes of the underlying images, but disk space for a shared base image is used only once, so the real disk cost of each additional application image will be just a few tens of KBs.
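To illustrate the note above, here is a minimal Python sketch of how shared layers affect reported versus actual disk usage. The layer IDs and byte sizes are hypothetical, chosen only to make the arithmetic visible; they are not measurements of the real images.

```python
def disk_usage(images):
    """Return (reported_total, actual_total) in bytes.

    `images` maps an image name to a list of (layer_id, size_bytes) tuples.
    The reported total sums every image's full size, the way per-image
    listings do; the actual total counts each distinct layer only once.
    """
    reported = sum(size for layers in images.values() for _, size in layers)
    unique_layers = {}
    for layers in images.values():
        for layer_id, size in layers:
            unique_layers[layer_id] = size
    return reported, sum(unique_layers.values())


# Hypothetical layers: names and sizes are invented for illustration only.
BASE_LAYERS = [("python-base", 120_000_000), ("comps-base", 900_000_000)]
IMAGES = {
    "chatqna": BASE_LAYERS + [("chatqna.py", 40_000)],
    "docsum": BASE_LAYERS + [("docsum.py", 35_000)],
}

reported, actual = disk_usage(IMAGES)
print(f"reported: {reported:,} bytes, actual: {actual:,} bytes")
```

With these made-up numbers, the second application adds only the size of its own small layer to the actual total, even though its listed size includes the whole base image.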

EDIT: subsets of these changes have been split into 3 other PRs (#1612 + #1638 + #1671), for apps which no longer have CI issues. This PR will handle the last app (AvatarChatbot) once it is fixed.

Issues

There are several tickets about large disk usage, for both the GenAIExamples and GenAIComps repos.

Type of change

Others (enhancement, documentation, validation, etc.)

Dependencies

No new 3rd party dependencies.

The "GenAIComps" content that was earlier included directly in the Dockerfiles now comes from the "GenAIComps" base image.

This PR will fail until the AvatarChatbot application is fixed: #1607

Tests

Manually tested that the ChatQnA & DocSum images build when the GenAIComps image is available, and that those 2 applications still work fine.

For the other applications, this relies on the CI tests in these PRs, and on the earlier staged-builds PR (which changed the Dockerfiles to be constructed similarly to base-image use, with all content just being built in the same Dockerfile): #1031

Future work

This solves only the application image size aspect. There are many other images whose disk usage can and should still be optimized (app UIs, GenAIComps backend service images, etc.).

github-actions · 2025-01-08T16:41:40Z

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

eero-t · 2025-01-08T16:42:15Z

Setting to draft state until base image is available: opea-project/GenAIComps#1127

mkbhanda · 2025-01-08T18:46:19Z

Delightfully clean!

eero-t · 2025-01-20T17:45:16Z

Rebased to main and resolved all conflicts. The only difference from the earlier Dockerfile contents is that "DocSum" now includes FFmpeg. Updated the description accordingly.

eero-t · 2025-02-05T09:37:49Z

Base container PR opea-project/GenAIComps#1127 is merged.

Before this PR can be merged:

the resulting base container image needs to be added to the nightly build images,
pushed to registry used by CI (in this / GenAIComps repo), and
pushed also to DockerHub OPEA project.

chensuyue · 2025-03-04T03:22:28Z

@eero-t you can resolve the conflict and run the test.

eero-t · 2025-03-04T10:25:43Z

Rebased to main and resolved the conflict. Dropped draft status as the base image is now available (thanks @chensuyue!): https://hub.docker.com/r/opea/comps-base

eero-t · 2025-03-04T16:09:15Z

The majority of the CI checks (64) pass, including application tests, but the CI tests take ~5 hours to run, and there are also a lot (26) of test failures...

"AvatarChatbot" tests fail due to timeouts.

"ChatQnA" tests fail because:

chatqna.py is newer than the comps-base image. The former has the async revert, the latter doesn't.
Invalid type used in llama_guard health check for vllm-llm:
[2025-03-04 12:34:47,339] [ ERROR] - opea_llama_guard - Health check failed due to an exception: Invalid input type <class 'dict'>. Must be a PromptValue, str, or list of BaseMessages.
Trace endpoint timeouts:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc3d2775710>: Failed to establish a new connection: [Errno 111] Connection refused'))
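The "Connection refused" errors above typically mean that nothing was listening on localhost:4318 when the trace exporter tried to connect. One quick way to check that from the test host is a plain TCP probe. The sketch below uses only the Python standard library; the `port_is_open` helper name is made up for this example.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 4318 is the port that appears in the failing trace endpoint URL above.
    print("trace endpoint reachable:", port_is_open("localhost", 4318))
```

This only tells whether something accepts connections on the port, not whether it is actually a working trace collector.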

Content checks fail (CONTENT='') for "CodeGen", "CodeTrans", "DocSum" and the Xeon version of "Translation", maybe due to the latest comps-base image missing the async fix.

The rocm version of the "Translation" test, on the other hand, does not show why it fails:

Container translation-tgi-service  Started
Container translation-tgi-service  Waiting
Container translation-tgi-service  Error
dependency failed to start: container translation-tgi-service exited (1)
Error: Process completed with exit code 1.

"GraphRAG" services show several connection errors:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe350700b10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Retrying llama_index.llms.openai.base.OpenAI._acomplete in 1.0 seconds as it raised APIConnectionError: Connection error..
Retrying llama_index.llms.openai.base.OpenAI._acomplete in 1.73633567763621 seconds as it raised APIConnectionError: Connection error..
[2025-03-04 14:09:53,458] [    INFO] - neo4j_retrievers - [ check health ] Failed to connect to Neo4j: Connection error.
[2025-03-04 14:09:53,458] [   ERROR] - neo4j_retrievers - OpeaNeo4jRetriever health check failed.
...
 [2025-03-04 14:15:18,358] [   ERROR] - opea_retrievers_microservice - [ retrieval ] Error during retrieval invocation: Cannot resolve address 100.83.111.229:None
....
socket.gaierror: [Errno -8] Servname not supported for ai_socktype
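The `100.83.111.229:None` address in the retriever error suggests, though the logs do not confirm it, that a port setting ended up unset and was formatted as the string "None". Assuming that is the cause, a small guard like the following would fail early with a clearer message. `build_endpoint` and its parameters are hypothetical, not part of GenAIComps, and the port in the usage line is just an example value.

```python
def build_endpoint(host, port):
    """Return (host, port) with the port validated as an integer in 1..65535.

    Hypothetical helper: raises ValueError instead of letting an unset port
    reach the socket layer as None or the string "None".
    """
    if port is None or str(port).strip() in ("", "None"):
        raise ValueError(f"port for host {host!r} is not set")
    try:
        port_number = int(port)
    except (TypeError, ValueError):
        raise ValueError(f"port {port!r} for host {host!r} is not a number") from None
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port {port_number} for host {host!r} is out of range")
    return host, port_number


# Example usage with an illustrative port value read as a string.
print(build_endpoint("localhost", "7687"))
```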

The Xeon version of "SearchQnA" timed out, and the rocm version got an Internal Server Error.

The "MultiModalQnA" (rocm) test returns some base64 binary data which cannot really be viewed in a browser. The output of that test should be fixed.

The rocm version of "VisualQnA" fails due to the device being unavailable:

+ echo '[ lvm ] HTTP status is not 200. Received status was 500'
+ docker logs visualqna-tgi-service
Error: ShardCannotStart
+ exit 1
Error: Process completed with exit code 1.

eero-t · 2025-03-04T16:41:25Z

I split the changes for a subset of (5) apps[1] that currently do not have any CI issues into a separate PR (#1612).

[1] "AudioQnA", "DocIndexRetriever", "EdgeCraftRAG", "FaqGen", "VideoQnA".

eero-t · 2025-03-14T17:51:28Z

The second set of Dockerfile updates was merged => rebased this to main, to see whether the remaining 4 apps would now succeed in their 7 earlier failing CI tests:

AvatarChatbot (3/3 fails)
CodeGen (1/2 fails)
CodeTrans (1/3 fails)
MultimodalQnA (2/3 fails)

chensuyue · 2025-03-17T06:43:31Z

#1607 AvatarChatbot failed with known issue.

eero-t · 2025-03-17T10:21:11Z

> #1607 AvatarChatbot failed with known issue.

Thanks @chensuyue! I split the 3 currently working apps into a separate PR, #1671.

eero-t · 2025-03-24T09:07:52Z

With #1671 merged, rebased this to main and renamed the commit changing the last (AvatarChatbot) app to be the last part in the series.

eero-t · 2025-03-28T09:37:48Z

Rebased to main, to check current situation with AvatarChatbot.

xiguiw · 2025-04-01T08:28:41Z

@eero-t

Any ideas about this new issue:
opea-project/GenAIComps#1465

eero-t · 2025-04-04T09:03:13Z

Rebased to main to see whether AvatarChatbot finally passes CI, but it still fails.

eero-t · 2025-04-04T09:15:06Z

@chensuyue Now the CI failures are due to container cleanup:

Stop and remove all containers used by the services in ./docker_compose/intel/cpu/xeon/compose.yaml ...
312cfa11f69a
....
34b467b308e2
Error: Process completed with exit code 1.

chensuyue · 2025-04-08T13:53:00Z

> @chensuyue Now the CI failures are due to container cleanup:
>
> Stop and remove all containers used by the services in ./docker_compose/intel/cpu/xeon/compose.yaml ...
> 312cfa11f69a
> ....
> 34b467b308e2
> Error: Process completed with exit code 1.

No, that's not the main issue; the main issue is that the functionality test failed.

…opea-project#1369) Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com> Signed-off-by: Mahathi Vatsal <mahathi.vatsal.salopanthula@intel.com>

…opea-project#1369) Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com> Signed-off-by: Lacewell, Chaunte W <chaunte.w.lacewell@intel.com>

…opea-project#1369) Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com> Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>

eero-t requested review from Spycsh, WenjiaoYue, XinyaoWa, ashahba, letonghan, lkk12014402, lvliang-intel, myqi and xuechendi as code owners January 8, 2025 16:41

eero-t marked this pull request as draft January 8, 2025 16:41

eero-t mentioned this pull request Jan 8, 2025

Misc Dockerfiles updates to reduce image footprints #1363

Closed

4 tasks

This was referenced Jan 9, 2025

Add Dockerfile for comps-base image opea-project/GenAIComps#1127

Merged

Use staged builds to minimize final image sizes #1031

Merged

eero-t force-pushed the use-base-image branch from eb89a86 to 61653b4 Compare January 20, 2025 17:39

This was referenced Feb 19, 2025

[Feature] KubeAI Operator for OPEA opea-project/GenAIInfra#791

Closed

[CI/CD] Enabling building base image in CI opea-project/GenAIComps#1314

Closed

[Feature] Dockerfile Optimization for OPEA v1.3 #1585

Closed

eero-t force-pushed the use-base-image branch from 61653b4 to 5487610 Compare March 4, 2025 10:21

eero-t marked this pull request as ready for review March 4, 2025 10:23

eero-t requested a review from rbrugaro as a code owner March 4, 2025 10:23

eero-t mentioned this pull request Mar 4, 2025

Use GenAIComp base image to simplify Dockerfiles - part 1 #1612

Merged

1 task

eero-t mentioned this pull request Mar 17, 2025

Use GenAIComp base image to simplify Dockerfiles - part 3/4 #1671

Merged

1 task

eero-t force-pushed the use-base-image branch from 37e421e to 56a0222 Compare March 24, 2025 09:03

eero-t force-pushed the use-base-image branch from a1c1bc6 to 50a6a54 Compare March 28, 2025 09:37

eero-t force-pushed the use-base-image branch from 50a6a54 to d6cd853 Compare March 31, 2025 15:54

xiguiw mentioned this pull request Apr 1, 2025

opea/comps-base:latest image become too large opea-project/GenAIComps#1465

Closed

chensuyue approved these changes Apr 3, 2025

Use GenAIComp base image to simplify Dockerfiles - part 4/4

fccdb28

Update the last remaining application (megaservice) image of the 15 apps that use GenAIComps repo code as base. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

eero-t force-pushed the use-base-image branch from 7a7710d to fccdb28 Compare April 4, 2025 08:44

joshuayao added this to the v1.3 milestone Apr 8, 2025

Merge branch 'main' into use-base-image

9a3479f

joshuayao linked an issue Apr 8, 2025 that may be closed by this pull request

[Feature] Dockerfile Optimization for OPEA v1.3 #1585

Closed

8 tasks

joshuayao added this to OPEA Apr 9, 2025

joshuayao moved this to In review in OPEA Apr 9, 2025

joshuayao self-requested a review April 9, 2025 06:50

joshuayao approved these changes Apr 9, 2025

joshuayao merged commit 8b7cb35 into opea-project:main Apr 9, 2025
23 of 24 checks passed

github-project-automation bot moved this from In review to Done in OPEA Apr 9, 2025

eero-t deleted the use-base-image branch April 9, 2025 12:31
