Sync finetuning changes from main to dev branch by AhmedSeemalK · Pull Request #90 · opea-project/Enterprise-Inference

AhmedSeemalK · 2026-04-17T06:24:09Z

Sync changes from main branch to development branch dev

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Address PR review comments: correct the git clone URL to opea-project/Enterprise-Inference, align model configuration with .env.example, and add a prerequisite section listing required models. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Use consistent `docker compose` (not `docker-compose`) and list log commands for all individual services for thoroughness. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Add EMBEDDING_API_ENDPOINT, RERANKER_API_ENDPOINT, and LLM_API_ENDPOINT config vars so each service can target its own APISIX route. When set, the service uses the per-model URL; when unset, it falls back to GENAI_GATEWAY_URL for GenAI Gateway compatibility. Consistent with the pattern used by RAGChatbot and other sample solutions. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
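The fallback described in this commit can be sketched as follows. This is a minimal illustration, not the code in the PR: the helper name `resolve_base_url` and the error handling are assumptions; only the environment variable names come from the commit message.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(per_model_var: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the per-model endpoint if it is set, else the shared gateway URL.

    `per_model_var` is one of EMBEDDING_API_ENDPOINT, RERANKER_API_ENDPOINT,
    or LLM_API_ENDPOINT. The function name itself is hypothetical.
    """
    env = os.environ if env is None else env
    per_model = env.get(per_model_var, "").strip()
    if per_model:
        # Per-model APISIX route takes precedence when configured.
        return per_model.rstrip("/")
    gateway = env.get("GENAI_GATEWAY_URL", "").strip()
    if not gateway:
        raise ValueError(f"Neither {per_model_var} nor GENAI_GATEWAY_URL is set")
    # Fall back to the shared GenAI Gateway URL.
    return gateway.rstrip("/")
```

Passing `env` explicitly keeps the helper easy to test; in a service it would typically read from `os.environ`.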

…n.md
- api_client.py: Remove /v1 from reranker URL (TEI uses /rerank, not /v1/rerank); add model name to rerank payload per TEI API requirements
- reranker-configuration.md: Scope guide to Xeon-only deployments with a note that Gaudi/TEI works out of the box; remove spurious :4000 port from BASE_URL; add TOKEN variable setup and replace literal "Token" with ${TOKEN} in all curl commands
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

- api_client.py: Branch on RERANKER_API_ENDPOINT to select URL path (/rerank vs /v1/rerank), payload field ("texts" vs "documents"), and response format (flat array vs nested results)
- reranker-configuration.md: Restructure guide to cover both Keycloak and GenAI Gateway deployments with separate curl examples, token setup, and expected responses
- README.md: Add Keycloak-specific notes for per-model APISIX route configuration and required API endpoint variables
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
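Handling the two response formats mentioned above (flat array vs nested results) might look like the sketch below. The field names `index`, `score`, and `relevance_score` are assumptions about the backends' JSON and are not stated in the commit message; the function name is hypothetical.

```python
from typing import Any, List


def extract_rerank_scores(response: Any, n_docs: int) -> List[float]:
    """Map a rerank response back to per-document scores in input order.

    Accepts either a flat array of items or a dict with a nested "results"
    list. Field names are assumed, not taken from the PR.
    """
    items = response.get("results", []) if isinstance(response, dict) else response
    scores = [0.0] * n_docs
    for item in items:
        # Accept either score field name so both formats parse the same way.
        value = item.get("relevance_score", item.get("score", 0.0))
        scores[item["index"]] = float(value)
    return scores
```

Documents missing from the response keep a score of 0.0, which is one possible policy; raising an error would be an equally reasonable choice.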

- reranker-configuration.md: Scope entire guide to GenAI Gateway (LiteLLM) deployments; remove Keycloak/APISIX sections since reranker works out of the box for those deployments
- README.md: Clarify reranker post-deployment config is GenAI Gateway only; add note that Keycloak/APISIX needs no extra steps
- api_client.py: Send both "documents" and "texts" in rerank payload so it works across all backends (vLLM, TEI, LiteLLM) without branching
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

…k token TTL Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Gaudi (TEI) serves endpoints without /v1 prefix (/embeddings, /rerank) while Xeon (vLLM) uses the /v1 prefix (/v1/embeddings, /v1/rerank).
- Add INFERENCE_BACKEND=vllm|tei to all three config.py files
- Update embedding, retrieval, and llm api_client.py to branch URL construction based on INFERENCE_BACKEND
- Pass INFERENCE_BACKEND through docker-compose.yml for all three services
- Add INFERENCE_BACKEND to .env.example with hardware guidance
- Scope reranker-configuration.md to GenAI Gateway + Xeon only
- Update README to reflect GenAI Gateway + Xeon scope and note that Keycloak tokens can be configured for longer TTL in Keycloak console
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
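The URL branching on INFERENCE_BACKEND could be sketched like this. The function name `build_inference_url` is hypothetical and the validation is an added assumption; the `vllm`/`tei` values and the `/v1` rule come from the commit message.

```python
def build_inference_url(base_url: str, path: str, inference_backend: str = "vllm") -> str:
    """Build an endpoint URL, adding /v1 only for the vLLM (Xeon) backend."""
    backend = inference_backend.lower()
    if backend not in ("vllm", "tei"):
        raise ValueError(f"Unsupported INFERENCE_BACKEND: {inference_backend!r}")
    # vLLM serves /v1/embeddings and /v1/rerank; TEI serves /embeddings and /rerank.
    prefix = "/v1" if backend == "vllm" else ""
    return f"{base_url.rstrip('/')}{prefix}/{path.lstrip('/')}"
```

For example, `build_inference_url("https://host", "rerank", "tei")` yields `https://host/rerank`, while the same call with `"vllm"` yields `https://host/v1/rerank`.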

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

When LLM_API_ENDPOINT is set (APISIX/Keycloak), always keep /v1 prefix regardless of INFERENCE_BACKEND. Only drop /v1 for GenAI Gateway + Gaudi where LiteLLM itself handles the routing without the /v1 prefix. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
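A minimal sketch of that precedence rule, assuming a hypothetical helper `llm_chat_url` and an assumed `chat/completions` path (the commit message does not name the path):

```python
from typing import Optional


def llm_chat_url(gateway_url: str, llm_api_endpoint: Optional[str], inference_backend: str) -> str:
    """Choose the LLM URL: per-model endpoints always keep the /v1 prefix."""
    if llm_api_endpoint:
        # APISIX/Keycloak route: /v1 is kept regardless of INFERENCE_BACKEND.
        return f"{llm_api_endpoint.rstrip('/')}/v1/chat/completions"
    # GenAI Gateway: drop /v1 only for the Gaudi (tei) backend.
    prefix = "" if inference_backend.lower() == "tei" else "/v1"
    return f"{gateway_url.rstrip('/')}{prefix}/chat/completions"
```

Checking the per-model endpoint first keeps the backend setting from affecting APISIX routes at all, which is the behavior the commit describes.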

Two issues were causing 500 errors when reranking over large uploads:
1. Batch size overflow (413): TOP_K_FUSION=50 sent all 50 candidates in a single rerank request, exceeding bge-reranker-base's max batch size. Fixed by adding RERANKER_MAX_BATCH_SIZE config (default 32) and looping over batches in rerank_pairs(). Index offsets are tracked so scores are written back to the correct positions in the full list.
2. Token length overflow (500 EngineCore): Technical document chunks tokenize at ~2 chars/token in the worst case. At 1000-char truncation, some docs in batch 2 exceeded the model's 512-token max sequence length (query + doc combined). Reduced truncation to 500 chars (~125 tokens at a typical ~4 chars/token, at most ~250 tokens in the ~2 chars/token worst case), leaving safe headroom for the query and worst-case tokenization while preserving the leading context most relevant for reranking quality.
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
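The batching and truncation fix can be illustrated with the sketch below. It is not the PR's `rerank_pairs()`: the function name, the `score_batch` callback (a stand-in for the HTTP call to the reranker), and the length check are assumptions. The defaults of 32 and 500 mirror the values in the commit message.

```python
from typing import Callable, List, Sequence


def rerank_in_batches(
    query: str,
    docs: Sequence[str],
    score_batch: Callable[[str, List[str]], List[float]],
    max_batch_size: int = 32,
    max_chars: int = 500,
) -> List[float]:
    """Score docs in batches, writing each score back at its original index."""
    scores = [0.0] * len(docs)
    for offset in range(0, len(docs), max_batch_size):
        # Truncate each doc so query + doc stays under the model's token limit.
        batch = [doc[:max_chars] for doc in docs[offset:offset + max_batch_size]]
        batch_scores = score_batch(query, batch)
        if len(batch_scores) != len(batch):
            raise ValueError("reranker returned a different number of scores than docs")
        for i, score in enumerate(batch_scores):
            # offset + i is the position of this doc in the full candidate list.
            scores[offset + i] = score
    return scores
```

With 50 candidates and the default batch size, this issues two requests of 32 and 18 documents.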

Clarify that MODEL_ENDPOINT values differ by deployment type:
- Xeon + Keycloak/APISIX: APISIX route name with -vllmcpu suffix (e.g. bge-base-en-v1.5-vllmcpu, bge-reranker-base-vllmcpu)
- Xeon + GenAI Gateway / Gaudi: HuggingFace model ID
Update APISIX endpoint URL examples in .env.example to use -vllmcpu route names. Add deployment-type comparison table to README Configure Models section.
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

…c fixes
api_client.py (retrieval):
- Separate rerank payload by backend: Keycloak/APISIX uses "texts", GenAI Gateway uses "documents"; each backend expects its own field
- Add logger.info for raw reranker response per batch
- Clarify response format comments (Format 1 vs Format 2)
ingestion/config.py + main.py:
- Add embedding_batch_size config (default 32, must match embedding service)
- Use settings.embedding_batch_size instead of hardcoded 32 in main.py
- Log the batch size at start of embedding loop
docker-compose.yml + .env.example:
- Pass EMBEDDING_BATCH_SIZE to ingestion service so users can tune it
- Add EMBEDDING_BATCH_SIZE to .env.example with note to reduce for larger documents
reranker-configuration.md:
- Step 2: clarify TOKEN source (GenAI Gateway vault.yml, not Keycloak)
- Step 2: define BASE_URL with /v1 path so curl commands use /rerank
- Steps 3 + 7: update curl to use ${BASE_URL}/rerank
- Step 3: add note on "documents" vs "texts" field by deployment type
- Step 7: add Keycloak/APISIX response format (flat array) alongside GenAI Gateway format (nested results)
README.md:
- Replace docker-compose with docker compose throughout
- Expand log-checking section with per-service startup verification commands
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
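The per-backend payload split in api_client.py could be sketched as below. Using the presence of RERANKER_API_ENDPOINT to detect a Keycloak/APISIX deployment is an assumption carried over from an earlier commit in this PR, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def build_rerank_payload(
    query: str,
    docs: Sequence[str],
    model: str,
    reranker_api_endpoint: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the rerank request body with the field name the backend expects."""
    # Keycloak/APISIX expects "texts"; GenAI Gateway expects "documents".
    field = "texts" if reranker_api_endpoint else "documents"
    return {"model": model, "query": query, field: list(docs)}
```

Sending only the field the backend expects avoids relying on a server silently ignoring an unknown key.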

BASE_URL must remain without /v1 because Steps 4 and 5 use the same variable for LiteLLM admin endpoints (/model/info, /model/update) which have no /v1 prefix. The inference curl commands correctly use ${BASE_URL}/v1/rerank explicitly. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
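The reason for keeping BASE_URL free of /v1 can be seen in a small sketch. The helper name `gateway_urls` is hypothetical; the three paths are the ones named in the commit message.

```python
from typing import Dict


def gateway_urls(base_url: str) -> Dict[str, str]:
    """Derive admin and inference URLs from one BASE_URL that has no /v1 suffix."""
    base = base_url.rstrip("/")
    return {
        # LiteLLM admin endpoints have no /v1 prefix.
        "model_info": f"{base}/model/info",
        "model_update": f"{base}/model/update",
        # Inference calls add /v1 explicitly.
        "rerank": f"{base}/v1/rerank",
    }
```

If BASE_URL already ended in /v1, the two admin URLs would be wrong, which is the problem this commit corrects.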

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

Prevents bandit from scanning the HybridSearch dataset venv which causes internal errors on Python 3.14 bytecode files. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Adds Trivy (vuln/misconfig/secret), Bandit, and ShellCheck scans scoped to the HybridSearch sample solution. Runs on PR open/sync and push to main/dev, with workflow_dispatch support for manual PR scans. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

This reverts commit 33f85a1. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

GitHub Actions only picks up workflows from .github/workflows at the repository root. Moves the SDLE scan workflow out of the sample_solutions/HybridSearch subdirectory so it runs correctly. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

All Trivy, Bandit, and ShellCheck scans passed successfully. Removing the workflow file as it is no longer needed on this branch. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

Signed-off-by: Harika <codewith3@gmail.com>

…version to 0.35.0 Signed-off-by: Harika <codewith3@gmail.com>

Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com> Signed-off-by: Harika <codewith3@gmail.com>

Signed-off-by: Harika <codewith3@gmail.com>

* Create code-scans.yaml
  Workflow to scan the code for Security vulnerabilities and Code quality issues
* Updated the co-pilot review
* Update code-scans.yaml
  Updated Trivy scan with latest stable version

Signed-off-by: Harika <codewith3@gmail.com>

Signed-off-by: alexsin368 <alex.sin@intel.com>

* Release v1.5.2
  Signed-off-by: amberjain1 <amber.jain@intel.com>
  Signed-off-by: psurabh <pradeep.surabhi@intel.com>
  Signed-off-by: mdfaheem-intel <mohammad.faheem@intel.com>
  Signed-off-by: vivekrsintc <vivek.rs@intel.com>
  Co-authored-by: pvishwan <pramodh.vishwanath@intel.com>
  Co-authored-by: AhmedSeemalK <ahmed.seemal@intel.com>
  Co-authored-by: vhpintel <vijay.kumar.h.p@intel.com>
  Co-authored-by: sgurunat <gurunath.s@intel.com>
  Co-authored-by: jaswanth8888 <jaswanth.karani@intel.com>
  Co-authored-by: sandeshk-intel <sandesh.kumar.s@intel.com>
  Co-authored-by: vinayK34 <vinay3.kumar@intel.com>
  Signed-off-by: Github Actions <actions@github.com>
* Adding Finetuning as a blueprint solution as part of release v1.5.2
  Signed-off-by: S, Gurunath <gurunath.s@intel.com>
* False positive Bandit scan issue in gpu_engine file; added comment to suppress it
  Signed-off-by: S, Gurunath <gurunath.s@intel.com>
---------
Signed-off-by: amberjain1 <amber.jain@intel.com>
Signed-off-by: psurabh <pradeep.surabhi@intel.com>
Signed-off-by: mdfaheem-intel <mohammad.faheem@intel.com>
Signed-off-by: vivekrsintc <vivek.rs@intel.com>
Signed-off-by: Github Actions <actions@github.com>
Signed-off-by: S, Gurunath <gurunath.s@intel.com>
Co-authored-by: Github Actions <actions@github.com>
Co-authored-by: pvishwan <pramodh.vishwanath@intel.com>
Co-authored-by: AhmedSeemalK <ahmed.seemal@intel.com>
Co-authored-by: vhpintel <vijay.kumar.h.p@intel.com>
Co-authored-by: jaswanth8888 <jaswanth.karani@intel.com>
Co-authored-by: sandeshk-intel <sandesh.kumar.s@intel.com>
Co-authored-by: vinayK34 <vinay3.kumar@intel.com>

…ripts Cld2labs/redhat9.6 deployment scripts

cld2labs/HybridSearch

cld2labs/Docugen-Microagents

add model-deployment folder

arpannookala-12 and others added 30 commits March 10, 2026 18:07

Add HybridSearch sample solution

824732b

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Fix docker compose command and add per-service log instructions

d7c6ae9

Use consistent `docker compose` (not `docker-compose`) and list log commands for all individual services for thoroughness. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Add Docugen-Microagents

d720bc8

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

update ReadMe and remove redundant images

a55818e

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

update README and addressed Docker user change

4e676cf

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

fix: resolve event loop conflicts and mermaid diagram rendering

31be992

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

add .github folder

6ff680b

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

update README with SSL verification

c788fe3

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

update trivy action version

95462cb

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

revert trivy version

80ae34f

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

Narrow reranker config scope to GenAI Gateway + Xeon and note Keycloa…

e992dd7

…k token TTL Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Add INFERENCE_BACKEND note to README model config section

c3e92ef

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

update trivy version

a890942

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

Add .venv-dataset to bandit exclude_dirs in .bandit config

33f85a1

Prevents bandit from scanning the HybridSearch dataset venv which causes internal errors on Python 3.14 bytecode files. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Revert "Add .venv-dataset to bandit exclude_dirs in .bandit config"

a6de873

This reverts commit 33f85a1. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

Remove code-scans.yaml after security scans passed

a373f75

All Trivy, Bandit, and ShellCheck scans passed successfully. Removing the workflow file as it is no longer needed on this branch. Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>

remove code-scans file

d71d513

Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>

Harika and others added 26 commits April 10, 2026 09:42

Adding redhat deployment scripts

cd3010a

Signed-off-by: Harika <codewith3@gmail.com>

redhat deployment scripts

0c1684a

Signed-off-by: Harika <codewith3@gmail.com>

update keycloak values

ef87340

Signed-off-by: Harika <codewith3@gmail.com>

merging both apisix and genai into single document

ae603b7

Signed-off-by: Harika <codewith3@gmail.com>

merging both apisix and genai into single document

69d1176

Signed-off-by: Harika <codewith3@gmail.com>

update redhat README

468a205

Signed-off-by: Harika <codewith3@gmail.com>

update redhat README

5d7c5d6

Signed-off-by: Harika <codewith3@gmail.com>

update redhat README

9655656

Signed-off-by: Harika <codewith3@gmail.com>

update redhat README

451f26e

Signed-off-by: Harika <codewith3@gmail.com>

updated troubleshooting guide with right Keycloak values

af341de

Signed-off-by: Harika <codewith3@gmail.com>

adding code scans file

1437e9f

Signed-off-by: Harika <codewith3@gmail.com>

remove file

89f39e7

Signed-off-by: Harika <codewith3@gmail.com>

update redhat README.md with mount ISO section and update trivy scan …

bf49aea

…version to 0.35.0 Signed-off-by: Harika <codewith3@gmail.com>

Update third_party/Dell/redhat9.6/iac/README.md

5db8534

Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com> Signed-off-by: Harika <codewith3@gmail.com>

Update third_party/Dell/redhat9.6/iac/README.md

16930c3

Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com> Signed-off-by: Harika <codewith3@gmail.com>

updated README. for redhat mount ISO

34c02e0

Signed-off-by: Harika <codewith3@gmail.com>

updated README. for redhat mount ISO

b186fa0

Signed-off-by: Harika <codewith3@gmail.com>

update wget URLs with opea repo links

8b3480e

Signed-off-by: Harika <codewith3@gmail.com>

PR workflow for SDLE scans (#60)

6774feb

* Create code-scans.yaml
  Workflow to scan the code for Security vulnerabilities and Code quality issues
* Updated the co-pilot review
* Update code-scans.yaml
  Updated Trivy scan with latest stable version

remove code-scans.yaml

0118289

Signed-off-by: Harika <codewith3@gmail.com>

add model-deployment folder

aa6b79d

Signed-off-by: alexsin368 <alex.sin@intel.com>

Merge pull request #73 from cld2labs/cld2labs/redhat9.6-deployment-sc…

69e4468

…ripts Cld2labs/redhat9.6 deployment scripts

Merge pull request #74 from cld2labs/cld2labs/HybridSearch

2ddc29f

cld2labs/HybridSearch

Merge pull request #75 from cld2labs/cld2labs/Docugen-Microagents

ca3dfd7

cld2labs/Docugen-Microagents

Merge pull request #89 from alexsin368/model-deployment

4df0676

add model-deployment folder

AhmedSeemalK requested review from amberjain1 and sgurunat April 17, 2026 06:24

sgurunat approved these changes Apr 17, 2026

AhmedSeemalK merged commit d0c52d1 into dev Apr 17, 2026
7 checks passed

AhmedSeemalK merged 60 commits into dev from main

AhmedSeemalK commented Apr 17, 2026

8 participants
