Sync finetuning changes from main to dev branch #90

Merged
AhmedSeemalK merged 60 commits into dev from main
Apr 17, 2026

Conversation

@AhmedSeemalK
Collaborator

Sync changes from the main branch to the dev development branch.

arpannookala-12 and others added 30 commits March 10, 2026 18:07
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Address PR review comments: correct the git clone URL to
opea-project/Enterprise-Inference, align model configuration with
.env.example, and add a prerequisite section listing required models.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Use consistent `docker compose` (not `docker-compose`) and list log
commands for all individual services for thoroughness.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Add EMBEDDING_API_ENDPOINT, RERANKER_API_ENDPOINT, and LLM_API_ENDPOINT
config vars so each service can target its own APISIX route. When set,
the service uses the per-model URL; when unset, it falls back to
GENAI_GATEWAY_URL for GenAI Gateway compatibility. Consistent with the
pattern used by RAGChatbot and other sample solutions.
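
The fallback can be sketched as follows (a minimal illustration; `resolve_endpoint` and the example URLs are hypothetical, only the environment variable names come from the change above):

```python
import os

# Hypothetical helper sketching the described fallback: prefer the
# per-service endpoint when set, otherwise fall back to the shared
# GenAI Gateway URL.
def resolve_endpoint(service_var: str) -> str:
    per_service = os.environ.get(service_var, "").strip()
    if per_service:
        return per_service  # per-model APISIX route
    return os.environ["GENAI_GATEWAY_URL"]  # GenAI Gateway fallback

# Example: EMBEDDING_API_ENDPOINT wins when present.
os.environ["GENAI_GATEWAY_URL"] = "http://gateway.local"
os.environ["EMBEDDING_API_ENDPOINT"] = "http://apisix.local/embedding"
print(resolve_endpoint("EMBEDDING_API_ENDPOINT"))  # http://apisix.local/embedding
```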

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
…n.md

- api_client.py: Remove /v1 from reranker URL (TEI uses /rerank, not /v1/rerank);
  add model name to rerank payload per TEI API requirements
- reranker-configuration.md: Scope guide to Xeon-only deployments with a note that
  Gaudi/TEI works out of the box; remove spurious :4000 port from BASE_URL; add
  TOKEN variable setup and replace literal "Token" with ${TOKEN} in all curl commands

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
- api_client.py: Branch on RERANKER_API_ENDPOINT to select URL path
  (/rerank vs /v1/rerank), payload field ("texts" vs "documents"),
  and response format (flat array vs nested results)
- reranker-configuration.md: Restructure guide to cover both Keycloak
  and GenAI Gateway deployments with separate curl examples, token
  setup, and expected responses
- README.md: Add Keycloak-specific notes for per-model APISIX route
  configuration and required API endpoint variables

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
- reranker-configuration.md: Scope entire guide to GenAI Gateway (LiteLLM)
  deployments; remove Keycloak/APISIX sections since reranker works out of
  the box for those deployments
- README.md: Clarify reranker post-deployment config is GenAI Gateway only;
  add note that Keycloak/APISIX needs no extra steps
- api_client.py: Send both "documents" and "texts" in rerank payload so it
  works across all backends (vLLM, TEI, LiteLLM) without branching
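
A minimal sketch of the dual-field payload, assuming the field names above (`build_rerank_payload` is an illustrative helper, not the actual api_client.py function):

```python
# Illustrative sketch: include both "documents" (vLLM/LiteLLM style) and
# "texts" (TEI style) so any backend finds the field it expects.
def build_rerank_payload(query, docs, model=None):
    payload = {
        "query": query,
        "documents": docs,  # read by vLLM / LiteLLM backends
        "texts": docs,      # read by TEI backends
    }
    if model:
        payload["model"] = model
    return payload

payload = build_rerank_payload("what is OPEA?", ["doc a", "doc b"],
                               model="BAAI/bge-reranker-base")
```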

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
…k token TTL

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Gaudi (TEI) serves endpoints without /v1 prefix (/embeddings, /rerank)
while Xeon (vLLM) uses the /v1 prefix (/v1/embeddings, /v1/rerank).

- Add INFERENCE_BACKEND=vllm|tei to all three config.py files
- Update embedding, retrieval, and llm api_client.py to branch URL
  construction based on INFERENCE_BACKEND
- Pass INFERENCE_BACKEND through docker-compose.yml for all three services
- Add INFERENCE_BACKEND to .env.example with hardware guidance
- Scope reranker-configuration.md to GenAI Gateway + Xeon only
- Update README to reflect GenAI Gateway + Xeon scope and note that
  Keycloak tokens can be configured for longer TTL in Keycloak console
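
The backend-dependent URL construction can be sketched as follows (illustrative; `service_url` is a hypothetical helper, the endpoint paths come from the description above):

```python
# Hypothetical sketch of the INFERENCE_BACKEND branching: TEI (Gaudi)
# serves /embeddings and /rerank, while vLLM (Xeon) uses the /v1 prefix.
def service_url(base: str, route: str, backend: str = "vllm") -> str:
    base = base.rstrip("/")
    if backend == "tei":
        return f"{base}/{route}"     # e.g. http://host/rerank
    return f"{base}/v1/{route}"      # e.g. http://host/v1/rerank

print(service_url("http://gaudi-host", "rerank", backend="tei"))      # http://gaudi-host/rerank
print(service_url("http://xeon-host", "embeddings", backend="vllm"))  # http://xeon-host/v1/embeddings
```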

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
When LLM_API_ENDPOINT is set (APISIX/Keycloak), always keep /v1 prefix
regardless of INFERENCE_BACKEND. Only drop /v1 for GenAI Gateway + Gaudi
where LiteLLM itself handles the routing without the /v1 prefix.
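
A sketch of that rule, assuming the deployment combinations above (`llm_base_url` is a hypothetical helper):

```python
# Illustrative: an explicit LLM_API_ENDPOINT (APISIX/Keycloak) always keeps
# /v1; only the GenAI Gateway + Gaudi (tei) combination drops it, since
# LiteLLM handles routing there without the /v1 prefix.
def llm_base_url(gateway_url, backend, llm_api_endpoint=None):
    if llm_api_endpoint:                        # APISIX/Keycloak route
        return llm_api_endpoint.rstrip("/") + "/v1"
    if backend == "tei":                        # GenAI Gateway + Gaudi
        return gateway_url.rstrip("/")
    return gateway_url.rstrip("/") + "/v1"      # GenAI Gateway + Xeon
```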

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Two issues were causing 500 errors when reranking over large uploads:

1. Batch size overflow (413): TOP_K_FUSION=50 sent all 50 candidates in
   a single rerank request, exceeding bge-reranker-base's max batch size.
   Fixed by adding RERANKER_MAX_BATCH_SIZE config (default 32) and
   looping over batches in rerank_pairs(). Index offsets are tracked so
   scores are written back to the correct positions in the full list.

2. Token length overflow (500 EngineCore): Technical document chunks
   tokenize at ~2 chars/token in worst case. At 1000-char truncation
   some docs in batch 2 exceeded the model's 512-token max sequence
   length (query + doc combined). Reduced truncation to 500 chars
   (~125 tokens), leaving safe headroom for the query and worst-case
   tokenization while preserving the leading context most relevant for
   reranking quality.
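
The batching fix can be sketched as follows (illustrative; `score_batch` stands in for the HTTP call to the reranker, and the config names come from the change above):

```python
# Illustrative sketch: send candidates in batches of RERANKER_MAX_BATCH_SIZE
# and use each batch's start offset to write scores back to the correct
# positions in the full candidate list.
RERANKER_MAX_BATCH_SIZE = 32
DOC_TRUNCATE_CHARS = 500  # ~125 tokens worst case, headroom under the 512 max

def rerank_pairs(query, docs, score_batch):
    """score_batch(query, docs) -> list of floats; stands in for the HTTP call."""
    scores = [0.0] * len(docs)
    for start in range(0, len(docs), RERANKER_MAX_BATCH_SIZE):
        batch = [d[:DOC_TRUNCATE_CHARS]
                 for d in docs[start:start + RERANKER_MAX_BATCH_SIZE]]
        for i, s in enumerate(score_batch(query, batch)):
            scores[start + i] = s  # offset keeps positions aligned
    return scores

# 50 candidates (TOP_K_FUSION) -> two requests of 32 and 18 documents.
docs = [f"doc {i}" for i in range(50)]
scores = rerank_pairs("q", docs, lambda q, b: [float(len(d)) for d in b])
```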

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Clarify that MODEL_ENDPOINT values differ by deployment type:
- Xeon + Keycloak/APISIX: APISIX route name with -vllmcpu suffix
  (e.g. bge-base-en-v1.5-vllmcpu, bge-reranker-base-vllmcpu)
- Xeon + GenAI Gateway / Gaudi: HuggingFace model ID

Update APISIX endpoint URL examples in .env.example to use -vllmcpu
route names. Add deployment-type comparison table to README Configure
Models section.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
…c fixes

api_client.py (retrieval):
- Separate rerank payload by backend: Keycloak/APISIX uses "texts",
  GenAI Gateway uses "documents" — each backend expects its own field
- Add logger.info for raw reranker response per batch
- Clarify response format comments (Format 1 vs Format 2)

ingestion/config.py + main.py:
- Add embedding_batch_size config (default 32, must match embedding service)
- Use settings.embedding_batch_size instead of hardcoded 32 in main.py
- Log the batch size at start of embedding loop

docker-compose.yml + .env.example:
- Pass EMBEDDING_BATCH_SIZE to ingestion service so users can tune it
- Add EMBEDDING_BATCH_SIZE to .env.example with note to reduce for
  larger documents

reranker-configuration.md:
- Step 2: clarify TOKEN source (GenAI Gateway vault.yml, not Keycloak)
- Step 2: define BASE_URL with /v1 path so curl commands use /rerank
- Steps 3 + 7: update curl to use ${BASE_URL}/rerank
- Step 3: add note on "documents" vs "texts" field by deployment type
- Step 7: add Keycloak/APISIX response format (flat array) alongside
  GenAI Gateway format (nested results)

README.md:
- Replace docker-compose with docker compose throughout
- Expand log-checking section with per-service startup verification
  commands
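
The configurable batch size can be sketched as follows (illustrative; the `Settings` class and `embed_all` loop stand in for ingestion/config.py and main.py):

```python
import os

class Settings:
    # Default 32; must match the embedding service. Tunable via
    # EMBEDDING_BATCH_SIZE, e.g. reduce it for larger documents.
    embedding_batch_size = int(os.environ.get("EMBEDDING_BATCH_SIZE", "32"))

settings = Settings()

def embed_all(chunks, embed_batch):
    """embed_batch(chunks) -> list of vectors; stands in for the embedding call."""
    print(f"Embedding with batch size {settings.embedding_batch_size}")
    vectors = []
    for start in range(0, len(chunks), settings.embedding_batch_size):
        vectors.extend(embed_batch(chunks[start:start + settings.embedding_batch_size]))
    return vectors
```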

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
BASE_URL must remain without /v1 because Steps 4 and 5 use the same
variable for LiteLLM admin endpoints (/model/info, /model/update)
which have no /v1 prefix. The inference curl commands correctly use
${BASE_URL}/v1/rerank explicitly.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Prevents Bandit from scanning the HybridSearch dataset venv, which
causes internal errors on Python 3.14 bytecode files.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Adds Trivy (vuln/misconfig/secret), Bandit, and ShellCheck scans
scoped to the HybridSearch sample solution. Runs on PR open/sync
and push to main/dev, with workflow_dispatch support for manual
PR scans.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
This reverts commit 33f85a1.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
GitHub Actions only picks up workflows from .github/workflows at the
repository root. Moves the SDLE scan workflow out of the
sample_solutions/HybridSearch subdirectory so it runs correctly.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
All Trivy, Bandit, and ShellCheck scans passed successfully.
Removing the workflow file as it is no longer needed on this branch.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Signed-off-by: gopal-raj-suresh <gopal.raj.dummugudupu@cloud2labs.com>
Harika and others added 26 commits April 10, 2026 09:42
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
…version to 0.35.0

Signed-off-by: Harika <codewith3@gmail.com>
Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com>
Signed-off-by: Harika <codewith3@gmail.com>
Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
* Create code-scans.yaml

Workflow to scan the code for Security vulnerabilities and Code quality issues

* Addressed the Copilot review comments

* Update code-scans.yaml

Updated Trivy scan with latest stable version
Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>
* Release v1.5.2

Signed-off-by: amberjain1 <amber.jain@intel.com>
Signed-off-by: psurabh <pradeep.surabhi@intel.com>
Signed-off-by: mdfaheem-intel <mohammad.faheem@intel.com>
Signed-off-by: vivekrsintc <vivek.rs@intel.com>
Co-authored-by: pvishwan <pramodh.vishwanath@intel.com>
Co-authored-by: AhmedSeemalK <ahmed.seemal@intel.com>
Co-authored-by: vhpintel <vijay.kumar.h.p@intel.com>
Co-authored-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: jaswanth8888 <jaswanth.karani@intel.com>
Co-authored-by: sandeshk-intel <sandesh.kumar.s@intel.com>
Co-authored-by: vinayK34 <vinay3.kumar@intel.com>
Signed-off-by: Github Actions <actions@github.com>

* Adding Finetuning as a blueprint solution as part of release v1.5.2

Signed-off-by: S, Gurunath <gurunath.s@intel.com>

* False-positive Bandit scan issue in the gpu_engine file; added a comment to suppress it

Signed-off-by: S, Gurunath <gurunath.s@intel.com>

---------

Signed-off-by: amberjain1 <amber.jain@intel.com>
Signed-off-by: psurabh <pradeep.surabhi@intel.com>
Signed-off-by: mdfaheem-intel <mohammad.faheem@intel.com>
Signed-off-by: vivekrsintc <vivek.rs@intel.com>
Signed-off-by: Github Actions <actions@github.com>
Signed-off-by: S, Gurunath <gurunath.s@intel.com>
Co-authored-by: Github Actions <actions@github.com>
Co-authored-by: pvishwan <pramodh.vishwanath@intel.com>
Co-authored-by: AhmedSeemalK <ahmed.seemal@intel.com>
Co-authored-by: vhpintel <vijay.kumar.h.p@intel.com>
Co-authored-by: jaswanth8888 <jaswanth.karani@intel.com>
Co-authored-by: sandeshk-intel <sandesh.kumar.s@intel.com>
Co-authored-by: vinayK34 <vinay3.kumar@intel.com>
…ripts

Cld2labs/redhat9.6 deployment scripts
AhmedSeemalK merged commit d0c52d1 into dev on Apr 17, 2026
7 checks passed

8 participants