@utsumi-fj
Contributor

Allow configuration of document and query prefixes for text embedding in openai_api_compatible model plugin.

Related Issues or Context

Some embedding models produce more accurate embeddings when a prefix is prepended to the input text, often a different one for documents than for queries. This PR makes it possible to set document and query prefixes.

Note: This pull request can be used even while the dify-plugin-sdk bug langgenius/dify-plugin-sdks#241 is not yet fixed. Once that bug is fixed, the functionality will behave as intended.

This PR contains Changes to LLM Models Plugin

  • My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)
  • My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)
  • My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)
  • My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)
  • My Changes Affect Structured Output Format (JSON, XML, etc.)
  • My Changes Affect Token Consumption Metrics
  • My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)
  • Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

  • I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

  • I have Ensured dify_plugin>=0.3.0,<0.6.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

  • Dify Version is: 1.9.2, I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

Allow configuration of document and query prefixes for text embedding.

Signed-off-by: Yuichiro Utsumi <utsumi.yuichiro@fujitsu.com>
@dosubot dosubot bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) on Nov 17, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @utsumi-fj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the openai_api_compatible plugin by enabling the configuration of document and query prefixes for text embedding. This feature addresses the need for certain embedding models to receive contextual prefixes, which can lead to more accurate and relevant embeddings. The changes include updates to the plugin's configuration schema to expose these new settings in the UI and modifications to the core Python logic to apply these prefixes dynamically.

Highlights

  • Text Embedding Prefixes: Introduced the capability to configure document and query prefixes for text embedding models within the openai_api_compatible plugin, enhancing embedding accuracy for models that benefit from such prefixes.
  • Configuration UI: Added new UI fields (document_prefix and query_prefix) to the plugin's configuration schema, allowing users to easily set these prefixes.
  • Core Logic Implementation: Modified the text embedding logic to apply the specified document or query prefixes to input texts before they are sent to the embedding model.
  • Version Bump: The plugin version in manifest.yaml has been incremented from 0.0.24 to 0.0.25.
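
The prefix selection described in the highlights above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `InputType` enum and the `select_prefix` helper are hypothetical names, and only the `document_prefix` and `query_prefix` setting names come from this PR. The example prefix values are assumptions chosen for illustration.

```python
from enum import Enum


class InputType(Enum):
    """Hypothetical stand-in for the SDK's embedding input type."""

    DOCUMENT = "document"
    QUERY = "query"


def select_prefix(credentials: dict, input_type: InputType) -> str:
    """Return the configured prefix for the input type, or "" if unset."""
    if input_type is InputType.DOCUMENT:
        return credentials.get("document_prefix") or ""
    if input_type is InputType.QUERY:
        return credentials.get("query_prefix") or ""
    return ""


# Example settings (assumed values, not defaults shipped by the plugin).
credentials = {"document_prefix": "passage:", "query_prefix": "query:"}
print(select_prefix(credentials, InputType.QUERY))     # query:
print(select_prefix(credentials, InputType.DOCUMENT))  # passage:
print(repr(select_prefix({}, InputType.QUERY)))        # ''
```

Returning an empty string when nothing is configured keeps the settings optional, so existing configurations without prefixes would behave as before.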

@dosubot dosubot bot added the enhancement label (new feature or request) on Nov 17, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a useful feature to allow configuring document and query prefixes for text embeddings in the openai_api_compatible model plugin. The implementation is clean and follows existing patterns. I've found one high-severity issue where an empty prefix would incorrectly add a leading space to texts, which could impact embedding results. I've provided a suggestion to fix this. Otherwise, the changes look good.

        return ""

    def _add_prefix(self, texts: list[str], prefix: str) -> list[str]:
        return [f"{prefix} {text}" for text in texts]

Severity: high

The current implementation will add a leading space to each text if the prefix is empty (e.g., f' {text}'). This is likely not the intended behavior and could negatively affect embedding quality. This change ensures prefixes are only added when they are not empty, and avoids creating a new list unnecessarily when no prefix is provided.

Suggested change
-        return [f"{prefix} {text}" for text in texts]
+        return [f"{prefix} {text}" for text in texts] if prefix else texts
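
To make the difference concrete, here is a small standalone sketch comparing the two versions. The function bodies mirror the lines in the suggestion, but they are written as free functions with hypothetical names (`add_prefix_unguarded`, `add_prefix`) rather than as the plugin's method, and the `"query:"` prefix is only an example value.

```python
def add_prefix_unguarded(texts: list[str], prefix: str) -> list[str]:
    # Original behavior: always formats, even when the prefix is empty.
    return [f"{prefix} {text}" for text in texts]


def add_prefix(texts: list[str], prefix: str) -> list[str]:
    # Suggested behavior: return the input list unchanged for an empty prefix.
    return [f"{prefix} {text}" for text in texts] if prefix else texts


texts = ["hello world"]
print(add_prefix_unguarded(texts, ""))  # [' hello world']
print(add_prefix(texts, ""))            # ['hello world']
print(add_prefix(texts, "query:"))      # ['query: hello world']
```

Note that the guarded version returns the same list object when no prefix is set, so callers that mutate the result would also mutate the input.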

Contributor Author

Thanks, fixed!

@dosubot dosubot bot added the lgtm label (this PR has been approved by a maintainer) on Nov 18, 2025
@crazywoola crazywoola merged commit f0930df into langgenius:main Nov 18, 2025
1 check passed