Gh-4182, GH-849: Add HuggingFace EmbeddingModel and ConnectionDetails Auto-Configuration #4887
Open
Profile-exe wants to merge 20 commits into spring-projects:main from Profile-exe:GH-4182-huggingface-embedding-support
+7,074
−1,135
Update pom.xml files to include required dependencies for the new HuggingFace Embedding implementation and enhanced Chat functionality.
Changes in models/spring-ai-huggingface/pom.xml:
- Add spring-ai-retry dependency for retry support
- Add spring-ai-client-chat test dependency for fluent API tests
- Add micrometer-observation-test for observation testing
Changes in auto-configurations/.../pom.xml:
- Add spring-ai-autoconfigure-model-tool for tool calling support
- Add spring-ai-autoconfigure-retry for retry auto-configuration
These dependencies enable:
- Retry logic for both Chat and Embedding models
- Tool/function calling capabilities for Chat
- Comprehensive observability testing
- Fluent ChatClient API integration
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Add HUGGINGFACE enum entry to AiProvider to support observability and metrics collection for HuggingFace Chat and Embedding models. This follows the established pattern used by other AI providers (OpenAI, Ollama, Anthropic, etc.). Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement HuggingfaceApi as a unified client supporting both Chat and Embedding:
- Chat completions via OpenAI-compatible endpoint (/v1/chat/completions)
- Embeddings via Feature Extraction endpoint (/models/{model}/pipeline/feature-extraction)
Key features:
- Builder pattern with validation
- Bearer token authentication
- RestClient-based implementation
- Response error handling
- Streaming support for Chat
- Support for both 1D and 2D embedding responses
This follows the pattern established by OllamaApi while supporting HuggingFace's
dual API structure (OpenAI-compatible for chat, native API for embeddings).
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
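As a rough illustration of the embedding side of the client described above, the following sketch builds (but does not send) a request with bearer-token authentication against the feature-extraction path quoted in the commit message. It uses the JDK's java.net.http types rather than RestClient so that it stays self-contained. The class name, the helper names, and the "inputs" request field are assumptions for illustration and are not taken from the PR; a later commit in this PR also describes the path as relative to the configured base URL.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not the PR's HuggingfaceApi.
final class FeatureExtractionRequestSketch {

    // Builds a POST request for the feature-extraction endpoint without sending it.
    static HttpRequest build(String baseUrl, String model, String apiKey, List<String> inputs) {
        // Model ids contain '/', which is kept as a path separator here.
        URI uri = URI.create(baseUrl + "/models/" + model + "/pipeline/feature-extraction");
        // Assumption: the request body carries the texts in an "inputs" array.
        String body = "{\"inputs\":[" + inputs.stream()
                .map(FeatureExtractionRequestSketch::quote)
                .collect(Collectors.joining(",")) + "]}";
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    // Minimal JSON string escaping (backslashes and quotes only);
    // a real client would delegate to a JSON library.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Building the request separately from sending it keeps the URL and header logic testable without network access.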
Implement HuggingfaceChatOptions and HuggingfaceEmbeddingOptions following the pattern established by Ollama and OpenAI integrations.
Both classes include:
- Builder pattern for fluent configuration
- fromOptions() static factory method for option merging
- toMap() method for ModelOptionsUtils integration
- Proper equals(), hashCode(), copy() implementations
- Comprehensive unit tests
Chat options support: model, temperature, maxTokens, topP, frequencyPenalty, presencePenalty, stop, toolChoice, tools.
Embedding options support: model (default: sentence-transformers/all-MiniLM-L6-v2).
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
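The option-merging idea mentioned above (runtime options layered over defaults) can be sketched as follows. This is a minimal stand-in under stated assumptions: the record, its three fields, and the merge method are hypothetical and much smaller than the PR's HuggingfaceChatOptions, which uses a builder and fromOptions().

```java
import java.util.Objects;

// Hypothetical sketch of "runtime value wins, otherwise keep the default".
final class OptionsMergeSketch {

    // Trimmed-down stand-in for a chat options class; null means "not set".
    record ChatOptions(String model, Double temperature, Integer maxTokens) {

        static ChatOptions merge(ChatOptions runtime, ChatOptions defaults) {
            Objects.requireNonNull(defaults, "defaults must not be null");
            if (runtime == null) {
                return defaults;
            }
            return new ChatOptions(
                    pick(runtime.model(), defaults.model()),
                    pick(runtime.temperature(), defaults.temperature()),
                    pick(runtime.maxTokens(), defaults.maxTokens()));
        }
    }

    // Returns the preferred value when it is set, else the fallback.
    private static <T> T pick(T preferred, T fallback) {
        return preferred != null ? preferred : fallback;
    }
}
```

Using nullable wrapper types rather than primitives is what lets an unset runtime field fall through to the default.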
Implement HuggingfaceEmbeddingModel extending AbstractEmbeddingModel to provide text embedding functionality through the HuggingFace Inference API.
Features:
- Support for HuggingFace Feature Extraction API
- Full observability/metrics integration
- Retry support using configurable RetryTemplate
- Options merging (runtime + defaults)
- Proper metadata handling (model name, usage)
- Builder pattern for flexible configuration
The implementation handles both 1D and 2D embedding responses from the API and converts them to the standard float[] format required by Spring AI's EmbeddingModel interface.
Unit test coverage:
- Basic embedding generation
- Options merging
- Error handling
- Metadata extraction
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
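One way the 1D/2D handling described above could look is sketched below. It assumes the JSON response has already been decoded into nested java.util.List values; the class and method names are hypothetical and the PR's actual conversion code may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of normalizing 1D and 2D embedding responses.
final class EmbeddingShapeSketch {

    // Accepts a decoded JSON array that is either [n1, n2, ...] (1D, one input)
    // or [[...], [...]] (2D, one row per input) and returns one float[] per input.
    static List<float[]> normalize(List<?> decoded) {
        List<float[]> result = new ArrayList<>();
        if (decoded.isEmpty()) {
            return result;
        }
        if (decoded.get(0) instanceof List<?>) {
            // 2D case: every element is itself a vector.
            for (Object row : decoded) {
                result.add(toFloatArray((List<?>) row));
            }
        } else {
            // 1D case: the whole array is a single vector.
            result.add(toFloatArray(decoded));
        }
        return result;
    }

    private static float[] toFloatArray(List<?> numbers) {
        float[] out = new float[numbers.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = ((Number) numbers.get(i)).floatValue();
        }
        return out;
    }
}
```

Inspecting only the first element keeps the check cheap, at the cost of assuming the response is not a mix of scalars and arrays.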
Update HuggingfaceChatModel to use the new unified HuggingfaceApi client and HuggingfaceChatOptions class, replacing the previous implementation.
Changes:
- Use unified HuggingfaceApi for chat completions
- Migrate from ad-hoc configuration to HuggingfaceChatOptions
- Enhanced tool calling support using ToolCallingManager
- Improved observability integration
- Better error handling and retry logic
- Updated test configuration
This refactoring aligns the Chat implementation with the new Embedding implementation and follows patterns established by OpenAI and Ollama integrations.
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement HuggingfaceRuntimeHints to provide reflection hints for GraalVM native image compilation. This enables HuggingFace Chat and Embedding models to work in native executables.
Hints include:
- HuggingfaceApi request/response classes
- ChatModel and EmbeddingModel implementations
- All JSON-serializable POJOs
This follows the same pattern used by OpenAI and Ollama integrations to ensure compatibility with Spring Native and GraalVM.
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…pattern
Implement HuggingfaceApiAutoConfiguration as the foundation for HuggingFace Chat and Embedding auto-configuration.
This follows Spring Boot 3.1+ ConnectionDetails pattern:
- HuggingfaceConnectionDetails: Interface for connection information
- HuggingfaceConnectionProperties: Property-based implementation (spring.ai.huggingface.api-key)
- HuggingfaceApiAutoConfiguration: Provides ConnectionDetails bean
The API key is shared between Chat and Embedding, while each model type creates its own HuggingfaceApi instance with model-specific base URLs in their respective auto-configurations.
This pattern enables flexible configuration and testing through custom ConnectionDetails beans.
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
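The shape of the ConnectionDetails pattern described above can be sketched in plain Java, without Spring on the classpath. The interface here is a stand-in that only borrows the name HuggingfaceConnectionDetails; the getApiKey() method, the record, and the helper are assumptions for illustration, and the real interface presumably extends Spring Boot's ConnectionDetails.

```java
// Hypothetical sketch of the ConnectionDetails idea, not the PR's classes.
final class ConnectionDetailsSketch {

    // Stand-in for the PR's connection details interface (method name assumed).
    interface HuggingfaceConnectionDetails {
        String getApiKey();
    }

    // Property-backed default, playing the role of the value bound from
    // spring.ai.huggingface.api-key.
    record PropertiesConnectionDetails(String apiKey) implements HuggingfaceConnectionDetails {
        @Override
        public String getApiKey() {
            return apiKey;
        }
    }

    // Consumers depend only on the interface, so a test or a service binding
    // can supply its own implementation instead of the property-backed one.
    static String authorizationHeader(HuggingfaceConnectionDetails details) {
        return "Bearer " + details.getApiKey();
    }
}
```

Because the consumer sees only the interface, a test can pass a lambda such as `() -> "test-key"` in place of the property-backed implementation.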
… pattern
Update HuggingfaceChatAutoConfiguration to use ConnectionDetails pattern and create a Chat-specific HuggingfaceApi bean.
Key changes:
- Use HuggingfaceConnectionDetails for API key
- Create @Qualifier("huggingfaceChatApi") bean for Chat
- Enhance HuggingfaceChatProperties to include URL configuration
- Support spring.ai.huggingface.chat.url
- Proper conditional enablement using @ConditionalOnProperty
Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      chat:
        enabled: true
        url: https://api-inference.huggingface.co/v1
        options:
          model: meta-llama/Llama-3.2-3B-Instruct
          temperature: 0.7
```
Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement HuggingfaceEmbeddingAutoConfiguration to provide Spring Boot auto-configuration for HuggingFace Embedding models.
The auto-configuration:
- Creates @Qualifier("huggingfaceEmbeddingApi") bean for Embedding
- Configures HuggingfaceEmbeddingModel with observability and retry support
- Uses ConnectionDetails pattern for API key
- Supports spring.ai.huggingface.embedding.* properties
Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      embedding:
        enabled: true
        url: https://api-inference.huggingface.co
        options:
          model: sentence-transformers/all-MiniLM-L6-v2
```
Update AutoConfiguration.imports to register:
- HuggingfaceApiAutoConfiguration
- HuggingfaceChatAutoConfiguration
- HuggingfaceEmbeddingAutoConfiguration
Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)
- Embedding generation and metadata
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…odels
Add extensive integration tests covering all aspects of HuggingFace Chat and Embedding models.
Test coverage:
Chat model integration tests:
- Basic chat completion
- Streaming responses
- Multi-turn conversations
- Tool/function calling using MockWeatherService
- Error handling and edge cases
Embedding model integration tests:
- Single and batch embedding generation
- Various embedding models
- Metadata extraction
- Error scenarios
Observation/metrics tests:
- Chat model observability (request/response, tokens, latency)
- Embedding model observability (request/response, dimensions)
- Proper span creation and context propagation
- Metric tags and attributes
Retry tests:
- Transient error retry behavior
- Retry exhaustion
- Backoff strategies
BaseHuggingfaceIT class provides common configuration and utilities for all integration tests, including API key validation and test environment setup.
Tool calling tests demonstrate HuggingFace's function calling capabilities with a weather service example.
All tests are marked with @EnabledIfEnvironmentVariable so that they only run when HUGGINGFACE_API_KEY is available. The -Pintegration-tests option is also included.
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Remove the outdated ClientIT.java test file which has been replaced by the new comprehensive integration test suite:
- HuggingfaceChatModelIT.java
- HuggingfaceEmbeddingModelIT.java
- HuggingfaceChatModelObservationIT.java
- HuggingfaceEmbeddingModelObservationIT.java
- HuggingfaceApiToolFunctionCallIT.java
The new test suite provides more comprehensive coverage including:
- Both Chat and Embedding model testing
- Observability/metrics validation
- Tool calling functionality
- Proper test organization following Spring AI patterns
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ding
Add complete user-facing documentation following Spring AI documentation standards and patterns established by OpenAI and Ollama integrations.
New documentation (huggingface-embeddings.adoc - 269 lines):
- Prerequisites and API key setup
- Auto-configuration with Spring Boot
- Three property tables (Retry/Connection/Configuration)
- Runtime options and examples
- Sample controller implementation
- Manual configuration
- Supported models list
Enhanced Chat documentation (huggingface.adoc - 191 → 420 lines):
- Expanded prerequisites section
- Comprehensive property tables (Retry/Connection/Configuration)
- Runtime options with examples
- Function calling documentation
- Low-level API usage
- Supported models list
- Observability section
- Added NOTE about streaming not currently supported
- Added NOTE clarifying default values are API defaults
Enhanced provider overview (index.adoc - 15 → 138 lines):
- Platform overview and features
- Spring AI support summary (Chat + Embedding)
- Popular models by category
- Deployment options (Inference Endpoints, Serverless, Dedicated)
- Getting started guide
- Additional resources
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Fixed ChatRequest constructor in low-level API example
  - Before: new ChatRequest(model, messages, 0.7, null, false) - compilation error
  - After: new ChatRequest(model, messages, options) with Map<String, Object>
- Corrected property default values in configuration table
  - temperature: 0.7 → - (API default, not hardcoded)
  - frequency-penalty: 0.0 → - (API default)
  - presence-penalty: 0.0 → - (API default)
- Removed obsolete property entry
  - Deleted: spring.ai.huggingface.chat.enabled (Removed and no longer valid)
- Clarified embedding endpoint path
  - Before: /models/{model}/pipeline/feature-extraction (incomplete)
  - After: /{model}/pipeline/feature-extraction (relative to base URL with clarification)
- Removed unimplemented feature claim
  - Removed "Streaming responses" from provider overview (not yet implemented)
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ation
Remove unsupported 'dimensions' option and add officially supported options to accurately match the HuggingFace Feature Extraction API specification at https://huggingface.co/docs/inference-providers/tasks/feature-extraction
Changes to HuggingfaceEmbeddingOptions:
- Remove: dimensions field (not supported by Feature Extraction API)
- Add: promptName field (prompt_name in JSON)
- Add: truncate field
- Add: truncationDirection field (truncation_direction in JSON)
- Update: All field documentation to remove incorrect "TEI server only" claims
- Fix: truncationDirection values to lowercase ("left"/"right" per API spec)
- Keep: getDimensions() returning null for EmbeddingOptions interface compliance
The 4 supported options per official API specification:
1. normalize (boolean) - Whether to normalize embedding vectors
2. prompt_name (string) - Predefined prompt from model configuration
3. truncate (boolean) - Whether to truncate text exceeding max length
4. truncation_direction (enum: "left"|"right") - Which side to truncate
Test updates:
- Remove all dimensions-related tests (11 test cases updated)
- Add tests for new prompt_name, truncate, truncation_direction options
- Fix HuggingfaceEmbeddingModelObservationIT baseUrl configuration (404 fix)
- Update auto-configuration tests to use supported options
Documentation updates:
- Remove all "(TEI server only)" references from configuration table
- Clarify that these options are part of standard Feature Extraction API
- Fix truncation_direction case in examples: "Right" → "right"
- Update section title: "TEI-Specific Options" → "Advanced Options"
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
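The four option names listed above map naturally onto a request-parameter map. The sketch below is an assumption-laden illustration, not the PR's toMap(): the class and method names are hypothetical, unset options are simply omitted, and the lowercase check mirrors the "left"/"right" values named in the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of turning embedding options into snake_case parameters.
final class EmbeddingOptionsSketch {

    // Null arguments mean "not set" and are left out of the resulting map.
    static Map<String, Object> toRequestParameters(Boolean normalize, String promptName,
            Boolean truncate, String truncationDirection) {
        // The commit message states the values are lowercase and case-sensitive.
        if (truncationDirection != null
                && !truncationDirection.equals("left")
                && !truncationDirection.equals("right")) {
            throw new IllegalArgumentException(
                    "truncation_direction must be \"left\" or \"right\": " + truncationDirection);
        }
        Map<String, Object> params = new LinkedHashMap<>();
        putIfPresent(params, "normalize", normalize);
        putIfPresent(params, "prompt_name", promptName);
        putIfPresent(params, "truncate", truncate);
        putIfPresent(params, "truncation_direction", truncationDirection);
        return params;
    }

    private static void putIfPresent(Map<String, Object> target, String key, Object value) {
        if (value != null) {
            target.put(key, value);
        }
    }
}
```

Omitting unset keys leaves the choice of defaults to the API, which matches the PR's stated goal of not hardcoding API defaults.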
Add missing supported options and remove unsupported option to accurately match the HuggingFace Chat Completion API specification at https://huggingface.co/docs/inference-providers/tasks/chat-completion
This addresses the opposite problem from Embedding (commit 14):
- Embedding had unsupported 'dimensions' option (removed)
- Chat was missing 6 supported options (now added)
Changes to HuggingfaceChatOptions:
- Add: stop field (List<String>) - up to 4 stop sequences
- Add: seed field (Integer) - for reproducible outputs
- Add: responseFormat field (Map<String, Object>) - JSON mode support
- Add: toolPrompt field (String) - prompt appended before tools
- Add: logprobs field (Boolean) - return log probabilities
- Add: topLogprobs field (Integer) - number of top tokens (0-5)
- Remove: top_k field (not in official Chat Completion API spec)
- Fix: getStopSequences() implementation (was incorrectly no-op)
- Keep: getTopK() returning null for ChatOptions interface compliance
The 6 added options per official API specification:
1. stop (array, max 4) - Sequences where API stops generating tokens
2. seed (integer) - For reproducibility with same seed/parameters
3. response_format (object) - Output format (e.g., {"type": "json_object"})
4. tool_prompt (string) - Prompt appended before tools in function calling
5. logprobs (boolean) - Whether to return log probabilities
6. top_logprobs (integer, 0-5) - Number of most likely tokens to return
Test updates:
- Remove all top_k-related tests and usages
- Add tests for all 6 new options
- Add integration tests: testStopSequences, testSeedForReproducibility, testResponseFormatJsonObject
- Update auto-configuration tests with new parameters
- Total: 21 unit tests + 16 request tests + 3 new IT tests
Documentation updates:
- Remove all "TGI" (Text Generation Inference) references
- Remove top-k configuration entry
- Add 6 new parameter entries to configuration table
- Add "Advanced Options" section with stop sequences and seed examples
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
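The limits quoted above (at most 4 stop sequences, top_logprobs between 0 and 5) could be checked client-side before a request is sent. Whether the PR performs such validation is not stated, so the sketch below is purely illustrative and its class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of enforcing the documented option limits locally.
final class ChatOptionLimitsSketch {

    // Null arguments mean "not set" and are accepted as-is.
    static void validate(List<String> stop, Integer topLogprobs) {
        if (stop != null && stop.size() > 4) {
            throw new IllegalArgumentException(
                    "stop accepts at most 4 sequences, got " + stop.size());
        }
        if (topLogprobs != null && (topLogprobs < 0 || topLogprobs > 5)) {
            throw new IllegalArgumentException(
                    "top_logprobs must be between 0 and 5, got " + topLogprobs);
        }
    }
}
```

Failing fast on the client gives a clearer error than a rejected HTTP request, though it also means the limits must be kept in sync with the API specification.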
Apply consistent code formatting following Spring Java Format conventions. Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Remove unused swagger-codegen-maven-plugin and dependencies from pom.xml
- Delete legacy openapi.json file (TGI API spec, not used by current implementation)
- Fix incorrect comment in HuggingfaceEmbeddingModelIT.java about TEI-only parameters
- Improve truncationDirection parameter documentation (case-sensitive values)
The HuggingFace implementation uses official HuggingFace Inference API:
- Chat: OpenAI-compatible /v1/chat/completions endpoint
- Embedding: Feature Extraction pipeline endpoint
All removed components were legacy artifacts not used by the current implementation. This commit ensures 100% alignment with official API specifications and removes any misleading references to Text Generation Inference (TGI) or Text Embeddings Inference (TEI) servers.
References:
- https://huggingface.co/docs/inference-providers/tasks/chat-completion
- https://huggingface.co/docs/inference-providers/tasks/feature-extraction
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement function calling (tool calling) support following OpenAI and Ollama patterns:
Core Implementation:
- Add tools and tool_choice support to HuggingfaceApi.ChatRequest
- Implement ToolCallingChatOptions handling in buildChatRequest()
- Add explicit merge for @JsonIgnore fields (toolCallbacks, toolNames, toolContext)
- Implement tool definitions resolution and conversion to API format
- Add complete message type handling (USER, ASSISTANT, SYSTEM, TOOL)
Testing:
- Add HuggingfaceChatModelFunctionCallingIT with 2 test cases
- Test basic function calling with automatic tool execution
- Test ToolContext propagation with BiFunction signature
- Use meta-llama/Llama-3.2-3B-Instruct:together model (provider suffix required)
Documentation:
- Enhance huggingface.adoc with comprehensive function calling section
- Add provider suffix notation requirements and model compatibility info
- Include configuration examples and HuggingFace guide references
- Document streaming function calling limitation
Technical Notes:
- ModelOptionsUtils.merge() ignores @JsonIgnore fields - requires explicit merge
- ToolCallingChatOptions.class must be used as source type in copyToTarget()
- Streaming function calling not yet supported (planned for WebClient integration)
References:
- https://huggingface.co/docs/inference-providers/guides/function-calling
- https://huggingface.co/collections/MarketAgents/function-calling-models-tool-use
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
c858e49 to d0417dc
Update HuggingFace modules to align with Spring Boot 4.0.0-RC2 upgrade (spring-projects#4774) after rebasing onto main branch:
RetryTemplate Migration (Spring Framework 7):
- Update imports from org.springframework.retry.support to org.springframework.core.retry
- Replace retryTemplate.execute(ctx -> ...) with RetryUtils.execute(retryTemplate, () -> ...)
- Rewrite HuggingfaceRetryTests to use new RetryListener interface with RetryPolicy/Retryable
RestClientAutoConfiguration Migration (Spring Boot 3.2+):
- Update imports from org.springframework.boot.autoconfigure.web.client to org.springframework.boot.restclient.autoconfigure
- Add spring-boot-starter-restclient dependency to autoconfigure pom.xml
Test Improvements:
- Fix beanOutputConverterRecords test with simpler prompt and lower temperature
- Ensure consistent JSON response generation
Modified Files:
- HuggingfaceChatModel.java, HuggingfaceEmbeddingModel.java (RetryUtils pattern)
- HuggingfaceRetryTests.java (complete rewrite for new API)
- HuggingfaceChatModelIT.java (test fix)
- All autoconfigure classes (RestClientAutoConfiguration import)
- autoconfigure pom.xml (new dependency)
This commit ensures compatibility with Spring Boot 4.0.0-RC2 and Spring Framework 7 following the main branch upgrade in commit d5e92be (spring-projects#4774).
Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Author
This PR is ready for review. It adds comprehensive HuggingFace integration including:
Would appreciate your feedback when available. Thank you!
Add HuggingFace Embedding Support with Enhanced Chat Implementation
Partially addresses #4182
Fixes #849
Summary
This PR adds comprehensive HuggingFace integration to Spring AI:
Key Changes
Core Implementation
- HuggingfaceApi client supporting Chat and Embedding endpoints
- HuggingfaceEmbeddingModel with Feature Extraction API integration
- HuggingfaceChatModel refactored for OpenAI-compatible /v1/chat/completions endpoint
Auto-Configuration
Spring Boot 4.0 / Spring Framework 7 Compatibility
- spring-retry to spring-core
- RetryUtils.execute() pattern
API Specification Compliance
Chat: Aligned with Chat Completion API and Function Calling Guide
Embedding: Aligned with Feature Extraction API
Configuration Example
Testing
Comprehensive test coverage includes: model functionality, function calling, auto-configuration, observability, error handling/retry, and AOT compatibility.
Documentation
Complete Antora documentation covering Chat, Embedding, and function calling with model compatibility guidance.