Conversation


@Profile-exe commented Nov 15, 2025

Add HuggingFace Embedding Support with Enhanced Chat Implementation

Partially addresses #4182
Fixes #849

Summary

This PR adds comprehensive HuggingFace integration to Spring AI:

  1. New EmbeddingModel - Cloud-based embeddings via HuggingFace Inference API
  2. Enhanced ChatModel - Full alignment with official Chat Completion API, including function-calling support
  3. Modern Architecture - ConnectionDetails pattern, unified API client, complete observability
  4. Spring Boot 4.0 Compatibility - Updated for Spring Framework 7 RetryTemplate migration

Key Changes

Core Implementation

  • Unified HuggingfaceApi client supporting Chat and Embedding endpoints
  • HuggingfaceEmbeddingModel with Feature Extraction API integration
  • HuggingfaceChatModel refactored for OpenAI-compatible /v1/chat/completions endpoint
  • Function calling with tools/tool_choice support via ToolCallingManager
  • AOT/GraalVM native image support with reflection hints

Auto-Configuration

  • ConnectionDetails pattern (Spring Boot 3.1+) following Ollama architecture
  • Separate Chat and Embedding auto-configurations
  • Property-based configuration following Spring AI conventions

Spring Boot 4.0 / Spring Framework 7 Compatibility

  • RetryTemplate migration from spring-retry to spring-core
  • Updated to use RetryUtils.execute() pattern
  • RestClientAutoConfiguration package migration

API Specification Compliance

Chat: Aligned with Chat Completion API and Function Calling Guide

  • Supports: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, stop, seed, response_format, tools, tool_choice

Embedding: Aligned with Feature Extraction API

  • Supports: normalize, prompt_name, truncate, truncation_direction

Configuration Example

```yaml
spring.ai.huggingface:
  api-key: ${HUGGINGFACE_API_KEY}
  chat:
    url: https://router.huggingface.co/v1
    options.model: meta-llama/Llama-3.2-3B-Instruct
  embedding:
    url: https://router.huggingface.co/hf-inference/models
    options.model: sentence-transformers/all-MiniLM-L6-v2
```
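
For orientation, here is a minimal usage sketch of the auto-configured beans; the controller and endpoint names are illustrative and not part of this PR:

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class HuggingfaceDemoController {

    private final ChatModel chatModel;

    private final EmbeddingModel embeddingModel;

    HuggingfaceDemoController(ChatModel chatModel, EmbeddingModel embeddingModel) {
        this.chatModel = chatModel;
        this.embeddingModel = embeddingModel;
    }

    @GetMapping("/ai/chat")
    String chat(@RequestParam String message) {
        // Routed to the OpenAI-compatible /v1/chat/completions endpoint
        return this.chatModel.call(message);
    }

    @GetMapping("/ai/embed")
    int embed(@RequestParam String text) {
        // Routed to the Feature Extraction endpoint; returns the vector dimension
        return this.embeddingModel.embed(text).length;
    }

}
```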

Testing

```bash
# Unit tests
./mvnw clean test -pl models/spring-ai-huggingface,auto-configurations/models/spring-ai-autoconfigure-model-huggingface

# Integration tests (requires API key)
export HUGGINGFACE_API_KEY=your_key
./mvnw clean verify -Pintegration-tests -pl models/spring-ai-huggingface
```

Comprehensive test coverage includes: model functionality, function calling, auto-configuration, observability, error handling/retry, and AOT compatibility.

Documentation

Complete Antora documentation covering Chat, Embedding, and function calling with model compatibility guidance.

Update pom.xml files to include required dependencies for the new
HuggingFace Embedding implementation and enhanced Chat functionality.

Changes in models/spring-ai-huggingface/pom.xml:
- Add spring-ai-retry dependency for retry support
- Add spring-ai-client-chat test dependency for fluent API tests
- Add micrometer-observation-test for observation testing

Changes in auto-configurations/.../pom.xml:
- Add spring-ai-autoconfigure-model-tool for tool calling support
- Add spring-ai-autoconfigure-retry for retry auto-configuration

These dependencies enable:
- Retry logic for both Chat and Embedding models
- Tool/function calling capabilities for Chat
- Comprehensive observability testing
- Fluent ChatClient API integration

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Add HUGGINGFACE enum entry to AiProvider to support observability and
metrics collection for HuggingFace Chat and Embedding models. This follows
the established pattern used by other AI providers (OpenAI, Ollama,
Anthropic, etc.).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
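
For reference, the added entry presumably mirrors the existing AiProvider constants; the exact constant name and value below are an assumption based on that pattern:

```java
// Hypothetical shape of the AiProvider addition; constant name and value assumed.
public enum AiProvider {

    // ... existing providers such as OPENAI, OLLAMA, ANTHROPIC ...

    HUGGINGFACE("huggingface");

    private final String value;

    AiProvider(String value) {
        this.value = value;
    }

    public String value() {
        return this.value;
    }

}
```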
Implement HuggingfaceApi as a unified client supporting both Chat and Embedding:
- Chat completions via OpenAI-compatible endpoint (/v1/chat/completions)
- Embeddings via Feature Extraction endpoint (/models/{model}/pipeline/feature-extraction)

Key features:
- Builder pattern with validation
- Bearer token authentication
- RestClient-based implementation
- Response error handling
- Streaming support for Chat
- Support for both 1D and 2D embedding responses

This follows the pattern established by OllamaApi while supporting HuggingFace's
dual API structure (OpenAI-compatible for chat, native API for embeddings).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
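
A minimal construction sketch based on the description above; HuggingfaceApi and its builder are introduced by this PR, so the method names are assumed rather than a released API:

```java
// Illustrative fragment; builder method names may differ in the final code.
HuggingfaceApi chatApi = HuggingfaceApi.builder()
    .baseUrl("https://router.huggingface.co/v1")
    .apiKey(System.getenv("HUGGINGFACE_API_KEY")) // sent as a Bearer token
    .build();

HuggingfaceApi embeddingApi = HuggingfaceApi.builder()
    .baseUrl("https://router.huggingface.co/hf-inference/models")
    .apiKey(System.getenv("HUGGINGFACE_API_KEY"))
    .build();
```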
Implement HuggingfaceChatOptions and HuggingfaceEmbeddingOptions following
the pattern established by Ollama and OpenAI integrations.

Both classes include:
- Builder pattern for fluent configuration
- fromOptions() static factory method for option merging
- toMap() method for ModelOptionsUtils integration
- Proper equals(), hashCode(), copy() implementations
- Comprehensive unit tests

Chat options support: model, temperature, maxTokens, topP,
frequencyPenalty, presencePenalty, stop, toolChoice, tools.

Embedding options support: model (default: sentence-transformers/all-MiniLM-L6-v2).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
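
An illustrative fragment of the builder and copy style described above (method names assumed from this commit message):

```java
// Default options configured once, e.g. in auto-configuration.
HuggingfaceChatOptions defaults = HuggingfaceChatOptions.builder()
    .model("meta-llama/Llama-3.2-3B-Instruct")
    .temperature(0.7)
    .maxTokens(512)
    .build();

// fromOptions() produces a detached copy that per-request options can be merged into.
HuggingfaceChatOptions perRequest = HuggingfaceChatOptions.fromOptions(defaults);

HuggingfaceEmbeddingOptions embeddingOptions = HuggingfaceEmbeddingOptions.builder()
    .model("sentence-transformers/all-MiniLM-L6-v2")
    .build();
```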
Implement HuggingfaceEmbeddingModel extending AbstractEmbeddingModel to provide
text embedding functionality through the HuggingFace Inference API.

Features:
- Support for HuggingFace Feature Extraction API
- Full observability/metrics integration
- Retry support using configurable RetryTemplate
- Options merging (runtime + defaults)
- Proper metadata handling (model name, usage)
- Builder pattern for flexible configuration

The implementation handles both 1D and 2D embedding responses from the API
and converts them to the standard float[] format required by Spring AI's
EmbeddingModel interface.

Unit test coverage:
- Basic embedding generation
- Options merging
- Error handling
- Metadata extraction

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
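
A small self-contained sketch of the 1D/2D response normalization described above; the helper names are hypothetical, and the real logic lives inside HuggingfaceEmbeddingModel:

```java
import java.util.List;

// The Feature Extraction API may return a single vector (1D) for one input or a
// list of vectors (2D) for batched input; Spring AI's EmbeddingModel expects float[] per input.
final class EmbeddingResponseShapes {

    // 1D response: one vector of doubles -> float[]
    static float[] toFloatArray(List<Double> vector) {
        float[] result = new float[vector.size()];
        for (int i = 0; i < vector.size(); i++) {
            result[i] = vector.get(i).floatValue();
        }
        return result;
    }

    // 2D response: one vector per input text -> List<float[]>
    static List<float[]> toFloatArrays(List<List<Double>> vectors) {
        return vectors.stream().map(EmbeddingResponseShapes::toFloatArray).toList();
    }

}
```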
Update HuggingfaceChatModel to use the new unified HuggingfaceApi client
and HuggingfaceChatOptions class, replacing the previous implementation.

Changes:
- Use unified HuggingfaceApi for chat completions
- Migrate from ad-hoc configuration to HuggingfaceChatOptions
- Enhanced tool calling support using ToolCallingManager
- Improved observability integration
- Better error handling and retry logic
- Updated test configuration

This refactoring aligns the Chat implementation with the new Embedding
implementation and follows patterns established by OpenAI and Ollama integrations.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement HuggingfaceRuntimeHints to provide reflection hints for GraalVM
native image compilation. This enables HuggingFace Chat and Embedding models
to work in native executables.

Hints include:
- HuggingfaceApi request/response classes
- ChatModel and EmbeddingModel implementations
- All JSON-serializable POJOs

This follows the same pattern used by OpenAI and Ollama integrations to
ensure compatibility with Spring Native and GraalVM.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
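
A sketch of the RuntimeHintsRegistrar pattern referred to above; the registered type name below is a placeholder, and the actual class list lives in HuggingfaceRuntimeHints:

```java
import org.springframework.aot.hint.MemberCategory;
import org.springframework.aot.hint.RuntimeHints;
import org.springframework.aot.hint.RuntimeHintsRegistrar;
import org.springframework.aot.hint.TypeReference;

class HuggingfaceRuntimeHintsSketch implements RuntimeHintsRegistrar {

    @Override
    public void registerHints(RuntimeHints hints, ClassLoader classLoader) {
        // Allow Jackson to reflectively bind the JSON request/response classes
        // when running as a GraalVM native image.
        hints.reflection().registerType(
                TypeReference.of("org.springframework.ai.huggingface.api.HuggingfaceApi$ChatRequest"),
                MemberCategory.INVOKE_DECLARED_CONSTRUCTORS,
                MemberCategory.INVOKE_DECLARED_METHODS);
    }

}
```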
…pattern

Implement HuggingfaceApiAutoConfiguration as the foundation for HuggingFace
Chat and Embedding auto-configuration.

This follows Spring Boot 3.1+ ConnectionDetails pattern:
- HuggingfaceConnectionDetails: Interface for connection information
- HuggingfaceConnectionProperties: Property-based implementation (spring.ai.huggingface.api-key)
- HuggingfaceApiAutoConfiguration: Provides ConnectionDetails bean

The API key is shared between Chat and Embedding, while each model type creates
its own HuggingfaceApi instance with model-specific base URLs in their respective
auto-configurations.

This pattern enables flexible configuration and testing through custom
ConnectionDetails beans.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
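
A condensed sketch of the ConnectionDetails arrangement described above; names follow the commit message, and in the actual code the interface would likely extend Spring Boot's ConnectionDetails marker interface:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;

interface HuggingfaceConnectionDetails {

    String getApiKey();

}

@ConfigurationProperties(prefix = "spring.ai.huggingface")
class HuggingfaceConnectionProperties implements HuggingfaceConnectionDetails {

    /** Shared API key, bound from spring.ai.huggingface.api-key. */
    private String apiKey;

    @Override
    public String getApiKey() {
        return this.apiKey;
    }

    public void setApiKey(String apiKey) {
        this.apiKey = apiKey;
    }

}
```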
… pattern

Update HuggingfaceChatAutoConfiguration to use ConnectionDetails pattern and
create a Chat-specific HuggingfaceApi bean.

Key changes:
- Use HuggingfaceConnectionDetails for API key
- Create a @Qualifier("huggingfaceChatApi") bean for Chat
- Enhance HuggingfaceChatProperties to include URL configuration
- Support spring.ai.huggingface.chat.url
- Proper conditional enablement using @ConditionalOnProperty

Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      chat:
        enabled: true
        url: https://api-inference.huggingface.co/v1
        options:
          model: meta-llama/Llama-3.2-3B-Instruct
          temperature: 0.7
```

Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
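
A sketch of the chat-specific bean wiring described above; the real class is an @AutoConfiguration with conditional enablement, and HuggingfaceApi, HuggingfaceConnectionDetails and HuggingfaceChatProperties are types introduced by this PR, so the shapes below are assumptions:

```java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration(proxyBeanMethods = false)
class HuggingfaceChatWiringSketch {

    @Bean
    @Qualifier("huggingfaceChatApi")
    HuggingfaceApi huggingfaceChatApi(HuggingfaceConnectionDetails connectionDetails,
            HuggingfaceChatProperties chatProperties) {
        // Chat-specific base URL (spring.ai.huggingface.chat.url) combined with the shared API key
        return HuggingfaceApi.builder()
            .baseUrl(chatProperties.getUrl())
            .apiKey(connectionDetails.getApiKey())
            .build();
    }

}
```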
Implement HuggingfaceEmbeddingAutoConfiguration to provide Spring Boot
auto-configuration for HuggingFace Embedding models.

The auto-configuration:
- Creates a @Qualifier("huggingfaceEmbeddingApi") bean for Embedding
- Configures HuggingfaceEmbeddingModel with observability and retry support
- Uses ConnectionDetails pattern for API key
- Supports spring.ai.huggingface.embedding.* properties

Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      embedding:
        enabled: true
        url: https://api-inference.huggingface.co
        options:
          model: sentence-transformers/all-MiniLM-L6-v2
```

Update AutoConfiguration.imports to register:
- HuggingfaceApiAutoConfiguration
- HuggingfaceChatAutoConfiguration
- HuggingfaceEmbeddingAutoConfiguration

Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)
- Embedding generation and metadata

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…odels

Add extensive integration tests covering all aspects of HuggingFace Chat
and Embedding models.

Test coverage:

Chat model integration tests:
- Basic chat completion
- Streaming responses
- Multi-turn conversations
- Tool/function calling using MockWeatherService
- Error handling and edge cases

Embedding model integration tests:
- Single and batch embedding generation
- Various embedding models
- Metadata extraction
- Error scenarios

Observation/metrics tests:
- Chat model observability (request/response, tokens, latency)
- Embedding model observability (request/response, dimensions)
- Proper span creation and context propagation
- Metric tags and attributes

Retry tests:
- Transient error retry behavior
- Retry exhaustion
- Backoff strategies

BaseHuggingfaceIT class provides common configuration and utilities for all
integration tests, including API key validation and test environment setup.

Tool calling tests demonstrate HuggingFace's function calling capabilities
with a weather service example.

All tests are annotated with @EnabledIfEnvironmentVariable so they run only when
HUGGINGFACE_API_KEY is available; running them also requires the -Pintegration-tests Maven profile.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
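
A minimal sketch of that environment gating; the test body is illustrative only, and the real tests extend BaseHuggingfaceIT and call the live endpoints:

```java
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

@EnabledIfEnvironmentVariable(named = "HUGGINGFACE_API_KEY", matches = ".+")
class HuggingfaceGatingSketchIT {

    @Test
    void runsOnlyWhenApiKeyIsPresent() {
        // Executed only when HUGGINGFACE_API_KEY is set, and only under the
        // -Pintegration-tests Maven profile.
        Assertions.assertNotNull(System.getenv("HUGGINGFACE_API_KEY"));
    }

}
```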
Remove the outdated ClientIT.java test file which has been replaced by
the new comprehensive integration test suite:

- HuggingfaceChatModelIT.java
- HuggingfaceEmbeddingModelIT.java
- HuggingfaceChatModelObservationIT.java
- HuggingfaceEmbeddingModelObservationIT.java
- HuggingfaceApiToolFunctionCallIT.java

The new test suite provides more comprehensive coverage including:
- Both Chat and Embedding model testing
- Observability/metrics validation
- Tool calling functionality
- Proper test organization following Spring AI patterns

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ding

Add complete user-facing documentation following Spring AI documentation
standards and patterns established by OpenAI and Ollama integrations.

New documentation (huggingface-embeddings.adoc - 269 lines):
- Prerequisites and API key setup
- Auto-configuration with Spring Boot
- Three property tables (Retry/Connection/Configuration)
- Runtime options and examples
- Sample controller implementation
- Manual configuration
- Supported models list

Enhanced Chat documentation (huggingface.adoc - 191 → 420 lines):
- Expanded prerequisites section
- Comprehensive property tables (Retry/Connection/Configuration)
- Runtime options with examples
- Function calling documentation
- Low-level API usage
- Supported models list
- Observability section
- Added NOTE about streaming not currently supported
- Added NOTE clarifying default values are API defaults

Enhanced provider overview (index.adoc - 15 → 138 lines):
- Platform overview and features
- Spring AI support summary (Chat + Embedding)
- Popular models by category
- Deployment options (Inference Endpoints, Serverless, Dedicated)
- Getting started guide
- Additional resources

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Fixed ChatRequest constructor in low-level API example
  - Before: new ChatRequest(model, messages, 0.7, null, false) - compilation error
  - After: new ChatRequest(model, messages, options) with Map<String, Object>
- Corrected property default values in configuration table
  - temperature: 0.7 → - (API default, not hardcoded)
  - frequency-penalty: 0.0 → - (API default)
  - presence-penalty: 0.0 → - (API default)
- Removed obsolete property entry
  - Deleted: spring.ai.huggingface.chat.enabled (no longer valid)
- Clarified embedding endpoint path
  - Before: /models/{model}/pipeline/feature-extraction (incomplete)
  - After: /{model}/pipeline/feature-extraction (relative to base URL with clarification)
- Removed unimplemented feature claim
  - Removed "Streaming responses" from provider overview (not yet implemented)

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ation

Remove unsupported 'dimensions' option and add officially supported options
to accurately match the HuggingFace Feature Extraction API specification at
https://huggingface.co/docs/inference-providers/tasks/feature-extraction

Changes to HuggingfaceEmbeddingOptions:
- Remove: dimensions field (not supported by Feature Extraction API)
- Add: promptName field (prompt_name in JSON)
- Add: truncate field
- Add: truncationDirection field (truncation_direction in JSON)
- Update: All field documentation to remove incorrect "TEI server only" claims
- Fix: truncationDirection values to lowercase ("left"/"right" per API spec)
- Keep: getDimensions() returning null for EmbeddingOptions interface compliance

The 4 supported options per official API specification:
1. normalize (boolean) - Whether to normalize embedding vectors
2. prompt_name (string) - Predefined prompt from model configuration
3. truncate (boolean) - Whether to truncate text exceeding max length
4. truncation_direction (enum: "left"|"right") - Which side to truncate

Test updates:
- Remove all dimensions-related tests (11 test cases updated)
- Add tests for new prompt_name, truncate, truncation_direction options
- Fix HuggingfaceEmbeddingModelObservationIT baseUrl configuration (404 fix)
- Update auto-configuration tests to use supported options

Documentation updates:
- Remove all "(TEI server only)" references from configuration table
- Clarify that these options are part of standard Feature Extraction API
- Fix truncation_direction case in examples: "Right" → "right"
- Update section title: "TEI-Specific Options" → "Advanced Options"

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
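
A sketch of how the four parameters map to the API's snake_case JSON fields; the record below is a hypothetical stand-in, and the real fields live in HuggingfaceEmbeddingOptions:

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

@JsonInclude(JsonInclude.Include.NON_NULL)
record FeatureExtractionParameters(
        @JsonProperty("normalize") Boolean normalize,
        @JsonProperty("prompt_name") String promptName,
        @JsonProperty("truncate") Boolean truncate,
        @JsonProperty("truncation_direction") String truncationDirection // "left" or "right"
) {
}
```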
Add missing supported options and remove unsupported option to accurately
match the HuggingFace Chat Completion API specification at
https://huggingface.co/docs/inference-providers/tasks/chat-completion

This addresses the opposite problem from Embedding (commit 14):
- Embedding had unsupported 'dimensions' option (removed)
- Chat was missing 6 supported options (now added)

Changes to HuggingfaceChatOptions:
- Add: stop field (List<String>) - up to 4 stop sequences
- Add: seed field (Integer) - for reproducible outputs
- Add: responseFormat field (Map<String, Object>) - JSON mode support
- Add: toolPrompt field (String) - prompt appended before tools
- Add: logprobs field (Boolean) - return log probabilities
- Add: topLogprobs field (Integer) - number of top tokens (0-5)
- Remove: top_k field (not in official Chat Completion API spec)
- Fix: getStopSequences() implementation (was incorrectly a no-op)
- Keep: getTopK() returning null for ChatOptions interface compliance

The 6 added options per official API specification:
1. stop (array, max 4) - Sequences where API stops generating tokens
2. seed (integer) - For reproducibility with same seed/parameters
3. response_format (object) - Output format (e.g., {"type": "json_object"})
4. tool_prompt (string) - Prompt appended before tools in function calling
5. logprobs (boolean) - Whether to return log probabilities
6. top_logprobs (integer, 0-5) - Number of most likely tokens to return

Test updates:
- Remove all top_k-related tests and usages
- Add tests for all 6 new options
- Add integration tests: testStopSequences, testSeedForReproducibility,
  testResponseFormatJsonObject
- Update auto-configuration tests with new parameters
- Total: 21 unit tests + 16 request tests + 3 new IT tests

Documentation updates:
- Remove all "TGI" (Text Generation Inference) references
- Remove top-k configuration entry
- Add 6 new parameter entries to configuration table
- Add "Advanced Options" section with stop sequences and seed examples

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
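
An illustrative fragment using the newly added options; builder method names are assumed from this commit message:

```java
HuggingfaceChatOptions options = HuggingfaceChatOptions.builder()
    .model("meta-llama/Llama-3.2-3B-Instruct")
    .stop(java.util.List.of("\n\n", "END"))                   // up to 4 stop sequences
    .seed(42)                                                  // reproducible sampling
    .responseFormat(java.util.Map.of("type", "json_object"))  // JSON mode
    .build();
```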
Apply consistent code formatting following Spring Java Format conventions.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Remove unused swagger-codegen-maven-plugin and dependencies from pom.xml
- Delete legacy openapi.json file (TGI API spec, not used by current implementation)
- Fix incorrect comment in HuggingfaceEmbeddingModelIT.java about TEI-only parameters
- Improve truncationDirection parameter documentation (case-sensitive values)

The HuggingFace implementation uses official HuggingFace Inference API:
- Chat: OpenAI-compatible /v1/chat/completions endpoint
- Embedding: Feature Extraction pipeline endpoint

All removed components were legacy artifacts not used by the current implementation.
This commit ensures 100% alignment with official API specifications and removes
any misleading references to Text Generation Inference (TGI) or Text Embeddings
Inference (TEI) servers.

References:
- https://huggingface.co/docs/inference-providers/tasks/chat-completion
- https://huggingface.co/docs/inference-providers/tasks/feature-extraction

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement function calling (tool calling) support following OpenAI and Ollama patterns:

Core Implementation:
- Add tools and tool_choice support to HuggingfaceApi.ChatRequest
- Implement ToolCallingChatOptions handling in buildChatRequest()
- Add explicit merge for @JsonIgnore fields (toolCallbacks, toolNames, toolContext)
- Implement tool definitions resolution and conversion to API format
- Add complete message type handling (USER, ASSISTANT, SYSTEM, TOOL)

Testing:
- Add HuggingfaceChatModelFunctionCallingIT with 2 test cases
- Test basic function calling with automatic tool execution
- Test ToolContext propagation with BiFunction signature
- Use meta-llama/Llama-3.2-3B-Instruct:together model (provider suffix required)

Documentation:
- Enhance huggingface.adoc with comprehensive function calling section
- Add provider suffix notation requirements and model compatibility info
- Include configuration examples and HuggingFace guide references
- Document streaming function calling limitation

Technical Notes:
- ModelOptionsUtils.merge() ignores @JsonIgnore fields - requires explicit merge
- ToolCallingChatOptions.class must be used as source type in copyToTarget()
- Streaming function calling not yet supported (planned for WebClient integration)

References:
- https://huggingface.co/docs/inference-providers/guides/function-calling
- https://huggingface.co/collections/MarketAgents/function-calling-models-tool-use

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
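
A sketch of exercising the new tool-calling path through ChatClient; the tool class is illustrative and assumes Spring AI's @Tool annotation and ChatClient tools() support rather than anything specific to this PR:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;

class WeatherTools {

    @Tool(description = "Get the current temperature in Celsius for a city")
    double currentTemperature(String city) {
        return 21.0; // stubbed value for the sketch
    }

}

// Usage (chatModel is the auto-configured HuggingfaceChatModel; note the provider suffix):
// String answer = ChatClient.create(chatModel)
//     .prompt()
//     .user("What is the weather in Seoul right now?")
//     .tools(new WeatherTools())
//     .call()
//     .content();
```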
@Profile-exe force-pushed the GH-4182-huggingface-embedding-support branch from c858e49 to d0417dc on November 19, 2025 08:18
Update HuggingFace modules to align with Spring Boot 4.0.0-RC2 upgrade (spring-projects#4774)
after rebasing onto main branch:

RetryTemplate Migration (Spring Framework 7):
- Update imports from org.springframework.retry.support to org.springframework.core.retry
- Replace retryTemplate.execute(ctx -> ...) with RetryUtils.execute(retryTemplate, () -> ...)
- Rewrite HuggingfaceRetryTests to use new RetryListener interface with RetryPolicy/Retryable

RestClientAutoConfiguration Migration (Spring Boot 3.2+):
- Update imports from org.springframework.boot.autoconfigure.web.client to
  org.springframework.boot.restclient.autoconfigure
- Add spring-boot-starter-restclient dependency to autoconfigure pom.xml

Test Improvements:
- Fix beanOutputConverterRecords test with simpler prompt and lower temperature
- Ensure consistent JSON response generation

Modified Files:
- HuggingfaceChatModel.java, HuggingfaceEmbeddingModel.java (RetryUtils pattern)
- HuggingfaceRetryTests.java (complete rewrite for new API)
- HuggingfaceChatModelIT.java (test fix)
- All autoconfigure classes (RestClientAutoConfiguration import)
- autoconfigure pom.xml (new dependency)

This commit ensures compatibility with Spring Boot 4.0.0-RC2 and Spring Framework 7
following the main branch upgrade in commit d5e92be (spring-projects#4774).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
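
A fragment showing the migrated call shape exactly as described above; the RetryUtils.execute(...) and chatCompletion(...) signatures are taken from this PR rather than an established release API:

```java
// Before (Spring Retry):
//   return this.retryTemplate.execute(ctx -> this.huggingfaceApi.chatCompletion(request));
//
// After (Spring Framework 7 core retry, wrapped by the RetryUtils helper described above):
ChatResponse response = RetryUtils.execute(this.retryTemplate,
        () -> this.huggingfaceApi.chatCompletion(request));
```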
@Profile-exe (Author) commented:

Hi @ilayaperumalg

This PR is ready for review. It adds comprehensive HuggingFace integration including:

  • New EmbeddingModel with Feature Extraction API
  • Enhanced ChatModel with function calling support
  • ConnectionDetails pattern for auto-configuration
  • Spring Boot 4.0 / Spring Framework 7 compatibility

Would appreciate your feedback when available. Thank you!


Development

Successfully merging this pull request may close these issues.

Can't use Hugginface Inference API (serverless) due to hardcoded /generate path
