Conversation


@Profile-exe commented Nov 15, 2025

Add HuggingFace Embedding Support with Enhanced Chat Implementation

Partially addresses #4182
Fixes #849

Summary

This PR adds comprehensive HuggingFace integration to Spring AI:

  1. New EmbeddingModel - Cloud-based embeddings via HuggingFace Inference API
  2. Enhanced ChatModel - Full alignment with official Chat Completion API, including function-calling support
  3. Modern Architecture - ConnectionDetails pattern, unified API client, complete observability
  4. Spring Boot 4.0 Compatibility - Updated for Spring Framework 7 RetryTemplate migration

Key Changes

Core Implementation

  • Unified HuggingfaceApi client supporting Chat and Embedding endpoints
  • HuggingfaceEmbeddingModel with Feature Extraction API integration
  • HuggingfaceChatModel refactored for OpenAI-compatible /v1/chat/completions endpoint
  • Function calling with tools/tool_choice support via ToolCallingManager
  • AOT/GraalVM native image support with reflection hints

Auto-Configuration

  • ConnectionDetails pattern (Spring Boot 3.1+) following Ollama architecture
  • Separate Chat and Embedding auto-configurations
  • Property-based configuration following Spring AI conventions

Spring Boot 4.0 / Spring Framework 7 Compatibility

  • RetryTemplate migration from spring-retry to spring-core
  • Updated to use RetryUtils.execute() pattern
  • RestClientAutoConfiguration package migration

API Specification Compliance

Chat: Aligned with Chat Completion API and Function Calling Guide

  • Supports: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, stop, seed, response_format, tools, tool_choice

Embedding: Aligned with Feature Extraction API

  • Supports: normalize, prompt_name, truncate, truncation_direction

Configuration Example

```yaml
spring.ai.huggingface:
  api-key: ${HUGGINGFACE_API_KEY}
  chat:
    url: https://router.huggingface.co/v1
    options.model: meta-llama/Llama-3.2-3B-Instruct
  embedding:
    url: https://router.huggingface.co/hf-inference/models
    options.model: sentence-transformers/all-MiniLM-L6-v2
```
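
For orientation, here is a minimal usage sketch of the auto-configured beans; the controller and endpoint names are illustrative and not part of this PR:

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class HuggingfaceDemoController {

    private final ChatModel chatModel;

    private final EmbeddingModel embeddingModel;

    HuggingfaceDemoController(ChatModel chatModel, EmbeddingModel embeddingModel) {
        this.chatModel = chatModel;
        this.embeddingModel = embeddingModel;
    }

    @GetMapping("/ai/chat")
    String chat(@RequestParam String message) {
        // Routed to the OpenAI-compatible /v1/chat/completions endpoint
        return this.chatModel.call(message);
    }

    @GetMapping("/ai/embed")
    int embed(@RequestParam String text) {
        // Routed to the Feature Extraction endpoint; returns the vector dimension
        return this.embeddingModel.embed(text).length;
    }

}
```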

Testing

```bash
# Unit tests
./mvnw clean test -pl models/spring-ai-huggingface,auto-configurations/models/spring-ai-autoconfigure-model-huggingface

# Integration tests (requires API key)
export HUGGINGFACE_API_KEY=your_key
./mvnw clean verify -Pintegration-tests -pl models/spring-ai-huggingface
```

Comprehensive test coverage includes: model functionality, function calling, auto-configuration, observability, error handling/retry, and AOT compatibility.

Documentation

Complete Antora documentation covering Chat, Embedding, and function calling with model compatibility guidance.

Update pom.xml files to include required dependencies for the new
HuggingFace Embedding implementation and enhanced Chat functionality.

Changes in models/spring-ai-huggingface/pom.xml:
- Add spring-ai-retry dependency for retry support
- Add spring-ai-client-chat test dependency for fluent API tests
- Add micrometer-observation-test for observation testing

Changes in auto-configurations/.../pom.xml:
- Add spring-ai-autoconfigure-model-tool for tool calling support
- Add spring-ai-autoconfigure-retry for retry auto-configuration

These dependencies enable:
- Retry logic for both Chat and Embedding models
- Tool/function calling capabilities for Chat
- Comprehensive observability testing
- Fluent ChatClient API integration

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Add HUGGINGFACE enum entry to AiProvider to support observability and
metrics collection for HuggingFace Chat and Embedding models. This follows
the established pattern used by other AI providers (OpenAI, Ollama,
Anthropic, etc.).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
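
For reference, the added entry presumably mirrors the existing AiProvider constants; the exact constant name and value below are an assumption based on that pattern:

```java
// Hypothetical shape of the AiProvider addition; constant name and value assumed.
public enum AiProvider {

    // ... existing providers such as OPENAI, OLLAMA, ANTHROPIC ...

    HUGGINGFACE("huggingface");

    private final String value;

    AiProvider(String value) {
        this.value = value;
    }

    public String value() {
        return this.value;
    }

}
```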
Implement HuggingfaceApi as a unified client supporting both Chat and Embedding:
- Chat completions via OpenAI-compatible endpoint (/v1/chat/completions)
- Embeddings via Feature Extraction endpoint (/models/{model}/pipeline/feature-extraction)

Key features:
- Builder pattern with validation
- Bearer token authentication
- RestClient-based implementation
- Response error handling
- Streaming support for Chat
- Support for both 1D and 2D embedding responses

This follows the pattern established by OllamaApi while supporting HuggingFace's
dual API structure (OpenAI-compatible for chat, native API for embeddings).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
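
A minimal construction sketch based on the description above; HuggingfaceApi and its builder are introduced by this PR, so the method names are assumed rather than a released API:

```java
// Illustrative fragment; builder method names may differ in the final code.
HuggingfaceApi chatApi = HuggingfaceApi.builder()
    .baseUrl("https://router.huggingface.co/v1")
    .apiKey(System.getenv("HUGGINGFACE_API_KEY")) // sent as a Bearer token
    .build();

HuggingfaceApi embeddingApi = HuggingfaceApi.builder()
    .baseUrl("https://router.huggingface.co/hf-inference/models")
    .apiKey(System.getenv("HUGGINGFACE_API_KEY"))
    .build();
```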
Implement HuggingfaceChatOptions and HuggingfaceEmbeddingOptions following
the pattern established by Ollama and OpenAI integrations.

Both classes include:
- Builder pattern for fluent configuration
- fromOptions() static factory method for option merging
- toMap() method for ModelOptionsUtils integration
- Proper equals(), hashCode(), copy() implementations
- Comprehensive unit tests

Chat options support: model, temperature, maxTokens, topP,
frequencyPenalty, presencePenalty, stop, toolChoice, tools.

Embedding options support: model (default: sentence-transformers/all-MiniLM-L6-v2).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
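
An illustrative fragment of the builder and copy style described above (method names assumed from this commit message):

```java
// Default options configured once, e.g. in auto-configuration.
HuggingfaceChatOptions defaults = HuggingfaceChatOptions.builder()
    .model("meta-llama/Llama-3.2-3B-Instruct")
    .temperature(0.7)
    .maxTokens(512)
    .build();

// fromOptions() produces a detached copy that per-request options can be merged into.
HuggingfaceChatOptions perRequest = HuggingfaceChatOptions.fromOptions(defaults);

HuggingfaceEmbeddingOptions embeddingOptions = HuggingfaceEmbeddingOptions.builder()
    .model("sentence-transformers/all-MiniLM-L6-v2")
    .build();
```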
Implement HuggingfaceEmbeddingModel extending AbstractEmbeddingModel to provide
text embedding functionality through the HuggingFace Inference API.

Features:
- Support for HuggingFace Feature Extraction API
- Full observability/metrics integration
- Retry support using configurable RetryTemplate
- Options merging (runtime + defaults)
- Proper metadata handling (model name, usage)
- Builder pattern for flexible configuration

The implementation handles both 1D and 2D embedding responses from the API
and converts them to the standard float[] format required by Spring AI's
EmbeddingModel interface.

Unit test coverage:
- Basic embedding generation
- Options merging
- Error handling
- Metadata extraction

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
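
A small self-contained sketch of the 1D/2D response normalization described above; the helper names are hypothetical, and the real logic lives inside HuggingfaceEmbeddingModel:

```java
import java.util.List;

// The Feature Extraction API may return a single vector (1D) for one input or a
// list of vectors (2D) for batched input; Spring AI's EmbeddingModel expects float[] per input.
final class EmbeddingResponseShapes {

    // 1D response: one vector of doubles -> float[]
    static float[] toFloatArray(List<Double> vector) {
        float[] result = new float[vector.size()];
        for (int i = 0; i < vector.size(); i++) {
            result[i] = vector.get(i).floatValue();
        }
        return result;
    }

    // 2D response: one vector per input text -> List<float[]>
    static List<float[]> toFloatArrays(List<List<Double>> vectors) {
        return vectors.stream().map(EmbeddingResponseShapes::toFloatArray).toList();
    }

}
```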
Update HuggingfaceChatModel to use the new unified HuggingfaceApi client
and HuggingfaceChatOptions class, replacing the previous implementation.

Changes:
- Use unified HuggingfaceApi for chat completions
- Migrate from ad-hoc configuration to HuggingfaceChatOptions
- Enhanced tool calling support using ToolCallingManager
- Improved observability integration
- Better error handling and retry logic
- Updated test configuration

This refactoring aligns the Chat implementation with the new Embedding
implementation and follows patterns established by OpenAI and Ollama integrations.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement HuggingfaceRuntimeHints to provide reflection hints for GraalVM
native image compilation. This enables HuggingFace Chat and Embedding models
to work in native executables.

Hints include:
- HuggingfaceApi request/response classes
- ChatModel and EmbeddingModel implementations
- All JSON-serializable POJOs

This follows the same pattern used by OpenAI and Ollama integrations to
ensure compatibility with Spring Native and GraalVM.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
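
A sketch of the RuntimeHintsRegistrar pattern referred to above; the registered type name below is a placeholder, and the actual class list lives in HuggingfaceRuntimeHints:

```java
import org.springframework.aot.hint.MemberCategory;
import org.springframework.aot.hint.RuntimeHints;
import org.springframework.aot.hint.RuntimeHintsRegistrar;
import org.springframework.aot.hint.TypeReference;

class HuggingfaceRuntimeHintsSketch implements RuntimeHintsRegistrar {

    @Override
    public void registerHints(RuntimeHints hints, ClassLoader classLoader) {
        // Allow Jackson to reflectively bind the JSON request/response classes
        // when running as a GraalVM native image.
        hints.reflection().registerType(
                TypeReference.of("org.springframework.ai.huggingface.api.HuggingfaceApi$ChatRequest"),
                MemberCategory.INVOKE_DECLARED_CONSTRUCTORS,
                MemberCategory.INVOKE_DECLARED_METHODS);
    }

}
```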
…pattern

Implement HuggingfaceApiAutoConfiguration as the foundation for HuggingFace
Chat and Embedding auto-configuration.

This follows Spring Boot 3.1+ ConnectionDetails pattern:
- HuggingfaceConnectionDetails: Interface for connection information
- HuggingfaceConnectionProperties: Property-based implementation (spring.ai.huggingface.api-key)
- HuggingfaceApiAutoConfiguration: Provides ConnectionDetails bean

The API key is shared between Chat and Embedding, while each model type creates
its own HuggingfaceApi instance with model-specific base URLs in their respective
auto-configurations.

This pattern enables flexible configuration and testing through custom
ConnectionDetails beans.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
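
A condensed sketch of the ConnectionDetails arrangement described above; names follow the commit message, and in the actual code the interface would likely extend Spring Boot's ConnectionDetails marker interface:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;

interface HuggingfaceConnectionDetails {

    String getApiKey();

}

@ConfigurationProperties(prefix = "spring.ai.huggingface")
class HuggingfaceConnectionProperties implements HuggingfaceConnectionDetails {

    /** Shared API key, bound from spring.ai.huggingface.api-key. */
    private String apiKey;

    @Override
    public String getApiKey() {
        return this.apiKey;
    }

    public void setApiKey(String apiKey) {
        this.apiKey = apiKey;
    }

}
```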
… pattern

Update HuggingfaceChatAutoConfiguration to use ConnectionDetails pattern and
create a Chat-specific HuggingfaceApi bean.

Key changes:
- Use HuggingfaceConnectionDetails for API key
- Create a @Qualifier("huggingfaceChatApi") bean for Chat
- Enhance HuggingfaceChatProperties to include URL configuration
- Support spring.ai.huggingface.chat.url
- Proper conditional enablement using @ConditionalOnProperty

Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      chat:
        enabled: true
        url: https://api-inference.huggingface.co/v1
        options:
          model: meta-llama/Llama-3.2-3B-Instruct
          temperature: 0.7
```

Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
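
A sketch of the chat-specific bean wiring described above; the real class is an @AutoConfiguration with conditional enablement, and HuggingfaceApi, HuggingfaceConnectionDetails and HuggingfaceChatProperties are types introduced by this PR, so the shapes below are assumptions:

```java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration(proxyBeanMethods = false)
class HuggingfaceChatWiringSketch {

    @Bean
    @Qualifier("huggingfaceChatApi")
    HuggingfaceApi huggingfaceChatApi(HuggingfaceConnectionDetails connectionDetails,
            HuggingfaceChatProperties chatProperties) {
        // Chat-specific base URL (spring.ai.huggingface.chat.url) combined with the shared API key
        return HuggingfaceApi.builder()
            .baseUrl(chatProperties.getUrl())
            .apiKey(connectionDetails.getApiKey())
            .build();
    }

}
```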
Implement HuggingfaceEmbeddingAutoConfiguration to provide Spring Boot
auto-configuration for HuggingFace Embedding models.

The auto-configuration:
- Creates a @Qualifier("huggingfaceEmbeddingApi") bean for Embedding
- Configures HuggingfaceEmbeddingModel with observability and retry support
- Uses ConnectionDetails pattern for API key
- Supports spring.ai.huggingface.embedding.* properties

Configuration example:
```yaml
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      embedding:
        enabled: true
        url: https://api-inference.huggingface.co
        options:
          model: sentence-transformers/all-MiniLM-L6-v2
```

Update AutoConfiguration.imports to register:
- HuggingfaceApiAutoConfiguration
- HuggingfaceChatAutoConfiguration
- HuggingfaceEmbeddingAutoConfiguration

Test coverage:
- Auto-configuration enablement
- Bean creation and wiring
- Property binding
- Integration with actual API (IT)
- Embedding generation and metadata

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…odels

Add extensive integration tests covering all aspects of HuggingFace Chat
and Embedding models.

Test coverage:

Chat model integration tests:
- Basic chat completion
- Streaming responses
- Multi-turn conversations
- Tool/function calling using MockWeatherService
- Error handling and edge cases

Embedding model integration tests:
- Single and batch embedding generation
- Various embedding models
- Metadata extraction
- Error scenarios

Observation/metrics tests:
- Chat model observability (request/response, tokens, latency)
- Embedding model observability (request/response, dimensions)
- Proper span creation and context propagation
- Metric tags and attributes

Retry tests:
- Transient error retry behavior
- Retry exhaustion
- Backoff strategies

BaseHuggingfaceIT class provides common configuration and utilities for all
integration tests, including API key validation and test environment setup.

Tool calling tests demonstrate HuggingFace's function calling capabilities
with a weather service example.

All tests are annotated with @EnabledIfEnvironmentVariable so they run only when
HUGGINGFACE_API_KEY is available; running them also requires the -Pintegration-tests Maven profile.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
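
A minimal sketch of that environment gating; the test body is illustrative only, and the real tests extend BaseHuggingfaceIT and call the live endpoints:

```java
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

@EnabledIfEnvironmentVariable(named = "HUGGINGFACE_API_KEY", matches = ".+")
class HuggingfaceGatingSketchIT {

    @Test
    void runsOnlyWhenApiKeyIsPresent() {
        // Executed only when HUGGINGFACE_API_KEY is set, and only under the
        // -Pintegration-tests Maven profile.
        Assertions.assertNotNull(System.getenv("HUGGINGFACE_API_KEY"));
    }

}
```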
Remove the outdated ClientIT.java test file which has been replaced by
the new comprehensive integration test suite:

- HuggingfaceChatModelIT.java
- HuggingfaceEmbeddingModelIT.java
- HuggingfaceChatModelObservationIT.java
- HuggingfaceEmbeddingModelObservationIT.java
- HuggingfaceApiToolFunctionCallIT.java

The new test suite provides more comprehensive coverage including:
- Both Chat and Embedding model testing
- Observability/metrics validation
- Tool calling functionality
- Proper test organization following Spring AI patterns

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ding

Add complete user-facing documentation following Spring AI documentation
standards and patterns established by OpenAI and Ollama integrations.

New documentation (huggingface-embeddings.adoc - 269 lines):
- Prerequisites and API key setup
- Auto-configuration with Spring Boot
- Three property tables (Retry/Connection/Configuration)
- Runtime options and examples
- Sample controller implementation
- Manual configuration
- Supported models list

Enhanced Chat documentation (huggingface.adoc - 191 → 420 lines):
- Expanded prerequisites section
- Comprehensive property tables (Retry/Connection/Configuration)
- Runtime options with examples
- Function calling documentation
- Low-level API usage
- Supported models list
- Observability section
- Added NOTE about streaming not currently supported
- Added NOTE clarifying default values are API defaults

Enhanced provider overview (index.adoc - 15 → 138 lines):
- Platform overview and features
- Spring AI support summary (Chat + Embedding)
- Popular models by category
- Deployment options (Inference Endpoints, Serverless, Dedicated)
- Getting started guide
- Additional resources

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Fixed ChatRequest constructor in low-level API example
  - Before: new ChatRequest(model, messages, 0.7, null, false) - compilation error
  - After: new ChatRequest(model, messages, options) with Map<String, Object>
- Corrected property default values in configuration table
  - temperature: 0.7 → - (API default, not hardcoded)
  - frequency-penalty: 0.0 → - (API default)
  - presence-penalty: 0.0 → - (API default)
- Removed obsolete property entry
  - Deleted: spring.ai.huggingface.chat.enabled (no longer valid)
- Clarified embedding endpoint path
  - Before: /models/{model}/pipeline/feature-extraction (incomplete)
  - After: /{model}/pipeline/feature-extraction (relative to base URL with clarification)
- Removed unimplemented feature claim
  - Removed "Streaming responses" from provider overview (not yet implemented)

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
…ation

Remove unsupported 'dimensions' option and add officially supported options
to accurately match the HuggingFace Feature Extraction API specification at
https://huggingface.co/docs/inference-providers/tasks/feature-extraction

Changes to HuggingfaceEmbeddingOptions:
- Remove: dimensions field (not supported by Feature Extraction API)
- Add: promptName field (prompt_name in JSON)
- Add: truncate field
- Add: truncationDirection field (truncation_direction in JSON)
- Update: All field documentation to remove incorrect "TEI server only" claims
- Fix: truncationDirection values to lowercase ("left"/"right" per API spec)
- Keep: getDimensions() returning null for EmbeddingOptions interface compliance

The 4 supported options per official API specification:
1. normalize (boolean) - Whether to normalize embedding vectors
2. prompt_name (string) - Predefined prompt from model configuration
3. truncate (boolean) - Whether to truncate text exceeding max length
4. truncation_direction (enum: "left"|"right") - Which side to truncate

Test updates:
- Remove all dimensions-related tests (11 test cases updated)
- Add tests for new prompt_name, truncate, truncation_direction options
- Fix HuggingfaceEmbeddingModelObservationIT baseUrl configuration (404 fix)
- Update auto-configuration tests to use supported options

Documentation updates:
- Remove all "(TEI server only)" references from configuration table
- Clarify that these options are part of standard Feature Extraction API
- Fix truncation_direction case in examples: "Right" → "right"
- Update section title: "TEI-Specific Options" → "Advanced Options"

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
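
A sketch of how the four parameters map to the API's snake_case JSON fields; the record below is a hypothetical stand-in, and the real fields live in HuggingfaceEmbeddingOptions:

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

@JsonInclude(JsonInclude.Include.NON_NULL)
record FeatureExtractionParameters(
        @JsonProperty("normalize") Boolean normalize,
        @JsonProperty("prompt_name") String promptName,
        @JsonProperty("truncate") Boolean truncate,
        @JsonProperty("truncation_direction") String truncationDirection // "left" or "right"
) {
}
```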
Add missing supported options and remove unsupported option to accurately
match the HuggingFace Chat Completion API specification at
https://huggingface.co/docs/inference-providers/tasks/chat-completion

This addresses the opposite problem from Embedding (commit 14):
- Embedding had unsupported 'dimensions' option (removed)
- Chat was missing 6 supported options (now added)

Changes to HuggingfaceChatOptions:
- Add: stop field (List<String>) - up to 4 stop sequences
- Add: seed field (Integer) - for reproducible outputs
- Add: responseFormat field (Map<String, Object>) - JSON mode support
- Add: toolPrompt field (String) - prompt appended before tools
- Add: logprobs field (Boolean) - return log probabilities
- Add: topLogprobs field (Integer) - number of top tokens (0-5)
- Remove: top_k field (not in official Chat Completion API spec)
- Fix: getStopSequences() implementation (was incorrectly a no-op)
- Keep: getTopK() returning null for ChatOptions interface compliance

The 6 added options per official API specification:
1. stop (array, max 4) - Sequences where API stops generating tokens
2. seed (integer) - For reproducibility with same seed/parameters
3. response_format (object) - Output format (e.g., {"type": "json_object"})
4. tool_prompt (string) - Prompt appended before tools in function calling
5. logprobs (boolean) - Whether to return log probabilities
6. top_logprobs (integer, 0-5) - Number of most likely tokens to return

Test updates:
- Remove all top_k-related tests and usages
- Add tests for all 6 new options
- Add integration tests: testStopSequences, testSeedForReproducibility,
  testResponseFormatJsonObject
- Update auto-configuration tests with new parameters
- Total: 21 unit tests + 16 request tests + 3 new IT tests

Documentation updates:
- Remove all "TGI" (Text Generation Inference) references
- Remove top-k configuration entry
- Add 6 new parameter entries to configuration table
- Add "Advanced Options" section with stop sequences and seed examples

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
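
An illustrative fragment using the newly added options; builder method names are assumed from this commit message:

```java
HuggingfaceChatOptions options = HuggingfaceChatOptions.builder()
    .model("meta-llama/Llama-3.2-3B-Instruct")
    .stop(java.util.List.of("\n\n", "END"))                   // up to 4 stop sequences
    .seed(42)                                                  // reproducible sampling
    .responseFormat(java.util.Map.of("type", "json_object"))  // JSON mode
    .build();
```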
Apply consistent code formatting following Spring Java Format conventions.

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
- Remove unused swagger-codegen-maven-plugin and dependencies from pom.xml
- Delete legacy openapi.json file (TGI API spec, not used by current implementation)
- Fix incorrect comment in HuggingfaceEmbeddingModelIT.java about TEI-only parameters
- Improve truncationDirection parameter documentation (case-sensitive values)

The HuggingFace implementation uses official HuggingFace Inference API:
- Chat: OpenAI-compatible /v1/chat/completions endpoint
- Embedding: Feature Extraction pipeline endpoint

All removed components were legacy artifacts not used by the current implementation.
This commit ensures 100% alignment with official API specifications and removes
any misleading references to Text Generation Inference (TGI) or Text Embeddings
Inference (TEI) servers.

References:
- https://huggingface.co/docs/inference-providers/tasks/chat-completion
- https://huggingface.co/docs/inference-providers/tasks/feature-extraction

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
Implement function calling (tool calling) support following OpenAI and Ollama patterns:

Core Implementation:
- Add tools and tool_choice support to HuggingfaceApi.ChatRequest
- Implement ToolCallingChatOptions handling in buildChatRequest()
- Add explicit merge for @JsonIgnore fields (toolCallbacks, toolNames, toolContext)
- Implement tool definitions resolution and conversion to API format
- Add complete message type handling (USER, ASSISTANT, SYSTEM, TOOL)

Testing:
- Add HuggingfaceChatModelFunctionCallingIT with 2 test cases
- Test basic function calling with automatic tool execution
- Test ToolContext propagation with BiFunction signature
- Use meta-llama/Llama-3.2-3B-Instruct:together model (provider suffix required)

Documentation:
- Enhance huggingface.adoc with comprehensive function calling section
- Add provider suffix notation requirements and model compatibility info
- Include configuration examples and HuggingFace guide references
- Document streaming function calling limitation

Technical Notes:
- ModelOptionsUtils.merge() ignores @JsonIgnore fields - requires explicit merge
- ToolCallingChatOptions.class must be used as source type in copyToTarget()
- Streaming function calling not yet supported (planned for WebClient integration)

References:
- https://huggingface.co/docs/inference-providers/guides/function-calling
- https://huggingface.co/collections/MarketAgents/function-calling-models-tool-use

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
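
A sketch of exercising the new tool-calling path through ChatClient; the tool class is illustrative and assumes Spring AI's @Tool annotation and ChatClient tools() support rather than anything specific to this PR:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;

class WeatherTools {

    @Tool(description = "Get the current temperature in Celsius for a city")
    double currentTemperature(String city) {
        return 21.0; // stubbed value for the sketch
    }

}

// Usage (chatModel is the auto-configured HuggingfaceChatModel; note the provider suffix):
// String answer = ChatClient.create(chatModel)
//     .prompt()
//     .user("What is the weather in Seoul right now?")
//     .tools(new WeatherTools())
//     .call()
//     .content();
```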
@Profile-exe force-pushed the GH-4182-huggingface-embedding-support branch from c858e49 to d0417dc on November 19, 2025 08:18
Update HuggingFace modules to align with Spring Boot 4.0.0-RC2 upgrade (spring-projects#4774)
after rebasing onto main branch:

RetryTemplate Migration (Spring Framework 7):
- Update imports from org.springframework.retry.support to org.springframework.core.retry
- Replace retryTemplate.execute(ctx -> ...) with RetryUtils.execute(retryTemplate, () -> ...)
- Rewrite HuggingfaceRetryTests to use new RetryListener interface with RetryPolicy/Retryable

RestClientAutoConfiguration Migration (Spring Boot 3.2+):
- Update imports from org.springframework.boot.autoconfigure.web.client to
  org.springframework.boot.restclient.autoconfigure
- Add spring-boot-starter-restclient dependency to autoconfigure pom.xml

Test Improvements:
- Fix beanOutputConverterRecords test with simpler prompt and lower temperature
- Ensure consistent JSON response generation

Modified Files:
- HuggingfaceChatModel.java, HuggingfaceEmbeddingModel.java (RetryUtils pattern)
- HuggingfaceRetryTests.java (complete rewrite for new API)
- HuggingfaceChatModelIT.java (test fix)
- All autoconfigure classes (RestClientAutoConfiguration import)
- autoconfigure pom.xml (new dependency)

This commit ensures compatibility with Spring Boot 4.0.0-RC2 and Spring Framework 7
following the main branch upgrade in commit d5e92be (spring-projects#4774).

Signed-off-by: Myeongdeok Kang <kang67572346@gmail.com>
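
A fragment showing the migrated call shape exactly as described above; the RetryUtils.execute(...) and chatCompletion(...) signatures are taken from this PR rather than an established release API:

```java
// Before (Spring Retry):
//   return this.retryTemplate.execute(ctx -> this.huggingfaceApi.chatCompletion(request));
//
// After (Spring Framework 7 core retry, wrapped by the RetryUtils helper described above):
ChatResponse response = RetryUtils.execute(this.retryTemplate,
        () -> this.huggingfaceApi.chatCompletion(request));
```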
@Profile-exe (Author) commented:

Hi @ilayaperumalg

This PR is ready for review. It adds comprehensive HuggingFace integration including:

  • New EmbeddingModel with Feature Extraction API
  • Enhanced ChatModel with function calling support
  • ConnectionDetails pattern for auto-configuration
  • Spring Boot 4.0 / Spring Framework 7 compatibility

Would appreciate your feedback when available. Thank you!


Development

Successfully merging this pull request may close these issues.

Can't use Hugginface Inference API (serverless) due to hardcoded /generate path
