
Conversation

@mldangelo
Member

Summary

Adds support for HuggingFace's OpenAI-compatible chat completions API via a new huggingface:chat provider.

  • New HuggingfaceChatCompletionProvider class extending OpenAiChatCompletionProvider for code reuse (sketched below)
  • Auto-detection in huggingface:text-generation when apiEndpoint contains /v1
  • Supports chatCompletion: true/false config to explicitly control the format
  • Maps HuggingFace parameters (max_new_tokens) to the OpenAI format (max_tokens)
  • Registry support for huggingface:chat:<model> syntax
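
As a rough sketch of the delegation (hypothetical import path and simplified option handling; the real class in src/providers/huggingface.ts is more complete):

```ts
import { OpenAiChatCompletionProvider } from './openai/chat'; // hypothetical path

// Sketch: reuse the OpenAI-compatible implementation, pointed at the HF router,
// translating HuggingFace-style generation parameters to OpenAI equivalents.
class HuggingfaceChatCompletionProvider extends OpenAiChatCompletionProvider {
  constructor(modelName: string, options: { config?: Record<string, any> } = {}) {
    const { max_new_tokens, ...rest } = options.config ?? {};
    super(modelName, {
      ...options,
      config: {
        ...rest,
        apiBaseUrl: rest.apiBaseUrl ?? 'https://router.huggingface.co/v1',
        // Map max_new_tokens -> max_tokens without clobbering an explicit max_tokens.
        ...(max_new_tokens !== undefined && rest.max_tokens === undefined
          ? { max_tokens: max_new_tokens }
          : {}),
      },
    });
  }
}
```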

Usage

```yaml
providers:
  - id: huggingface:chat:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.7
      max_new_tokens: 1000

  # Or with text-generation (auto-detects from URL)
  - id: huggingface:text-generation:meta-llama/Llama-3.3-70B-Instruct
    config:
      apiEndpoint: https://router.huggingface.co/v1/chat/completions
```

Test plan

  • Unit tests added for chat completion functionality
  • Integration tested with multiple models:
    • DeepSeek-R1
    • Llama-3.3-70B-Instruct
    • Qwen2.5-72B-Instruct
    • Llama-3.1-8B-Instruct (via auto-detect)
  • Existing HuggingFace tests still pass
  • Documentation updated
  • Example added in examples/huggingface-chat/

cc @jameshiester - this fixes the issue you reported where the HuggingFace text-generation provider was sending the wrong request format to /v1/chat/completions.
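
For context, the two request shapes differ roughly as follows (illustrative payloads based on the test expectations in this PR, not captured traffic):

```ts
// HuggingFace Inference API (text-generation) shape, i.e. what was being sent:
const inferencePayload = {
  inputs: 'What is 2+2?',
  parameters: { temperature: 0.7, max_new_tokens: 1000 },
};

// OpenAI-compatible shape, i.e. what /v1/chat/completions expects:
const chatPayload = {
  model: 'deepseek-ai/DeepSeek-R1',
  messages: [{ role: 'user', content: 'What is 2+2?' }],
  temperature: 0.7,
  max_tokens: 1000,
};
```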

🤖 Generated with Claude Code

@use-tusk
Contributor

use-tusk bot commented Feb 2, 2026

⏩ No test execution environment matched (885be34)

Commit    Status                                    Created (UTC)
d39af5f   ⏩ No test execution environment matched   Feb 2, 2026 10:55 PM
10a88f0   ⏩ No test execution environment matched   Feb 2, 2026 11:30 PM
be79f36   ⏩ No test execution environment matched   Feb 3, 2026 12:16 AM
e3f15f2   ⏩ No test execution environment matched   Feb 3, 2026 12:17 AM
ba88e67   ⏩ No test execution environment matched   Feb 3, 2026 7:09 AM
3a91865   ⏩ No test execution environment matched   Feb 3, 2026 7:16 AM
22fa5a9   ⏩ No test execution environment matched   Feb 3, 2026 7:24 AM
228bd3c   ⏩ No test execution environment matched   Feb 3, 2026 7:35 AM
c8712a1   ⏩ No test execution environment matched   Feb 3, 2026 7:46 AM
317db54   ⏩ No test execution environment matched   Feb 3, 2026 3:45 PM
885be34   ⏩ No test execution environment matched   Feb 3, 2026 5:38 PM


@jameshiester
Contributor

Should there be a smoke test?

@coderabbitai
Contributor

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough

This pull request adds support for HuggingFace OpenAI-compatible chat completions. It introduces a new HuggingfaceChatCompletionProvider class alongside the existing Inference API provider, enabling conditional routing based on API endpoint and configuration. The changes include a new provider factory function, auto-detection logic for the chat completion format, provider registry updates, documentation describing the chat models feature, an example configuration, and corresponding test coverage for the new chat completion path.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed

Title check: ✅ Passed. The PR title accurately and concisely describes the main feature addition (a new HuggingFace chat completion provider), which aligns with the core changes across the codebase.
Description check: ✅ Passed. The PR description comprehensively relates to the changeset, detailing the implementation approach, usage examples, test coverage, and the specific issue addressed, all directly supported by the code and documentation changes.
Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/providers/registry.ts (1)

1555-1582: ⚠️ Potential issue | 🟡 Minor

Include sentence-similarity in the HuggingFace path error message. It’s a supported task but not listed, which can mislead users.

🛠️ Suggested update
-          `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+          `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
...
-        `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+        `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
🧹 Nitpick comments (1)
examples/huggingface-chat/promptfooconfig.yaml (1)

35-37: Consider a quirky prompt for the simple test. It’ll make the example more engaging.

Based on learnings: For trivial test cases in configuration, make them quirky and fun to increase engagement.

Comment on lines 5 to 42
```yaml
description: HuggingFace Chat Completion Tests

providers:
  # New dedicated huggingface:chat provider (recommended approach)
  - id: huggingface:chat:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: DeepSeek-R1

  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: Llama-3.3-70B

  - id: huggingface:chat:Qwen/Qwen2.5-72B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: Qwen2.5-72B

  # text-generation with auto-detection from URL (backward compatible)
  - id: huggingface:text-generation:meta-llama/Llama-3.1-8B-Instruct
    config:
      apiEndpoint: https://router.huggingface.co/v1/chat/completions
      temperature: 0.1
      max_new_tokens: 100
    label: Llama-3.1-8B (auto-detect)

prompts:
  - 'What is 2+2? Answer with just the number.'

tests:
  - vars: {}
    assert:
      - type: contains
        value: '4'
```

⚠️ Potential issue | 🟡 Minor

Reorder sections to match required field order. prompts should come before providers.

🛠️ Suggested reorder
 description: HuggingFace Chat Completion Tests
 
+prompts:
+  - 'What is 2+2? Answer with just the number.'
+
 providers:
   # New dedicated huggingface:chat provider (recommended approach)
   - id: huggingface:chat:deepseek-ai/DeepSeek-R1
@@
- 
-prompts:
-  - 'What is 2+2? Answer with just the number.'
-
 tests:
   - vars: {}
     assert:
       - type: contains
         value: '4'
As per coding guidelines: Field order in promptfooconfig.yaml must be: description, env, prompts, providers, defaultTest, scenarios, tests.
🤖 Prompt for AI Agents
In `@examples/huggingface-chat/promptfooconfig.yaml` around lines 5 - 42, Move the
prompts section to appear before the providers section in promptfooconfig.yaml
so the top-level keys follow the required order (description, env, prompts,
providers, defaultTest, scenarios, tests); specifically relocate the "prompts:"
block (the "'What is 2+2?..." entry) above the list of "providers:" entries and
ensure missing intermediate keys (env, defaultTest, scenarios) are present (can
be empty) in that sequence to preserve the mandated field order.

Comment on lines 1 to 4
```diff
 ---
 sidebar_label: HuggingFace
-description: Configure HuggingFace's text classification, embedding, and NER models for LLM testing and eval tasks
+description: Configure HuggingFace's chat models, text classification, embedding, and NER models for LLM testing and eval tasks
 ---
```

⚠️ Potential issue | 🟡 Minor

Add required front matter fields (and expand description length).

🛠️ Suggested front matter
 ---
+title: HuggingFace provider
 sidebar_label: HuggingFace
-description: Configure HuggingFace's chat models, text classification, embedding, and NER models for LLM testing and eval tasks
+sidebar_position: 40
+description: Configure HuggingFace chat models, embeddings, classification, and NER for Promptfoo evals using Inference API tasks and OpenAI‑compatible chat endpoints and routers.
 ---
As per coding guidelines: Front matter is required with title (under 60 chars), description (150-160 chars), and sidebar_position.
🤖 Prompt for AI Agents
In `@site/docs/providers/huggingface.md` around lines 1 - 4, The front matter in
the document currently only contains sidebar_label and description; add the
required YAML fields by inserting a title (under 60 characters), a description
expanded to ~150–160 characters, and a numeric sidebar_position key alongside
the existing sidebar_label and description; update the existing description
value (the "description" field) to meet the 150–160 char length and ensure the
newly added "title" and "sidebar_position" keys are present near the top of the
file so static site tooling picks them up.

:::

```diff
-## Examples
+### Examples
```

⚠️ Potential issue | 🟡 Minor

Don’t demote the “Examples” heading. It can break existing anchor links.

🛠️ Suggested change
-### Examples
+## Examples
As per coding guidelines: Don't modify headings as they are often externally linked.
🤖 Prompt for AI Agents
In `@site/docs/providers/huggingface.md` at line 60, Restore the "Examples"
heading to its original level instead of demoting it; ensure the heading text
"Examples" remains at the same Markdown header level (the "### Examples" token)
so existing anchors/links keep working and do not change other surrounding
headings or content.

Comment on lines 174 to 186
```ts
private useChatCompletionFormat(): boolean {
  // Explicit config takes precedence
  if (this.config.chatCompletion !== undefined) {
    return this.config.chatCompletion;
  }
  // Auto-detect based on endpoint URL
  if (this.config.apiEndpoint) {
    return (
      this.config.apiEndpoint.includes('/v1/chat/completions') ||
      this.config.apiEndpoint.includes('/v1')
    );
  }
  return false;
}
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

HuggingFace OpenAI-compatible API endpoints /v1/chat/completions /v1/completions documentation

💡 Result:

Hugging Face has two “OpenAI-compatible” surfaces that document /v1/chat/completions and/or /v1/completions:

  1. Hugging Face Inference Providers (HF Router)
     • Chat Completions: documented under Inference Providers → Chat Completion; examples use the OpenAI SDK with base_url="https://router.huggingface.co/v1" (the SDK then hits the OpenAI-style /v1/chat/completions route). [1]
     • (Legacy) Completions: documented under Inference Providers → Text Generation; examples show base_url="https://router.huggingface.co/<provider>/v1/completions" (i.e., the OpenAI-style /v1/completions route). [2]
  2. Hugging Face Inference Endpoints running TGI (Text Generation Inference)
     • /v1/chat/completions: documented as the Messages API, "fully compatible with the OpenAI Chat Completion API," with curl examples directly calling .../v1/chat/completions. [3]
     • /v1/completions: documented at a high level in the TGI engine overview as "Fully OpenAI-compatible /v1/chat or /v1/completions APIs". [4]
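
In SDK terms, that first surface looks roughly like this (a sketch; the model ID is illustrative):

```ts
import OpenAI from 'openai';

// The OpenAI SDK pointed at the HF router; the SDK appends /chat/completions itself.
const client = new OpenAI({
  baseURL: 'https://router.huggingface.co/v1',
  apiKey: process.env.HF_TOKEN,
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'deepseek-ai/DeepSeek-R1',
    messages: [{ role: 'user', content: 'What is 2+2?' }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```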

🏁 Script executed:

# First, find the HuggingFace provider file and understand the context
find . -name "huggingface.ts" -type f | head -5

Repository: promptfoo/promptfoo

Length of output: 93


🏁 Script executed:

# Search for how useChatCompletionFormat is used in the provider
rg "useChatCompletionFormat" -A 5 -B 2

Repository: promptfoo/promptfoo

Length of output: 1122


🏁 Script executed:

# Check what getChatProvider returns and how it handles the response
rg "getChatProvider" -A 10 src/providers/huggingface.ts

Repository: promptfoo/promptfoo

Length of output: 806


🏁 Script executed:

# Check the inference API path to understand what format it expects
rg "callApi.*prompt" -B 3 -A 10 src/providers/huggingface.ts | head -50

Repository: promptfoo/promptfoo

Length of output: 1277


🏁 Script executed:

# Check if there are tests for different endpoint patterns
find . -name "*huggingface*test*" -o -name "*huggingface*.spec.ts" | head -10

Repository: promptfoo/promptfoo

Length of output: 110


🏁 Script executed:

# Look at the broader context of useChatCompletionFormat to see all endpoint paths users might pass
rg "apiEndpoint" src/providers/huggingface.ts -B 2 -A 2

Repository: promptfoo/promptfoo

Length of output: 2334


Auto-detect is too broad and can misroute /v1/completions. HuggingFace offers both chat (/v1/chat/completions) and completions (/v1/completions) OpenAI-compatible endpoints with different request formats. The current check includes('/v1') matches both, causing completions endpoints to receive messages format. This needs narrowing to avoid breaking valid /v1/completions usage.

🛠️ Suggested narrowing
-    if (this.config.apiEndpoint) {
-      return (
-        this.config.apiEndpoint.includes('/v1/chat/completions') ||
-        this.config.apiEndpoint.includes('/v1')
-      );
-    }
+    if (this.config.apiEndpoint) {
+      if (this.config.apiEndpoint.includes('/v1/chat/completions')) {
+        return true;
+      }
+      // Treat base /v1 as OpenAI-compatible, but avoid /v1/completions or other non-chat paths
+      return /\/v1\/?$/.test(this.config.apiEndpoint);
+    }
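
To make the intended behavior concrete, the narrowed check would classify endpoints like this (illustrative helper, not the provider's actual method):

```ts
// Chat-specific path, or a bare OpenAI-compatible /v1 base URL.
const isChatEndpoint = (apiEndpoint: string): boolean =>
  apiEndpoint.includes('/v1/chat/completions') || /\/v1\/?$/.test(apiEndpoint);

isChatEndpoint('https://router.huggingface.co/v1/chat/completions'); // true
isChatEndpoint('https://router.huggingface.co/v1'); // true (OpenAI-compatible base)
isChatEndpoint('https://router.huggingface.co/v1/completions'); // false (legacy completions)
```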
🤖 Prompt for AI Agents
In `@src/providers/huggingface.ts` around lines 174 - 186, The auto-detect in
useChatCompletionFormat() is too broad because checking
apiEndpoint.includes('/v1') will match both chat and non-chat endpoints; update
the detection so that it only returns true for chat-specific endpoints (e.g.,
check for '/v1/chat' or '/v1/chat/completions' explicitly) while leaving
'/v1/completions' treated as non-chat; preserve the explicit
config.chatCompletion override and ensure the method returns false for generic
'/v1/completions' endpoints.

Comment on lines 712 to 714
```ts
throw new Error(
  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
);
```

⚠️ Potential issue | 🟡 Minor

Include sentence-similarity in the factory error message. It’s a supported task but missing in the list.

🛠️ Suggested update
-    `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+    `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
🤖 Prompt for AI Agents
In `@src/providers/huggingface.ts` around lines 712 - 714, Update the thrown Error
message that references providerPath in the Huggingface provider factory to
include "sentence-similarity" among the supported task types; locate the throw
new Error(...) that lists "huggingface:chat, huggingface:text-generation,
huggingface:feature-extraction, huggingface:text-classification,
huggingface:token-classification" and add
"huggingface:sentence-similarity:<model name>" to that list so the error
correctly reflects supported tasks.

Comment on lines 388 to 514
```ts
describe('HuggingfaceTextGenerationProvider chat completion format', () => {
  it('auto-detects chat completion format from URL', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Chat response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('deepseek-ai/DeepSeek-R1', {
      config: {
        apiEndpoint: 'https://router.huggingface.co/v1/chat/completions',
        apiKey: 'test-key',
      },
    });
    const result = await provider.callApi('Test prompt');

    expect(mockFetch).toHaveBeenCalledTimes(1);
    const [url, options] = mockFetch.mock.calls[0];
    expect(url).toBe('https://router.huggingface.co/v1/chat/completions');
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('model', 'deepseek-ai/DeepSeek-R1');
    expect(body).toHaveProperty('messages');
    expect(body.messages[0]).toEqual({ role: 'user', content: 'Test prompt' });
    expect(result.output).toBe('Chat response');
  });

  it('uses explicit chatCompletion config', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Chat response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('my-model', {
      config: {
        apiEndpoint: 'https://my-custom-endpoint.com/api',
        chatCompletion: true,
      },
    });
    const result = await provider.callApi('Test prompt');

    expect(mockFetch).toHaveBeenCalledTimes(1);
    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('messages');
    expect(result.output).toBe('Chat response');
  });

  it('maps HuggingFace parameters to OpenAI format', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
        temperature: 0.7,
        top_p: 0.9,
        max_new_tokens: 100,
      },
    });
    await provider.callApi('Test');

    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body.temperature).toBe(0.7);
    expect(body.top_p).toBe(0.9);
    expect(body.max_tokens).toBe(100);
  });

  it('handles chat completion error response', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          error: { message: 'Model not found' },
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
      },
    });
    const result = await provider.callApi('Test');

    expect(result.error).toContain('Model not found');
  });

  it('falls back to Inference API format when chatCompletion is false', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(JSON.stringify({ generated_text: 'Output' })),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
        chatCompletion: false, // Explicitly disable
      },
    });
    await provider.callApi('Test');

    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('inputs');
    expect(body).toHaveProperty('parameters');
    expect(body).not.toHaveProperty('messages');
  });
});
```

⚠️ Potential issue | 🟠 Major

Add required 4xx/5xx, rate‑limit, config‑validation, and token‑usage tests for the chat path. The new block covers success and body‑error cases only.

As per coding guidelines: Every provider must have tests covering: success cases, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking.
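
For instance, a 5xx case in the style of the existing specs might look like this (a sketch; the exact error shape depends on the provider's error handling):

```ts
it('surfaces HTTP 500 errors from the chat endpoint', async () => {
  mockFetch.mockResolvedValue({
    ...defaultMockResponse,
    ok: false,
    status: 500,
    statusText: 'Internal Server Error',
    text: vi.fn().mockResolvedValue(JSON.stringify({ error: { message: 'Upstream failure' } })),
  });

  const provider = new HuggingfaceTextGenerationProvider('model', {
    config: { apiEndpoint: 'https://api.example.com/v1/chat/completions' },
  });
  const result = await provider.callApi('Test');

  expect(result.error).toContain('Upstream failure');
});
```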

🤖 Prompt for AI Agents
In `@test/providers/index.test.ts` around lines 388 - 514, Add tests to cover HTTP
4xx/5xx errors, rate-limit responses, invalid configuration, and token-usage
tracking for the chat completion path by extending the existing describe block
that exercises HuggingfaceTextGenerationProvider and its callApi method;
specifically, add cases that (1) mock fetch to return status 400/500 with an
error body and assert result.error contains the message, (2) mock a 429 response
and/or a 200 with rate-limit headers (eg. x-ratelimit-reset, retry-after) and
assert the provider surfaces rate-limit info, (3) call new
HuggingfaceTextGenerationProvider instances with invalid configs (e.g., missing
apiEndpoint or invalid chatCompletion types) and assert validation errors, and
(4) mock responses that include usage/token fields (or headers) and assert the
provider records token usage on result. Use the same test utilities (mockFetch,
defaultMockResponse) and follow the style of existing tests (parsing
options.body, checking messages/inputs) to implement these additional specs.

@github-actions
Contributor

github-actions bot commented Feb 2, 2026

Security Review ✅

No critical issues found. This PR adds a well-structured HuggingFace chat completion provider that extends the existing OpenAI provider.

🟡 Minor Observations (3 items)
  • src/providers/huggingface.ts:88-89 - The inference provider parsing logic uses !modelName.includes(':'), which could misparse model names containing colons that aren't provider suffixes (illustrated below). This is a minor edge case and matches the existing documentation pattern.

  • test/providers/huggingface.test.ts:550-551 - Global fetch mock is set without cleanup in afterAll. The afterEach with vi.clearAllMocks() handles this, but consider using vi.restoreAllMocks() for complete mock restoration.

  • site/docs/providers/huggingface.md:185-186 - The curl example in docs exposes a pattern for API enumeration (curl https://huggingface.co/api/models/MODEL_ID?expand[]=inferenceProviderMapping). This is intentional and documented by HuggingFace, so not a security concern.
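
To illustrate the first observation, a hypothetical suffix parser that splits on the last colon reads any colon in a model ID as a provider suffix:

```ts
// Hypothetical parser, not the actual implementation:
const parse = (path: string) => {
  const idx = path.lastIndexOf(':');
  return idx === -1
    ? { model: path }
    : { model: path.slice(0, idx), provider: path.slice(idx + 1) };
};

parse('Qwen/QwQ-32B:featherless-ai'); // { model: 'Qwen/QwQ-32B', provider: 'featherless-ai' } (intended)
parse('org/model:v2'); // { model: 'org/model', provider: 'v2' } ('v2' misread as a provider)
```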

Notes

  • ✅ API keys properly sourced from environment variables (HF_TOKEN, HF_API_TOKEN) with no hardcoded secrets
  • ✅ Extends OpenAiChatCompletionProvider which has established security patterns for API communication
  • ✅ URL handling properly strips duplicate path segments (/chat/completions) to prevent malformed requests
  • ✅ Tests cover API key fallback behavior and error handling
  • ✅ PR scope (providers) is correct per conventions (not redteam-related)

Last updated: 2026-02-03T12:00:00Z | Reviewing: 885be34

@promptfoo-scanner promptfoo-scanner bot left a comment

👍 All Clear

I reviewed this PR which adds a new HuggingFace chat completion provider. The changes implement provider infrastructure code that connects applications to HuggingFace's OpenAI-compatible API. After analyzing the data flows, API key handling, and inherited capabilities from the OpenAI provider, no LLM security vulnerabilities were found.

Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner

- Fix overly broad auto-detection: change `/v1` to `/v1/chat` to avoid
  matching non-chat endpoints like `/v1/completions`
- Add sentence-similarity to error messages in registry
- Remove unused factory function `createHuggingfaceProvider`
- Fix example config field order: prompts before providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mldangelo
Member Author

@jameshiester Re: smoke test - looking at the existing smoke tests in test/smoke/, they use deterministic providers (like echo) to test CLI behavior without making real API calls. For HuggingFace, a meaningful smoke test would require real API calls (which needs an API key and has latency).

What we could test in smoke tests without API calls:

  • Provider registry loads huggingface:chat: correctly (already covered by unit tests in test/providers/index.test.ts)
  • Config validation accepts the new provider format

What would require an integration test with a real API key:

  • End-to-end chat completion responses
  • Auto-detection behavior with real endpoints

I've also addressed the review feedback:

  • Fixed overly broad /v1 auto-detection → now checks /v1/chat specifically
  • Added sentence-similarity to error messages
  • Removed unused createHuggingfaceProvider factory function
  • Fixed example config field ordering

mldangelo and others added 2 commits February 2, 2026 16:16
Update documentation and code comment to accurately reflect that
auto-detection matches `/v1/chat` (not `/v1`) after the fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ensures auto-detection only triggers for /v1/chat endpoints, not
/v1/completions (which would be a non-chat completion endpoint).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jameshiester jameshiester self-requested a review February 3, 2026 05:05
mldangelo and others added 8 commits February 2, 2026 23:09
…verage

- Remove requiresApiKey() override that suppressed auth error messages
- Remove misleading repetition_penalty → frequency_penalty mapping
- Fix URL stripping to use anchored regex instead of String.replace
- Guard max_new_tokens mapping to not override explicit max_tokens
- Shallow-copy providerOptions to prevent mutation bugs
- Add dedicated test file with 29 tests covering identity, auth,
  URL handling, parameter mapping, errors, and registry integration
- Move chat delegation tests from index.test.ts to dedicated file
- Restore ## Examples heading in docs (was demoted, breaking anchors)
- Update providers index to show huggingface:chat as primary example
- Simplify example config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
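
The anchored-regex fix, sketched (illustrative, not the exact code):

```ts
// An unanchored String.replace removes the first match anywhere in the string;
// anchoring the regex strips only a trailing /chat/completions segment.
const stripChatPath = (url: string): string => url.replace(/\/chat\/completions\/?$/, '');

stripChatPath('https://router.huggingface.co/v1/chat/completions'); // 'https://router.huggingface.co/v1'
stripChatPath('https://example.com/chat/completions/proxy/v1'); // unchanged: no trailing match
```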
…ce provider

- getApiKey() now checks this.env?.HF_TOKEN and this.env?.HF_API_TOKEN,
  matching the provider-level env override pattern used by other
  OpenAI-compatible wrappers
- HuggingfaceTextGenerationProvider.cleanup() forwards to inner chat
  provider, preventing MCP resource leaks when delegating

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
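
A sketch of that lookup order (provider-level this.env wins over process-level variables; the exact fallback chain is an assumption):

```ts
// Provider-level env overrides take precedence over process environment variables.
function resolveHfApiKey(env?: Record<string, string | undefined>): string | undefined {
  return (
    env?.HF_TOKEN ??
    env?.HF_API_TOKEN ??
    process.env.HF_TOKEN ??
    process.env.HF_API_TOKEN
  );
}
```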
- Qualify tool calling and streaming as model-dependent in HF docs
- Reduce example config to a single provider to avoid surprise costs
- Add cost note to example README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… IDs

Update docs and example to use HuggingFace's officially recommended
Inference Provider models that have been verified working via E2E
testing: DeepSeek-R1, openai/gpt-oss-120b, Qwen2.5-Coder-32B,
GLM-4.5, and google/gemma-3-27b-it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…r routing

Add support for HuggingFace's Inference Provider routing system, which
allows routing requests to specific backend providers (Cerebras, Together,
Fireworks AI, etc.). Users can specify a provider via:

- Model name suffix: `huggingface:chat:org/model:provider-name`
- Config option: `config.inferenceProvider: 'provider-name'`

This enables access to models that require explicit provider selection,
such as `Qwen/QwQ-32B:featherless-ai`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
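
For example (a config sketch; `together` is an illustrative provider slug):

```yaml
providers:
  # Provider selected via model-name suffix (takes precedence over config)
  - id: huggingface:chat:Qwen/QwQ-32B:featherless-ai

  # Provider selected via config
  - id: huggingface:chat:deepseek-ai/DeepSeek-R1
    config:
      inferenceProvider: together
```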
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…lability

Clarify that model name suffix takes precedence over inferenceProvider
config, and note that available models/providers change over time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mldangelo mldangelo merged commit cd709b7 into main Feb 3, 2026
45 checks passed
@mldangelo mldangelo deleted the feat/huggingface-chat-provider branch February 3, 2026 18:14