feat(providers): add HuggingFace chat completion provider #7446
Conversation
Adds support for HuggingFace's OpenAI-compatible chat completions API via a new `huggingface:chat` provider that extends the OpenAI provider.
Changes:
- New `HuggingfaceChatCompletionProvider` class extending OpenAiChatCompletionProvider
- Auto-detection in `huggingface:text-generation` when apiEndpoint contains `/v1`
- Supports `chatCompletion: true/false` config to explicitly control format
- Maps HuggingFace parameters (max_new_tokens) to OpenAI format (max_tokens)
- Registry support for `huggingface:chat:<model>` syntax
- Updated documentation with chat provider examples
- New example in `examples/huggingface-chat/`
- Unit tests for chat completion functionality
Usage:
```yaml
providers:
  - id: huggingface:chat:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.7
      max_new_tokens: 1000
```
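For illustration, the `max_new_tokens` → `max_tokens` mapping listed above could look roughly like this. This is a minimal sketch; the helper name and config shape are assumptions, not the PR's actual code, which lives in `src/providers/huggingface.ts`:

```ts
// Hypothetical helper sketching the HuggingFace -> OpenAI parameter mapping
// described in the changes list above.
interface HfChatConfig {
  max_new_tokens?: number;
  temperature?: number;
  top_p?: number;
}

function toOpenAiParams(config: HfChatConfig): Record<string, unknown> {
  const { max_new_tokens, ...rest } = config;
  return {
    ...rest,
    // HuggingFace's max_new_tokens corresponds to OpenAI's max_tokens
    ...(max_new_tokens !== undefined ? { max_tokens: max_new_tokens } : {}),
  };
}
```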
Fixes the issue reported by @jameshiester where the HuggingFace text-generation provider was sending an incorrect request format to the /v1/chat/completions endpoint.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
⏩ No test execution environment matched (885be34)

should there be a smoke test?
📝 Walkthrough

This pull request adds support for HuggingFace OpenAI-compatible chat completions. It introduces a new `huggingface:chat` provider.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/providers/registry.ts (1)
Lines 1555-1582: ⚠️ Potential issue | 🟡 Minor

Include sentence-similarity in the HuggingFace path error message. It’s a supported task but not listed, which can mislead users.
🛠️ Suggested update
```diff
-  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
@@
-  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
```
🤖 Fix all issues with AI agents
In `@examples/huggingface-chat/promptfooconfig.yaml`:
- Around line 5-42: Move the prompts section to appear before the providers
section in promptfooconfig.yaml so the top-level keys follow the required order
(description, env, prompts, providers, defaultTest, scenarios, tests);
specifically relocate the "prompts:" block (the "'What is 2+2?..." entry) above
the list of "providers:" entries and ensure missing intermediate keys (env,
defaultTest, scenarios) are present (can be empty) in that sequence to preserve
the mandated field order.
In `@site/docs/providers/huggingface.md`:
- Line 60: Restore the "Examples" heading to its original level instead of
demoting it; ensure the heading text "Examples" remains at the same Markdown
header level (the "### Examples" token) so existing anchors/links keep working
and do not change other surrounding headings or content.
- Around line 1-4: The front matter in the document currently only contains
sidebar_label and description; add the required YAML fields by inserting a title
(under 60 characters), a description expanded to ~150–160 characters, and a
numeric sidebar_position key alongside the existing sidebar_label and
description; update the existing description value (the "description" field) to
meet the 150–160 char length and ensure the newly added "title" and
"sidebar_position" keys are present near the top of the file so static site
tooling picks them up.
In `@src/providers/huggingface.ts`:
- Around line 712-714: Update the thrown Error message that references
providerPath in the Huggingface provider factory to include
"sentence-similarity" among the supported task types; locate the throw new
Error(...) that lists "huggingface:chat, huggingface:text-generation,
huggingface:feature-extraction, huggingface:text-classification,
huggingface:token-classification" and add
"huggingface:sentence-similarity:<model name>" to that list so the error
correctly reflects supported tasks.
- Around line 174-186: The auto-detect in useChatCompletionFormat() is too broad
because checking apiEndpoint.includes('/v1') will match both chat and non-chat
endpoints; update the detection so that it only returns true for chat-specific
endpoints (e.g., check for '/v1/chat' or '/v1/chat/completions' explicitly)
while leaving '/v1/completions' treated as non-chat; preserve the explicit
config.chatCompletion override and ensure the method returns false for generic
'/v1/completions' endpoints.
In `@test/providers/index.test.ts`:
- Around line 388-514: Add tests to cover HTTP 4xx/5xx errors, rate-limit
responses, invalid configuration, and token-usage tracking for the chat
completion path by extending the existing describe block that exercises
HuggingfaceTextGenerationProvider and its callApi method; specifically, add
cases that (1) mock fetch to return status 400/500 with an error body and assert
result.error contains the message, (2) mock a 429 response and/or a 200 with
rate-limit headers (eg. x-ratelimit-reset, retry-after) and assert the provider
surfaces rate-limit info, (3) call new HuggingfaceTextGenerationProvider
instances with invalid configs (e.g., missing apiEndpoint or invalid
chatCompletion types) and assert validation errors, and (4) mock responses that
include usage/token fields (or headers) and assert the provider records token
usage on result. Use the same test utilities (mockFetch, defaultMockResponse)
and follow the style of existing tests (parsing options.body, checking
messages/inputs) to implement these additional specs.
🧹 Nitpick comments (1)
examples/huggingface-chat/promptfooconfig.yaml (1)
Lines 35-37: Consider a quirky prompt for the simple test. It’ll make the example more engaging. Based on learnings: for trivial test cases in configuration, make them quirky and fun to increase engagement.
examples/huggingface-chat/promptfooconfig.yaml:

```yaml
description: HuggingFace Chat Completion Tests

providers:
  # New dedicated huggingface:chat provider (recommended approach)
  - id: huggingface:chat:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: DeepSeek-R1

  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: Llama-3.3-70B

  - id: huggingface:chat:Qwen/Qwen2.5-72B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
    label: Qwen2.5-72B

  # text-generation with auto-detection from URL (backward compatible)
  - id: huggingface:text-generation:meta-llama/Llama-3.1-8B-Instruct
    config:
      apiEndpoint: https://router.huggingface.co/v1/chat/completions
      temperature: 0.1
      max_new_tokens: 100
    label: Llama-3.1-8B (auto-detect)

prompts:
  - 'What is 2+2? Answer with just the number.'

tests:
  - vars: {}
    assert:
      - type: contains
        value: '4'
```
Reorder sections to match required field order. prompts should come before providers.
🛠️ Suggested reorder

```diff
 description: HuggingFace Chat Completion Tests
+prompts:
+  - 'What is 2+2? Answer with just the number.'
+
 providers:
   # New dedicated huggingface:chat provider (recommended approach)
   - id: huggingface:chat:deepseek-ai/DeepSeek-R1
@@
-
-prompts:
-  - 'What is 2+2? Answer with just the number.'
-
 tests:
   - vars: {}
     assert:
       - type: contains
         value: '4'
```
site/docs/providers/huggingface.md:

```diff
 ---
 sidebar_label: HuggingFace
-description: Configure HuggingFace's text classification, embedding, and NER models for LLM testing and eval tasks
+description: Configure HuggingFace's chat models, text classification, embedding, and NER models for LLM testing and eval tasks
 ---
```
Add required front matter fields (and expand description length).
🛠️ Suggested front matter

```diff
 ---
+title: HuggingFace provider
 sidebar_label: HuggingFace
-description: Configure HuggingFace's chat models, text classification, embedding, and NER models for LLM testing and eval tasks
+sidebar_position: 40
+description: Configure HuggingFace chat models, embeddings, classification, and NER for Promptfoo evals using Inference API tasks and OpenAI-compatible chat endpoints and routers.
 ---
```
site/docs/providers/huggingface.md (outdated)
```diff
 :::

-## Examples
+### Examples
```
Don’t demote the “Examples” heading. It can break existing anchor links.
🛠️ Suggested change

```diff
-### Examples
+## Examples
```
src/providers/huggingface.ts:

```ts
private useChatCompletionFormat(): boolean {
  // Explicit config takes precedence
  if (this.config.chatCompletion !== undefined) {
    return this.config.chatCompletion;
  }
  // Auto-detect based on endpoint URL
  if (this.config.apiEndpoint) {
    return (
      this.config.apiEndpoint.includes('/v1/chat/completions') ||
      this.config.apiEndpoint.includes('/v1')
    );
  }
  return false;
}
```
🧩 Analysis chain
🌐 Web query:
HuggingFace OpenAI-compatible API endpoints /v1/chat/completions /v1/completions documentation
💡 Result:
Hugging Face has two “OpenAI-compatible” surfaces that document /v1/chat/completions and/or /v1/completions:

- Hugging Face Inference Providers (HF Router)
  - Chat Completions: documented under Inference Providers → Chat Completion; examples use the OpenAI SDK with `base_url="https://router.huggingface.co/v1"` (the SDK then hits the OpenAI-style `/v1/chat/completions` route). [1]
  - (Legacy) Completions: documented under Inference Providers → Text Generation; examples show `base_url="https://router.huggingface.co/<provider>/v1/completions"` (i.e., the OpenAI-style `/v1/completions` route). [2]
- Hugging Face Inference Endpoints running TGI (Text Generation Inference)
  - `/v1/chat/completions`: documented as the Messages API, “fully compatible with the OpenAI Chat Completion API,” with curl examples directly calling `.../v1/chat/completions`. [3]
  - `/v1/completions`: documented at a high level in the TGI engine overview as “fully OpenAI-compatible `/v1/chat` or `/v1/completions` APIs”. [4]
Sources
- [1] Hugging Face Docs — Inference Providers: Chat Completion: https://huggingface.co/docs/inference-providers/en/tasks/chat-completion
- [2] Hugging Face Docs — Inference Providers: Text Generation: https://huggingface.co/docs/inference-providers/en/tasks/text-generation
- [3] Hugging Face Docs — TGI: Messages API: https://huggingface.co/docs/text-generation-inference/messages_api
- [4] Hugging Face Docs — Inference Endpoints (dedicated): TGI engine: https://huggingface.co/docs/inference-endpoints/engines/tgi
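For concreteness, the router's chat surface is typically called through the OpenAI SDK as the docs above describe. A sketch (the model name is illustrative, and `HF_TOKEN` must be set in the environment):

```ts
import OpenAI from 'openai';

// Point the OpenAI SDK at the HF router's OpenAI-compatible base URL, per [1].
const client = new OpenAI({
  baseURL: 'https://router.huggingface.co/v1',
  apiKey: process.env.HF_TOKEN,
});

async function main() {
  // The SDK sends this to the OpenAI-style /v1/chat/completions route.
  const completion = await client.chat.completions.create({
    model: 'deepseek-ai/DeepSeek-R1',
    messages: [{ role: 'user', content: 'What is 2+2?' }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```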
🏁 Script executed:
# First, find the HuggingFace provider file and understand the context
find . -name "huggingface.ts" -type f | head -5Repository: promptfoo/promptfoo
Length of output: 93
🏁 Script executed:
# Search for how useChatCompletionFormat is used in the provider
rg "useChatCompletionFormat" -A 5 -B 2Repository: promptfoo/promptfoo
Length of output: 1122
🏁 Script executed:
# Check what getChatProvider returns and how it handles the response
rg "getChatProvider" -A 10 src/providers/huggingface.tsRepository: promptfoo/promptfoo
Length of output: 806
🏁 Script executed:
# Check the inference API path to understand what format it expects
rg "callApi.*prompt" -B 3 -A 10 src/providers/huggingface.ts | head -50Repository: promptfoo/promptfoo
Length of output: 1277
🏁 Script executed:
# Check if there are tests for different endpoint patterns
find . -name "*huggingface*test*" -o -name "*huggingface*.spec.ts" | head -10Repository: promptfoo/promptfoo
Length of output: 110
🏁 Script executed:
# Look at the broader context of useChatCompletionFormat to see all endpoint paths users might pass
rg "apiEndpoint" src/providers/huggingface.ts -B 2 -A 2Repository: promptfoo/promptfoo
Length of output: 2334
Auto-detect is too broad and can misroute `/v1/completions`. HuggingFace offers both chat (`/v1/chat/completions`) and completions (`/v1/completions`) OpenAI-compatible endpoints with different request formats. The current check, `includes('/v1')`, matches both, causing completions endpoints to receive the messages format. This needs narrowing to avoid breaking valid `/v1/completions` usage.
🛠️ Suggested narrowing
```diff
-    if (this.config.apiEndpoint) {
-      return (
-        this.config.apiEndpoint.includes('/v1/chat/completions') ||
-        this.config.apiEndpoint.includes('/v1')
-      );
-    }
+    if (this.config.apiEndpoint) {
+      if (this.config.apiEndpoint.includes('/v1/chat/completions')) {
+        return true;
+      }
+      // Treat base /v1 as OpenAI-compatible, but avoid /v1/completions or other non-chat paths
+      return /\/v1\/?$/.test(this.config.apiEndpoint);
+    }
```
src/providers/huggingface.ts (outdated)
```ts
throw new Error(
  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
);
```
Include sentence-similarity in the factory error message. It’s a supported task but missing in the list.
🛠️ Suggested update
```diff
-  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>`,
+  `Invalid Huggingface provider path: ${providerPath}. Use one of the following providers: huggingface:chat:<model name>, huggingface:text-generation:<model name>, huggingface:feature-extraction:<model name>, huggingface:text-classification:<model name>, huggingface:token-classification:<model name>, huggingface:sentence-similarity:<model name>`,
```
test/providers/index.test.ts (outdated)

```ts
describe('HuggingfaceTextGenerationProvider chat completion format', () => {
  it('auto-detects chat completion format from URL', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Chat response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('deepseek-ai/DeepSeek-R1', {
      config: {
        apiEndpoint: 'https://router.huggingface.co/v1/chat/completions',
        apiKey: 'test-key',
      },
    });
    const result = await provider.callApi('Test prompt');

    expect(mockFetch).toHaveBeenCalledTimes(1);
    const [url, options] = mockFetch.mock.calls[0];
    expect(url).toBe('https://router.huggingface.co/v1/chat/completions');
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('model', 'deepseek-ai/DeepSeek-R1');
    expect(body).toHaveProperty('messages');
    expect(body.messages[0]).toEqual({ role: 'user', content: 'Test prompt' });
    expect(result.output).toBe('Chat response');
  });

  it('uses explicit chatCompletion config', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Chat response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('my-model', {
      config: {
        apiEndpoint: 'https://my-custom-endpoint.com/api',
        chatCompletion: true,
      },
    });
    const result = await provider.callApi('Test prompt');

    expect(mockFetch).toHaveBeenCalledTimes(1);
    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('messages');
    expect(result.output).toBe('Chat response');
  });

  it('maps HuggingFace parameters to OpenAI format', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          choices: [{ message: { content: 'Response' } }],
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
        temperature: 0.7,
        top_p: 0.9,
        max_new_tokens: 100,
      },
    });
    await provider.callApi('Test');

    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body.temperature).toBe(0.7);
    expect(body.top_p).toBe(0.9);
    expect(body.max_tokens).toBe(100);
  });

  it('handles chat completion error response', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(
        JSON.stringify({
          error: { message: 'Model not found' },
        }),
      ),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
      },
    });
    const result = await provider.callApi('Test');

    expect(result.error).toContain('Model not found');
  });

  it('falls back to Inference API format when chatCompletion is false', async () => {
    const mockResponse = {
      ...defaultMockResponse,
      text: vi.fn().mockResolvedValue(JSON.stringify({ generated_text: 'Output' })),
    };
    mockFetch.mockResolvedValue(mockResponse);

    const provider = new HuggingfaceTextGenerationProvider('model', {
      config: {
        apiEndpoint: 'https://api.example.com/v1/chat/completions',
        chatCompletion: false, // Explicitly disable
      },
    });
    await provider.callApi('Test');

    const [, options] = mockFetch.mock.calls[0];
    const body = JSON.parse(options.body);
    expect(body).toHaveProperty('inputs');
    expect(body).toHaveProperty('parameters');
    expect(body).not.toHaveProperty('messages');
  });
});
```
Add required 4xx/5xx, rate‑limit, config‑validation, and token‑usage tests for the chat path. The new block covers success and body‑error cases only.
As per coding guidelines: Every provider must have tests covering: success cases, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking.
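As one illustration of the requested coverage, a 429 case might look like the sketch below. It assumes the file's existing `mockFetch`/`defaultMockResponse` utilities, a fetch-like response shape with `ok`/`status`, and that the provider surfaces the body's error message; all of those are assumptions about the implementation:

```ts
it('surfaces HTTP 429 rate-limit errors', async () => {
  const mockResponse = {
    ...defaultMockResponse,
    ok: false, // assumed fetch-like shape
    status: 429,
    text: vi
      .fn()
      .mockResolvedValue(JSON.stringify({ error: { message: 'Rate limit exceeded' } })),
  };
  mockFetch.mockResolvedValue(mockResponse);

  const provider = new HuggingfaceTextGenerationProvider('model', {
    config: { apiEndpoint: 'https://api.example.com/v1/chat/completions' },
  });
  const result = await provider.callApi('Test');

  // Exactly how the error surfaces depends on the provider implementation
  expect(result.error).toContain('Rate limit exceeded');
});
```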
Security Review ✅

No critical issues found. This PR adds a well-structured HuggingFace chat completion provider that extends the existing OpenAI provider.

🟡 Minor Observations (3 items)
Notes
Last updated: 2026-02-03T12:00:00Z | Reviewing: 885be34 |
👍 All Clear
I reviewed this PR which adds a new HuggingFace chat completion provider. The changes implement provider infrastructure code that connects applications to HuggingFace's OpenAI-compatible API. After analyzing the data flows, API key handling, and inherited capabilities from the OpenAI provider, no LLM security vulnerabilities were found.
Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
- Fix overly broad auto-detection: change `/v1` to `/v1/chat` to avoid matching non-chat endpoints like `/v1/completions`
- Add sentence-similarity to error messages in registry
- Remove unused factory function `createHuggingfaceProvider`
- Fix example config field order: prompts before providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jameshiester Re: smoke test - looking at the existing smoke tests:

What we could test in smoke tests without API calls:
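For instance, a registry-level check runs entirely offline. A rough sketch, assuming the repo's `loadApiProvider` factory and that the provider id echoes the model name (the id format is an assumption):

```ts
import { loadApiProvider } from '../src/providers';

it('resolves huggingface:chat provider IDs without network access', async () => {
  // Only exercises provider path parsing and registration; no request is sent.
  const provider = await loadApiProvider('huggingface:chat:deepseek-ai/DeepSeek-R1');
  // Assumes the id embeds the model name; adjust to the actual id format.
  expect(provider.id()).toContain('deepseek-ai/DeepSeek-R1');
});
```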
What would require an integration test with a real API key:
I've also addressed the review feedback:
Update documentation and code comment to accurately reflect that auto-detection matches `/v1/chat` (not `/v1`) after the fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ensures auto-detection only triggers for /v1/chat endpoints, not /v1/completions (which would be a non-chat completion endpoint). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
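Illustratively, the intended behavior after this commit, expressed against a hypothetical standalone helper (the real logic is a private method on the provider):

```ts
// Hypothetical helper mirroring the narrowed auto-detection
const detectsChatEndpoint = (endpoint: string): boolean => endpoint.includes('/v1/chat');

console.log(detectsChatEndpoint('https://router.huggingface.co/v1/chat/completions')); // true
console.log(detectsChatEndpoint('https://host/some-provider/v1/completions')); // false
```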
…verage

- Remove requiresApiKey() override that suppressed auth error messages
- Remove misleading repetition_penalty → frequency_penalty mapping
- Fix URL stripping to use anchored regex instead of String.replace
- Guard max_new_tokens mapping to not override explicit max_tokens
- Shallow-copy providerOptions to prevent mutation bugs
- Add dedicated test file with 29 tests covering identity, auth, URL handling, parameter mapping, errors, and registry integration
- Move chat delegation tests from index.test.ts to dedicated file
- Restore ## Examples heading in docs (was demoted, breaking anchors)
- Update providers index to show huggingface:chat as primary example
- Simplify example config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ce provider

- getApiKey() now checks this.env?.HF_TOKEN and this.env?.HF_API_TOKEN, matching the provider-level env override pattern used by other OpenAI-compatible wrappers
- HuggingfaceTextGenerationProvider.cleanup() forwards to inner chat provider, preventing MCP resource leaks when delegating

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
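A minimal sketch of the env-override pattern this commit describes. Only the `this.env?.HF_TOKEN`/`this.env?.HF_API_TOKEN` checks come from the commit; the class wrapper and precedence order are assumptions:

```ts
// Sketch only: provider-level overrides (this.env) are consulted before
// process-level environment variables.
class HfAuthSketch {
  constructor(private env?: Record<string, string | undefined>) {}

  getApiKey(): string | undefined {
    return (
      this.env?.HF_TOKEN ??
      this.env?.HF_API_TOKEN ??
      process.env.HF_TOKEN ??
      process.env.HF_API_TOKEN
    );
  }
}
```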
- Qualify tool calling and streaming as model-dependent in HF docs
- Reduce example config to a single provider to avoid surprise costs
- Add cost note to example README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… IDs

Update docs and example to use HuggingFace's officially recommended Inference Provider models that have been verified working via E2E testing: DeepSeek-R1, openai/gpt-oss-120b, Qwen2.5-Coder-32B, GLM-4.5, and google/gemma-3-27b-it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…r routing

Add support for HuggingFace's Inference Provider routing system, which allows routing requests to specific backend providers (Cerebras, Together, Fireworks AI, etc.). Users can specify a provider via:

- Model name suffix: `huggingface:chat:org/model:provider-name`
- Config option: `config.inferenceProvider: 'provider-name'`

This enables access to models that require explicit provider selection, such as `Qwen/QwQ-32B:featherless-ai`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
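For illustration, the two routing styles described in this commit might appear in a config like this (model and provider names taken from the commit text):

```yaml
providers:
  # Backend provider selected via model name suffix
  - id: huggingface:chat:Qwen/QwQ-32B:featherless-ai

  # Equivalent selection via config option
  - id: huggingface:chat:Qwen/QwQ-32B
    config:
      inferenceProvider: 'featherless-ai'
```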
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…lability

Clarify that model name suffix takes precedence over inferenceProvider config, and note that available models/providers change over time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary

Adds support for HuggingFace's OpenAI-compatible chat completions API via a new `huggingface:chat` provider.

- `HuggingfaceChatCompletionProvider` class extending `OpenAiChatCompletionProvider` for code reuse
- Auto-detection in `huggingface:text-generation` when `apiEndpoint` contains `/v1`
- Supports `chatCompletion: true/false` config to explicitly control format
- Maps HuggingFace parameters (`max_new_tokens`) to OpenAI format (`max_tokens`)
- Registry support for `huggingface:chat:<model>` syntax

Usage: see the example in the description above.

Test plan

- `examples/huggingface-chat/`

cc @jameshiester - this fixes the issue you reported where the HuggingFace text-generation provider was sending the wrong request format to `/v1/chat/completions`.

🤖 Generated with Claude Code