Status: Open
Labels: enhancement (New feature or request)
Description
Overview
Add support for Hugging Face Inference API as an AI provider, enabling users to access thousands of open-source models for chat, embeddings, and specialized tasks.
Why Hugging Face?
- Model diversity: Access to 500,000+ open models from the community
- Specialized models: Domain-specific models (legal, medical, code, multilingual)
- Open source: Many models available with permissive licenses
- Academic/Research: Popular in research and academic environments
- Cost-effective: Free tier + pay-per-use for larger workloads
- Custom models: Users can deploy their own fine-tuned models
Capabilities
- ✅ Chat Completion: Llama, Mistral, Qwen, Gemma, Phi, Falcon, and 100+ others
- ✅ Embedding: BAAI/bge, sentence-transformers, multilingual embeddings, domain-specific
- ✅ Vision: CLIP, LLaVA, Qwen-VL, and vision-language models
- ⚠️ Function Calling: Model-dependent (some models support it)
- ✅ Specialized: Code generation, translation, summarization, NER, etc.
Implementation Checklist
Backend
- Create Hugging Face provider client in `ai-service-client` package
- Implement chat completion (text-generation task)
- Implement embedding generation (feature-extraction task)
- Implement vision support (image-to-text, visual-question-answering)
- Handle model-specific parameters (temperature, max_tokens, etc.)
- Add model capability detection (via model card metadata)
- Add to provider cache system
- Add connection testing endpoint
- Handle rate limits and quotas
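A minimal sketch of what the chat-completion call in the provider client could look like. The type and function names here are illustrative, not part of any existing package; the `inputs`/`parameters` payload shape matches the legacy Inference API's text-generation task, but real chat prompting is model-specific (Llama and Mistral use different chat templates), so the naive flattening below would need per-model handling:

```typescript
// Hypothetical shapes for the provider client; names are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HfChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { inputs: string; parameters: { temperature?: number; max_new_tokens?: number } };
}

const HF_BASE = "https://api-inference.huggingface.co";

// Build a text-generation request. The API expects a single prompt string,
// so chat messages are flattened here; a production client would apply the
// model's own chat template instead.
function buildChatRequest(
  modelId: string,
  token: string,
  messages: ChatMessage[],
  opts: { temperature?: number; maxTokens?: number } = {},
): HfChatRequest {
  const prompt = messages.map((m) => `${m.role}: ${m.content}`).join("\n") + "\nassistant:";
  return {
    url: `${HF_BASE}/models/${modelId}`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: {
      inputs: prompt,
      parameters: {
        temperature: opts.temperature,
        max_new_tokens: opts.maxTokens,
      },
    },
  };
}
```

Sending the request is then a plain `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`.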
Database
- Add `huggingface` to provider enum (if needed)
- Update provider configuration schema
- Support model-specific configuration (task type, parameters)
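A possible shape for the model-specific configuration, as a sketch only; all field names here are hypothetical, not the project's actual schema. The key idea is that each configured model carries a task type, which drives which request/response format the client uses:

```typescript
// Hypothetical configuration shapes for a `huggingface` provider entry.
type HfTask =
  | "text-generation"
  | "feature-extraction"
  | "image-to-text"
  | "visual-question-answering";

interface HuggingFaceModelConfig {
  modelId: string; // e.g. a Hub model ID, or a user-deployed custom model
  task: HfTask;    // drives which payload/response shape the client expects
  parameters?: {   // model-specific generation parameters
    temperature?: number;
    max_new_tokens?: number;
  };
}

interface HuggingFaceProviderConfig {
  provider: "huggingface";
  apiToken: string; // Bearer token for the Inference API
  models: HuggingFaceModelConfig[];
}

// Example configuration value:
const example: HuggingFaceProviderConfig = {
  provider: "huggingface",
  apiToken: "hf_xxx",
  models: [{ modelId: "mistralai/Mistral-7B-Instruct-v0.3", task: "text-generation" }],
};
```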
GraphQL API
- Add Hugging Face provider to `AiServiceProvider` mutations
- Model discovery from Hugging Face API (filter by task)
- Support task-based filtering (text-generation, feature-extraction, etc.)
- Handle model availability (some models may be offline/loading)
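Model discovery can query the Hub API (separate from the inference endpoint) with task-based filtering. A small sketch of the query builder, assuming the Hub's `pipeline_tag`, `sort`, and `limit` query parameters:

```typescript
// Build a Hub API query for model discovery, filtered by task.
// `pipeline_tag` filters by task (e.g. "text-generation"), and sorting by
// downloads surfaces widely used models first.
const HUB_API = "https://huggingface.co/api/models";

function buildModelSearchUrl(task: string, limit = 20): string {
  const params = new URLSearchParams({
    pipeline_tag: task,
    sort: "downloads",
    limit: String(limit),
  });
  return `${HUB_API}?${params.toString()}`;
}
```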
Frontend
- Add Hugging Face provider UI in `/admin/ai-services`
- API token configuration form
- Connection testing
- Model selection with task filtering
- Display model metadata (license, downloads, likes)
Documentation
- Update `/docs/admin/ai-models` with Hugging Face
- Update `/docs/admin/ai-services` with Hugging Face configuration
- Add Hugging Face setup guide
- Document popular model recommendations
- Explain task types and use cases
Testing
- Unit tests for Hugging Face client
- Integration tests for chat completion (Llama, Mistral)
- Integration tests for embeddings (bge, sentence-transformers)
- E2E tests for provider configuration
- Test model availability handling
- Test rate limiting
API Details
Base URL: https://api-inference.huggingface.co
Authentication: Bearer token via Authorization header
Inference Endpoint: /models/{model_id}
Models List: Via Hugging Face Hub API (https://huggingface.co/api/models)
Task Types:
- `text-generation` → Chat completion
- `feature-extraction` → Embeddings
- `image-to-text` → Vision
- `visual-question-answering` → Vision + Q&A
- Many more specialized tasks
Popular Models:
- Chat: `meta-llama/Llama-3.2-3B-Instruct`, `mistralai/Mistral-7B-Instruct-v0.3`
- Embedding: `BAAI/bge-large-en-v1.5`, `sentence-transformers/all-MiniLM-L6-v2`
- Vision: `Qwen/Qwen2-VL-7B-Instruct`, `llava-hf/llava-1.5-7b-hf`
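To make the embedding flow concrete: the feature-extraction task takes a list of input strings and returns one vector per input, via the same `/models/{model_id}` endpoint and Bearer auth described above. A sketch of the request builder (function name hypothetical):

```typescript
// Sketch of an embedding request against the feature-extraction task.
// Response shapes vary by model, so the client should validate that the
// result parses as number[][] before caching or returning it.
const HF_BASE = "https://api-inference.huggingface.co";

function buildEmbeddingRequest(modelId: string, token: string, texts: string[]) {
  return {
    url: `${HF_BASE}/models/${modelId}`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    // feature-extraction accepts a list of inputs, one vector per input
    body: { inputs: texts },
  };
}
```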
Challenges to Address
- Model variability: Different models have different parameters and capabilities
- Model availability: Some models may be "loading" or unavailable
- Rate limits: Free tier has strict rate limits
- No official capabilities API: Need to infer from model cards/tags
- Inconsistent APIs: Older vs newer models may have different request formats
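Since there is no official capabilities API, one workaround is to infer capabilities from a model's `pipeline_tag` and tags in its Hub metadata. Tags like `conversational`, `sentence-transformers`, and pipeline tags like `image-to-text` are real Hub metadata, but the mapping below is a heuristic sketch, and the function-calling tags shown are assumptions rather than a guaranteed convention:

```typescript
// Heuristic mapping from Hub model metadata to capabilities.
// The mapping is a judgment call; it should be backed by a curated
// override list for models that are known to be mis-tagged.
type Capability = "chat" | "embedding" | "vision" | "functionCalling";

function detectCapabilities(pipelineTag: string, tags: string[]): Capability[] {
  const caps = new Set<Capability>();
  if (pipelineTag === "text-generation" || tags.includes("conversational")) {
    caps.add("chat");
  }
  if (pipelineTag === "feature-extraction" || tags.includes("sentence-transformers")) {
    caps.add("embedding");
  }
  if (["image-to-text", "visual-question-answering", "image-text-to-text"].includes(pipelineTag)) {
    caps.add("vision");
  }
  // Function-calling support is model-dependent; these tag names are an
  // assumption, not an established Hub convention.
  if (tags.includes("function-calling") || tags.includes("tool-use")) {
    caps.add("functionCalling");
  }
  return [...caps];
}
```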
Solutions
- Use model tags/metadata to detect capabilities automatically
- Implement retry logic for "model loading" responses
- Cache model metadata to reduce API calls
- Provide curated list of recommended models for common tasks
- Support custom model IDs for user-deployed models
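For the retry logic: when a model is cold, the Inference API returns HTTP 503 with a JSON body like `{ "error": "Model X is currently loading", "estimated_time": 20.0 }`. A sketch of the retry-delay decision (the 60s cap and 10s fallback are arbitrary choices for this sketch):

```typescript
// Decide whether (and how long) to wait before retrying a request that
// hit the "model loading" response. Returns null when no retry applies.
function retryDelaySeconds(status: number, body: unknown): number | null {
  if (status !== 503) return null; // not a loading response: don't retry here
  const est = (body as { estimated_time?: number })?.estimated_time;
  if (typeof est === "number" && est > 0) {
    return Math.min(est, 60); // trust the server's hint, but cap the wait
  }
  return 10; // no hint provided: fall back to a fixed delay
}
```

The calling code would sleep for the returned number of seconds and re-issue the request, giving up after a bounded number of attempts.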
Resources
Related
- Part of multi-provider support initiative
- Enables open-source model experimentation
- Critical for users wanting model flexibility and control
Priority
P1-high - Provides access to open-source models and specialized domains, enabling experimentation and cost optimization