
Add Hugging Face Inference API provider support #868

@moncapitaine

Overview

Add support for the Hugging Face Inference API as an AI provider, giving users access to thousands of open-source models for chat, embeddings, and specialized tasks.

Why Hugging Face?

  • Model diversity: Access to 500,000+ open models from the community
  • Specialized models: Domain-specific models (legal, medical, code, multilingual)
  • Open source: Many models available with permissive licenses
  • Academic/Research: Popular in research and academic environments
  • Cost-effective: Free tier + pay-per-use for larger workloads
  • Custom models: Users can deploy their own fine-tuned models

Capabilities

  • Chat Completion: Llama, Mistral, Qwen, Gemma, Phi, Falcon, and 100+ others
  • Embedding: BAAI/bge, sentence-transformers, multilingual embeddings, domain-specific
  • Vision: CLIP, LLaVA, Qwen-VL, and vision-language models
  • ⚠️ Function Calling: Model-dependent (some models support it)
  • Specialized: Code generation, translation, summarization, NER, etc.

Implementation Checklist

Backend

  • Create Hugging Face provider client in ai-service-client package
  • Implement chat completion (text-generation task; see the sketch after this list)
  • Implement embedding generation (feature-extraction task)
  • Implement vision support (image-to-text, visual-question-answering)
  • Handle model-specific parameters (temperature, max_tokens, etc.)
  • Add model capability detection (via model card metadata)
  • Add to provider cache system
  • Add connection testing endpoint
  • Handle rate limits and quotas
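
A minimal sketch of the chat-completion path against the text-generation endpoint. Function and type names are illustrative, not the final ai-service-client surface; the URL, Bearer auth, and inputs/parameters payload follow the public Inference API conventions:

```typescript
// Hypothetical client sketch; names are illustrative, not the final
// ai-service-client API.
interface HfTextGenOptions {
  temperature?: number;
  max_new_tokens?: number; // HF's name for the max-tokens knob
}

async function generateText(
  apiToken: string,
  modelId: string,
  prompt: string,
  options: HfTextGenOptions = {},
): Promise<string> {
  const res = await fetch(
    `https://api-inference.huggingface.co/models/${modelId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        inputs: prompt,
        parameters: { ...options, return_full_text: false },
      }),
    },
  );
  if (!res.ok) {
    throw new Error(`HF inference failed: ${res.status} ${await res.text()}`);
  }
  // text-generation responses arrive as [{ generated_text: string }]
  const data = (await res.json()) as Array<{ generated_text: string }>;
  return data[0].generated_text;
}
```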

Database

  • Add huggingface to provider enum (if needed)
  • Update provider configuration schema
  • Support model-specific configuration (task type, parameters; example shape below)
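
A hypothetical shape for that per-model configuration; field names are illustrative:

```typescript
// Hypothetical per-model configuration stored alongside the provider
// row; field names are illustrative, not a final schema.
interface HuggingFaceModelConfig {
  modelId: string; // e.g. "meta-llama/Llama-3.2-3B-Instruct"
  task: "text-generation" | "feature-extraction" | "image-to-text";
  parameters?: Record<string, number | string | boolean>;
}
```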

GraphQL API

  • Add Hugging Face provider to AiServiceProvider mutations
  • Model discovery from Hugging Face API (filter by task; see the sketch after this list)
  • Support task-based filtering (text-generation, feature-extraction, etc.)
  • Handle model availability (some models may be offline/loading)
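
Discovery can lean on the public Hub API, which accepts filtering by pipeline tag and sorting by downloads. A sketch, with the response type trimmed to the fields actually read:

```typescript
// Sketch of task-based discovery via the public Hub API
// (https://huggingface.co/api/models). HubModel lists only the fields
// this sketch uses; the real response carries more.
interface HubModel {
  id: string;            // e.g. "mistralai/Mistral-7B-Instruct-v0.3"
  pipeline_tag?: string; // task, e.g. "text-generation"
  tags?: string[];
  downloads?: number;
  likes?: number;
}

async function listModelsForTask(task: string, limit = 20): Promise<HubModel[]> {
  const url = new URL("https://huggingface.co/api/models");
  url.searchParams.set("filter", task);      // filter by pipeline tag
  url.searchParams.set("sort", "downloads"); // most-used first
  url.searchParams.set("direction", "-1");
  url.searchParams.set("limit", String(limit));
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Hub API error: ${res.status}`);
  return (await res.json()) as HubModel[];
}
```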

Frontend

  • Add Hugging Face provider UI in /admin/ai-services
  • API token configuration form
  • Connection testing
  • Model selection with task filtering
  • Display model metadata (license, downloads, likes)

Documentation

  • Update /docs/admin/ai-models with Hugging Face
  • Update /docs/admin/ai-services with Hugging Face configuration
  • Add Hugging Face setup guide
  • Document popular model recommendations
  • Explain task types and use cases

Testing

  • Unit tests for Hugging Face client (see the sketch after this list)
  • Integration tests for chat completion (Llama, Mistral)
  • Integration tests for embeddings (bge, sentence-transformers)
  • E2E tests for provider configuration
  • Test model availability handling
  • Test rate limiting
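
A minimal unit-test sketch, assuming Vitest and the generateText helper sketched in the Backend section; global fetch is stubbed so no real request goes out:

```typescript
import { describe, expect, it, vi } from "vitest";
// hypothetical module path for the client sketched above
import { generateText } from "./huggingface-client";

describe("HuggingFace client", () => {
  it("parses a text-generation response", async () => {
    // Stub fetch to return a canned text-generation payload
    vi.stubGlobal(
      "fetch",
      vi.fn().mockResolvedValue(
        new Response(JSON.stringify([{ generated_text: "Hello!" }])),
      ),
    );
    const text = await generateText("hf_test_token", "some/model", "Hi");
    expect(text).toBe("Hello!");
    vi.unstubAllGlobals();
  });
});
```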

API Details

Base URL: https://api-inference.huggingface.co
Authentication: Bearer token via Authorization header
Inference Endpoint: /models/{model_id}
Models List: Via Hugging Face Hub API (https://huggingface.co/api/models)
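
For illustration, a minimal embedding call (feature-extraction task) against that endpoint. One caveat: the response shape varies by model, with sentence-transformers models returning one pooled vector per input and raw transformer models returning token-level vectors:

```typescript
// Minimal feature-extraction call; a sketch, not the final API surface.
async function embed(
  apiToken: string,
  modelId: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetch(
    `https://api-inference.huggingface.co/models/${modelId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: texts }),
    },
  );
  if (!res.ok) throw new Error(`HF embedding failed: ${res.status}`);
  return (await res.json()) as number[][];
}
```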

Task Types:

  • text-generation → Chat completion
  • feature-extraction → Embeddings
  • image-to-text → Vision
  • visual-question-answering → Vision + Q&A
  • Many more specialized tasks

Popular Models:

  • Chat: meta-llama/Llama-3.2-3B-Instruct, mistralai/Mistral-7B-Instruct-v0.3
  • Embedding: BAAI/bge-large-en-v1.5, sentence-transformers/all-MiniLM-L6-v2
  • Vision: Qwen/Qwen2-VL-7B-Instruct, llava-hf/llava-1.5-7b-hf

Challenges to Address

  1. Model variability: Different models have different parameters and capabilities
  2. Model availability: Some models may be "loading" or unavailable
  3. Rate limits: Free tier has strict rate limits
  4. No official capabilities API: Need to infer from model cards/tags
  5. Inconsistent APIs: Older vs newer models may have different request formats

Solutions

  • Use model tags/metadata to detect capabilities automatically
  • Implement retry logic for "model loading" responses (sketch after this list)
  • Cache model metadata to reduce API calls
  • Provide curated list of recommended models for common tasks
  • Support custom model IDs for user-deployed models
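
On the "model loading" point: the Inference API responds with HTTP 503 and an estimated_time field (in seconds) while a cold model spins up, so the retry logic can wait roughly that long before trying again. A hedged sketch:

```typescript
// Retry sketch for cold models: on 503, read estimated_time from the
// body, wait about that long (capped), and retry a bounded number of
// times to avoid hanging forever.
async function fetchWithLoadingRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 503 || attempt >= maxRetries) return res;
    const body = (await res.json().catch(() => ({}))) as {
      estimated_time?: number;
    };
    const waitMs = Math.min((body.estimated_time ?? 10) * 1000, 60_000);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

The API also documents an x-wait-for-model header that blocks until the model is ready, which may be simpler than manual polling for short waits.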

Related

  • Part of multi-provider support initiative
  • Enables open-source model experimentation
  • Critical for users wanting model flexibility and control

Priority

P1-high - Provides access to open-source models and specialized domains, enabling experimentation and cost optimization

Metadata

Labels: enhancement (New feature or request)