
Add Hugging Face Inference API provider support #868

@moncapitaine

Overview

Add support for the Hugging Face Inference API as an AI provider, giving users access to thousands of open-source models for chat, embeddings, and specialized tasks.

Why Hugging Face?

  • Model diversity: Access to 500,000+ open models from the community
  • Specialized models: Domain-specific models (legal, medical, code, multilingual)
  • Open source: Many models available with permissive licenses
  • Academic/Research: Popular in research and academic environments
  • Cost-effective: Free tier + pay-per-use for larger workloads
  • Custom models: Users can deploy their own fine-tuned models

Capabilities

  • Chat Completion: Llama, Mistral, Qwen, Gemma, Phi, Falcon, and 100+ others
  • Embedding: BAAI/bge, sentence-transformers, multilingual embeddings, domain-specific
  • Vision: CLIP, LLaVA, Qwen-VL, and vision-language models
  • ⚠️ Function Calling: Model-dependent (some models support it)
  • Specialized: Code generation, translation, summarization, NER, etc.

Implementation Checklist

Backend

  • Create Hugging Face provider client in ai-service-client package
  • Implement chat completion (text-generation task; see the sketch after this list)
  • Implement embedding generation (feature-extraction task)
  • Implement vision support (image-to-text, visual-question-answering)
  • Handle model-specific parameters (temperature, max_tokens, etc.)
  • Add model capability detection (via model card metadata)
  • Add to provider cache system
  • Add connection testing endpoint
  • Handle rate limits and quotas
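
A minimal sketch of the chat-completion path against the text-generation endpoint. Function and type names are illustrative, not the final ai-service-client surface; the URL, Bearer auth, and inputs/parameters payload follow the public Inference API conventions:

```typescript
// Hypothetical client sketch; names are illustrative, not the final
// ai-service-client API.
interface HfTextGenOptions {
  temperature?: number;
  max_new_tokens?: number; // HF's name for the max-tokens knob
}

async function generateText(
  apiToken: string,
  modelId: string,
  prompt: string,
  options: HfTextGenOptions = {},
): Promise<string> {
  const res = await fetch(
    `https://api-inference.huggingface.co/models/${modelId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        inputs: prompt,
        parameters: { ...options, return_full_text: false },
      }),
    },
  );
  if (!res.ok) {
    throw new Error(`HF inference failed: ${res.status} ${await res.text()}`);
  }
  // text-generation responses arrive as [{ generated_text: string }]
  const data = (await res.json()) as Array<{ generated_text: string }>;
  return data[0].generated_text;
}
```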

Database

  • Add huggingface to provider enum (if needed)
  • Update provider configuration schema
  • Support model-specific configuration (task type, parameters; example shape below)
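
A hypothetical shape for that per-model configuration; field names are illustrative:

```typescript
// Hypothetical per-model configuration stored alongside the provider
// row; field names are illustrative, not a final schema.
interface HuggingFaceModelConfig {
  modelId: string; // e.g. "meta-llama/Llama-3.2-3B-Instruct"
  task: "text-generation" | "feature-extraction" | "image-to-text";
  parameters?: Record<string, number | string | boolean>;
}
```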

GraphQL API

  • Add Hugging Face provider to AiServiceProvider mutations
  • Model discovery from Hugging Face API (filter by task; see the sketch after this list)
  • Support task-based filtering (text-generation, feature-extraction, etc.)
  • Handle model availability (some models may be offline/loading)
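
Discovery can lean on the public Hub API, which accepts filtering by pipeline tag and sorting by downloads. A sketch, with the response type trimmed to the fields actually read:

```typescript
// Sketch of task-based discovery via the public Hub API
// (https://huggingface.co/api/models). HubModel lists only the fields
// this sketch uses; the real response carries more.
interface HubModel {
  id: string;            // e.g. "mistralai/Mistral-7B-Instruct-v0.3"
  pipeline_tag?: string; // task, e.g. "text-generation"
  tags?: string[];
  downloads?: number;
  likes?: number;
}

async function listModelsForTask(task: string, limit = 20): Promise<HubModel[]> {
  const url = new URL("https://huggingface.co/api/models");
  url.searchParams.set("filter", task);      // filter by pipeline tag
  url.searchParams.set("sort", "downloads"); // most-used first
  url.searchParams.set("direction", "-1");
  url.searchParams.set("limit", String(limit));
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Hub API error: ${res.status}`);
  return (await res.json()) as HubModel[];
}
```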

Frontend

  • Add Hugging Face provider UI in /admin/ai-services
  • API token configuration form
  • Connection testing
  • Model selection with task filtering
  • Display model metadata (license, downloads, likes)

Documentation

  • Update /docs/admin/ai-models with Hugging Face
  • Update /docs/admin/ai-services with Hugging Face configuration
  • Add Hugging Face setup guide
  • Document popular model recommendations
  • Explain task types and use cases

Testing

  • Unit tests for Hugging Face client (see the sketch after this list)
  • Integration tests for chat completion (Llama, Mistral)
  • Integration tests for embeddings (bge, sentence-transformers)
  • E2E tests for provider configuration
  • Test model availability handling
  • Test rate limiting
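
A minimal unit-test sketch, assuming Vitest and the generateText helper sketched in the Backend section; global fetch is stubbed so no real request goes out:

```typescript
import { describe, expect, it, vi } from "vitest";
// hypothetical module path for the client sketched above
import { generateText } from "./huggingface-client";

describe("HuggingFace client", () => {
  it("parses a text-generation response", async () => {
    // Stub fetch to return a canned text-generation payload
    vi.stubGlobal(
      "fetch",
      vi.fn().mockResolvedValue(
        new Response(JSON.stringify([{ generated_text: "Hello!" }])),
      ),
    );
    const text = await generateText("hf_test_token", "some/model", "Hi");
    expect(text).toBe("Hello!");
    vi.unstubAllGlobals();
  });
});
```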

API Details

Base URL: https://api-inference.huggingface.co
Authentication: Bearer token via Authorization header
Inference Endpoint: /models/{model_id}
Models List: Via Hugging Face Hub API (https://huggingface.co/api/models)
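
For illustration, a minimal embedding call (feature-extraction task) against that endpoint. One caveat: the response shape varies by model, with sentence-transformers models returning one pooled vector per input and raw transformer models returning token-level vectors:

```typescript
// Minimal feature-extraction call; a sketch, not the final API surface.
async function embed(
  apiToken: string,
  modelId: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetch(
    `https://api-inference.huggingface.co/models/${modelId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: texts }),
    },
  );
  if (!res.ok) throw new Error(`HF embedding failed: ${res.status}`);
  return (await res.json()) as number[][];
}
```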

Task Types:

  • text-generation → Chat completion
  • feature-extraction → Embeddings
  • image-to-text → Vision
  • visual-question-answering → Vision + Q&A
  • Many more specialized tasks

Popular Models:

  • Chat: meta-llama/Llama-3.2-3B-Instruct, mistralai/Mistral-7B-Instruct-v0.3
  • Embedding: BAAI/bge-large-en-v1.5, sentence-transformers/all-MiniLM-L6-v2
  • Vision: Qwen/Qwen2-VL-7B-Instruct, llava-hf/llava-1.5-7b-hf

Challenges to Address

  1. Model variability: Different models have different parameters and capabilities
  2. Model availability: Some models may be "loading" or unavailable
  3. Rate limits: Free tier has strict rate limits
  4. No official capabilities API: Need to infer from model cards/tags
  5. Inconsistent APIs: Older vs newer models may have different request formats

Solutions

  • Use model tags/metadata to detect capabilities automatically
  • Implement retry logic for "model loading" responses (sketch after this list)
  • Cache model metadata to reduce API calls
  • Provide curated list of recommended models for common tasks
  • Support custom model IDs for user-deployed models
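
On the "model loading" point: the Inference API responds with HTTP 503 and an estimated_time field (in seconds) while a cold model spins up, so the retry logic can wait roughly that long before trying again. A hedged sketch:

```typescript
// Retry sketch for cold models: on 503, read estimated_time from the
// body, wait about that long (capped), and retry a bounded number of
// times to avoid hanging forever.
async function fetchWithLoadingRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 503 || attempt >= maxRetries) return res;
    const body = (await res.json().catch(() => ({}))) as {
      estimated_time?: number;
    };
    const waitMs = Math.min((body.estimated_time ?? 10) * 1000, 60_000);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

The API also documents an x-wait-for-model header that blocks until the model is ready, which may be simpler than manual polling for short waits.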

Related

  • Part of multi-provider support initiative
  • Enables open-source model experimentation
  • Critical for users wanting model flexibility and control

Priority

P1-high - Provides access to open-source models and specialized domains, enabling experimentation and cost optimization

Metadata

Labels: enhancement (New feature or request)