Run local LLMs in .NET through `IChatClient`, the same interface you'd use for Azure OpenAI, Ollama, or any other provider. Powered by ONNX Runtime GenAI.
- `IChatClient` implementation – seamless integration with Microsoft.Extensions.AI
- Automatic model download – models are fetched from HuggingFace on first use
- Zero friction – works out of the box with sensible defaults (Phi-3.5 mini)
- Multi-hardware – CPU, CUDA, and DirectML execution providers
- DI-friendly – register with `AddLocalLLMs()` in ASP.NET Core
- Streaming – token-by-token streaming via `GetStreamingResponseAsync`
- Multi-model – switch between Phi-3.5, Phi-4, Qwen2.5, Llama 3.2, and more
```bash
dotnet add package ElBruno.LocalLLMs
```

This works everywhere (CPU). To enable GPU acceleration, add one extra package:

```bash
# 🟢 NVIDIA GPU (CUDA):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# 🔵 Any Windows GPU – AMD, Intel, NVIDIA (DirectML):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML
```

The library defaults to `ExecutionProvider.Auto` – it tries GPU first and falls back to CPU automatically. No code changes needed.
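If you'd rather pin a specific provider than rely on auto-detection, you can set it explicitly when creating the client. A minimal sketch; `ExecutionProvider.Cuda` is an assumed enum value, inferred by analogy with the `ExecutionProvider.DirectML` and `ExecutionProvider.Auto` values shown elsewhere in this README:

```csharp
using ElBruno.LocalLLMs;

// Pin the execution provider instead of relying on ExecutionProvider.Auto.
// ExecutionProvider.Cuda is assumed here by analogy with the DirectML value
// used in the DI example below; it requires the CUDA package installed above.
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    ExecutionProvider = ExecutionProvider.Cuda
});
```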
```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Create a local chat client (downloads Phi-3.5 mini on first run)
using var client = await LocalChatClient.CreateAsync();

var response = await client.GetResponseAsync([
    new(ChatRole.User, "What is the capital of France?")
]);

Console.WriteLine(response.Text);
```

To choose a model explicitly and stream the response token by token:
```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct
});

await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
    Console.Write(update.Text);
}
```

In ASP.NET Core, register the client with `AddLocalLLMs()` and inject `IChatClient` anywhere:
```csharp
builder.Services.AddLocalLLMs(options =>
{
    options.Model = KnownModels.Phi35MiniInstruct;
    options.ExecutionProvider = ExecutionProvider.DirectML;
});

// Inject IChatClient anywhere
public class MyService(IChatClient chatClient) { ... }
```
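A consuming service can then call the injected client directly. A minimal sketch of what `MyService` might look like; the `AskAsync` method below is illustrative, not part of the library:

```csharp
using Microsoft.Extensions.AI;

// Hypothetical consumer: the injected IChatClient is the same
// Microsoft.Extensions.AI abstraction used in the examples above.
public class MyService(IChatClient chatClient)
{
    public async Task<string> AskAsync(string question)
    {
        // Same call pattern as the quick-start example.
        var response = await chatClient.GetResponseAsync([
            new(ChatRole.User, question)
        ]);
        return response.Text;
    }
}
```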
The following models are supported:

| Tier | Model | Parameters | ONNX | ID |
|---|---|---|---|---|
| ⚪ Tiny | TinyLlama-1.1B-Chat | 1.1B | ✅ Native | tinyllama-1.1b-chat |
| ⚪ Tiny | SmolLM2-1.7B-Instruct | 1.7B | ✅ Native | smollm2-1.7b-instruct |
| ⚪ Tiny | Qwen2.5-0.5B-Instruct | 0.5B | ✅ Native | qwen2.5-0.5b-instruct |
| ⚪ Tiny | Qwen2.5-1.5B-Instruct | 1.5B | ✅ Native | qwen2.5-1.5b-instruct |
| ⚪ Tiny | Gemma-2B-IT | 2B | ✅ Native | gemma-2b-it |
| ⚪ Tiny | StableLM-2-1.6B-Chat | 1.6B | 🔄 Convert | stablelm-2-1.6b-chat |
| 🟢 Small | Phi-3.5 mini instruct | 3.8B | ✅ Native | phi-3.5-mini-instruct |
| 🟢 Small | Qwen2.5-3B-Instruct | 3B | ✅ Native | qwen2.5-3b-instruct |
| 🟢 Small | Llama-3.2-3B-Instruct | 3B | ✅ Native | llama-3.2-3b-instruct |
| 🟢 Small | Gemma-2-2B-IT | 2B | ✅ Native | gemma-2-2b-it |
| 🟡 Medium | Qwen2.5-7B-Instruct | 7B | ✅ Native | qwen2.5-7b-instruct |
| 🟡 Medium | Llama-3.1-8B-Instruct | 8B | ✅ Native | llama-3.1-8b-instruct |
| 🟡 Medium | Mistral-7B-Instruct-v0.3 | 7B | ✅ Native | mistral-7b-instruct-v0.3 |
| 🟡 Medium | Gemma-2-9B-IT | 9B | ✅ Native | gemma-2-9b-it |
| 🟡 Medium | Phi-4 | 14B | ✅ Native | phi-4 |
| 🟡 Medium | DeepSeek-R1-Distill-Qwen-14B | 14B | ✅ Native | deepseek-r1-distill-qwen-14b |
| 🟡 Medium | Mistral-Small-24B-Instruct | 24B | ✅ Native | mistral-small-24b-instruct |
| 🔴 Large | Qwen2.5-14B-Instruct | 14B | ✅ Native | qwen2.5-14b-instruct |
| 🔴 Large | Qwen2.5-32B-Instruct | 32B | ✅ Native | qwen2.5-32b-instruct |
| 🔴 Large | Llama-3.3-70B-Instruct | 70B | ✅ ONNX | llama-3.3-70b-instruct |
| 🔴 Large | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 🔄 Convert | mixtral-8x7b-instruct-v0.1 |
| 🔴 Large | DeepSeek-R1-Distill-Llama-70B | 70B | 🔄 Convert | deepseek-r1-distill-llama-70b |
| 🔴 Large | Command-R (35B) | 35B | 🔄 Convert | command-r-35b |
See the Supported Models Guide for detailed model cards, performance benchmarks, and selection guidance.
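To run one of these models, pass the matching `KnownModels` entry to `LocalLLMsOptions`. A minimal sketch; `KnownModels.Phi4` is an assumed member name, inferred from the `KnownModels.Phi35MiniInstruct` pattern used above, so check the Supported Models Guide for the exact identifiers:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// KnownModels.Phi4 is a hypothetical member name inferred from the
// KnownModels.Phi35MiniInstruct pattern shown earlier; the corresponding
// model ID in the table above is phi-4.
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi4
});

var response = await client.GetResponseAsync([
    new(ChatRole.User, "Summarize the benefits of local inference.")
]);
Console.WriteLine(response.Text);
```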
| Sample | Description |
|---|---|
| HelloChat | Minimal console chat |
| StreamingChat | Token-by-token streaming |
| MultiModelChat | Switch models at runtime |
| DependencyInjection | ASP.NET Core DI registration |
- .NET 8.0 or .NET 10.0
- CPU (default), NVIDIA GPU (CUDA), or Windows GPU (DirectML)
- ~2-8 GB disk space per model (depending on size and quantization)
- Getting Started – installation, first steps, configuration
- Supported Models – full model reference with tiers, specs, decision tree
- Architecture – design decisions and internal structure
- Samples Guide – walkthrough of each sample application
- Benchmarks – how to run and interpret performance benchmarks
- ONNX Conversion – converting HuggingFace models to ONNX format
- Publishing – NuGet package publishing with OIDC
- Contributing – how to contribute
- Changelog – version history
Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License – see the LICENSE file for details.
Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.
Made with ❤️ by ElBruno
If you like this project, consider following my work across platforms:
- Podcast: No Tienen Nombre – Spanish-language episodes on AI, development, and tech culture
- Blog: ElBruno.com – Deep dives on embeddings, RAG, .NET, and local AI
- YouTube: youtube.com/elbruno – Demos, tutorials, and live coding
- LinkedIn: @elbruno – Professional updates and insights
- Twitter: @elbruno – Quick tips, releases, and tech news