DotNet LLM Eval Samples

Overview

This repository provides samples for evaluating and monitoring Large Language Models (LLMs) in .NET applications. The focus is on observability through traces, metrics, and logging, using popular tools and existing systems such as OpenTelemetry, Grafana, Azure Monitor, System.Diagnostics, Semantic Kernel, xUnit, and Polyglot Notebooks.

The goal of this project is to offer solutions for evaluating LLMs that are easy to integrate into existing .NET systems. By providing samples that fit into CI/CD workflows, including GitHub Actions, we aim to strengthen the dotnet ecosystem for Machine Learning (ML) and encourage integration with commonly used tools.

Motivation

While there are existing evaluation frameworks for LLMs, such as OpenAI evals, ffmodel, Azure Prompt Flow, PromptBench, TraceLoop, and ToolTalk, our motivation for creating this sample repository is to address the need for integration with existing dotnet systems. We recognize the importance of simplicity in integration, especially in CI/CD pipelines, and we want to bridge the gap for the dotnet ML community.

Polyglot Notebooks provide a familiar environment for those accustomed to Jupyter Notebooks, and Semantic Kernel offers maintainability benefits for systems that already use it. We acknowledge that introducing new tools or frameworks may not always be desirable, and our samples aim to give options to those who want to avoid adding unnecessary complexity to their existing solutions.

Samples

1. Unit Tests

Illustrates how to write unit tests for LLM-based logic in a .NET environment. The tests cover various aspects of model evaluation, helping to ensure the robustness and correctness of the implemented logic.

Check the UserStoryGenerator.Tests project to get started.

2. CI/CD Integration

Demonstrates the integration of LLM evaluation into a CI/CD pipeline using GitHub Actions. This sample showcases how to automate the evaluation process as part of the development workflow. (WORK IN PROGRESS)

Check the DotNet GitHub Actions workflow to get started.

3. Batch Evaluation

Provides examples of batch evaluation for large files using dotnet. This sample focuses on efficient processing and on monitoring and analyzing the data, with an emphasis on scalability and performance.

Check the Batch Evaluation Notebook to get started.

Getting Started

To get started with the samples, refer to the individual README files within each sample directory. Follow the step-by-step instructions to integrate LLM evaluation into your dotnet applications seamlessly.

Check the Batch Evaluation Notebook to get started.

OpenTelemetry + Grafana dashboard

You need to open this project either in GitHub Codespaces or on a Docker-enabled machine. From the repository root, go to the infra/dashboard folder and run docker-compose up:

cd infra/dashboard
docker-compose up

Prometheus should then be available on port 9090 and the Grafana dashboard on port 3000. The dashboard combines the metrics generated by BatchEval with the built-in Semantic Kernel metrics.

You can import the sample dashboard into Grafana.
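The containers can take a little while to start accepting requests after docker-compose up. Below is a minimal shell sketch, not part of this repository, that polls a service until it responds. It assumes curl is installed and that the default ports 9090 and 3000 are published on localhost; it relies on Prometheus's /-/ready endpoint and Grafana's /api/health endpoint. The wait_for helper name is hypothetical.

```shell
# Hypothetical helper (not part of this repository): poll a URL until it answers
# with an HTTP success status, or give up after the timeout.
# Usage: wait_for URL [TIMEOUT_SECONDS]   (defaults to 60)
# The timeout is approximate: it counts the one-second pauses between attempts.
wait_for() {
  local url="$1" timeout="${2:-60}" elapsed=0
  # -f: treat HTTP errors (4xx/5xx) as failures; -s: hide progress output;
  # --max-time 5: never let a single request hang for more than 5 seconds.
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "ready: $url"
}
```

For example, after defining the function in a second terminal, wait_for http://localhost:9090/-/ready 60 && wait_for http://localhost:3000/api/health 60 returns once both services respond, or exits with status 1 if either one does not come up in time.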

Contribution

Contributions are welcome! If you have additional samples, improvements, or ideas, please open an issue or submit a pull request. We aim to make this repository a collaborative resource for the dotnet ML community.

License

This repository is licensed under the MIT License - see the LICENSE file for details. Feel free to use, modify, and share these samples in accordance with the license terms.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
