# MLRunLLM: Unleashing the Power of LLMs with MLRun

## Introduction

[MLRun](https://www.mlrun.org/) is an open-source AI orchestration framework for managing ML and generative AI applications across their lifecycle.

MLRun can streamline LLM deployment. It offers:

* Model tracking: monitor performance, versions, and drift
* Optimization: improve efficiency and accuracy
* Integration: fits into your existing ML workflow
* Scalable architecture: grow from proof of concept to production

MLRun also supports features such as:

* Real-time monitoring
* Automated retraining

## Setup

First, ensure you have the `mlrun` package installed:

In [None]:
%pip install mlrun

Import necessary libraries and set up your environment:

In [None]:
import os

import mlrun

## Creating an MLRun Project

Initialize your MLRun project:

In [None]:
project = mlrun.get_or_create_project(name="mlrun-langchain-example", context="./")

## Setting Up the Serving Function

Configure the serving function for your LLM.
You can write your own or use one from `mlrun.frameworks`.

In [None]:
serving_func = project.set_function(
    func="llm_model_server.py",  # Path to the file that defines your model server class
    name="Qwen2-0.5B-Instruct",  # The name to give the function
    kind="serving",  # Always "serving" for this type of function
    image="mlrun/mlrun",  # MLRun's default image; you can use your own
)

## Adding the Model to the Server

Add your chosen model to the server.
The arguments depend on your model server class and its parameters.

In [None]:
serving_func.add_model(
    "huggingface_local_model",  # Model name
    class_name="LLMModelServer",
    llm_type="HuggingFace",
    model_name="Qwen/Qwen2-0.5B-Instruct",
    model_path=".",
)
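
The notebook does not show `llm_model_server.py`, so the following is only a rough sketch of the shape such a class might take. It is a plain-Python stand-in: a real implementation would typically subclass `mlrun.serving.V2ModelServer` and load the Hugging Face model in `load()`. The constructor signature, the echo stub, and the `server_stub` variable are assumptions made for illustration, not the actual server used here.

```python
from typing import Any, Callable, Dict, List, Optional


class LLMModelServer:
    """Hypothetical stand-in mirroring the load/predict shape of a model server.

    A real llm_model_server.py would subclass mlrun.serving.V2ModelServer;
    the base class is left out so this sketch runs without MLRun installed.
    """

    def __init__(
        self,
        name: str,
        model_path: str = ".",
        llm_type: str = "HuggingFace",
        model_name: str = "",
        **kwargs: Any,
    ) -> None:
        # Assumption: the keyword arguments given to add_model() arrive here.
        self.name = name
        self.model_path = model_path
        self.llm_type = llm_type
        self.model_name = model_name
        self.generate: Optional[Callable[[str], str]] = None

    def load(self) -> None:
        # A real server would load the model and tokenizer here.
        # This stub only echoes the prompt, tagged with the model name.
        self.generate = lambda prompt: f"[{self.model_name}] {prompt}"

    def predict(self, request: Dict[str, Any]) -> List[str]:
        # The request body carries a list of prompts under "inputs".
        if self.generate is None:
            raise RuntimeError("load() must be called before predict()")
        return [self.generate(prompt) for prompt in request["inputs"]]


server_stub = LLMModelServer(
    "huggingface_local_model",
    model_path=".",
    llm_type="HuggingFace",
    model_name="Qwen/Qwen2-0.5B-Instruct",
)
server_stub.load()
outputs = server_stub.predict({"inputs": ["Hello"]})
print(outputs)
```

Keeping model loading in `load()` rather than in the constructor means the expensive step runs once when the server starts, not on every request.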

## Deploying the Model Server

Deploy your model server with GPU support:

In [None]:
serving_func.with_limits(gpus=1)  # Optional: request one GPU
server = serving_func.deploy()

## Initializing and Using MLRunLLM

Now that your model is deployed, initialize and use MLRunLLM:

In [None]:
from mlrun_llm_local import Mlrun

# Pass the serving function and the name given to the deployed model
llm = Mlrun(server, "huggingface_local_model")

# Example usage
response = llm.invoke("What is the best jacket to wear with jeans?")
print(response)
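
How the `Mlrun` wrapper talks to the endpoint is not shown in this notebook. MLRun serving functions generally accept requests in the V2 inference protocol style, so a direct call might look roughly like the sketch below. The base URL is a placeholder, `build_infer_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
from urllib import request as urlrequest


def build_infer_request(base_url: str, model: str, prompt: str) -> urlrequest.Request:
    """Build a POST request for a model's infer route (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/v2/models/{model}/infer"
    body = json.dumps({"inputs": [prompt]}).encode("utf-8")
    return urlrequest.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address: replace with the endpoint of your deployed function.
req = build_infer_request(
    "http://localhost:8080",
    "huggingface_local_model",
    "What is the best jacket to wear with jeans?",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To send it against a live endpoint:
# with urlrequest.urlopen(req) as resp:
#     print(json.load(resp))
```

Calling the endpoint directly like this can help when debugging, since it separates problems in the deployed function from problems in the client wrapper.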

## Conclusion

This notebook demonstrates the basics of using MLRun to deploy and interact with an LLM. MLRun's features for model tracking, optimization, and scalability make it well suited to managing LLMs in production environments.

For more advanced usage and detailed documentation, visit the [MLRun documentation](https://docs.mlrun.org/).