# Getting Started with MLC-LLM in Python

This tutorial shows how to get started with the MLC-LLM `ChatModule` in Python. We will chat with [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-delta-v1.1), a model developed by LMSYS by fine-tuning LLaMA.

## Environment Setup

Let's set up your environment so that you can run the `ChatModule`. First, let's create the Conda environment in which we'll run this notebook.

```bash
conda create --name mlc-llm python=3.10
conda activate mlc-llm
```

Next, let's install the MLC-AI and MLC-Chat nightly build packages. Go to https://mlc.ai/package/ and replace the command below with the one that matches your hardware and OS. The example below assumes CUDA 11.6 on Linux.

In [None]:
!pip install --pre --force-reinstall mlc-ai-nightly-cu116 mlc-chat-nightly-cu116 -f https://mlc.ai/wheels

Next, we can clone the [MLC-LLM project](https://github.com/mlc-ai/mlc-llm).

In [None]:
!git clone git@github.com:mlc-ai/mlc-llm.git
!cd mlc-llm && git submodule update --init --recursive

Next, let's download the prebuilt model libraries from GitHub and the model weights for Vicuna-7B from Hugging Face. Downloading the large weight files requires `git lfs`.

In [None]:
!conda install -y git git-lfs
!git lfs install

In [None]:
!mkdir -p mlc-llm/dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git mlc-llm/dist/prebuilt/lib

In [None]:
!git clone https://huggingface.co/mlc-ai/mlc-chat-vicuna-v1-7b-q3f16_0
!mv mlc-chat-vicuna-v1-7b-q3f16_0 mlc-llm/dist/prebuilt

## Let's Chat

Before we can chat with the model, we must import a few libraries and create a `ChatModule` instance.

In [1]:
from mlc_chat import ChatModule
import tvm

from IPython.display import clear_output

We must construct the `ChatModule` with the appropriate device target, such as `vulkan` or `cuda`.

In [2]:
cm = ChatModule(target="vulkan")

To load the model weights and the prebuilt model library into the `ChatModule`, we first call its `reload` function.

In [3]:
lib = tvm.runtime.load_module("mlc-llm/dist/prebuilt/lib/vicuna-v1-7b-q3f16_0-vulkan.so")
cm.reload(lib=lib, model_path="mlc-llm/dist/prebuilt/mlc-chat-vicuna-v1-7b-q3f16_0")
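
Because `reload` needs both the library file and the weights directory to be in place, it can help to check the paths before loading. The following is a minimal sketch that uses only the standard library. The helper name `missing_artifacts` is hypothetical and not part of MLC-LLM, and the paths are simply the ones used in the cells above.

```python
from pathlib import Path


def missing_artifacts(lib_path, model_path):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    if not Path(lib_path).is_file():
        problems.append(f"model library not found: {lib_path}")
    if not Path(model_path).is_dir():
        problems.append(f"model directory not found: {model_path}")
    return problems


# Paths taken from this tutorial; adjust them if you cloned elsewhere.
for problem in missing_artifacts(
    "mlc-llm/dist/prebuilt/lib/vicuna-v1-7b-q3f16_0-vulkan.so",
    "mlc-llm/dist/prebuilt/mlc-chat-vicuna-v1-7b-q3f16_0",
):
    print(problem)
```

If nothing is printed, both paths exist and the `reload` call should at least be able to find its inputs.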

That's all that's needed to set up the `ChatModule`. You can now chat with the model by inputting any prompt you'd like. Try it out below!

In [27]:
prompt = input("Prompt: ")
cm.prefill(input=prompt)

msg = None
while not cm.stopped():
    cm.decode()
    msg = cm.get_message()
    clear_output(wait=True)
    print(msg, flush=True)

Hello! How can I help you today?
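
The prefill/decode loop above can be wrapped in a small helper so that it is reusable across prompts. This is a sketch, not part of the MLC-LLM API: `generate` is a hypothetical name, and `FakeChat` is a stand-in that only mimics the four methods used above (`prefill`, `decode`, `stopped`, `get_message`) so the example runs without a GPU or model weights.

```python
def generate(chat, prompt, on_update=None):
    """Run one prefill/decode cycle and return the final message.

    `chat` is any object exposing prefill(input=...), decode(), stopped(),
    and get_message(), as the ChatModule does in the cell above.
    """
    chat.prefill(input=prompt)
    msg = None
    while not chat.stopped():
        chat.decode()
        msg = chat.get_message()
        if on_update is not None:
            on_update(msg)  # e.g. redraw the partial reply in a notebook
    return msg


class FakeChat:
    """Hypothetical stand-in for ChatModule: emits a canned reply word by word."""

    def __init__(self, reply):
        self._words = reply.split()
        self._count = 0

    def prefill(self, input):
        self._count = 0  # a new prompt restarts generation

    def decode(self):
        self._count += 1  # one "token" (here: one word) per step

    def stopped(self):
        return self._count >= len(self._words)

    def get_message(self):
        return " ".join(self._words[: self._count])


print(generate(FakeChat("Hello! How can I help you today?"), "Hi"))
```

With the real module, the same helper could be called as `generate(cm, prompt, on_update=print)`, assuming `cm` has been set up as shown earlier.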


To evaluate the speed of the chatbot, you can print its runtime statistics.

In [28]:
cm.runtime_stats_text()

'prefill: 130.9 tok/s, decode: 35.8 tok/s'
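
If you want to compare runs numerically, the statistics string can be parsed into numbers. The sketch below assumes the text keeps the `name: value tok/s` shape shown in the sample output above; the format may differ between versions, and `parse_runtime_stats` is a hypothetical helper rather than an MLC-LLM function.

```python
import re


def parse_runtime_stats(text):
    """Extract {stage: tokens_per_second} from a string like the output above."""
    pattern = r"(\w+):\s*([0-9]+(?:\.[0-9]+)?)\s*tok/s"
    return {stage: float(rate) for stage, rate in re.findall(pattern, text)}


stats = parse_runtime_stats("prefill: 130.9 tok/s, decode: 35.8 tok/s")
print(stats)  # {'prefill': 130.9, 'decode': 35.8}
```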

By default, the `ChatModule` keeps a history of your chat. You can reset the chat history by running the following.

In [29]:
cm.reset_chat()