# MLC-LLM Raw Text Generation in Python

Here's a quick overview of how to perform raw text generation in Python with MLC LLM. In this tutorial, we will be prompting the Llama-2 model. For the easiest setup, we recommend trying this out in a Google Colab notebook.

Raw text generation gives the user more control over the prompt: the model receives the text exactly as written, so prompts can be customized without creating a new conversation template. It also suits APIs that need LLM generation without the usual system prompt and other template additions.

Learn more about MLC LLM here: https://mlc.ai/mlc-llm/docs.

Click the button below to get started!

<a target="_blank" href="https://colab.research.google.com/github/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_raw_text_generation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Install MLC LLM

We will start by setting up the environment. First, let us create a new Conda environment in which we will run the rest of the notebook.

```
conda create --name mlc-llm python=3.10
conda activate mlc-llm
```

**Google Colab**

- If you are running this in a Google Colab notebook, you do not need to create a Conda environment.
- However, be sure to change your runtime to GPU by going to `Runtime` > `Change runtime type` and setting the hardware accelerator to "GPU".

If you are using CUDA, you can run the following command to confirm that CUDA is set up correctly, and check the driver version number as well as what GPUs are currently available for use.

In [None]:
!nvidia-smi

Next, let's install the MLC-AI and MLC-Chat nightly build packages. If you are running in a Colab environment, you can run the following command as is. Otherwise, go to https://mlc.ai/package/ and replace the command below with the one that is appropriate for your hardware and OS.

**Google Colab**: If you are using Colab, you may see red warnings such as "You must restart the runtime in order to use newly installed versions." For our purposes, these can be disregarded; the notebook will still run correctly.

In [None]:
!pip install --pre --force-reinstall mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels

Let's confirm we have installed the packages successfully!

In [None]:
!python -c "import tvm; print('tvm installed properly!')"
!python -c "import mlc_chat; print('mlc_chat installed properly!')"
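
The same check can also be done from inside a Python cell. The sketch below is only an illustration: `missing_modules` is a hypothetical helper name, and it uses the standard library's `importlib.util.find_spec`, which reports whether a module can be located but does not prove that importing it will succeed.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `tvm` and `mlc_chat` are the two modules imported in the cells above.
missing = missing_modules(["tvm", "mlc_chat"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("tvm and mlc_chat can both be located.")
```

If anything is reported as missing, rerun the `pip install` cell and restart the runtime.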

## Download Prebuilt Models and Library

The following commands will download all the available prebuilt libraries (e.g., `.so` files). This may take a while. If in **Google Colab**, you can verify that the files are being downloaded by clicking on the folder icon on the left.

Note: If you are NOT running in **Google Colab** you may need to run this line `!conda install git git-lfs` to install `git` and `git-lfs` before running the following cell.

In [None]:
!git lfs install

In [None]:
!mkdir -p dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib

#### Llama-2-7b-chat q4f16_1 prebuilt weights

In [None]:
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
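
Before restarting, it can help to confirm that both clones landed where the later cells expect them. This is a minimal sketch, assuming the directory names produced by the `git clone` commands above; `missing_paths` is a hypothetical helper, and it only checks that the directories exist, not that every weight file finished downloading.

```python
from pathlib import Path


def missing_paths(root, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Directory names taken from the clone commands above.
expected = ["lib", "mlc-chat-Llama-2-7b-chat-hf-q4f16_1"]
missing = missing_paths("dist/prebuilt", expected)
print("Missing:", missing if missing else "nothing, both directories are present")
```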

In [9]:
# Restart colab
exit()

## Let's try raw text generation with Llama-2-7b-chat!

In [1]:
from mlc_chat import ChatModule, ChatConfig, ConvConfig
from mlc_chat.callback import StreamToStdout

Use a `ConvConfig` to define the generation settings. Since we will be using the `LM` template, which supports raw text generation, any system prompt provided through the template will not be applied.

In [2]:
conv_config = ConvConfig(stop_tokens=[2,], add_bos=True, stop_str="[INST]")
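
To build intuition for `stop_str`, the effect can be pictured as cutting the generated text at the first occurrence of the stop string. The function below is a conceptual illustration only, not MLC LLM's implementation (the runtime handles stopping internally during decoding); `truncate_at_stop_str` is a hypothetical name.

```python
def truncate_at_stop_str(text, stop_str):
    """Return `text` up to, but not including, the first `stop_str`."""
    if not stop_str:
        return text
    index = text.find(stop_str)
    return text if index == -1 else text[:index]


# A made-up generation in which the model starts a new "[INST]" turn.
raw = "Mother Nature is the natural world. [INST] What else?"
print(repr(truncate_at_stop_str(raw, "[INST]")))
```

With `stop_str="[INST]"`, generation ends before the model starts writing a new user turn on its own.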

Note that `conv_config` is an optional field of `ChatConfig`. The `LM` template covers the basic needs of raw text generation.

In [3]:
chat_config = ChatConfig(conv_config=conv_config, conv_template="LM")

Using the `chat_config` we created, instantiate a `ChatModule`.

In [4]:
cm = ChatModule('Llama-2-7b-chat-hf-q4f16_1', chat_config=chat_config)

Let's write our first prompt. The LLM is fed this exact piece of text, unlike with other conversation templates, which structure the conversation beforehand and hide those details. To make the model follow a conversation, however, the prompt itself must provide the chat structure: the model was fine-tuned with specific tags, so those tags need to be present for it to follow the conversation accurately. This lets users build their own prompts without having to define a new template.

In [5]:
system_prompt = "<<SYS>>\nYou are a helpful, respectful and honest assistant.\n<</SYS>>\n\n"
inst_prompt = "What is mother nature?"

Concatenate the system and instruction prompts, and wrap them in instruction tags before generation. As the output shows, the model correctly follows the conversation.

In [6]:
output = cm.generate(
   prompt=f"[INST] {system_prompt+inst_prompt} [/INST]",
   progress_callback=StreamToStdout(callback_interval=2),
)

Hello! I'm so glad you asked! Mother Nature is a term used to describe the natural world around us, including all living things and the environment that supports them. It encompasses everything from the tiniest microorganisms to the largest landscapes, and includes all the elements and processes that shape our planet.
Mother Nature is the source of all life, providing us with the air we breathe, the water we drink, the food we eat, and the beauty we behold. She is the foundation of our very existence, and yet, she is often taken for granted.
It's important to remember that Mother Nature is not just something we rely on for our survival, but she also provides us with endless opportunities for inspiration, creativity, and joy. From the majestic mountains to the rolling hills, from the sparkling oceans to the babbling brooks, Mother Nature offers us a never-ending array of wonders and marvels.
So, the next time you take a moment to appreciate the beauty of Mother Nature, remember that you

Structuring the conversation in this way is equivalent to using the following conversational template in MLC-LLM:

```cpp
Conversation Llama2() {
  Conversation conv;
  conv.name = "llama-2";
  conv.system =
      ("[INST] <<SYS>>\n\nYou are a helpful, respectful and honest assistant.\n<</SYS>>\n\n ");
  conv.roles = {"[INST]", "[/INST]"};
  conv.messages = {};
  conv.offset = 0;
  conv.separator_style = SeparatorStyle::kSepRoleMsg;
  conv.seps = {" "};
  conv.role_msg_sep = " ";
  conv.role_empty_sep = " ";
  conv.stop_tokens = {2};
  conv.stop_str = "[INST]";
  conv.add_bos = true;
  return conv;
}
```
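
When many prompts share this layout, the tag handling can be wrapped in a small helper. The sketch below reproduces the string built in the `cm.generate` call earlier; `build_llama2_prompt` is a hypothetical name introduced here for illustration and is not part of `mlc_chat`.

```python
def build_llama2_prompt(instruction, system=None):
    """Wrap an instruction (and optional system text) in Llama-2 chat tags."""
    system_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
    return f"[INST] {system_block}{instruction} [/INST]"


prompt = build_llama2_prompt(
    "What is mother nature?",
    system="You are a helpful, respectful and honest assistant.",
)
print(repr(prompt))
```

The resulting string can be passed as the `prompt` argument of `cm.generate`, exactly as in the earlier cell.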

In the following case, since we do not add any tags, the model will simply perform plain text completion because there is no chat structure.

**Note:** The `LM` template has no memory, so the chat is reset on every generation (as if `cm.reset_chat()` were called each time).

In [7]:
output = cm.generate(
   prompt="Life is a quality that distinguishes",
   progress_callback=StreamToStdout(callback_interval=2),
)

living beings from non-living matter. literally, it is characterized by growth, reproduction, metabolism, response to stimuli, and adaptation to their environment. The concept of life has puzzled scientists and philosophers for centuries, and there is no consensus on a definition that encompasses all aspects of life.
The most commonly used definition of life is the "chemical definition," which states that living things are composed of cells, which are the basic structural and functional units of life. Cells are made up of biomolecules such as DNA, RNA, and proteins, which perform a variety of functions necessary for life, such as metabolism, growth, and reproduction.
Another definition of life is the "functional definition," which states that living things have the ability to maintain homeostasis, or a stable internal environment, despite changes in the external environment. This means that living things are able to regulate their internal processes and maintain a stable balance of che
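
Because the `LM` template keeps no memory between calls, a follow-up question only has context if the earlier turns are replayed in the prompt. The helper below is a rough sketch of that idea under simplifying assumptions: `build_multi_turn_prompt` is a hypothetical name, turns are joined with a single space, and the end-of-sequence and begin-of-sequence markers that the official Llama-2 multi-turn format places between turns are omitted, so results may differ from a proper chat template.

```python
def build_multi_turn_prompt(history, new_instruction, system=None):
    """Replay earlier (user, assistant) turns, then append the new instruction."""
    system_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
    parts = []
    for index, (user_text, assistant_text) in enumerate(history):
        # The system block, if any, belongs to the very first user turn only.
        prefix = system_block if index == 0 else ""
        parts.append(f"[INST] {prefix}{user_text} [/INST] {assistant_text}")
    prefix = system_block if not history else ""
    parts.append(f"[INST] {prefix}{new_instruction} [/INST]")
    return " ".join(parts)


# A made-up earlier exchange, used only to show the layout.
history = [("What is mother nature?", "Mother Nature is a term for the natural world.")]
print(build_multi_turn_prompt(history, "Give me one example."))
```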