# Demo: CodeLlama-13b with MLC LLM

Meta recently released [CodeLlama](https://github.com/facebookresearch/codellama), a family of large language models for code built on Llama 2. The models offer state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. This notebook demonstrates MLC LLM's support for the CodeLlama family:

- **[CodeLlama](https://huggingface.co/codellama/CodeLlama-13b-hf): a coding foundation LLM**
- **[CodeLlama-Instruct](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf): an instruction-tuned LLM for coding**
- **[CodeLlama-Python](https://huggingface.co/codellama/CodeLlama-13b-Python-hf): a Python specialized LLM**

MLC LLM allows everyone to develop, optimize, and deploy AI models natively on their own devices. This makes it possible to deploy coding LLMs natively, so that they can act as **a personal AI coding assistant**.

In this notebook, we walk through the steps of using MLC LLM to run these pre-compiled CodeLlama models. We have uploaded various versions of the pre-compiled and quantized CodeLlama models here: https://huggingface.co/mlc-ai.

Learn more about MLC LLM here: https://mlc.ai/mlc-llm/docs.

Here's an overview of each model's capabilities:

|                       | Code Completion | Infilling | Instruction/chat | Python specialist |
|-----------------------|-----------------|-----------|------------------|-------------------|
| CodeLlama-13b          |        X        |     X     |                  |                   |
| CodeLlama-13b-Python   |        X        |           |                  |         X         |
| CodeLlama-13b-Instruct |        X        |     X     |         X        |                   |

Click the button below to get started!

<a target="_blank" href="https://colab.research.google.com/github/mlc-ai/notebooks/blob/main/mlc-llm/models/demo_CodeLlama_13b.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Install MLC LLM

We will start by setting up the environment. First, let us create a new Conda environment in which we will run the rest of the notebook.

```
conda create --name mlc-llm python=3.10
conda activate mlc-llm
```

**Google Colab**

- If you are running this in a Google Colab notebook, you do not need to create a conda environment.
- However, be sure to change your runtime to GPU by going to `Runtime` > `Change runtime type` and setting the Hardware accelerator to "GPU".

If you are using CUDA, you can run the following command to confirm that CUDA is set up correctly, and check the driver version number as well as what GPUs are currently available for use.

In [None]:
!nvidia-smi
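
Outside a notebook, the same check can be scripted. The following is a minimal sketch that uses only the Python standard library; the helper name `gpu_summary` is hypothetical and not part of MLC LLM. It returns the output of `nvidia-smi` when the tool is on the `PATH` and runs successfully, and `None` otherwise.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the output of `nvidia-smi`, or None if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not on PATH, so no NVIDIA driver is visible here.
        return None
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


summary = gpu_summary()
if summary is None:
    print("nvidia-smi not found or failed; is a GPU runtime selected?")
else:
    print(summary)
```

A `None` result on Colab usually means the runtime type has not been switched to GPU yet.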

Next, let's download the MLC-AI and MLC-Chat nightly build packages. If you are running in a Colab environment, then you can just run the following command. Otherwise, go to https://mlc.ai/package/ and replace the command below with the one that is appropriate for your hardware and OS.

**Google Colab**: If you are using Colab, you may see red warnings such as "You must restart the runtime in order to use newly installed versions." For our purposes, these can be disregarded; the notebook will still run correctly.

In [None]:
!pip install --pre --force-reinstall mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels

Let's confirm we have installed the packages successfully!

In [None]:
!python -c "import tvm; print('tvm installed properly!')"
!python -c "import mlc_chat; print('mlc_chat installed properly!')"
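
The same check can be done without spawning a new interpreter or importing the packages. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is an assumption for illustration. It only confirms that the modules can be located, not that they load correctly on your hardware.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["tvm", "mlc_chat"])
print("all packages found" if not missing else f"missing: {missing}")
```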

## Download Prebuilt Models and Library

The following commands will download all the available prebuilt libraries (e.g., `.so` files), including the precompiled CodeLlama models. This may take a while. If you are in **Google Colab**, you can verify that the files are being downloaded by clicking on the folder icon on the left.

Note: If you are NOT running in **Google Colab**, you may need to run `!conda install git git-lfs` to install `git` and `git-lfs` before running the following cell.

In [None]:
!git lfs install

In [None]:
!mkdir -p dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib

#### CodeLlama-13b q4f16_1 prebuilt weights

In [None]:
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-CodeLlama-13b-hf-q4f16_1

#### CodeLlama-13b-Instruct q4f16_1 prebuilt weights

In [None]:
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-CodeLlama-13b-Instruct-hf-q4f16_1

#### CodeLlama-13b-Python q4f16_1 prebuilt weights

In [None]:
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-CodeLlama-13b-Python-hf-q4f16_1
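
Before moving on, it can help to confirm that the clones landed where the later cells expect them. The sketch below assumes the directory layout produced by the commands above (`dist/prebuilt/lib` plus one `mlc-chat-...` directory per model); the helper name `missing_downloads` is hypothetical. It only checks that each directory exists and is non-empty, and does not verify that Git LFS actually fetched the weight files.

```python
from pathlib import Path

# Directory names created by the clone commands in this section.
EXPECTED = [
    "lib",
    "mlc-chat-CodeLlama-13b-hf-q4f16_1",
    "mlc-chat-CodeLlama-13b-Instruct-hf-q4f16_1",
    "mlc-chat-CodeLlama-13b-Python-hf-q4f16_1",
]


def missing_downloads(root="dist/prebuilt", expected=EXPECTED):
    """Return the expected subdirectories that are absent or empty under root."""
    base = Path(root)
    missing = []
    for name in expected:
        path = base / name
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


print(missing_downloads())
```

An empty list means every expected directory is present; any names printed point at a clone that should be rerun.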

In [None]:
# Restart colab
exit()

## Let's code with CodeLlama!

Let's first try a simple code completion task with CodeLlama-Python.

In [None]:
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

In [None]:
codellama_python = ChatModule(model="CodeLlama-13b-Python-hf-q4f16_1", device="cuda")

In [None]:
prompt = """\
import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":"""

output = codellama_python.generate(
    prompt=prompt,
    progress_callback=StreamToStdout(callback_interval=2)
)

In [None]:
print(prompt+output)

import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Checks if the provided string is a palindrome."))
    parser.add_argument("-s", "--string",
    help="The string to check."))

    args = parser.parse_args()
    main(args.string))
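
Generated completions are not guaranteed to be valid Python, so it can be useful to check that the combined text parses before running it. The following is a minimal sketch using the standard library's `ast` module; the helper name `is_valid_python` is an assumption for illustration, and it checks syntax only, not behavior.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python, without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid_python("print('hello')"))
print(is_valid_python("main(args.string))"))
```

In the notebook, this could be called as `is_valid_python(prompt + output)` after generation finishes.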


In [None]:
# Restart colab to initialize a new ChatModule
exit()

The CodeLlama models support infilling based on surrounding content. Let's try it with the foundation CodeLlama model.

In [None]:
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

def text_infilling(prompt: str):
    # Split once at the <FILL> marker, then wrap the two halves in
    # CodeLlama's infilling tokens.
    prefix, suffix = prompt.split("<FILL>", 1)
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

def print_infilling(prompt: str, output: str):
    print(prompt.replace("<FILL>", output.replace("<EOT>", "")))
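
To make the prompt format concrete, here is a small standalone sketch that mirrors the `text_infilling` helper above and prints the string the model receives. The example input is made up for illustration.

```python
def text_infilling(prompt: str) -> str:
    """Wrap the text around <FILL> in the <PRE>/<SUF>/<MID> infilling format."""
    prefix, suffix = prompt.split("<FILL>", 1)
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


# A hypothetical snippet with one hole to fill.
example = "def add(a, b):\n    <FILL>\n"
print(repr(text_infilling(example)))
```

The text before the marker follows `<PRE>`, the text after it follows `<SUF>`, and the model generates the missing middle after `<MID>`.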

In [None]:
codellama = ChatModule(model="CodeLlama-13b-hf-q4f16_1", device="cuda")

In [None]:
prompt = """\
# Installation instructions:
    <FILL>
This downloads the LLaMA inference code and installs the repository as a local pip package.
"""

output = codellama.generate(
    prompt=text_infilling(prompt),
    progress_callback=StreamToStdout(callback_interval=2)
)

In [None]:
print_infilling(prompt, output)

# Installation instructions:
    pip install llamapy

# Using the local pip package:

    import llamapy
    my_model = llamapy.LLaMA(n_components=2))

# Requirements:

    Python 3.x


# Installation (easy way):

    pip install git+https://github.com/BBIC-BBC/LLAMA


# Installation (advanced way)):


    1) Download the repository from Github:


        git clone https://github.com/BBIC-BBC/LLAMA


    2) Install the repository as a local pip package:


        cd LLAMA



        python setup.py install




# Using the local pip package:


    import llamapy
    my_model = llamapy.LLaMA(n_components=2))))



# Requirements:


    Python 3.x



# Installation (easy way):


    pip install git+https://github.com/BBIC-BBC/LLAMA



# Installation (advanced way)):



    1) Download the LLaMA inference code from Github:


        git clone https://github.com/BBIC-BBC/LLAMA


    2) Install the LLaMA inference code as a local pip package:


        cd LLaMA



        python setup.py inst
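
The output above keeps going well past the single line that was requested. One possible mitigation, sketched below, is to keep only the text generated before the first `<EOT>` marker instead of just deleting the marker. This assumes the raw output contains the literal string `<EOT>`, as the `print_infilling` helper suggests; the helper name `truncate_at_eot` is hypothetical. If the marker never appears, the output is returned unchanged.

```python
def truncate_at_eot(output: str, eot: str = "<EOT>") -> str:
    """Keep only the text generated before the first end-of-infill marker."""
    return output.split(eot, 1)[0]


print(truncate_at_eot("pip install -e .<EOT>\n# unrelated continuation"))
```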

In [None]:
# Restart colab to create a new ChatModule
exit()

Finally, CodeLlama-Instruct can follow instructions for programming tasks.

In [None]:
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

In [None]:
codellama_instruct = ChatModule(model="CodeLlama-13b-Instruct-hf-q4f16_1", device="cuda")

In [None]:
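# Adjacent string literals are joined with no separator, so the first
# fragment needs a trailing space.
prompt = ("Write a Java program that computes the set of sums of all contiguous "
          "sublists of a given list.")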

output = codellama_instruct.generate(
    prompt=prompt,
    progress_callback=StreamToStdout(callback_interval=2)
)

Here is a possible implementation of the program:
```
import java.util.*;
public class SumOfSublists {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> sums = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            int sum = 0;
            for (int j = i; j < list.size(); j++) {
                sum += list.get(j));
            }

            sums.add(sum));
        }


        System.out.println("The sums of all contiguous sublists are: " + sums));
    }
```

In [None]:
codellama_instruct.reset_chat()

In [None]:
# Each fragment carries its own separating space, because adjacent string
# literals are joined with no separator.
prompt = ("Given an array of integers nums and an integer target, return "
          "indices of the two numbers such that they add up to target."
          " Write this program in Python.")

output = codellama_instruct.generate(
    prompt=prompt,
    progress_callback=StreamToStdout(callback_interval=2)
)

Here is a program in Python that solves the problem of finding the indices of two numbers in an array that add up to a target value:
```
def find_indices(nums, target):
    # Initialize two empty lists to store the indices of the two numbers
    for i in range(len(nums)))):
        for j in range(len(nums)))):
            if i != j and nums[i] + nums[j]] == target:
                indices = [i, j]]
    return indices
```


In [None]:
# Restart colab to create a new ChatModule
exit()