In this notebook, I provide a detailed guide on how to download the model weights for Meta's Code Llama model and run `CodeLlama` from scratch, without external libraries such as Hugging Face.

**Caution:** this notebook is not intended for machines with less than 16 GB of RAM. You also can't run it on the free edition of Google Colab.
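
A quick way to check a machine against that threshold before downloading anything is sketched below. It assumes a Linux or other POSIX system where `os.sysconf` exposes `SC_PHYS_PAGES`, and it reads the caution above as 16 GB of system RAM. The helper name `total_ram_gib` is made up for this example.

```python
import os


def total_ram_gib():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on this platform
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3


ram = total_ram_gib()
if ram is None:
    print("Could not determine total RAM on this platform.")
elif ram < 16:  # assumed threshold, taken from the caution above
    print(f"Only {ram:.1f} GiB of RAM detected; this notebook will likely not run.")
else:
    print(f"{ram:.1f} GiB of RAM detected.")
```

This only looks at system RAM. GPU memory is a separate constraint and is not checked here.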

Clone the official `codellama` repository

In [None]:
!git clone https://github.com/facebookresearch/codellama.git

Enter the repo

In [None]:
%cd codellama

List all the files and directories inside the repo

In [None]:
%ls

Download the model weights. (When you run the script below, the terminal will ask for the link that Meta emailed you for accessing `CodeLlama`. Paste that link in.)

In [None]:
!bash download.sh

Note: when prompted to select the models to download, end the model name with `,`, like `7b,`.
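
Once the download finishes, it can help to confirm that the model folder looks complete before loading it. The sketch below assumes the layout that `Llama.build` in the cloned repo appears to expect: one or more `*.pth` shards plus a `params.json` in the checkpoint directory, and a separate tokenizer file. The helper name `check_checkpoint_dir` is hypothetical, and the placeholder paths must be replaced with your own.

```python
from pathlib import Path


def check_checkpoint_dir(ckpt_dir, tokenizer_path=None):
    """Return a list of problems found; an empty list means the folder looks usable."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumed layout: weight shards are stored as *.pth files
    if not sorted(root.glob("*.pth")):
        problems.append("no *.pth checkpoint shards found")
    # Assumed layout: model hyperparameters live in params.json
    if not (root / "params.json").is_file():
        problems.append("params.json is missing")
    if tokenizer_path is not None and not Path(tokenizer_path).is_file():
        problems.append(f"tokenizer file {tokenizer_path} is missing")
    return problems


# Placeholders: replace with the paths on your machine
issues = check_checkpoint_dir(
    "<directory where codellama model was saved>",
    "<path to the tokenizer.model file>",
)
print(issues if issues else "Checkpoint directory looks complete.")
```

An empty list does not prove the weights are intact. It only shows that the expected files are present.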

Now let's install all the dependencies using pip.

In [None]:
!pip install .

Now we come to the main part.

Below, I have written a modified script based on the `example.py` file in the `codellama` repo. It streamlines the process and gives you the opportunity to adapt the code to your hardware and requirements.

You can of course run `example.py` from the command line instead, but if something goes wrong there, it is harder to see what to fix. And let's be honest, if you know programming, it's good to see some code : )

In [None]:
from typing import Optional

import os
import torch
import torch.distributed as dist

# torch.distributed needs a rendezvous address and port, even for a single process.
# Without these, init_process_group raises an error.
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12355'
# GPUs this process is allowed to see; adjust to match your machine
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

# Initialize the distributed environment as a single process (rank 0 of 1)
torch.cuda.set_device(0)  # Set the desired GPU device
dist.init_process_group(backend='nccl', init_method='env://', rank=0, world_size=1)


from llama import Llama

def generate_text_from_prompts(
    ckpt_dir: str,
    tokenizer_path: str,
    prompts: list,
    temperature: float = 0.2,
    top_p: float = 0.9,
    max_seq_len: int = 256,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """Build the model from ckpt_dir and return one completion string per prompt."""
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    results = generator.text_completion(
        prompts,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )

    # Keep only the generated text from each result, in prompt order
    generated_texts = [result['generation'] for result in results]

    return generated_texts

# Example usage
if __name__ == "__main__":
    ckpt_dir = "<directory where codellama model was saved>"
    tokenizer_path = "<path to the tokenizer.model file>"
    prompts = [
        "import socket\n\ndef ping_exponential_backoff(host: str):",
        "import argparse\n\n",
    ]

    generated_texts = generate_text_from_prompts(
        ckpt_dir, tokenizer_path, prompts
    )

    for prompt, generated_text in zip(prompts, generated_texts):
        print(prompt)
        print(f"> {generated_text}")
        print("\n==================================\n")
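
The script above hard-codes `MASTER_PORT` to `12355`. If another process already holds that port, initialization is likely to fail with an "address already in use" error. One possible workaround, sketched here with a hypothetical helper `find_free_port`, is to ask the operating system for an unused port and export it before calling `dist.init_process_group`.

```python
import os
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


port = find_free_port()
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = str(port)  # must be set before init_process_group runs
print(f"Using MASTER_PORT={port}")
```

There is a small window between closing the probe socket and the port being reused, so another process could in principle grab it first. For a single-process notebook run this is usually acceptable.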
