Hey everyone, this is a notebook I used to experiment with an LLM for code generation. I don't know exactly how everything works under the hood, but I put it together quickly and resolved all the issues so that it runs on Colab's free tier. I hope you enjoy it and learn from it.

# Python Code Generator with Fine-Tuned CodeLLaMA 7B
Fine-tuned CodeLLaMA 7B in Google Colab (free tier) to generate Python code from natural language prompts. Deployed a Gradio UI for interactive demos.

## Features
- Fine-tuned on 5,000 CodeSearchNet examples.
- Optimized with 4-bit quantization and LoRA for Colab’s ~12GB RAM and T4 GPU.
- Generates Python functions (e.g., "Write a function to reverse a string" → `def reverse_string(s): return s[::-1]`).
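
A rough back-of-the-envelope estimate shows why 4-bit quantization matters on a T4. The sketch below counts weight storage only and assumes a round 7 billion parameters. It ignores activations, the LoRA adapter, optimizer state, and quantization overhead, so real usage will be higher.

```python
# Rough weight-only memory estimate; the 7e9 parameter count is a rounded assumption.
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

NUM_PARAMS = 7e9  # assumed round figure for a "7B" model

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gib(NUM_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 16-bit weights alone come to roughly 13 GiB, while the 4-bit weights need a little over 3 GiB.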

## Setup
1. Open in Colab.
2. Run cells sequentially.
3. Access the Gradio UI via the public link.

## Challenges Overcome
- RAM crashes: Used quantization and bfloat16.
- Tokenizer padding: Set `eos_token` as pad token.
- Training setup: Added labels for loss computation.
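
One side effect of reusing `eos_token` as the pad token and copying `input_ids` into `labels` is that the loss is also computed on padding positions. A common refinement, which is not what this notebook does, is to set the label to -100 wherever the attention mask is 0, since -100 is the index that PyTorch's cross-entropy loss ignores by default. A minimal sketch on plain lists, with made-up token ids:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]

# Made-up example: three real tokens followed by two pad tokens (id 2, the eos id).
input_ids = [1, 4321, 2, 2, 2]
attention_mask = [1, 1, 1, 0, 0]
labels = mask_padding_labels(input_ids, attention_mask)
print(labels)  # [1, 4321, 2, -100, -100]
```

The real end-of-sequence token at position 2 keeps its label, so the model can still learn where to stop.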

In [None]:
# This will restart the entire runtime
import os
os._exit(0)  # Kill the Python process; Colab then restarts the runtime

In [2]:
!pip install torch transformers datasets peft accelerate bitsandbytes



In [2]:
from datasets import load_dataset

dataset = load_dataset("code_search_net", "python")["train"]
print(dataset[0])  # Print the first row to see its keys

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


{'repository_name': 'mjirik/imcut', 'func_path_in_repository': 'imcut/pycut.py', 'func_name': 'ImageGraphCut.__msgc_step3_discontinuity_localization', 'whole_func_string': 'def __msgc_step3_discontinuity_localization(self):\n        """\n        Estimate discontinuity in basis of low resolution image segmentation.\n        :return: discontinuity in low resolution\n        """\n        import scipy\n\n        start = self._start_time\n        seg = 1 - self.segmentation.astype(np.int8)\n        self.stats["low level object voxels"] = np.sum(seg)\n        self.stats["low level image voxels"] = np.prod(seg.shape)\n        # in seg is now stored low resolution segmentation\n        # back to normal parameters\n        # step 2: discontinuity localization\n        # self.segparams = sparams_hi\n        seg_border = scipy.ndimage.filters.laplace(seg, mode="constant")\n        logger.debug("seg_border: %s", scipy.stats.describe(seg_border, axis=None))\n        # logger.debug(str(np.max(seg_bo

In [3]:
from datasets import load_dataset, Dataset

# Load dataset
dataset = load_dataset("code_search_net", "python")["train"]

# Randomly sample 5,000 examples and format each one as a prompt/code pair
small_dataset = [{"text": f"[Prompt]: {row['func_documentation_string']} [Code]: {row['whole_func_string']}"}
                 for row in dataset.shuffle().select(range(5000))]

# Convert to Dataset object
dataset = Dataset.from_list(small_dataset)

# Check a sample
print(dataset[0]["text"])

[Prompt]: Pops top 2 operands out of the stack, and checks
        if 1st operand AND (logical) 2nd operand (top of the stack),
        pushes 0 if False, not 0 if True.

        8 bit un/signed version [Code]: def _and8(ins):
    """ Pops top 2 operands out of the stack, and checks
        if 1st operand AND (logical) 2nd operand (top of the stack),
        pushes 0 if False, not 0 if True.

        8 bit un/signed version
    """
    op1, op2 = tuple(ins.quad[2:])
    if _int_ops(op1, op2) is not None:
        op1, op2 = _int_ops(op1, op2)

        output = _8bit_oper(op1)  # Pops the stack (if applicable)
        if op2 != 0:  # X and True = X
            output.append('push af')
            return output

        # False and X = False
        output.append('xor a')
        output.append('push af')
        return output

    output = _8bit_oper(op1, op2)
    # output.append('call __AND8')
    lbl = tmp_label()
    output.append('or a')
    output.append('jr z, %s' % lbl)
    output.

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
import torch

# Load model and tokenizer with 4-bit quantization
model_name = "codellama/CodeLLaMA-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Set padding token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    device_map="auto"
)

# Apply LoRA
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)

# Tokenize dataset with padding and labels
def tokenize_function(examples):
    tokenized = tokenizer(examples["text"], truncation=True, padding="max_length", max_length=256)
    tokenized["labels"] = tokenized["input_ids"].copy()  # Labels mirror the inputs; the causal LM shifts them internally
    return tokenized

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Set up training
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    logging_steps=10,
    save_steps=500,
    fp16=True,
)
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_dataset)

# Train and save
trainer.train()
model.save_pretrained("/content/finetuned-codellama")
tokenizer.save_pretrained("/content/finetuned-codellama")

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
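
The deprecation warning above asks for a `BitsAndBytesConfig` passed through `quantization_config`. A minimal sketch of that form is shown below. The NF4 quantization type and fp16 compute dtype are assumptions chosen for illustration and differ from the defaults that a bare `load_in_4bit=True` would use. `load_4bit_model` is a hypothetical wrapper name.

```python
def load_4bit_model(model_name: str):
    """Load a causal LM in 4-bit via BitsAndBytesConfig (needs a CUDA GPU)."""
    # Imports are deferred so that defining the function triggers no heavy imports.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: NF4, a common choice
        bnb_4bit_compute_dtype=torch.float16,  # assumption: matches fp16=True in training
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",
    )

# Usage (downloads the model, so it is left commented out):
# model = load_4bit_model("codellama/CodeLLaMA-7b-hf")
```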


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mbansal-jayant[0m ([33mbansal-jayant-elevatix[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin




Step,Training Loss
10,2.6594
20,1.9649
30,1.2741
40,1.2005
50,1.1606
60,1.157
70,1.1057
80,1.0851
90,1.0764
100,1.0525
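
To put the step numbers in context, the number of optimizer updates per epoch follows from the `TrainingArguments` values. The sketch below assumes a single GPU and approximates how the updates are counted. `optimizer_steps_per_epoch` is an illustrative helper, not a library function.

```python
import math

def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              grad_accum_steps, num_devices=1):
    """Approximate optimizer updates per epoch for the given settings."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)

effective_batch = 2 * 4  # per_device_train_batch_size * gradient_accumulation_steps
steps = optimizer_steps_per_epoch(5000, per_device_batch_size=2, grad_accum_steps=4)
print(effective_batch, steps)  # 8 625
```

With `logging_steps=10`, each row in the loss table therefore covers 80 training examples.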




('/content/finetuned-codellama/tokenizer_config.json',
 '/content/finetuned-codellama/special_tokens_map.json',
 '/content/finetuned-codellama/tokenizer.model',
 '/content/finetuned-codellama/added_tokens.json',
 '/content/finetuned-codellama/tokenizer.json')
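
For a sense of how small the LoRA adapter configured above is: each adapted linear layer of shape d_out × d_in gains two matrices, A (r × d_in) and B (d_out × r). The sketch assumes the standard LLaMA 7B shapes (hidden size 4096, 32 decoder layers, square `q_proj` and `v_proj`), which the notebook does not print, so check `model.config` before relying on them.

```python
def lora_param_count(r, d_in, d_out, modules_per_layer, num_layers):
    """Trainable parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    per_module = r * d_in + d_out * r
    return per_module * modules_per_layer * num_layers

# Assumed LLaMA 7B shapes; r=8 and two target modules come from the LoraConfig.
trainable = lora_param_count(r=8, d_in=4096, d_out=4096,
                             modules_per_layer=2, num_layers=32)
print(f"{trainable:,}")  # 4,194,304
```

In the notebook itself, `model.print_trainable_parameters()` on the PEFT model reports the actual figure.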

In [1]:
from transformers import pipeline
import torch

# Clear GPU memory if possible
torch.cuda.empty_cache()

# Load the fine-tuned model into a text-generation pipeline (no quantization is applied here)
generator = pipeline(
    "text-generation",
    model="/content/finetuned-codellama",
    device=0,  # GPU
    torch_dtype=torch.bfloat16  # Half-precision weights; note that training used fp16, not bfloat16
)

prompt = "[Prompt]: Write a Python function to reverse a string [Code]:"
result = generator(prompt, max_length=50, num_return_sequences=1)[0]["generated_text"]
code = result.split("[Code]:")[1].strip()
print("Generated code:", code)



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generated code: ```python
def reverse(string):
    return string[::-1]
```

[Prompt]: Write a Python function to check if
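
The output above shows the model wrapping its answer in a Markdown fence and then running on into a new `[Prompt]:`. A post-processing step can keep just the first code segment. The helper below is a hypothetical sketch, not part of the notebook.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

def extract_code(generated_text: str) -> str:
    """Return the first code segment from a "[Prompt]: ... [Code]: ..." completion."""
    # Keep what follows the first "[Code]:" marker (or everything if it is missing).
    _, sep, after = generated_text.partition("[Code]:")
    code = after if sep else generated_text
    # Drop anything from the next "[Prompt]:" marker onward.
    code = code.split("[Prompt]:")[0].strip()
    # Remove a surrounding Markdown fence, with or without a language tag.
    lines = code.splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()

sample = (
    "[Prompt]: Write a Python function to reverse a string [Code]: "
    + FENCE + "python\ndef reverse(string):\n    return string[::-1]\n" + FENCE
    + "\n\n[Prompt]: Write a Python function to check if"
)
print(extract_code(sample))
```

Raising the generation budget would not remove the need for this step, because nothing in the training format tells the model to stop after one answer.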


In [1]:
!pip install torch transformers accelerate gradio
import gradio as gr
from transformers import pipeline
import torch

# Clear GPU memory
torch.cuda.empty_cache()

generator = pipeline(
    "text-generation",
    model="/content/finetuned-codellama",
    device=0,
    torch_dtype=torch.bfloat16
)

def generate(prompt):
    result = generator(f"[Prompt]: {prompt} [Code]:", max_length=50, num_return_sequences=1)
    return result[0]["generated_text"].split("[Code]:")[1].strip()

interface = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Enter a prompt (e.g., 'Write a function to sum a list')"),
    outputs=gr.Textbox(label="Generated Python Code"),
    title="Python Code Generator",
    description="Fine-tuned CodeLLaMA 7B in Colab!"
)
interface.launch(share=True)





Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://ecab5d8051d7454fc5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
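
Because the Gradio callback is a plain function, its prompt handling can be checked without a GPU by injecting a stand-in for the pipeline. The sketch below is hypothetical: `make_generate` and `fake_generator` are illustrative names, and the stub only mimics the list-of-dicts shape that the text-generation pipeline returns.

```python
def make_generate(generator, max_length=50):
    """Build a Gradio-style callback around any pipeline-like callable."""
    def generate(prompt: str) -> str:
        prompt = prompt.strip()
        if not prompt:
            # Avoid sending an empty prompt to the model.
            return "Please enter a prompt."
        result = generator(f"[Prompt]: {prompt} [Code]:",
                           max_length=max_length, num_return_sequences=1)
        return result[0]["generated_text"].split("[Code]:")[1].strip()
    return generate

def fake_generator(text, **kwargs):
    """Stand-in for the real pipeline: echoes the prompt plus a canned answer."""
    return [{"generated_text": text + " def add(a, b): return a + b"}]

generate = make_generate(fake_generator)
print(generate("Write a function to add two numbers"))
print(generate("   "))
```

In the notebook, `make_generate(generator)` would take the place of `generate` as the `fn=` argument of `gr.Interface`.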


