<a href="https://colab.research.google.com/github/lizhieffe/llm_knowledge/blob/main/kaggle/neurips_2025_google_code_golf_championship/vLLM_Inference_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This Colab notebook is for the championship: https://www.kaggle.com/competitions/google-code-golf-2025/overview

This notebook runs bulk inference with vLLM and stores the generated code in GCS.

# TLDR

## Model Quality Comparison

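Quality is measured on the eval results over the 400 data points (tasks). Valid Code Rate is the fraction of tasks whose generated code runs without an error; Correct Code Rate is the fraction whose output also matches the expected test output.

> Note: most of the model names are Ollama names; map them to the equivalent vLLM (Hugging Face) model names yourself.

| Model | Thinking Enabled | Valid Code Rate | Correct Code Rate |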
| :--- | :--- | :--- | :--- |
| **W3S** | True | 0.78 | 0.06 |
| **W3M** | True | 0.47 | 0.04 |
| **qwen2.5-coder:0.5b** | True | 0.04 | 0.00 |
| **qwen2.5-coder:1.5b** | True | 0.15 | 0.00 |
| **qwen2.5-coder:7b** | True | 0.34 | 0.01 |
| **qwen2.5-coder:14b** | True | 0.39 | 0.03 |
| **qwen3-coder:30b** | True | 0.61 | 0.11 |
| **deepseek-coder:1.3b** | True | 0.00 | 0.00 |
| **deepseek-coder:6.7b** | True | 0.19 | 0.01 |
| **deepseek-coder:33b** | True | 0.20 | 0.05 |
| **cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** (FP8 KV quant) | True | 0.51 | 0.07 |
| **cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** (FP8 KV quant) | False | 0.53 | 0.06 |


## Model Feasibility Comparison

| Model | `max_model_len` | `kv_cache_dtype` | GPU | Result |
| :--- | :--- | :--- | :--- | :--- |
| **Qwen/Qwen3-4B** | `default` | `default` | L4 | ✅ Works |
| | `default` | `default` | A100 | ✅ Works |
| **Qwen/Qwen3-8B** | `default` | `default` | L4 | ❌ Doesn't Work |
| | `default` | `default` | A100 | ✅ Works |
| | `4096` | `default` | L4 | ✅ Works |
| **Qwen/Qwen3-14B** | `default` | `default` | L4 | ❌ Doesn't Work |
| | `default` | `default` | A100 | ✅ Works |
| | `2048` | `default` | L4 | ❌ Doesn't Work |
| **Qwen/Qwen3-32B** | `3072` | `default` | A100 | ❌ Doesn't Work |
| **Qwen3-Coder-30B-FP8** | `3072` | `default` | A100 | ✅ Works |
| | `4096` | `default` | A100 | ❌ Doesn't Work |
| **Qwen3-Coder-30B-AWQ** | `4K` | `default` | L4 | ✅ Works |
| | `5K` | `default` | L4 | ❌ Doesn't Work |
| | `12K` | `fp8` | L4 | ✅ Works |
| | `16K` | `fp8` | L4 | ❌ Doesn't Work |
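
The pattern above (a longer `max_model_len` failing on the same GPU, and `fp8` letting a longer context fit) is consistent with the KV cache growing linearly with context length and with bytes per element. Below is a back-of-the-envelope sketch of the per-sequence KV cache size, assuming standard attention where every layer caches one key and one value vector per KV head per token. The layer and head numbers in the usage example are hypothetical placeholders; read the real ones from the model's `config.json`. It ignores weights, activations and vLLM's own overhead, so it only explains the trend, not the exact cut-off.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_element: int,
) -> int:
    """Approximate KV cache size in bytes for one sequence of `num_tokens` tokens."""
    # Factor 2: one key vector and one value vector per layer, per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return num_tokens * per_token


# Hypothetical model dimensions, for illustration only.
dims = dict(num_layers=48, num_kv_heads=4, head_dim=128)

for max_len in (4096, 16384):
    fp16 = kv_cache_bytes(max_len, bytes_per_element=2, **dims)  # 16-bit cache
    fp8 = kv_cache_bytes(max_len, bytes_per_element=1, **dims)   # kv_cache_dtype="fp8"
    print(f"{max_len=}: 16-bit {fp16 / 2**20:.0f} MiB, fp8 {fp8 / 2**20:.0f} MiB")
```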

## Performance

| `MODEL_ID` | `GPU` | `Engine` | `MAX_MODEL_LEN` | `kv_cache_dtype` | Thinking | Speed | Throughput |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| `cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit` | L4 | vLLM | `8192` | `fp8` | True | `7s/it` | `200 toks/s` |
| `Qwen/Qwen3-32B` | L4 | `Ollama` | `default` | `N/A` | True | `43s/it` | `?` |
| `cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit` | A100 | vLLM | `8192` | `fp8` | True | `15s/it` | `122 toks/s` |
| `cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit` | A100 | vLLM | `8192` | `fp8` | False | `7.3s/it` | `122 toks/s` |
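
A throughput figure like the one above can be estimated as generated tokens divided by the wall-clock time of `llm.generate`. A minimal sketch, assuming vLLM's `RequestOutput` objects where `output.outputs[0].token_ids` holds the generated token ids (only the first sample per prompt is counted). It is duck-typed so it runs without a GPU; the source does not say whether the numbers in the table were measured exactly this way.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def generation_throughput(outputs: Sequence[Any], elapsed_seconds: float) -> float:
    """Generated tokens per second across a batch of vLLM-style outputs."""
    num_generated = sum(len(out.outputs[0].token_ids) for out in outputs)
    return num_generated / elapsed_seconds


# Fake outputs standing in for the result of `llm.generate(...)`.
fake = [SimpleNamespace(outputs=[SimpleNamespace(token_ids=[1] * 100)]) for _ in range(4)]
print(generation_throughput(fake, elapsed_seconds=2.0))  # 400 tokens / 2 s = 200.0
```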



In [None]:
from typing import Sequence

# Public Interface

In [None]:
from typing import Any

import dataclasses
import numpy as np


@dataclasses.dataclass
class DataPoint:
  # Training examples as (input grid, output grid) pairs.
  train: list[tuple[Sequence[Sequence[int]], Sequence[Sequence[int]]]]
  # The raw "train" entries from the task JSON, each {"input": ..., "output": ...}.
  train_raw: list[dict]

  # Test examples as (input grid, output grid) pairs, used for verification.
  test: list[tuple[Sequence[Sequence[int]], Sequence[Sequence[int]]]]
  # The raw "test" entries from the task JSON.
  test_raw: list[dict]

  # The full parsed task JSON.
  json_dict: dict[str, Any]

# Download and load the data

In [None]:
# @title Setup Kaggle credential

# Option 1 - Load the Kaggle secret from Colab's Secrets
#
# This requires saving the content of the downloaded Kaggle API token file
# (kaggle.json) to Colab's Secrets under the name "kaggle".
import os
from google.colab import userdata

kaggle_secret_json = userdata.get('kaggle')

# Write the token where the Kaggle CLI expects it, readable only by the owner.
kaggle_dir = os.path.expanduser('~/.kaggle')
os.makedirs(kaggle_dir, exist_ok=True)
kaggle_json_path = os.path.join(kaggle_dir, 'kaggle.json')
with open(kaggle_json_path, 'w') as f:
  f.write(kaggle_secret_json)
os.chmod(kaggle_json_path, 0o600)

# Option 2 - Upload kaggle secret from local file
#
# from google.colab import files

# uploaded = files.upload()

# for fn in uploaded.keys():
#   print('User uploaded file "{name}" with length {length} bytes'.format(
#       name=fn, length=len(uploaded[fn])))

# # Then move kaggle.json into the folder where the API expects to find it.
# !mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

In [None]:
# @title Download data

%%capture


import os

base_path = "/content/google-code-golf-2025"

if not os.path.isdir(base_path):
  !pip install --user kaggle
  !kaggle competitions download -c google-code-golf-2025
  !unzip /content/google-code-golf-2025.zip -d /content/google-code-golf-2025/

In [None]:
# @title Parse the data

import json
import os

import numpy as np

print(f"Attempting to list files in: {base_path}")

# os.listdir returns bare filenames (no path) in arbitrary order. Sort them so
# that data_points, and slices such as data_points[:300], are reproducible.
files = sorted(os.listdir(base_path))
json_file_paths = [
    os.path.join(base_path, f) for f in files if f.endswith(".json")
]
print(f"Found {len(json_file_paths)} json files")

def extract_data_point(json_filepath: str) -> DataPoint:
  """Extract DataPoint from a json file.

  Args:
    json_filepath: The path to the json file.

  Returns:
    A DataPoint object.
  """
  with open(json_filepath, "rt") as my_file:
    content = my_file.read()
    json_dict = json.loads(content)

    train_val = json_dict["train"]
    all_train = [
        (it["input"], it["output"]) for it in train_val
    ]
    train_raw = train_val

    test_val = json_dict["test"]
    all_test = [
        (it["input"], it["output"]) for it in test_val
    ]
    test_raw = test_val

    return DataPoint(
        train=all_train, train_raw=train_raw, test=all_test, test_raw=test_raw, json_dict=json_dict
    )


import tqdm
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(
    max_workers=min(512, len(json_file_paths)), thread_name_prefix="Worker"
) as executor:
  # executor.map applies extract_data_point to every path in json_file_paths
  # and yields the results in the same order as the input paths.
  data_points = list(
      tqdm.tqdm(
          executor.map(extract_data_point, json_file_paths),
          total=len(json_file_paths),
      )
  )
  print(f"{len(data_points)=}")
  print(f"An example of the data points {data_points[0]}")
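
For reference, `extract_data_point` expects each task file to hold a `train` list and a `test` list, where every entry has an `input` and an `output` grid. The standalone sketch below parses a JSON string in that layout; the toy task and the helper name `parse_pairs` are made up for illustration. Real task files may carry extra top-level keys, which `json_dict` keeps.

```python
import json

# A made-up task in the layout the loader expects.
task_json = """
{
  "train": [{"input": [[0, 1]], "output": [[1, 0]]},
            {"input": [[2, 3]], "output": [[3, 2]]}],
  "test":  [{"input": [[4, 5]], "output": [[5, 4]]}]
}
"""


def parse_pairs(task: dict, split: str) -> list[tuple[list[list[int]], list[list[int]]]]:
    """Return (input, output) grid pairs for the 'train' or 'test' split."""
    return [(ex["input"], ex["output"]) for ex in task[split]]


task = json.loads(task_json)
print(parse_pairs(task, "train"))  # [([[0, 1]], [[1, 0]]), ([[2, 3]], [[3, 2]])]
```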

In [None]:
# @title Install vLLM and dependencies

%%capture

# vLLM 0.10 fails to load the model, so we pin vLLM 0.9.2.
#
# Similar bug:
#   https://github.com/vllm-project/vllm/issues/17618
!pip install vllm==0.9.2 lm-format-enforcer pandas

# transformers must be older than 4.54.0 to be compatible with vLLM 0.9.2.
#
# For issue: https://github.com/vllm-project/vllm/issues/17618
!pip install "transformers<4.54.0"

!pip show vllm

# Generation libs

In [None]:
# @title Visualization libs

import matplotlib.pyplot as plt
import seaborn as sns  # Used for the heatmap in visualize_np_array

def visualize_np_array(np_array):
    """
    Visualizes a 2D NumPy array (rectangular matrix of integers between 0 and 9)
    as a heatmap.

    Args:
        np_array (np.ndarray): The NumPy array to visualize.
                               Expected shape: (rows, cols)
                               Expected values: integers between 0 and 9.
    """
    if not isinstance(np_array, np.ndarray):
        np_array = np.array(np_array)

    if np_array.ndim != 2:
        print(f"Error: Input array must be 2-dimensional, but has {np_array.ndim} dimensions.")
        return

    rows, cols = np_array.shape

    if not (1 <= rows <= 30 and 1 <= cols <= 30):
        print(f"Error: Array dimensions ({rows}x{cols}) are outside the allowed range (1x1 to 30x30).")
        return

    # Check if all values are integers between 0 and 9
    if not (np.all(np_array >= 0) and np.all(np_array <= 9) and np.all(np_array == np_array.astype(int))):
        print("Warning: Array contains values outside the 0-9 integer range. Visualization might be misleading.")
        # Attempt to cast to int to prevent issues with imshow expecting numeric data
        np_array = np_array.astype(int)

    # Set up the plot
    image_zoom_factor = 1.0 if rows <= 10 and cols <= 10 else 0.5
    plt.figure(figsize=(cols * image_zoom_factor, rows * image_zoom_factor)) # Adjust figure size dynamically for better aspect ratio

    # Use seaborn's heatmap for a more aesthetically pleasing visualization
    # 'cmap' defines the color map. 'viridis' is a good default for sequential data.
    # 'RdYlGn' (Red-Yellow-Green) or 'Greens' are also good options.
    # 'annot=True' will display the value in each cell (useful for small grids)
    # 'fmt="d"' ensures the annotation is an integer
    # 'cbar=True' shows the color bar
    # 'linewidths' and 'linecolor' add borders between cells
    sns.heatmap(np_array, annot=True, fmt="d", cmap="viridis", cbar=True,
                linewidths=0.5, linecolor='black', vmin=0, vmax=9)

    plt.title(f'Visualization of {rows}x{cols} NumPy Array')
    plt.xlabel('Column Index')
    plt.ylabel('Row Index')
    plt.xticks(np.arange(cols) + 0.5, labels=np.arange(cols)) # Center ticks
    plt.yticks(np.arange(rows) + 0.5, labels=np.arange(rows)) # Center ticks
    # No invert_yaxis() here: seaborn's heatmap already draws row 0 at the top,
    # like typical array indexing, and inverting again would flip the grid.

    plt.show()

array = np.array([[0, 7, 7], [7, 7, 7], [0, 7, 7]])
visualize_np_array(array)

In [None]:
# @title Prompt Libs
SYSTEM_TURN = """You are a principal software engineer.

"""


USER_TURN_PREFIX = """You should implement a Python function that performs a transformation, which is implicitly described by pairs of <input, output> image grids. The transformation may include rotation, cropping, magnification, etc. Your code should produce the desired result for all exemplars and use the fewest possible characters.

A "grid" is a rectangular matrix (list of lists) of integers between 0 and 9 (inclusive). Its size is between 1x1 and 30x30.

The function name is "fn". Its signature is `fn(input: typing.Sequence[typing.Sequence[int]]) -> typing.Sequence[typing.Sequence[int]]:`.

Do NOT use any external library!

Exemplars:

"""


def build_prompt(data_point: DataPoint) -> list[dict[str, str]]:
  user_turn = USER_TURN_PREFIX

  # Only the training pairs go into the prompt; the test pair is held out for
  # verification. Remove the space after each comma in the string repr of the
  # grids to save tokens.
  grid_pairs_str = str(data_point.train_raw)
  grid_pairs_str = grid_pairs_str.replace(", ", ",")
  user_turn += grid_pairs_str

  messages = [
      {"role": "system", "content": SYSTEM_TURN},
      {"role": "user", "content": user_turn},
  ]
  return messages


# Test
conversation = build_prompt(data_points[0])
print(f"Prompt = {conversation}")
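
`build_prompt` shrinks the prompt by removing the space after each comma in the Python repr of the training pairs. A standalone sketch of the effect on a single grid (the helper name `compact_repr` is hypothetical):

```python
def compact_repr(obj) -> str:
    """Python repr of `obj` with every ', ' separator collapsed to ','."""
    return str(obj).replace(", ", ",")


grid = [[0, 7, 7], [7, 7, 7], [0, 7, 7]]
print(str(grid))           # [[0, 7, 7], [7, 7, 7], [0, 7, 7]]
print(compact_repr(grid))  # [[0,7,7],[7,7,7],[0,7,7]]
```

Fewer characters usually, but not necessarily, means fewer tokens; comparing `len(tokenizer.encode(...))` on both strings is the direct check for a given tokenizer.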

In [None]:
# @title Generate Code Libs

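def extract_code_from_resp(resp_str: str) -> str:
  """Extract the code from a model response.

  Returns the text after the last "```python" marker, cut at the next "```".
  If there is no "```python" marker, returns the text before the first "```"
  (or the whole response if there is no fence at all), stripped.
  """
  prefix = "```python"
  suffix = "```"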
  while True:
    prefix_idx = resp_str.find(prefix)
    if prefix_idx != -1:
      resp_str = resp_str[prefix_idx + len(prefix) :]
    else:
      break
  suffix_idx = resp_str.find(suffix)
  if suffix_idx != -1:
    resp_str = resp_str[: suffix_idx]
  return resp_str.strip()
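
A more compact sketch of a similar extraction with a regular expression: it returns the body of the last complete fenced block tagged `python` and falls back to the whole stripped response when there is none. It differs from `extract_code_from_resp` on edge cases (an unterminated block, or a response without a `python`-tagged fence), so treat it as an alternative rather than a drop-in replacement. The names `FENCE`, `PY_BLOCK` and `extract_last_python_block` are hypothetical.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, spelled this way so it does not end this block

# Opening fence tagged "python" (plus anything else on that line), then the
# shortest body up to the next closing fence.
PY_BLOCK = re.compile(FENCE + r"python[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_last_python_block(resp: str) -> str:
    """Return the last complete python-tagged fenced block, or the stripped response."""
    blocks = PY_BLOCK.findall(resp)
    return blocks[-1].strip() if blocks else resp.strip()


resp = (
    "Some reasoning.\n"
    + FENCE + "python\nprint(1)\n" + FENCE + "\n"
    + "Final answer:\n"
    + FENCE + "python\ndef fn(g):\n  return g\n" + FENCE
)
print(extract_last_python_block(resp))
```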

In [None]:
# @title JIT Python execution libs

import typing
import numpy as np


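def exec_and_ret(code_str: str, input: typing.Any) -> typing.Any:
  """Execute a string of code and return the result.

  The result must be assigned to a variable named 'ret'.

  Args:
    code_str: A string of code.
    input: the input to the code, exposed to it as the variable `input`.

  Returns:
    The result of the code.
  """
  # Use one namespace for both globals and locals. With separate dicts,
  # top-level definitions (helper functions, imports) land in the locals dict,
  # while function bodies look names up in globals and fail with NameError.
  namespace = {'input': input}
  exec(code_str, namespace)
  return namespace['ret']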


# # Test
# code_str = """ret = 1 + 2"""
# assert 3 == exec_and_ret(code_str, input)
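
Why `exec` should receive a single namespace dict: when separate globals and locals dicts are passed, top-level definitions in the executed code are stored in the locals dict, but a function body resolves free names through its globals. Generated code that calls a helper function (or uses a top-level import inside `fn`) then fails with `NameError`. A self-contained demonstration with a made-up snippet:

```python
code = """
def helper(g):
    return [row[::-1] for row in g]

def fn(g):
    return helper(g)

ret = fn(input)
"""

# Separate globals/locals: `helper` is stored in `loc`, but fn() looks it up in globals.
loc = {}
try:
    exec(code, {"input": [[1, 2]]}, loc)
    separate_ok = True
except NameError as e:
    separate_ok = False
    print("separate dicts:", e)

# One shared namespace: definitions and lookups use the same dict.
ns = {"input": [[1, 2]]}
exec(code, ns)
print("shared dict:", ns["ret"])  # shared dict: [[2, 1]]
```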

In [None]:
# @title Verification libs
!pip install func-timeout

import traceback
from typing import Any

import numpy as np
from func_timeout import func_timeout, FunctionTimedOut


def verify_code_dp(
    code_str: str, data_point: DataPoint, visualize: bool = False, timeout_seconds: int = 30
) -> tuple[bool, Any, str | None]:
  """Verify the given code against the data point's first test example."""
  assert data_point.test
  inp = data_point.test[0][0]
  expected_output = data_point.test[0][1]
  return verify_code(code_str, inp, expected_output, visualize, timeout_seconds)


def execute_with_timeout(
    code_str: str, inp: Sequence[Sequence[int]], timeout_seconds: int = 10
) -> tuple[Any | None, str | None]:
  """Execute the code.

  Args:
    code_str: the str of the code.
    inp: the input to the code.
    timeout_seconds: the timeout in seconds.

  Returns:
    [0]: the code's output.
    [1]: if the code is not executed successfully, the error message.
  """

  # Call fn and assign its result to `ret`, which exec_and_ret returns.
  code_str += """
ret = fn(input)"""

  # Execute the code with a timeout so that code that never returns cannot hang the eval.
  try:
    output = func_timeout(timeout_seconds, exec_and_ret, args=(code_str, inp))
    return output, None
  except FunctionTimedOut:
    err_msg = f"🛑 The function was killed after {timeout_seconds} seconds.\n"
    return None, err_msg
  except Exception as e:
    # format_exc() describes the exception being handled, including where it was raised.
    err_msg = f"Error: {e}. Stack trace: {traceback.format_exc()}"
    return None, err_msg


def verify_code(
    code_str: str,
    inp: Sequence[Sequence[int]],
    expected_output: Sequence[Sequence[int]],
    visualize: bool = False,
    timeout_seconds: int = 10,
) -> tuple[bool, Any, str | None]:
  """Verify the code.

  Args:
    code_str: the str of the code.
    inp: the input to the code.
    expected_output: the expected output.
    visualize: whether to visualize the execution output.
    timeout_seconds: the execution timeout in seconds.

  Returns:
    [0]: whether the execution output matches the expected output.
    [1]: the code's output.
    [2]: if the code is not executed successfully, the error message.
  """
  output, err_msg = execute_with_timeout(code_str, inp, timeout_seconds)
  if output is None:
    return False, output, err_msg

  assert err_msg is None
  try:
    if visualize:
      visualize_np_array(output)
      visualize_np_array(expected_output)
    output_np = np.array(output)
    expected_output_np = np.array(expected_output)
    if output_np.shape != expected_output_np.shape:
      return False, output, err_msg
    # Grids hold integers, so compare exactly rather than with a float tolerance.
    is_match = np.array_equal(output_np, expected_output_np)
    return is_match, output, err_msg
  except Exception as e:
    err_msg = f"Error: {e}. Stack trace: {traceback.format_exc()}"
    return False, None, err_msg

# # Test
# is_match, output, err_msg = verify_code_dp(
#     """def fn(inp):
#       return 1""",
#     data_points[0],
# )
# print(is_match)

In [None]:
# @title Storage Libs

import json
import os
from typing import Any, Mapping, Sequence


def persist_data_locally(file_path: str, data: Sequence[Mapping[str, Any]]) -> None:
  """Persist the data to a file in JSON Lines format.

  Args:
    file_path: the path to the file. Existing content is overwritten.
    data: the data to store. Each element is stored as one JSON line.
  """
  # Create the parent dir if necessary.
  dir_path = os.path.dirname(file_path)
  if dir_path:
    os.makedirs(dir_path, exist_ok=True)

  with open(file_path, 'w', encoding='utf-8') as f:
    for entry in data:
      # One JSON object per line.
      f.write(json.dumps(entry) + '\n')

# Unit Test
data = [
    {"1": "2", "3": "4"},
    {"1": "2", "3": "45"},
]
test_file_path = "/content/tmp/test.jsonl"
persist_data_locally(test_file_path, data)

parsed_data = []
with open(test_file_path, 'r', encoding='utf-8') as f:
  for line in f:

    parsed_data.append(json.loads(line.strip()))

assert data == parsed_data

# Run evals

In [None]:
# @title Inference Libs

import time

import vllm


def batch_inference(
    llm, data_points: list[DataPoint], enable_thinking: bool = True
) -> list[tuple[list[dict[str, str]], str]]:
  """Run batch inference.

  Args:
    llm: the vllm.LLM instance.
    data_points: the data to run inference on.
    enable_thinking: whether to enable thinking mode in the chat template.

  Returns:
    A list of (prompt messages, extracted code) tuples, in input order.
  """
  # Process input
  formatted_prompts_tokenized = []

  tokenizer = llm.get_tokenizer()

  # Reserve 3/4 of the context for the prompt. With MAX_MODEL_LEN = 8192 the
  # remaining 2048 tokens match the max_tokens budget used for generation below.
  input_cutoff_len = MAX_MODEL_LEN * 3 // 4
  examples_over_max_ctx_length = []
  prompts = []

  for dp in data_points:
    messages = build_prompt(dp)
    prompts.append(messages)

    input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, enable_thinking=enable_thinking)

    if len(input_ids) > input_cutoff_len:
      examples_over_max_ctx_length.append(input_ids)

      # Keep only the last input_cutoff_len tokens. Note that this drops the
      # beginning of the prompt, including the system turn and instructions.
      input_ids = input_ids[-input_cutoff_len:]

    formatted_prompts_tokenized.append(input_ids)
  print(f"{len(examples_over_max_ctx_length)} examples exceed the input length limit of {input_cutoff_len} tokens and were truncated.")

  # Inference
  sampling_params = vllm.SamplingParams(temperature=0.8, top_p=0.95, max_tokens=2048)
  tokens_prompts = [vllm.inputs.TokensPrompt(prompt_token_ids=it) for it in formatted_prompts_tokenized]

  inference_start = time.time()
  outputs = llm.generate(prompts=tokens_prompts, sampling_params=sampling_params)
  inference_end = time.time()

  inference_time = inference_end - inference_start
  print(f"Elapsed time: {inference_time:.1f} seconds")

  # Process output. vLLM returns the outputs in the same order as the prompts.
  generated_codes = []
  for output in outputs:
    generated_text = output.outputs[0].text
    generated_codes.append(extract_code_from_resp(generated_text))
  return list(zip(prompts, generated_codes))

def persist_inference_results(results: list[str], fname: str):
  pass
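
`batch_inference` handles over-long prompts by keeping only the last `input_cutoff_len` tokens, which drops the beginning of the chat (the system turn and the task instructions). One hypothetical alternative is to drop whole training pairs from the end until the prompt fits, so the instructions always survive. The sketch takes the token counter as a parameter; in the notebook it could be something like `lambda s: len(tokenizer.encode(s))` (which ignores chat-template tokens), while the character counter `len` in the usage example is only a stand-in. `fit_train_pairs` is a made-up name.

```python
from typing import Callable, Sequence


def fit_train_pairs(
    prefix: str,
    train_raw: Sequence[dict],
    count_tokens: Callable[[str], int],
    budget: int,
) -> tuple[str, int]:
    """Append as many leading training pairs as fit within `budget` tokens.

    Returns the prompt text and the number of pairs kept. Pairs are dropped
    from the end, so the instructions in `prefix` are never truncated.
    """
    for keep in range(len(train_raw), 0, -1):
        # Same compact formatting as build_prompt: no space after commas.
        text = prefix + str(list(train_raw[:keep])).replace(", ", ",")
        if count_tokens(text) <= budget:
            return text, keep
    return prefix, 0


pairs = [{"input": [[0, 1]], "output": [[1, 0]]}] * 3
text, kept = fit_train_pairs("Exemplars:\n", pairs, count_tokens=len, budget=90)
print(kept)  # 2
```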


In [None]:
# @title Eval Libs

import time
import traceback

def batch_eval(llm, data_points: list[DataPoint], generated_codes: list[str]) -> list[tuple[bool, str | None]]:
  """Run batch eval.

  Args:
    llm: the llm (unused; kept for interface compatibility).
    data_points: the input data points.
    generated_codes: the generated codes, aligned with data_points.

  Returns:
    A list of (is_match, err_msg) tuples:
    [0]: whether the generated code passes the test.
    [1]: if the code fails to run, the corresponding error message; otherwise None.
  """

  # Eval the outputs
  eval_results = []
  for code, dp in zip(generated_codes, data_points):
    is_match, _, err_msg = verify_code_dp(code, dp)
    eval_results.append((is_match, err_msg))

  return eval_results

def generate_fixed_length_int_str(number: int, target_length: int = 6) -> str:
  """Formats a non-negative integer as a fixed-length string with leading zeros.

  Args:
    number: A non-negative integer.
    target_length: The length of the returned string (default 6).

  Returns:
    A string of length target_length, e.g. 42 -> "000042".

  Raises:
    ValueError: if number is not a non-negative int, or has more than
      target_length digits.
  """
  if not isinstance(number, int) or number < 0:
    raise ValueError("Input must be a non-negative integer.")

  str_number = str(number)

  if len(str_number) > target_length:
    raise ValueError(f"Number {number} is too large to fit in {target_length} digits.")

  # Pad with leading zeros
  padded_string = str_number.zfill(target_length)
  return padded_string

In [None]:
# @title Download model to local files

from huggingface_hub import snapshot_download


MODEL_ID = "cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit"

# model_id = "meta-llama/Llama-2-7b-chat-hf"
local_model_path = f"./{MODEL_ID.split('/')[-1]}" # e.g., ./Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

if os.path.exists(local_model_path):
  print(f"The model is already on disk. Skip downloading ... {local_model_path}")
else:
  print(f"Downloading model to {local_model_path}...")
  snapshot_download(
      repo_id=MODEL_ID,
      local_dir=local_model_path,
      local_dir_use_symlinks=False # Set to False to download files directly
  )

In [None]:
# @title Initialize the vLLM from local files

import vllm
import os

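# Maximum context length per sequence (prompt + generated tokens).
MAX_MODEL_LEN = 1024 * 8

import torch

llm = vllm.LLM(
    model=local_model_path,
    trust_remote_code=True,
    max_model_len=MAX_MODEL_LEN,
    # Fraction of GPU memory vLLM may use for the model executor
    # (weights, activations and KV cache).
    gpu_memory_utilization=0.95,
    # Store the KV cache in FP8, roughly halving its memory versus 16-bit.
    kv_cache_dtype="fp8",
)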


In [None]:
# @title Setup GCS

from google.cloud import storage
from google.colab import userdata
import datetime


# Get the service account credential
data_storage_secret = userdata.get('gdrive_data_storage_service_account')

data_storage_secret_file = "/content/data_storage_secret.json"

with open(data_storage_secret_file, 'w') as f:
  f.write(data_storage_secret)

# Set credential
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = data_storage_secret_file

# Connect to the bucket
storage_client = storage.Client()
bucket_name = 'lizhi_general_storage'
bucket = storage_client.bucket(bucket_name)
print(f"Accessing bucket: {bucket.name}")

def persist_data_to_gcs(bucket, data: Sequence[Mapping[str, Any]], filename: str):
  """Upload data as a JSON Lines file to the inference_outputs dir of the bucket."""
  # One JSON object per line.
  content = "".join(json.dumps(entry) + "\n" for entry in data)

  blob_dir = 'neurips_2025_google_code_golf_championship/inference_outputs'
  blob_path = os.path.join(blob_dir, filename)
  blob = bucket.blob(blob_path)

  blob.upload_from_string(content)
  print(f"'{blob_path}' uploaded to bucket '{bucket.name}'.")

In [None]:
# @title Run eval

import datetime
import time

print(f"Evaluating **{MODEL_ID}** ...\n")

CHUNK_SIZE = 50

data_points_split = data_points[:300]
for epoch in range(10):
  print(f"====================== Starting epoch {epoch+1} ======================")

  # Inference
  inference_dps = data_points_split[:]
  dps_chunks = [inference_dps[i:i + CHUNK_SIZE] for i in range(0, len(inference_dps), CHUNK_SIZE)]
  inference_outputs = []
  inference_start = time.time()

  now = datetime.datetime.now()
  inference_outputs_filename_prefix = "inference_outputs_" + now.strftime("%Y%m%d%H%M%S")

  for batch, dps_chunk in enumerate(dps_chunks):
    print(f"Running inference on batch {batch} ...")
    inference_outputs_chunk = batch_inference(llm, dps_chunk, enable_thinking=False)
    inference_outputs.extend(inference_outputs_chunk)

    # Persist inference results
    predictions = []
    for dp, (prompt, gc) in zip(dps_chunk, inference_outputs_chunk):
      predictions.append({
          "train": dp.train_raw,
          "test": dp.test_raw,
          "prompt": prompt,
          "pred": gc,
      })

    filename = inference_outputs_filename_prefix + f"_{generate_fixed_length_int_str(batch)}.jsonl"

    # # Persist to the local Colab fs
    # inference_results_path = "/content/data/prediction_data.jsonl"
    # persist_data_locally(os.path.join("/content/data/", filename), predictions)
    # print(f"Inference data is stored to {inference_results_path}")

    # Persist to GCS
    persist_data_to_gcs(bucket, predictions, filename)

  inference_end = time.time()
  print(f"Inference took {(inference_end - inference_start) / 60:.1f}mins")


  # Eval
  generated_codes = [it for (_, it) in inference_outputs]
  eval_results = batch_eval(llm, data_points_split, generated_codes)

  is_matches = [result[0] for result in eval_results]
  errors = [result[1] for result in eval_results]

  num_valid_code = errors.count(None)
  # `errors.count(not None)` would count entries equal to True, not error messages.
  num_invalid_code = len(errors) - num_valid_code
  num_correct_code = sum(is_matches)
  num_incorrect_code = num_valid_code - num_correct_code

  print(f"Valid code rate = {num_valid_code / len(eval_results):.2f}")
  print(f"Correct code rate = {num_correct_code / len(eval_results):.2f}")
  print("=" * 80 + "\n")

# [Optional] Delete vLLM instance and release VRAM

> [!WARNING]
> This approach is not reliable in Colab.



In [None]:
%%script true

import gc
import torch

try:
  del llm
except NameError:
  pass

gc.collect()

# Release unused GPU memory cached by PyTorch's caching allocator.
torch.cuda.empty_cache()