# Python → C++ via LLMs

This notebook uses **OpenRouter** to call multiple LLMs that convert Python code to C++. Each model generates C++ from the same Python snippet; we then **compile and run** each result and **rank models by execution time**, including how much faster the C++ is than the original Python.

In [46]:
import os
import io
import sys
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr
import subprocess
from IPython.display import Markdown, display

In [47]:
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:6]}")

OpenRouter API Key exists and begins sk-or-


Leaderboard coding:

- Gemini 3 pro preview
- codex 5.2
- Claude opus 4.6
- Gemini 3 flash
- Kimi K2.5
- GLM 5

In [48]:
models = ['google/gemini-3-pro-preview', 'openai/gpt-5.2-codex', 'anthropic/claude-opus-4.6', 'google/gemini-3-flash-preview', 'moonshotai/kimi-k2.5', 'z-ai/glm-5']


## 2. Models & OpenRouter client

Models to compare (coding leaderboard). The client talks to OpenRouter’s API so we can call any of these with the same interface.

In [49]:
# OpenRouter client (OpenAI-compatible API)
openrouter = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=openrouter_api_key,
)
OPENROUTER_MODEL = models[0]  # e.g. 'google/gemini-3-pro-preview'

In [31]:
from system_info import retrieve_system_info

system_info = retrieve_system_info()
system_info

{'os': {'system': 'Darwin',
  'arch': 'x86_64',
  'release': '21.6.0',
  'version': 'Darwin Kernel Version 21.6.0: Mon Jun 24 00:56:10 PDT 2024; root:xnu-8020.240.18.709.2~1/RELEASE_X86_64',
  'kernel': '21.6.0',
  'distro': None,
  'wsl': False,
  'rosetta2_translated': False,
  'target_triple': 'x86_64-apple-darwin21.6.0'},
 'package_managers': ['xcode-select (CLT)'],
 'cpu': {'brand': 'Intel(R) Core(TM) i5-5350U CPU @ 1.80GHz',
  'cores_logical': 4,
  'cores_physical': 2,
  'simd': ['AVX2', 'FMA']},
 'toolchain': {'compilers': {'gcc': 'Apple clang version 14.0.0 (clang-1400.0.29.202)',
   'g++': 'Apple clang version 14.0.0 (clang-1400.0.29.202)',
   'clang': 'Apple clang version 14.0.0 (clang-1400.0.29.202)',
   'msvc_cl': ''},
  'build_tools': {'cmake': '', 'ninja': '', 'make': 'GNU Make 3.81'},
  'linkers': {'ld_lld': ''}}}

In [50]:
# Commands used to build and run generated C++ (passed to the LLM in the prompt)
compile_command = ["clang++", "-std=c++17", "-Ofast", "-mcpu=native", "-flto=thin", "-fvisibility=hidden", "-DNDEBUG", "main.cpp", "-o", "main"]
run_command = ["./main"]

In [51]:
# Verify C++ toolchain on this computer
_verify_cpp = """
#include <iostream>
int main() {
    std::cout << "C++ is running on this computer.\\n";
    return 0;
}
"""
with open("_verify_cpp.cpp", "w", encoding="utf-8") as f:
    f.write(_verify_cpp)
print("Compiler:", subprocess.run(["clang++", "--version"], capture_output=True, text=True).stdout.split("\\n")[0])
result = subprocess.run(["clang++", "-std=c++17", "_verify_cpp.cpp", "-o", "_verify_cpp_exe"], capture_output=True, text=True, timeout=10)
if result.returncode != 0:
    print("Compile failed:", result.stderr)
else:
    run_result = subprocess.run(["./_verify_cpp_exe"], capture_output=True, text=True, timeout=5)
    print("Run output:", run_result.stdout.strip())
    print("✓ C++ toolchain OK" if run_result.returncode == 0 else "Run failed")

Compiler: Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Run output: C++ is running on this computer.
✓ C++ toolchain OK


In [52]:
# Prompt used for Python→C++ conversion (LLM sees this as system message)
system_prompt = """
Your task is to convert Python code into high performance C++ code.
Respond only with C++ code. Do not provide any explanation other than occasional comments.
The C++ response needs to produce an identical output in the fastest possible time.
"""

def user_prompt_for(python):
    return f"""
Port this Python code to C++ with the fastest possible implementation that produces identical output in the least time.
The system information is:
{system_info}
Your response will be written to a file called main.cpp and then compiled and executed; the compilation command is:
{compile_command}
Respond only with C++ code.
Python code to port:

```python
{python}
```
"""

### Verify C++ on this computer

Quick check that `clang++` is available and can compile/run a small C++ program.

In [53]:
# Build chat messages and call API; port() also writes main.cpp and returns the C++ string
def messages_for(python):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(python)}
    ]
 
def write_output(cpp):
    with open("main.cpp", "w", encoding="utf-8") as f:
        f.write(cpp)

def port(client, model, python):
    reasoning_effort = "high" if 'gpt' in model else None
    response = client.chat.completions.create(model=model, messages=messages_for(python), reasoning_effort=reasoning_effort)
    reply = response.choices[0].message.content
    reply = reply.replace('```cpp','').replace('```','')
    write_output(reply)
    return reply

In [54]:
# Sample Python: pi via series (long enough to measure run time)
pi = """
import time

def calculate(iterations, param1, param2):
    result = 1.0
    for i in range(1, iterations+1):
        j = i * param1 - param2
        result -= (1/j)
        j = i * param1 + param2
        result += (1/j)
    return result

start_time = time.time()
result = calculate(200_000_000, 4, 1) * 4
end_time = time.time()

print(f"Result: {result:.12f}")
print(f"Execution Time: {(end_time - start_time):.6f} seconds")
"""


In [55]:
def run_python(code):
    globals = {"__builtins__": __builtins__}
    exec(code, globals)

In [56]:
run_python(pi)

Result: 3.141592656089
Execution Time: 79.472572 seconds


### Single-model conversion (optional)

The snippet we’ll convert: a **π calculation** that runs long enough to measure execution time. The same code is used as the Python baseline and as input for every model.

In [38]:
port(openrouter, OPENROUTER_MODEL, pi)

## 6. Benchmark: rank models by C++ speed

Runs the **Python baseline** once, then for **each model**: generate C++, compile, run, and time. Results are sorted by C++ execution time; we also report **how many times faster** each C++ is than the Python version.

In [57]:
# Time Python once, then each model: generate C++, compile, run, rank by C++ time and speedup vs Python
import time as time_module

PYTHON_BASELINE_PATH = "_bench_python.py"

def time_python(python_code):
    """Run the Python code in a subprocess and return execution time in seconds."""
    with open(PYTHON_BASELINE_PATH, "w", encoding="utf-8") as f:
        f.write(python_code)
    t0 = time_module.perf_counter()
    subprocess.run([sys.executable, PYTHON_BASELINE_PATH], check=True, capture_output=True, timeout=600)
    return time_module.perf_counter() - t0

def benchmark_models(python_code=pi, models_to_run=None):
    """
    1. Time the original Python code (baseline).
    2. Each model in models_to_run (or all models) generates C++; we compile, run, and time each.
    3. Rank by C++ execution time and show speedup vs Python.
    """
    if models_to_run is None:
        models_to_run = models
    print("=== Python → C++ benchmark" + (f" (retry {len(models_to_run)} models)" if models_to_run is not models else "") + " ===\n")
    print("Timing Python baseline ...", end=" ", flush=True)
    python_time = time_python(python_code)
    print(f"{python_time:.3f}s\n")

    results = []  # (model, cpp_time_seconds, success, error_msg)
    base_compile = ["clang++", "-std=c++17", "-Ofast", "-mcpu=native", "-flto=thin", "-fvisibility=hidden", "-DNDEBUG"]

    for i, model in enumerate(models_to_run):
        slug = model.replace("/", "_")
        cpp_path = f"generated_{slug}.cpp"
        exe_name = f"main_{slug}"
        print(f"[{i+1}/{len(models_to_run)}] {model} ...", end=" ", flush=True)
        try:
            cpp = port(openrouter, model, python_code)
            with open(cpp_path, "w", encoding="utf-8") as f:
                f.write(cpp)
            compile_cmd = base_compile + [cpp_path, "-o", exe_name]
            subprocess.run(compile_cmd, check=True, text=True, capture_output=True, timeout=60)
            t0 = time_module.perf_counter()
            subprocess.run([f"./{exe_name}"], check=True, text=True, capture_output=True, timeout=300, cwd=os.getcwd())
            elapsed = time_module.perf_counter() - t0
            results.append((model, elapsed, True, None))
            speedup = python_time / elapsed
            print(f"{elapsed:.3f}s  ({speedup:.1f}× faster than Python)")
        except subprocess.TimeoutExpired:
            results.append((model, float("inf"), False, "timeout"))
            print("timeout")
        except subprocess.CalledProcessError as e:
            results.append((model, float("inf"), False, (e.stderr or e.stdout or str(e))[:200]))
            print("compile/run failed")
        except Exception as e:
            results.append((model, float("inf"), False, str(e)[:200]))
            print("error")

    results.sort(key=lambda x: x[1])
    return python_time, results

# Run benchmark: Python baseline + each model's C++ → rank by speed
python_time, ranking = benchmark_models(pi)
print("\n" + "="*60)
print("Python baseline:", f"{python_time:.3f}s")
print("\n--- Ranking: C++ generated by each model (fastest first) ---")
for rank, (model, cpp_t, ok, err) in enumerate(ranking, 1):
    if ok:
        speedup = python_time / cpp_t
        print(f"  {rank}. {model}")
        print(f"     C++ time: {cpp_t:.3f}s  →  {speedup:.1f}× faster than Python")
    else:
        print(f"  {rank}. {model}: failed")
if ranking and ranking[0][1] != float("inf"):
    best_model, best_t, _, _ = ranking[0]
    print(f"\nWinner: {best_model} — {python_time/best_t:.1f}× faster than Python ({python_time:.3f}s → {best_t:.3f}s)")

=== Python → C++ benchmark: which model generates the fastest code? ===

Timing Python baseline ... 65.482s

[1/6] google/gemini-3-pro-preview ... compile/run failed
[2/6] openai/gpt-5.2-codex ... 2.226s  (29.4× faster than Python)
[3/6] anthropic/claude-opus-4.6 ... compile/run failed
[4/6] google/gemini-3-flash-preview ... 1.371s  (47.8× faster than Python)
[5/6] moonshotai/kimi-k2.5 ... compile/run failed
[6/6] z-ai/glm-5 ... 1.634s  (40.1× faster than Python)

Python baseline: 65.482s

--- Ranking: C++ generated by each model (fastest first) ---
  1. google/gemini-3-flash-preview
     C++ time: 1.371s  →  47.8× faster than Python
  2. z-ai/glm-5
     C++ time: 1.634s  →  40.1× faster than Python
  3. openai/gpt-5.2-codex
     C++ time: 2.226s  →  29.4× faster than Python
  4. google/gemini-3-pro-preview: failed
  5. anthropic/claude-opus-4.6: failed
  6. moonshotai/kimi-k2.5: failed

Winner: google/gemini-3-flash-preview — 47.8× faster than Python (65.482s → 1.371s)


In [45]:
def compile_and_run():
    subprocess.run(compile_command, check=True, text=True, capture_output=True)
    print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)
    print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)
    print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)

In [39]:
def run_python(code):
    globals_dict = {"__builtins__": __builtins__}

    buffer = io.StringIO()
    old_stdout = sys.stdout
    sys.stdout = buffer

    try:
        exec(code, globals_dict)
        output = buffer.getvalue()
    except Exception as e:
        output = f"Error: {e}"
    finally:
        sys.stdout = old_stdout

    return output

In [40]:
def compile_and_run():
    try:
        subprocess.run(compile_command, check=True, text=True, capture_output=True)
        print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)
        print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)
        print(subprocess.run(run_command, check=True, text=True, capture_output=True).stdout)
    except subprocess.CalledProcessError as e:
        print(f"An error occurred:\n{e.stderr}")

In [41]:
def port(model: str, python_code: str) -> str:
    """Convert Python code to C++ using the selected OpenRouter model."""
    if not python_code or not python_code.strip():
        return ""
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=openrouter_api_key,
    )
    import json
    sys_str = json.dumps(system_info, indent=2)
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": f"You are a code conversion expert. Convert the user's Python code to idiomatic C++ that compiles on the target system. Only output the C++ code, no explanation. Target system info:\n{sys_str}",
            },
            {"role": "user", "content": python_code},
        ],
    )
    return (response.choices[0].message.content or "").strip()

### Retry only failed models

Run this cell to re-run **only** the models that failed (compile/run failed). Edit `failed_models` to match the ones you want to retry.

In [61]:
# Retry only the models that failed (edit failed_models to match your last run).
# Self-contained: does not depend on benchmark_models() signature.
import time as _time
failed_models = [
    "google/gemini-3-pro-preview",
    "anthropic/claude-opus-4.6",
    "moonshotai/kimi-k2.5",
]
base_compile = ["clang++", "-std=c++17", "-Ofast", "-mcpu=native", "-flto=thin", "-fvisibility=hidden", "-DNDEBUG"]

with open("_bench_python.py", "w", encoding="utf-8") as f:
    f.write(pi)
_t0 = _time.perf_counter()
subprocess.run([sys.executable, "_bench_python.py"], check=True, capture_output=True, timeout=600)
python_time_retry = _time.perf_counter() - _t0
print(f"Python baseline: {python_time_retry:.3f}s\n")

results_retry = []
for i, model in enumerate(failed_models):
    slug = model.replace("/", "_")
    cpp_path = f"generated_{slug}.cpp"
    exe_name = f"main_{slug}"
    print(f"[{i+1}/{len(failed_models)}] {model} ...", end=" ", flush=True)
    try:
        cpp = port(openrouter, model, pi)
        with open(cpp_path, "w", encoding="utf-8") as f:
            f.write(cpp)
        subprocess.run(base_compile + [cpp_path, "-o", exe_name], check=True, text=True, capture_output=True, timeout=60)
        _t0 = _time.perf_counter()
        subprocess.run([f"./{exe_name}"], check=True, text=True, capture_output=True, timeout=300, cwd=os.getcwd())
        elapsed = _time.perf_counter() - _t0
        results_retry.append((model, elapsed, True, None))
        print(f"{elapsed:.3f}s  ({python_time_retry/elapsed:.1f}× faster than Python)")
    except subprocess.TimeoutExpired:
        results_retry.append((model, float("inf"), False, "timeout"))
        print("timeout")
    except subprocess.CalledProcessError as e:
        results_retry.append((model, float("inf"), False, None))
        print("compile/run failed")
    except Exception as e:
        results_retry.append((model, float("inf"), False, None))
        print("error")

results_retry.sort(key=lambda x: x[1])
print("\n--- Retry results (fastest first) ---")
for rank, (model, cpp_t, ok, _) in enumerate(results_retry, 1):
    if ok:
        print(f"  {rank}. {model}: {cpp_t:.3f}s  ({python_time_retry/cpp_t:.1f}× faster than Python)")
    else:
        print(f"  {rank}. {model}: failed")

Python baseline: 68.694s

[1/3] google/gemini-3-pro-preview ... compile/run failed
[2/3] anthropic/claude-opus-4.6 ... compile/run failed
[3/3] moonshotai/kimi-k2.5 ... 3.421s  (20.1× faster than Python)

--- Retry results (fastest first) ---
  1. moonshotai/kimi-k2.5: 3.421s  (20.1× faster than Python)
  2. google/gemini-3-pro-preview: failed
  3. anthropic/claude-opus-4.6: failed


## 7. Gradio UI (optional)

Web interface: pick a model, paste Python code, click **Convert code** to get C++ in the other box.

In [42]:
with gr.Blocks() as ui:
    with gr.Row():
        python = gr.Textbox(label="Python code:", lines=28, value=pi)
        cpp = gr.Textbox(label="C++ code:", lines=28)
    with gr.Row():
        model = gr.Dropdown(models, label="Select model", value=models[0])
        convert = gr.Button("Convert code")

    convert.click(port, inputs=[model, python], outputs=[cpp])

ui.launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.


