# Code Generator

The requirement: use a frontier model to generate high-performance C++ code from Python code.


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder: fetch latest code</h2>
            <span style="color:#f71;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a <a href="https://chatgpt.com/share/6734e705-3270-8012-a074-421661af6ba9">git pull and merge your changes as needed</a>. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/><br/>
            After you've pulled the code, from the llm_engineering directory, in an Anaconda prompt (PC) or Terminal (Mac), run:<br/>
            <code>conda env update -f environment.yml --prune</code><br/>
            Or if you used virtualenv rather than Anaconda, then run this from your activated environment in a Powershell (PC) or Terminal (Mac):<br/>
            <code>pip install -r requirements.txt</code>
            <br/>Then restart the kernel (Kernel menu >> Restart Kernel and Clear Outputs Of All Cells) to pick up the changes.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h1 style="color:#900;">Important Note</h1>
            <span style="color:#900;">
            In this lab, I use GPT-4o and Claude-3.5-Sonnet, which are the slightly higher priced models. The costs are still low, but if you'd prefer to keep costs ultra low, please make the suggested switches to the models (3 cells down from here).
            </span>
        </td>
    </tr>
</table>

In [41]:
import os
from dotenv import load_dotenv

# Load .env file
load_dotenv()

# Get the API key from environment variables
api_key = os.getenv("OPEN_ROUTER_KEY")


In [42]:

api = OpenRouter(api_key)


In [36]:
display(Markdown(full_text))

Sure, I can explain the theory of relativity in simple terms. There are actually two parts to Einstein's theory of relativity:

1. **Special Relativity**: This theory deals with objects moving at constant speeds, particularly those approaching the speed of light. The key ideas are:
   - **The Speed of Light is Constant**: No matter how fast you move, lightspeed appears the same if you're measuring it.
   - **Time Dilation**: Time moves slower for objects moving at high speeds compared to those at rest, assuming someone else in space sees them both.
   - **Length Contraction**: Objects moving at high speeds appear shorter along the direction of motion.
   - **Equivalence of Mass and Energy**: This is expressed by the famous equation \(E = mc^2\), which means energy (E) and mass (m) are interchangeable; c is the speed of light.

2. **General Relativity**: This theory extends the idea of special relativity and deals with gravity:
   - **Gravity is Curvature of Spacetime**: According to general relativity, massive objects like planets and stars bend the fabric of spacetime around them.
   - **Everything Follows the Bend**: Other objects move along the curves in spacetime caused by massive objects, which we perceive as gravity.

In essence, what these theories show us is that our everyday experiences of space, time, and gravity are more complex and interconnected than we might originally think.

In [39]:

cpp="""
// To compile on M1 Mac with clang for maximum performance:
// clang++ -std=c++17 -O3 -march=native -Xpreprocessor -fopenmp -lomp -o fast_subarray fast_subarray.cpp

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>
#include <limits>
#include <iomanip>
#include <algorithm>
#include <omp.h>

// LCG constants from the Python code.
constexpr uint64_t LCG_A = 1664525;
constexpr uint64_t LCG_C = 1013904223;

// LCG function that mimics Python's `(a * value + c) % 2**32`.
// Unsigned 32-bit integer arithmetic naturally handles the modulo.
inline uint32_t next_lcg(uint32_t& seed) {
    seed = static_cast<uint32_t>(LCG_A * seed + LCG_C);
    return seed;
}

// Calculates the maximum subarray sum for n generated numbers.
// This uses Kadane's algorithm (O(n)) instead of the O(n^2) brute-force approach.
// Numbers are generated on-the-fly to save memory and improve cache usage.
long long max_subarray_sum(int n, uint32_t seed, int min_val, int max_val) {
    if (n == 0) {
        return 0;
    }

    const long long range = static_cast<long long>(max_val) - min_val + 1;
    uint32_t current_seed = seed;

    // Generate the first number to initialize the algorithm correctly.
    long long first_num = next_lcg(current_seed) % range + min_val;
    long long max_so_far = first_num;
    long long current_max = first_num;

    // Iterate through the rest of the numbers.
    for (int i = 1; i < n; ++i) {
        long long num = next_lcg(current_seed) % range + min_val;
        current_max = std::max(num, current_max + num);
        max_so_far = std::max(max_so_far, current_max);
    }

    return max_so_far;
}

long long total_max_subarray_sum(int n, uint32_t initial_seed, int min_val, int max_val) {
    long long total_sum = 0;
    constexpr int num_runs = 20;

    // Pre-calculate the 20 seeds. This breaks the sequential dependency and
    // allows the 20 runs of max_subarray_sum to be executed in parallel.
    std::vector<uint32_t> seeds(num_runs);
    uint32_t lcg_state = initial_seed;
    for (int i = 0; i < num_runs; ++i) {
        seeds[i] = next_lcg(lcg_state);
    }

    // Use OpenMP to parallelize the 20 independent calculations.
    #pragma omp parallel for reduction(+:total_sum)
    for (int i = 0; i < num_runs; ++i) {
        total_sum += max_subarray_sum(n, seeds[i], min_val, max_val);
    }

    return total_sum;
}

int main() {
    // Fast I/O
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);

    // Parameters
    constexpr int n = 10000;
    constexpr uint32_t initial_seed = 42;
    constexpr int min_val = -10;
    constexpr int max_val = 10;

    auto start_time = std::chrono::high_resolution_clock::now();

    long long result = total_max_subarray_sum(n, initial_seed, min_val, max_val);

    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> execution_time = end_time - start_time;

    std::cout << "Total Maximum Subarray Sum (20 runs): " << result << std::endl;
    std::cout << "Execution Time: " << std::fixed << std::setprecision(6) << execution_time.count() << " seconds" << std::endl;

    return 0;
}"""

In [40]:
api.chat_with_stream_display(f"Convert this C++ code into Python:\n\n{cpp}")

Certainly! Below is the equivalent Python code that mimics the behavior of the provided C++ code. Note that Python does not have built-in support for OpenMP, but we can use the `concurrent.futures` module to achieve parallelization.

```python
import time
import random
from concurrent.futures import ThreadPoolExecutor

# LCG constants
LCG_A = 1664525
LCG_C = 1013904223

# LCG function
def next_lcg(seed):
    seed = (LCG_A * seed + LCG_C) & 0xFFFFFFFF
    return seed

# Function to calculate the maximum subarray sum using Kadane's algorithm
def max_subarray_sum(n, seed, min_val, max_val):
    if n == 0:
        return 0

    range_val = max_val - min_val + 1
    current_seed = seed

    # Generate the first number
    first_num = next_lcg(current_seed) % range_val + min_val
    max_so_far = first_num
    current_max = first_num

    # Iterate through the rest of the numbers
    for _ in range(1, n):
        num = next_lcg(current_seed) % range_val + min_val
        current_max = max(num, current_max + num)
        max_so_far = max(max_so_far, current_max)

    return max_so_far

# Function to calculate the total maximum subarray sum using multiple runs
def total_max_subarray_sum(n, initial_seed, min_val, max_val):
    total_sum = 0
    num_runs = 20

    # Pre-calculate the seeds for the 20 runs
    seeds = []
    lcg_state = initial_seed
    for _ in range(num_runs):
        seeds.append(next_lcg(lcg_state))

    # Use a thread pool to parallelize the 20 independent calculations
    with ThreadPoolExecutor(max_workers=num_runs) as executor:
        results = executor.map(lambda seed: max_subarray_sum(n, seed, min_val, max_val), seeds)
        total_sum = sum(results)

    return total_sum

def main():
    # Parameters
    n = 10000
    initial_seed = 42
    min_val = -10
    max_val = 10

    start_time = time.time()

    result = total_max_subarray_sum(n, initial_seed, min_val, max_val)

    end_time = time.time()
    execution_time = end_time - start_time

    print(f"Total Maximum Subarray Sum (20 runs): {result}")
    print(f"Execution Time: {execution_time:.6f} seconds")

if __name__ == "__main__":
    main()
```

### Explanation:
1. **LCG Function**: The LCG function `next_lcg` generates the next value in the sequence using the same constants as the C++ code.
2. **Max Subarray Sum**: The `max_subarray_sum` function implements Kadane's algorithm to find the maximum subarray sum.
3. **Total Max Subarray Sum**: The `total_max_subarray_sum` function generates 20 different seeds and uses a thread pool to parallelize the calculation of the maximum subarray sum for each seed.
4. **Main Function**: The `main` function sets the parameters, measures the execution time, and prints the results.

This Python code should provide similar performance benefits as the C++ version when running on a multi-core machine, thanks to the parallelization using `ThreadPoolExecutor`.


--- Stream finished ---
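
One thing to check in the translation above: `next_lcg(current_seed)` returns the new value but never updates `current_seed`. The C++ original takes `uint32_t& seed` by reference, whereas Python passes the integer by value, so as written every call would return the same number. Separately, in standard CPython builds the GIL keeps CPU-bound threads from running Python bytecode in parallel, so `ThreadPoolExecutor` is unlikely to deliver the speed-up the reply suggests. Below is a minimal sketch of one way to carry the LCG state forward, assuming a generator is acceptable here; `lcg_stream` and `next_lcg_by_value` are hypothetical names, not part of the model's reply.

```python
LCG_A = 1664525
LCG_C = 1013904223

def lcg_stream(seed):
    """Yield successive LCG values, keeping the state inside the generator."""
    value = seed
    while True:
        value = (LCG_A * value + LCG_C) & 0xFFFFFFFF  # same as % 2**32
        yield value

def next_lcg_by_value(seed):
    """The pattern from the reply: the caller's variable is never updated."""
    return (LCG_A * seed + LCG_C) & 0xFFFFFFFF

seed = 42
stuck = [next_lcg_by_value(seed) for _ in range(3)]  # same value three times
gen = lcg_stream(seed)
advancing = [next(gen) for _ in range(3)]            # three different values
print(stuck)
print(advancing)
```

The generator form mirrors the `lcg` function in the Python program used later in this lab.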



In [8]:
# imports

import os
import io
import sys
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
import gradio as gr
import subprocess
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage, ToolMessage

import base64

In [9]:
sys.path.append('/run/media/sijanpaudel/New Volume/New folder/llm_engineering')

In [10]:
from key_utils import get_next_key

In [11]:
api_key, _  = get_next_key()

In [12]:
# environment

# load_dotenv(override=True)
# os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
# os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')

In [61]:
# initialize
# NOTE - option to use ultra-low cost models by uncommenting last 2 lines

# openai = OpenAI()
# claude = anthropic.Anthropic()
# OPENAI_MODEL = "gpt-4o"
# CLAUDE_MODEL = "claude-3-5-sonnet-20240620"
GEMINI_PRO_MODEL = "gemini-2.5-pro"
# Want to keep costs ultra-low? Uncomment these lines:
# OPENAI_MODEL = "gpt-4o-mini"
# CLAUDE_MODEL = "claude-3-haiku-20240307"

In [55]:
system_message = (
    "You are an assistant that reimplements Python code in high performance C++ for an M1 Mac. "
    "Respond only with C++ code; use comments sparingly and do not provide any explanation other than occasional comments. "
    "The C++ response needs to produce an identical output in the fastest possible time."
)

In [56]:
def user_prompt_for(python):
    user_prompt = (
        "Rewrite this Python code in C++ with the fastest possible implementation that produces identical output in the least time. "
        "Respond only with C++ code; do not explain your work other than a few comments. "
        "Pay attention to number types to ensure no int overflows. Remember to #include all necessary C++ packages such as iomanip.\n\n"
        f"{python}"
    )
    return user_prompt

In [57]:
# def messages_for(python):
#     return [
#         SystemMessage(content=system_message),
#         HumanMessage(content=user_prompt_for(python))
#     ]

In [58]:
# write to a file called optimized.cpp

def write_output(cpp):
    code = cpp.replace("```cpp","").replace("```","")
    with open("optimized.cpp", "w") as f:
        f.write(code)
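
`write_output` strips every fence marker from the reply. That works when the reply is a single fenced block, but it would also keep any prose the model adds around the code. Below is a minimal sketch of a stricter alternative, assuming the reply contains at most one fenced block; `extract_code` is a hypothetical helper, not part of the lab.

```python
import re

# Matches the first fenced block, with an optional language tag such as "cpp".
FENCE = re.compile(r"```[\w+]*\n(.*?)```", re.DOTALL)

def extract_code(reply):
    """Return the body of the first fenced block, or the whole reply if none."""
    match = FENCE.search(reply)
    return match.group(1) if match else reply

reply = "Here you go:\n```cpp\nint main() { return 0; }\n```\nHope that helps."
print(extract_code(reply))
```

If no fence is found, the reply is returned unchanged, so a model that answers with bare code still works.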

In [78]:
from LLMHandler import LLMHandler
handler_flash = LLMHandler(system_message=system_message)
handler_pro = LLMHandler(system_message=system_message)
handler_pro.TEXT_MODEL = GEMINI_PRO_MODEL


In [60]:
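def optimize_gemini_flash(python):
    import time
    import random

    for attempt in range(5):
        try:
            response = handler_flash.stream_and_process(user_input=user_prompt_for(python))
            break
        except Exception as e:
            print(f"Attempt {attempt+1} failed: {e}")
            # Exponential backoff with jitter before the next attempt
            time.sleep((2 ** attempt) + random.random())
    else:
        # Every attempt failed, so there is no response to write
        raise RuntimeError("All 5 attempts failed; optimized.cpp was not updated")
    write_output(response)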
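
The retry loop above uses exponential backoff with jitter: the wait grows as `2 ** attempt` seconds plus a random fraction of a second. Below is a minimal sketch of the same pattern as a reusable helper. `with_retries`, its parameters and `flaky` are hypothetical names, and the `sleep` argument exists only so the example runs without real delays.

```python
import random
import time

def with_retries(call, attempts=5, sleep=time.sleep):
    """Run call(); on failure wait 2**attempt seconds plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            print(f"Attempt {attempt + 1} failed: {exc}")
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep((2 ** attempt) + random.random())

# Demo with a hypothetical flaky function that succeeds on the third call.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("503 The model is overloaded")
    return "ok"

delays = []  # record the requested waits instead of sleeping
print(with_retries(flaky, sleep=delays.append))
```

Re-raising on the final attempt means the caller never continues with a missing response.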

In [75]:
handler_pro.TEXT_MODEL = GEMINI_PRO_MODEL

In [79]:
print(handler_flash.TEXT_MODEL)

gemini-2.0-flash


In [80]:
print(handler_pro.TEXT_MODEL)

gemini-2.5-pro


In [81]:
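def optimize_gemini_pro(python):
    import time
    import random

    for attempt in range(5):
        try:
            response = handler_pro.stream_and_process(user_input=user_prompt_for(python))
            break
        except Exception as e:
            print(f"Attempt {attempt+1} failed: {e}")
            # Exponential backoff with jitter before the next attempt
            time.sleep((2 ** attempt) + random.random())
    else:
        # Every attempt failed, so there is no response to write
        raise RuntimeError("All 5 attempts failed; optimized.cpp was not updated")
    write_output(response)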

In [64]:
# def optimize_gpt(python):    
#     stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=messages_for(python), stream=True)
#     reply = ""
#     for chunk in stream:
#         fragment = chunk.choices[0].delta.content or ""
#         reply += fragment
#         print(fragment, end='', flush=True)
#     write_output(reply)

In [65]:
# def optimize_claude(python):
#     result = claude.messages.stream(
#         model=CLAUDE_MODEL,
#         max_tokens=2000,
#         system=system_message,
#         messages=[{"role": "user", "content": user_prompt_for(python)}],
#     )
#     reply = ""
#     with result as stream:
#         for text in stream.text_stream:
#             reply += text
#             print(text, end="", flush=True)
#     write_output(reply)

In [82]:
pi = """
import time

def calculate(iterations, param1, param2):
    result = 1.0
    for i in range(1, iterations+1):
        j = i * param1 - param2
        result -= (1/j)
        j = i * param1 + param2
        result += (1/j)
    return result

start_time = time.time()
result = calculate(100_000_000, 4, 1) * 4
end_time = time.time()

print(f"Result: {result:.12f}")
print(f"Execution Time: {(end_time - start_time):.6f} seconds")
"""

In [51]:
exec(pi)

Result: 3.141592658589
Execution Time: 6.566467 seconds


In [83]:
optimize_gemini_flash(pi)

➡️ Attempt 1/11 using key from 'pshreesha30'...
Assistant is responding...


```cpp
#include <iostream>
#include <iomanip>
#include <chrono>

double calculate(long long iterations, int param1, int param2) {
    double result = 1.0;
    for (long long i = 1; i <= iterations; ++i) {
        double j = static_cast<double>(i) * param1 - param2;
        result -= (1.0 / j);
        j = static_cast<double>(i) * param1 + param2;
        result += (1.0 / j);
    }
    return result;
}

int main() {
    auto start_time = std::chrono::high_resolution_clock::now();
    double result = calculate(100000000LL, 4, 1) * 4;
    auto end_time = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    double execution_time = static_cast<double>(duration.count()) / 1000000.0;

    std::cout << std::fixed << std::setprecision(12) << "Result: " << result << std::endl;
    std::cout << std::fixed << std::setprecision(6) << "Execution Time: " << execution_time << " seconds" << std::endl;

    return 0;
}
```


--- Stream finished ---


In [31]:
exec(pi)

Result: 3.141592658589
Execution Time: 7.222917 seconds


# Compiling C++ and executing

This next cell contains the command to compile a C++ file on my M1 Mac.  
It compiles the file `optimized.cpp` into an executable called `optimized`  
Then it runs the program called `optimized`

In the next lab (day4), a student has contributed a full solution that compiles to efficient code on Mac, PC and Linux!

You can wait for this, or you can google (or ask ChatGPT!) for how to do this on your platform, then replace the lines below.
If you're not comfortable with this step, you can skip it for sure - I'll show you exactly how it performs on my Mac.


OR alternatively: student Sandeep K.G. points out that you can run Python and C++ code online to test it out that way. Thank you Sandeep!  
> Not an exact comparison but you can still get the idea of performance difference.
> For example here: https://www.programiz.com/cpp-programming/online-compiler/

In [88]:
#Flash
!clang++ -O3 -std=c++17 -o optimized optimized.cpp
!./optimized


Result: 3.141592658589
Execution Time: 0.180269 seconds


In [89]:
optimize_gemini_pro(pi)

➡️ Attempt 1/11 using key from 'nepalcric4'...
Assistant is responding...





--- Stream finished ---
Attempt 1 failed: 503 The model is overloaded. Please try again later.
➡️ Attempt 1/11 using key from 'prabeshsubedi'...
Assistant is responding...


```cpp
// To compile on M1 Mac with clang for maximum performance:
// clang++ -std=c++17 -O3 -march=native -Xpreprocessor -fopenmp -lomp -o fast_calculator fast_calculator.cpp

#include <iostream>
#include <iomanip>
#include <chrono>
#include <omp.h>

// This function uses an algebraic simplification of the original loop's operations.
// Original: result += (1 / (i*p1 + p2)) - (1 / (i*p1 - p2))
// Simplified: result += (-2 * p2) / ((i*p1)^2 - p2^2)
// This reduces two divisions to one per iteration.
double calculate(long long iterations, int param1, int param2) {
    double result = 1.0;
    const double p2_d = static_cast<double>(param2);
    const double neg_2_p2 = -2.0 * p2_d;
    const double p2_sq = p2_d * p2_d;

    // Parallelize the loop using OpenMP for multi-core processing.
    // The 'reduction' clause ensures that the 'result' from each thread
    // is correctly summed into the final result.
    #pragma omp parallel for reduction(+:result)
    for (long long i = 1; i <= iterations; ++i) {
        double i_p1_d = static_cast<double>(i) * param1;
        result += neg_2_p2 / (i_p1_d * i_p1_d - p2_sq);
    }
    return result;
}

int main() {
    // Ensure standard library streams are fast.
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);

    const long long iterations = 100'000'000LL;
    const int param1 = 4;
    const int param2 = 1;

    auto start_time = std::chrono::high_resolution_clock::now();

    double result = calculate(iterations, param1, param2) * 4.0;

    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> execution_time = end_time - start_time;

    std::cout << "Result: " << std::fixed << std::setprecision(12) << result << std::endl;
    std::cout << "Execution Time: " << std::fixed << std::setprecision(6) << execution_time.count() << " seconds" << std::endl;

    return 0;
}
```


--- Stream finished ---


In [90]:
# Repeat for Pro - again, use the right approach for your platform
!clang++ -O3 -std=c++17 -o optimized optimized.cpp
!./optimized


Result: 3.141592662604
Execution Time: 0.103814 seconds
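
Note that this run prints 3.141592662604, while the Python original and the Flash version print 3.141592658589. The output is close but not identical, and identical output is what the prompt asked for. A likely cause is the reworked arithmetic, which rounds differently in floating point. A small check like the one below makes such drift visible. It is a sketch only: `result_line` and `same_result` are hypothetical helpers, and it assumes the program prints a line starting with `Result:` as the code above does.

```python
def result_line(output):
    """Return the text after 'Result:' in the first matching line, or None."""
    for line in output.splitlines():
        if line.startswith("Result:"):
            return line.split(":", 1)[1].strip()
    return None

def same_result(python_output, cpp_output):
    """True only when both programs printed exactly the same result text."""
    return result_line(python_output) == result_line(cpp_output)

# Outputs copied from the cells above.
python_out = "Result: 3.141592658589\nExecution Time: 6.566467 seconds\n"
flash_out = "Result: 3.141592658589\nExecution Time: 0.180269 seconds\n"
pro_out = "Result: 3.141592662604\nExecution Time: 0.103814 seconds\n"

print(same_result(python_out, flash_out))
print(same_result(python_out, pro_out))
```

Comparing the printed text rather than parsed floats matches the "identical output" wording of the prompt.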


In [91]:
python_hard = """# Be careful to support large number sizes

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    value = seed
    while True:
        value = (a * value + c) % m
        yield value
        
def max_subarray_sum(n, seed, min_val, max_val):
    lcg_gen = lcg(seed)
    random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]
    max_sum = float('-inf')
    for i in range(n):
        current_sum = 0
        for j in range(i, n):
            current_sum += random_numbers[j]
            if current_sum > max_sum:
                max_sum = current_sum
    return max_sum

def total_max_subarray_sum(n, initial_seed, min_val, max_val):
    total_sum = 0
    lcg_gen = lcg(initial_seed)
    for _ in range(20):
        seed = next(lcg_gen)
        total_sum += max_subarray_sum(n, seed, min_val, max_val)
    return total_sum

# Parameters
n = 10000         # Number of random numbers
initial_seed = 42 # Initial seed for the LCG
min_val = -10     # Minimum value of random numbers
max_val = 10      # Maximum value of random numbers

# Timing the function
import time
start_time = time.time()
result = total_max_subarray_sum(n, initial_seed, min_val, max_val)
end_time = time.time()

print("Total Maximum Subarray Sum (20 runs):", result)
print("Execution Time: {:.6f} seconds".format(end_time - start_time))
"""

In [92]:
exec(python_hard)

Total Maximum Subarray Sum (20 runs): 10980
Execution Time: 25.137561 seconds


In [93]:
optimize_gemini_flash(python_hard)

➡️ Attempt 1/11 using key from 'sandeshtiwari'...
Assistant is responding...


```cpp
#include <iostream>
#include <vector>
#include <limits>
#include <chrono>
#include <iomanip>

// LCG implementation
uint32_t lcg(uint32_t& seed, uint32_t a, uint32_t c, uint64_t m) {
    seed = (static_cast<uint64_t>(a) * seed + c) % m;
    return seed;
}

// Max subarray sum
long long max_subarray_sum(int n, uint32_t seed, int min_val, int max_val) {
    std::vector<int> random_numbers(n);
    uint32_t current_seed = seed;
    for (int i = 0; i < n; ++i) {
        random_numbers[i] = lcg(current_seed, 1664525, 1013904223, static_cast<uint64_t>(1) << 32) % (max_val - min_val + 1) + min_val;
    }

    long long max_sum = std::numeric_limits<long long>::min();
    for (int i = 0; i < n; ++i) {
        long long current_sum = 0;
        for (int j = i; j < n; ++j) {
            current_sum += random_numbers[j];
            if (current_sum > max_sum) {
                max_sum = current_sum;
            }
        }
    }
    return max_sum;
}

// Total max subarray sum
long long total_max_subarray_sum(int n, uint32_t initial_seed, int min_val, int max_val) {
    long long total_sum = 0;
    uint32_t current_seed = initial_seed;
    for (int i = 0; i < 20; ++i) {
        uint32_t seed = lcg(current_seed, 1664525, 1013904223, static_cast<uint64_t>(1) << 32);
        total_sum += max_subarray_sum(n, seed, min_val, max_val);
    }
    return total_sum;
}

int main() {
    int n = 10000;
    uint32_t initial_seed = 42;
    int min_val = -10;
    int max_val = 10;

    auto start_time = std::chrono::high_resolution_clock::now();
    long long result = total_max_subarray_sum(n, initial_seed, min_val, max_val);
    auto end_time = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    double execution_time = static_cast<double>(duration.count()) / 1000000.0;

    std::cout << "Total Maximum Subarray Sum (20 runs): " << result << std::endl;
    std::cout << std::fixed << std::setprecision(6) << "Execution Time: " << execution_time << " seconds" << std::endl;

    return 0;
}
```


--- Stream finished ---


In [94]:
# Replace this with the right C++ compile + execute command for your platform

!clang++ -O3 -std=c++17 -o optimized optimized.cpp
!./optimized


Total Maximum Subarray Sum (20 runs): 10980
Execution Time: 0.268983 seconds


In [96]:
optimize_gemini_pro(python_hard)

➡️ Attempt 1/11 using key from 'iamengineer'...
Assistant is responding...





--- Stream finished ---
Attempt 1 failed: 503 The model is overloaded. Please try again later.
➡️ Attempt 1/11 using key from 'birajmohanta'...
Assistant is responding...





--- Stream finished ---
Attempt 2 failed: 503 The model is overloaded. Please try again later.
➡️ Attempt 1/11 using key from 'saharamanandhar'...
Assistant is responding...





--- Stream finished ---
Attempt 3 failed: 503 The model is overloaded. Please try again later.
➡️ Attempt 1/11 using key from 'sijanpaudel'...
Assistant is responding...





--- Stream finished ---
Attempt 4 failed: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 50
Please retry in 6.546221566s. [violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-pro"
  }
  quota_value: 50
}
, links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, retry_delay {
  seconds: 6
}
]
➡️ Attempt 1/11 using key from 'sijan.paudel10'...
Assistant is responding...


```cpp
// To compile on M1 Mac with clang for maximum performance:
// clang++ -std=c++17 -O3 -march=native -Xpreprocessor -fopenmp -lomp -o fast_subarray fast_subarray.cpp

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>
#include <limits>
#include <iomanip>
#include <algorithm>
#include <omp.h>

// LCG constants from the Python code.
constexpr uint64_t LCG_A = 1664525;
constexpr uint64_t LCG_C = 1013904223;

// LCG function that mimics Python's `(a * value + c) % 2**32`.
// Unsigned 32-bit integer arithmetic naturally handles the modulo.
inline uint32_t next_lcg(uint32_t& seed) {
    seed = static_cast<uint32_t>(LCG_A * seed + LCG_C);
    return seed;
}

// Calculates the maximum subarray sum for n generated numbers.
// This uses Kadane's algorithm (O(n)) instead of the O(n^2) brute-force approach.
// Numbers are generated on-the-fly to save memory and improve cache usage.
long long max_subarray_sum(int n, uint32_t seed, int min_val, int max_val) {
    if (n == 0) {
        return 0;
    }

    const long long range = static_cast<long long>(max_val) - min_val + 1;
    uint32_t current_seed = seed;

    // Generate the first number to initialize the algorithm correctly.
    long long first_num = next_lcg(current_seed) % range + min_val;
    long long max_so_far = first_num;
    long long current_max = first_num;

    // Iterate through the rest of the numbers.
    for (int i = 1; i < n; ++i) {
        long long num = next_lcg(current_seed) % range + min_val;
        current_max = std::max(num, current_max + num);
        max_so_far = std::max(max_so_far, current_max);
    }

    return max_so_far;
}

long long total_max_subarray_sum(int n, uint32_t initial_seed, int min_val, int max_val) {
    long long total_sum = 0;
    constexpr int num_runs = 20;

    // Pre-calculate the 20 seeds. This breaks the sequential dependency and
    // allows the 20 runs of max_subarray_sum to be executed in parallel.
    std::vector<uint32_t> seeds(num_runs);
    uint32_t lcg_state = initial_seed;
    for (int i = 0; i < num_runs; ++i) {
        seeds[i] = next_lcg(lcg_state);
    }

    // Use OpenMP to parallelize the 20 independent calculations.
    #pragma omp parallel for reduction(+:total_sum)
    for (int i = 0; i < num_runs; ++i) {
        total_sum += max_subarray_sum(n, seeds[i], min_val, max_val);
    }

    return total_sum;
}

int main() {
    // Fast I/O
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);

    // Parameters
    constexpr int n = 10000;
    constexpr uint32_t initial_seed = 42;
    constexpr int min_val = -10;
    constexpr int max_val = 10;

    auto start_time = std::chrono::high_resolution_clock::now();

    long long result = total_max_subarray_sum(n, initial_seed, min_val, max_val);

    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> execution_time = end_time - start_time;

    std::cout << "Total Maximum Subarray Sum (20 runs): " << result << std::endl;
    std::cout << "Execution Time: " << std::fixed << std::setprecision(6) << execution_time.count() << " seconds" << std::endl;

    return 0;
}
```


--- Stream finished ---


In [97]:
# Replace this with the right C++ compile + execute command for your platform

!clang++ -O3 -std=c++17 -o optimized optimized.cpp
!./optimized


Total Maximum Subarray Sum (20 runs): 10980
Execution Time: 0.000550 seconds
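
The drop from about 0.27 s to about 0.0005 s is consistent with the algorithm change the Pro model describes in its comments: Kadane's algorithm visits each number once (O(n)), while the original nested loops examine every subarray (O(n^2)). Below is a minimal Python sketch, with hypothetical function names, checking that the two approaches agree on small random inputs.

```python
import random

def brute_force_max(nums):
    """O(n^2): try every start index and extend to every end index."""
    best = float("-inf")
    for i in range(len(nums)):
        running = 0
        for j in range(i, len(nums)):
            running += nums[j]
            best = max(best, running)
    return best

def kadane_max(nums):
    """O(n): the best sum ending here is this number alone or an extension."""
    best = current = nums[0]
    for num in nums[1:]:
        current = max(num, current + num)
        best = max(best, current)
    return best

rng = random.Random(0)
for _ in range(100):
    nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 30))]
    assert brute_force_max(nums) == kadane_max(nums)

print(kadane_max([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from the subarray [4, -1, 2, 1]
```

Because both functions return the same sums, the faster version can still reproduce the original program's total.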


In [116]:
def stream_gemini_flash(python):    
    stream = handler_flash.run(user_prompt_for(python), stream=True)
    reply = ""
    for chunk in stream:
        fragment = chunk.content or ""
        reply += fragment
        yield reply.replace('```cpp\n','').replace('```','')

In [117]:
def stream_gemini_pro(python):
    stream = handler_pro.run(user_prompt_for(python), stream=True)
    reply = ""
    for chunk in stream:
        fragment = chunk.content or ""
        reply += fragment
        yield reply.replace('```cpp\n','').replace('```','')
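
The two streaming functions differ only in which handler they call, so they could share one helper. Below is a sketch with stand-in objects. `stream_code` and `FakeChunk` are hypothetical, and the only assumption carried over from the code above is that each chunk exposes a `.content` attribute.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

@dataclass
class FakeChunk:
    """Stand-in for a streamed chunk; only .content is used."""
    content: Optional[str]

def stream_code(chunks: Iterable[FakeChunk]) -> Iterator[str]:
    """Yield the accumulated reply so far with the code fences removed."""
    reply = ""
    for chunk in chunks:
        reply += chunk.content or ""
        yield reply.replace("```cpp\n", "").replace("```", "")

chunks = [FakeChunk("```cpp\n"), FakeChunk("int x = 1;\n"), FakeChunk(None), FakeChunk("```")]
snapshots = list(stream_code(chunks))
print(snapshots[-1])
```

Each yielded value is the whole text so far, which suits a Gradio textbox that is overwritten on every update.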

In [118]:
def optimize(python, model):
    if model=="GEMINI_FLASH":
        result = stream_gemini_flash(python)
    elif model=="GEMINI_PRO":
        result = stream_gemini_pro(python)
    else:
        raise ValueError("Unknown model")
    for stream_so_far in result:
        yield stream_so_far        

In [None]:
with gr.Blocks() as ui:
    with gr.Row():
        python = gr.Textbox(label="Python code:", lines=10, value=python_hard)
        cpp = gr.Textbox(label="C++ code:", lines=10)
    with gr.Row():
        model = gr.Dropdown(["GEMINI_FLASH", "GEMINI_PRO"], label="Select model", value="GEMINI_FLASH")
        convert = gr.Button("Convert code")

    convert.click(optimize, inputs=[python, model], outputs=[cpp])

ui.launch(inbrowser=True)

In [120]:
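def execute_python(code):
    import contextlib

    output = io.StringIO()
    # Run in a fresh namespace so functions defined by the code can call each other,
    # and capture printed output; the previous stdout is restored afterwards.
    with contextlib.redirect_stdout(output):
        exec(code, {})
    return output.getvalue()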

In [125]:
# You'll need to change the code in the try block to compile the C++ code for your platform
# I pasted this into Claude's chat UI with a request for it to give me a version for an Intel PC,
# and it responded with something that looks perfect - you can try a similar approach for your platform.

# M1 Mac version to compile and execute optimized C++ code:

def execute_cpp(code):
    write_output(code)
    try:
        compile_cmd = [
            "clang++",
            "-Ofast",
            "-std=c++17",
            "-march=native",   # use CPU of current machine
            "-o",
            "optimized",
            "optimized.cpp",
        ]
        compile_result = subprocess.run(compile_cmd, check=True, text=True, capture_output=True)
        run_cmd = ["./optimized"]
        run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True)
        return run_result.stdout
    except subprocess.CalledProcessError as e:
        return f"An error occurred:\n{e.stderr}"
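
Since the compile step differs by platform, one option is to build the command from whichever compiler is found on the PATH. The following is a sketch under assumptions: `pick_compile_cmd` is a hypothetical helper, the flag set is deliberately conservative (`-O3 -std=c++17`, the flags used in the earlier cells), and MSVC is not covered.

```python
import shutil

def pick_compile_cmd(source="optimized.cpp", output="optimized",
                     candidates=("clang++", "g++")):
    """Return a compile command for the first compiler found, or None."""
    for compiler in candidates:
        if shutil.which(compiler):
            return [compiler, "-O3", "-std=c++17", "-o", output, source]
    return None

cmd = pick_compile_cmd()
print(cmd if cmd else "No C++ compiler found on PATH")
```

The returned list can be passed to `subprocess.run` in place of the hard-coded `compile_cmd`.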

In [126]:
css = """
.python {background-color: #306998;}
.cpp {background-color: #050;}
"""

In [127]:
with gr.Blocks(css=css) as ui:
    gr.Markdown("## Convert code from Python to C++")
    with gr.Row():
        python = gr.Textbox(label="Python code:", value=python_hard, lines=10)
        cpp = gr.Textbox(label="C++ code:", lines=10)
    with gr.Row():
        model = gr.Dropdown(["GEMINI_FLASH", "GEMINI_PRO"], label="Select model", value="GEMINI_FLASH")
    with gr.Row():
        convert = gr.Button("Convert code")
    with gr.Row():
        python_run = gr.Button("Run Python")
        cpp_run = gr.Button("Run C++")
    with gr.Row():
        python_out = gr.TextArea(label="Python result:", elem_classes=["python"])
        cpp_out = gr.TextArea(label="C++ result:", elem_classes=["cpp"])

    convert.click(optimize, inputs=[python, model], outputs=[cpp])
    python_run.click(execute_python, inputs=[python], outputs=[python_out])
    cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])

ui.launch(inbrowser=True)

