# Project Overview: R to Python Code Converter

The goal of this project is to:
*   Convert R code to Python code
*   Preserve the original outputs and reduce run time where possible

Why this is useful:
*   Applies to almost any data science modernization effort in industry
*   Many companies are migrating R code to Python, and this helps speed up the process

# STEP 0: Installs, Imports, API Setup

## Installs
Since we're running this in Google Colab, we need to install R and rpy2.

In [None]:
!apt-get update -qq
!apt-get install -y r-base r-base-dev

!pip install rpy2

# Install commonly used R packages (optional)
!Rscript -e "install.packages(c('dplyr', 'ggplot2'), repos='https://cloud.r-project.org', quiet=TRUE)"

## Imports

In [None]:
import os
import io
import sys
import time
from dotenv import load_dotenv
from google.colab import drive, userdata
from huggingface_hub import login

from openai import OpenAI
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

from IPython.display import Markdown, display

I ran into issues importing rpy2 and pandas2ri, which meant the user couldn't run R code in the Gradio UI.

The error handling below tells the user whether R code execution is available.

In [None]:
try:
    import rpy2.robjects as robjects
    R_AVAILABLE = True
    print("SUCCESS! rpy2 successfully imported - R code execution available")

    # Try to activate pandas conversion (optional, not critical)
    try:
        from rpy2.robjects import pandas2ri
        pandas2ri.activate()
        print("pandas2ri activated")
    except (ImportError, AttributeError):
        try:
            # Try alternative import for newer versions
            import rpy2.robjects.conversion
            print("rpy2 conversion available")
        except ImportError:
            print("pandas2ri not available, but basic R execution will work")
except ImportError as e:
    R_AVAILABLE = False
    robjects = None
    print("Warning: rpy2 not installed. R code execution will be disabled.")
    print("\nTo enable R execution in Google Colab, run these commands:")
    print("  !apt-get update -qq")
    print("  !apt-get install -y r-base r-base-dev")
    print("  !pip install rpy2")

## Sign in to Hugging Face Hub

In [None]:
hf_token = userdata.get('HF_TOKEN')
if hf_token and hf_token.startswith("hf_"):
  print("HF key looks good so far")
else:
  print("HF key is not set - please click the key in the left sidebar")

login(hf_token, add_to_git_credential=True)

## Get other API Keys

In [None]:
gemini_token = userdata.get('GEMINI_API_KEY')
if gemini_token and gemini_token.startswith("AIza"):
  print("Gemini key looks good so far")
else:
  print("Gemini key is not set - please click the key in the left sidebar")

openai_token = userdata.get('OPENAI_API_KEY')
if openai_token and openai_token.startswith("sk-"):
  print("OpenAI key looks good so far")
else:
  print("OpenAI key is not set - please click the key in the left sidebar")

groq_token = userdata.get('GROQ_API_KEY')
if groq_token and groq_token.startswith("gsk"):
  print("Groq key looks good so far")
else:
  print("Groq key is not set - please click the key in the left sidebar")
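
The three checks above follow the same pattern, so they could be folded into one helper. The sketch below is a hypothetical refactor (the name `check_key` is not part of the notebook). It takes the key value directly, so it also runs outside Colab; the key values shown are placeholders.

```python
def check_key(name, value, prefix):
    """Return a status message for an API key based on its expected prefix."""
    if value and value.startswith(prefix):
        return f"{name} key looks good so far"
    return f"{name} key is not set - please click the key in the left sidebar"


# Placeholder values for illustration only; in Colab these would come from userdata.get(...)
print(check_key("Gemini", "AIza-placeholder", "AIza"))
print(check_key("Groq", None, "gsk"))
```

Like the original checks, this only validates the prefix, not whether the key is actually accepted by the provider.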

## Connect to Client Libraries

In [None]:
openai = OpenAI(api_key=openai_token)

gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
groq_url = "https://api.groq.com/openai/v1"
ollama_url = "http://localhost:11434/v1"

gemini = OpenAI(api_key=gemini_token, base_url=gemini_url)
groq = OpenAI(api_key=groq_token, base_url=groq_url)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

## Set up Model-Client Mapping

**Note**: I used the Artificial Analysis leaderboard and filtered to cheap or open-source models that scored best on Coding Intelligence
* Based on the LiveCodeBench and SciCode metrics


**Future Work**: If I were willing to pay more, I would have used these additional models:
- Gemini 3 Pro
- Gemini 3 Flash
- GPT 5.2
- Claude Opus 4.5

In [None]:
models = ["openai/gpt-oss-120b", "gpt-5-nano", "gemini-2.5-flash-lite", "qwen2.5-coder", "deepseek-coder-v2", "gpt-oss:20b"]

clients = {
    "gpt-5-nano": openai,
    "gemini-2.5-flash-lite": gemini,
    "openai/gpt-oss-120b": groq,
    "qwen2.5-coder": ollama,
    "deepseek-coder-v2": ollama,
    "gpt-oss:20b": ollama
}
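
Because the dropdown is built from `models` while requests are routed through `clients`, a model missing from the mapping would only fail at conversion time. A minimal sketch of an up-front check, using stand-in model names and plain strings in place of real client objects (the helper name `unmapped_models` is hypothetical):

```python
# Stand-in values so the check runs without any API keys
models = ["model-a", "model-b", "model-c"]
clients = {"model-a": "client-1", "model-b": "client-2"}


def unmapped_models(models, clients):
    """Return the models that have no client entry, preserving list order."""
    return [m for m in models if m not in clients]


print(unmapped_models(models, clients))
```

With the real `models` list and `clients` dict, an empty result means every dropdown entry can be routed to a client.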

# STEP 1: Design the Prompt Engine
The prompts tell the LLM *what to do*

## System Prompt

In [None]:
system_prompt = """
Your task is to convert R code into optimized, high-performance Python code.
Follow these guidelines:
1. Use efficient Python libraries (numpy, pandas, scipy, etc.)
2. Vectorize operations where possible
3. Maintain identical functionality and output
4. Use Pythonic idioms and best practices
5. Include necessary imports
6. Add brief comments explaining key conversions
7. If the user is writing a DataFrame or CSV, print out that CSV output

Respond only with Python code that can be executed directly.
"""


def user_prompt_for(r_code):
    return f"""
Convert this R code to optimized Python code that produces identical output with better performance.

Requirements:
- Use numpy, pandas, and other efficient libraries
- Vectorize operations where possible
- Include all necessary imports at the top
- Maintain the same output format
- Optimize for speed

R code to convert:

```r
{r_code}
```

Respond with executable Python code only.
"""

In [None]:
def messages_for(r_code):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(r_code)}
    ]


# STEP 3: Core Code Generation Logic

## Helper Function

In [None]:
# Cleans markdown code blocks from the LLM response
def clean_code_response(response_text):
    response_text = response_text.replace('```python', '').replace('```r', '').replace('```', '')
    return response_text.strip()
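
`clean_code_response` removes the fence markers but keeps any prose the model adds around the code. A minimal sketch of an alternative, assuming the reply wraps its code in Markdown fences, is to keep only the fenced contents and fall back to the stripped text otherwise. The names `extract_code`, `FENCE`, and `FENCE_RE` are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCE_RE = re.compile(FENCE + r"[A-Za-z0-9_+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response_text):
    """Return the contents of fenced blocks, or the stripped text if none are found."""
    blocks = FENCE_RE.findall(response_text)
    if blocks:
        return "\n".join(block.strip() for block in blocks)
    return response_text.strip()


reply = f"Here is the code:\n{FENCE}python\nprint('hi')\n{FENCE}\nHope this helps!"
print(extract_code(reply))
```

This drops the surrounding sentences, which would otherwise raise a `SyntaxError` when the result is executed.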

## Conversion Function

In [None]:
# Converts R code to Python using the model of choice
def convert_r_to_python(model, r_code):
    try:
        client = clients[model]
        request_kwargs = {"model": model, "messages": messages_for(r_code)}
        if model != "gpt-5-nano":
            # Lower temperature for more consistent code generation
            # (gpt-5-nano is called with its default temperature)
            request_kwargs["temperature"] = 0.3
        response = client.chat.completions.create(**request_kwargs)
        python_code = response.choices[0].message.content
        python_code = clean_code_response(python_code)
        return python_code
    except Exception as e:
        return f"# Error during conversion:\n# {str(e)}"

## Execution Functions

In [None]:
# Runs the R code and returns the output with a timing metric
def run_r_code(code):
    # Need to handle if R isn't available
    if not R_AVAILABLE:
        return "R execution not available (rpy2 not installed)\nPlease install R and rpy2 to run R code.", 0.0

    import tempfile
    import traceback

    # Create a temporary file to capture output
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
        temp_path = f.name

    # Use forward slashes for R path (works on all platforms)
    r_temp_path = temp_path.replace('\\', '/')

    # Wrap the R code to capture all output including cat() statements.
    # tryCatch(..., finally = ...) restores the sinks even if the R code fails,
    # so an error in one run does not swallow the output of later runs.
    wrapped_code = f"""
    con <- file("{r_temp_path}", open="wt")
    sink(con, type="output")
    sink(con, type="message")
    tryCatch({{
    {code}
    }}, finally = {{
        sink(type="message")
        sink(type="output")
        close(con)
    }})
    """

    try:
        start_time = time.time()
        robjects.r(wrapped_code)
        end_time = time.time()

        execution_time = end_time - start_time

        # Read the captured output
        with open(temp_path, 'r') as f:
            output = f.read()

        if not output.strip():
            output = "(No output produced)"

        return output, execution_time
    except Exception as e:
        return f"Error executing R code:\n{str(e)}\n\nTraceback:\n{traceback.format_exc()}", 0.0
    finally:
        # Clean up temp file, whether or not the run succeeded
        try:
            os.unlink(temp_path)
        except OSError:
            pass


In [None]:
# Runs the Python code and returns the output with a timing metric
def run_python_code(code):
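    globals_dict = {
        "__builtins__": __builtins__,
        "__name__": "__main__",  # So `if __name__ == "__main__":` blocks in generated code run
        "np": None,  # Numpy will be imported by the code
        "pd": None,  # Pandas will be imported by the code
        "time": time  # Pre-loaded; the generated code can also import it itself
    }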

    buffer = io.StringIO()
    old_stdout = sys.stdout
    sys.stdout = buffer

    try:
        start_time = time.time()
        exec(code, globals_dict)
        end_time = time.time()

        output = buffer.getvalue()
        execution_time = end_time - start_time

        return output, execution_time
    except Exception as e:
        return f"Error executing Python code:\n{str(e)}", 0.0
    finally:
        sys.stdout = old_stdout

## Comparison Function

In [None]:
# Runs both R and Python code and returns both outputs plus a timing comparison
def compare_execution(r_code, python_code):
    r_output, r_time = run_r_code(r_code)
    py_output, py_time = run_python_code(python_code)

    # Timing Comparison
    if r_time > 0 and py_time > 0:
        speedup = r_time / py_time
        speedup_text = f"{'='*50}\nPerformance Comparison:\n{'='*50}\n"                     # Nice header
        speedup_text += f"R execution time: {r_time:.6f} seconds\n"
        speedup_text += f"Python execution time: {py_time:.6f} seconds\n"
        speedup_text += f"Speedup: {speedup:.2f}x {'faster' if speedup > 1 else 'slower'}\n"
        speedup_text += f"{'='*50}"                                                             # Nice footer
    else:
        speedup_text = "\n\nTiming information not available."

    return r_output, py_output, speedup_text
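
A single timed run can be noisy, for example when the first R call also loads packages. A minimal sketch of repeating a measurement and averaging it before computing the same `r_time / py_time` ratio is shown below. The helpers `mean_runtime` and `speedup` are hypothetical, and the two lambdas are stand-in workloads rather than real R or Python conversions.

```python
import statistics
import time


def mean_runtime(fn, repeats=3):
    """Call fn() `repeats` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Ratio above 1 means the candidate ran faster than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workloads for illustration only
slow = mean_runtime(lambda: sum(i * i for i in range(200_000)))
fast = mean_runtime(lambda: sum(range(1_000)))
print(f"Speedup: {speedup(slow, fast):.2f}x")
```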

# STEP 4: Gradio User Interface
Wrap everything in Gradio for publication



## Sample R DataFrame

In [None]:
sample_r_football_df = """
library(tibble)

df <- tibble::tribble(
  ~player,   ~team, ~week, ~attempts, ~passing_yards, ~pass_td, ~rush_td,
  "Stafford","LAR",   9,      27,          240,          2,        0,
  "Stafford","LAR",  10,      34,          310,          3,        0,
  "Stafford","LAR",  11,      31,          198,          1,        0,
  "Young",   "CAR",   9,      29,          179,          1,        0,
  "Young",   "CAR",  10,      33,          205,          1,        0,
  "Young",   "CAR",  11,      28,           NA,          0,        0,
  "Allen",   "BUF",  10,      38,          287,          2,        1,
  "Allen",   "BUF",  11,      35,          312,          3,        1
)

teams <- tibble::tribble(
  ~team, ~division,
  "LAR", "NFC West",
  "CAR", "NFC South",
  "BUF", "AFC East"
)

"""

## Sample R Code


In [None]:
sample_r_basic = sample_r_football_df + """
library(dplyr)

out <- df %>%
  select(player, team, yds = passing_yards)

print(out)
"""

sample_r_filter = sample_r_football_df + """
library(dplyr)

out <- df %>%
  filter(team == "LAR", week >= 10, passing_yards > 200)

print(out)
"""

sample_r_mutate = sample_r_football_df + """
library(dplyr)

out <- df %>%
  mutate(
    total_td = pass_td + rush_td,
    ypa = passing_yards / attempts
  ) %>%
  select(player, week, total_td, ypa)

print(out)
"""

sample_r_arrange = sample_r_football_df + """
library(dplyr)

out <- df %>%
  arrange(desc(passing_yards), week)

print(out)
"""

sample_r_groupby = sample_r_football_df + """
library(dplyr)

out <- df %>%
  group_by(team) %>%
  summarise(
    games = n(),
    avg_yds = mean(passing_yards),
    max_yds = max(passing_yards)
  ) %>%
  arrange(desc(avg_yds))

print(out)
"""

sample_r_window = sample_r_football_df + """
library(dplyr)

out <- df %>%
  group_by(team) %>%
  mutate(team_avg_yds = mean(passing_yards)) %>%
  ungroup() %>%
  select(player, team, week, passing_yards, team_avg_yds)

print(out)
"""

sample_r_distinct = sample_r_football_df + """
library(dplyr)

out <- df %>%
  distinct(team, player)

print(out)
"""

sample_r_summarize = sample_r_football_df + """
library(dplyr)

out <- df %>%
  group_by(team) %>%
  summarise(
    unique_players = n_distinct(player),
    avg_yds_no_na = mean(passing_yards, na.rm = TRUE)
  )

print(out)
"""

## Gradio App Function

In [None]:
with gr.Blocks(theme=gr.themes.Monochrome(), title="R to Python Converter") as ui:
    gr.Markdown("# Nikhil Gavini's R to Python Code Converter")
    gr.Markdown("### LLM converts R code to optimized Python and compares performance")
    gr.Markdown("Note: You should always verify the output instead of blindly trusting the results!")

    with gr.Row(equal_height=True):
        with gr.Column(scale=6):
            r_code = gr.Code(
                label="R Code (Original)",
                value=sample_r_basic,
                language="r",
                lines=20
            )
        with gr.Column(scale=6):
            python_code = gr.Code(
                label="Python Code (Generated)",
                value="",
                language="python",
                lines=20
            )

    with gr.Row(elem_classes=["controls"]):
        r_run_btn = gr.Button("Run R", elem_classes=["run-btn"])
        model_dropdown = gr.Dropdown(
            models,
            value=models[0] if models else None,
            label="Model",
            scale=2
        )
        convert_btn = gr.Button("Convert to Python", elem_classes=["convert-btn"], scale=2)
        py_run_btn = gr.Button("Run Python", elem_classes=["run-btn"])
        compare_btn = gr.Button("Compare Both", elem_classes=["run-btn"], scale=2)

    with gr.Row():
        with gr.Column(scale=4):
            gr.Markdown("### Example R Code Templates")
            with gr.Row():
                basic_btn = gr.Button("Basic Select", size="sm")
                filter_btn = gr.Button("Filter DataFrame", size="sm")
                mutate_btn = gr.Button("Mutate DataFrame", size="sm")
                arrange_btn = gr.Button("Arrange", size="sm")
                groupby_btn = gr.Button("Group By", size="sm")
                window_btn = gr.Button("Window", size="sm")
                distinct_btn = gr.Button("Distinct", size="sm")
                summarize_btn = gr.Button("Summarize", size="sm")

    with gr.Row(equal_height=True):
        with gr.Column(scale=6):
            r_output = gr.TextArea(
                label="R Output",
                lines=10,
                elem_classes=["r-out"]
            )
        with gr.Column(scale=6):
            py_output = gr.TextArea(
                label="Python Output",
                lines=10,
                elem_classes=["py-out"]
            )

    with gr.Row():
        comparison_output = gr.TextArea(
            label="Performance Comparison",
            lines=6,
            elem_classes=["py-out"]
        )

    # Event handlers
    convert_btn.click(
        fn=convert_r_to_python,
        inputs=[model_dropdown, r_code],
        outputs=[python_code]
    )

    r_run_btn.click(
        fn=lambda code: run_r_code(code)[0],
        inputs=[r_code],
        outputs=[r_output]
    )

    py_run_btn.click(
        fn=lambda code: run_python_code(code)[0],
        inputs=[python_code],
        outputs=[py_output]
    )

    compare_btn.click(
        fn=compare_execution,
        inputs=[r_code, python_code],
        outputs=[r_output, py_output, comparison_output]
    )

    # Example buttons
    basic_btn.click(lambda: sample_r_basic, outputs=[r_code])
    filter_btn.click(lambda: sample_r_filter, outputs=[r_code])
    mutate_btn.click(lambda: sample_r_mutate, outputs=[r_code])
    arrange_btn.click(lambda: sample_r_arrange, outputs=[r_code])
    groupby_btn.click(lambda: sample_r_groupby, outputs=[r_code])
    window_btn.click(lambda: sample_r_window, outputs=[r_code])
    distinct_btn.click(lambda: sample_r_distinct, outputs=[r_code])
    summarize_btn.click(lambda: sample_r_summarize, outputs=[r_code])

ui.launch(share=False, inbrowser=True)

# STEP 5: Testing Results

Each comparison was run 3 times and the results were averaged. The Speedup Factor is the R execution time divided by the Python execution time.

**Verdict**: openai/gpt-oss-120b consistently yielded the best Speedup Factor

**Test Case**: Basic Select

- openai/gpt-oss-120b Speedup Factor: 6.61
- gpt-5-nano Speedup Factor: 6.50

**Test Case**: Filter

- openai/gpt-oss-120b Speedup Factor: 10.44
- gpt-5-nano Speedup Factor: 8.58

**Test Case**: Mutate

- openai/gpt-oss-120b Speedup Factor: 10.35
- gpt-5-nano Speedup Factor: 7.39

**Test Case**: Arrange

- openai/gpt-oss-120b Speedup Factor: 10.86
- gpt-5-nano Speedup Factor: 4.04

**Test Case**: Group By

- openai/gpt-oss-120b Speedup Factor: 4.10
- gpt-5-nano Speedup Factor: 3.31

**Test Case**: Window

- openai/gpt-oss-120b Speedup Factor: 5.92
- gpt-5-nano Speedup Factor: 5.69

**Test Case**: Distinct

- openai/gpt-oss-120b Speedup Factor: 5.75
- gpt-5-nano Speedup Factor: 4.05

**Test Case**: Summarize

- openai/gpt-oss-120b Speedup Factor: 3.24
- gpt-5-nano Speedup Factor: 2.33
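
To summarize the results above in one number per model, the reported Speedup Factors can be averaged across the eight test cases. This is a small sketch that only uses the figures listed above; the arithmetic mean is one possible summary, chosen here for simplicity.

```python
import statistics

# Speedup Factors copied from the results above, in test-case order:
# Basic Select, Filter, Mutate, Arrange, Group By, Window, Distinct, Summarize
speedups = {
    "openai/gpt-oss-120b": [6.61, 10.44, 10.35, 10.86, 4.10, 5.92, 5.75, 3.24],
    "gpt-5-nano": [6.50, 8.58, 7.39, 4.04, 3.31, 5.69, 4.05, 2.33],
}

for model, values in speedups.items():
    print(f"{model}: mean speedup {statistics.mean(values):.2f}x over {len(values)} test cases")
```

This gives a mean of about 7.16x for openai/gpt-oss-120b and about 5.24x for gpt-5-nano, in line with the verdict.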