# Step 1 - Get and Prepare Input Datasets



## Loading MCEval-Instruct Dataset

In [2]:
import pandas as pd

Mceval = pd.read_json("hf://datasets/Multilingual-Multimodal-NLP/McEval-Instruct/McEval-Instruct.json")



In [3]:
Mceval.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35943 entries, 0 to 35942
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  35943 non-null  object
 1   output       35943 non-null  object
 2   language     35943 non-null  object
 3   source       35943 non-null  object
dtypes: object(4)
memory usage: 1.1+ MB


In [4]:
Mceval.head()

Unnamed: 0,instruction,output,language,source
0,\n\nWrite a Fortran subroutine that applies a ...,\n\n```fortran\n! Import necessary modules\nim...,FORTRAN,McEval-Instruct
1,\nDesign a function that finds the shortest pa...,\n```javascript\nclass Maze {\n constructor(\...,JavaScript,McEval-Instruct
2,\n\nCreate a React component called `DeviceDet...,\n\n```jsx\nimport React from 'react';\nimport...,JavaScript,McEval-Instruct
3,\n\nCreate a React component named `EditAvaila...,"\n\n```jsx\nimport React, { useState } from 'r...",JavaScript,McEval-Instruct
4,\n\nCreate a JavaScript class named `ItemsMana...,\n\n```javascript\nimport axios from 'axios';\...,JavaScript,McEval-Instruct


In [5]:
python_mceval = Mceval[Mceval['language'] == 'Python']
display(python_mceval.head())

Unnamed: 0,instruction,output,language,source
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct


In [6]:
python_mceval.to_csv("../../data/raw/McEval_Instruct.csv")

## Get the Test Dataset for Constraint Generation from GitHub

In [7]:
# Quote the URL: unquoted, zsh treats the `?` as a glob character and fails with "no matches found".
!wget "https://raw.github.ibm.com/shagutt1/sravani-internship-2025/refs/heads/main/data/outputs/constraint_category_data.csv?token=GHSAT0AAAAAAACSCL5HQAVP6M7WIKJYEZSC2C5M6CA" -O "constraint_category_initial_data.csv"


In [8]:
df = pd.read_csv("../../data/raw/constraint_category.csv")
df.head()

Unnamed: 0,dataset,instruction,code,test,Characteristics,constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,,1) Logic is modularized using calculate_distan...,['Implement the distance calculation in a sepa...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...",,1) Embed the HTML tag using an f-string to dyn...,"[""Provide a concise solution using Python's st..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,,1) The function factorial is implemented recur...,['The program must use recursion to calculate ...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,,"The class defines six attributes: species, nam...","['Include detailed docstrings for each method,..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...",,The function is named decryptJokes and accepts...,['Provide a complete Python function implement...


In [9]:
df = df[["dataset","instruction","code"]]
df.head()

Unnamed: 0,dataset,instruction,code
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t..."


## Decide Dataset Sampling Ratios Based on Real-World Complexity
- Skipping for now

## Write transformation scripts for each dataset format
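
One possible shape for such a transformation script is a per-source column map that renames each dataset's columns to the shared `dataset` / `instruction` / `code` layout used later in this notebook. The sketch below is an assumption, not the notebook's actual script: `to_common_schema` and `COLUMN_MAPS` are hypothetical names, and rows are plain dicts (for example from `DataFrame.to_dict("records")`).

```python
# Hypothetical helper: normalize rows from different sources to one schema.
COMMON_COLUMNS = ("dataset", "instruction", "code")

# Per-source rename maps (source column -> common column). They mirror the
# columns seen in the two datasets loaded above; extend as needed.
COLUMN_MAPS = {
    "McEval-Instruct": {"source": "dataset", "instruction": "instruction", "output": "code"},
    "constraint_category": {"dataset": "dataset", "instruction": "instruction", "code": "code"},
}


def to_common_schema(rows, source_name):
    """Rename columns per COLUMN_MAPS and keep only the common columns."""
    mapping = COLUMN_MAPS[source_name]
    normalized = []
    for row in rows:
        renamed = {new: row[old] for old, new in mapping.items()}
        # Strip the leading/trailing newlines seen in the McEval text fields.
        normalized.append({col: str(renamed[col]).strip() for col in COMMON_COLUMNS})
    return normalized


example = to_common_schema(
    [{"instruction": "\n\nWrite a function", "output": "\ndef f(): pass\n",
      "language": "Python", "source": "McEval-Instruct"}],
    "McEval-Instruct",
)
print(example[0])
```

Columns that are not in the map (here `language`) are dropped, so every source ends up with the same three fields.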

## Create a benchmark dataset of 1000–1200 examples



# Step 2 - Define and Clean Constraint Categories

In [10]:
categories = [
    "Code Structure and Modularity",
    "Input and Output Handling",
    "Error Handling and Robustness",
    "Data Processing and Transformation",
    "Performance and Optimization",
    "Library and API Usage",
    "Testing and Debugging",
    "Documentation and Readability",
    "Security and Privacy",
    "Reproducibility and Consistency",
    "Mathematical Computation",
    "File and Data Management",
    "UI and Interaction",
]
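
The cleaning half of this step is not shown in the notebook. A minimal sketch, assuming "cleaning" means mapping model-returned names back onto the canonical list and discarding anything unknown; `clean_categories` and `ALLOWED_CATEGORIES` are hypothetical names introduced only for illustration.

```python
# Canonical names, copied from the `categories` list above.
ALLOWED_CATEGORIES = [
    "Code Structure and Modularity",
    "Input and Output Handling",
    "Error Handling and Robustness",
    "Data Processing and Transformation",
    "Performance and Optimization",
    "Library and API Usage",
    "Testing and Debugging",
    "Documentation and Readability",
    "Security and Privacy",
    "Reproducibility and Consistency",
    "Mathematical Computation",
    "File and Data Management",
    "UI and Interaction",
]


def clean_categories(predicted, allowed=ALLOWED_CATEGORIES):
    """Keep only known categories, matched case-insensitively, without duplicates."""
    lookup = {name.lower(): name for name in allowed}
    cleaned = []
    for name in predicted:
        canonical = lookup.get(str(name).strip().lower())
        if canonical is not None and canonical not in cleaned:
            cleaned.append(canonical)
    return cleaned


print(clean_categories(["security and privacy", "Made-up Category", "Security and Privacy "]))
```

The result preserves the order in which valid categories first appear in the model's answer.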

# Step 3 - Generating Constraints

In [11]:
import pandas as pd
import time
from openai import OpenAI
import os


openai_api_key = os.getenv("OPENAI_API_KEY")  # Read the key from the environment; never hard-code it
client = OpenAI(api_key=openai_api_key)


In [12]:
SYSTEM_PROMPT = """
You are a helpful assistant. You will be given a programming instruction and the corresponding code.
"""

In [13]:
def get_response(user_prompt, max_retries=3):
    """Send one chat request; return the reply text, or "[]" if every attempt fails."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": user_prompt},
                ],
                temperature=0.3,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error on attempt {attempt+1}: {e}")
            time.sleep(2)  # brief fixed pause before the next attempt
    return "[]"
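
The retry loop above waits a fixed 2 seconds between attempts. If rate limits become a problem, an exponential backoff may behave better. The sketch below is a generic illustration rather than part of the notebook: `call_with_backoff` is a hypothetical helper that wraps any zero-argument callable, and the `sleep` parameter exists only so the delay can be replaced in a dry run.

```python
import time


def call_with_backoff(fn, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, then retry.

    Returns fn()'s result, or None once all attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # broad on purpose, mirroring get_response
            print(f"Error on attempt {attempt + 1}: {exc}")
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None


# Dry run with a function that fails twice, recording delays instead of sleeping.
attempts = []
recorded_delays = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return "ok"


print(call_with_backoff(flaky, sleep=recorded_delays.append), recorded_delays)
```

In the notebook this could wrap the API call as `call_with_backoff(lambda: get_response(prompt, max_retries=1))`, though whether the extra layer is worthwhile depends on how often requests fail.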

## Map Categories to the Instructions

In [14]:


def get_relevant_categories(instruction, code, max_retries=3):
    user_prompt = f"""You are a coding assistant designed to classify natural language code generation instructions into appropriate high-level constraint categories.

You must choose all applicable categories from the following 13 supercategories based on the instruction and the type of code that is expected to be written:

1. Code Structure and Modularity
2. Input and Output Handling
3. Error Handling and Robustness
4. Data Processing and Transformation
5. Performance and Optimization
6. Library and API Usage
7. Testing and Debugging
8. Documentation and Readability
9. Security and Privacy
10. Reproducibility and Consistency
11. Mathematical Computation
12. File and Data Management
13. UI and Interaction

Task:
Given the instruction, select all relevant supercategories that:
- Help define the expected constraints.
- Reflect either the code behavior or user expectation.
- Include at least 2 relevant categories per instruction.
- Consider the domain of the problem; if the code looks like it might need privacy or security considerations, include that category as well.


Output Format:
Respond with only a list of strings as shown in the example output.
Each string should be the exact name of the matching supercategory.

Example Output:
["Mathematical Computation", "Input and Output Handling", "Documentation and Readability"]

Instruction:\n
{instruction}

Code:\n
{code}

Relevant categories:"""

    categories = get_response(user_prompt,max_retries)
    return categories






In [15]:
get_relevant_categories(df.iloc[0]["instruction"],df.iloc[0]["code"])

'["Mathematical Computation", "Input and Output Handling", "Documentation and Readability"]'
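
The function returns the model's reply as a string, not a list, and the reply is not guaranteed to be a clean list literal. A minimal sketch of a tolerant parser follows; `parse_category_list` is a hypothetical helper, and it assumes the first `[...]` span in the reply is the intended answer.

```python
import ast
import json
import re


def parse_category_list(raw):
    """Parse a model reply such as '["A", "B"]' into a list of strings.

    Returns [] when no list can be recovered.
    """
    # Take the first [...] span so code fences or stray prose are ignored.
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if match is None:
        return []
    text = match.group(0)
    # Try strict JSON first, then a Python literal (handles single quotes).
    for loader in (json.loads, ast.literal_eval):
        try:
            parsed = loader(text)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    return []


print(parse_category_list('["Mathematical Computation", "Input and Output Handling"]'))
```

The `"[]"` fallback returned by `get_response` parses to an empty list, so failed rows stay distinguishable from classified ones.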

In [16]:
# get_relevant_categories(test_mceval.iloc[0]["instruction"],test_mceval.iloc[0]["code"])

In [17]:
# Classify every row of df and save the result (pass a slice such as df.head(20).copy() to limit the run)
def map_categories(df,output_pth,input_col1,input_col2,output_col):
    results = []
    for i, row in df.iterrows():
        print(f"Processing row {i}")
        categories = get_relevant_categories(row[input_col1], row[input_col2])
        results.append(categories)

    df[output_col] = results
    df.to_csv(output_pth, index=False)
    print("Saved the file successfully")
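
Because `map_categories` writes the CSV only after the last row, an interruption partway through loses every completed API call. One way to guard against that is to append each result to a checkpoint file as it arrives. This is a sketch under assumptions, not the notebook's method: `map_with_checkpoint` is a hypothetical helper, rows are `(key, instruction, code)` tuples, and `classify` stands in for `get_relevant_categories`.

```python
import json
import os


def map_with_checkpoint(rows, classify, checkpoint_path):
    """Apply classify(instruction, code) to each (key, instruction, code) row.

    Finished rows are appended to a JSON Lines file, so a rerun after a
    crash skips keys that are already recorded there.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)
                done[record["key"]] = record["result"]

    with open(checkpoint_path, "a", encoding="utf-8") as fh:
        for key, instruction, code in rows:
            if key in done:
                continue  # already processed in an earlier run
            result = classify(instruction, code)
            fh.write(json.dumps({"key": key, "result": result}) + "\n")
            fh.flush()
            done[key] = result
    return done
```

With a DataFrame, the rows could be produced by something like `((i, r["instruction"], r["code"]) for i, r in df.iterrows())`, and the returned dict mapped back onto the index afterwards.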

In [18]:
map_categories(df,"step3_with_relevant_categories.csv","instruction","code","relevant_categories")

Processing row 0
Processing row 1
Processing row 2
Processing row 3
Processing row 4
Processing row 5
Processing row 6
Processing row 7
Processing row 8
Processing row 9
Processing row 10
Processing row 11
Processing row 12
Processing row 13
Processing row 14
Processing row 15
Processing row 16
Processing row 17
Processing row 18
Processing row 19
Processing row 20
Processing row 21
Processing row 22
Processing row 23
Processing row 24
Processing row 25
Processing row 26
Processing row 27
Processing row 28
Processing row 29
Processing row 30
Processing row 31
Processing row 32
Processing row 33
Processing row 34
Processing row 35
Processing row 36
Processing row 37
Processing row 38
Processing row 39
Processing row 40
Processing row 41
Processing row 42
Processing row 43
Processing row 44
Processing row 45
Processing row 46
Processing row 47
Processing row 48
Processing row 49
Processing row 50
Processing row 51
Processing row 52
Processing row 53
Processing row 54
Processing row 55
Pr

In [19]:
df.head(59)

Unnamed: 0,dataset,instruction,code,relevant_categories
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[""Mathematical Computation"", ""Input and Output..."
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[""Code Structure and Modularity"", ""Input and O..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[""Error Handling and Robustness"", ""Mathematica..."
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[""Code Structure and Modularity"", ""Documentati..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[""Data Processing and Transformation"", ""Input ..."
5,ajibawa-2023/Python-Code-23k-ShareGPT,What is the purpose of the 'enumerate' keyword...,The purpose of the 'enumerate' keyword in Pyth...,"[""Code Structure and Modularity"", ""Error Handl..."
6,ajibawa-2023/Python-Code-23k-ShareGPT,Can you provide an example of mutation testing...,"Sure, here's an example of mutation testing in...","[""Testing and Debugging"", ""Code Structure and ..."
7,ajibawa-2023/Python-Code-23k-ShareGPT,Edit the given code to fix the bug and increas...,```python\ndef calculate_tax(price):\n tax_...,"[""Mathematical Computation"", ""Error Handling a..."
8,ajibawa-2023/Python-Code-23k-ShareGPT,Generate a random password that meets the foll...,Here is a Python function that generates a ran...,"[""Input and Output Handling"", ""Error Handling ..."
9,ajibawa-2023/Python-Code-23k-ShareGPT,Create a program to combine the elements of tw...,Here's a Python program that combines the elem...,"[""Code Structure and Modularity"", ""Data Proces..."


In [20]:
from collections import Counter
import ast

# Convert the string representation of lists to actual lists
df['relevant_categories'] = df['relevant_categories'].apply(ast.literal_eval)

# Flatten the list of lists and count occurrences
all_categories = [category for sublist in df['relevant_categories'] for category in sublist]
category_counts = Counter(all_categories)

# Create a DataFrame from the counts
category_df = pd.DataFrame.from_dict(category_counts, orient='index', columns=['Count'])

# Sort the categories by count
category_df = category_df.sort_values(by='Count', ascending=False)

display(category_df)

Unnamed: 0,Count
Input and Output Handling,36
Documentation and Readability,33
Code Structure and Modularity,29
Data Processing and Transformation,28
Error Handling and Robustness,21
Library and API Usage,12
Testing and Debugging,8
Mathematical Computation,7
Security and Privacy,7
UI and Interaction,6


## Prompt to generate the constraints

In [21]:
def get_prompt(instruction,code,relevant_categories):
    prompt = f""" "task": "You are an expert in generating constraints for code generation tasks by analyzing a natural language instruction and the associated code. Your objective is to identify the relevant characteristics of the task based on the instruction and code, and then generate meaningful constraints that guide a model to correctly implement or reason about the task in line with predefined constraint categories.",

    "context": "You are provided with three inputs: (1) a natural language `instruction` describing a coding task, (2) the actual `code` implementing or partially implementing that instruction, and (3) a list of `relevant_constraint_categories` which represent different high-level aspects of software development such as performance, error handling, modularity, etc. that are relevant to the instruction.",

    "goal": "Your job is to examine the instruction, code and relevant constraint categories, and generate 8 to 10 natural language constraints that align with the relevant categories. These constraints should help an LLM generate code that is correct, complete, robust, and aligned with good development practices. The constraints should be actionable, precise, and reflective of both the explicit and implicit requirements inferred from the instruction and code.",

    "JSON Response Format": {{
        "Analysis on Characteristics": "Briefly describe the nature of the task and what kind of constraints are necessary to ensure proper execution. For example, should the solution handle edge cases? Use efficient data structures? Log errors? Include testability or modularity considerations?",
        "Constraints": [
        "List 8 to 10 detailed and well-formed constraints that are aligned with the relevant categories. Each constraint should be phrased clearly in natural language and should guide the model to include or avoid specific design decisions, implementation patterns, or output behaviors."
        ]
    }},

    "Inputs Required": {{
        "instruction": {instruction}
        "code": {code}
        "relevant_constraint_categories": {relevant_categories}

    }} """
    return prompt

In [22]:
prompt = get_prompt(df.iloc[0]["instruction"],df.iloc[0]["code"],df.iloc[0]["relevant_categories"])
print(prompt)

 "task": "You are an expert in generating constraints for code generation tasks by analyzing a natural language instruction and the associated code. Your objective is to identify the relevant characteristics of the task based on the instruction and code, and then generate meaningful constraints that guide a model to correctly implement or reason about the task in line with predefined constraint categories.",

    "context": "You are provided with three inputs: (1) a natural language `instruction` describing a coding task, (2) the actual `code` implementing or partially implementing that instruction, and (3) a list of `relevant_constraint_categories` which represent different high-level aspects of software development such as performance, error handling, modularity, etc. that are relevant to the instruction.",

    "goal": "Your job is to examine the instruction, code and relevant constraint categories, and generate 8 to 10 natural language constraints that align with the relevant categ

In [None]:
instruction = df.iloc[0]["instruction"]
print(instruction)
response = get_response(prompt)
print(response)

Calculate the distance between two points located in the 3-dimensional space. The points are represented as tuples of three integers. The distance should be rounded to two decimal places. Additionally, the program should also display the angle between the vector components of the resulting distance and the x-axis in degrees, rounded to two decimal places.

Example Input:
Point 1: (3, 4, 5)
Point 2: (2, 6, -1)

Example Output:
Distance: 2.45
Vector components: (1, 2, -6)
Angle with x-axis: 32.94 degrees


In [None]:
def generate_constraints(df,output_pth,input_col1,input_col2,input_col3,output_col):
    results = []
    for i, row in df.iterrows():
        print(f"Processing row {i}")
        prompt = get_prompt(row[input_col1], row[input_col2], row[input_col3])
        constraints = get_response(prompt)
        results.append(constraints)

    df[output_col] = results
    df.to_csv(output_pth, index=False)
    print("Saved the file successfully")
    return df

In [None]:
test_df = df.head(20).copy()
test_df = generate_constraints(test_df,"step4_with_constraints.csv","instruction","code","relevant_categories","constraints")
test_df.head()

Processing row 0
Processing row 1
Processing row 2
Processing row 3
Processing row 4
Processing row 5
Processing row 6
Processing row 7
Processing row 8
Processing row 9
Processing row 10
Processing row 11
Processing row 12
Processing row 13
Processing row 14
Processing row 15
Processing row 16
Processing row 17
Processing row 18
Processing row 19
Saved the file successfully


Unnamed: 0,dataset,instruction,code,relevant_categories,constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[Mathematical Computation, Input and Output Ha...","{\n ""Analysis on Characteristics"": ""The tas..."
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[Error Handling and Robustness, Mathematical C...","{\n ""Analysis on Characteristics"": ""The tas..."
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[Code Structure and Modularity, Documentation ...","{\n ""Analysis on Characteristics"": ""The tas..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas..."


In [None]:
import json

def extract_constraints(constraint_string):
    try:
        json_string = constraint_string.strip().replace('```json\n', '', 1).replace('\n```', '', 1)
        constraint_json = json.loads(json_string)
        # print(constraint_json.get("Constraints", []))
        return constraint_json.get("Constraints", [])
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e} in string: {constraint_string}")
        return []
    except AttributeError as e:
        print(f"Attribute error: {e} in string: {constraint_string}")
        return []
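
The two chained `replace` calls handle a reply that is exactly a json-tagged fence, but not one with an untagged fence or with prose around the JSON. A more lenient variant is sketched below as an illustration; `extract_constraints_lenient` is a hypothetical name, and it assumes the reply contains a single JSON object.

```python
import json
import re

# Matches an optional json-tagged fence and captures its body.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_constraints_lenient(reply):
    """Return the "Constraints" list from a reply, or [] if it cannot be parsed."""
    if not isinstance(reply, str):
        return []
    fenced = FENCE_RE.search(reply)
    candidate = fenced.group(1) if fenced else reply
    # Fall back to the outermost {...} span if extra prose surrounds the JSON.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        return []
    try:
        payload = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return []
    constraints = payload.get("Constraints", []) if isinstance(payload, dict) else []
    return constraints if isinstance(constraints, list) else []


print(extract_constraints_lenient('Here you go: {"Constraints": ["Validate inputs."]}'))
```

A reply whose JSON strings themselves contain triple backticks would still defeat the fence pattern, so logging unparsed replies remains useful.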



In [None]:
extract_constraints(test_df.iloc[0]["constraints"])

['Ensure that the distance calculation correctly handles cases where the two points are identical, preventing division by zero in the angle calculation.',
 'Round the distance and angle results to two decimal places as specified in the instruction, ensuring consistent output formatting.',
 'Include error handling to manage invalid input types, such as non-integer values or tuples of incorrect length.',
 'Document each function with clear docstrings explaining the parameters, return values, and any exceptions that may be raised.',
 'Use descriptive variable names to enhance code readability and maintainability, avoiding single-letter variable names.',
 'Implement input validation to ensure that the points provided are tuples of three integers before performing calculations.',
 'Clearly separate the calculation logic from the output logic to enhance modularity and testability of the code.',
 'Provide example usage in comments to demonstrate how the functions can be called with sample inp

In [None]:
test_df['extracted_constraints'] = test_df['constraints'].apply(extract_constraints)
display(test_df.head())

Unnamed: 0,dataset,instruction,code,relevant_categories,constraints,extracted_constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[Mathematical Computation, Input and Output Ha...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that the distance calculation correctl...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas...",[The code should be structured in a modular wa...
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[Error Handling and Robustness, Mathematical C...","{\n ""Analysis on Characteristics"": ""The tas...",[The program must validate the input to ensure...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[Code Structure and Modularity, Documentation ...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that all attributes of the Animal clas...
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas...","[The function should be modular, with separate..."


# Step 4 - Compare New Constraints with Previous Ones for Quality Improvement
- Skipping for now

# Step 6 - Generate constraints for each row in the benchmark dataset.


In [None]:
python_mceval = python_mceval.rename(columns={'output': 'code'})
display(python_mceval.head())

Unnamed: 0,instruction,code,language,source
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct


In [None]:
python_mceval.info()

<class 'pandas.core.frame.DataFrame'>
Index: 927 entries, 1610 to 8664
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  927 non-null    object
 1   code         927 non-null    object
 2   language     927 non-null    object
 3   source       927 non-null    object
dtypes: object(4)
memory usage: 36.2+ KB


In [None]:
test_mceval = python_mceval.head(20).copy()
map_categories(test_mceval,"Mceval_relevant_categories.csv","instruction","code","relevant_categories")
test_mceval.info()

Processing row 1610
Processing row 1611
Processing row 1612
Processing row 1613
Processing row 1614
Processing row 1615
Processing row 1616
Processing row 1617
Processing row 1618
Processing row 1619
Processing row 1620
Processing row 1621
Processing row 1622
Processing row 1623
Processing row 1624
Processing row 1625
Processing row 1626
Processing row 1627
Processing row 1628
Processing row 1629
Saved the file successfully
<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, 1610 to 1629
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   instruction          20 non-null     object
 1   code                 20 non-null     object
 2   language             20 non-null     object
 3   source               20 non-null     object
 4   relevant_categories  20 non-null     object
dtypes: object(5)
memory usage: 960.0+ bytes


In [None]:
get_relevant_categories(test_mceval.iloc[0]["instruction"],test_mceval.iloc[0]["code"])

'["Code Structure and Modularity", "Input and Output Handling", "Performance and Optimization", "Error Handling and Robustness", "Documentation and Readability"]'

In [None]:
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and..."
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""..."
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."


In [None]:
test_mceval = generate_constraints(test_mceval,"Mceval_constraints.csv","instruction","code","relevant_categories","constraints")
test_mceval.info()

Processing row 1610
Processing row 1611
Processing row 1612
Processing row 1613
Processing row 1614
Processing row 1615
Processing row 1616
Processing row 1617
Processing row 1618
Processing row 1619
Processing row 1620
Processing row 1621
Processing row 1622
Processing row 1623
Processing row 1624
Processing row 1625
Processing row 1626
Processing row 1627
Processing row 1628
Processing row 1629
Saved the file successfully
<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, 1610 to 1629
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   instruction          20 non-null     object
 1   code                 20 non-null     object
 2   language             20 non-null     object
 3   source               20 non-null     object
 4   relevant_categories  20 non-null     object
 5   constraints          20 non-null     object
dtypes: object(6)
memory usage: 1.1+ KB


In [None]:
test_mceval["extract_constraints"] = test_mceval["constraints"].apply(extract_constraints)
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories,constraints,extract_constraints
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The function must check for CUDA availability...
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and...","```json\n{\n ""Analysis on Characteristics"":...",[Ensure that the class methods are modular and...
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""...","{\n ""Analysis on Characteristics"": ""The tas...","[Ensure that the function handles edge cases, ..."
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The API should be designed with a clear separ...
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","{\n ""Analysis on Characteristics"": ""The tas...","[The function should be modular, allowing for ..."


In [None]:
test_mceval.to_csv("../../data/raw/Mceval_constraints.csv",index=False)

# Step 7 - Validate the generated constraints.
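
No validation code is given for this step. As a starting point, a few structural checks can be automated: the prompt asks for 8 to 10 constraints, each should be a non-empty string, and none should repeat. The sketch below covers only those checks; `validate_constraints` is a hypothetical helper, and judging whether a constraint is meaningful would still need manual or model-based review.

```python
def validate_constraints(constraints, min_count=8, max_count=10):
    """Return a list of human-readable problems; an empty list means it passed."""
    if not isinstance(constraints, list):
        return ["constraints is not a list"]
    problems = []
    if not min_count <= len(constraints) <= max_count:
        problems.append(
            f"expected {min_count}-{max_count} constraints, got {len(constraints)}"
        )
    seen = set()
    for index, item in enumerate(constraints):
        if not isinstance(item, str) or not item.strip():
            problems.append(f"constraint {index} is empty or not a string")
            continue
        # Compare case-insensitively with whitespace collapsed.
        key = " ".join(item.lower().split())
        if key in seen:
            problems.append(f"constraint {index} duplicates an earlier one")
        seen.add(key)
    return problems


print(validate_constraints(["Validate inputs.", "validate  inputs.", ""]))
```

Applied to the `extract_constraints` column, rows with a non-empty problem list could be queued for regeneration.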


# Step 8 - Generate code from Instruction and Generated Constraints

In [None]:
def get_codegeneration_prompt(instruction,constraints):
    prompt = f""" You are a skilled Python programmer. Based on the following natural language instruction and a set of implementation constraints, generate Python code that satisfies the instruction and fully adheres to all constraints.

    ### Instruction:
    {instruction}

    ### Constraints:
    {constraints}

    ### Requirements:
    - Ensure the code is clean, correct, and follows Python best practices.
    - Strictly follow all the constraints, even if they are not explicitly stated in the instruction.
    - Do not include any explanatory text; return only the code block.

    ### Output Format:
    Return a single Python code block that solves the task.

 
    # Your code here


    """
    return prompt


print(get_codegeneration_prompt(test_mceval.iloc[0]["instruction"], test_mceval.iloc[0]["extract_constraints"]))

 You are a skilled Python programmer. Based on the following natural language instruction and a set of implementation constraints, generate Python code that satisfies the instruction and fully adheres to all constraints.

    ### Instruction:
    

In the context of a medical imaging analysis application, you are tasked with developing a CycleGAN-based model for domain adaptation between two different types of medical images (e.g., MRI and CT scans). The CycleGAN consists of two generators and two discriminators. The generators are responsible for translating images from one domain to another and vice versa, while the discriminators aim to distinguish between real and generated images.

The given code snippet provides a setup for initializing the CycleGAN components using PyTorch, including the generators (`G_AB` and `G_BA`), discriminators (`D_A` and `D_B`), and a Graph Neural Network (`model_gnn`) for feature extraction. Additionally, loss functions and optimizers are defined for tra
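
Passing the constraints as a Python list means the f-string inserts the list's `repr`, brackets and quotes included. Rendering them as a numbered list first may give the model a cleaner prompt. A small sketch, with `format_constraints` as a hypothetical helper:

```python
def format_constraints(constraints):
    """Render a list of constraint strings as a numbered block for the prompt."""
    return "\n".join(
        f"{number}. {text.strip()}" for number, text in enumerate(constraints, start=1)
    )


print(format_constraints(["Validate inputs.", " Add docstrings. "]))
# → 1. Validate inputs.
# → 2. Add docstrings.
```

The formatted string can then be passed as the `constraints` argument of `get_codegeneration_prompt` in place of the raw list.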

In [None]:
prompt = get_codegeneration_prompt(test_mceval.iloc[0]["instruction"], test_mceval.iloc[0]["extract_constraints"])
code = get_response(prompt)
print(code)

```python
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Dict

class Options:
    def __init__(self, lr: float, beta1: float, beta2: float, lambda_idt: float, lambda_cycle: float):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.lambda_idt = lambda_idt
        self.lambda_cycle = lambda_cycle

class Generator(nn.Module):
    # Placeholder for the generator model
    def __init__(self):
        super(Generator, self).__init__()
        # Define layers here

    def forward(self, x):
        # Define forward pass
        return x

class Discriminator(nn.Module):
    # Placeholder for the discriminator model
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define layers here

    def forward(self, x):
        # Define forward pass
        return x

class GraphNN(nn.Module):
    # Placeholder for the Graph Neural Network model
    def __init__(self):
        super(GraphNN, self).__init

In [None]:
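def generate_code(row, input_col1="instruction", input_col2="extract_constraints"):
    """Generate code for one row from its instruction and extracted constraints."""
    # Log the first characters of the instruction to track progress
    print(row[input_col1][:10])
    prompt = get_codegeneration_prompt(row[input_col1], row[input_col2])
    code = get_response(prompt)
    return code


generate_code(test_mceval.iloc[0])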



In the c


'```python\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\nfrom typing import Dict\n\n# Assuming the existence of the following classes based on the context\n# class Generator(nn.Module): ...\n# class Discriminator(nn.Module): ...\n# class GraphNN(nn.Module): ...\n\ndef initialize_cyclegan_components(options: \'Options\') -> Dict[str, nn.Module]:\n    """\n    Initializes the CycleGAN components including generators, discriminators, and optimizers.\n\n    Args:\n        options (Options): An object containing hyperparameters for the model components.\n\n    Returns:\n        Dict[str, nn.Module]: A dictionary containing initialized components of the CycleGAN.\n    """\n    # Check for CUDA availability\n    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")\n\n    try:\n        # Validate options\n        if not hasattr(options, \'num_channels\') or not hasattr(options, \'lr\') or not hasattr(options, \'beta1\'):\n            raise ValueError("Opti
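The response above comes back wrapped in a Markdown fence, so it cannot be parsed or executed as-is. Below is a minimal sketch of a helper that strips the fence before any later evaluation. The name `extract_code_block` is hypothetical and not part of this notebook, and the sketch assumes the model returns at most one fenced block; a response that is cut off before the closing fence is returned up to its end.

```python
import re

# Build the fence marker programmatically so this snippet can itself live inside a fence.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, then the body up to the
# closing fence (or the end of the text if the response was cut off).
_FENCE_RE = re.compile(
    FENCE + r"[ \t]*[\w+-]*[ \t]*\n(.*?)(?:" + FENCE + r"|\Z)",
    re.DOTALL,
)


def extract_code_block(response: str) -> str:
    """Return the body of the first fenced block, or the stripped text if there is none."""
    match = _FENCE_RE.search(response)
    if match:
        return match.group(1).strip()
    return response.strip()


sample = FENCE + "python\nimport torch\nprint('ok')\n" + FENCE
print(extract_code_block(sample))
```

Applied to the DataFrame, this could look like `test_mceval["generated_code"].map(extract_code_block)`, assuming that column holds the raw responses.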

In [None]:
from tqdm import tqdm
import pandas as pd

# Enable tqdm for pandas apply
tqdm.pandas()
test_mceval["generated_code"] = test_mceval.progress_apply(generate_code, axis=1)



In the c


Design a


Create a

Design a 


Write a 


Write a 


Write a 


Design a


Create a

Write a P


Implemen


Write a 


Write a 


You are 

Design a 


Create a


Given a 


Design a


Design a


Write a 


In [None]:
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories,constraints,extract_constraints,generated_code
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The function must check for CUDA availability...,```python\nimport torch\nimport torch.nn as nn...
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and...","```json\n{\n ""Analysis on Characteristics"":...",[Ensure that the class methods are modular and...,```python\nimport sqlite3\nimport pandas as pd...
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""...","{\n ""Analysis on Characteristics"": ""The tas...","[Ensure that the function handles edge cases, ...",```python\nimport logging\nfrom ephyviewer imp...
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The API should be designed with a clear separ...,"```python\nfrom flask import Flask, request, j..."
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","{\n ""Analysis on Characteristics"": ""The tas...","[The function should be modular, allowing for ...",```python\nfrom typing import List\n\ndef allo...


In [None]:
test_mceval.to_csv("../../data/processed/Mceval_generated_code.csv",index=False)

In [None]:
def generate_code_without_constraints(row,input_col1="instruction"):
    print(row[input_col1][:10])
    prompt = f"""You are a skilled Python programmer. Based on the following natural language instruction, generate Python code that satisfies the instruction.

    ### Instruction:
    {row[input_col1]}

    ### Requirements:
    - Ensure the code is clean, correct, and follows Python best practices.
    - Do not include any explanatory text; return only the code block.

    ### Output Format:
    Return a single Python code block that solves the task.

 
    # Your code here


    """
    code = get_response(prompt)
    
    return code

code1 = generate_code_without_constraints(test_mceval.iloc[0])



In the c


In [None]:
print(code1)

```python
import torch
import torch.nn as nn
import torch.optim as optim

def initialize_cyclegan_components(options):
    # Initialize generators
    G_AB = Generator(options)
    G_BA = Generator(options)
    
    # Initialize discriminators
    D_A = Discriminator(options)
    D_B = Discriminator(options)
    
    # Initialize Graph Neural Network for feature extraction
    model_gnn = GraphNN(options)
    
    # Initialize loss functions
    criterionIdt = nn.L1Loss()
    criterionCycle = nn.L1Loss()
    criterionGEN = nn.MSELoss()
    
    # Initialize optimizers
    optimizer_G = optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_M = optim.Adam(model_gnn.parameters(), lr=options.lr, betas=(options.beta1, 0.999))
    
    # Move components to GPU if available
    device = torch.devic

In [None]:
code2 = generate_code(test_mceval.iloc[0])
print(code2)



In the c
```python
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Dict

class Options:
    def __init__(self, lr: float, beta1: float, beta2: float, lambda_cycle: float, lambda_identity: float):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.lambda_cycle = lambda_cycle
        self.lambda_identity = lambda_identity

# Example generator and discriminator classes
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Define generator layers here

    def forward(self, x):
        # Define forward pass
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define discriminator layers here

    def forward(self, x):
        # Define forward pass
        return x

def initialize_cyclegan_components(options: Options) -> Dict[str, nn.Module]:
    # Check for CUDA availability
    device = torch

In [None]:
from IPython.display import Markdown, display
def md(text):
    display(Markdown(text))
md(code1)


```python
import torch
import torch.nn as nn
import torch.optim as optim

def initialize_cyclegan_components(options):
    # Initialize generators
    G_AB = Generator(options)
    G_BA = Generator(options)
    
    # Initialize discriminators
    D_A = Discriminator(options)
    D_B = Discriminator(options)
    
    # Initialize Graph Neural Network for feature extraction
    model_gnn = GraphNN(options)
    
    # Initialize loss functions
    criterionIdt = nn.L1Loss()
    criterionCycle = nn.L1Loss()
    criterionGEN = nn.MSELoss()
    
    # Initialize optimizers
    optimizer_G = optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_M = optim.Adam(model_gnn.parameters(), lr=options.lr, betas=(options.beta1, 0.999))
    
    # Move components to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    G_AB.to(device)
    G_BA.to(device)
    D_A.to(device)
    D_B.to(device)
    model_gnn.to(device)
    criterionIdt.to(device)
    criterionCycle.to(device)
    criterionGEN.to(device)
    
    return {
        'G_AB': G_AB,
        'G_BA': G_BA,
        'D_A': D_A,
        'D_B': D_B,
        'model_gnn': model_gnn,
        'criterionIdt': criterionIdt,
        'criterionCycle': criterionCycle,
        'criterionGEN': criterionGEN,
        'optimizer_G': optimizer_G,
        'optimizer_D': optimizer_D,
        'optimizer_M': optimizer_M
    }
```

In [None]:
md(code2)

```python
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Dict

class Options:
    def __init__(self, lr: float, beta1: float, beta2: float, lambda_cycle: float, lambda_identity: float):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.lambda_cycle = lambda_cycle
        self.lambda_identity = lambda_identity

# Example generator and discriminator classes
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Define generator layers here

    def forward(self, x):
        # Define forward pass
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define discriminator layers here

    def forward(self, x):
        # Define forward pass
        return x

def initialize_cyclegan_components(options: Options) -> Dict[str, nn.Module]:
    # Check for CUDA availability
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    try:
        # Initialize generators
        G_AB = nn.DataParallel(Generator()).to(device)  # Generator A to B
        G_BA = nn.DataParallel(Generator()).to(device)  # Generator B to A
        
        # Initialize discriminators
        D_A = nn.DataParallel(Discriminator()).to(device)  # Discriminator for domain A
        D_B = nn.DataParallel(Discriminator()).to(device)  # Discriminator for domain B
        
        # Initialize loss functions
        criterionIdt = nn.L1Loss().to(device)  # Identity loss
        criterionCycle = nn.L1Loss().to(device)  # Cycle consistency loss
        criterionGEN = nn.MSELoss().to(device)  # Generative loss
        
        # Initialize optimizers
        optimizer_G = optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), 
                                  lr=options.lr, betas=(options.beta1, options.beta2))
        optimizer_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), 
                                  lr=options.lr, betas=(options.beta1, options.beta2))
        optimizer_M = optim.Adam(G_AB.parameters(), lr=options.lr, betas=(options.beta1, options.beta2))  # Example optimizer for G_AB
        
        # Return all components in a dictionary
        return {
            'G_AB': G_AB,
            'G_BA': G_BA,
            'D_A': D_A,
            'D_B': D_B,
            'model_gnn': None,  # Placeholder for Graph Neural Network
            'criterionIdt': criterionIdt,
            'criterionCycle': criterionCycle,
            'criterionGEN': criterionGEN,
            'optimizer_G': optimizer_G,
            'optimizer_D': optimizer_D,
            'optimizer_M': optimizer_M
        }
    
    except Exception as e:
        print(f"Error during initialization: {e}")
        raise

# Unit tests can be added here to verify the correctness of the initialization function
```

In [None]:
from tqdm import tqdm
import pandas as pd

# Enable tqdm for pandas apply
tqdm.pandas()
test_mceval["generated_code_without_constraints"] = test_mceval.progress_apply(generate_code_without_constraints, axis=1)

NameError: name 'test_mceval' is not defined

# Step 9 - Evaluate Generated Code and Calculate Metrics

In [None]:
import difflib

code1_lines = code1.strip().splitlines()
code2_lines = code2.strip().splitlines()

# Get the diff
diff = difflib.unified_diff(code1_lines, code2_lines, fromfile='code1.py', tofile='code2.py', lineterm='')

# Print the diff
print("\n".join(diff))


--- code1.py
+++ code2.py
@@ -2,51 +2,78 @@
 import torch
 import torch.nn as nn
 import torch.optim as optim
+from typing import Dict
 
-def initialize_cyclegan_components(options):
-    # Initialize generators
-    G_AB = Generator(options)
-    G_BA = Generator(options)
+class Options:
+    def __init__(self, lr: float, beta1: float, beta2: float, lambda_cycle: float, lambda_identity: float):
+        self.lr = lr
+        self.beta1 = beta1
+        self.beta2 = beta2
+        self.lambda_cycle = lambda_cycle
+        self.lambda_identity = lambda_identity
+
+# Example generator and discriminator classes
+class Generator(nn.Module):
+    def __init__(self):
+        super(Generator, self).__init__()
+        # Define generator layers here
+
+    def forward(self, x):
+        # Define forward pass
+        return x
+
+class Discriminator(nn.Module):
+    def __init__(self):
+        super(Discriminator, self).__init__()
+        # Define discriminator layers here
+
+    def forward
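
A raw unified diff is hard to compare across many rows. As one possible way to turn it into numbers, the sketch below counts added and removed lines and computes a line-level similarity ratio with `difflib.SequenceMatcher`. The function name `summarize_diff` and the choice of these three figures are assumptions for illustration, not metrics defined by this notebook.

```python
import difflib


def summarize_diff(old: str, new: str) -> dict:
    """Count added/removed lines and compute a line-level similarity ratio."""
    old_lines = old.strip().splitlines()
    new_lines = new.strip().splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)

    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both counts; "equal" contributes to neither.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1

    return {"added": added, "removed": removed, "similarity": matcher.ratio()}


# Toy inputs; in the notebook these would be the two generated code strings.
print(summarize_diff("a\nb\nc", "a\nx\nc\nd"))
```

The ratio is 1.0 for identical inputs and falls toward 0 as fewer lines match, so it only measures textual overlap and says nothing about whether either version is correct.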

# McEval-Instruct Evaluation

In [1]:
!pip install datasets


You should consider upgrading via the '/dccstor/shanmukh/sravani_internship/CIF_Benchmark/internship/bin/python3 -m pip install --upgrade pip' command.


In [13]:
from datasets import load_dataset
import tqdm

dataset = load_dataset("Multilingual-Multimodal-NLP/McEval-Instruct")
dataset = dataset['train']
# Rename the reference-solution column; the other columns keep their names
dataset = dataset.rename_column('output', 'code')
py_dataset = dataset.filter(lambda x: x['language'] == 'Python')
py_dataset = py_dataset.select_columns(['instruction', 'code'])
py_dataset = py_dataset.to_pandas()
py_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 927 entries, 0 to 926
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  927 non-null    object
 1   code         927 non-null    object
dtypes: object(2)
memory usage: 14.6+ KB


In [19]:
import openai
import pandas as pd
from datasets import load_dataset
import os
from dotenv import load_dotenv
load_dotenv()  
from openai import OpenAI
from utils.openai_utils import get_response
api_key = os.getenv("OPENAI_API_KEY") 
client =  OpenAI(api_key=api_key)


dataset = load_dataset("Multilingual-Multimodal-NLP/McEval-Instruct", split="train")
python_data = dataset.filter(lambda x: x['language'] == 'Python')
df = pd.DataFrame(python_data)

# Prompt template
def build_prompt(instruction):
    return f"""You are an expert at understanding programming tasks. Given a user instruction, decide if it is asking for code generation (i.e., writing or implementing code). Respond only with "True" if it's a code generation task, or "False" if it's about code explanation, analysis, or purpose.

Instruction: {instruction}

Is this a code generation task? Respond with only "True" or "False"."""

# Query OpenAI API for classification
def is_code_generation(instruction):
    prompt = build_prompt(instruction)
    model = "gpt-4o-mini"  # small model; sufficient for a True/False classification

    reply = get_response(client,model,prompt,temperature=0)
    if "True" in reply:
        return True
    elif "False" in reply:
        print(f"Instruction is not a code generation task: {instruction}")
        return False
    else:
        print(f"Unexpected response: {reply}")
        return None


# Classify every row; to test on a subset first, use e.g. df.head(20).copy()
subset = df.copy()
subset["is_code_generation"] = subset["instruction"].apply(is_code_generation)

# Keep only rows classified as code generation; rows with an unexpected reply (None) are dropped
final_df = subset[subset["is_code_generation"] == True]

# Save or inspect
final_df.to_csv("filtered_codegen_mceval.csv", index=False)
print("✅ Filtered file saved as 'filtered_codegen_mceval.csv'")



✅ Filtered file saved as 'filtered_codegen_mceval.csv'
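
The substring check in `is_code_generation` (`"True" in reply`) would also accept a reply such as "This is not True". A stricter parser is sketched below under the assumption that the model answers with a single word, possibly quoted or followed by a period. The name `parse_bool_reply` is hypothetical and not part of `utils.openai_utils`.

```python
from typing import Optional


def parse_bool_reply(reply: str) -> Optional[bool]:
    """Map a model reply to True/False, or None if it is not a clean one-word answer."""
    # Drop surrounding whitespace, quotes and a trailing period, then compare case-insensitively.
    cleaned = reply.strip().strip("\"'. ").lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    return None


for reply in ["True", ' "False". ', "This is not True"]:
    print(repr(reply), "->", parse_bool_reply(reply))
```

Returning `None` for anything else keeps ambiguous replies visible, so they can be inspected rather than silently counted as code generation tasks.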


In [22]:
instruction = """
Provide a concise natural language explanation of the Python function below.
def find_median(nums):
    nums.sort()
    n = len(nums)
    if n % 2 == 1:
        return nums[n // 2]
    else:
        m1 = nums[n // 2 - 1]
        m2 = nums[n // 2]
        return (m1 + m2) / 2
"""
is_code_generation(instruction)

Instruction is not a code generation task: 
Provide a concise natural language explanation of the Python function below.
def find_median(nums):
    nums.sort()
    n = len(nums)
    if n % 2 == 1:
        return nums[n // 2]
    else:
        m1 = nums[n // 2 - 1]
        m2 = nums[n // 2]
        return (m1 + m2) / 2



False

In [20]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 927 entries, 0 to 926
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   language            927 non-null    object
 1   instruction         927 non-null    object
 2   source              927 non-null    object
 3   output              927 non-null    object
 4   is_code_generation  927 non-null    bool  
dtypes: bool(1), object(4)
memory usage: 30.0+ KB


In [11]:
explanations = dataset.filter(is_python).filter(lambda x: not is_code_gen(x))
for i, item in enumerate(explanations):
    print(f"\nExample #{i}:\nInstruction: {item['instruction']}\nCode: {item['code']}\n{'—'*40}")
    if i >= 10:
        break  # Inspect top 10 to confirm


Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4230/4230 [00:00<00:00, 57764.69 examples/s]


Example #0:
Instruction: 

Create a Python function named `display_spike_trains` that uses the `ephyviewer` library to visualize spike train data for neuroscientific analysis. The function should take a list of spike train data, where each spike train is represented as a list of spike times (in seconds). The function should display an interactive window with the spike trains plotted, allowing users to visually inspect the activity of neurons over time.

The function should adhere to the following specifications:

1. The function should be named `display_spike_trains`.
2. The function should accept a single parameter `spike_trains_list`, which is a list of lists. Each inner list represents a spike train with spike times.
3. The function should create a fake spike train source using the provided spike train data.
4. The function should create an interactive window using `ephyviewer` that displays the spike trains.
5. The function should not return any value; its purpose is to create and


