# Step 1 - Get and Prepare Input Datasets



## Loading the McEval-Instruct Dataset

In [2]:
import pandas as pd

Mceval = pd.read_json("hf://datasets/Multilingual-Multimodal-NLP/McEval-Instruct/McEval-Instruct.json")



In [3]:
Mceval.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35943 entries, 0 to 35942
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  35943 non-null  object
 1   output       35943 non-null  object
 2   language     35943 non-null  object
 3   source       35943 non-null  object
dtypes: object(4)
memory usage: 1.1+ MB


In [4]:
Mceval.head()

Unnamed: 0,instruction,output,language,source
0,\n\nWrite a Fortran subroutine that applies a ...,\n\n```fortran\n! Import necessary modules\nim...,FORTRAN,McEval-Instruct
1,\nDesign a function that finds the shortest pa...,\n```javascript\nclass Maze {\n constructor(\...,JavaScript,McEval-Instruct
2,\n\nCreate a React component called `DeviceDet...,\n\n```jsx\nimport React from 'react';\nimport...,JavaScript,McEval-Instruct
3,\n\nCreate a React component named `EditAvaila...,"\n\n```jsx\nimport React, { useState } from 'r...",JavaScript,McEval-Instruct
4,\n\nCreate a JavaScript class named `ItemsMana...,\n\n```javascript\nimport axios from 'axios';\...,JavaScript,McEval-Instruct


In [5]:
python_mceval = Mceval[Mceval['language'] == 'Python']
display(python_mceval.head())

Unnamed: 0,instruction,output,language,source
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct


In [6]:
python_mceval.to_csv("../../data/raw/McEval_Instruct.csv")
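The filter-and-save step can be sketched on a toy frame. This is only an illustration: the rows and the in-memory buffer are invented, and only the `language` column name comes from the dataset. Passing `index=False` (an optional choice, not what the cell above does) keeps the saved file from gaining an extra `Unnamed: 0` column when it is read back.

```python
import io
import pandas as pd

# Toy stand-in for the McEval-Instruct frame (rows are invented for illustration).
toy = pd.DataFrame({
    "instruction": ["write a loop", "sum a list", "print hello"],
    "output": ["do i = 1, 3", "sum(xs)", "print('hello')"],
    "language": ["FORTRAN", "Python", "Python"],
})

# Keep only the Python rows and renumber them from zero.
python_rows = toy[toy["language"] == "Python"].reset_index(drop=True)

# Write without the index so no "Unnamed: 0" column appears on reload.
buffer = io.StringIO()
python_rows.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```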

## Get the Test Dataset for Constraint Generation from GitHub

In [7]:
# Quote the URL: unquoted, zsh treats the `?` as a glob character and fails with "no matches found".
!wget "https://raw.github.ibm.com/shagutt1/sravani-internship-2025/refs/heads/main/data/outputs/constraint_category_data.csv?token=GHSAT0AAAAAAACSCL5HQAVP6M7WIKJYEZSC2C5M6CA" -O "constraint_category_initial_data.csv"


In [8]:
df = pd.read_csv("../../data/raw/constraint_category.csv")
df.head()

Unnamed: 0,dataset,instruction,code,test,Characteristics,constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,,1) Logic is modularized using calculate_distan...,['Implement the distance calculation in a sepa...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...",,1) Embed the HTML tag using an f-string to dyn...,"[""Provide a concise solution using Python's st..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,,1) The function factorial is implemented recur...,['The program must use recursion to calculate ...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,,"The class defines six attributes: species, nam...","['Include detailed docstrings for each method,..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...",,The function is named decryptJokes and accepts...,['Provide a complete Python function implement...


In [9]:
df = df[["dataset","instruction","code"]]
df.head()

Unnamed: 0,dataset,instruction,code
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t..."


## Decide Dataset Sampling Ratios Based on Real-World Complexity
- Skipping for now

## Write transformation scripts for each dataset format

## Create a benchmark dataset of 1000–1200 examples



# Step 2 - Define and Clean Constraint Categories

In [10]:
categories = [
    "Code Structure and Modularity",
    "Input and Output Handling",
    "Error Handling and Robustness",
    "Data Processing and Transformation",
    "Performance and Optimization",
    "Library and API Usage",
    "Testing and Debugging",
    "Documentation and Readability",
    "Security and Privacy",
    "Reproducibility and Consistency",
    "Mathematical Computation",
    "File and Data Management",
    "UI and Interaction",
]

# Step 3 - Generate Constraints

In [11]:
import pandas as pd
import time
from openai import OpenAI
import os


openai_api_key = os.getenv("OPENAI_API_KEY")  # Read the key from the environment; do not hard-code it
client = OpenAI(api_key=openai_api_key)


In [12]:
SYSTEM_PROMPT = """
You are a helpful assistant. You will be given a programming instruction and the corresponding code.
"""

In [77]:
def get_response(user_prompt, system_prompt=SYSTEM_PROMPT, max_retries=3):
    """Send one chat request and return the reply text, or "[]" if every attempt fails."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                temperature=0.3,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error on attempt {attempt+1}: {e}")
            time.sleep(2)  # brief pause before retrying
    return "[]"  # fallback that downstream parsing treats as an empty result
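A fixed two-second pause between attempts is simple, but rate-limit errors often clear faster with a growing delay. The sketch below shows one way to do that with exponential backoff. `call_with_retries`, `base_delay`, and the injectable `sleep` argument are hypothetical names introduced for illustration; they are not part of the notebook or of the OpenAI client.

```python
import time

def call_with_retries(call, max_retries=3, base_delay=1.0, fallback="[]", sleep=time.sleep):
    """Run `call()` until it succeeds or retries run out; return `fallback` on failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:  # broad on purpose, mirroring get_response
            print(f"Error on attempt {attempt + 1}: {exc}")
            if attempt < max_retries - 1:
                # Wait 1x, 2x, 4x, ... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
    return fallback

# Usage with a fake call that fails twice and then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []  # record the requested delays instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

In the notebook, the request itself could be wrapped as `call_with_retries(lambda: ...)`, which keeps the retry policy separate from the API call.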

## Map Categories to the Instructions

In [14]:


def get_relevant_categories(instruction, code, max_retries=3):
    user_prompt = f"""You are a coding assistant designed to classify natural language code generation instructions into appropriate high-level constraint categories.

You must choose all applicable categories from the following 13 supercategories based on the instruction and the type of code that is expected to be written:

1. Code Structure and Modularity
2. Input and Output Handling
3. Error Handling and Robustness
4. Data Processing and Transformation
5. Performance and Optimization
6. Library and API Usage
7. Testing and Debugging
8. Documentation and Readability
9. Security and Privacy
10. Reproducibility and Consistency
11. Mathematical Computation
12. File and Data Management
13. UI and Interaction

Task:
Given the instruction, select all relevant supercategories that:
- Help define the expected constraints.
- Reflect either the code behavior or user expectation.
- Include at least 2 relevant categories per instruction.
- Consider the problem domain, and include the Security and Privacy category whenever the code may need privacy or security considerations.


Output Format:
Respond with only a list of strings as shown in the example output.
Each string should be the exact name of the matching supercategory.

Example Output:
["Mathematical Computation", "Input and Output Handling", "Documentation and Readability"]

Instruction:\n
{instruction}

Code:\n
{code}

Relevant categories:"""

    # Pass max_retries by keyword; positionally it would be taken as system_prompt.
    categories = get_response(user_prompt, max_retries=max_retries)
    return categories






In [15]:
get_relevant_categories(df.iloc[0]["instruction"],df.iloc[0]["code"])

'["Mathematical Computation", "Input and Output Handling", "Documentation and Readability"]'
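The reply is a string that merely looks like a list, and nothing guarantees that the model returns only names from the 13 supercategories. A small validation step can parse the reply and drop anything unexpected. The helper name `parse_categories` and the `ALLOWED` tuple are assumptions made for this sketch; the tuple repeats the category names defined in Step 2.

```python
import json

# Same names as the `categories` list defined in Step 2.
ALLOWED = (
    "Code Structure and Modularity", "Input and Output Handling",
    "Error Handling and Robustness", "Data Processing and Transformation",
    "Performance and Optimization", "Library and API Usage",
    "Testing and Debugging", "Documentation and Readability",
    "Security and Privacy", "Reproducibility and Consistency",
    "Mathematical Computation", "File and Data Management",
    "UI and Interaction",
)

def parse_categories(raw, allowed=ALLOWED):
    """Parse a model reply into a list of known category names; return [] if unparseable."""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(parsed, list):
        return []
    # Keep exact matches only, preserving order and dropping duplicates.
    kept = []
    for item in parsed:
        if isinstance(item, str) and item in allowed and item not in kept:
            kept.append(item)
    return kept

reply = '["Mathematical Computation", "Input and Output Handling", "Made-Up Category"]'
print(parse_categories(reply))
```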

In [16]:
# get_relevant_categories(test_mceval.iloc[0]["instruction"],test_mceval.iloc[0]["code"])

In [17]:
# Classify every row of df, store the replies in output_col, and save the frame to output_pth
def map_categories(df,output_pth,input_col1,input_col2,output_col):
    results = []
    for i, row in df.iterrows():
        print(f"Processing row {i}")
        categories = get_relevant_categories(row[input_col1], row[input_col2])
        results.append(categories)

    df[output_col] = results
    df.to_csv(output_pth, index=False)
    print("Saved the file successfully")
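`map_categories` writes the new column into the frame it receives, so the caller's frame changes as a side effect. A non-mutating variant is sketched below, with the classifier passed in as an argument so it can be tried without API calls. `with_mapped_column` and `fake_classifier` are hypothetical names used only for this illustration.

```python
import pandas as pd

def with_mapped_column(df, fn, input_cols, output_col):
    """Return a copy of `df` with `output_col` holding fn(...) for each row."""
    out = df.copy()
    out[output_col] = [fn(*(row[col] for col in input_cols)) for _, row in out.iterrows()]
    return out

def fake_classifier(instruction, code):
    # Stand-in for get_relevant_categories; returns a list-like string as the model does.
    if "sum" in code:
        return '["Mathematical Computation"]'
    return '["Input and Output Handling"]'

toy = pd.DataFrame({
    "instruction": ["add numbers", "greet the user"],
    "code": ["sum(xs)", "print('hi')"],
})
labelled = with_mapped_column(toy, fake_classifier, ["instruction", "code"], "relevant_categories")
print(labelled)
```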

In [18]:
map_categories(df,"step3_with_relevant_categories.csv","instruction","code","relevant_categories")

Processing row 0
Processing row 1
Processing row 2
Processing row 3
Processing row 4
Processing row 5
Processing row 6
Processing row 7
Processing row 8
Processing row 9
Processing row 10
Processing row 11
Processing row 12
Processing row 13
Processing row 14
Processing row 15
Processing row 16
Processing row 17
Processing row 18
Processing row 19
Processing row 20
Processing row 21
Processing row 22
Processing row 23
Processing row 24
Processing row 25
Processing row 26
Processing row 27
Processing row 28
Processing row 29
Processing row 30
Processing row 31
Processing row 32
Processing row 33
Processing row 34
Processing row 35
Processing row 36
Processing row 37
Processing row 38
Processing row 39
Processing row 40
Processing row 41
Processing row 42
Processing row 43
Processing row 44
Processing row 45
Processing row 46
Processing row 47
Processing row 48
Processing row 49
Processing row 50
Processing row 51
Processing row 52
Processing row 53
Processing row 54
Processing row 55
Pr

In [19]:
df.head(59)

Unnamed: 0,dataset,instruction,code,relevant_categories
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[""Mathematical Computation"", ""Input and Output..."
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[""Code Structure and Modularity"", ""Input and O..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[""Error Handling and Robustness"", ""Mathematica..."
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[""Code Structure and Modularity"", ""Documentati..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[""Data Processing and Transformation"", ""Input ..."
5,ajibawa-2023/Python-Code-23k-ShareGPT,What is the purpose of the 'enumerate' keyword...,The purpose of the 'enumerate' keyword in Pyth...,"[""Code Structure and Modularity"", ""Error Handl..."
6,ajibawa-2023/Python-Code-23k-ShareGPT,Can you provide an example of mutation testing...,"Sure, here's an example of mutation testing in...","[""Testing and Debugging"", ""Code Structure and ..."
7,ajibawa-2023/Python-Code-23k-ShareGPT,Edit the given code to fix the bug and increas...,```python\ndef calculate_tax(price):\n tax_...,"[""Mathematical Computation"", ""Error Handling a..."
8,ajibawa-2023/Python-Code-23k-ShareGPT,Generate a random password that meets the foll...,Here is a Python function that generates a ran...,"[""Input and Output Handling"", ""Error Handling ..."
9,ajibawa-2023/Python-Code-23k-ShareGPT,Create a program to combine the elements of tw...,Here's a Python program that combines the elem...,"[""Code Structure and Modularity"", ""Data Proces..."


In [20]:
from collections import Counter
import ast

# Convert the string representation of lists to actual lists
df['relevant_categories'] = df['relevant_categories'].apply(ast.literal_eval)

# Flatten the list of lists and count occurrences
all_categories = [category for sublist in df['relevant_categories'] for category in sublist]
category_counts = Counter(all_categories)

# Create a DataFrame from the counts
category_df = pd.DataFrame.from_dict(category_counts, orient='index', columns=['Count'])

# Sort the categories by count
category_df = category_df.sort_values(by='Count', ascending=False)

display(category_df)

Unnamed: 0,Count
Input and Output Handling,36
Documentation and Readability,33
Code Structure and Modularity,29
Data Processing and Transformation,28
Error Handling and Robustness,21
Library and API Usage,12
Testing and Debugging,8
Mathematical Computation,7
Security and Privacy,7
UI and Interaction,6


TODO: Add a filtering step for the relevant categories.

## Prompt to generate the constraints

In [21]:
def get_prompt(instruction,code,relevant_categories):
    prompt = f""" "task": "You are an expert in generating constraints for code generation tasks by analyzing a natural language instruction and the associated code. Your objective is to identify the relevant characteristics of the task based on the instruction and code, and then generate meaningful constraints that guide a model to correctly implement or reason about the task in line with predefined constraint categories.",

    "context": "You are provided with three inputs: (1) a natural language `instruction` describing a coding task, (2) the actual `code` implementing or partially implementing that instruction, and (3) a list of `relevant_constraint_categories` which represent different high-level aspects of software development such as performance, error handling, modularity, etc. that are relevant to the instruction.",

    "goal": "Your job is to examine the instruction, code and relevant constraint categories, and generate 8 to 10 natural language constraints that align with the relevant categories. These constraints should help an LLM generate code that is correct, complete, robust, and aligned with good development practices. The constraints should be actionable, precise, and reflective of both the explicit and implicit requirements inferred from the instruction and code.",

    "JSON Response Format": {{
        "Analysis on Characteristics": "Briefly describe the nature of the task and what kind of constraints are necessary to ensure proper execution. For example, should the solution handle edge cases? Use efficient data structures? Log errors? Include testability or modularity considerations?",
        "Constraints": [
        "List 8 to 10 detailed and well-formed constraints that are aligned with the relevant categories. Each constraint should be phrased clearly in natural language and should guide the model to include or avoid specific design decisions, implementation patterns, or output behaviors."
        ]
    }},

    "Inputs Required": {{
        "instruction": {instruction},
        "code": {code},
        "relevant_constraint_categories": {relevant_categories}

    }} """
    return prompt
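Because the instruction and code are interpolated as raw text, any quotes or newlines inside them make the "Inputs Required" section ambiguous to read. One option is to serialize the three inputs with `json.dumps`, which escapes them consistently. `format_inputs` is a hypothetical helper for this sketch, not something the notebook defines.

```python
import json

def format_inputs(instruction, code, relevant_categories):
    """Serialize the three prompt inputs as one valid JSON object string."""
    payload = {
        "instruction": instruction,
        "code": code,
        "relevant_constraint_categories": list(relevant_categories),
    }
    # ensure_ascii=False keeps non-ASCII characters readable in the prompt.
    return json.dumps(payload, indent=4, ensure_ascii=False)

block = format_inputs('Print "hi"', 'print("hi")\n', ["Input and Output Handling"])
print(block)
```

The returned string could then be placed in the prompt in place of the three separate placeholders.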

In [22]:
prompt = get_prompt(df.iloc[0]["instruction"],df.iloc[0]["code"],df.iloc[0]["relevant_categories"])
print(prompt)

 "task": "You are an expert in generating constraints for code generation tasks by analyzing a natural language instruction and the associated code. Your objective is to identify the relevant characteristics of the task based on the instruction and code, and then generate meaningful constraints that guide a model to correctly implement or reason about the task in line with predefined constraint categories.",

    "context": "You are provided with three inputs: (1) a natural language `instruction` describing a coding task, (2) the actual `code` implementing or partially implementing that instruction, and (3) a list of `relevant_constraint_categories` which represent different high-level aspects of software development such as performance, error handling, modularity, etc. that are relevant to the instruction.“,

    "goal": "Your job is to examine the instruction, code and relevant constraint categories, and generate 8 to 10 natural language constraints that align with the relevant categ

In [23]:
instruction = df.iloc[0]["instruction"]
print(instruction)
response = get_response(prompt)
print(response)

Calculate the distance between two points located in the 3-dimensional space. The points are represented as tuples of three integers. The distance should be rounded to two decimal places. Additionally, the program should also display the angle between the vector components of the resulting distance and the x-axis in degrees, rounded to two decimal places.

Example Input:
Point 1: (3, 4, 5)
Point 2: (2, 6, -1)

Example Output:
Distance: 2.45
Vector components: (1, 2, -6)
Angle with x-axis: 32.94 degrees
{
    "Analysis on Characteristics": "The task involves mathematical computations to calculate the distance and angle between two points in 3D space. Constraints should ensure that the calculations are accurate and handle edge cases, such as points being identical or having zero distance. Input and output handling must be robust, ensuring that the function can handle various types of input and provide clear outputs. Documentation and readability are crucial for maintainability, so the co

In [24]:
def generate_constraints(df,output_pth,input_col1,input_col2,input_col3,output_col):
    results = []
    for i, row in df.iterrows():
        print(f"Processing row {i}")
        prompt = get_prompt(row[input_col1], row[input_col2], row[input_col3])
        constraints = get_response(prompt)
        results.append(constraints)

    df[output_col] = results
    df.to_csv(output_pth, index=False)
    print("Saved the file successfully")
    return df

In [25]:
test_df = df.head(20).copy()
test_df = generate_constraints(test_df,"step4_with_constraints.csv","instruction","code","relevant_categories","constraints")
test_df.head()

Processing row 0
Processing row 1
Processing row 2
Processing row 3
Processing row 4
Processing row 5
Processing row 6
Processing row 7
Processing row 8
Processing row 9
Processing row 10
Processing row 11
Processing row 12
Processing row 13
Processing row 14
Processing row 15
Processing row 16
Processing row 17
Processing row 18
Processing row 19
Saved the file successfully


Unnamed: 0,dataset,instruction,code,relevant_categories,constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[Mathematical Computation, Input and Output Ha...","{\n ""Analysis on Characteristics"": ""The tas..."
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[Error Handling and Robustness, Mathematical C...","```json\n{\n ""Analysis on Characteristics"":..."
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[Code Structure and Modularity, Documentation ...","{\n ""Analysis on Characteristics"": ""The tas..."
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[Data Processing and Transformation, Input and...","{\n ""Analysis on Characteristics"": ""The tas..."


In [26]:
import json

def extract_constraints(constraint_string):
    try:
        json_string = constraint_string.strip().replace('```json\n', '', 1).replace('\n```', '', 1)
        constraint_json = json.loads(json_string)
        # print(constraint_json.get("Constraints", []))
        return constraint_json.get("Constraints", [])
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e} in string: {constraint_string}")
        return []
    except AttributeError as e:
        print(f"Attribute error: {e} in string: {constraint_string}")
        return []
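The two `replace` calls handle a reply that is exactly a fenced `json` block, but a reply with text around the fence, or a fence without the `json` tag, would fail to parse. A more lenient variant using a regular expression is sketched below. `extract_constraints_lenient` and `FENCE_RE` are hypothetical names, and the pattern assumes the JSON itself contains no triple backticks.

```python
import json
import re

# Match an optional "json" tag, then capture everything up to the closing fence.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)

def extract_constraints_lenient(reply):
    """Return the "Constraints" list from a reply that may be wrapped in a code fence."""
    if not isinstance(reply, str):
        return []
    match = FENCE_RE.search(reply)
    candidate = match.group(1) if match else reply.strip()
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    constraints = data.get("Constraints", [])
    return constraints if isinstance(constraints, list) else []

fenced = 'Here you go:\n```json\n{"Constraints": ["Use recursion.", "Add docstrings."]}\n```\nDone.'
bare = '{"Constraints": ["Validate input."]}'
print(extract_constraints_lenient(fenced))
print(extract_constraints_lenient(bare))
```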



In [27]:
extract_constraints(test_df.iloc[0]["constraints"])

['Ensure that the distance calculation correctly handles the case where both points are the same, returning a distance of 0.00.',
 'The distance should be rounded to two decimal places before being returned and printed, ensuring consistent output formatting.',
 'When calculating the angle, ensure that the distance is not zero to avoid division by zero errors, and handle this case appropriately.',
 'The angle calculation should use the correct formula and ensure that the result is rounded to two decimal places before being returned.',
 'Include docstrings for each function to describe their purpose, parameters, and return values, enhancing documentation and readability.',
 'Use meaningful variable names that clearly indicate their purpose to improve code readability and maintainability.',
 'Ensure that the output format for both distance and angle matches the specified requirements, including units and decimal precision.',
 'Implement input validation to check that the input points are 

In [28]:
test_df['extracted_constraints'] = test_df['constraints'].apply(extract_constraints)
display(test_df.head())

Unnamed: 0,dataset,instruction,code,relevant_categories,constraints,extracted_constraints
0,ajibawa-2023/Python-Code-23k-ShareGPT,Calculate the distance between two points loca...,To calculate the distance between two points i...,"[Mathematical Computation, Input and Output Ha...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that the distance calculation correctl...
1,ajibawa-2023/Python-Code-23k-ShareGPT,Assuming that the given sentence is stored in ...,"To achieve this, you can use Python's string m...","[Code Structure and Modularity, Input and Outp...","{\n ""Analysis on Characteristics"": ""The tas...","[The code should be modular, encapsulating the..."
2,ajibawa-2023/Python-Code-23k-ShareGPT,Write a program in Python to find the factoria...,Here is a program in Python that finds the fac...,"[Error Handling and Robustness, Mathematical C...","```json\n{\n ""Analysis on Characteristics"":...",[Ensure that the function checks if the input ...
3,ajibawa-2023/Python-Code-23k-ShareGPT,Create a Python class for Animal with the foll...,```python\nclass Animal:\n def __init__(sel...,"[Code Structure and Modularity, Documentation ...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that the class follows the principles ...
4,ajibawa-2023/Python-Code-23k-ShareGPT,How can I decrypt a set of jokes that are encr...,"Sure, here's a Python function that decrypts t...","[Data Processing and Transformation, Input and...","{\n ""Analysis on Characteristics"": ""The tas...",[The function must handle non-alphabetic chara...


# Step 4 - Compare New Constraints with Previous Ones for Quality Improvement
- Skipping for now

# Step 6 - Generate constraints for each row in the benchmark dataset.


In [29]:
python_mceval = python_mceval.rename(columns={'output': 'code'})
display(python_mceval.head())

Unnamed: 0,instruction,code,language,source
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct


In [30]:
python_mceval.info()

<class 'pandas.core.frame.DataFrame'>
Index: 927 entries, 1610 to 8664
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  927 non-null    object
 1   code         927 non-null    object
 2   language     927 non-null    object
 3   source       927 non-null    object
dtypes: object(4)
memory usage: 36.2+ KB


In [31]:
test_mceval = python_mceval.head(20).copy()
map_categories(test_mceval,"Mceval_relevant_categories.csv","instruction","code","relevant_categories")
test_mceval.info()

Processing row 1610
Processing row 1611
Processing row 1612
Processing row 1613
Processing row 1614
Processing row 1615
Processing row 1616
Processing row 1617
Processing row 1618
Processing row 1619
Processing row 1620
Processing row 1621
Processing row 1622
Processing row 1623
Processing row 1624
Processing row 1625
Processing row 1626
Processing row 1627
Processing row 1628
Processing row 1629
Saved the file successfully
<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, 1610 to 1629
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   instruction          20 non-null     object
 1   code                 20 non-null     object
 2   language             20 non-null     object
 3   source               20 non-null     object
 4   relevant_categories  20 non-null     object
dtypes: object(5)
memory usage: 960.0+ bytes


In [32]:
get_relevant_categories(test_mceval.iloc[0]["instruction"],test_mceval.iloc[0]["code"])

'["Code Structure and Modularity", "Input and Output Handling", "Performance and Optimization", "Error Handling and Robustness", "Library and API Usage"]'

In [33]:
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and..."
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""..."
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O..."


In [34]:
test_mceval = generate_constraints(test_mceval,"Mceval_constraints.csv","instruction","code","relevant_categories","constraints")
test_mceval.info()

Processing row 1610
Processing row 1611
Processing row 1612
Processing row 1613
Processing row 1614
Processing row 1615
Processing row 1616
Processing row 1617
Processing row 1618
Processing row 1619
Processing row 1620
Processing row 1621
Processing row 1622
Processing row 1623
Processing row 1624
Processing row 1625
Processing row 1626
Processing row 1627
Processing row 1628
Processing row 1629
Saved the file successfully
<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, 1610 to 1629
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   instruction          20 non-null     object
 1   code                 20 non-null     object
 2   language             20 non-null     object
 3   source               20 non-null     object
 4   relevant_categories  20 non-null     object
 5   constraints          20 non-null     object
dtypes: object(6)
memory usage: 1.1+ KB


In [35]:
test_mceval["extract_constraints"] = test_mceval["constraints"].apply(extract_constraints)
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories,constraints,extract_constraints
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The function must take an Options object as i...
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that all SQL queries are parameterized...
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""...","```json\n{\n ""Analysis on Characteristics"":...",[Ensure that the function `display_spike_train...
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The API should be designed with modularity in...
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","{\n ""Analysis on Characteristics"": ""The tas...","[The function should be modular, with a clear ..."


In [36]:
test_mceval.to_csv("../../data/raw/Mceval_constraints.csv",index=False)

# Step 7 - Validate the generated constraints.


# Step 8 - Generate code from Instruction and Generated Constraints

In [37]:
def get_codegeneration_prompt(instruction,constraints):
    prompt = f""" You are a skilled Python programmer. Based on the following natural language instruction and a set of implementation constraints, generate Python code that satisfies the instruction and fully adheres to all constraints.

    ### Instruction:
    {instruction}

    ### Constraints:
    {constraints}

    ### Requirements:
    - Ensure the code is clean, correct, and follows Python best practices.
    - Strictly follow all the constraints, even if they are not explicitly stated in the instruction.
    - Do not include any explanatory text; return only the code block.

    ### Output Format:
    Return a single Python code block that solves the task.

 
    # Your code here


    """
    return prompt


print(get_codegeneration_prompt(test_mceval.iloc[0]["instruction"], test_mceval.iloc[0]["extract_constraints"]))

 You are a skilled Python programmer. Based on the following natural language instruction and a set of implementation constraints, generate Python code that satisfies the instruction and fully adheres to all constraints.

    ### Instruction:
    

In the context of a medical imaging analysis application, you are tasked with developing a CycleGAN-based model for domain adaptation between two different types of medical images (e.g., MRI and CT scans). The CycleGAN consists of two generators and two discriminators. The generators are responsible for translating images from one domain to another and vice versa, while the discriminators aim to distinguish between real and generated images.

The given code snippet provides a setup for initializing the CycleGAN components using PyTorch, including the generators (`G_AB` and `G_BA`), discriminators (`D_A` and `D_B`), and a Graph Neural Network (`model_gnn`) for feature extraction. Additionally, loss functions and optimizers are defined for tra
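When a Python list is interpolated into the prompt, the model sees its `repr`, with brackets and quote characters. Rendering the constraints as a numbered list first may be easier for the model to follow. `format_constraints` is a hypothetical helper shown here as a sketch.

```python
def format_constraints(constraints):
    """Render a list of constraint strings as a numbered block, one per line."""
    return "\n".join(
        f"{number}. {text.strip()}" for number, text in enumerate(constraints, start=1)
    )

example = ["Use recursion. ", "Add docstrings."]
print(format_constraints(example))
```

The result could be passed as the `constraints` argument of `get_codegeneration_prompt` instead of the raw list.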

In [38]:
prompt = get_codegeneration_prompt(test_mceval.iloc[0]["instruction"], test_mceval.iloc[0]["extract_constraints"])
code = get_response(prompt)
print(code)

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Options:
    def __init__(self, lr_G, lr_D, lr_M, beta1, beta2):
        self.lr_G = lr_G
        self.lr_D = lr_D
        self.lr_M = lr_M
        self.beta1 = beta1
        self.beta2 = beta2

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Define layers for the generator

    def forward(self, x):
        # Forward pass for the generator
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define layers for the discriminator

    def forward(self, x):
        # Forward pass for the discriminator
        return x

class GraphNN(nn.Module):
    def __init__(self):
        super(GraphNN, self).__init__()
        # Define layers for the GNN

    def forward(self, x):
        # Forward pass for the GNN
        return x

def initialize_cyclegan_components(options):
    try:
     

In [39]:
def generate_code(row, input_col1="instruction", input_col2="extract_constraints"):
    # Print the first 10 characters of the instruction as a lightweight progress marker
    print(row[input_col1][:10])
    prompt = get_codegeneration_prompt(row[input_col1], row[input_col2])
    code = get_response(prompt)
    return code


generate_code(test_mceval.iloc[0])



In the c


'```python\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\n\ndef initialize_cyclegan_components(options):\n    """\n    Initializes the components of the CycleGAN model based on the provided options.\n\n    Args:\n        options: An Options object containing hyperparameters for the model components.\n\n    Returns:\n        A dictionary containing initialized components of the CycleGAN.\n    """\n    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")\n\n    try:\n        # Initialize Generators\n        G_AB = Generator(options).to(device)\n        G_BA = Generator(options).to(device)\n\n        # Initialize Discriminators\n        D_A = Discriminator(options).to(device)\n        D_B = Discriminator(options).to(device)\n\n        # Initialize Graph Neural Network for feature extraction\n        model_gnn = GraphNN(options).to(device)\n\n        # Initialize Loss Functions\n        criterionIdt = nn.L1Loss().to(device)\n        criterionCycle = nn.

In [40]:
from tqdm import tqdm
import pandas as pd

# Enable tqdm for pandas apply
tqdm.pandas()
test_mceval["generated_code"] = test_mceval.apply(generate_code,axis=1)



In the c
Design a
Create a
Design a
Write a
Write a
Write a
Design a
Create a
Write a P
Implemen
Write a
Write a
You are
Design a
Create a
Given a
Design a
Design a
Write a


In [50]:
test_mceval.head()

Unnamed: 0,instruction,code,language,source,relevant_categories,constraints,extract_constraints,generated_code,generated_code_without_constraints
1610,\n\nIn the context of a medical imaging analys...,\n\n```python\nimport itertools\nimport torch\...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The function must take an Options object as i...,```python\nimport torch\nimport torch.nn as nn...,```python\nimport torch\nimport torch.nn as nn...
1611,\n\nDesign a Python class that provides an Obj...,\n\n```python\nimport sqlite3\nimport pandas a...,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Library and...","{\n ""Analysis on Characteristics"": ""The tas...",[Ensure that all SQL queries are parameterized...,```python\nimport sqlite3\nimport pandas as pd...,```python\nimport sqlite3\nimport pandas as pd...
1612,\n\nCreate a Python function named `display_sp...,\n\n```python\nimport ephyviewer\nfrom ephyvie...,Python,McEval-Instruct,"[""UI and Interaction"", ""Library and API Usage""...","```json\n{\n ""Analysis on Characteristics"":...",[Ensure that the function `display_spike_train...,```python\nimport logging\nfrom ephyviewer imp...,```python\nimport logging\nfrom ephyviewer imp...
1613,\nDesign a search API for a municipal signals ...,\n```python\n# SPDX-License-Identifier: MPL-2....,Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","```json\n{\n ""Analysis on Characteristics"":...",[The API should be designed with modularity in...,"```python\nfrom flask import Flask, request, j...","```python\nfrom flask import Flask, request, j..."
1614,\n\nWrite a function `allocate_budget` that ta...,"\n\n```python\ndef allocate_budget(requests, b...",Python,McEval-Instruct,"[""Code Structure and Modularity"", ""Input and O...","{\n ""Analysis on Characteristics"": ""The tas...","[The function should be modular, with a clear ...","```python\ndef allocate_budget(requests, budge...","```python\ndef allocate_budget(requests, budge..."


In [42]:
test_mceval.to_csv("../../data/processed/Mceval_generated_code.csv",index=False)

In [43]:
def generate_code_without_constraints(row,input_col1="instruction"):
    print(row[input_col1][:10])
    prompt = f"""You are a skilled Python programmer. Based on the following natural language instruction, generate Python code that satisfies the instruction.

    ### Instruction:
    {row[input_col1]}

    ### Requirements:
    - Ensure the code is clean, correct, and follows Python best practices.
    - Do not include any explanatory text; return only the code block.

    ### Output Format:
    Return a single Python code block that solves the task.

 
    # Your code here


    """
    code = get_response(prompt)
    return code


code1 = generate_code_without_constraints(test_mceval.iloc[0])



In the c


In [44]:
print(code1)

```python
import torch
import torch.nn as nn
import torch.optim as optim

def initialize_cyclegan_components(options):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    # Initialize Generators
    G_AB = Generator(options).to(device)
    G_BA = Generator(options).to(device)
    
    # Initialize Discriminators
    D_A = Discriminator(options).to(device)
    D_B = Discriminator(options).to(device)
    
    # Initialize Graph Neural Network
    model_gnn = GraphNN(options).to(device)
    
    # Initialize Loss Functions
    criterionIdt = nn.L1Loss().to(device)
    criterionCycle = nn.L1Loss().to(device)
    criterionGEN = nn.BCELoss().to(device)
    
    # Initialize Optimizers
    optimizer_G = optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_M = optim.A

In [45]:
code2 = generate_code(test_mceval.iloc[0])
print(code2)



In the c
```python
import torch
import torch.nn as nn
import torch.optim as optim

class Options:
    def __init__(self, lr_G, lr_D, lr_M, beta1, beta2):
        self.lr_G = lr_G
        self.lr_D = lr_D
        self.lr_M = lr_M
        self.beta1 = beta1
        self.beta2 = beta2

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Define generator architecture here

    def forward(self, x):
        # Define forward pass
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define discriminator architecture here

    def forward(self, x):
        # Define forward pass
        return x

def initialize_cyclegan_components(options):
    components = {}
    try:
        # Initialize generators
        G_AB = Generator()
        G_BA = Generator()
        
        # Initialize discriminators
        D_A = Discriminator()
        D_B = Discriminator()
        


In [46]:
from IPython.display import Markdown, display
def md(text):
    display(Markdown(text))
md(code1)


```python
import torch
import torch.nn as nn
import torch.optim as optim

def initialize_cyclegan_components(options):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    # Initialize Generators
    G_AB = Generator(options).to(device)
    G_BA = Generator(options).to(device)
    
    # Initialize Discriminators
    D_A = Discriminator(options).to(device)
    D_B = Discriminator(options).to(device)
    
    # Initialize Graph Neural Network
    model_gnn = GraphNN(options).to(device)
    
    # Initialize Loss Functions
    criterionIdt = nn.L1Loss().to(device)
    criterionCycle = nn.L1Loss().to(device)
    criterionGEN = nn.BCELoss().to(device)
    
    # Initialize Optimizers
    optimizer_G = optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_D = optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=options.lr, betas=(options.beta1, 0.999))
    optimizer_M = optim.Adam(model_gnn.parameters(), lr=options.lr, betas=(options.beta1, 0.999))
    
    return {
        'G_AB': G_AB,
        'G_BA': G_BA,
        'D_A': D_A,
        'D_B': D_B,
        'model_gnn': model_gnn,
        'criterionIdt': criterionIdt,
        'criterionCycle': criterionCycle,
        'criterionGEN': criterionGEN,
        'optimizer_G': optimizer_G,
        'optimizer_D': optimizer_D,
        'optimizer_M': optimizer_M
    }
```

In [47]:
md(code2)

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Options:
    def __init__(self, lr_G, lr_D, lr_M, beta1, beta2):
        self.lr_G = lr_G
        self.lr_D = lr_D
        self.lr_M = lr_M
        self.beta1 = beta1
        self.beta2 = beta2

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Define generator architecture here

    def forward(self, x):
        # Define forward pass
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Define discriminator architecture here

    def forward(self, x):
        # Define forward pass
        return x

def initialize_cyclegan_components(options):
    components = {}
    try:
        # Initialize generators
        G_AB = Generator()
        G_BA = Generator()
        
        # Initialize discriminators
        D_A = Discriminator()
        D_B = Discriminator()
        
        # Initialize Graph Neural Network for feature extraction
        model_gnn = nn.Module()  # Replace with actual GNN initialization
        
        # Initialize loss functions
        criterionIdt = nn.L1Loss()
        criterionCycle = nn.L1Loss()
        criterionGEN = nn.BCELoss()
        
        # Initialize optimizers
        optimizer_G = optim.Adam(list(G_AB.parameters()) + list(G_BA.parameters()), 
                                  lr=options.lr_G, betas=(options.beta1, options.beta2))
        optimizer_D = optim.Adam(list(D_A.parameters()) + list(D_B.parameters()), 
                                  lr=options.lr_D, betas=(options.beta1, options.beta2))
        optimizer_M = optim.Adam(model_gnn.parameters(), 
                                  lr=options.lr_M, betas=(options.beta1, options.beta2))
        
        # Check for CUDA availability and move components
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        G_AB.to(device)
        G_BA.to(device)
        D_A.to(device)
        D_B.to(device)
        model_gnn.to(device)
        criterionIdt.to(device)
        criterionCycle.to(device)
        criterionGEN.to(device)

        # Store components in the dictionary
        components['G_AB'] = G_AB
        components['G_BA'] = G_BA
        components['D_A'] = D_A
        components['D_B'] = D_B
        components['model_gnn'] = model_gnn
        components['criterionIdt'] = criterionIdt
        components['criterionCycle'] = criterionCycle
        components['criterionGEN'] = criterionGEN
        components['optimizer_G'] = optimizer_G
        components['optimizer_D'] = optimizer_D
        components['optimizer_M'] = optimizer_M

    except Exception as e:
        print(f"Error during initialization: {e}")
    
    return components
```
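
Both responses come back wrapped in a Markdown code fence, so the raw string is not directly executable or comparable as source code. The sketch below shows one possible way to pull out the code body before any evaluation step. `extract_code` is a hypothetical helper, and it assumes the response contains at most one fenced block of interest.

```python
import re

# Matches the first fenced block, with or without a language tag.
_FENCE_RE = re.compile(r"```[a-zA-Z0-9_+-]*\n(.*?)```", re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none is found."""
    match = _FENCE_RE.search(response)
    if match:
        return match.group(1).strip()
    return response.strip()


sample = "```python\nimport torch\nprint('ok')\n```"
print(extract_code(sample))
```

Falling back to the stripped text keeps the helper usable when the model ignores the requested output format.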

In [48]:
from tqdm import tqdm
import pandas as pd

# Enable tqdm for pandas apply
tqdm.pandas()
test_mceval["generated_code_without_constraints"] = test_mceval.progress_apply(generate_code_without_constraints, axis=1)

In the c
Design a
Create a
Design a
Write a
Write a
Write a
Design a
Create a
Write a P
Implemen
Write a
Write a
You are
Design a
Create a
Given a
Design a
Design a
Write a
100%|██████████| 20/20 [04:26<00:00, 13.33s/it]


# Step 9 - Evaluate Generated Code and Calculate Metrics

In [52]:
import difflib

code1_lines = code1.strip().splitlines()
code2_lines = code2.strip().splitlines()

# Get the diff
diff = difflib.unified_diff(code1_lines, code2_lines, fromfile='code1', tofile='code2', lineterm='')

# Print the diff
print("\n".join(diff))


--- code1
+++ code2
@@ -3,41 +3,85 @@
 import torch.nn as nn
 import torch.optim as optim
 
+class Options:
+    def __init__(self, lr_G, lr_D, lr_M, beta1, beta2):
+        self.lr_G = lr_G
+        self.lr_D = lr_D
+        self.lr_M = lr_M
+        self.beta1 = beta1
+        self.beta2 = beta2
+
+class Generator(nn.Module):
+    def __init__(self):
+        super(Generator, self).__init__()
+        # Define generator architecture here
+
+    def forward(self, x):
+        # Define forward pass
+        return x
+
+class Discriminator(nn.Module):
+    def __init__(self):
+        super(Discriminator, self).__init__()
+        # Define discriminator architecture here
+
+    def forward(self, x):
+        # Define forward pass
+        return x
+
 def initialize_cyclegan_components(options):
-    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    components = {}
+    try:
+        # Initialize generators
+        G_AB = Generator()
+        G_BA = Generator()


## Final Prompt Generation


In [57]:
## Prompt for Instruction Generation along with Constraints
def combine_instruction_and_constraints_embedded_style(instruction: str, constraints: list[str]) -> str:
    """
    Combines a natural language instruction and a list of constraints into a single, unified string
    where constraints are presented as implicit requirements or clauses following the main instruction,
    without a dedicated 'Implementation Constraints' heading.

    Args:
        instruction (str): The original programming instruction.
        constraints (list[str]): A list of specific, objective, and atomic constraints.

    Returns:
        str: A single string containing the instruction with constraints "embedded" as additional requirements.
    """
    if not instruction:
        raise ValueError("Instruction cannot be empty.")
    if not isinstance(constraints, list):
        raise TypeError("Constraints must be a list of strings.")

    # Start with the main instruction, making it clear it's the primary task
    combined_text = f"Your main task is to {instruction.strip()}"

    # If there are constraints, append them as additional requirements
    if constraints:
        # Use a linking phrase to transition to the constraints
        combined_text += "\n\nWhen developing the solution, it is crucial to also fulfill the following requirements:\n"

        # Format each constraint as its own sentence, without an explicit heading like "Constraints:"
        formatted_constraints_lines = []
        for constraint in constraints:
            line = constraint.strip()
            if line:  # Skip empty constraints
                # Add a trailing period if missing, then capitalize the first letter
                if not line.endswith('.'):
                    line += '.'
                formatted_constraints_lines.append(line[0].upper() + line[1:])

        combined_text += "\n".join(formatted_constraints_lines)

    return combined_text.strip()

# Example Usage:
# instruction_example = "Write a Python function to calculate the factorial of a given non-negative integer."
# constraints_example = [
#     "the function must be named 'calculate_factorial'",
#     "it must use recursion for the calculation",
#     "handle input that is not a non-negative integer by raising a ValueError",
#     "include a docstring explaining the function's purpose, parameters, and return value"
# ]

# unified_input_prompt_embedded_style = combine_instruction_and_constraints_embedded_style(instruction_example, constraints_example)
# print(unified_input_prompt_embedded_style)

In [55]:
import pandas as pd
df = pd.read_csv("step5_with_quality_scores_m1.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   dataset                10 non-null     object 
 1   instruction            10 non-null     object 
 2   code                   10 non-null     object 
 3   relevant_categories    10 non-null     object 
 4   constraints_m1         10 non-null     object 
 5   extracted_constraints  10 non-null     object 
 6   quality_scores         10 non-null     object 
 7   specificity_score      10 non-null     float64
 8   objectivity_score      10 non-null     float64
 9   atomicity_score        10 non-null     float64
 10  unified_quality_score  10 non-null     float64
dtypes: float64(4), object(7)
memory usage: 1012.0+ bytes


In [59]:
df.iloc[0]["extracted_constraints"]

'[{\'type\': \'Code Structure and Modularity\', \'constraint\': "The function must be named \'generate_sine_cosine_diagrams\' and should not contain any hardcoded values outside of the function definition.", \'instruction_part\': \'Generate diagrams for the sine and cosine functions over the interval [0, 2π].\'}, {\'type\': \'Input and Output Handling\', \'constraint\': "The function must accept parameters for the start and end of the interval (e.g., \'start\' and \'end\') and the number of points to plot (e.g., \'num_points\').", \'instruction_part\': \'Generate diagrams for the sine and cosine functions over the interval [0, 2π].\'}, {\'type\': \'Input and Output Handling\', \'constraint\': \'The function must return a tuple containing a Matplotlib Figure object and an ndarray of Matplotlib Axes objects, as specified in the instruction.\', \'instruction_part\': \'The function should output with: Figure: A Matplotlib Figure object containing the plots.\'}, {\'type\': \'Documentation a
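
The value shown above is a single string rather than a list of dicts, which is what one would expect when a list-valued column has been written to CSV and read back. A minimal sketch with made-up data of how `ast.literal_eval` recovers the original structure:

```python
import ast

# Made-up constraint entry, for illustration only.
constraints = [{"type": "Naming", "constraint": "Use snake_case."}]

# A list stored as text looks like its repr.
stored = str(constraints)
print(type(stored).__name__)

# literal_eval parses Python literals only, so it is safer than eval here.
parsed = ast.literal_eval(stored)
print(parsed[0]["constraint"])
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on malformed input, so rows with truncated or invalid text would need to be handled separately.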

In [None]:


import ast

# Flatten all constraints from all rows into a single list
all_constraints = []
for row in df["extracted_constraints"]:
    # If the constraints are stored as a string, parse them back into Python objects
    if isinstance(row, str):
        constraints = ast.literal_eval(row)
    else:
        constraints = row
    all_constraints.extend(constraints)

print(all_constraints)


['[{\'type\': \'Code Structure and Modularity\', \'constraint\': "The function must be named \'generate_sine_cosine_diagrams\' and should not contain any hardcoded values outside of the function definition.", \'instruction_part\': \'Generate diagrams for the sine and cosine functions over the interval [0, 2π].\'}, {\'type\': \'Input and Output Handling\', \'constraint\': "The function must accept parameters for the start and end of the interval (e.g., \'start\' and \'end\') and the number of points to plot (e.g., \'num_points\').", \'instruction_part\': \'Generate diagrams for the sine and cosine functions over the interval [0, 2π].\'}, {\'type\': \'Input and Output Handling\', \'constraint\': \'The function must return a tuple containing a Matplotlib Figure object and an ndarray of Matplotlib Axes objects, as specified in the instruction.\', \'instruction_part\': \'The function should output with: Figure: A Matplotlib Figure object containing the plots.\'}, {\'type\': \'Documentation 

In [70]:
import ast


def generate_constraint_list(constraints):
    """Extract the non-empty 'constraint' text from each constraint entry."""
    con_list = []
    # The column may arrive as a stringified list; parse it first
    if isinstance(constraints, str):
        constraints = ast.literal_eval(constraints)
    for item in constraints:
        # Individual entries may also be stringified dicts
        if isinstance(item, str):
            item = ast.literal_eval(item)
        if item["constraint"]:
            con_list.append(item["constraint"])
    return con_list


generate_constraint_list(df.iloc[0]["extracted_constraints"])

["The function must be named 'generate_sine_cosine_diagrams' and should not contain any hardcoded values outside of the function definition.",
 "The function must accept parameters for the start and end of the interval (e.g., 'start' and 'end') and the number of points to plot (e.g., 'num_points').",
 'The function must return a tuple containing a Matplotlib Figure object and an ndarray of Matplotlib Axes objects, as specified in the instruction.',
 'Include a docstring at the beginning of the function that describes its purpose, parameters, and return values in a clear and concise manner.',
 'All variables used in the function must have descriptive names that clearly indicate their purpose.',
 "Ensure that the sine and cosine functions are computed using NumPy's np.sin() and np.cos() methods, and verify that the input to these functions is a NumPy array.",
 'The plotting code for the sine and cosine functions should be encapsulated in separate helper functions to promote modularity.',

In [72]:
print(combine_instruction_and_constraints_embedded_style(df.iloc[0]["instruction"],generate_constraint_list(df.iloc[0]["extracted_constraints"]) ))

Your main task is to Generate diagrams for the sine and cosine functions over the interval [0, 2π]. This function plots the sine and cosine functions, setting appropriate titles and axis labels. The sine function plot is labeled 'Sine function', with x-axis labeled 'x' and y-axis labeled 'sin(x)'. The cosine function plot is labeled 'Cosine function', with x-axis labeled 'x' and y-axis labeled 'cos(x)'.
The function should output with:
    Figure: A Matplotlib Figure object containing the plots.
    ndarray: An array of Matplotlib Axes objects for the subplots, where:
    The first Axes object contains the sine function plot.
    The second Axes object contains the cosine function plot.
You should write self-contained code starting with:
```
import numpy as np
import matplotlib.pyplot as plt
def task_func():
```

When developing the solution, it is crucial to also fulfill the following requirements:
The function must be named 'generate_sine_cosine_diagrams' and should not contain any h

In [76]:
SYSTEM_PROMPT_EMBEDDING_LLM = """
You are an expert natural language rephraser. Your task is to seamlessly integrate a list of specific programming constraints into a given natural language instruction. The goal is to produce a single, fluid, and grammatically correct prompt where the constraints are an implicit part of the main instruction, without any explicit headings, bullet points, or separate sections for constraints.
"""

def get_embedding_llm_prompt(original_instruction: str, constraints: list[str]) -> str:
    """Build the user prompt asking the LLM to weave the constraints into the instruction."""

    if not original_instruction:
        raise ValueError("Original instruction cannot be empty.")
    if not isinstance(constraints, list) or not all(isinstance(c, str) for c in constraints):
        raise TypeError("Constraints must be a list of strings.")

    # Format constraints as a simple list for the embedding LLM to process
    formatted_constraints_list = "\n".join([f"- {c.strip()}" for c in constraints])

    user_prompt = f"""
Original Instruction:
{original_instruction}

Constraints to Embed:
{formatted_constraints_list}

Your Goal:
Rephrase the 'Original Instruction' to naturally incorporate all 'Constraints to Embed' into a single, cohesive paragraph or a few flowing sentences. The output should sound like a complete, natural programming task description. Do NOT use headings, bullet points, numbered lists, or any explicit separators like 'Constraints:' or 'Requirements:'. The constraints should feel like an organic part of the instruction itself. Ensure the final output is grammatically correct.

Final Embedded Prompt:
"""
    return user_prompt

prompt = get_embedding_llm_prompt(df.iloc[0]["instruction"],generate_constraint_list(df.iloc[0]["extracted_constraints"]))
print(prompt)



Original Instruction:
Generate diagrams for the sine and cosine functions over the interval [0, 2π]. This function plots the sine and cosine functions, setting appropriate titles and axis labels. The sine function plot is labeled 'Sine function', with x-axis labeled 'x' and y-axis labeled 'sin(x)'. The cosine function plot is labeled 'Cosine function', with x-axis labeled 'x' and y-axis labeled 'cos(x)'.
The function should output with:
    Figure: A Matplotlib Figure object containing the plots.
    ndarray: An array of Matplotlib Axes objects for the subplots, where:
    The first Axes object contains the sine function plot.
    The second Axes object contains the cosine function plot.
You should write self-contained code starting with:
```
import numpy as np
import matplotlib.pyplot as plt
def task_func():
```

Constraints to Embed:
- The function must be named 'generate_sine_cosine_diagrams' and should not contain any hardcoded values outside of the function definition.
- The func

In [79]:
final_prompt = get_response(prompt,system_prompt=SYSTEM_PROMPT_EMBEDDING_LLM)
print(final_prompt)

Generate diagrams for the sine and cosine functions over a specified interval by creating a function named 'generate_sine_cosine_diagrams' that accepts parameters for the start and end of the interval, as well as the number of points to plot. This function should include a docstring that clearly describes its purpose, parameters, and return values, and it must return a tuple containing a Matplotlib Figure object and an ndarray of Matplotlib Axes objects. The first Axes object should display the sine function, labeled 'Sine function', while the second Axes object should show the cosine function, labeled 'Cosine function'. Ensure that all variables within the function have descriptive names that indicate their purpose, and compute the sine and cosine values using NumPy's np.sin() and np.cos() methods, confirming that the input to these functions is a NumPy array. The plotting code for both functions should be encapsulated in separate helper functions to enhance modularity, and include in
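
The prompt asks the model to avoid headings, bullet points, numbered lists, and labels such as 'Constraints:' or 'Requirements:'. A rough heuristic check for that requirement could look like the sketch below. `find_explicit_structure` and its patterns are assumptions made for illustration and will not catch every case.

```python
import re

# Line-level patterns for headings, bullets, numbered items, and section labels.
_STRUCTURE_PATTERNS = [
    re.compile(r"^\s*#{1,6}\s"),
    re.compile(r"^\s*[-*•]\s"),
    re.compile(r"^\s*\d+[.)]\s"),
    re.compile(r"^\s*(constraints|requirements)\s*:", re.IGNORECASE),
]


def find_explicit_structure(text: str) -> list[str]:
    """Return the lines that look like headings, list items, or section labels."""
    flagged = []
    for line in text.splitlines():
        if any(pattern.search(line) for pattern in _STRUCTURE_PATTERNS):
            flagged.append(line)
    return flagged


draft = "Write a function.\n- must be fast\nConstraints: none\nPlain sentence."
print(find_explicit_structure(draft))
```

An empty result for `final_prompt` would suggest the constraints were embedded as prose, while any flagged lines could be sent back for another rephrasing pass.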