# CodeQUEST Tutorial
***

## Overview

This notebook is a tutorial for CodeQUEST, a novel framework leveraging Large Language Models (LLMs) to evaluate and enhance code quality across multiple dimensions including readability, maintainability, efficiency, and security.  

The framework is divided into two main components: an Evaluator, which assesses code quality across ten dimensions providing both quantitative scores and qualitative summaries, and an Optimizer, which iteratively improves the code based on feedback from the Evaluator.  

Our study demonstrates that CodeQUEST can effectively evaluate code quality, with its assessments aligning closely with established code quality metrics.  

This highlights the potential of LLMs in automating code quality evaluation and improvement processes, presenting a significant advancement toward enhancing software development practices.  

Note that evaluation scores may vary slightly across runs because LLM outputs are inherently stochastic. Increasing the number of retries per evaluation (the `num_retries` argument used below) yields more stable scores.
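To illustrate why more retries help, here is a minimal sketch that averages several noisy scores. The `noisy_score` function is a hypothetical stand-in for a single LLM evaluation, and plain averaging is an assumption about how retries are aggregated, not a description of CodeQUEST's internals.

```python
import random
import statistics


def noisy_score(rng):
    """Hypothetical stand-in for one LLM evaluation: a true score of 3 plus noise."""
    return 3 + rng.choice([-1, 0, 1])


def averaged_score(rng, num_retries):
    """Average several noisy evaluations (assumed aggregation strategy)."""
    return statistics.mean([noisy_score(rng) for _ in range(num_retries)])


rng = random.Random(0)  # fixed seed so the comparison is reproducible
single = [averaged_score(rng, 1) for _ in range(200)]
averaged = [averaged_score(rng, 5) for _ in range(200)]

# Spread (population standard deviation) of the reported score across 200 runs.
spread_single = statistics.pstdev(single)
spread_averaged = statistics.pstdev(averaged)
print(f"spread with 1 retry:   {spread_single:.2f}")
print(f"spread with 5 retries: {spread_averaged:.2f}")
```

Under these assumptions, the score averaged over five retries fluctuates noticeably less from run to run than a single evaluation does, at the cost of five times as many LLM calls.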

## Loading
***

In [None]:
%load_ext autoreload
%autoreload 2

# Load the environment variables if using a .env file at root
# from dotenv import load_dotenv 
# load_dotenv()

import os 
from codequest.evaluator import Evaluator, CodeQUESTEvaluator 
from codequest.optimizer import Optimizer, CodeQUESTOptimizer 
from codequest.codequest import format_dimwise_feedback_for_improver, code_tester, check_syntax, QUESTer 

script_path = "examples/python/code.py"
# Read the script inside a context manager so the file handle is closed promptly
with open(script_path, 'r') as f:
    code = f.read().strip()
testcases_path = "tests/test_cases/python/code.py"
print(code)

class Node: 
	def __init__(self, data): 
		self.data = data 
		self.left = None
		self.right = None
def max_height(node): 
	if node is None: 
		return 0 ; 
	else : 
		left_height = max_height(node.left) 
		right_height = max_height(node.right) 
		if (left_height > right_height): 
			return left_height+1
		else: 
			return right_height+1


## Evaluation 

In this section, we create a *baseline evaluator* based on the Chain-of-Thought prompting technique and a *CodeQUEST evaluator* based on our novel approach.
We then compare the two evaluations on an example script in the zero-shot setting.

In [2]:
baseline_evaluator = Evaluator(num_retries=1)
codequest_evaluator = CodeQUESTEvaluator(num_retries=1)

In [3]:
baseline_eval = baseline_evaluator(code)

In [4]:
baseline_eval['report']

| | code_score | code_insight | code_runtime | code_runcost | code_scores | code_insights |
|---|---|---|---|---|---|---|
| 0 | 4.0 | The code is well-structured and correctly impl... | 2 | 0.002 | [4] | [The code is well-structured and correctly imp... |


In [5]:
codequest_eval = codequest_evaluator(code)

In [6]:
codequest_eval['report']

| | quality_dimension | dimension_score | dimension_insights |
|---|---|---|---|
| 0 | Readability | 1 | [The code is generally readable with descripti... |
| 1 | Maintainability | 5 | [The code is logically organized and easy to u... |
| 2 | Testability | 4 | [The code provided is well-structured for test... |
| 3 | Efficiency | 2 | [The code provided is a basic implementation o... |
| 4 | Robustness | -4 | [The code provided is a basic implementation o... |
| 5 | Security | 1 | [The provided code is a simple implementation ... |
| 6 | Documentation | -5 | [The code lacks documentation. There are no co... |
| 7 | Modularity | -2 | [The code provided is simple and functional bu... |
| 8 | Scalability | -5 | [The code provided is a basic implementation o... |
| 9 | Portability | 4 | [The code provided is highly portable as it do... |


In [7]:
codequest_eval['score']

0.1
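The overall score of 0.1 is consistent with a plain average of the ten dimension scores in the report above. The sketch below reproduces that arithmetic. Treating the aggregate as an unweighted mean is an assumption inferred from these numbers, not a documented property of `CodeQUESTEvaluator`.

```python
# Dimension scores copied from the report above.
dimension_scores = {
    "Readability": 1,
    "Maintainability": 5,
    "Testability": 4,
    "Efficiency": 2,
    "Robustness": -4,
    "Security": 1,
    "Documentation": -5,
    "Modularity": -2,
    "Scalability": -5,
    "Portability": 4,
}

# Assumed aggregation: unweighted mean over all dimensions.
overall = sum(dimension_scores.values()) / len(dimension_scores)
print(overall)  # 0.1
```

The strongly negative scores for Robustness, Documentation, and Scalability offset the high Maintainability, Testability, and Portability scores, which is why the aggregate sits close to zero.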

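*Notice how the Chain-of-Thought baseline evaluator overestimates the code quality: it reports a score of 4.0, whereas the CodeQUEST evaluator reports an overall score of 0.1 and flags weaknesses in Robustness, Documentation, Modularity, and Scalability.*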

## Optimization 

In this section, we create a *baseline optimizer* based on the Chain-of-Thought prompting technique and a *CodeQUEST optimizer* based on our novel approach.
We then compare the code each one produces for an example script in the zero-shot setting.

In [8]:
baseline_optimizer = Optimizer()
codequest_optimizer = CodeQUESTOptimizer() 

In [9]:
baseline_improved_code = baseline_optimizer(code, baseline_eval['insight'])['code']
print(baseline_improved_code)


class Node:
    def __init__(self, data):
        """
        Initialize a new Node with given data.
        """
        self.data = data
        self.left = None
        self.right = None

def max_height(node):
    """
    Calculate the maximum height of a binary tree.
    
    Args:
    node (Node): The root node of the binary tree.
    
    Returns:
    int: The maximum height of the tree.
    """
    if not isinstance(node, Node) and node is not None:
        raise ValueError("Input must be a Node object or None")
    
    if node is None:
        return 0
    else:
        # Recursively calculate the height of the left and right subtrees
        left_subtree_height = max_height(node.left)
        right_subtree_height = max_height(node.right)
        
        # Return the greater height between the two subtrees, plus one for the current node
        if left_subtree_height > right_subtree_height:
            return left_subtree_height + 1
        else:
            return right_subt

In [10]:
codequest_improved_code = codequest_optimizer(
    code, 
    format_dimwise_feedback_for_improver(codequest_eval['report'])
)['code']
print(codequest_improved_code)


class Node:
    """
    A class to represent a node in a binary tree.
    
    Attributes:
    data : any type
        The data stored in the node.
    left : Node
        The left child node.
    right : Node
        The right child node.
    """
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def max_height(node, memo=None):
    """
    Calculate the maximum height of a binary tree.
    
    Parameters:
    node : Node
        The root node of the binary tree.
    memo : dict, optional
        A dictionary to store the heights of subtrees to avoid repeated calculations.
    
    Returns:
    int
        The maximum height of the binary tree.
    """
    if memo is None:
        memo = {}
    
    if node is None:
        return 0
    
    if node in memo:
        return memo[node]
    
    try:
        left_height = max_height(node.left, memo)
        right_height = max_height(node.right, memo)
        height = max(left_heig

*Notice how the CodeQUEST optimizer achieves a much more well-rounded improvement.*

## CodeQUEST Cycle 

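Here we put the Evaluator and the Optimizer together in an actor-critic loop: the Optimizer (actor) proposes a new version of the code, and the Evaluator (critic) scores it and supplies feedback for the next iteration.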

In [11]:
quester = QUESTer(codequest_evaluator, codequest_optimizer) 

In [None]:
quest_result = quester(script_path, testcases_path)
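The loop can be pictured with a small, self-contained sketch. Everything here is hypothetical: `quest_loop`, the toy evaluator, optimizer, and test function are illustrative stand-ins, and the acceptance rule (tests pass and the score improves) is an assumption, not the `QUESTer` implementation. Only the `trajectories`, `code`, and `accepted` keys mirror what this notebook reads from the result.

```python
def quest_loop(code, evaluate, optimize, passes_tests, max_iterations=3):
    """Hypothetical evaluate-then-improve loop (not the QUESTer implementation).

    evaluate(code) -> (score, feedback)
    optimize(code, feedback) -> candidate code
    passes_tests(code) -> bool
    """
    best_score, feedback = evaluate(code)
    trajectories = [{"code": code, "score": best_score, "accepted": True}]
    for _ in range(max_iterations):
        candidate = optimize(code, feedback)
        candidate_score, candidate_feedback = evaluate(candidate)
        # Assumed acceptance rule: tests must pass and the score must improve.
        accepted = passes_tests(candidate) and candidate_score > best_score
        trajectories.append(
            {"code": candidate, "score": candidate_score, "accepted": accepted}
        )
        if accepted:
            code, best_score, feedback = candidate, candidate_score, candidate_feedback
    return {"trajectories": trajectories}


# Toy stand-ins so the sketch runs without an LLM.
MARKERS = ("# documented", "# validated")


def toy_evaluate(code):
    missing = [marker for marker in MARKERS if marker not in code]
    return -len(missing), missing  # fewer missing markers means a higher score


def toy_optimize(code, feedback):
    if not feedback:
        return code  # nothing left to improve
    return code + "\n" + feedback[0]  # address the first piece of feedback


def toy_tests(code):
    return code.startswith("def ")


result = quest_loop("def f(): pass", toy_evaluate, toy_optimize, toy_tests)
accepted_flags = [step["accepted"] for step in result["trajectories"]]
final_code = [s["code"] for s in result["trajectories"] if s["accepted"]][-1]
print(accepted_flags)
print(final_code)
```

In this toy run the first two candidates are accepted because each one raises the score, and the third is rejected because it brings no further improvement. Gating acceptance on both the tests and the score keeps a candidate that scores well but breaks behavior from replacing the current version.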

### Final Version 

In [14]:
trajecs = quest_result['trajectories']
final_version = [trajec['code'] for trajec in trajecs if trajec['accepted']][-1]
print(final_version)


class Node:
    """
    A class to represent a node in a binary tree.

    Attributes:
    data : any type
        The data stored in the node.
    left : Node
        The left child node.
    right : Node
        The right child node.
    """
    def __init__(self, data):
        """
        Constructs all the necessary attributes for the node object.

        Parameters:
        data : any type
            The data stored in the node.
        """
        self.data = data
        self.left = None
        self.right = None

def max_height(node):
    """
    Calculate the maximum height of a binary tree.

    Parameters:
    node : Node
        The root node of the binary tree.

    Returns:
    int
        The maximum height of the binary tree.
    """
    if not isinstance(node, Node) and node is not None:
        raise ValueError("Input must be a Node object or None")

    if node is None:
        return 0

    # Use an iterative approach with a stack to avoid deep recursion
    sta
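
The printed listing above is cut off, so the optimizer's actual stack-based code is not shown. As an independent illustration of the technique named in its last comment, here is a minimal sketch of computing tree height with an explicit stack. The name `max_height_iterative` and the `(node, depth)` bookkeeping are assumptions made for this example, not the optimizer's output.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def max_height_iterative(root):
    """Height of a binary tree using an explicit stack (an empty tree has height 0)."""
    if root is None:
        return 0
    height = 0
    stack = [(root, 1)]  # each entry is (node, depth of that node)
    while stack:
        node, depth = stack.pop()
        height = max(height, depth)
        if node.left is not None:
            stack.append((node.left, depth + 1))
        if node.right is not None:
            stack.append((node.right, depth + 1))
    return height


# Build a degenerate, left-leaning tree that is deeper than the default
# recursion limit, which a naive recursive version would fail on.
root = Node(0)
current = root
for value in range(1, 5000):
    current.left = Node(value)
    current = current.left

deep_height = max_height_iterative(root)
print(deep_height)  # 5000
```

Because the traversal state lives in a list rather than on the call stack, the depth of the tree is limited by available memory rather than by Python's recursion limit.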