# RepoMap
## Introduction
- Similar to our `repo2file`, we can now enrich the prompt with hints that tell the LLM which parts of the code are classes, functions, and other code elements.
- Here we split the code at top-level function definitions and `if` statements.
- For this we'll use Tree-sitter, a tool that parses source code into an AST (Abstract Syntax Tree).
- This lets us split code along syntactic boundaries instead of at arbitrary lines.
- <https://tree-sitter.github.io/tree-sitter/>
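
Splitting along syntax boundaries matters most when code has to be cut into size-limited chunks, as in the RAG articles listed below. The sketch uses Python's built-in `ast` module instead of Tree-sitter (a swapped-in parser, for illustration only: it handles only Python and drops comments). It packs whole top-level statements into chunks of at most `max_chars` characters, so no function is cut in half. `ast_chunks` and the `max_chars` limit are hypothetical, not part of this notebook.

```python
import ast


def ast_chunks(source: str, max_chars: int = 200) -> list[str]:
    """Pack whole top-level statements into chunks of at most max_chars characters.

    A single statement longer than max_chars becomes its own (oversized) chunk
    instead of being cut in half. Note: ast drops comments, and
    get_source_segment does not include a function's decorators.
    """
    tree = ast.parse(source)
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # length of "\n".join(current)
    for node in tree.body:
        segment = ast.get_source_segment(source, node)
        extra = len(segment) + (1 if current else 0)  # +1 for the joining newline
        if current and size + extra > max_chars:
            # Adding this statement would overflow the chunk: close it first
            chunks.append("\n".join(current))
            current, size = [], 0
            extra = len(segment)
        current.append(segment)
        size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


sample = "def a():\n    return 1\n\ndef b():\n    return 2\n\nx = a() + b()\n"
for chunk in ast_chunks(sample, max_chars=50):
    print(chunk)
    print("---")
```

Compared with cutting every N characters, each chunk stays syntactically valid; the trade-off is that one very long function can exceed the limit.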

Inspired by:
- https://vxrl.medium.com/enhancing-llm-code-generation-with-rag-and-ast-based-chunking-5b81902ae9fc
- https://windsurf.com/blog/using-code-syntax-parsing-for-generative-ai

```
...
| def foo():
|     return 42
| z = foo()
...
| if z > 0:
|     print("Positive")
```
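
The same marker format can be sketched with Python's standard-library `ast` module instead of Tree-sitter (a swapped-in parser used only for illustration). Top-level functions and `if` statements get a `...` marker, and every line is prefixed with `| `. `format_with_ast` and `MARKED` are hypothetical names.

```python
import ast

# Top-level statement types that get a "..." marker; these roughly mirror the
# Tree-sitter node types 'function_definition' and 'if_statement'
MARKED = (ast.FunctionDef, ast.AsyncFunctionDef, ast.If)


def format_with_ast(code: str) -> str:
    parts: list[str] = []
    for node in ast.parse(code).body:
        if isinstance(node, MARKED):
            parts.append("\n...\n")
        elif parts:
            parts.append("\n")  # keep consecutive unmarked statements on separate lines
        segment = ast.get_source_segment(code, node)
        # Prefix every line of the statement with "| "
        parts.append("\n".join("| " + line for line in segment.split("\n")))
    return "".join(parts)


example = 'def foo():\n    return 42\n\nz = foo()\n\nif z > 0:\n    print("Positive")\n'
print(format_with_ast(example))
```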

Some notes from Aider on further optimizing the map:
> Of course, for large repositories even just the repo map might be too large for GPT’s context window. Aider solves this problem by sending just the most relevant portions of the repo map. It does this by analyzing the full repo map using a graph ranking algorithm, computed on a graph where each source file is a node and edges connect files which have dependencies. Aider optimizes the repo map by selecting the most important parts of the codebase which will fit into the token budget assigned by the user (via the --map-tokens switch, which defaults to 1k tokens).

> The sample map shown above doesn’t contain every class, method and function from those files. It only includes the most important identifiers, the ones which are most often referenced by other portions of the code. These are the key pieces of context that GPT needs to know to understand the overall codebase.
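
The ranking step in the quote can be sketched as plain PageRank over a file dependency graph, followed by a greedy fill of the token budget. This is a simplified illustration of the idea described above, not Aider's actual `repomap.py`; the function names, the ~4-characters-per-token estimate, and the example files are assumptions.

```python
def pagerank(edges: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a file graph; an edge a -> b means 'a depends on b'."""
    nodes = set(edges) | {dst for targets in edges.values() for dst in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, [])
            if targets:
                # Pass this file's rank on to the files it depends on
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # A file with no dependencies spreads its rank over every file
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


def select_within_budget(snippets: dict[str, str], rank: dict[str, float],
                         max_tokens: int) -> list[str]:
    """Greedily keep the highest-ranked files whose map snippets fit the token budget."""
    chosen: list[str] = []
    used = 0
    for name in sorted(snippets, key=lambda f: rank.get(f, 0.0), reverse=True):
        cost = len(snippets[name]) // 4 + 1  # rough assumption: ~4 characters per token
        if used + cost <= max_tokens:
            chosen.append(name)
            used += cost
    return chosen


# Hypothetical three-file repo: main.py imports both library files
deps = {"main.py": ["lib/hello.py", "lib/util.py"], "lib/hello.py": ["lib/util.py"]}
ranks = pagerank(deps)
snippets = {"main.py": "x" * 40, "lib/hello.py": "y" * 400, "lib/util.py": "z" * 40}
print(select_within_budget(snippets, ranks, max_tokens=30))
```

The file imported by both others ranks highest, and a large but mid-ranked snippet is skipped when it would overflow the budget.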

More info via:
- https://aider.chat/2023/10/22/repomap.html
- https://github.com/Aider-AI/aider/blob/main/aider/repomap.py
- https://github.com/MikeyBeez/codemapper

## Setting up Tree-sitter

In [1]:
%pip install -q tree-sitter tree-sitter-python


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
from tree_sitter import Language, Parser
import tree_sitter_python

PYTHON_LANGUAGE = Language(tree_sitter_python.language())

parser = Parser(PYTHON_LANGUAGE)

## Formatting code using the AST
- We'll use the AST to format each file into a more structured prompt.

In [3]:
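def format_ast(code: str) -> str:
    # Tree-sitter parses bytes and reports byte offsets, so keep the encoded source
    source = code.encode("utf8")
    tree = parser.parse(source)

    root = tree.root_node
    # Top-level node types that get a "..." marker in front of them
    terminal = ['function_definition', 'if_statement']

    result = []
    for child in root.children:
        if child.type in terminal:
            result.append("\n...\n")
        elif result:
            # Keep consecutive unmarked statements on separate lines
            result.append("\n")
        # Slice the bytes (not the str) so non-ASCII characters don't shift the offsets
        snippet = source[child.start_byte:child.end_byte].decode("utf8")
        # Prefix every line with "| "
        snippet = '\n'.join('| ' + line for line in snippet.split('\n'))
        result.append(snippet)
    return ''.join(result)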


with open("repo/lib/hello.py", "r") as f:
    sample_code = f.read()

output = format_ast(sample_code)

print(output)



...
| def hello(text):
|     """
|     This function prints 'Hello, World!' to the console.
|     """
|     # Print the message
|     if text == "patrick":
|         hello_patrick()
|     else:
|         print(text)
...
| def hello_patrick():
|     # Print the message
|     print("Oh no ! Not Patrick!")
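
Tree-sitter reports `start_byte` and `end_byte` as offsets into the UTF-8 bytes passed to `parser.parse`, not as character indices into the Python `str`. For ASCII-only files like the sample above the two coincide, but a single non-ASCII character shifts every later slice. The pure-Python sketch below computes the offsets by hand (an assumption standing in for a parsed node) to show the difference.

```python
# Hypothetical source with a non-ASCII character before the function
code = '# café\ndef foo():\n    return 42\n'
source = code.encode("utf8")

# Byte offsets of the function, as a UTF-8-based parser would report them
start_byte = source.index(b"def")
end_byte = source.index(b"42") + len(b"42")

# Wrong: treating byte offsets as character offsets shifts the slice by one,
# because "é" takes two bytes in UTF-8
wrong = code[start_byte:end_byte]

# Right: slice the bytes, then decode
right = source[start_byte:end_byte].decode("utf8")

print(repr(wrong))
print(repr(right))
```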


In [4]:
import os


def scan_folder_ast(start_path: str, debug: bool = False) -> str:
    output = []
    for root, dirs, files in os.walk(start_path):
        rel_path = os.path.relpath(root, start_path)

        for file in files:
            file_rel_path = os.path.join(rel_path, file)
            file_path = os.path.join(root, file)

            # The parser only understands Python, so skip everything else (including .pyc files)
            if not file.endswith('.py'):
                if debug:
                    print(f"Skipping non-Python file: {file_rel_path}")
                continue
            if debug:
                print(f"Processing: {file_rel_path}")

            try:
                with open(file_path, 'r', encoding='utf-8') as in_file:
                    content = in_file.read()
            except (OSError, UnicodeDecodeError) as e:
                print(f"Error reading file {file_rel_path}: {e}. Skipping.")
                continue

            # Header on its own line, then the formatted file without its leading newline
            output.append(f"{file_rel_path} :\n")
            output.append(format_ast(content).lstrip("\n"))
            output.append("\n\n")
    return ''.join(output)

repomap = scan_folder_ast('repo')
print(40 * '=')
print(repomap)
print(40 * '=')


./main.py :
| from lib.hello import hello
...
| def main():
|     hello("Hello, World!")
...
| if __name__ == "__main__":
|     main()

lib/hello.py :
...
| def hello(text):
|     """
|     This function prints 'Hello, World!' to the console.
|     """
|     # Print the message
|     if text == "patrick":
|         hello_patrick()
|     else:
|         print(text)
...
| def hello_patrick():
|     # Print the message
|     print("Oh no ! Not Patrick!")

lib/__init__.py :
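
The map above repeats every function body, whereas the Aider map quoted earlier lists only the key identifiers. A more compact, signature-only view can be sketched with the standard-library `ast` module (swapped in for Tree-sitter; `ast.unparse` needs Python 3.9+). `signature_map` is a hypothetical helper, not part of this notebook's code.

```python
import ast


def signature_map(code: str) -> str:
    """Return one line per top-level function/class (and per method inside classes)."""
    lines: list[str] = []

    def describe(node: ast.stmt, indent: str) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"{indent}{keyword} {node.name}({ast.unparse(node.args)}){returns}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"{indent}class {node.name}")
            for child in node.body:  # list methods one level deeper
                describe(child, indent + "    ")

    for node in ast.parse(code).body:
        describe(node, "")
    return "\n".join(lines)


sample = (
    'def hello(text):\n'
    '    """Doc."""\n'
    '    print(text)\n'
    '\n'
    'class Greeter:\n'
    '    def greet(self, name: str) -> None:\n'
    '        print(name)\n'
)
print(signature_map(sample))
```

Dropping bodies trades detail for space: the LLM still sees what exists and how to call it, which is usually what a repo map needs.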




In [5]:
%pip install -q langchain-openai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.

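The next cell sends one request per entry in `extras`, and because each entry is appended to `question` before the call, every request repeats all earlier instructions plus one new one. Below is a pure-string sketch of the prompts it ends up sending, with no API call; `build_prompts` and the sample strings are hypothetical.

```python
def build_prompts(question: str, extras: list[str], suffix: str, repomap: str) -> list[str]:
    """Return the prompt sent on each iteration; instructions accumulate across iterations."""
    prompts: list[str] = []
    for extra in extras:
        question += "- " + extra + "\n"  # rebinds the local name; the caller's string is unchanged
        prompts.append(question + suffix + repomap)
    return prompts


for prompt in build_prompts("Task:\n", ["Rename hello()", "Add type hints"], "\nThe code is:\n", "<repo map>"):
    print(prompt)
    print("=" * 20)
```

Accumulating instructions shows how each added constraint changes the answer, at the cost of one API call (and one full copy of the repo map) per step.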

In [6]:
# LangChain's wrapper for OpenAI chat models (expects OPENAI_API_KEY to be set in the environment)
from langchain_openai import ChatOpenAI

# The question we're asking
question = """You are an expert in Python: 
- Change the function hello() to greet()
"""

extras = [
    "Refactor the code so that it is more readable and efficient",
    "Output the full code in markdown blocks",
    "Make sure you show all code blocks.",
    "Don't add any comments or explanations.",
    "Make sure to change filenames and function names to be more descriptive."
]

suffix = """
The code is:
"""

# The model we select
model = "gpt-4o-mini"

# ChatOpenAI wraps OpenAI's chat models
llm = ChatOpenAI(model=model)

# Ask the model once per extra instruction; instructions accumulate, so each prompt is longer than the last
for extra in extras:
    question += "- " + extra + "\n"
    print("==========================")
    print("Question: " + question)
    print("==========================")
    print("Answer:")
    answer = llm.invoke(question + suffix + repomap)
    print(answer.content)

Question: You are an expert in Python: 
- Change the function hello() to greet()
- Refactor the code so that it is more readable and efficient

Answer:
To change the function from `hello()` to `greet()` and refactor the code for better readability and efficiency, I will make the necessary modifications in both `main.py` and `lib/hello.py`. Additionally, the documentation string will be updated to reflect the new function name, and I'll ensure that the code organization is clear and concise.

Here are the updated files:

### Updated `main.py`:
```python
from lib.hello import greet

def main():
    greet("Hello, World!")

if __name__ == "__main__":
    main()
```

### Updated `lib/hello.py`:
```python
def greet(text):
    """
    This function prints a greeting message to the console.
    Special case for 'patrick' is handled separately.
    """
    if text.lower() == "patrick":
        _greet_patrick()
    else:
        print(text)

def _greet_patrick():
    """Prints a special message 