In [1]:
import logging

from automata.cli.commands import reconfigure_logging
from automata.config.base import AgentConfigName
from automata.config.openai_agent import OpenAIAutomataAgentConfigBuilder
from automata.agent.providers import OpenAIAutomataAgent
from automata.singletons.dependency_factory import dependency_factory
from automata.singletons.py_module_loader import py_module_loader
from automata.tools.factory import AgentToolFactory

logger = logging.getLogger(__name__)
reconfigure_logging("DEBUG")

py_module_loader.initialize()

INFO:automata.singletons.py_module_loader:Loading modules with root path: /Users/ocolegrove/automata_fresh_2/automata/core/../.. and py path: /Users/ocolegrove/automata_fresh_2/automata/core/../../automata


In [2]:
# Construct the set of all dependencies that will be used to build the tools
toolkit_list = ["context-oracle"]
tool_dependencies = dependency_factory.build_dependencies_for_tools(toolkit_list)

INFO:automata.singletons.dependency_factory:Building dependencies for toolkit_list ['context-oracle']...
INFO:automata.singletons.dependency_factory:Building symbol_search...
INFO:automata.singletons.dependency_factory:Creating dependency symbol_search
INFO:automata.singletons.dependency_factory:Creating dependency symbol_graph
INFO:automata.singletons.dependency_factory:Creating dependency symbol_code_embedding_handler
INFO:chromadb.telemetry.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:clickhouse_connect.driver.ctypes:Successfully imported ClickHouse Connect C data optimizations
DEBUG:clickhouse_connect.driver.ctypes:Successfully import ClickHouse Connect C/Numpy optimizations
INFO:clickhouse_connect.json_impl:Using python library for writing JSON byte strings
INFO:chromadb.db.duckdb:loaded in 963 embeddings
INFO:chromadb.db.duckd

In [3]:
# Build the tools
tools = AgentToolFactory.build_tools(toolkit_list, **tool_dependencies)

In [4]:
# Build the agent config
config_name = AgentConfigName("automata-main")

agent_config = (
    OpenAIAutomataAgentConfigBuilder.from_name(config_name)
    .with_tools(tools)
    .with_model("gpt-4")
    .with_max_iterations(2)
    .build()
)

INFO:automata.singletons.dependency_factory:Creating dependency symbol_rank
INFO:automata.singletons.dependency_factory:Creating dependency subgraph
INFO:automata.symbol.graph.navigator:Pre-computing bounding boxes for all rankable symbols
INFO:automata.symbol.graph.navigator:Finished pre-computing bounding boxes for all rankable symbols in 1.0803437232971191 seconds
INFO:automata.symbol.graph.symbol_graph:Building the rankable symbol subgraph...
100%|██████████| 633/633 [00:01<00:00, 430.51it/s]
INFO:automata.symbol.graph.symbol_graph:Built the rankable symbol subgraph


In [11]:
# Initialize the agent
import textwrap

# Alternative instructions, kept for reference:
# instructions = "Explain how OpenAI is used by the codebase. Note in particular how it is used in embeddings and how agents consume it."
# instructions = "Provide a comprehensive explanation of how SymbolRank and SymbolSearch work. Return your answer as a valid JSON string."
instructions = textwrap.dedent('''
Your task is to solve the following GitHub Issue

Title:
Specify list of supported symbols for document generation, add override flag to `EmbeddingHandler`

Body:
The goal is to modify the `run_doc_embedding` script to accept an arbitrary list of new symbols to regenerate [preferably in some human readable form that is parsed]. These symbols should then overwrite existing results in the database with new documentation if necessary. To implement the second step a flag will need to be added to the EmbeddingHandler.


1. Understand the Project: Review the Automata project to get a clear understanding of its goals and how it operates. This includes understanding how the document generation pipeline works, the role of indices and embeddings, and how they interplay to allow for efficient code and documentation generation.

2. Setup the Environment: Clone the Automata project into the "../repo_store/automata" directory relative to your local working directory.

    Code: `git clone git@github.com:emrgnt-cmplxty/Automata.git ../repo_store/automata`

3. Generate New Indices: Once you have cloned the Automata project, navigate to the "scripts" directory and run the `generate_indices.sh` script to generate new indices.

    Code: 
    ```bash
    cd scripts
    ./generate_indices.sh
    ```

    You can verify the creation of the indices by navigating to the `automata-embedding-data` directory and running `git status`.

    Code: 
    ```bash
    cd automata-embedding-data
    git status
    ```

4. Refresh Code Embeddings: From your main directory, run the `run-code-embedding` command. This will refresh the embeddings in the database, rolling the commit hash forward where the symbol source code hasn't changed and recalculating the index where necessary.

    Code: `poetry run automata run-code-embedding`

5. Run Document Embedding: Next, run the `run-doc-embedding` command locally to see it in action.

    Code: `poetry run automata run-doc-embedding`

    This command generates new docs for newly added symbols and moves forward with the symbols that are in the index and map onto the database. This is a crucial part of the pipeline which generates the Automata docs.

6. Understand the Codebase: As you carry out these steps, take notes on how the codebase works. Insert print statements and other debugging aids to see what is happening in the pipeline.

7. Modify `run_doc_embedding`: Once you are confident with your understanding of the pipeline, your goal is to modify `run_doc_embedding` to accept a list of new symbols to regenerate, and overwrite existing results in the database with new documentation if necessary.

8. Seek Help: If you encounter any issues or have questions, don't hesitate to ask for help. Once you have successfully carried out these tasks, check in for the next steps.

Remember, understanding the pipeline and how everything fits together is the key to this task. 

Begin now.
''')
agent = OpenAIAutomataAgent(instructions, config=agent_config)

DEBUG:automata.agent.providers:Setting up agent with tools = [OpenAITool(function=<bound method ContextOracleToolkitBuilder._get_context of <automata.tools.builders.context_oracle.ContextOracleOpenAIToolkitBuilder object at 0x106d66f10>>, name='context-oracle', description="This tool utilizes the EmbeddingSimilarityCalculator and SymbolSearch to provide context for a given query by computing semantic similarity between the query and all available symbols' documentation and code. The symbol with the highest combined similarity score is identified, with its source code and documentation summary forming the primary context. Additionally, if enabled, the documentation summaries of related symbols (those next most similar to the query) are included.", coroutine=None, properties={'query': {'type': 'string', 'description': 'The query string to search for.'}, 'max_additional_related_symbols': {'type': 'integer', 'description': 'The maximum number of additional related symbols to return do

DEBUG:automata.agent.providers:
------------------------------------------------------------
Session ID: 8728958f-7b7b-4472-bfcd-83f795c7f58d
------------------------------------------------------------
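
The `context-oracle` description logged above amounts to ranking symbols by a combined code and documentation similarity, then returning the best match plus a few runners-up. Here is a minimal sketch of that idea, assuming plain cosine similarity and an unweighted sum of the two scores. The function names, the way the scores are combined, and the toy vectors are illustrative assumptions, not Automata's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_symbols(query_vec, code_vecs, doc_vecs):
    """Return symbol names sorted by summed code + doc similarity, best first."""
    scores = {
        name: cosine(query_vec, code_vecs[name]) + cosine(query_vec, doc_vecs[name])
        for name in code_vecs
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy two-dimensional "embeddings" (hypothetical values).
query = [1.0, 0.0]
code_vecs = {"SymbolRank": [0.9, 0.1], "SymbolSearch": [0.2, 0.8]}
doc_vecs = {"SymbolRank": [0.8, 0.2], "SymbolSearch": [0.1, 0.9]}

ranked = rank_symbols(query, code_vecs, doc_vecs)
primary, related = ranked[0], ranked[1:]  # primary context, then related symbols
```

In this sketch `primary` plays the role of the symbol whose source and documentation form the main context, and `related` would be trimmed to `max_additional_related_symbols` entries.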



In [12]:
# Run the agent
result = agent.run()

DEBUG:root:
------------------------------------------------------------------------------------------------------------------------
Latest Assistant Message -- 

DEBUG:automata.llm.providers.openai:Approximately 3129 tokens were consumed prior to completion generation.


In order to modify the `run-doc-embedding` script to accept an arbitrary list of new symbols and overwrite existing results, we first must understand how the documentation generation pipeline works in the Automata codebase.

Particularly, we are interested in the `EmbeddingHandler` and its role in the pipeline, as it appears to be responsible for creating and updating the embeddings used for code and documentation generatio

DEBUG:automata.llm.providers.openai:Approximately 3302 tokens were after adding the latest message.
DEBUG:root:
------------------------------------------------------------------------------------------------------------------------
DEBUG:automata.agent.providers:Latest User Message -- 
user:
content=Execution Result:



class SymbolDocEmbeddingBuilder(EmbeddingBuilder):
    'Builds `Symbol` documentation embeddings.'

    def __init__(self, embedding_provider: EmbeddingVectorProvider, completion_provider: LLMChatCompletionProvider, symbol_search: SymbolSearch, handler: PyContextHandler) -> None:
        super().__init__(embedding_provider)
        self.completion_provider = completion_provider
        self.symbol_search = symbol_search
        self.handler = handler

    def build(self, source_code: str, symbol: Symbol) -> SymbolDocEmbedding:
        prompt = self._build_prompt(symbol)
        document = self._build_class_document(prompt)
        summary = self.

DEBUG:automata.llm.providers.openai:Approximately 7457 tokens were after adding the latest message.
DEBUG:root:
------------------------------------------------------------------------------------------------------------------------
DEBUG:root:
------------------------------------------------------------------------------------------------------------------------
Latest Assistant Message -- 

DEBUG:automata.llm.providers.openai:Approximately 7457 tokens were consumed prior to completion generation.


Function Call:
call_termination

Arguments:
{"result": "To solve the issue, we need to refactor the `run_doc_embedding` script and the `EmbeddingHandler` class as follows:

1. **run_doc_embedding script:**
The script needs to accept an input list (array) of symbols that need to be regenerated. It also needs to process this list (in addition to whatever it already does) and only regenerate these specified symbols. An example of this could be

DEBUG:automata.llm.providers.openai:Approximately 7969 tokens were after adding the latest message.
DEBUG:root:
------------------------------------------------------------------------------------------------------------------------
DEBUG:automata.agent.providers:Latest User Message -- 
user:
content=Execution Result:

To solve the issue, we need to refactor the `run_doc_embedding` script and the `EmbeddingHandler` class as follows:

1. **run_doc_embedding script:**
The script needs to accept an input list (array) of symbols that need to be regenerated. It also needs to process this list (in addition to whatever it already does) and only regenerate these specified symbols. An example of this could be having the script accept a command line argument that points to a json or txt file containing a list of symbols, which it then parses.

2. **EmbeddingHandler class:**
Add a flag (boolean) property named something like 'overwrite_existing'. When this flag is set to tr
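
The agent's proposal above (a parsed symbol list plus an overwrite flag) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `parse_symbol_list`, `ToyEmbeddingHandler`, the accepted file formats, and the symbol names are stand-ins chosen for illustration and do not reflect the real `EmbeddingHandler` API.

```python
import json


def parse_symbol_list(text):
    """Parse symbol names from a JSON array or from one-name-per-line text.

    Blank lines and lines starting with '#' are ignored in the plain-text form.
    """
    text = text.strip()
    if text.startswith("["):
        return [str(item).strip() for item in json.loads(text)]
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


class ToyEmbeddingHandler:
    """Hypothetical in-memory stand-in for an embedding handler."""

    def __init__(self, overwrite_existing=False):
        self.overwrite_existing = overwrite_existing
        self.store = {}  # symbol name -> generated documentation

    def process(self, symbol, build_doc):
        """Store a new doc for `symbol`; return True if the store was written."""
        if symbol in self.store and not self.overwrite_existing:
            return False  # keep the existing entry untouched
        self.store[symbol] = build_doc(symbol)
        return True


# Usage: regenerate only the listed symbols, overwriting stale entries.
handler = ToyEmbeddingHandler(overwrite_existing=True)
handler.store["pkg.mod.ClassA"] = "old doc"
symbols = parse_symbol_list("pkg.mod.ClassA\n# a comment\npkg.mod.ClassB\n")
updated = [s for s in symbols if handler.process(s, lambda name: f"new doc for {name}")]
```

With `overwrite_existing=False`, the same call would leave `pkg.mod.ClassA` as it was and only add `pkg.mod.ClassB`.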

In [7]:
# Print the result
print(f"Result:\n{result}")

Result:
Execution Result:

{"SymbolRank": {"description": "SymbolRank class applies the PageRank algorithm on a graph to rank symbols such as methods and classes based on their semantic context and structural relationships within a software. Symbols are the classes, methods or other elements in a code corpus. A SymbolGraph is constructed where each symbol forms a node and dependencies between symbols form edges. This SymbolGraph maps structural information from the codebase and helps explore symbol dependencies, relationships and hierarchy. Finally, a prepared similarity dictionary between symbols is used in combination with the SymbolGraph to compute their SymbolRanks. This is performed using an iterative computation analogous to Google's PageRank algorithm, considering symbols' similarity scores and their connectivity within the graph.", "methods": {"__init__": "Initializes a SymbolRank instance with a given graph and a SymbolRankConfig. If config is not provided, a default SymbolRan
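
The result above describes SymbolRank as a PageRank-style iteration over a symbol graph, biased by similarity scores. As a rough illustration of that technique, here is a personalized PageRank sketch in pure Python. The function name, the directed edge convention (dependent symbol to dependency), the damping factor, and the iteration count are assumptions made for this example rather than details taken from Automata.

```python
def symbol_rank(edges, similarity=None, alpha=0.85, iterations=100):
    """Personalized PageRank over directed (source, target) edges.

    `similarity` maps node -> non-negative weight and biases the teleport
    step; when omitted, every node gets equal weight.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    weights = similarity or {node: 1.0 for node in nodes}
    total = sum(weights.get(node, 0.0) for node in nodes)
    if total <= 0:
        raise ValueError("similarity weights must sum to a positive value")
    teleport = {node: weights.get(node, 0.0) / total for node in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is redistributed via teleport.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - alpha) * teleport[node] + alpha * dangling * teleport[node]
            for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += alpha * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Toy graph: "a" and "b" both depend on "c", and "c" depends on "a".
edges = [("a", "c"), ("b", "c"), ("c", "a")]
uniform = symbol_rank(edges)
biased = symbol_rank(edges, similarity={"a": 0.1, "b": 5.0, "c": 0.1})
```

Passing a similarity dictionary shifts rank toward symbols that resemble the query, which is one way structure and semantic similarity can be combined in a single score.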

In [None]:
result