In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.text_splitter import Language
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser

In [27]:
repo_path = "C:\\Users\\esmba\\OneDrive\\Documents\\CodeRepos\\ContextReference\\codeinterpreter-api"
# repo_path = "C:\\Users\\esmba\\OneDrive\\Documents\\CodeRepos\\ContextReference\\langchain\\libs\\langchain\\langchain"

In [28]:
# Load
loader = GenericLoader.from_filesystem(
    repo_path,
    glob="**/*",
    suffixes=[".py"],
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500)
)
documents = loader.load()
len(documents)

34
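`GenericLoader.from_filesystem(..., glob="**/*", suffixes=[".py"])` walks the repository and keeps only files with a matching suffix. The stdlib sketch below approximates that filtering step so you can preview which files will be loaded before spending any embedding calls. It is not LangChain's implementation, and `find_source_files` is a hypothetical helper. As we understand LangChain's documentation, the file count can be lower than `len(documents)`, because `LanguageParser` splits any file with at least `parser_threshold` lines into one document per top-level function or class, plus a document for the remaining code.

```python
from pathlib import Path


def find_source_files(repo_path: str, glob: str = "**/*", suffixes=(".py",)) -> list[Path]:
    """Hypothetical helper: list files under repo_path matching glob whose suffix is in suffixes."""
    root = Path(repo_path)
    # "**/*" matches files at every depth, including the repository root itself
    return sorted(p for p in root.glob(glob) if p.is_file() and p.suffix in suffixes)
```

For example, `len(find_source_files(repo_path))` gives the number of `.py` files the loader will read.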

In [29]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
python_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, 
                                                               chunk_size=2000, 
                                                               chunk_overlap=200)
texts = python_splitter.split_documents(documents)
len(texts)

59
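`chunk_size=2000` and `chunk_overlap=200` are measured by the splitter's length function, which defaults to `len`, so the units are characters. `RecursiveCharacterTextSplitter.from_language` first tries to break at Python-aware separators (class and `def` boundaries, then blank lines, newlines and spaces). The sketch below deliberately ignores separators and slides a fixed window, so it is a simplification, not LangChain's algorithm. It does show what the overlap buys: the last 200 characters of each chunk reappear at the start of the next, so context near a boundary appears in both neighbouring chunks.

```python
def sliding_window_chunks(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

A 4,500-character string yields three chunks (starting at 0, 1800 and 3600), and adjacent chunks share 200 characters.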

In [20]:
OpenAI_API_key = os.getenv("OPENAI_API_KEY")
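The key is read into `OpenAI_API_key` but never checked, so a missing `.env` entry only surfaces later as an authentication error from the embedding call. `ChatOpenAI` and `OpenAIEmbeddings` read `OPENAI_API_KEY` from the environment on their own, so the variable does not need to be passed in. A fail-fast check at the top of the notebook makes the problem obvious; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return an environment variable, failing fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Usage: `OpenAI_API_key = require_env("OPENAI_API_KEY")`.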

In [30]:
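from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

# Embed every chunk and store the vectors in an in-memory Chroma collection
db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=()))
retriever = db.as_retriever(
    search_type="mmr",  # maximal marginal relevance: balance relevance against redundancy
    search_kwargs={"k": 8},  # return 8 chunks per query
)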
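`search_type="mmr"` selects maximal marginal relevance. Instead of returning the 8 chunks most similar to the question, it repeatedly picks the candidate that best trades off similarity to the query against similarity to the chunks already picked, which helps avoid eight near-duplicate snippets from one file. The plain-Python sketch below illustrates that greedy selection with cosine similarity. As far as we know, when `search_kwargs` omits them LangChain's MMR search draws `fetch_k=20` candidates and uses `lambda_mult=0.5`; here the candidate pool is simply whatever vectors you pass in.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=8, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into candidates in selection order."""
    relevance = [cosine(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            # Penalise a candidate by its similarity to the closest already-selected one
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two identical vectors and one slightly different one, pure relevance (`lambda_mult=1.0`) returns the duplicates, while `lambda_mult=0.5` swaps the second duplicate for the more distinct vector.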

In [23]:
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationalRetrievalChain
llm = ChatOpenAI(model_name="gpt-4")
memory = ConversationSummaryMemory(llm=llm, memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)

In [10]:
question = "Give me an overview of the python module codeinterpreter-api"
result = qa(question)

In [11]:
print(result['answer'])

The "codeinterpreter-api" appears to be a Python module designed to aid with various data science, data analysis, and file manipulation tasks. 

Key features of this module include the following:

1. **Code Interpreter Session**: The module provides a `CodeInterpreterSession` class, which allows for the creation of a session for interpreting and executing Python code.

2. **File Handling**: It has a `File` class that helps in file manipulations, including file conversion and direct file manipulation.

3. **User Requests**: The `UserRequest` class is used to encapsulate user requests, which include a content attribute and a list of files.

4. **Code Interpreter Response**: This module provides a `CodeInterpreterResponse` class to encapsulate the response from the code interpreter agent. This response may contain a content attribute, a list of files, and a code log.

5. **Agents and Language Models**: The module uses various agents and language models to process and handle user requests.

In [12]:
question = "Suggest a series of steps for how I can use codeinterpreter-api for the following task. I want to have it reference a codebase, then produce code that will work with that code base."
result = qa(question)
print(result['answer'])

Here are the general steps on how to use the codeinterpreter-api module to reference a codebase and subsequently produce code that will work with that code base:

1. Import the necessary classes and functions from the codeinterpreter-api module:

```python
from codeinterpreterapi import CodeInterpreterSession, File
```

2. Create an instance of the CodeInterpreterSession. This can be done either synchronously or asynchronously:

```python
session = CodeInterpreterSession()
```

or

```python
async with CodeInterpreterSession() as session:
```

3. Define your user request. This is a string that describes what you want the AI to do with the codebase:

```python
user_request = "Analyze this dataset and plot something interesting about it."
```

4. If you have any files related to your request (such as datasets), add them to a list of files:

```python
files = [
    File.from_path("examples/assets/iris.csv"),
]
```

5. Generate the code response. This can be done either synchronously or as

In [13]:
question = "Can you verify that the above code will run in a Jupyter Notebook?"
result = qa(question)
print(result['answer'])

The assistant can execute code within a sandboxed Jupyter kernel environment. However, it's important to note that while the assistant can execute code, it cannot guarantee that the code will execute correctly every time, as this depends on the specifics of the code itself. If the code is written correctly and does not contain any errors, it should execute correctly in the assistant's environment. If errors are encountered, the assistant will output that there was an error with the prompt after two unsuccessful attempts.


In [22]:
question = "What LangChain tools does codeinterpreter-api use in this case? Could it look things up on the internet in addition to looking at the files? Does it use the provided files by implementing Retrieval Augmented Generation?"
result = qa(question)
print(result['answer'])

The `codeinterpreter-api` Python module utilizes several tools from LangChain. In this case, the `BaseTool` class is used as a base class for the `ExampleKnowledgeBaseTool` class, which is a tool for getting salary data of company employees. 

The `CodeInterpreterSession` class from `codeinterpreterapi` utilizes the `additional_tools` parameter to include additional tools, such as `ExampleKnowledgeBaseTool`. The `CodeInterpreterSession` also uses `BaseLanguageModel` and `ChatAnthropic` from LangChain for language model operations.

The `codeinterpreter-api` module is capable of searching the internet. As mentioned in the context, "the code interpreter has internet access so it can download the bitcoin chart from yahoo finance and plot it for you".

However, there's no explicit indication in the provided context that the `codeinterpreter-api` module uses Retrieval Augmented Generation (RAG) for processing the provided files. RAG generally involves querying a database of documents to enh

In [23]:
question = "Can you explain how it would process code files, if I provided them using File.from_path(path_to_codebase)"
result = qa(question)
print(result['answer'])

The `codeinterpreter-api` uses the `File.from_path(path_to_codebase)` method to load a file from a specified path. This function creates an instance of the `File` class where the file's name and content are defined based on the file located at the given path. 

After the file is loaded, it's passed on to the `CodeInterpreterSession` where it can be used in the code interpretation process. The `CodeInterpreterSession` class is responsible for managing a session of code interpretation. In a session, the user provides a code request (or prompt) and possibly some files that are required for the code execution.

The `generate_response(user_request, files)` method is then used to generate a response based on the provided user request and files. This method processes the user's request, executes the required code, and generates an appropriate response.

In the provided context, the file `iris.csv` is loaded using `File.from_path("examples/assets/iris.csv")` and passed to `session.generate_res

In [19]:
question = "Write a function using codeinterpreter-api such that I can call asyncio.run(main()) in a Jupyter Notebook cell and see the output. The function should take in a directory containing code files, and output new code that will work with the code files based on a user prompt."
result = qa(question)
print(result['answer'])

You can use the codeinterpreter-api to create a function that you can call in a Jupyter Notebook cell. Here's a basic example of how you can do this:

```python
import os
import asyncio
from codeinterpreterapi import CodeInterpreterSession, File

async def analyze_code_files(directory, user_prompt):
    # Initialize a list to hold File objects
    code_files = []

    # Walk through directory to find all code files
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.py'):  # or other code file extensions
                file_path = os.path.join(root, file)
                with open(file_path, 'rb') as f:
                    content = f.read()
                code_file = File(name=file, content=content)
                code_files.append(code_file)

    # Create a new CodeInterpreterSession
    async with CodeInterpreterSession() as session:
        # Generate a response based on the user prompt and code files
        response = awai
```

In [25]:
question = "How might I add Retrieval Augmented Generation to codeinterpreter-api"
result = qa(question)
print(result['answer'])

The provided context does not contain specific information on how to incorporate Retrieval Augmented Generation (RAG) into the `codeinterpreter-api` module.
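Retrieval Augmented Generation can be sketched independently of `codeinterpreter-api`: retrieve the chunks most relevant to the request and prepend them to the prompt the model sees. This notebook already does exactly that with Chroma and `ConversationalRetrievalChain`. The toy below replaces embeddings with bag-of-words overlap so it runs without an API key; `build_rag_prompt` and the prompt wording are hypothetical. One plausible way to combine it with `codeinterpreter-api` is to pass such an augmented prompt as the user request, but that is our assumption, not a documented feature of the library.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-cased identifier-like word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def overlap_score(query: Counter, doc: Counter) -> int:
    """Number of token occurrences shared by the query and the document."""
    return sum((query & doc).values())


def build_rag_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Hypothetical helper: prepend the k chunks sharing the most tokens with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: overlap_score(q, tokenize(c)), reverse=True)
    context = "\n\n".join(ranked[:k])
    return f"Use the following code context to answer.\n\n{context}\n\nQuestion: {question}"
```

Swapping `tokenize`/`overlap_score` for an embedding model and a vector store turns this into the retrieval step the notebook performs with `retriever`.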


In [24]:
question = "Give me an overview of the python module LangChain"
result = qa(question)
print(result['answer'])

LangChain is a Python module that is primarily used for managing the LangChain API. This module is intended for LangChain developers and not for general users. It includes various chains, utilities, helper functions, and tracers.

Chains in LangChain include LLMRequestsChain, LLMSummarizationCheckerChain, MapReduceChain, OpenAIModerationChain, NatBotChain, QAWithSourcesChain, RetrievalQAWithSourcesChain, VectorDBQAWithSourcesChain, QAGenerationChain, and many others. These chains are for different functions in LangChain.

LangChain also provides various utility interfaces to interact with third-party systems and packages. Some of these include AlphaVantageAPIWrapper, ApifyWrapper, BibtexparserWrapper, BingSearchAPIWrapper, GooglePlacesAPIWrapper, GraphQLAPIWrapper, MetaphorSearchAPIWrapper, PythonREPL, SparkSQL, and others.


The module also includes utility functions that are not dependent on any other LangChain module. Some of the utility functions include StrictFormatter, check_pack

In [25]:
question = "I want you to imagine yourself as a python developer with extensive knowledge of applications built on large language models. I want you to suggest a series of steps to use Langchain to complete the following task: 'Produce a streamlit app where a user can upload a directory containing python code. After uploading the code, the app will process it, turning it into embeddings and placing it in a vector store for Retrieval Augmented Generation. The app should have a chat window where the user can type messages. The messages should be used as a prompt to a LLM on the back end. LangChain will query the LLM in addition to using any appropriate agents or tools in the LangChain library as well as the previously uploaded code directory, which has been made available for Retrieval Augmented Generation. The user will then see the response to the query in the messaging window."
result = qa(question)
print(result['answer'])

Here's a series of steps on how you might use LangChain to produce a Streamlit application for the task described:

1. **Setup LangChain**: First, you need to import and setup the necessary components from the LangChain library. This includes importing the LLMChain, language models, and any other necessary modules. You will also need to set up the prompt templates and LLM chains.

2. **Setup Streamlit Interface**: Use the Streamlit library to create the user interface for your application. This should include a file uploader for the user to upload their Python code directory, a text input field for the user to enter messages, and a text output field to display the response from the LLM.

3. **File Upload and Processing**: Use the Streamlit file uploader to allow the user to upload a directory of Python code. You will need to write a function to process these files into embeddings. This could involve using a language model to generate the embeddings, which are then stored in a vector st

In [26]:
question = "Can LangChain handle Jupyter notebook .ipynb files?"
result = qa(question)
print(result['answer'])

The provided context does not include information about LangChain's ability to handle Jupyter notebook .ipynb files.
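The retriever only saw `.py` files, which is why it has no answer here. A `.ipynb` file is plain JSON (nbformat 4), so its code cells can be extracted with the standard library and fed through the same Python splitter. `notebook_code` is a hypothetical helper; LangChain also ships a `NotebookLoader` document loader, whose options are worth checking in the documentation for your installed version.

```python
import json


def notebook_code(path: str) -> str:
    """Hypothetical helper: concatenate the source of every code cell in an nbformat 4 notebook."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of lines
        cells.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(cells)
```

The result can then be chunked like the other files, e.g. `python_splitter.create_documents([notebook_code(path)], metadatas=[{"source": path}])`.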


In [31]:
question = "What can you tell me about the different options for AgentType when initializing an agent?"
result = qa(question)
print(result['answer'])

The `AgentType` is an enumerator which defines the type of agent. Here are the different options available:

1. `ZERO_SHOT_REACT_DESCRIPTION`: This represents a Zero-Shot React Description agent.

2. `REACT_DOCSTORE`: This represents a React Document Store agent.

3. `SELF_ASK_WITH_SEARCH`: This represents a Self-Ask-With-Search agent.

4. `CONVERSATIONAL_REACT_DESCRIPTION`: This represents a Conversational React Description agent.

5. `CHAT_ZERO_SHOT_REACT_DESCRIPTION`: This represents a Chat Zero-Shot React Description agent.

6. `CHAT_CONVERSATIONAL_REACT_DESCRIPTION`: This represents a Chat Conversational React Description agent.

7. `STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION`: This represents a Structured Chat Zero-Shot React Description agent.

8. `OPENAI_FUNCTIONS`: This represents an OpenAI Functions agent.

9. `OPENAI_MULTI_FUNCTIONS`: This represents an OpenAI Multi-Functions agent.

Each of these types is associated with a specific type of agent, providing specific functio

In [32]:
question = "Can you further explain the 'SELF_ASK_WITH_SEARCH' type?"
result = qa(question)
print(result['answer'])

The 'SELF_ASK_WITH_SEARCH' type refers to a class in the provided code that implements a specific type of agent for a question-answering system. This agent, as the name suggests, uses a "self-ask-with-search" strategy, which is a method where the agent asks itself questions and uses those answers to gather information for the final response.

The agent is defined as a class 'SelfAskWithSearchAgent' that inherits from a base 'Agent' class. It has several properties and methods including:

1. output_parser: This is an instance of AgentOutputParser class that is used to parse the agent's output.
   
2. _get_default_output_parser: This is a class method that returns an instance of SelfAskOutputParser class.
   
3. _agent_type: This property returns the identifier of an agent type. In this case, it returns 'SELF_ASK_WITH_SEARCH'.
   
4. create_prompt: This is a class method that returns a prompt for the agent, which does not depend on any tools.
   
5. _validate_tools: This method validates
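The self-ask pattern depends on the model writing its reasoning in a fixed format: a line beginning `Follow up:` carries a sub-question for the search tool, and `So the final answer is:` ends the loop. The parser below is a simplified stand-in for that idea, not LangChain's `SelfAskOutputParser`; the marker strings are our assumption, based on the published self-ask prompt format.

```python
FOLLOW_UP = "Follow up:"
FINAL_ANSWER = "So the final answer is:"


def parse_self_ask(text: str) -> tuple[str, str]:
    """Return ("search", query) for a follow-up question or ("finish", answer) for a final answer."""
    last_line = text.strip().split("\n")[-1].strip()
    if last_line.startswith(FOLLOW_UP):
        # The sub-question is sent to the search tool; its result comes back as an intermediate answer
        return "search", last_line[len(FOLLOW_UP):].strip()
    if FINAL_ANSWER in text:
        return "finish", text.split(FINAL_ANSWER, 1)[1].strip()
    raise ValueError(f"Could not parse self-ask output: {text!r}")
```

In an agent loop, a `"search"` result triggers a tool call whose output is appended as `Intermediate answer: ...` before the model is prompted again; `"finish"` stops the loop.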