# Getting hands-on experience with LLMs

It seems plausible that it will be valuable to be able to run LLMs locally on my laptop, or to hook into them for parts of tasks.

Here's a rough idea of what I want to learn in this project:

- Learn how to download, install and interact with an LLM (Llama3) hosted locally on my computer, sending it text directly and asking it simple questions
- Learn how to interact with ChatGPT via an API, so I can do more automation and use it in coding projects
- Understand environments better (through the course of debugging all this stuff) -- added post hoc lol - I screwed up my base conda installation when I was trying to follow one of the videos at the start of this process
- Understand embeddings better, and build skills in visualisation to illustrate the distance between different words
- Make a basic RAG that can read a larger document and answer basic questions from the text (from a file in .md or .txt format) - following [this video](https://www.youtube.com/watch?v=tcqEUSNCn8I). 
    - Tried out both openai embeddings and ollama embeddings
        - <span style="color:red">Got negative similarity scores when doing a retrieval search from the database with ollama embeddings, which shouldn't happen (they should be between 0 and 1)! I found this [git issue](https://github.com/langchain-ai/langchain/issues/10864) logged on the langchain GitHub, with people reporting the problem for a bunch of models (including one or two who hit it with local llama in the last week). It does not yet appear to be resolved. Going to stick with OpenAI embeddings for now, and maybe check back in a while to see if anyone has a solution.</span>
        - [This video](https://www.youtube.com/watch?v=2TJxpyO3ei4) might also help for the ollama version once the negative distance issue is resolved
- Extend to be able to read PDFs or arbitrary filetypes, using [this video](https://www.youtube.com/watch?v=2TJxpyO3ei4) then maybe [this video](https://www.youtube.com/watch?v=svzd5d1LXGk) -- or maybe another one entirely. [This documentation](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/) might also be helpful

[This link](https://github.com/langchain-ai/langchain/issues/14872) might help if I get Chroma read-only issues again.

## Interacting with ChatGPT via API

This feels like it will be useful in a bunch of different projects. I've done some exploration of this (chat completion, embedding, image generation and text-to-speech) in `openai-test.ipynb`
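
For reference, a minimal sketch of what a chat completion call looks like. It assumes the v1-style `openai` Python client and an `OPENAI_API_KEY` environment variable; the model name `gpt-4o-mini` and the helper names `build_messages` and `ask_chatgpt` are illustrative assumptions, not taken from `openai-test.ipynb`.

```python
def build_messages(question, system_prompt="You are a helpful assistant."):
    """Build the list of chat messages the chat completions endpoint expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask_chatgpt(question, model="gpt-4o-mini"):
    """Send one question to the chat completions endpoint and return the reply text."""
    # Imported lazily so build_messages can be used without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,  # assumed model name; swap in whichever model you have access to
        messages=build_messages(question),
    )
    return response.choices[0].message.content


# Example usage (needs a valid API key and network access):
# print(ask_chatgpt("Tell me about a cool species of frog"))
```

Keeping the message-building step separate from the network call makes it easy to inspect exactly what gets sent before spending any API credit.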

## Interacting with Llama3 (no embedding)
This next part is just me trying to interact with the model directly and feeding it a text file (no embedding etc), to see whether it'll provide sensible responses. 
I have already downloaded llama3 (the 4GB version - the 40GB version is way way too slow, basically doesn't run). Now I want to see if I can interact with it using the `ollama` package.



In [3]:
import ollama
# note -- extremely bizarre that "import ollama" failed
# after a successful-looking "conda install ollama" and required 
# me to "pip install ollama" in order to work??
import os # will need this later
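
When an import fails after a successful-looking install, a quick check of which interpreter the notebook is using and whether it can see the module can help narrow things down. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name. One possible explanation for the behaviour above (a guess, not verified here) is that the conda package called `ollama` provides the Ollama server rather than the Python client, or that conda installed into a different environment from the one the kernel runs in.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable from the current interpreter, and from where."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: NOT importable from {sys.executable}"
    return f"{name}: found at {spec.origin}"


# The interpreter path shows which environment the notebook kernel is actually using.
print("Interpreter:", sys.executable)
for module_name in ["ollama", "textwrap"]:
    print(describe_module(module_name))
```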

In [4]:
# I noticed that llama tends to print really long lines so I need to scroll sideways. I'm not enjoying that, so I'm making
# a wrapped print function to fix it

import textwrap

def wprint(text, width = 120):
  wrapped_text = textwrap.fill(text, width=width)
  print(wrapped_text)
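
One thing to be aware of: `textwrap.fill` treats its whole input as a single paragraph, so any line breaks in the model's reply (for example between numbered list items) get turned into spaces. A possible variant that wraps each line separately is sketched below; `wprint_keep_breaks` is a hypothetical name, and it returns the wrapped string as well as printing it so it can be checked.

```python
import textwrap


def wprint_keep_breaks(text, width=120):
    """Wrap each line separately so the model's own line breaks survive."""
    wrapped_lines = [
        # Blank lines are kept as blank lines; non-blank lines are wrapped on their own.
        textwrap.fill(line, width=width) if line.strip() else ""
        for line in text.splitlines()
    ]
    wrapped = "\n".join(wrapped_lines)
    print(wrapped)
    return wrapped
```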
  

I've downloaded a transcript of a YouTube video essay about how sound design is used in the recent Batman movie (see `data/personal/batman_sound_video_essay.txt`), and am prompting the model to answer basic questions about it

#### Defining function to take a question input and answer it with information from a specified file

In [5]:
def llama_read_and_respond(input_file, question, print_prompt_with_data = False):
    with open(input_file,'r') as file:
        data = file.read()


    # Debugging statement: confirm the file actually contained some text
    if data:
        print("File loaded successfully")
    else:
        print("File is empty - check the input path")


    prompt_01 = f"{data} #### From this text, {question}"

    if print_prompt_with_data:
        wprint("Prompt: "+prompt_01)

    print("Generating a response: ")



    response = ollama.chat(model = 'llama3',
                            messages = [{
                            "role":"user",
                            #    "content":"tell me about a cool species of frog"
                            "content": prompt_01
                        }])

    wprint(response["message"]["content"])



#### Questions from Batman video essay

In [6]:
llama_read_and_respond(input_file='data/personal/batman_sound_video_essay.txt', 
                       question = 'tell me about how the sound of rain is used in the movie')


File loaded successfully
Generating a response: 
According to the text, the sound of rain is used in the movie in a way that creates ambiguity and tension. The author
mentions that at first, we can clearly hear the rain, but then when gunfire breaks out, the rain suddenly falls away
into silence. The author provides an example of this by creating a modified version of the scene where the sound of the
rain remains audible even after the gunshots start, making it seem like the gunshots are quieter than they actually are.
This manipulation of the sound of rain is used to create a sense of realism and to immerse the audience in the story.
The author suggests that this technique is not about creating an accurate representation of reality but rather about
creating an impressionistic and expressionist sound that feels right for the scene. By using this technique, the
filmmakers are able to manipulate our perception of the sounds in the scene, adding to the tension and ambiguity of the
moment.

In [7]:
llama_read_and_respond(input_file='data/personal/batman_sound_video_essay.txt', 
                       question = 'who sponsored the video?')

File loaded successfully
Generating a response: 


According to the text, the sponsor of the video is Nebula, a streaming platform. The video also mentions a "Curiosity
Stream" bundle offer, which seems to be related to Nebula as well.


Ok, this seems to be working. Now I'd like it to try reading something from my CV, because it seemed to be struggling with that when I was running it from the terminal. I've just changed the extension from .tex to .txt, and I want to see if it can answer basic questions (e.g. about dates of employment). This might be harder for it to do because the file still has all of the LaTeX formatting commands in it.

#### Questions from CV

In [8]:
llama_read_and_respond(input_file='data/personal/Nik_Mitchell_CV_2024_07_21.txt', 
                       question = 'what is the most recent job on that list, and what did I do in that job?')

File loaded successfully
Generating a response: 
According to the text, the most recent job listed is:  **NZ Royal Commission Inquiry - COVID-19 Lessons Learned**
**Principal Data Analyst (May 2024 -- July 2024)**  In this role, you created high-quality visualizations to support the
Inquiry, including:  1. Visualizations that contextualized pandemic trends (COVID-19 cases, hospitalizations, deaths,
and vaccinations) in New Zealand against policy decisions (e.g., lockdowns, border closures) and pandemic trends in
other countries. 2. Analyses and visualizations highlighting the disparate impact of COVID-19 on Māori and Pacific
ethnic groups and people living in areas of higher socioeconomic deprivation. 3. Worked closely with the Chair of the
Commission to discuss how to tell the story of the COVID pandemic through these visualizations, drawing out lessons for
future pandemics.


This is the correct answer but it seems to really directly copy-paste exactly what I wrote in my bullet points here. Next, asking it to be more concise & summarise a bit.

In [9]:
llama_read_and_respond(input_file='data/personal/Nik_Mitchell_CV_2024_07_21.txt', 
                       question = 'what is the most recent job on that list, and what did I do in that job? Please be concise and summarise the responsibilities rather than copying the whole description')

File loaded successfully
Generating a response: 
The most recent job listed is "Principal Data Analyst" at the NZ Royal Commission Inquiry - COVID-19 Lessons Learned,
which took place from May 2024 to July 2024.  In this role, I was responsible for:  * Creating high-quality
visualizations to support the inquiry * Conducting analyses and creating visualizations to highlight pandemic trends and
disparities in Māori and Pacific ethnic groups and people living in areas of higher socioeconomic deprivation * Working
closely with the Chair of the Commission to discuss how to tell the story of the COVID-19 pandemic through
visualizations  Please note that this summary is based on the provided LaTeX code, which may not accurately reflect my
actual responsibilities or experiences.


This has shaved off a few words without changing the meaning.

# RAG (Retrieval-Augmented Generation)

Why would we want to create a RAG? The above seemed to work just fine.

I have a suspicion that the issue here is to do with context windows. When making a RAG, we first create a database by chunking all the inputs into manageable-sized pieces (with overlap between chunks), then use an embedding model to encode the meaning of each chunk as a vector. Once we have that, we can embed the input question with the same model, retrieve the few chunks whose vectors are closest to it (e.g. smallest Euclidean distance), and construct the answer from just that subset of the data.
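
To make the chunk-embed-retrieve idea concrete, here is a toy sketch in plain Python. It is not how LangChain or Chroma do it: `chunk_text` is a simple sliding character window (the real `RecursiveCharacterTextSplitter` tries to split on natural boundaries), and `toy_embed` is a stand-in word-count "embedding" over a hand-picked vocabulary, purely an assumption for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=300, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def toy_embed(text, vocabulary):
    """Stand-in for a real embedding model: count each vocabulary word in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]


def retrieve(question, chunks, vocabulary, k=2):
    """Return the k chunks whose vectors are closest (Euclidean) to the question's."""
    question_vector = toy_embed(question, vocabulary)
    ranked = sorted(
        chunks,
        key=lambda chunk: math.dist(question_vector, toy_embed(chunk, vocabulary)),
    )
    return ranked[:k]


vocabulary = ["dorothy", "shoes", "scarecrow", "brain", "tin", "heart"]
chunks = [
    "Dorothy clicked her silver shoes together.",
    "The Scarecrow wanted a brain.",
    "The Tin Woodman wanted a heart.",
]
print(retrieve("What did the Tin Woodman want?", chunks, vocabulary, k=1))
```

Only the retrieved chunks would then be pasted into the prompt, rather than the whole document.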

So I suspect the reason for creating a RAG is the context window limitation: a large enough document may not fit in a single prompt. The LLM also needs to know which information to focus on, so having a method for retrieving the most relevant data allows it to work much more efficiently with a large amount of data.
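
A rough way to test this suspicion is to estimate how many tokens a file would take up and compare that with the model's context window. The sketch below uses the common rule of thumb of about 4 characters per token for English text, which is only an approximation. The 8192-token window in the usage comment is an assumed example value, and the runtime may be configured with a smaller window than the model's maximum.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text, context_window_tokens, reserved_for_answer=512):
    """Check whether the text plus some room for the reply fits in the window."""
    return estimate_tokens(text) + reserved_for_answer <= context_window_tokens


# Example usage with an assumed 8192-token window:
# with open("data/personal/batman_sound_video_essay.txt") as file:
#     print(fits_in_context(file.read(), context_window_tokens=8192))
```

If a file comfortably fits, stuffing it into the prompt (as above) is fine; if not, retrieval becomes necessary rather than optional.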

Now working through [this video](https://www.youtube.com/watch?v=tcqEUSNCn8I) (about how to make a RAG) - will use OpenAI embeddings here rather than Llama.

Has an associated [git repo](https://github.com/pixegami/langchain-rag-tutorial) - might clone this.

I've grabbed a version of the Wizard of Oz from the Project Gutenberg website ([link](https://www.gutenberg.org/ebooks/55)).



## Planning investigation

I'm kinda curious to try to build something a bit more flexible here, and use that to investigate a few questions
- Does it make a difference if you use OpenAIEmbeddings() or OllamaEmbeddings()?
- Can I build several different chromadbs with different embeddings for different datasets?
    - Wizard of Oz
    - Alice in Wonderland
    - My personal files (CV, batman video essay)
        - does it matter if I mash these together into a single database, even though they're about totally different things?
- do you get better performance with bigger chunks?
- can I extend this to read PDF files?
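
On the last question, one possible approach is to pick a loader based on the file extension. This is a sketch, assuming `TextLoader` and `PyPDFLoader` from `langchain_community.document_loaders` (the PDF one also needs the `pypdf` package installed); `LOADER_NAMES`, `pick_loader_name` and `load_any` are hypothetical names of my own.

```python
from pathlib import Path

# Assumed mapping from file extension to a langchain_community loader class name.
LOADER_NAMES = {
    ".md": "TextLoader",
    ".txt": "TextLoader",
    ".pdf": "PyPDFLoader",
}


def pick_loader_name(file_path):
    """Return the loader class name to use for this file, based on its extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in LOADER_NAMES:
        raise ValueError(f"No loader configured for '{suffix}' files")
    return LOADER_NAMES[suffix]


def load_any(file_path):
    """Load one file into a list of LangChain Document objects."""
    # Imported lazily so the extension lookup works without langchain installed.
    from langchain_community import document_loaders

    loader_cls = getattr(document_loaders, pick_loader_name(file_path))
    return loader_cls(str(file_path)).load()


# Example usage (hypothetical file path):
# documents = load_any("data/personal/some_report.pdf")
```

The resulting documents could then go through the same split-and-embed steps as the markdown files.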

I'm a bit worried about doing this if it's not on the mainline to being able to do AI safety work, but I also think that just being curious and following my nose and making functions to output different things and label files and folders appropriately in python etc is going to be valuable.



#### Getting packages

In [1]:
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
# from langchain.embeddings import OpenAIEmbeddings
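from langchain_openai import OpenAIEmbeddings
# Needed by get_embedding_function() below when "ollama_embeddings" is requested
from langchain_community.embeddings.ollama import OllamaEmbeddings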
from langchain_community.vectorstores import Chroma
import openai 
# from dotenv import load_dotenv
import os
import shutil




In [2]:
def get_chroma_path(data_description, embeddings_description):
    CHROMA_PATH=os.path.join("chroma",data_description, embeddings_description)
    return CHROMA_PATH

def get_data_path(data_description):
    DATA_PATH =os.path.join("data",data_description)
    return DATA_PATH

def get_embedding_function(embeddings_description):
    if embeddings_description == "openai_embeddings":
        embedding_function = OpenAIEmbeddings()
    elif embeddings_description == "ollama_embeddings":
        embedding_function = OllamaEmbeddings(model="nomic-embed-text")
    else:
        # Fail loudly rather than trying to return an undefined variable
        raise ValueError("please specify either 'openai_embeddings' or 'ollama_embeddings'")
    return embedding_function



def generate_data_store(data_description, embeddings_description):

    CHROMA_PATH= get_chroma_path(data_description, embeddings_description)
    DATA_PATH =  get_data_path(data_description)

    print(f"Data source: {data_description}, Embeddings: {embeddings_description}")

    # print(f"CHROMA_PATH is {CHROMA_PATH}")
    # print(f"DATA_PATH is {DATA_PATH}")
    
    


    documents = load_documents(data_path=DATA_PATH)
    chunks = split_text(documents)
    save_to_chroma(chunks, get_embedding_function(embeddings_description), chroma_path= CHROMA_PATH)


def load_documents(data_path):
    loader = DirectoryLoader(data_path, glob="*.md")
    documents = loader.load()
    return documents


def split_text(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=300,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks.")
    ## print test example
    # document = chunks[10]
    # print(document.page_content)
    # print(document.metadata)

    return chunks


def save_to_chroma(chunks: list[Document], embedding_function, chroma_path):
    # Clear out the database first.
    if os.path.exists(chroma_path):
        shutil.rmtree(chroma_path)

    # Create a new DB from the documents.
    db = Chroma.from_documents(
        chunks, embedding_function, persist_directory=chroma_path
    )
    db.persist()
    print(f"Saved {len(chunks)} chunks to {chroma_path}.")


# generate_data_store(data_description       = data_descriptions[0],
#                     embeddings_description = embeddings_descriptions[1])

In [3]:
import itertools
# data_descriptions = ["wizard_of_oz","alice_in_wonderland","personal"] ## Commenting out personal for now because it doesn't use markdown files
data_descriptions = ["wizard_of_oz","alice_in_wonderland"]
# embeddings_descriptions = ["openai_embeddings","ollama_embeddings"]
embeddings_descriptions = ["openai_embeddings"] # Commenting out ollama for now because it's not working

for data_description, embeddings_description in itertools.product(data_descriptions, embeddings_descriptions):
    generate_data_store(data_description, embeddings_description)

Data source: wizard_of_oz, Embeddings: openai_embeddings
Split 1 documents into 1127 chunks.


  warn_deprecated(


Saved 1127 chunks to chroma/wizard_of_oz/openai_embeddings.
Data source: alice_in_wonderland, Embeddings: openai_embeddings
Split 1 documents into 801 chunks.
Saved 801 chunks to chroma/alice_in_wonderland/openai_embeddings.



Yay, that works. This is exciting. I should get the question-asking part up and running soon too.


## Embedding investigation

I'm also curious now about the embeddings, and how they work for ollama versus openai. So I looked into it in `embeddings_investigation.ipynb`

## Answering questions

Initially we just have the code to do the OpenAI embeddings and chat to OpenAI. Is it possible to use the OpenAI embeddings and generate the response with Llama3? My guess is yes, but also that the quality of the answers will depend primarily on the quality of the embeddings, since the AI model can't answer correctly if the correct information isn't retrieved.
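
Since retrieval and generation are separate steps, mixing them should just be a matter of passing the retrieved chunks to a different model. A minimal sketch, assuming the chunks have already been retrieved as plain strings and that a local `llama3` model is available through `ollama` (the same `ollama.chat` call used earlier); `build_rag_prompt` and `answer_with_llama` are hypothetical helper names.

```python
def build_rag_prompt(context_chunks, question):
    """Join the retrieved chunks and wrap them in a context-only question prompt."""
    context_text = "\n\n---\n\n".join(context_chunks)
    return (
        "Answer the question based only on the following context:\n\n"
        f"{context_text}\n\n---\n\n"
        f"Answer the question based on the above context: {question}"
    )


def answer_with_llama(context_chunks, question, model="llama3"):
    """Generate an answer with a local Ollama model from already-retrieved chunks."""
    # Imported lazily so the prompt builder can be used without ollama installed.
    import ollama

    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": build_rag_prompt(context_chunks, question)}],
    )
    return response["message"]["content"]


# Example usage (needs a running Ollama server with llama3 pulled):
# print(answer_with_llama(["Dorothy clicked her heels three times."], "How does Dorothy get home?"))
```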

### Function to ask several basic questions to pull info from the database

In [4]:
import os
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate


PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""



# Define a dictionary to hold lists of questions for each book
question_dict = {
    "wizard_of_oz": [
        "How does Dorothy get back home?",
        "What is the role of the Tin Man?",
        "What obstacles do Dorothy and her friends face on their journey?"
    ],
    "alice_in_wonderland": [
        "How does Alice end up in Wonderland?",
        "What characters does Alice meet along her journey?",
        "What are the key events in Alice's adventure?"
    ]
}

def get_query_list(data_description):
    """
    Return a list of questions for the specified book description.
    """
    return question_dict.get(data_description, [])

def answer_query_from_database(data_description, embeddings_description, show_source_passages=False):
    """
    Answer queries from the database based on the book description and embeddings description.
    Optionally show source passages if show_source_passages is True.
    """
    chroma_path = get_chroma_path(data_description, embeddings_description)
    embedding_function = get_embedding_function(embeddings_description)

    db = Chroma(
        persist_directory=chroma_path,
        embedding_function=embedding_function
    )
    
    print(f"Loading the Chroma database from {chroma_path}, using the {embeddings_description} embedding function.")

    # Get the list of questions for the specified book
    query_list = get_query_list(data_description)
    
    # Initialize an empty string to store responses
    all_responses = ""

    for query_text in query_list:
        # Search the DB
        results = db.similarity_search_with_relevance_scores(query_text, k=8)
        
        if len(results) == 0:  # Removed score check
        # if len(results) == 0 or results[0][1] < 0.7:
            print(f"Unable to find matching results for the query: {query_text}")
            continue

        # Prepare the context text
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
        prompt = prompt_template.format(context=context_text, question=query_text)

        # print(prompt)

        # Get response from the model
        model = ChatOpenAI()
        response_text = model.predict(prompt)

        # Retrieve sources
        if show_source_passages:
            sources = [doc.metadata.get("source", None) for doc, _score in results]
            formatted_response = f"Question: {query_text}\nResponse: {response_text}\nSources: {sources}"
        else:
            formatted_response = f"Question: {query_text}\nResponse: {response_text}"

        # Append the response to the all_responses string
        all_responses += formatted_response + "\n\n\n"

    # Print the final responses
    print(all_responses)


### Wizard of Oz Qs

#### OpenAI embeddings

In [5]:
# def get_chroma_path(data_description, embeddings_description):
#     CHROMA_PATH=os.path.join("chroma",data_description, embeddings_description)
#     return CHROMA_PATH

# def get_embedding_function(embeddings_description):
#     if embeddings_description == "openai_embeddings":
#         embedding_function = OpenAIEmbeddings()
#     elif embeddings_description == "ollama_embeddings":
#         embedding_function = OllamaEmbeddings(model="nomic-embed-text")
#     else:
#         print("please specify either 'openai_embeddings' or 'ollama_embeddings'")
#     return embedding_function


In [6]:
data_description = "wizard_of_oz" 
embeddings_description = "openai_embeddings"
answer_query_from_database(data_description, embeddings_description, show_source_passages=False)  



Loading the Chroma database from chroma/wizard_of_oz/openai_embeddings, using the openai_embeddings embedding function.


  warn_deprecated(


Question: How does Dorothy get back home?
Response: Dorothy gets back home by clicking the heels of her shoes together three times and saying, "Take me home to Aunt Em!" This magic transports her back to Kansas, but her Silver Shoes fall off during the journey and are lost forever.


Question: What is the role of the Tin Man?
Response: The role of the Tin Woodman is to be rescued by his friends and to eventually be sent for by Oz.


Question: What obstacles do Dorothy and her friends face on their journey?
Response: Dorothy and her friends face obstacles such as a long and sometimes dangerous journey through pleasant and dark terrain, trees that seem to be fighting them and trying to stop their journey, rough and difficult roads with uneven and broken yellow bricks, and the challenge of getting to the Witch's castle.





#### Ollama embeddings

In [7]:
data_description = "wizard_of_oz" 
embeddings_description = "ollama_embeddings"
answer_query_from_database(data_description, embeddings_description, show_source_passages=False)  



Loading the Chroma database from chroma/wizard_of_oz/ollama_embeddings, using the ollama_embeddings embedding function.




Question: How does Dorothy get back home?
Response: Dorothy gets back home by asking the Great Oz to send her back to Kansas.


Question: What is the role of the Tin Man?
Response: The role of the Tin Woodman is to seek a heart from the Great and Terrible Oz so that he can experience love and happiness like other men.


Question: What obstacles do Dorothy and her friends face on their journey?
Response: Dorothy and her friends face obstacles such as a long and sometimes dangerous journey, the uncertainty of finding their way back home, the challenge of crossing the desert, the intimidating presence of the Great Oz, the need to seek out and destroy the Wicked Witch, and the struggle to find courage in the face of adversity.





<span style="color:red">NOTE THE WARNINGS ABOUT HOW RELEVANCE SCORES SHOULD BE BETWEEN 0 AND 1</span>. It seems like the answers aren't that much worse than the OpenAI ones though? So maybe whatever calculation it is doing is actually preserving order?
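
On the order-preservation question: if the relevance score is computed from a distance by a strictly decreasing rule, the ranking of chunks is the same whether you sort by smallest distance or by largest score, even when some scores come out negative. The sketch below assumes a rule of the form `1 - d / sqrt(2)`. I believe LangChain uses something like this for Euclidean distance, but treat that as an assumption rather than a confirmed reading of the library. The distances are made-up illustrative numbers.

```python
import math


def relevance_from_distance(distance):
    """Assumed scoring rule: 1 at distance 0, falling linearly as distance grows."""
    return 1.0 - distance / math.sqrt(2)


# Made-up distances for four chunks; the larger ones push the score below zero.
distances = [1.9, 0.4, 3.2, 1.1]
scores = [relevance_from_distance(d) for d in distances]

# Rank chunk indices by smallest distance and by largest score.
rank_by_distance = sorted(range(len(distances)), key=lambda i: distances[i])
rank_by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print("scores:", [round(s, 3) for s in scores])
print("same ranking:", rank_by_distance == rank_by_score)
```

Under that assumption, negative scores would break a threshold check such as `results[0][1] < 0.7`, but not the choice of the top `k` chunks, which would fit with the answers still being reasonable.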

### Alice in Wonderland Qs

#### OpenAI embeddings

In [8]:
data_description = "alice_in_wonderland" 
embeddings_description = "openai_embeddings"
answer_query_from_database(data_description, embeddings_description, show_source_passages=False)  

Loading the Chroma database from chroma/alice_in_wonderland/openai_embeddings, using the openai_embeddings embedding function.
Question: How does Alice end up in Wonderland?
Response: Alice ends up in Wonderland by following a White Rabbit down a rabbit-hole, which leads her to fall into a deep well.


Question: What characters does Alice meet along her journey?
Response: Alice meets the White Rabbit, the Mouse, the March Hare, the Queen, the Knave of Hearts, the Hatter, and the footmen with powdered hair.


Question: What are the key events in Alice's adventure?
Response: 1. Alice trying to find her way out of the hall with locked doors.
2. Alice deciding to grow to her right size and find her way into the garden.
3. Alice being asked to settle a question by three characters.
4. Alice telling the two creatures about her adventures with the White Rabbit.
5. Alice finding a tiny golden key on a glass table.
6. Alice forgetting the key when trying to enter the garden.
7. Alice explaining

#### Ollama embeddings

In [9]:
data_description = "alice_in_wonderland" 
embeddings_description = "ollama_embeddings"
answer_query_from_database(data_description, embeddings_description, show_source_passages=False)  

Loading the Chroma database from chroma/alice_in_wonderland/ollama_embeddings, using the ollama_embeddings embedding function.




Question: How does Alice end up in Wonderland?
Response: Alice ends up in Wonderland by following the direction in which the March Hare was said to live.


Question: What characters does Alice meet along her journey?
Response: Alice meets the March Hare, the Hatter, the Duchess, the Caterpillar, and the Cheshire Cat along her journey.


Question: What are the key events in Alice's adventure?
Response: Some key events in Alice's adventure include attending a mad tea-party with the Hatter, March Hare, and other characters, playing a curious game of croquet with the Queen, listening to the Mock Turtle's story, questioning the concept of lessons that lessen each day, being asked to tell a story by the March Hare and Hatter, and being chosen to give out prizes by the Dodo.





<span style="color:red">NOTE THE WARNINGS ABOUT HOW RELEVANCE SCORES SHOULD BE BETWEEN 0 AND 1</span>. It seems like the answers aren't that much worse than the OpenAI ones though? So maybe whatever calculation it is doing is actually preserving order?