# Comparing two embedding models
In this notebook we compare two embedding models, `nomic-embed-text` and `mxbai-embed-large`, by asking both the same questions and comparing the answers to see which model yields the better ones. We then ask ChatGPT to compare the two answers and say which one is better. Finally, we choose a model based on both comparisons.

In [5]:
import logging
import os

import chromadb
import httpx
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph_sdk import get_client

load_dotenv("../../.env.research")

True

In [6]:
def load_pdfs_from_directory(directory_path):
    """Load every PDF in directory_path and return all pages as one list."""
    all_documents = []
    # Sort the filenames so the load order is the same on every run.
    for filename in sorted(os.listdir(directory_path)):
        if filename.endswith(".pdf"):
            file_path = os.path.join(directory_path, filename)
            loader = PDFPlumberLoader(file_path=file_path)
            documents = loader.load()
            all_documents.extend(documents)
    return all_documents

In [7]:
client = chromadb.HttpClient(
    host=os.getenv("CHROMA_HOST"), port=int(os.getenv("CHROMA_PORT"))
)

In [8]:
pdf_docs = load_pdfs_from_directory(os.getenv("DATA_DIR"))
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(pdf_docs)
vectorstore = None
# Start from an empty collection. Deleting a collection that does not exist
# raises, and the exception type differs between chromadb versions.
try:
    client.delete_collection("rummikub_rules_mxbai")
except Exception as e:
    logging.info(f"Nothing to delete: {e}")
try:
    vectorstore = Chroma(
        client=client,
        collection_name="rummikub_rules_mxbai",
        create_collection_if_not_exists=True,
        embedding_function=OllamaEmbeddings(
            model="mxbai-embed-large", base_url=os.getenv("OLLAMA_URI")
        ),
    )
    vectorstore.add_documents(documents=all_splits)
except httpx.ConnectError as e:
    logging.error(f"Could not connect to Chroma or Ollama: {e}")
    # Drop the partially filled collection, if it was created at all.
    if vectorstore is not None:
        vectorstore.delete_collection()
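
The splitter settings above (`chunk_size=1000`, `chunk_overlap=200`) mean that consecutive chunks can share up to 200 characters. As a rough illustration only, the sketch below uses a fixed sliding window. `sliding_chunks` is a hypothetical helper, not part of LangChain: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks before merging pieces, so its real chunk boundaries will differ.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows whose starts are chunk_size - chunk_overlap apart."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Synthetic 2500-character text, used only to make the window sizes visible.
text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(text)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next one.
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap is what lets a rule that straddles a chunk boundary still appear whole in at least one chunk, at the cost of storing some text twice.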

In [9]:
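pdf_docs = load_pdfs_from_directory(os.getenv("DATA_DIR"))
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(pdf_docs)
vectorstore = None
# Start from an empty collection. Deleting a collection that does not exist
# raises, and the exception type differs between chromadb versions.
try:
    client.delete_collection("rummikub_rules_nomic")
except Exception as e:
    logging.info(f"Nothing to delete: {e}")
try:
    vectorstore = Chroma(
        client=client,
        collection_name="rummikub_rules_nomic",
        embedding_function=OllamaEmbeddings(
            model="nomic-embed-text", base_url=os.getenv("OLLAMA_URI")
        ),
    )
    vectorstore.add_documents(documents=all_splits)
except httpx.ConnectError as e:
    logging.error(f"Could not connect to Chroma or Ollama: {e}")
    # Drop the partially filled collection, if it was created at all.
    if vectorstore is not None:
        vectorstore.delete_collection()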

In [20]:
# Note: this rebinds `client` from the Chroma client to the LangGraph client.
url = "http://localhost:8123"
client = get_client(url=url)

### Question 1
"How do I win the game?"

In [21]:
question = "How do I win the game?"

llama_3_1_nomic_thread = await client.threads.create()
llama_3_1_mxbai_thread = await client.threads.create()

llama_3_1_nomic_graph = "chatbot_3.1_nomic"
llama_3_1_mxbai_graph = "chatbot_3.1_mxbai"

In [22]:
async for chunk in client.runs.stream(
    llama_3_1_nomic_thread["thread_id"],
    llama_3_1_nomic_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

}{

"binary_score": "yes"

}{

"binary_score": "yes"

} 

 
 To win the game of Rummikub, you must be the first player to play all your tiles. If no one has gone out and the pool of tiles is empty, the player with the lowest tile count on their rack wins the round. In a tie, the player with the highest score is declared the winner.{
  "binary_score": "yes"
}{
  "binary_score": "yes"
}
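
The streamed output above interleaves the router and grader JSON (`datasource`, `binary_score`) with the final answer. A minimal sketch for separating the two follows. It assumes the JSON fragments are flat objects with no nested braces, as in the output shown here, and `split_stream_output` is a hypothetical helper name.

```python
import json
import re

# Matches a flat JSON object (no nested braces), which is all the
# router and grader nodes emit in the output above.
FLAT_JSON = re.compile(r"\{[^{}]*\}")


def split_stream_output(raw: str) -> tuple[list[dict], str]:
    """Return (parsed JSON fragments, remaining answer text)."""
    fragments = []

    def _collect(match: re.Match) -> str:
        try:
            fragments.append(json.loads(match.group(0)))
        except json.JSONDecodeError:
            return match.group(0)  # not JSON after all: keep it in the answer
        return ""  # valid JSON: remove it from the answer text

    answer = FLAT_JSON.sub(_collect, raw).strip()
    return fragments, answer


# Shortened copy of the streamed output, for illustration.
raw = (
    '{"datasource": "vectorstore"}{\n\n"binary_score": "yes"\n\n} \n\n'
    " To win the game of Rummikub, you must be the first player to play all your tiles."
    '{\n  "binary_score": "yes"\n}'
)
fragments, answer = split_stream_output(raw)
print(fragments)
print(answer)
```

Keeping the grader scores next to the cleaned answer makes it easier to spot runs where a grader said "no", as happens in the third question.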

In [23]:
async for chunk in client.runs.stream(
    llama_3_1_mxbai_thread["thread_id"],
    llama_3_1_mxbai_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

} 

 
 {

"binary_score": "yes"

}{

"binary_score": "yes"

} 

 
 To win the game of Rummikub, you can achieve a win in one of three ways:

1. **Open Hand**: Win 100 points if no other player has melded.
2. **Foot**: Score 200 points.
3. **Closed Hand**: The descriptions and scores for various types of closed hands are not explicitly stated in the provided context, but it is mentioned that the winner wins a number of points based on the type of winning hand.

Additionally, if you discard a joker as your final discard, all scores for that hand are doubled.{
  "binary_score": "yes"
}{
  "binary_score": "yes"
}
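
The two cells above run the same streaming loop once per graph. One way to avoid repeating it is sketched below, under these assumptions: `collect_answer` and `ask_both` are hypothetical helpers, they use the same `client.runs.stream` call and content filter as the cells above, and they return the text instead of printing it. The fake client exists only so the sketch runs without a LangGraph server.

```python
import asyncio
from types import SimpleNamespace


async def collect_answer(client, thread_id: str, graph: str, question: str) -> str:
    """Stream one run and return the concatenated message content."""
    parts = []
    async for chunk in client.runs.stream(
        thread_id,
        graph,
        input={"question": question},
        stream_mode="messages-tuple",
    ):
        if chunk.event != "messages":
            continue
        # Same filter as the cells above: keep items that carry "content".
        parts.extend(item["content"] for item in chunk.data if "content" in item)
    return "".join(parts)


async def ask_both(client, question: str, graphs: dict[str, str]) -> dict[str, str]:
    """Ask the same question on a fresh thread for each graph."""
    answers = {}
    for label, graph in graphs.items():
        thread = await client.threads.create()
        answers[label] = await collect_answer(client, thread["thread_id"], graph, question)
    return answers


class _FakeRuns:
    """Stand-in for client.runs, used only for this offline demo."""

    def stream(self, thread_id, graph, input, stream_mode):
        async def _gen():
            yield SimpleNamespace(event="metadata", data={"run_id": "demo"})
            yield SimpleNamespace(event="messages", data=[{"content": f"{graph}: "}, {}])
            yield SimpleNamespace(event="messages", data=[{"content": input["question"]}, {}])

        return _gen()


class _FakeThreads:
    """Stand-in for client.threads, used only for this offline demo."""

    async def create(self):
        return {"thread_id": "demo-thread"}


fake_client = SimpleNamespace(runs=_FakeRuns(), threads=_FakeThreads())
graphs = {"nomic": "chatbot_3.1_nomic", "mxbai": "chatbot_3.1_mxbai"}
# In a notebook cell, use `await ask_both(client, question, graphs)` instead.
answers = asyncio.run(ask_both(fake_client, "How do I win the game?", graphs))
print(answers)
```

Returning the answers as a dictionary also removes the need to copy them by hand into the comparison sections.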

## Comparison first question
### Answers
- **llama3.1_nomic**: "To win the game of Rummikub, you must be the first player to play all your tiles. If no one has gone out and the pool of tiles is empty, the player with the lowest tile count on their rack wins the round. In a tie, the player with the highest score is declared the winner."

- **llama3.1_mxbai**: "To win the game of Rummikub, you can achieve a win in one of three ways:

1. **Open Hand**: Win 100 points if no other player has melded.
2. **Foot**: Score 200 points.
3. **Closed Hand**: The descriptions and scores for various types of closed hands are not explicitly stated in the provided context, but it is mentioned that the winner wins a number of points based on the type of winning hand.

Additionally, if you discard a joker as your final discard, all scores for that hand are doubled."

### My comparison
The nomic response is better: it is clear and concise. The mxbai response is verbose and does not clearly answer the question. I suspect the mxbai response suffers partly because the source documents are of poor quality, which I still need to fix.

### ChatGPT comparison
**Best Answer: Llama3.1_nomic**
Llama3.1_nomic provides a clear and accurate answer that aligns with the standard rules of Rummikub. It succinctly explains the win conditions (playing all tiles or having the lowest tile count if the pool is empty) and mentions the tie-breaking rule. The response is concise, well-structured, and directly relevant to the question.

**Why Others are Less Good:**

- **Llama3.1_mxbai**: This response introduces concepts and scoring mechanics that are unrelated to Rummikub's standard rules, such as "Open Hand," "Foot," and "Closed Hand," which might confuse users. It also lacks clarity and relevance, making it unsuitable for players looking for accurate guidance on winning the game. While the mention of the joker is interesting, it is contextually irrelevant to the core win conditions.

### Question 2
"What is a joker and what can it do?"

In [24]:
question = "What is a joker and what can it do?"

llama_3_1_nomic_thread = await client.threads.create()
llama_3_1_mxbai_thread = await client.threads.create()

llama_3_1_nomic_graph = "chatbot_3.1_nomic"
llama_3_1_mxbai_graph = "chatbot_3.1_mxbai"

In [25]:
async for chunk in client.runs.stream(
    llama_3_1_nomic_thread["thread_id"],
    llama_3_1_nomic_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

} 

 
 {

"binary_score": "yes"

}{

"binary_score": "yes"

} 

 
 A joker in Rummikub is a special tile that can represent any tile in the game. It has several key functions:

1. **Initial Meld**: A player must play their initial meld before they can retrieve a joker.
2. **Clearing the Joker**: There are four ways to clear a joker:
	* Replacing it with tiles on the rack or table that match its value (e.g., three 3s).
	* Splitting a run and clearing the joker.
	* Adding a tile to complete a set, which clears the joker.
	* Substituting a real tile for the joker if you have it in hand.
3. **Combination Manipulation**: A combination with a joker can be expanded by adding tiles, but not rearranged or modified while the joker is present.
4. **Tile Representation**: The joker represents a specific tile and cannot be changed unless the player supplies the original tile from their hand.

In summary, a joker is a versatile tile that allows 

In [26]:
async for chunk in client.runs.stream(
    llama_3_1_mxbai_thread["thread_id"],
    llama_3_1_mxbai_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

} 

 
 {

"binary_score": "yes"

}{

"binary_score": "yes"

} 

 A joker in Rummikub is a special tile that can be used as any tile in a set. It has no specific number or color, but its value and color are determined by the set it is part of. A joker can be used to complete a set on the table, and it can also be retrieved from a set by a player who can replace it with tiles that keep the set legitimate.

Key actions involving a joker include:

* Using a joker as any tile in a set
* Retrieving a joker from a set on the table and replacing it with other tiles
* Playing a joker to make a new set after retrieving it

Note: A joker cannot be taken into a player's hand for later use, but it can be used to complete sets or create new ones.{
  "binary_score": "yes"
}{
  "binary_score": "yes"
}

## Comparison second question
### Answers
- **llama3.1_nomic**: "A joker in Rummikub is a special tile that can represent any tile in the game. It has several key functions:

1. **Initial Meld**: A player must play their initial meld before they can retrieve a joker.
2. **Clearing the Joker**: There are four ways to clear a joker:
	* Replacing it with tiles on the rack or table that match its value (e.g., three 3s).
	* Splitting a run and clearing the joker.
	* Adding a tile to complete a set, which clears the joker.
	* Substituting a real tile for the joker if you have it in hand.
3. **Combination Manipulation**: A combination with a joker can be expanded by adding tiles, but not rearranged or modified while the joker is present.
4. **Tile Representation**: The joker represents a specific tile and cannot be changed unless the player supplies the original tile from their hand.

In summary, a joker is a versatile tile that allows players to create sets and combinations in various ways, but it must be cleared according to specific rules."

- **llama3.1_mxbai**: "A joker in Rummikub is a special tile that can be used as any tile in a set. It has no specific number or color, but its value and color are determined by the set it is part of. A joker can be used to complete a set on the table, and it can also be retrieved from a set by a player who can replace it with tiles that keep the set legitimate.

Key actions involving a joker include:

* Using a joker as any tile in a set
* Retrieving a joker from a set on the table and replacing it with other tiles
* Playing a joker to make a new set after retrieving it

Note: A joker cannot be taken into a player's hand for later use, but it can be used to complete sets or create new ones."

### My comparison
Both answers were good, but I think the nomic response was better because it was more detailed and explained the joker's functions more fully. The mxbai response was more concise and gave less detail.

### ChatGPT comparison
**Best Answer: Llama3.1_nomic**
Llama3.1_nomic offers a comprehensive and detailed explanation of the joker's functions, covering its versatility and the specific rules for clearing and manipulating it. The response is well-structured, addressing multiple scenarios such as initial melds, clearing methods, and limitations. It is both informative and aligned with standard Rummikub rules, making it the best choice for a user seeking clarity and accuracy.

**Why Others are Less Good:**

- **Llama3.1_mxbai**: This response provides a basic explanation of the joker’s use but lacks depth and clarity. While it mentions retrieving and using jokers in sets, it omits critical details about clearing methods and restrictions on rearranging combinations with jokers. The explanation feels incomplete and less user-friendly compared to Llama3.1_nomic.


### Question 3
"What happens when the pool of tiles is empty?"

In [27]:
question = "What happens when the pool of tiles is empty?"

llama_3_1_nomic_thread = await client.threads.create()
llama_3_1_mxbai_thread = await client.threads.create()

llama_3_1_nomic_graph = "chatbot_3.1_nomic"
llama_3_1_mxbai_graph = "chatbot_3.1_mxbai"

In [28]:
async for chunk in client.runs.stream(
    llama_3_1_nomic_thread["thread_id"],
    llama_3_1_nomic_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

}{

"binary_score": "yes"

}{

"binary_score": "no"

}When the pool of tiles is empty, play continues until no more plays can be made. If a player cannot or does not wish to lay down any tiles, the game ends, and players count the total value of tiles in their racks. The player with the lowest tile count wins, and each of the other players loses the difference between their tile count and that of the winner.{
  "binary_score": "yes"
}{
  "binary_score": "yes"
}

In [29]:
async for chunk in client.runs.stream(
    llama_3_1_mxbai_thread["thread_id"],
    llama_3_1_mxbai_graph,
    input={"question": question},
    stream_mode="messages-tuple",
):
    if chunk.event == "messages":
        print(
            "".join(
                data_item["content"]
                for data_item in chunk.data
                if "content" in data_item
            ),
            end="",
            flush=True,
        )

{"datasource": "vectorstore"}{

"binary_score": "yes"

}{

"binary_score": "no"

}{

"binary_score": "no"

}When the pool of tiles is empty, play continues until no more plays can be made, at which point the game ends.{
  "binary_score": "yes"
}{
  "binary_score": "yes"
}

## Comparison third question
### Answers
- **llama3.1_nomic**: "When the pool of tiles is empty, play continues until no more plays can be made. If a player cannot or does not wish to lay down any tiles, the game ends, and players count the total value of tiles in their racks. The player with the lowest tile count wins, and each of the other players loses the difference between their tile count and that of the winner."

- **llama3.1_mxbai**: "When the pool of tiles is empty, play continues until no more plays can be made, at which point the game ends."

### My comparison
Nomic once again provides a more detailed and comprehensive answer, explaining the endgame conditions when the pool of tiles is empty. The mxbai response is concise but lacks the additional details that nomic provides.

### ChatGPT comparison
**Best Answer: Llama3.1_nomic**
Llama3.1_nomic provides a clear and detailed explanation of what happens when the pool is empty. It describes the continuation of play, the conditions for ending the game, and the scoring process. This answer aligns well with Rummikub rules and offers users a complete understanding of the scenario.

**Why Others are Less Good:**

- **Llama3.1_mxbai**: While it states that play continues until no more moves can be made, it lacks crucial details about scoring and how the winner is determined. This makes the answer less informative and less useful for players seeking clarity about endgame rules.

## Conclusion
Overall, nomic performed best. mxbai is not bad, but nomic retrieved the best and most complete information from the docs. The docs themselves still need improvement, so it is impressive how well both models did. We will therefore use nomic as the embedding model.
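
As a small sanity check on that conclusion, the verdicts from the three comparison sections can be tallied. The `verdicts` list below is simply a hand-copied summary of the "My comparison" and "ChatGPT comparison" results in this notebook; the variable names are illustrative.

```python
from collections import Counter

# (question, judge, winner), copied from the comparison sections above.
verdicts = [
    ("Q1", "me", "nomic"),
    ("Q1", "chatgpt", "nomic"),
    ("Q2", "me", "nomic"),
    ("Q2", "chatgpt", "nomic"),
    ("Q3", "me", "nomic"),
    ("Q3", "chatgpt", "nomic"),
]

wins = Counter(winner for _, _, winner in verdicts)
print(wins.most_common())
```

With only three questions this is a small sample, so the tally summarizes the notebook rather than measuring model quality in general.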