In [41]:
# https://medium.com/@thakermadhav/build-your-own-rag-with-mistral-7b-and-langchain-97d0c92fa146

In [42]:
import os
import json
from pathlib import Path

In [43]:
# Configuration

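# Embedding and retrieval settings. Note: the vector store built below is
# Chroma, not FAISS, despite the faiss_ prefix on these names.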
faiss_gpu = False
faiss_embedding_model_name = 'jinaai/jina-embeddings-v2-base-en'
retrieve_topk = 6

# Text splitting settings
chunk_size = 1000
chunk_overlap = 200

# RAG settings
consistency_samples = 5

# Quantization settings
quantization_enabled = True
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = True

# Model
model_name='mistralai/Mistral-7B-Instruct-v0.1'

# Data
data_path = Path("./munchkin_rules/")

device = "cuda"

In [44]:
# nltk is used for PDF processing. Here we ensure anything it downloads goes to
# the cache folder, so it doesn't have to download again
nltk_data_path = Path("~/.cache/nltk_data").expanduser()
nltk_data_path.mkdir(parents=True, exist_ok=True)
os.environ["NLTK_DATA"] = str(nltk_data_path)

In [45]:
!pip install chromadb

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable

[notice] A new release of pip is available: 23.3.1 -> 23.3.2
[notice] To update, run: python3.10 -m pip install --upgrade pip


In [46]:
# Deps for PDF parsing
!pip install "unstructured[pdf]"
!sudo apt-get install -y poppler-utils tesseract-ocr

# sentence-transformers is the backend that HuggingFaceEmbeddings wraps
!pip install sentence-transformers

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr is already the newest version (4.1.1-2.1build1).
poppler-utils is already the newest version (22.02.0-2ubuntu0.3).
0 upgraded, 0 newly installed, 0 to remove and 65 not upgraded.


In [94]:
from typing import Optional, Tuple

import torch

from langchain.callbacks.tracers import ConsoleCallbackHandler
from langchain.chains import LLMChain
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import PromptTemplate, StringPromptTemplate
from langchain.retrievers import ParentDocumentRetriever
from langchain.schema import AIMessage
from langchain.schema.runnable import RunnablePassthrough
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import VectorStore
from langchain_community.vectorstores.chroma import Chroma
from langchain_core.language_models import BaseChatModel
from langchain_core.runnables import RunnableLambda
from langchain_core.runnables import Runnable
from transformers import (
    PreTrainedModel, 
    PreTrainedTokenizerBase,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    AutoModel,
)

In [48]:
%load_ext autoreload
%autoreload 2
from util import HuggingFaceChatModel, VectorStoreRetrieverWithTextSplitter, parse_pdf

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [49]:
print(f"{torch.__version__=}")
print(f"{torch.version.cuda=}")
print(f"{torch.cuda.is_available()=}")
print(f"{torch.cuda.device_count()=}")

if "cuda" in device:
    assert torch.cuda.is_available(), "CUDA is not available"

torch.__version__='2.1.1+cu118'
torch.version.cuda='11.8'
torch.cuda.is_available()=True
torch.cuda.device_count()=1


In [50]:
!nvidia-smi

Thu Dec 28 17:54:00 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:07:00.0  On |                  N/A |
|  0%   55C    P5              45W / 420W |  10212MiB / 24576MiB |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    


## Build Chat Model

In [51]:
def load_transformers_model(model_name:str, bnb_config:Optional[BitsAndBytesConfig]=None) -> Tuple[PreTrainedModel, PreTrainedTokenizerBase]:
    # Load the tokenizer (device_map only applies to models, so it is not
    # passed here) and reuse EOS as the padding token.
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
    )
    tokenizer.pad_token_id = tokenizer.eos_token_id

    if bnb_config is not None:
        model_kwargs = {"quantization_config": bnb_config}
    else:
        model_kwargs = {"torch_dtype": torch.float16}

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        **model_kwargs,
    )

    return (model, tokenizer)


compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

model, tokenizer = load_transformers_model(model_name, bnb_config)

prompt_encoded = tokenizer.encode("What is the capital of the U.S.?", return_tensors="pt")
results = model.generate(prompt_encoded)
sequence = results[0]
sequence = tokenizer.decode(sequence)
print(sequence)

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.77s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> What is the capital of the U.S.?

Washington, D.C
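
As a rough sanity check on why 4-bit loading matters on a 24 GB card that is already partly in use, the weight memory can be estimated from the parameter count and the bits stored per weight. The sketch below assumes roughly 7.2 billion parameters for Mistral-7B and ignores quantization constants, layers kept in higher precision, activations, and the KV cache, so the numbers are ballpark figures only.

```python
# Back-of-the-envelope weight memory estimate (illustrative only).
# ASSUMPTION: ~7.2e9 parameters; real usage is higher because of
# quantization constants, unquantized layers, activations and the KV cache.
N_PARAMS = 7.2e9


def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"float16 weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:   ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the float16 footprint, which leaves room for the embedding model and generation buffers on the same GPU.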


In [52]:
tokenizer.pad_token_id

2

In [53]:
chat_model = HuggingFaceChatModel(model=model, tokenizer=tokenizer, generate_kwargs={}, max_tokens=1000)
chat_model.invoke("What is the capital of the U.S.?")

AIMessage(content='The capital of the United States is Washington, D.C.')

## Build Retriever

In [54]:
def load_docs():
    rule_docs = []
    for filename in data_path.glob("*.pdf"):
        print(f"Processing {filename}")
        rule_docs.extend(parse_pdf(filename))
    return rule_docs

In [55]:
rule_docs = load_docs()

Processing munchkin_rules/munchkin_rules-1.pdf
Processing munchkin_rules/puppies-rules.pdf
Processing munchkin_rules/princesses_rules.pdf
Processing munchkin_rules/munch_4_rules_20thp.pdf


In [56]:
def load_embedding_model(model_name:str) -> HuggingFaceEmbeddings:
    # We first load the embedding model using AutoModel so that we can pass
    # trust_remote_code=True to install it, which we cannot do with 
    # HuggingFaceEmbeddings (https://github.com/langchain-ai/langchain/issues/6080)
    _ = AutoModel.from_pretrained(model_name, trust_remote_code=True)

    embedding_model = HuggingFaceEmbeddings(model_name=model_name, model_kwargs={'device': 'cpu'})
    return embedding_model

In [57]:
def build_vectorstore(embedding_model:HuggingFaceEmbeddings) -> VectorStore:
    db = Chroma(embedding_function=embedding_model)
    return db

In [58]:
# Retriever config dataclass
from dataclasses import dataclass
import math

@dataclass
class RetrieverConfig:
    max_context_size:int = 4096
    percent_context_use:float = 0.5
    parent_percent:float = 0.25
    parent_overlap_percent:float = 0.1
    child_percent:float = 0.25
    child_overlap_percent:float = 0.1
    retrieve_extra_results_percent:float = 0.0


def build_retriever(
    tokenizer:PreTrainedTokenizerBase,
    vectorstore:VectorStore,
    config:RetrieverConfig,
):
    context_size = int(config.max_context_size * config.percent_context_use)
    parent_chunk_size = int(context_size * config.parent_percent)
    parent_chunk_overlap = int(parent_chunk_size * config.parent_overlap_percent)
    parent_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        tokenizer,
        chunk_size=parent_chunk_size,
        chunk_overlap=parent_chunk_overlap
    )

    number_docs_to_use = max(1, context_size // parent_chunk_size)
    if config.retrieve_extra_results_percent == 0:
        k = number_docs_to_use
    else:
        k = number_docs_to_use + math.ceil(number_docs_to_use * config.retrieve_extra_results_percent)
    k = int(k)

    search_kwargs={"k": k}

    if config.child_percent < 0.5:
        child_chunk_size = int(parent_chunk_size * config.child_percent)
        child_chunk_overlap = int(child_chunk_size * config.child_overlap_percent)
        child_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
            tokenizer,
            chunk_size=child_chunk_size,
            chunk_overlap=child_chunk_overlap
        )
        store = InMemoryStore()
        retriever = ParentDocumentRetriever(
            vectorstore=vectorstore,
            docstore=store,
            child_splitter=child_splitter,
            parent_splitter=parent_splitter,
            search_kwargs=search_kwargs,
        )
    else:
        retriever = VectorStoreRetrieverWithTextSplitter(
            vectorstore=vectorstore,
            text_splitter=parent_splitter,
            search_kwargs=search_kwargs,
        )

    return retriever

db = build_vectorstore(load_embedding_model(faiss_embedding_model_name))
retriever = build_retriever(tokenizer, db, RetrieverConfig())
retriever.add_documents(rule_docs)

In [59]:
query = "Can I play a Go Up a Level card during combat?"
result = db.similarity_search(query)
print(f"Query: {query}")
print()
print("From vectorstore:")
print(f"Result count: {len(result)}")
print(f"Doc length: {len(result[0].page_content)}")
print(f"Content: {result[0].page_content[:100]}")
print()

result = retriever.invoke(query)
print("From retriever:")
print(f"Result count: {len(result)}")
print(f"Doc length: {len(result[0].page_content)}")
print(f"Content: {result[0].page_content[:100]}")

# print(f"{str(len(result)):<5.5} {len(result[0].page_content):<5.5} {result[0].page_content[:100]}")

Query: Can I play a Go Up a Level card during combat?

From vectorstore:
Result count: 4
Doc length: 423
Content: OTHER TREASURES
Other Treasure cards (like Go Up a Level cards) are not Items. Most of these cards s

From retriever:
Result count: 2
Doc length: 1017
Content: "ONE-SHOT” TREASURES
A Treasure card that says “Usable once only” is often called a “one-shot” Treas
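
The two searches above work at different granularities, and the sizes involved follow directly from `RetrieverConfig`. The sketch below reimplements only the arithmetic from `build_retriever` so the numbers can be inspected without loading a model. The helper name `chunk_plan` is hypothetical, and the dataclass is repeated here only to keep the example self-contained.

```python
import math
from dataclasses import dataclass


@dataclass
class RetrieverConfig:
    max_context_size: int = 4096
    percent_context_use: float = 0.5
    parent_percent: float = 0.25
    parent_overlap_percent: float = 0.1
    child_percent: float = 0.25
    child_overlap_percent: float = 0.1
    retrieve_extra_results_percent: float = 0.0


def chunk_plan(config: RetrieverConfig) -> dict:
    """Mirror the size arithmetic in build_retriever (sizes are in tokens)."""
    context_size = int(config.max_context_size * config.percent_context_use)
    parent_size = int(context_size * config.parent_percent)
    child_size = int(parent_size * config.child_percent)
    docs = max(1, context_size // parent_size)
    k = docs + math.ceil(docs * config.retrieve_extra_results_percent)
    return {
        "context_size": context_size,
        "parent_chunk_size": parent_size,
        "parent_chunk_overlap": int(parent_size * config.parent_overlap_percent),
        "child_chunk_size": child_size,
        "child_chunk_overlap": int(child_size * config.child_overlap_percent),
        "k": k,
    }


default_plan = chunk_plan(RetrieverConfig())
print(default_plan)
```

With the defaults this gives a 2048-token context budget, 512-token parent chunks, 128-token child chunks, and `k = 4`. Since `ParentDocumentRetriever` searches the child chunks and then returns their deduplicated parents, four child hits can collapse into fewer parent documents, which would explain why the retriever returned two results while the raw vector store search returned four.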


## Sampling LLM Chain

For self-consistency, we need a way to sample several independent responses from the LLM for the same prompt.

In [60]:
@dataclass
class SamplingConfig:
    temperature:float = 0.7
    top_k:Optional[int] = 0
    top_p:Optional[float] = None
    samples:int = 1

def build_sampling_llm_chain(chat_model:BaseChatModel, config:SamplingConfig) -> Runnable:
    temperature = config.temperature
    top_k = config.top_k
    top_p = config.top_p
    n = config.samples
    kwargs = dict(temperature=temperature, top_k=top_k, top_p=top_p, n=n)
    kwargs = {k: v for k, v in kwargs.items() if v is not None}
    
    return RunnableLambda(
        # lambda x: chat_model.batch([x]*n, temperature=temperature, top_k=top_k, top_p=top_p)
        lambda x: [
            AIMessage(content=g.text)
            for g in chat_model.generate([chat_model._convert_input(x).to_messages()], **kwargs).generations[0]
        ]
    ).with_config({"run_name": "chat-sampling"})

sampling_chain = build_sampling_llm_chain(chat_model, SamplingConfig(samples=3))
sampling_chain.invoke("What is the capital of the U.S.?")

[AIMessage(content='The capital of the United States is Washington, D.C.'),
 AIMessage(content='The capital of the United States is Washington, D.C.'),
 AIMessage(content='The capital of the United States is Washington, D.C.')]
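
To make the `temperature` setting in `SamplingConfig` concrete: sampling temperature conventionally divides the logits before the softmax, so values below 1 sharpen the distribution and make repeated samples more alike. The sketch below uses made-up logits and is not tied to `HuggingFaceChatModel`, whose internals are not shown in this notebook.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# ASSUMPTION: toy logits for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]

probs_t10 = softmax_with_temperature(toy_logits, 1.0)
probs_t07 = softmax_with_temperature(toy_logits, 0.7)

# Draw a few token indices at temperature 0.7 with a fixed seed.
rng = random.Random(0)
samples = rng.choices(range(len(toy_logits)), weights=probs_t07, k=5)

print([round(p, 3) for p in probs_t10])
print([round(p, 3) for p in probs_t07])
print(samples)
```

When one continuation dominates, as it plausibly does for the capital-city question, independent samples often agree, which may be why the three answers above are identical.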

## Basic RAG

In [81]:
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate, BasePromptTemplate, format_document
from langchain.schema import Document
from functools import partial

document_prompt_template = """---
NAME: {source}
PAGE: {page_number}
PASSAGE:
{page_content}
---"""

DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(document_prompt_template)

rag_prompt_template = """\
You are an AI assistant to help boardgame players find answers to their rules \
questions.
Answer the question based only on the following board game rule excerpts. Do \
not use any other information. Never use the word "excerpts" in your answer. \
Simply refer to the context as the rules.

----
{context}
----

Question: {question}
"""

DEFAULT_RAG_PROMPT = ChatPromptTemplate.from_template(rag_prompt_template)

def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)


def build_basic_rag_chain(retriever, chat_chain, prompt=DEFAULT_RAG_PROMPT, document_prompt=DEFAULT_DOCUMENT_PROMPT):
    # context_chain = itemgetter("question") | retriever | _combine_documents
    return (
        RunnablePassthrough.assign(
            documents=itemgetter("question") | retriever
        )
        | RunnablePassthrough.assign(
            context=RunnableLambda(itemgetter("documents")) | partial(_combine_documents, document_prompt=document_prompt)
        )
        | RunnablePassthrough.assign(
            answer=RunnablePassthrough() | prompt | chat_chain
        )
    ).with_config({"run_name": "basic-rag-chain"})

In [82]:
basic_rag_chain = build_basic_rag_chain(retriever, chat_model)
result = basic_rag_chain.invoke({"question": "Can I play a Go Up a Level card during combat?"})
print("Question:")
print(result["question"])
print()
print()
print("Context:")
print(result["context"])
print()
print()
print("Answer:")
print(result["answer"])

Question:
Can I play a Go Up a Level card during combat?


Context:
---
NAME: munchkin_rules/munchkin_rules-1.pdf
PAGE: 3
PASSAGE:
"ONE-SHOT” TREASURES
A Treasure card that says “Usable once only” is often called a “one-shot” Treasure. Most of these are used during combat to strengthen the munchkins or the monsters, and may be played from your hand or from the table. Some have other effects, however, so read the card carefully! Discard these cards as soon as the combat is over or their effect is resolved.
One-shot Items with a Gold Piece value may be sold for levels, just like other Items.

OTHER TREASURES
Other Treasure cards (like Go Up a Level cards) are not Items. Most of these cards say when they can be played, and whether they stay in play or are discarded. A couple of specific examples: Go Upa Level cards may be played on yourself or any other player at any time, even during combat. Discard them once they are played. Exception: You cannot play a Go Up a Level card to give a play
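
The chain built by `build_basic_rag_chain` threads a single dictionary through three stages, each of which adds one key. A plain-Python sketch of that data flow, with a stub retriever and a stub model standing in for the real components (both hypothetical), may make the shape of `result` easier to follow:

```python
# Plain-Python analogue of three chained RunnablePassthrough.assign steps.
# ASSUMPTION: fake_retriever and fake_llm are stand-ins, not real components.
DOCUMENT_TEMPLATE = (
    "---\nNAME: {source}\nPAGE: {page_number}\nPASSAGE:\n{page_content}\n---"
)


def assign(state: dict, **steps) -> dict:
    """Return a copy of state with one new key per step, like .assign()."""
    new_state = dict(state)
    for key, step in steps.items():
        new_state[key] = step(state)
    return new_state


def fake_retriever(question: str) -> list:
    return [
        {"source": "rules.pdf", "page_number": 3, "page_content": "Example rule text."}
    ]


def fake_llm(prompt: str) -> str:
    return f"(answer based on {len(prompt)} prompt characters)"


def combine_documents(docs: list) -> str:
    return "\n".join(DOCUMENT_TEMPLATE.format(**doc) for doc in docs)


state = {"question": "Can I play a Go Up a Level card during combat?"}
state = assign(state, documents=lambda s: fake_retriever(s["question"]))
state = assign(state, context=lambda s: combine_documents(s["documents"]))
state = assign(
    state,
    answer=lambda s: fake_llm(f"{s['context']}\n\nQuestion: {s['question']}"),
)

print(list(state))
```

Keeping `documents` and `context` in the output alongside `answer` is what lets the cell above print the retrieved passages next to the model's reply.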

## Universal Self-Consistency
https://arxiv.org/abs/2311.17311

In [63]:
from typing import List
from langchain.output_parsers import RegexParser

response_prompt_template = """{page_content}"""

consensus_prompt_template = """\
I have generated the following responses to the question: {question}

{context}

Evaluate these responses.
Select the most consistent response based on majority consensus.
Start your answer with "The most consistent response is Response X" (without \
quotes)
"""

DEFAULT_CONSENSUS_DOCUMENT_PROMPT = PromptTemplate.from_template(
    response_prompt_template
)

DEFAULT_CONSENSUS_PROMPT = ChatPromptTemplate.from_template(
    consensus_prompt_template
)

response_selection_parser = RegexParser(
    regex=r"(?i)response\s+(\d+)",
    output_keys=["response_selected_index"],
)

def convert_to_document(message: AIMessage) -> Document:
    return Document(
        page_content=message.content,
    )

def format_responses(responses:List[Document], document_prompt=DEFAULT_CONSENSUS_DOCUMENT_PROMPT, document_separator="\n\n") -> str:
    formatted = [f"Response {i}\n{format_document(doc, document_prompt)}" for i, doc in enumerate(responses)]
    return document_separator.join(formatted)

def build_universal_consistency_chain(chat_model:BaseChatModel, prompt=DEFAULT_CONSENSUS_PROMPT) -> Runnable:
    chat_model_consistency = chat_model.bind(temperature=0, max_tokens=1000)

    # chain that takes a list of responses and returns a formatted string combining them
    format_responses_chain = RunnableLambda(convert_to_document).map() | format_responses

    # chain that takes a question and a context and returns the index of the consensus response
    select_response_index_chain = prompt | chat_model_consistency | response_selection_parser | itemgetter(response_selection_parser.output_keys[0]) | int

    # chain that takes responses and a response_selected_index and returns the response at that index
    select_response_from_index_chain = RunnableLambda(lambda x: x["responses"][x["response_selected_index"]])

    # chain that takes a question and candidate responses and returns the consensus response
    consistency_chain = (
        {"question": itemgetter("question"), "responses": itemgetter("responses"), "context": itemgetter("responses") | format_responses_chain}
        | RunnablePassthrough.assign(response_selected_index=select_response_index_chain)
        | select_response_from_index_chain
    )

    # chain that picks the first response if the consistency chain fails to
    # parse a response index
    fallback_chain = RunnableLambda(lambda x: x["responses"][0])

    return consistency_chain.with_fallbacks([fallback_chain], exceptions_to_handle=(ValueError, IndexError)).with_config({"run_name": "universal-consistency"})

In [64]:
basic_rag_chain = build_basic_rag_chain(retriever, sampling_chain)
consistency_chain = build_universal_consistency_chain(chat_model)

result = basic_rag_chain.invoke({"question": "Can I play a Go Up a Level card during combat?"})
consistency_result = consistency_chain.invoke({
    "question": result["question"],
    "responses": result["answer"],
})

print("Question:")
print(result["question"])
print()
print()
print("Answers:")
print('\n\n'.join(answer.content for answer in result["answer"]))
print()
print()
print("Consensus answer:")
print(consistency_result.content)

Question:
Can I play a Go Up a Level card during combat?


Answers:
No, you cannot play a Go Up a Level card during combat. According to the rules, Go Up a Level cards may be played on yourself or any other player at any time, even during combat. However, the rules also state that you cannot play a Go Up a Level card to give a player the winning level! Therefore, it is not allowed to play a Go Up a Level card during combat.

No, you cannot play a Go Up a Level card during combat.

No, you cannot play a Go Up a Level card during combat. According to the rules on page 3 of the Munchkin rules PDF, Go Up a Level cards may be played on yourself or any other player at any time, even during combat. However, once they are played, they must be discarded. The rules on page 1 of the same PDF state that there can be no disputes over whether a card can be played during combat unless the card explicitly states that it can be played during combat. In this case, the Go Up a Level card does not explici
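
The selection step can be sketched without LangChain: number the candidate responses, ask a judge model to name one, parse the index from its reply, and fall back to the first response if parsing fails. In the sketch below the judge's reply is passed in as a plain string and the function names are hypothetical. The regular expression is the one used in `response_selection_parser`.

```python
import re

RESPONSE_INDEX_RE = re.compile(r"(?i)response\s+(\d+)")


def format_responses(responses: list) -> str:
    """Label each candidate as 'Response i', starting at 0."""
    return "\n\n".join(f"Response {i}\n{text}" for i, text in enumerate(responses))


def select_response(responses: list, judge_reply: str) -> str:
    """Pick the response the judge named, or the first one on any failure."""
    match = RESPONSE_INDEX_RE.search(judge_reply)
    if match is None:
        return responses[0]
    index = int(match.group(1))
    if index >= len(responses):
        return responses[0]
    return responses[index]


candidates = ["Yes, at any time.", "No.", "Yes, even during combat."]
print(format_responses(candidates))
print(select_response(candidates, "The most consistent response is Response 2"))
print(select_response(candidates, "I cannot decide."))
```

Because the numbering starts at 0, the parsed number can be used directly as a list index, and an unparseable or out-of-range reply degrades to the first sampled response rather than raising.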

## RAG with Thread-of-Thought

https://arxiv.org/abs/2311.08734

In [98]:
from langchain_core.runnables import RunnableBranch

# https://arxiv.org/abs/2311.08734


# You are an AI assistant to help boardgame players find answers to their rules \
# questions.
# Answer the question based only on the following board game rule excerpts. Do \
# not use any other information. Never use the word "excerpts" in your answer. \
# Simply refer to the context as the rules.

# As a content reviewer, I provide multiple retrieved passages about this \
# question; you need to answer the question.

# If you don't know the answer, just say that you don't know, don't try to make \
# up an answer.

thread_of_thought_template = """\
You are an AI assistant to help boardgame players find answers to their rules \
questions.

Answer the question based only on the following board game rule \
excerpts. Do not use any other information. Never use the word "excerpts" in \
your answer. Simply refer to the context as the rules.

----
{context}
----

Q: {question}
Walk me through this context in manageable parts step by step, summarizing and \
analyzing as we go.
"""

DEFAULT_THREAD_OF_THOUGHT_PROMPT = ChatPromptTemplate.from_template(
    thread_of_thought_template
)

DEFAULT_THEREFORE_PROMPT = PromptTemplate.from_template(
    "Therefore, the answer is "
)


def thread_of_thought_combine_documents(
    docs, 
    document_prompt=DEFAULT_DOCUMENT_PROMPT, 
    document_separator="\n"
):
    formatted = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(formatted)


def build_thread_of_thought_rag_chain(
    retriever, 
    chat_model:BaseChatModel, 
    sampling_chain=None, 
    prompt=DEFAULT_THREAD_OF_THOUGHT_PROMPT, 
    therefore_prompt=DEFAULT_THEREFORE_PROMPT
) -> Runnable:
    chat_model = chat_model.bind(temperature=0, max_tokens=1000)

    if sampling_chain is None:
        sampling_chain = chat_model
    
    # context_chain = retriever | thread_of_thought_combine_documents

    followup_chain = RunnableLambda(lambda x: f"{x.content}\n\n{therefore_prompt.format()}") | chat_model

    followup_branch = RunnableBranch(
        (lambda x: isinstance(x, list), followup_chain.map()),
        followup_chain,
    )

    return (
        RunnablePassthrough.assign(
            documents=itemgetter("question") | retriever
        )
        | RunnablePassthrough.assign(
            context=(
                RunnableLambda(itemgetter("documents"))
                | thread_of_thought_combine_documents
            )
        )
        | RunnablePassthrough.assign(
            answer=(
                RunnablePassthrough()
                | prompt
                | sampling_chain
                | followup_branch
            )
        )
    ).with_config({"run_name": "thread-of-thought"})

In [99]:
thread_of_thought_rag = build_thread_of_thought_rag_chain(
    retriever,
    chat_model.with_config({'callbacks': [ConsoleCallbackHandler()]})
)
result = thread_of_thought_rag.invoke(
    {"question": "Can I play a Go Up a Level card during combat?"}
)
print("Question:")
print(result["question"])
print()
print()
print("Context:")
print(result["context"])
print()
print()
print("Answer:")
print(result["answer"])

[llm/start] [1:llm:HuggingFaceChatModel] Entering LLM run with input:
{
  "prompts": [
    "Human: You are an AI assistant to help boardgame players find answers to their rules questions.\n\nAnswer the question based only on the following board game rule excerpts. Do not use any other information. Never use the word \"excerpts\" in your answer. Simply refer to the context as the rules.\n\n----\n---\nNAME: munchkin_rules/munchkin_rules-1.pdf\nPAGE: 3\nPASSAGE:\n\"ONE-SHOT” TREASURES\nA Treasure card that says “Usable once only” is often called a “one-shot” Treasure. Most of these are used during combat to strengthen the munchkins or the monsters, and may be played from your hand or from the table. Some have other effects, however, so read the card carefully! Discard these cards as soon as the combat is over or their effect is resolved.\nOne-shot Items with a Gold Piece value may be sold for levels, just like other Items.\n\nOTHER TREASURES\nOther Treasure cards 
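
The two-stage prompting used by `build_thread_of_thought_rag_chain` can be illustrated with a stub model: the first call asks for a step-by-step walk-through, and the second call receives that walk-through with "Therefore, the answer is " appended so the model can state a conclusion. `stub_llm` and `thread_of_thought` below are hypothetical stand-ins, not part of the notebook.

```python
THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)
THEREFORE = "Therefore, the answer is "


def thread_of_thought(llm, context: str, question: str) -> str:
    """Run the reasoning pass, then a follow-up pass that extracts the answer."""
    first_prompt = f"{context}\n\nQ: {question}\n{THOT_TRIGGER}\n"
    reasoning = llm(first_prompt)
    followup_prompt = f"{reasoning}\n\n{THEREFORE}"
    return llm(followup_prompt)


calls = []


def stub_llm(prompt: str) -> str:
    # ASSUMPTION: a canned model that records prompts instead of generating.
    calls.append(prompt)
    if prompt.endswith(THEREFORE):
        return "yes."
    return "Step 1: the rules say the card may be played at any time."


answer = thread_of_thought(stub_llm, "Rule text here.", "Can I play it in combat?")
print(answer)
print(len(calls))
```

As in the chain above, the follow-up prompt contains only the model's own walk-through, not the original context, so the quality of the final answer depends on how faithfully the first pass summarized the rules.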

## Glue Code

In [67]:
from typing import Union

RetrieverWithAddDocuments = Union[ParentDocumentRetriever, VectorStoreRetrieverWithTextSplitter]

In [68]:
@dataclass
class RagChainConfig:
    rag_prompt:ChatPromptTemplate = DEFAULT_RAG_PROMPT
    thread_of_thought_enabled:bool = True
    thread_of_thought_prompt:ChatPromptTemplate = DEFAULT_THREAD_OF_THOUGHT_PROMPT
    thread_of_thought_therefore_prompt:PromptTemplate = DEFAULT_THEREFORE_PROMPT
    consensus_prompt:ChatPromptTemplate = DEFAULT_CONSENSUS_PROMPT


def build_rag_chain(
    chat_model:BaseChatModel,
    sampling_chain:Runnable,
    retriever:RetrieverWithAddDocuments,
    config:RagChainConfig,
) -> Runnable:
    if config.thread_of_thought_enabled:
        rag_chain = build_thread_of_thought_rag_chain(
            retriever=retriever,
            chat_model=chat_model,
            sampling_chain=sampling_chain,
            prompt=config.thread_of_thought_prompt,
            therefore_prompt=config.thread_of_thought_therefore_prompt,
        )
    else:
        rag_chain = build_basic_rag_chain(
            retriever=retriever,
            chat_chain=sampling_chain,
            prompt=config.rag_prompt,
        )
    consistency_chain = build_universal_consistency_chain(chat_model, prompt=config.consensus_prompt)

    consistency_chain = (
        {"question": itemgetter("question"), "responses": itemgetter("answer")} 
        | consistency_chain
    )

    return (
        rag_chain
        | RunnablePassthrough.assign(
            answer=consistency_chain
        )
    )

In [69]:
def build_complete_chain(
        chat_model, 
        tokenizer, 
        vectorstore,
        retriever_config:RetrieverConfig, 
        sampling_config:SamplingConfig, 
        rag_config:RagChainConfig
) -> Tuple[RetrieverWithAddDocuments, Runnable]:
    retriever = build_retriever(tokenizer, vectorstore, retriever_config)
    sampling_chain = build_sampling_llm_chain(chat_model, sampling_config)
    rag_chain = build_rag_chain(chat_model, sampling_chain, retriever, rag_config)
    full_chain = {"question": RunnablePassthrough()} | rag_chain

    return retriever, full_chain

In [102]:
retriever, complete_chain = build_complete_chain(
    chat_model=chat_model,
    tokenizer=tokenizer,
    vectorstore=db,
    retriever_config=RetrieverConfig(
        max_context_size=4096,
        percent_context_use=0.75,
        parent_percent=0.15,
        parent_overlap_percent=0.1,
        child_percent=0.25,
        child_overlap_percent=0.1,
    ),
    sampling_config=SamplingConfig(
        temperature=0.7,
        samples=5,
    ),
    rag_config=RagChainConfig(
        thread_of_thought_enabled=True
    ),
)

retriever.add_documents(rule_docs)

query = "Can I play a Go Up a Level card during combat?"
result = complete_chain.invoke(query)

print("Question:")
print(result["question"])
print()
print()
print("Answer:")
print(result["answer"].content)

Question:
Can I play a Go Up a Level card during combat?


Answer:
Yes, you can play a Go Up a Level card during combat, but you must discard it once it is played.


## Evaluating

In [103]:
test_cases = [
    {
        "query": "If a monster does not pursue me because my level is too low, can I still loot the room?",
        "answer": "No, you cannot loot the room."
    },
    {
        "query": "Can I sell items from my hand to go up a level, assuming I can sell 1,000 gold pieces worth?",
        "answer": "Yes. You can sell items from your hand to go up a level."
    },
    {
        "query": "If a hireling is removed from play due to Bad Stuff, does the player retain any items the hireling was carrying?",
        "answer": "No. When a hireling is removed from play due to bad stuff, any items the hireling was carrying are also removed from play."
    },
    {
        "query": "Can I have multiple steeds equipped at the same time?",
        "answer": "No. You can only have one steed equipped at a time."
    },
    {
        "query": "Can I play a Go Up a Level card during combat on my turn?",
        "answer": "Yes. You can play a Go Up a Level card at any time."
    },
    {
        "query": "How many players can join me in a combat?",
        "answer": "Only one player can join you in combat."
    },
    {
        "query": "Does a player retain their princess card in play if they die?",
        "answer": "Yes. A player retains their princess card in play if they die."
    },
    {
        "query": "Can I carry multiple big items so long as only one is equipped?",
        "answer": "No. You can only carry one big item at a time."
    },
    {
        "query": "What is an item in play but not equipped called?",
        "answer": "An item in play but not equipped is called a carried item."
    },
    {
        "query": "If after breaking down the door I draw a steed face up, what are my options?",
        "answer": "You can put the steed into your hand, equip it, or treat as a monster and fight it."
    },
    {
        "query": "When can I play a Super Munchkin card?",
        "answer": "You can play a Super Munchkin card whenever it is legal to play a Class card."
    },
    {
        "query": "Can I play a Super Munchkin card without a class card?",
        "answer": "No, you must have a class card to attach it to."
    },
    {
        "query": "What cards can I trade with other players?",
        "answer": "You can trade any item cards in play (on the table) with other players."
    },
    {
        "query": "When can I play a Hireling?",
        "answer": "At any time."
    },
    {
        "query": "Can I play a Hireling card if I already have a Hireling in play?",
        "answer": "No. You can only have one Hireling in play at a time."
    },
    {
        "query": "When I loot the room, is the door card drawn face up or face down?",
        "answer": "The door card is drawn face down."
    },
    {
        "query": "Can I use a card to compel another player to help me in combat if winning that combat would give me the winning level?",
        "answer": "No. You cannot compel another player to get the winning level."
    },
    {
        "query": "Can I play a Go Up a Level card on another player?",
        "answer": "Yes."
    },
    {
        "query": "Can I play a Curse while in combat?",
        "answer": "Yes. A curse may be played at any time."
    },
    {
        "query": "How can I get rid of my Class card?",
        "answer": "You can discard your Class card at any time."
    },
    {
        "query": "When can I discard a Race card?",
        "answer": "You can discard your Race card at any time."
    }
]

In [104]:
from langchain.evaluation.qa.eval_chain import QAEvalChain, CotQAEvalChain
from langchain_core.output_parsers.string import StrOutputParser
import copy

eval_template = """You are a teacher grading a quiz.
You are given a question, the student's answer, and the true answer, and are \
asked to score the student answer as either CORRECT or INCORRECT.
Write out in a step by step manner your reasoning to be sure that your \
conclusion is correct. Avoid simply stating the correct answer at the outset.
As long as the student answer contains the true answer, you should conclude \
that the student answer is correct.
At the end, always output "GRADE: CORRECT" or "GRADE: INCORRECT" (without the \
quotes) to indicate your final conclusion on a line all by itself.

Example Format:
QUESTION: question here
STUDENT ANSWER: student's answer here
TRUE ANSWER: true answer here
EXPLANATION: step by step reasoning here
GRADE: CORRECT or INCORRECT here

Grade the student answers based ONLY on their factual accuracy with respect to \
the true answer. Ignore differences in punctuation and phrasing between the \
student answer and true answer. Begin! 

QUESTION: What should I do while driving and the light turns yellow?
STUDENT ANSWER: Slow down the vehicle in preparation to stop.
TRUE ANSWER: Slow down and prepare to stop.
EXPLANATION: The true answer says you should do two things: slow down and \
prepare to stop. The student answer covers both of these and is therefore \
correct.
GRADE: CORRECT

QUESTION: What should I do while stopped at a red light?
STUDENT ANSWER: Twiddle my thumbs contemplating the monotony of life while \
waiting for the light to maybe one day turn green.
TRUE ANSWER: Wait for the light to turn green.
EXPLANATION: The true answer mentions waiting for the light to turn green. \
The student answer also mentions waiting for the light to turn green. The \
student answer also mentions twiddling thumbs and contemplating the monotony \
of life which is not relevant and so will be ignored. Therefore, the student \
answer is correct.
GRADE: CORRECT

QUESTION: What should I do when the light turns green?
STUDENT ANSWER: Floor it.
TRUE ANSWER: Wait for the intersection to clear and then proceed.
EXPLANATION: The true answer states you should wait for the intersection to \
clear. The student answer does not mention anything about waiting for the \
intersection to clear. Therefore, the student answer is incorrect.
GRADE: INCORRECT

QUESTION: {query}
STUDENT ANSWER: {result}
TRUE ANSWER: {answer}
EXPLANATION:"""
eval_prompt = PromptTemplate(
    input_variables=["query", "result", "answer"], template=eval_template
)

def predict_answer(test_cases, chain):
    test_cases = copy.deepcopy(test_cases)
    for test_case in test_cases:
        query = test_case["query"]
        result = chain.invoke(query)
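        context = None
        # Chains that return a dict carry the retrieved context alongside the answer.
        if isinstance(result, dict):
            context = result.get("context")
            result = result.get("answer")
        # Unwrap chat messages; plain strings pass through unchanged.
        result = getattr(result, "content", result)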
        test_case["result"] = result
        if context is not None:
            test_case["context"] = context
    return test_cases


def grade(test_cases, llm):
    test_cases = copy.deepcopy(test_cases)
    eval_chain = QAEvalChain.from_llm(llm, prompt=eval_prompt)
    results = eval_chain.evaluate(examples=test_cases, predictions=test_cases)
    # Older versions of the eval chain appear to have parsed the grade for us;
    # the current one returns raw text. It is unclear whether that is a bug or
    # an intentional change, so we call the (private) parser ourselves.
    results = [eval_chain._prepare_output(result) for result in results]

    # Merge the test cases and results
    for test_case, result in zip(test_cases, results):
        test_case.update(result)

    return test_cases


def print_graded(graded):
    for test_case in graded:
        print("query:", test_case["query"])
        print("answer:", test_case["result"])
        print("reference:", test_case["answer"])
        print("reasoning:", test_case["reasoning"])
        print("score:", test_case["score"])
        if "expected_score" in test_case:
            print("expected_score:", test_case["expected_score"])
        print()


def get_overall_score(scores):
    return sum([score["score"] or 0 for score in scores]) / len(scores)
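
The eval prompt above asks the grader to finish with a `GRADE: CORRECT` or `GRADE: INCORRECT` line. As a rough illustration of how such text could be turned into the `score` that `get_overall_score` averages, here is a minimal sketch. `parse_grade` and `overall_score` are hypothetical helpers written for this note; this is not LangChain's `_prepare_output`, whose exact behavior may differ between versions.

```python
import re


def parse_grade(text):
    """Hypothetical helper: 1 for CORRECT, 0 for INCORRECT, None if no grade line is found."""
    # Use the last "GRADE: ..." occurrence, since the model reasons before it concludes.
    matches = re.findall(r"GRADE:\s*(CORRECT|INCORRECT)", text, flags=re.IGNORECASE)
    if not matches:
        return None
    return 1 if matches[-1].upper() == "CORRECT" else 0


def overall_score(scores):
    # Same convention as get_overall_score: an unparseable grade (None) counts as 0.
    return sum(s or 0 for s in scores) / len(scores)


sample_outputs = [
    "The student answer covers both points.\nGRADE: CORRECT",
    "The student answer misses the key point.\nGRADE: INCORRECT",
    "The model rambled and never produced a grade.",
]
parsed = [parse_grade(t) for t in sample_outputs]
print(parsed)
print(overall_score(parsed))
```

Counting a missing grade as 0 is a conservative choice: a grader that fails to reach a conclusion lowers the overall score rather than being silently dropped.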

In [105]:
example_test_cases = test_cases[:1]
example_test_cases = predict_answer(example_test_cases, complete_chain)
example_test_cases = grade(example_test_cases, chat_model.with_config({"callbacks": [ConsoleCallbackHandler()]}))
print_graded(example_test_cases)

[32;1m[1;3m[llm/start][0m [1m[1:llm:HuggingFaceChatModel] Entering LLM run with input:
[0m{
  "prompts": [
    "Human: You are a teacher grading a quiz.\nYou are given a question, the student's answer, and the true answer, and are asked to score the student answer as either CORRECT or INCORRECT.\nWrite out in a step by step manner your reasoning to be sure that your conclusion is correct. Avoid simply stating the correct answer at the outset.\nAs long as the student answer contains the true answer, you should conclude that the student answer is correct.\nAt the end, always output \"GRADE: CORRECT\" or \"GRADE: INCORRECT\" (without the quotes) to indicate your final conclusion on a line all by itself.\n\nExample Format:\nQUESTION: question here\nSTUDENT ANSWER: student's answer here\nTRUE ANSWER: true answer here\nEXPLANATION: step by step reasoning here\nGRADE: CORRECT or INCORRECT here\n\nGrade the student answers based ONLY on their factual accuracy with respect to the true answ

### Grade the Grader

We have a set of QA pairs that a human grader has already scored for
correctness. We run our LLM grader on the same pairs and measure how often it
agrees with the human grader.
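
To make the comparison concrete, the sketch below computes the agreement rate between the two graders and collects the disagreements. It assumes each graded case carries a `score` (from the LLM grader) and an `expected_score` (from the human grader), as in the cells of this section; the `agreement` helper and the sample data are made up for illustration.

```python
def agreement(graded):
    """Return (agreement_rate, disagreements) for a list of graded cases."""
    disagreements = [c for c in graded if c["score"] != c["expected_score"]]
    rate = 1 - len(disagreements) / len(graded)
    return rate, disagreements


# Made-up sample data, not taken from grader_test_cases.json
sample = [
    {"query": "q1", "score": 1, "expected_score": 1},
    {"query": "q2", "score": 0, "expected_score": 1},
    {"query": "q3", "score": 0, "expected_score": 0},
    {"query": "q4", "score": 1, "expected_score": 1},
]
rate, wrong = agreement(sample)
print(rate)                         # 0.75
print([c["query"] for c in wrong])  # ['q2']
```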

In [106]:
with open("grader_test_cases.json", "r") as f:
    grader_test_cases = json.load(f)
graded = grade(grader_test_cases, chat_model.bind(temperature=0, max_tokens=1000))
# Show only the cases where the LLM grader disagrees with the human grader
grader_got_wrong = [case for case in graded if case["score"] != case["expected_score"]]
print_graded(grader_got_wrong)
# Fraction of cases where the LLM grader agrees with the human grader
sum(case["score"] == case["expected_score"] for case in graded) / len(graded)

query: Can I have multiple steeds equipped at the same time?
answer: The player can have only one Steed equipped at a time, and they can treat a Steed as a monster and fight it to gain a Treasure and a level. The player must follow certain rules when it comes to carrying and equipping Items, and they cannot discard a Big item to play another.
reference: No. You can only have one steed equipped at a time.
reasoning: The true answer states that you can only have one steed equipped at a time. The student answer also mentions that you can treat a steed as a monster and fight it to gain a treasure and a level, which is not relevant and will be ignored. The student answer also mentions certain rules about carrying and equipping items, which is not relevant and will be ignored. Therefore, the student answer is incorrect.

GRADE: INCORRECT
score: 0
expected_score: 1

query: How many players can join me in a combat?
answer: Yes, in the game of Munchkin, players can ask for help in combat if the

0.8571428571428571

### Evaluate chain

In [107]:
if False:
    predictions = predict_answer(test_cases, rag_chain_with_context)
    with open("test_cases.json", "w") as f:
        f.write(json.dumps(predictions))
    graded = grade(predictions, chat_model)
    print_graded(graded)
    get_overall_score(graded)

with open("test_cases.json", "r") as f:
    predictions = json.loads(f.read())

graded = grade(predictions, chat_model.bind(temperature=0, max_tokens=1000))
print_graded(graded)
get_overall_score(graded)

query: If a monster does not pursue me because my level is too low, can I still loot the room?
answer: To loot the room in Munchkin, you must have defeated a monster in a combat and then choose to loot the room. If you did not draw a monster during your first turn, you have two choices: either look for trouble or loot the room. If you choose to loot the room, you draw a second card from the deck and place it in your hand. You may choose to play it immediately or wait until your next turn to play it. If you choose to loot the room and did not draw a monster during your first turn, you will not receive any levels or treasure for looting the room. You must have defeated a monster first in order to receive these rewards.
reference: No, you cannot loot the room.
reasoning: The true answer is that you cannot loot the room if a monster does not pursue you because your level is too low. The student answer is incorrect because it mentions that you can still loot the room if you did not draw a m

0.5714285714285714

## Optimize Parameters

In [25]:
retriever_config = RetrieverConfig(
    max_context_size=4096,
    percent_context_use=0.75,
    parent_percent=0.15,
    parent_overlap_percent=0.1,
    child_percent=0.25,
    child_overlap_percent=0.1,
)

sampling_config = SamplingConfig(
    temperature=0.7,
    # top_k=0,
    # top_p=None,
    samples=5,
)

rag_config = RagChainConfig(
    thread_of_thought_enabled=True,
)

retriever, rag_chain_with_context = build_complete_chain(
    chat_model=chat_model,
    tokenizer=tokenizer,
    vectorstore=db,
    retriever_config=retriever_config,
    sampling_config=sampling_config,
    rag_config=rag_config,
)
# print(rag_chain_with_context)
rag_chain = rag_chain_with_context | itemgetter("answer")

In [27]:
retriever.add_documents(rule_docs)

In [30]:
from langchain.callbacks.tracers import ConsoleCallbackHandler

# rag_chain = build_basic_rag_chain(
#     retriever=retriever,
#     chat_chain=chat_model,
#     prompt=rag_config.rag_prompt,
# )

rag_chain_trace = rag_chain.with_config({'callbacks': [ConsoleCallbackHandler()]})

rag_chain_trace.invoke("Can I play a Go Up a Level card during combat?")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "Can I play a Go Up a Level card during combat?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableParallel] Entering Chain run with input:
[0m{
  "input": "Can I play a Go Up a Level card during combat?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableParallel > 3:chain:RunnablePassthrough] Entering Chain run with input:
[0m{
  "input": "Can I play a Go Up a Level card during combat?"
}
[36;1m[1;3m[chain/end][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableParallel > 3:chain:RunnablePassthrough] [0ms] Exiting Chain run with output:
[0m{
  "output": "Can I play a Go Up a Level card during combat?"
}
[36;1m[1;3m[chain/end][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableParallel] [1ms] Exiting Chain run with output:
[0m{
  "question": "Can I play a Go Up a Level card during combat?"
}
[32;1m[1;3m[c

[36;1m[1;3m[chain/end][0m [1m[1:chain:RunnableSequence > 4:chain:thread-of-thought > 15:chain:RunnableAssign > 16:chain:RunnableParallel > 17:chain:RunnableSequence > 20:chain:chat-sampling] [39.64s] Exiting Chain run with output:
[0m{
  "output": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "AIMessage"
      ],
      "kwargs": {
        "content": "The question is about whether or not a \"Go Up a Level\" card can be played during combat in the game Munchkin.\n\nThe context starts with a passage from the rules of the game which lists actions that can be taken at any time and actions that can be taken on your own turn. The \"Go Up a Level\" card is listed as an action that can be taken on your own turn.\n\nThe passage then moves on to discuss disputes between cards and rules. It states that nothing can reduce a player below Level 1, but that a player can go up a level after combat only if they h

[36;1m[1;3m[llm/end][0m [1m[1:chain:RunnableSequence > 4:chain:thread-of-thought > 15:chain:RunnableAssign > 16:chain:RunnableParallel > 17:chain:RunnableSequence > 21:chain:RunnableBranch > 23:chain:RunnableEach > 24:chain:RunnableSequence > 26:llm:HuggingFaceChatModel] [8.73s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "No, a \"Go Up a Level\" card cannot be played during combat in the game Munchkin. According to the rules, a player can only go up a level after defeating a monster, and this must be done on their own turn. Additionally, the rules state that nothing can reduce a player below Level 1, so a player cannot use a \"Go Up a Level\" card to increase their level during combat.",
        "generation_info": null,
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
     

AIMessage(content='No, a "Go Up a Level" card cannot be played during combat in the game Munchkin. According to the rules, a player can only go up a level after defeating a monster, and this must be done on their own turn. Additionally, the rules state that nothing can reduce a player below Level 1, so a player cannot use a "Go Up a Level" card to increase their level during combat.')

In [41]:
chain = (
    RunnablePassthrough.assign(result=itemgetter("query") | rag_chain | StrOutputParser())
    | QAEvalChain.from_llm(chat_model)
) 

chain.with_config({"callbacks": [ConsoleCallbackHandler()]}).invoke(
    {
        "query": "If a monster does not pursue me because my level is too low, can I still loot the room?",
        "result": "No, if a monster does not pursue me because my level is too low, I cannot loot the room. According to the rules, a player can only go up a level after defeating a monster in combat. If the monster does not pursue me, I cannot defeat it and therefore cannot go up a level. Additionally, the rules state that a player cannot collect rewards for defeating a monster in the middle of a combat. If I do not defeat the monster, I cannot collect any rewards. Therefore, I cannot loot the room.",
        "answer": "No, you cannot loot the room.",
    }
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence] Entering Chain run with input:
[0m{
  "query": "If a monster does not pursue me because my level is too low, can I still loot the room?",
  "answer": "No, you cannot loot the room."
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign] Entering Chain run with input:
[0m{
  "query": "If a monster does not pursue me because my level is too low, can I still loot the room?",
  "answer": "No, you cannot loot the room."
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel] Entering Chain run with input:
[0m{
  "query": "If a monster does not pursue me because my level is too low, can I still loot the room?",
  "answer": "No, you cannot loot the room."
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel > 4:chain:RunnableSequence] Entering Chain run with input:
[0m{
  "q

[36;1m[1;3m[chain/end][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel > 4:chain:RunnableSequence > 8:chain:RunnableParallel > 10:chain:thread-of-thought > 18:chain:chat-sampling] [34.94s] Exiting Chain run with output:
[0m{
  "output": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "AIMessage"
      ],
      "kwargs": {
        "content": "Q: If a monster does not pursue me because my level is too low, can I still loot the room?\n\nStep 1: Understand the context\n\nThe passage is from a rulesheet for the card game \"Munchkin.\" In this game, players take on the role of adventurers who fight monsters to gain levels and treasure. The rules for combat and looting are outlined in the passage.\n\nStep 2: Analyze the question\n\nThe question asks if a player can still loot the room if a monster does not pursue them because their level is too low.\n\nStep 3: Summariz

[36;1m[1;3m[llm/end][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel > 4:chain:RunnableSequence > 8:chain:RunnableParallel > 10:chain:thread-of-thought > 19:chain:RunnableBranch > 21:chain:RunnableEach > 24:chain:RunnableSequence > 26:llm:HuggingFaceChatModel] [15.47s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "No, if a monster does not pursue me because my level is too low, I cannot loot the room. According to the rules, a player can only go up a level after defeating a monster in combat. If the monster does not pursue me, I cannot defeat it and therefore cannot go up a level. Additionally, the rules state that a player cannot collect rewards for defeating a monster in the middle of a combat. If I do not defeat the monster, I cannot collect any rewards. Therefore, I cannot loot the room.",
        "generation_info": null,
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
         

[36;1m[1;3m[llm/end][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel > 4:chain:RunnableSequence > 29:chain:universal-consistency > 30:chain:RunnableSequence > 43:chain:RunnableAssign > 44:chain:RunnableParallel > 45:chain:RunnableSequence > 47:llm:HuggingFaceChatModel] [476ms] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "The most consistent response is Response 2.",
        "generation_info": null,
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "The most consistent response is Response 2."
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RunnableSequence > 2:chain:RunnableAssign > 3:chain:RunnableParallel > 

{'query': 'If a monster does not pursue me because my level is too low, can I still loot the room?',
 'answer': 'No, you cannot loot the room.',
 'result': 'No, if a monster does not pursue me because my level is too low, I cannot loot the room. According to the rules, a player can only go up a level after defeating a monster in combat. If the monster does not pursue me, I cannot defeat it and therefore cannot go up a level. Additionally, the rules state that a player cannot collect rewards for defeating a monster in the middle of a combat. If I do not defeat the monster, I cannot collect any rewards. Therefore, I cannot loot the room.',
 'results': 'CORRECT.'}

In [43]:
m = (
    "As a content reviewer, I provide multiple retrieved passages about this question; you need to answer the question.\n\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n----\n\n---\nNAME: munchkin_rules/munchkin_rules-1.pdf\nPAGE: 2\nPASSAGE:\nWhen You May Take Actions\nYou may perform these actions at any time: & Discard a Class or Race.\n& Play a Go Up a Level or Hireling. Play a Curse.\nYou may perform these actions at any time, as long as you are not in combat: Trade an Item with another player (the other player may not be in combat, either).\ny Change which Items you have equipped.\nPlay a card that you have just received (some cards may be played even during combat; see above).\nYou may perform these actions on your own turn: 3 Play a new Class or Race card (at any time).\nwe Sell Items for levels (except when you are in combat). Play an Item (most Items cannot be played during combat, but some one-shot Items can; see p- 3).\n---\n\n\n---\nNAME: munchkin_rules/munchkin_rules-1.pdf\nPAGE: 1\nPASSAGE:\nConflicts Between Cards and Rules\nThis rulesheet gives the general tules. Many cards add special rules, so in most cases when the rulesheet disagrees with a card, follow the card. However, ignore any card effect that might seem to contradict one of the rules listed below unless the card explicitly says it supersedes that rule!\niL Nothing can reduce a player below Level 1, although card effects might reduce a player's or a monster's combat strength (p. 3) below I.\n2. You go up a level after combat only if you Ail a monster.\n3. You cannot collect rewards for defeating a monster (eg., Treasure, levels) in the middle of a combat. You must finish the fight before gaining any rewards.\n4. You must killa monster to reach Level 10, and you cannot force another player to help you do it.\nAny other disputes should be settled by loud arguments, with the owner of the game having the last word. "
    "You could also read the Munchkin FAQ and errata pages at munchkin.game, or start a discussion at forums.sjgames.com/, munchkin . . . unless it’s more fun to argue.\n\nSTEVE JACKSON GAMES\nYour Hand: Cards in your hand are not in play. They don’t help you, but they can’t be taken away except by cards that specifically affect “your hand.” At the end of your turn, you may have no more than five cards in your hand (see Charity, p- 2).\nCards in play may not be returned to your hand - they must be discarded or traded if you want to get rid of them.\n---\n\n----\n\nQ: Can I play a Go Up a Level card during combat?\nWalk me through this context in manageable parts step by step, summarizing and analyzing as we go.\n"
)
print(m)
print()
print()
print(chat_model.invoke(m).content)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


As a content reviewer, I provide multiple retrieved passages about this question; you need to answer the question.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

----

---
NAME: munchkin_rules/munchkin_rules-1.pdf
PAGE: 2
PASSAGE:
When You May Take Actions
You may perform these actions at any time: & Discard a Class or Race.
& Play a Go Up a Level or Hireling. Play a Curse.
You may perform these actions at any time, as long as you are not in combat: Trade an Item with another player (the other player may not be in combat, either).
y Change which Items you have equipped.
Play a card that you have just received (some cards may be played even during combat; see above).
You may perform these actions on your own turn: 3 Play a new Class or Race card (at any time).
we Sell Items for levels (except when you are in combat). Play an Item (most Items cannot be played during combat, but some one-shot Items can; see p- 3).
---


---
NAME: munchkin_rul

In [70]:
# # Using regular LLM interface
# from langchain.llms import VLLMOpenAI

# llm = VLLMOpenAI(
#     openai_api_key="EMPTY",
#     openai_api_base="http://localhost:8000/v1",
#     temperature=0.1,
#     # model_kwargs=dict(repetition_penalty=1.1),
#     max_tokens=2_000,
#     model_name=model_name,
#     frequency_penalty=0.2,
# )
# print(llm("[INST] Generate 10 names for a fantasy elf Paladin. [/INST] "))

1. Galadriel
2. Elrond
3. Legolas
4. Arwen
5. Thranduil
6. Faramir
7. Eärendil
8. Lúthien
9. Glorfindel
10. Celebrindor


In [13]:
# To parse the PDFs, there are three strategies available: "fast", "hi_res", and
# "ocr_only". For the PDFs used here, "fast" retrieves a bunch of duplicate text
# in the wrong order. "hi_res" doesn't handle columns of text well and produces
# incoherent results. "ocr_only" seems to work reasonably well in this case.
rule_docs = []
for filename in data_path.glob("*.pdf"):
    print(f"Processing {filename}")
    loader = UnstructuredPDFLoader(filename, strategy="ocr_only")
    rule_docs.extend(loader.load())

Processing munchkin_rules/munchkin_rules-1.pdf


Processing munchkin_rules/puppies-rules.pdf
Processing munchkin_rules/princesses_rules.pdf
Processing munchkin_rules/munch_4_rules_20thp.pdf


In [14]:
# Chunk text
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
chunked_documents = text_splitter.split_documents(rule_docs)
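
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sketch using fixed character windows. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to split on separators such as paragraph and line breaks); `sliding_chunks` is a hypothetical helper that only illustrates the overlap between consecutive chunks.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

In this simplified model, the notebook's settings (`chunk_size = 1000`, `chunk_overlap = 200`) would advance each window by 800 characters, so a sentence that straddles a boundary still appears whole in at least one chunk if it is shorter than the overlap.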

In [17]:
!nvidia-smi

Fri Dec  1 23:51:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0 Off |                  N/A |
|  0%   47C    P8    23W / 420W |    139MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [18]:
# import faiss
# Load chunked documents into the FAISS index
db = FAISS.from_documents(
    chunked_documents, 
    embedding_model
)

In [29]:
!nvidia-smi

Fri Dec  1 23:57:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0 Off |                  N/A |
|  0%   49C    P8    21W / 420W |  22475MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces


In [57]:
prompt_template = """[INST] 
Instruction: You are an assistant to help answer questions about board game rules. Answer questions concisely and in one or two sentences. Rely upon the following passages from the rulebook when answering questions.

{context}

QUESTION:
{question} 
[/INST]"""

# text_generation_pipeline = transformers.pipeline(
#     model=model,
#     tokenizer=tokenizer,
#     task="text-generation",
#     # temperature=0.2,
#     # repetition_penalty=1.1,
#     # return_dict_in_generate=True,
#     # output_scores=True,
#     return_full_text=True,
#     max_new_tokens=1000,
# )
# text_generation_pipeline.model.config.pad_token_id = text_generation_pipeline.model.config.eos_token_id

# mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Create prompt from prompt template
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

# Create llm chain 
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Create retriever
# Note: k is hard-coded to 3 here; the retrieve_topk setting is not used
retriever = db.as_retriever(search_kwargs={"k": 3})

def format_docs(docs):
    passages = []
    for i, doc in enumerate(docs):
        passages.append(f"Passage {i+1}: {doc.page_content}")
    return "\n\n".join(passages)

# Create rag chain: retrieve and format the passages, pass the question through unchanged
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_chain
)
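
The cell above numbers the retrieved passages and drops them into the `{context}` slot of the prompt. The sketch below reproduces just that formatting step without LangChain, so the resulting prompt text can be inspected offline. `StubDoc` and `short_template` are hypothetical stand-ins (a real run uses LangChain documents and the full `prompt_template`), and plain `str.format` is used here in place of `PromptTemplate`.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    # Hypothetical stand-in for a retrieved document; only page_content is used.
    page_content: str


def format_docs(docs):
    # Same logic as the notebook cell: number each passage, separate with blank lines.
    passages = [f"Passage {i + 1}: {doc.page_content}" for i, doc in enumerate(docs)]
    return "\n\n".join(passages)


# Shortened, hypothetical template with the same two slots as prompt_template.
short_template = "[INST]\n{context}\n\nQUESTION:\n{question}\n[/INST]"

docs = [
    StubDoc("Monsters win ties."),
    StubDoc("You may have no more than five cards in your hand."),
]
context = format_docs(docs)
filled = short_template.format(context=context, question="Who wins a tie?")
print(filled)
```

Printing the filled prompt like this is a cheap way to check passage numbering and spacing before spending GPU time on generation.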

In [None]:
import langchain
langchain.debug = False

In [58]:
for i, test_case in enumerate(test_cases):
    # # Hack to avoid a warning about how horribly inefficient our use of the GPU is
    # text_generation_pipeline.call_count = 0

    result = rag_chain.invoke(test_case["question"])

    print("input:", test_case["question"].strip())
    print("reference:", test_case["answer"].strip())
    print("prediction:", result["text"].strip())

    if i < len(test_cases) - 1:
        print()
        print()

input: If a monster does not pursue me because my level is too low, can I still loot the room?
reference: No. If a monster does not pursue you, it means you automatically run away from it. However, you still count as having been in combat and cannot loot the room.
prediction: No, if a monster does not pursue you because your level is too low, you cannot loot the room.


input: Can I sell items from my hand to go up a level, assuming I can sell 1,000 gold pieces worth?
reference: Yes. You can sell items from your hand to go up a level.
prediction: No, you cannot sell items from your hand to go up a level. You can only sell items worth a total of at least 1,000 Gold Pieces and immediately go up one level.


input: If a hireling is removed from play due to Bad Stuff, does the player retain any items the hireling was carrying?
reference: No. When a hireling is removed from play due to bad stuff, any items the hireling was carrying are also removed from play.
prediction: If a Hireling is re
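
The reference answers above open with "Yes" or "No", so a rough automatic check is to compare only that leading verdict between reference and prediction. The sketch below is one possible way to do it; `leading_verdict` and `verdicts_agree` are hypothetical helper names, and the check ignores everything after the first word, so it only catches outright polarity mismatches.

```python
import re


def leading_verdict(text):
    """Return 'yes' or 'no' if the answer starts with that word, else None."""
    match = re.match(r"\s*(yes|no)\b", text, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


def verdicts_agree(reference, prediction):
    """True/False when both answers carry a verdict, None when either lacks one."""
    ref = leading_verdict(reference)
    pred = leading_verdict(prediction)
    if ref is None or pred is None:
        return None
    return ref == pred


# The three cases printed above (texts shortened to their openings).
first = verdicts_agree("No. If a monster does not pursue you,", "No, if a monster does not pursue you")
second = verdicts_agree("Yes. You can sell items from your hand", "No, you cannot sell items from your hand")
third = verdicts_agree("No. When a hireling is removed from play", "If a Hireling is re")
print(first, second, third)
```

Predictions that do not start with a verdict are reported as `None` rather than counted as wrong, so they can be reviewed by hand.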

In [60]:
def ask(question):
    """Run the RAG chain on one question, then print the retrieved context and the answer."""
    result = rag_chain.invoke(question)
    print("CONTEXT")
    print(result["context"])
    # for i, doc in enumerate(result["context"]):
    #     print(f"Document {i}:")
    #     print(f"{doc}")
    #     print("\n")
    print()
    print("ANSWER")
    print(result["text"].strip())

In [61]:
ask("Can I play a Go Up a Level card during combat on my turn?")

CONTEXT
Passage 1: Hireling may be played at any time, on any turn. You cannot give a Hireling an Item to carry while you are in combat, however.

COMBAT

To fight a monster, compare its combat strength to yours. Combat strength is the total of Level plus all modifiers - positive or negative - given by Items and other cards. If the monster's combat strength is equal to yours, or greater, you lose the combat and must Run Away (see p. 5). If your combat strength totals more than the monster's — note that monsters win ties! — you kill it and goupa level (two levels for some big monsters). You'll also get the number of Treasures shown on its card.

Sometimes a card will let you get rid of the monster without killing it. This is still “winning,” but you don't get a level. Unless the ability says otherwise, you don’t get the Treasures, either. If the last monster is removed from a combat, it ends instantly.

Some monster cards have special powers that affect combat

Passage 2: killing a mons

In [77]:
# Let's try Thread of Thought (ThoT) prompting, with the retrieved passages pasted in by hand
prompt_template = """[INST] As a content reviewer, I provide multiple passages about this question; you need to answer the question.
Passage 1: 

Hireling may be played at any time, on any turn. You cannot give a Hireling an Item to carry while you are in combat, however.

COMBAT

To fight a monster, compare its combat strength to yours. Combat strength is the total of Level plus all modifiers - positive or negative - given by Items and other cards. If the monster's combat strength is equal to yours, or greater, you lose the combat and must Run Away (see p. 5). If your combat strength totals more than the monster's — note that monsters win ties! — you kill it and goupa level (two levels for some big monsters). You'll also get the number of Treasures shown on its card.

Sometimes a card will let you get rid of the monster without killing it. This is still “winning,” but you don't get a level. Unless the ability says otherwise, you don’t get the Treasures, either. If the last monster is removed from a combat, it ends instantly.

Some monster cards have special powers that affect combat

Passage 2: 

killing a monster, unless a card specifically allows you to win another way.

When You May Take Actions

You may perform these actions at any time: & Discard a Class or Race.

& Play a Go Up a Level or Hireling. Play a Curse.

You may perform these actions at any time, as long as you are not in combat:

Trade an Item with another player (the other player may not be in combat, either).

y Change which Items you have equipped.

Play a card that you have just received (some cards may be

played even during combat; see above).

You may perform these actions on your own turn:

3 Play a new Class or Race card (at any time).

we Sell Items for levels (except when you are in combat). Play an Item (most Items cannot be played during combat, but some one-shot Items can; see p- 3).

TURN PHASES

Your turn begins as soon as the previous player's turn ends. When your cards are arranged the way you want, go to phase iL

(1) Kick Open The Door: Draw one card from the Door deck and turn it face up.

Passage 3: 

Conflicts Between Cards and Rules

This rulesheet gives the general tules. Many cards add special rules, so in most cases when the rulesheet disagrees with a card, follow the card. However, ignore any card effect that might seem to contradict one of the rules listed below unless the card explicitly says it supersedes that rule!

iL Nothing can reduce a player below Level 1, although card effects might reduce a player's or a monster's combat strength (p. 3) below I.

2. You go up a level after combat only if you Ail a monster.

3. You cannot collect rewards for defeating a monster (eg., Treasure, levels) in the middle of a combat. You must finish the fight before gaining any rewards.

4. You must killa monster to reach Level 10, and you cannot force another player to help you do it.

Question: Can I play a Go Up a Level card during combat on my turn?
Walk me through this context in manageable parts step by step,
summarizing and analyzing as we go.
Answer:
"""

# Create prompt from prompt template
prompt = PromptTemplate(
    input_variables=[],
    template=prompt_template,
)

# Create llm chain 
llm_chain = LLMChain(llm=llm, prompt=prompt)

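# Create retriever (not used by the hard-coded prompt above)
# Note: k is hard-coded to 3; retrieve_topk from the configuration cell is not used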
retriever = db.as_retriever(search_kwargs={"k": 3})

def format_docs(docs):
    passages = []
    for i, doc in enumerate(docs):
        passages.append(f"Passage {i+1}: {doc.page_content}")
    return "\n\n".join(passages)

# Create rag chain (disabled: the prompt above takes no input variables)
# rag_chain = ( 
#  {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | llm_chain
# )
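
The Thread of Thought prompt above is pasted together by hand for a single question. A minimal sketch of building the same layout from any list of passages is shown below. `build_thot_prompt` and its `close_tag` parameter are hypothetical: the default leaves the prompt ending at `Answer:`, and passing `"[/INST]"` closes the instruction the way the first prompt template in this notebook does.

```python
THOT_HEADER = (
    "[INST] As a content reviewer, I provide multiple passages about this "
    "question; you need to answer the question."
)
THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step,\n"
    "summarizing and analyzing as we go."
)


def build_thot_prompt(passages, question, close_tag=""):
    # Number each passage and put its text on its own paragraph.
    blocks = [f"Passage {i + 1}: \n\n{text.strip()}" for i, text in enumerate(passages)]
    return (
        f"{THOT_HEADER}\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}\n{THOT_TRIGGER}\nAnswer:\n{close_tag}"
    )


example = build_thot_prompt(
    ["Monsters win ties.", "You may play a Go Up a Level card at any time."],
    "Can I play a Go Up a Level card during combat on my turn?",
)
print(example)
```

If the built string is later wrapped in a `PromptTemplate`, any literal curly braces in the passages would probably need escaping first, since the template treats them as variable slots.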

In [78]:
thot_chain = llm_chain
print(thot_chain.invoke({})["text"])


Passage 1:

* Hireling can be played at any time, on any turn.
* Cannot give a Hireling an Item to carry while in combat.

Passage 2:

* You may perform these actions at any time: Discard a Class or Race, Play a Go Up a Level or Hireling, Play a Curse.
* You may perform these actions on your own turn: Play a new Class or Race card, Sell Items for levels, Play an Item.

Passage 3:

* Conflicts Between Cards and Rules.
* Nothing can reduce a player below Level 1.
* You go up a level after combat only if you kill a monster.
* You cannot collect rewards for defeating a monster in the middle of a combat.
* You must kill a monster to reach Level 10 and cannot force another player to help you do it.

Answer: No, you cannot play a Go Up a Level card during combat on your turn because the rules state that you can only go up a level after combat if you kill a monster. Additionally, the rules state that you cannot collect rewards for defeating a monster in the middle of a combat, so you would ne
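
The configuration cell sets `consistency_samples = 5`, but nothing in this part of the notebook uses it. One way it could be applied is self-consistency: sample several completions for the same prompt and take a majority vote on the final yes/no verdict. The sketch below assumes a hypothetical zero-argument `generate` callable that returns one sampled completion per call; sampling would have to be enabled on the model, otherwise every sample is identical. Canned strings stand in for model output here.

```python
import re
from collections import Counter


def extract_verdict(text):
    # Look at whatever follows the last "Answer:" marker, or the whole text if absent.
    tail = text.rsplit("Answer:", 1)[-1]
    match = re.match(r"\s*(yes|no)\b", tail, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


def majority_verdict(generate, n_samples=5):
    # Draw n_samples completions and count the verdicts that could be parsed.
    verdicts = [extract_verdict(generate()) for _ in range(n_samples)]
    counts = Counter(v for v in verdicts if v is not None)
    if not counts:
        return None, counts
    return counts.most_common(1)[0][0], counts


# Canned completions (hypothetical, not real model output).
canned = iter([
    "Passage 1:\n* ...\nAnswer: No, you cannot.",
    "Answer: Yes, it may be played at any time.",
    "Answer: Yes.",
    "Passage 2:\n* ...\nAnswer: Yes, at any time.",
    "It depends on the card.",
])
winner, counts = majority_verdict(lambda: next(canned), n_samples=5)
print(winner, dict(counts))
```

Completions without a parsable verdict are dropped from the vote, and a tie falls to whichever verdict was seen first.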

In [None]:
ask("What is the card limit for how many cards can be in my hand?")

CONTEXT
Any other disputes should be settled by loud arguments, with the owner of the game having the last word. You could also read the Munchkin FAQ and errata pages at munchkin.game, or start a discussion at forums.sjgames.com/, munchkin . . . unless it’s more fun to argue.

STEVE JACKSON GAMES

Your Hand: Cards in your hand are not in play. They don’t help you, but they can’t be taken away except by cards that specifically affect “your hand.” At the end of your turn, you may have no more than five cards in your hand (see Charity, p- 2).

Cards in play may not be returned to your hand - they must be discarded or traded if you want to get rid of them.

CHARACTER CREATION

Everyone starts as a Level | human with no class. (Heh, heh.) Munchkin characters may be either male or female. Your character's sex is the same as your own at the start of the game, unless you declare otherwise.

Anyone can carry any Item (except for extra Big items; see below), but you may equip only one Headgear, 