<a href="https://colab.research.google.com/github/polinak1r/AI-Powered-Question-Answering-for-Munchkin-Game/blob/main/AI_Powered_Question_Answering_for_Munchkin_Game.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook Outline: AI-Powered Question Answering for Munchkin Game

1. **Imports, Model Setup, and Prompt Functions**  
   We begin by installing and importing the necessary libraries and loading a causal language model (`microsoft/Phi-3-mini-128k-instruct`) along with its tokenizer. We define a `prompt_maker` function to construct the question-and-options prompt and a `score_options` function to compute logits for each option.

2. **Kaggle Setup, Context Truncation, and Corpus Assembly**  
   We configure Kaggle credentials to download the rules (`munchkin_rules.md`) and other files. A `truncate_tokens` function enforces `MAX_LEN = 256` to limit the context size. We then read and sentence-split the Munchkin rules, scrape and clean the official FAQ, and combine both sets of texts into a single corpus for retrieval.

3. **TF-IDF Retrieval Construction**  
   We build a TF-IDF index by tokenizing all corpus documents and computing term frequencies. A sparse TF matrix is multiplied by the IDF values, enabling quick lookups of the most relevant texts for a given query.

4. **Answering Questions with Context**  
   We load the training dataset (multiple-choice Q&A), retrieve the top-k TF-IDF matches for each question, truncate the context if needed, and pass everything to our language model via a prompt. We select the best-scoring answer based on the final token’s logits.

5. **Evaluation and Submission**  
   We compute accuracy on the training set and then apply the same pipeline to the test set, which does not have labels. Finally, we save the predicted answers (one per question) as a CSV file for submission.


In [10]:
from collections import Counter, defaultdict
import json
import re

from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import requests
import scipy
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from tokenizers import Tokenizer
from tqdm import tqdm
from spacy.lang.en import English

In [None]:
device = torch.device('cuda:0')

In [None]:
model_name = 'microsoft/Phi-3-mini-128k-instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             trust_remote_code=True,
                                             torch_dtype=torch.float16,
                                             device_map=device)


In [None]:
def prompt_maker(question: str,
                 contexts: list[str],
                 options: list[str]
                ) -> tuple[str, list[str]]:
    prompt = 'You answering questions about a bord game given some contexts from the game rules.\n'
    prompt += f'The question: "{question}"\n\nTo answer this question, consider the following contexts:\n'
    for context in contexts:
        prompt += f'- {context}\n'
    prompt += '\nYou have the following options:\n'
    options_markers = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if len(options) > len(options_markers):
        raise ValueError('Too many options for the prompt')
    used_option_markers = []
    for option, option_marker in zip(options, options_markers):
        prompt += f'{option_marker}. {option}\n'
        used_option_markers.append(option_marker)
    used_options_str = ', '.join(used_option_markers)
    prompt += f'\nWhat option is more likely to be correct: {used_options_str}?'
    prompt += '\nThe most likely option is: '
    return prompt, used_option_markers

In [None]:
prompt, choices = prompt_maker('What level should I be to win?',
                               ['Munchkin is a card game', 'You can win by reaching level 10'],
                               ['Level 8', 'Level 10'])
print(prompt)

You answering questions about a bord game given some contexts from the game rules.
The question: "What level should I be to win?"

To answer this question, consider the following contexts:
- Munchkin is a card game
- You can win by reaching level 10

You have the following options:
A. Level 8
B. Level 10

What option is more likely to be correct: A, B?
The most likely option is: 


This prompt format lets us use the logits of the next token for classification, since the next token is expected to be a single letter pointing to the correct option.
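As a toy illustration of this idea, the sketch below restricts next-token logits to the option letters and picks the largest one. The vocabulary and logit values are made up (they are not output of the Phi-3 model), and `toy_vocab`, `next_token_logits`, and `pick_option` are hypothetical names used only here.

```python
import numpy as np

# Hypothetical toy vocabulary: token string -> token id.
toy_vocab = {'A': 0, 'B': 1, 'C': 2, 'the': 3, 'Level': 4}

# Made-up logits for the token that would follow the prompt.
next_token_logits = np.array([1.2, 3.4, -0.5, 2.9, 0.1])


def pick_option(logits: np.ndarray, option_markers: list[str]) -> tuple[str, np.ndarray]:
    """Return the best option letter and a softmax over the option logits only."""
    option_ids = [toy_vocab[m] for m in option_markers]
    option_logits = logits[option_ids]
    # Softmax restricted to the option tokens; subtract the max for stability.
    exp = np.exp(option_logits - option_logits.max())
    probs = exp / exp.sum()
    return option_markers[int(option_logits.argmax())], probs


best, probs = pick_option(next_token_logits, ['A', 'B'])
print(best, probs.round(3))
```

Note that the token `the` has a higher logit than `A` in this toy example. Restricting the comparison to the option letters means such tokens cannot affect the decision.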

In [None]:
@torch.inference_mode()
def score_options(prompt: str,
                  options: list[str],
                  model: torch.nn.Module,
                  tokenizer: Tokenizer,
                  device) -> torch.Tensor:
    """Return the next-token logit of each option marker (raw logits, not log-probabilities)."""
    token_ids = tokenizer(prompt, return_tensors='pt')['input_ids']
    outs = model(token_ids.to(device))
    opt_tokens = []
    for opt in options:
        opt_token_ids = tokenizer(opt, add_special_tokens=False)['input_ids']
        assert len(opt_token_ids) == 1, 'Only single-token options are supported'
        opt_tokens.append(opt_token_ids[0])
    # Logits at the last prompt position, restricted to the option tokens.
    option_logits = outs.logits[0, -1, opt_tokens].cpu()
    return option_logits

In [None]:
choices_scores = score_options(prompt, choices, model, tokenizer, device)
choice_idx = choices_scores.argmax().item()
choice = choices[choice_idx]
print(prompt + choice)

You answering questions about a bord game given some contexts from the game rules.
The question: "What level should I be to win?"

To answer this question, consider the following contexts:
- Munchkin is a card game
- You can win by reaching level 10

You have the following options:
A. Level 8
B. Level 10

What option is more likely to be correct: A, B?
The most likely option is: B


Although current models support contexts of thousands of tokens, we cap the number of context tokens. This makes the task more challenging, but it also brings it closer to real-world applications. Every context we use must go through the truncation procedure. The `MAX_LEN` variable must not be changed.

In [None]:
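MAX_LEN = 256

def truncate_tokens(contexts: list[str]) -> list[str]:
    """Keep contexts in order until the MAX_LEN token budget is spent."""
    total_tokens = 0
    truncated_contexts = []
    for ctx in contexts:
        ctx_tokens = tokenizer(ctx, add_special_tokens=False)['input_ids']
        total_tokens += len(ctx_tokens)
        if total_tokens > MAX_LEN:
            # Cut the overflowing context to the remaining budget and drop the rest.
            overflow = total_tokens - MAX_LEN
            ctx = tokenizer.decode(ctx_tokens[:len(ctx_tokens) - overflow])
            return truncated_contexts + ([ctx] if ctx else [])
        else:
            truncated_contexts.append(ctx)
    return truncated_contexts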
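
The intended budgeting behaviour (earlier contexts kept whole, the overflowing one cut, the rest dropped) can be exercised without loading the model. The sketch below is a stand-in that assumes whitespace-separated words as tokens instead of the Phi-3 tokenizer; `truncate_to_budget` and `max_len` are illustrative names, not part of the notebook.

```python
def truncate_to_budget(contexts: list[str], max_len: int) -> list[str]:
    """Keep contexts in order until the token budget is spent.

    Assumption: tokens are whitespace-separated words, a stand-in for a
    real subword tokenizer.
    """
    kept = []
    remaining = max_len
    for ctx in contexts:
        tokens = ctx.split()
        if len(tokens) <= remaining:
            kept.append(ctx)
            remaining -= len(tokens)
        else:
            # Cut the overflowing context to the remaining budget and stop.
            if remaining > 0:
                kept.append(' '.join(tokens[:remaining]))
            break
    return kept


result = truncate_to_budget(['a b c', 'd e f g', 'h i'], max_len=5)
print(result)
```

With a real subword tokenizer the counts differ from word counts, so the numbers here only illustrate the control flow.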

In [None]:
truncate_tokens(['something ' * 32, 'other ' * 256, 'another ' * 32])

['something something something something something something something something something something something something something something something something something something something something something something something something something something something something something something something something ',
 'other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other other o

# Assemble the Corpus for Retrieval

In [1]:
!pip install -q kaggle

In [2]:
import json

with open("kaggle.json", "r") as f:
    creds = json.load(f)

In [3]:
import os

os.environ["KAGGLE_USERNAME"] = creds["username"]
os.environ["KAGGLE_KEY"] = creds["key"]

In [4]:
import os
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [6]:
!kaggle competitions download -c nlp-nup-2024-hw2 -f munchkin_rules.md

Downloading munchkin_rules.md to /content
  0% 0.00/25.7k [00:00<?, ?B/s]
100% 25.7k/25.7k [00:00<00:00, 36.7MB/s]


In [7]:
with open('munchkin_rules.md') as fp:
    rules = fp.read()

In [8]:
print(rules[:1000])

Munchkin brings you the essence of the dungeon-crawling experience... without all that messy roleplaying!
This game includes 168 cards, one six-sided die, and these rules. Three to six can play. You will need 10 tokens (coins, poker chips, whatever – or any gadget that counts to 10) for each player.

# Setup
Divide the cards into the Door deck and the Treasure deck. Shuffle both decks. Deal four cards from each deck to each player.

# Card Management
Keep separate face-up discard piles for the two decks. You may not look through the discards unless you play a card that allows you to! When a deck runs out, reshuffle its discards. In Play: These are the cards on the table in front of you, showing your Race and Class (if any) and the Items you are carrying. Continuing Curses and some other cards also stay on the table after you play them. Cards in play are public information and must be visible to the other players.
Your Hand: Cards in your hand are not in play. They don’t help you, but t

In [11]:
nlp = English()
nlp.add_pipe("sentencizer")

<spacy.pipeline.sentencizer.Sentencizer at 0x7c1e3f7c9dd0>

In [12]:
rules_texts = [s.text.strip() for s in nlp(rules).sents]
rules_texts[:20]

['Munchkin brings you the essence of the dungeon-crawling experience... without all that messy roleplaying!',
 'This game includes 168 cards, one six-sided die, and these rules.',
 'Three to six can play.',
 'You will need 10 tokens (coins, poker chips, whatever – or any gadget that counts to 10) for each player.',
 '# Setup\nDivide the cards into the Door deck and the Treasure deck.',
 'Shuffle both decks.',
 'Deal four cards from each deck to each player.',
 '# Card Management\nKeep separate face-up discard piles for the two decks.',
 'You may not look through the discards unless you play a card that allows you to!',
 'When a deck runs out, reshuffle its discards.',
 'In Play: These are the cards on the table in front of you, showing your Race and Class (if any) and the Items you are carrying.',
 'Continuing Curses and some other cards also stay on the table after you play them.',
 'Cards in play are public information and must be visible to the other players.',
 'Your Hand: Cards in

In [13]:
faq_link = 'https://munchkin.game/gameplay/faq/'
response = requests.get(faq_link)
soup = BeautifulSoup(response.content, "html.parser")
faq_document = soup.find('div', id='main').get_text()

In [15]:
faq_document = re.sub('\n\n+', '\n\n', faq_document)
print(faq_document[:1000])



FAQ

Frequently Asked Questions for all Munchkin Games, Supplements, and Accessories
Updated April 20, 2020
If you can't find your answer here, check the errata, to make sure we didn't make a mistake.
If you want to check the FAQ for a specific game, you can head straight to the Table of Contents.
General Notes
Every Munchkin set puts its own spin on the game, but they do have many rules in common. This first section deals with some of those.
All answers in this FAQ refer to editions of Munchkin published in 2010 or later. If you have a game published before then, look at the 2010 Munchkin Change Log for a detailed list of changes, set by set.
June 2018 update: We are no longer supporting the Epic Munchkin rules, so all answers here (outside of the Epic Munchkin section itself) assume that you are playing a regular game ending at Level 10. See that section for more details.
July 2018 update: We have made two global rule updates to all Munchkin games, and these updates will affect som

In [16]:
faq_texts = faq_document.split('\n')
faq_texts = [text for text in faq_texts if text.strip()]

In [None]:
# faq_texts[170:180]

In [None]:
# faq_texts

In [None]:
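faq_texts_clear = []
for i in tqdm(range(len(faq_texts))):
    sent = faq_texts[i]
    # Indices 44 and 143 are hard-coded for the FAQ page layout at scrape time:
    # keep the first 44 lines as they are and skip lines 44-143.
    if i < 44:
        faq_texts_clear.append(sent)
    # The bounds check avoids an IndexError when the last line starts with 'Q.'.
    if i > 143 and i + 1 < len(faq_texts):
        # Merge each 'Q.' line with the 'A.' line that follows it.
        if sent[:2] == 'Q.' and faq_texts[i+1][:2] == 'A.':
            faq_texts_clear.append(sent[2:] + " " + faq_texts[i+1][2:])

print(len(faq_texts), len(faq_texts_clear))
faq_texts = faq_texts_clear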

100%|██████████| 1439/1439 [00:00<00:00, 568377.76it/s]

1439 585
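
To make the pairing step concrete, here is a minimal sketch on made-up lines that mimic the FAQ layout (they are not real FAQ text). The helper name `pair_questions_with_answers` is hypothetical, and unlike the cell above it also strips the whitespace left after removing the `Q.` and `A.` prefixes.

```python
def pair_questions_with_answers(lines: list[str]) -> list[str]:
    """Join each 'Q.' line with the 'A.' line that directly follows it."""
    pairs = []
    for current, following in zip(lines, lines[1:]):
        if current.startswith('Q.') and following.startswith('A.'):
            pairs.append(current[2:].strip() + ' ' + following[2:].strip())
    return pairs


# Made-up lines that mimic the FAQ layout; not real FAQ text.
toy_lines = [
    'Section title',
    'Q. Can I do X?',
    'A. Yes, you can.',
    'Q. A question with no answer line',
    'Q. Can I do Y?',
    'A. No.',
]
pairs = pair_questions_with_answers(toy_lines)
print(pairs)
```

Lines that are neither part of a question-answer pair nor followed by an answer, such as the section title and the unanswered question, are dropped.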





Our corpus for retrieval will be just a concatenation of `faq_texts` and `rules_texts`. We will search over it given our question, retrieve the most relevant texts and pass them to the model as context to answer the question.

In [None]:
corpus = faq_texts + rules_texts
len(corpus)

888

# TF-IDF Retrieval

In this section we build a simple TF-IDF retrieval method.
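
For reference, the scoring built in this section amounts to `idf(t) = log(N / df(t))` and `score(q, d) = sum over t of [tf(t, d) * idf(t)] * [tf(t, q) * idf(t)]`, where `N` is the number of texts and `df(t)` is the number of texts containing term `t`. The sketch below works through this on a made-up three-document corpus with dense NumPy arrays and whitespace tokenization; the notebook itself uses the model tokenizer and a sparse matrix, and names such as `toy_corpus`, `idf_toy`, and `score` are illustrative only.

```python
import numpy as np

# Toy corpus with whitespace tokenization (the notebook uses the model tokenizer).
toy_corpus = ['win at level ten', 'draw a door card', 'level up after a win']
vocab = sorted({tok for doc in toy_corpus for tok in doc.split()})
term_to_col = {tok: col for col, tok in enumerate(vocab)}

# Term-frequency matrix: one row per document, one column per term.
tf = np.zeros((len(toy_corpus), len(vocab)))
for row, doc in enumerate(toy_corpus):
    for tok in doc.split():
        tf[row, term_to_col[tok]] += 1

# idf(t) = log(N / df(t)), where df(t) counts documents containing t.
idf_toy = np.log(len(toy_corpus) / (tf > 0).sum(axis=0))
tfidf = tf * idf_toy


def score(query: str) -> np.ndarray:
    """Dot product of each document's TF-IDF row with the query's TF-IDF vector."""
    q = np.zeros(len(vocab))
    for tok in query.split():
        if tok in term_to_col:  # unseen terms are ignored
            q[term_to_col[tok]] += 1
    return tfidf @ (q * idf_toy)


scores = score('reach level ten')
print(scores.round(3))
```

The rare term `ten` contributes more than the common term `level`, so the first document ranks highest, and a document sharing no terms with the query scores zero.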

In [None]:
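# index[term][text_num] -> number of times the term occurs in that text
index = defaultdict(Counter)
# term_counts[term] -> total occurrences of the term in the corpus
term_counts = Counter()

for text_num, text in enumerate(corpus):
    tokens = tokenizer.tokenize(text.lower())
    for tok in tokens:
        index[tok][text_num] += 1
        term_counts[tok] += 1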


In [None]:
num_texts = len(corpus)
num_terms = len(index)

term_to_idx = {t: n for n, (t, _) in enumerate(term_counts.most_common())}

df_counts = []
df_doc_idxs = []
df_term_idxs = []
for term, text_counts in index.items():
    for text_num, count in text_counts.items():
        df_counts.append(count)
        df_doc_idxs.append(text_num)
        df_term_idxs.append(term_to_idx[term])

In [None]:
tf_mat = scipy.sparse.coo_array(
    (df_counts, (df_doc_idxs, df_term_idxs)),
    shape=(num_texts, num_terms)
).tocsr()

idf = np.log(num_texts / (tf_mat > 0).sum(0)).reshape(1, -1)
tfidf_mat = tf_mat * idf


In [None]:
def get_query_tf_vec(query, tokenizer, num_terms):
    q_vec = np.zeros(num_terms, dtype=int)
    toks = tokenizer.tokenize(query.lower())
    for tok in toks:
        if tok in term_to_idx:
            q_vec[term_to_idx[tok]] += 1
    return q_vec

def tfidf_dot_product_for_query(q, tfidf_mat, idf, tokenizer):
    num_terms = tfidf_mat.shape[1]
    query_vec = get_query_tf_vec(q, tokenizer, num_terms)
    return tfidf_mat @ (query_vec * idf.reshape(-1))


def get_top_k_texts_for_query(q, k, corpus, tfidf_mat, idf, tokenizer):
    if k == 0:
        return []
    top_k_indices = np.argsort(tfidf_dot_product_for_query(q, tfidf_mat, idf, tokenizer))[-k:]
    return [corpus[i] for i in top_k_indices]


top_k = 5
contexts = get_top_k_texts_for_query('What level should I be to win?', top_k, corpus, tfidf_mat, idf, tokenizer)
contexts

[" Can I play Duck of Earl if I am Level 1 and can't lose a level? What if I'm Level 9 and can't gain a level unless I kill a monster? And what if I'm using a special trick die so I'm guaranteed to roll a 6?  You can still play the card in all these circumstances, but you don't lose/gain the level. Normally, using a trick die would be very much against the spirit of the rules, but the text on this card leaves that door wide open. Good . . . luck?",
 'You cannot use a card or ability to force someone to help you if the combat is for the win. If you have forced someone to help and then the combat becomes one for the win, your helper is kicked out of the fight without penalty and you must fight alone (or ask someone to help – good luck with that). On the other hand, if you force someone to help and they would win the game, they get to stay in . . . be careful whose help you accept!',
 ' The Dungeon of Extra Effort says that to win the game we must make it to Level 11, but that Level 10 an

And the truncated version is:

In [None]:
# truncate_tokens(contexts)
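
One detail is worth keeping in mind when combining retrieval with truncation: `np.argsort` sorts in ascending order, so `[-k:]` returns the k best indices with the best match last. Truncation processes contexts in order and drops whatever comes after the budget is exhausted, so a long earlier context can use up the budget before the best match is reached. The sketch below uses made-up scores to show the ascending order and a best-first alternative; the reversed order is a possible variant, not what the cells above do, and `toy_scores` and `toy_k` are illustrative names.

```python
import numpy as np

# Made-up relevance scores for five documents.
toy_scores = np.array([0.1, 2.5, 0.0, 1.7, 0.9])
toy_k = 3

ascending_top_k = np.argsort(toy_scores)[-toy_k:]  # best match is last
best_first_top_k = ascending_top_k[::-1]           # best match is first

print(ascending_top_k.tolist(), best_first_top_k.tolist())
```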

# Dataset

The code below loads the training dataset and computes the accuracy of our solution.

In [17]:
!kaggle competitions download -c nlp-nup-2024-hw2 -f train.csv

Downloading train.csv to /content
  0% 0.00/172k [00:00<?, ?B/s]
100% 172k/172k [00:00<00:00, 46.9MB/s]


In [18]:
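import ast

ds_train = pd.read_csv('train.csv')
# The options column stores Python list literals; literal_eval parses them without executing code.
ds_train['options'] = ds_train['options'].map(ast.literal_eval)
ds_train = ds_train.to_dict('records')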

In [19]:
ds_train[:2]

[{'id': 0,
  'question': 'What does the instruction "Gotta sing it!" on Make Bacon Pancakes refer to?',
  'options': ['The recipe of making bacon pancakes',
   'The tune of a popular song',
   'The rules of the game',
   'The name of the monster'],
  'answer': 1},
 {'id': 1,
  'question': 'When is it appropriate to utilize the Buried Treasure ability of the Pirate?',
  'options': ['After defeating a monster in combat.',
   'When you have not encountered a monster during your turn.',
   'When you are Running Away from a monster.',
   'During the Loot The Room phase, after Looking For Trouble.'],
  'answer': 1}]

In [None]:
k = 3
correct = 0
total = len(ds_train)
mean_context_size = 0
max_context_size = 0
for sample in tqdm(ds_train):
    question = sample['question']
    options = sample['options']
    contexts = get_top_k_texts_for_query(question, k, corpus, tfidf_mat, idf, tokenizer)
    context_size = sum(len(tokenizer.tokenize(ctx)) for ctx in contexts)
    mean_context_size += context_size / len(ds_train)
    max_context_size = max(max_context_size, context_size)
    contexts = truncate_tokens(contexts)
    prompt, choices = prompt_maker(question,
                                   contexts,
                                   options)


    choices_scores = score_options(prompt, choices, model, tokenizer, device)
    choice_idx = choices_scores.argmax().item()
    choice = choices[choice_idx]
    correct += choice_idx == sample['answer']
print(f'Top-k: {k}\nAccuracy: {correct / total:.3f}\n')

100%|██████████| 537/537 [02:04<00:00,  4.30it/s]

Top-k: 3
Accuracy: 0.702






0.693

# Test Set and Submission

The test set differs from the training set only in that it does not provide labels.

In [None]:
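import ast

ds_test = pd.read_csv('test.csv')
# The options column stores Python list literals; literal_eval parses them without executing code.
ds_test['options'] = ds_test['options'].map(ast.literal_eval)
ds_test = ds_test.to_dict('records')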

In [None]:
# ds_test[:2]

In [None]:
k = 3
submission = []
for sample in tqdm(ds_test):
    question = sample['question']
    options = sample['options']
    id_sample = sample['id']
    contexts = get_top_k_texts_for_query(question, k, corpus, tfidf_mat, idf, tokenizer)
    contexts = truncate_tokens(contexts)
    prompt, choices = prompt_maker(question,
                                   contexts,
                                   options)

    choices_scores = score_options(prompt, choices, model, tokenizer, device)
    choice_idx = choices_scores.argmax().item()
    submission.append({'id': id_sample, 'answer': choice_idx})



100%|██████████| 538/538 [02:06<00:00,  4.24it/s]


In [None]:
df = pd.DataFrame(submission)
df.to_csv('submission.csv', index=False)
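
Before uploading, a quick sanity check on the rows can catch duplicate ids or out-of-range answers. The sketch below uses made-up rows and only the standard library. The helper `check_submission` is hypothetical, and it assumes that answers are zero-based option indices, as the `answer` column of the training set suggests.

```python
import csv
import io


def check_submission(rows: list[dict], num_options: list[int]) -> None:
    """Raise ValueError if ids repeat or an answer index is out of range."""
    ids = [row['id'] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError('duplicate ids in submission')
    for row, n in zip(rows, num_options):
        if not 0 <= row['answer'] < n:
            raise ValueError(f"answer out of range for id {row['id']}")


# Made-up predictions for two questions with four options each.
toy_rows = [{'id': 0, 'answer': 1}, {'id': 1, 'answer': 3}]
check_submission(toy_rows, num_options=[4, 4])

# Write the rows in the same two-column layout as submission.csv.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['id', 'answer'], lineterminator='\n')
writer.writeheader()
writer.writerows(toy_rows)
csv_text = buffer.getvalue()
print(csv_text)
```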