MTG Judge RAG
---------------------------------------------------
This is a simple Python script for building an AI-powered MTG rules assistant
using Retrieval-Augmented Generation (RAG) with OpenAI + FAISS.

- Loads the Comprehensive Rules from a text file.
- Splits rules into chunks.
- Creates embeddings with OpenAI.
- Stores them in ChromaDB for fast search (not using FAISS due to py versioning)
- Lets you ask questions, retrieves relevant rules, and asks the LLM to answer.

In [117]:
# -------- IMPORTS --------
import os
import re
import json
import chromadb

from openai import OpenAI
# client = OpenAI(api_key=OPENAI_API_KEY)   # don’t pass api_key explicitly

from dotenv import load_dotenv
import ast # to convert from string to dict


In [95]:
# -------- CONFIG --------
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
EMBED_MODEL = "text-embedding-3-large"
CHAT_MODEL = "gpt-4o-mini"
CHROMA_DB_DIR = "./chroma_db"
os.makedirs(CHROMA_DB_DIR, exist_ok=True) # to create folder if it doesn't exist
RULES_FILE = "./comprehensive-rules.txt"
CARDS_FILE = "./clean-standard-cards.json"
CHUNK_SIZE = 700 # words approximation
TOP_K = 6

In [96]:
# -------- INITIALIZATION --------
client = OpenAI(api_key=OPENAI_API_KEY)
load_dotenv()

True

In [97]:
# -------- HELPER LOAD RULES --------
def load_rules(path):
    """Load the MTG comprehensive rules from a text file."""
    if not os.path.exists(path):
        print(f"Rules file not found at {path}")
        return []

    docs = []
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Rules usually like: 603.1. Some text
            match = re.match(r"^(\d{1,3}(?:\.\d+)+)\s+(.*)$", line)
            if match:
                rule_id, body = match.groups()
                docs.append({
                    "id": f"CR:{rule_id}",
                    "text": f"{rule_id} {body}",
                    "rule_id": rule_id,
                    "source": "Comprehensive Rules"
                })
    return docs

In [99]:
# -------- HELPER LOAD CARDS --------
def load_cards(path):
    """Load MTG card data from your JSON export."""
    if not os.path.exists(path):
        print(f"Card file not found at {path}")
        return []

    with open(path, "r", encoding="utf-8") as f:
        cards = json.load(f)

    docs = []
    for c in cards:
        # Skip cards without names or text
        if "name" not in c or not c.get("originalText"):
            continue

        # Construct a searchable text block for embedding
        text_parts = [
            f"Name: {c['name']}",
            f"Mana Cost: {c.get('manaCost', '')}",
            f"Types: {' '.join(c.get('types', []))}",
            f"Subtypes: {' '.join(c.get('subtypes', []))}",
            f"Abilities/Keywords: {', '.join(c.get('keywords', []))}",
            f"Text: {c['originalText']}"
        ]

        # Add rulings (big chunk but useful)
        rulings = c.get("rulings", [])
        if rulings:
            rulings_text = " | ".join(r["text"] for r in rulings if "text" in r)
            text_parts.append(f"Rulings: {rulings_text}")

        full_text = "\n".join(text_parts)

        docs.append({
            "id": f"CARD:{c['uuid']}",   # use UUID for uniqueness
            "text": full_text,
            "source": "Card Database",
            "card_name": c["name"],
            "manaCost": c.get("manaCost", ""),
            "types": ", ".join(c.get("types", [])),       # FIXED: stringify list
            "subtypes": ", ".join(c.get("subtypes", [])), # FIXED: stringify list
            "keywords": ", ".join(c.get("keywords", [])), # FIXED: stringify list
            "rarity": c.get("rarity", "")
        })

    print(f"Loaded {len(docs)} cards from {path}")
    return docs


In [100]:
# -------- HELPER CHUNK TEXT --------
def chunk_text(text, chunk_size=CHUNK_SIZE):
    """Split text into smaller chunks so embeddings don't get too big."""
    sentences = re.split(r'(?<=[.!?]) +', text)
    chunks = []
    current = []
    length = 0

    for s in sentences:
        tokens = len(s.split())
        if length + tokens > chunk_size:
            chunks.append(" ".join(current))
            current = [s]
            length = tokens
        else:
            current.append(s)
            length += tokens
    if current:
        chunks.append(" ".join(current))

    return chunks

In [101]:
# -------- HELPER BUILD INDEX --------
def build_index():
    """Create ChromaDB collection from rules + card data."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    print("Loading rules...")
    rules = load_rules(RULES_FILE)

    print("Loading cards...")
    cards = load_cards(CARDS_FILE)  # add this

    all_docs = rules + cards  # merge datasets

    texts, metas, ids = [], [], []

    for d in all_docs:
        chunks = chunk_text(d["text"])
        for i, ch in enumerate(chunks):
            texts.append(ch)
            metas.append(d)
            ids.append(f"{d['id']}_{i}")

    if not texts:
        raise ValueError("No valid chunks found to embed.")

    print(f"Total chunks: {len(texts)}")

    # Create embeddings
    embeddings = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vecs = [d.embedding for d in embeddings.data]

    # Initialize Chroma client
    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_DIR)

    # Drop old collection (clean rebuild)
    try:
        chroma_client.delete_collection("mtg_data")
    except:
        pass

    collection = chroma_client.get_or_create_collection(name="mtg_data")

    # Add to Chroma
    collection.add(
        ids=ids,
        embeddings=vecs,
        documents=texts,
        metadatas=metas
    )

    print("Index built and saved with ChromaDB!")




In [102]:
# -------- HELPER SEARCH INDEX --------
def search_index(query, top_k=TOP_K):
    """Search ChromaDB for relevant rule chunks."""
    query = query.strip()
    if not query:
        raise ValueError("Empty query provided.")

    # client = OpenAI()
    emb = client.embeddings.create(model=EMBED_MODEL, input=[query])
    vec = emb.data[0].embedding

    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_DIR)
    collection = chroma_client.get_or_create_collection(name="mtg_rules")

    results = collection.query(query_embeddings=[vec], n_results=top_k)

    docs = []
    for i, doc in enumerate(results["documents"][0]):
        docs.append({
            "text": doc,
            "meta": results["metadatas"][0][i]
        })
    return docs

In [103]:
# -------- HELPER GENERATE SUBQUERIES --------
def generate_subqueries(query, n=10):
    """Chain of Thought decomposition function. Use the LLM to break a user query into smaller sub-questions."""
    #client = OpenAI(api_key=OPENAI_API_KEY)
    prompt = f"""
    Break down the following Magic: The Gathering rules question into {n} smaller, 
    more specific sub-questions that cover timing, abilities, rules interactions, 
    and possible edge cases. Return them as a numbered list.

    Original Question: {query}
    """
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        temperature=0.2,
        messages=[
            {"role": "system", "content": "You are an expert MTG judge assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    text = resp.choices[0].message.content
    subqueries = [line.strip("0123456789. ") for line in text.splitlines() if line.strip()]
    return subqueries

In [142]:
# -------- HELPER JSON PARSE --------
def safe_json_parse(text):
    """Function to safely parse from string to json. Converts response from gpt to json"""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Try to "repair" common JSON issues
        fixed = text.strip()
        if fixed.startswith("```"):
            fixed = fixed.split("```")[1]  # strip markdown fences
        try:
            return json.loads(fixed)
        except:
            return {"error": "Failed to parse JSON", "raw": text}


In [None]:
# -------- HELPER ANSWER WITH SUBQUERIES --------
def answer_with_subqueries(query):
    """Break question into subqueries, search index for each, and generate final answer."""
    # Step 1: Get subqueries
    subqueries = generate_subqueries(query, n=10) #* 10 subqueries

    # Step 2: Collect context from all subqueries
    all_context = []
    for sq in subqueries:
        results = search_index(sq, top_k=5)  # use your existing search_index
        for i, r in enumerate(results, 1):
            all_context.append(f"Subquery: {sq}\n- Source: {r['meta'].get('source', '')}\n- Text: {r['text']}")

    context = "\n\n".join(all_context)

    response_format = """
    Please provide a structured answer as a single json file with the following properties:
        
    - "question": a string with a rephrased version of the user question in the most clear form,
    - "single_word_answer": a string with a single word, either: "Yes", "No" or "Unclear" that defines the response of the user query
    - "short answer": short paragraph with a summary of the answer
    - "full_explanation": a string with the detailed reasoning with rules and card interactions,
    - "sources: a string" with that cites the full text of rules used for the response and also the card text used for the reasoning.

    No other text should be provided, just the json structure.

    IMPORTANT. Asume that the user might have unclear, missing or conflicting information. If that happends feel free to let the user know. You don't need to force a yes or no answer. You can say that you are unclear and need a query that is more clear. Even in this case, use the above provided structure.
    """

    # Step 3: Ask LLM for final structured response
    system_prompt = f"""
    You are an expert Magic: The Gathering judge assistant.
    A user has a question about card interactions or rules.

    Use these sources (rules + card texts):
    {context}

    Answer format:
    {response_format}
    """

    user_prompt = f"""
    The question from the user is:
    {query}
    """
    
    client1 = OpenAI(api_key=OPENAI_API_KEY)
    resp = client1.chat.completions.create(
        model=CHAT_MODEL,
        temperature=0, # lowering temperature for more accurate and consistant response according to rules
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    #* SECONDARY JUDGE VERIFICATION
    judge2_system_prompt = f"""
    You are an expert Magic: The Gathering high judge.
    You will be given:

    - A user’s question
    - A judge’s ruling on that question
    - All the context provided by the judge

    Your task is to carefully review and analyze the information. Then, determine whether the context adequately supports the judge’s ruling. Based on your analysis, you may either accept or deny the ruling.

    Your response should follow this format:

    - If you agree with the ruling, respond: Accepted
    - If you disagree with the ruling, respond: Denied, [include context explaining the reason for denial, also add recommendations for more context to extract from the rules]

    The original user question is:
    {query}
    """

    judge_prompt = f"""
    The context brought by the initial ruling judge was:
    {context}

    The final response from the initial judge was:
    {resp.choices[0].message.content}
    """

    client2 = OpenAI(api_key=OPENAI_API_KEY) # new model initialization for new judge
    resp2 = client2.chat.completions.create(
        model=CHAT_MODEL,
        temperature=0, # lowering temperature for more accurate and consistant response according to rules
        messages=[
            {"role": "system", "content": judge2_system_prompt},
            {"role": "user", "content": judge_prompt}
        ]
    )

    judge2_response = resp2.choices[0].message.content

    if judge2_response.strip().startswith("Accepted"):
        return ast.literal_eval(resp.choices[0].message.content) # converts string to dict
    
    #* if judge 2 denied the ruling, ask judge 1 to look for more information and check ruling again
    new_subqueries = generate_subqueries(judge2_response, n=10) #* 10 subqueries

    # Step 2: Collect context from all subqueries
    all_context = []
    for sq in new_subqueries:
        results = search_index(sq, top_k=5)  # use your existing search_index
        for i, r in enumerate(results, 1):
            all_context.append(f"Subquery: {sq}\n- Source: {r['meta'].get('source', '')}\n- Text: {r['text']}")

    context = context + "\n\n".join(all_context)

    new_prompt = f"""
    A judge just ruled your response invalid due to lack of context. Please use the provided context for a better understanding of the rule and come up with a better response.

    secondary judge response:
    {judge2_response}

    new context provided:
    {context}

    use the same structe as requested in the system prompt
    """

    resp3 = client1.chat.completions.create(
        model=CHAT_MODEL,
        temperature=0, # lowering temperature for more accurate and consistant response according to rules
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": resp.choices[0].message.content},
            {"role": "user", "content": new_prompt}
        ]
    )

    return ast.literal_eval(resp3.choices[0].message.content)



In [None]:
        # return {
        #     "single_word_answer": "Denied",
        #     "question": query,
        #     "full_explanation": f"""
        #         We’re sorry, our virtual judges were unable to reach an agreement on a final response to your question.

        #         Reason: The original judge’s ruling was: {judge2_response}

        #         Please try asking your question again, this time with clearer wording to help us provide a more definitive answer.
        #     """
        # }

In [105]:
# -------- BUILDING INDEX --------
build_index()  # only first time

Loading rules...
Loading cards...
Loaded 90 cards from ./clean-standard-cards.json
Total chunks: 91
Index built and saved with ChromaDB!


# Single testing

In [128]:
single_question = "Can there be infinite or multiple cleanup steps triggered by effects like Kozilek plus discard effects?"
response = answer_with_subqueries(single_question)
print(response)

{
    "question": "Can there be infinite or multiple cleanup steps triggered by effects like Kozilek plus discard effects?",
    "single_word_answer": "No",
    "short_answer": "There cannot be infinite or multiple cleanup steps triggered by effects like Kozilek's ability. The cleanup step is a specific phase in the turn structure that occurs once, and while triggered abilities can occur during this step, they do not create additional cleanup steps.",
    "full_explanation": "In Magic: The Gathering, the cleanup step is a defined part of the turn structure that occurs after the end step. During the cleanup step, players discard down to their maximum hand size, and any abilities that trigger at this time resolve. However, the cleanup step itself does not repeat or create additional cleanup steps. Kozilek, Butcher of Truth has a triggered ability that allows you to draw cards when you discard, but this does not lead to an infinite loop or multiple cleanup steps. The rules state that the 

# Multiple testing

In [140]:
with open("easy-questions.json", "r", encoding="utf-8") as f:
    easy_questions = json.load(f)
with open("hard-questions.json", "r", encoding="utf-8") as f:
    hard_questions = json.load(f)
with open("own-questions.json", "r", encoding="utf-8") as f:
    own_questions = json.load(f)

# all_questions = easy_questions + hard_questions + own_questions
all_questions = own_questions

correct_answers_count = 0

judge_ruling_conflict_count = 0
judge_ruling_conflict_questions = []

unclear_questions_count = 0
unclear_questions = []

wrong_answered_questions = []

for question in all_questions:
    # response = answer_question(question["text"])
    response_dict = answer_with_subqueries(question["text"])
    print(response_dict)
    # response = answer_with_subqueries(question.text)
    if response_dict["single_word_answer"] == question["answer"]:
        correct_answers_count += 1
    elif response_dict["single_word_answer"] == "Denied":
        judge_ruling_conflict_count += 1
        judge_ruling_conflict_questions.append({"question": question, "response_dict": response_dict})
    elif response_dict["single_word_answer"] == "Unclear":
        unclear_questions_count += 1
        unclear_questions.append({"question": question, "response_dict": response_dict})
    else:
        wrong_answered_questions.append({"question": question, "response_dict": response_dict})

print(f"correct answers: {correct_answers_count}/{len(all_questions)}")

print(f"judge ruling conflict: {judge_ruling_conflict_count}/{len(all_questions)}")
for question in judge_ruling_conflict_questions:
    print(question)

print(f"judge ruling conflict: {unclear_questions_count}/{len(all_questions)}")
for question in unclear_questions:
    print(question)

print(f"incorrect answers: {len(all_questions)-correct_answers_count-judge_ruling_conflict_count-unclear_questions_count}/{len(all_questions)}")
for question in wrong_answered_questions:
    print(question)


{
    "question": "Can imprinting Time Walk on Panoptic Mirror lead to infinite turns?",
    "single_word_answer": "Unclear",
    "short_answer": "While imprinting Time Walk on Panoptic Mirror allows you to take an extra turn, achieving infinite turns requires additional interactions or cards that enable multiple activations of Panoptic Mirror within the same turn or during extra turns.",
    "full_explanation": "Imprinting Time Walk on Panoptic Mirror allows you to activate its ability to cast Time Walk, which grants you an extra turn. However, you can only activate Panoptic Mirror's ability once per turn. To achieve infinite turns, you would need to find a way to activate Panoptic Mirror multiple times in a single turn or during the extra turns gained from Time Walk. This could involve using other cards that allow you to untap or reuse Panoptic Mirror, or cards that grant additional actions or turns. Without such interactions, you cannot achieve infinite turns solely with Panoptic Mi

TypeError: string indices must be integers, not 'str'

# Results



## Test: No temperature set and no subqueries.

### Correct answers: 30/45

## Test: Temperature set to 0.1 and no subqueries.

### Correct answers: 35/45

## Test: Temperature set to 0.1 and 5 subqueries.

### Correct answers: 37/45

#### Incorrect answer to:
- Does scry let you look at cards and choose to put some on top or bottom of your library?
- If you draw more than seven cards, can you keep them all if no other effect limits your hand size?
- Can there be infinite or multiple cleanup steps triggered by effects like Kozilek plus discard effects?
- If a creature phases out and back in, does it lose summoning sickness if it had it before?
- Are special actions like playing a land or turning a face-down creature face-up things opponents cannot respond to?
- If I imprint Time Walk on Panoptic Mirror, do I get infinite turns?
- I am being attacked by Axebane Ferox and I declare Aegis Turtle as a blocker. But before assigning damage, I play Bounce Off on Aegis Turtle. Do I still receive 4 damage from Agonasaur Rex?
- Someone is playing Flashfreeze on one of my spells, Can I play Aven Interrupter on top of Flashfreeze so my initial spell can be resolved?

## Test: Temperature set to 0 and 10 subqueries.

- 8 minutes for 45 queries (10 seconds per query)

### Correct answers: 39/45

#### Incorrect answer to:
- Does scry let you look at cards and choose to put some on top or bottom of your library?. Right answer: Yes. Response: No
- Can there be infinite or multiple cleanup steps triggered by effects like Kozilek plus discard effects?. Right answer: Yes. Response: No
- Are continuous effects applied in a specific layered system, such as type-changing, ability additions, P/T changes, in numbered order?. Right answer: Yes. Response: No
- Are special actions like playing a land or turning a face-down creature face-up things opponents cannot respond to?. Right answer: Yes. Response: No
- If I imprint Time Walk on Panoptic Mirror, do I get infinite turns?. Right answer: Yes. Response: No
- Someone is playing Flashfreeze on one of my spells, Can I play Aven Interrupter on top of Flashfreeze so my initial spell can be resolved?. Right answer: Yes. Response: No

## Test: Temperature set to 0, 10 subqueries, top_k set to 5 (previously 3) and added secondary judge. 

- 8 minutes for 45 queries (10 seconds per query)

### Correct answers: 39/45