MTG Judge RAG
---------------------------------------------------
This is a simple Python script for building an AI-powered MTG rules assistant
using Retrieval-Augmented Generation (RAG) with OpenAI + FAISS.

- Loads the Comprehensive Rules from a text file.
- Splits rules into chunks.
- Creates embeddings with OpenAI.
- Stores them in ChromaDB for fast search (not using FAISS due to py versioning)
- Lets you ask questions, retrieves relevant rules, and asks the LLM to answer.

In [19]:
# -------- IMPORTS --------
import os
import re
import json
import chromadb

from openai import OpenAI
client = OpenAI()   # don’t pass api_key explicitly

from dotenv import load_dotenv
load_dotenv()

True

In [20]:
# -------- CONFIG --------
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
EMBED_MODEL = "text-embedding-3-large"
CHAT_MODEL = "gpt-4o-mini"
CHROMA_DB_DIR = "./chroma_db"
os.makedirs(CHROMA_DB_DIR, exist_ok=True) # to create folder if it doesn't exist
RULES_FILE = "./comprehensive-rules.txt"
CARDS_FILE = "./clean-standard-cards.json"
CHUNK_SIZE = 700 # words approximation
TOP_K = 6

In [21]:
# -------- HELPER LOAD RULES --------
def load_rules(path):
    """Load the MTG comprehensive rules from a text file."""
    if not os.path.exists(path):
        print(f"Rules file not found at {path}")
        return []

    docs = []
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Rules usually like: 603.1. Some text
            match = re.match(r"^(\d{1,3}(?:\.\d+)+)\s+(.*)$", line)
            if match:
                rule_id, body = match.groups()
                docs.append({
                    "id": f"CR:{rule_id}",
                    "text": f"{rule_id} {body}",
                    "rule_id": rule_id,
                    "source": "Comprehensive Rules"
                })
    return docs

In [22]:
# -------- HELPER LOAD CARDS --------
def load_cards(path):
    """Load MTG card data from your JSON export."""
    if not os.path.exists(path):
        print(f"Card file not found at {path}")
        return []

    import json
    with open(path, "r", encoding="utf-8") as f:
        cards = json.load(f)

    docs = []
    for c in cards:
        # Skip cards without names or text
        if "name" not in c or not c.get("originalText"):
            continue

        # Construct a searchable text block for embedding
        text_parts = [
            f"Name: {c['name']}",
            f"Mana Cost: {c.get('manaCost', '')}",
            f"Types: {' '.join(c.get('types', []))}",
            f"Subtypes: {' '.join(c.get('subtypes', []))}",
            f"Abilities/Keywords: {', '.join(c.get('keywords', []))}",
            f"Text: {c['originalText']}"
        ]

        # Add rulings (big chunk but useful)
        rulings = c.get("rulings", [])
        if rulings:
            rulings_text = " | ".join(r["text"] for r in rulings if "text" in r)
            text_parts.append(f"Rulings: {rulings_text}")

        full_text = "\n".join(text_parts)

        docs.append({
            "id": f"CARD:{c['uuid']}",   # use UUID for uniqueness
            "text": full_text,
            "source": "Card Database",
            "card_name": c["name"],
            "manaCost": c.get("manaCost", ""),
            "types": ", ".join(c.get("types", [])),       # FIXED: stringify list
            "subtypes": ", ".join(c.get("subtypes", [])), # FIXED: stringify list
            "keywords": ", ".join(c.get("keywords", [])), # FIXED: stringify list
            "rarity": c.get("rarity", "")
        })

    print(f"Loaded {len(docs)} cards from {path}")
    return docs


In [23]:
# -------- HELPER CHUNK TEXT --------
def chunk_text(text, chunk_size=CHUNK_SIZE):
    """Split text into smaller chunks so embeddings don't get too big."""
    sentences = re.split(r'(?<=[.!?]) +', text)
    chunks = []
    current = []
    length = 0

    for s in sentences:
        tokens = len(s.split())
        if length + tokens > chunk_size:
            chunks.append(" ".join(current))
            current = [s]
            length = tokens
        else:
            current.append(s)
            length += tokens
    if current:
        chunks.append(" ".join(current))

    return chunks

In [24]:
# -------- HELPER BUILD INDEX --------
def build_index():
    """Create ChromaDB collection from rules + card data."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    print("Loading rules...")
    rules = load_rules(RULES_FILE)

    print("Loading cards...")
    cards = load_cards(CARDS_FILE)  # add this

    all_docs = rules + cards  # merge datasets

    texts, metas, ids = [], [], []

    for d in all_docs:
        chunks = chunk_text(d["text"])
        for i, ch in enumerate(chunks):
            texts.append(ch)
            metas.append(d)
            ids.append(f"{d['id']}_{i}")

    if not texts:
        raise ValueError("No valid chunks found to embed.")

    print(f"Total chunks: {len(texts)}")

    # Create embeddings
    embeddings = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vecs = [d.embedding for d in embeddings.data]

    # Initialize Chroma client
    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_DIR)

    # Drop old collection (clean rebuild)
    try:
        chroma_client.delete_collection("mtg_data")
    except:
        pass

    collection = chroma_client.get_or_create_collection(name="mtg_data")

    # Add to Chroma
    collection.add(
        ids=ids,
        embeddings=vecs,
        documents=texts,
        metadatas=metas
    )

    print("Index built and saved with ChromaDB!")




In [25]:
# -------- HELPER SEARCH INDEX --------
def search_index(query, top_k=TOP_K):
    """Search ChromaDB for relevant rule chunks."""
    query = query.strip()
    if not query:
        raise ValueError("Empty query provided.")

    client = OpenAI()
    emb = client.embeddings.create(model=EMBED_MODEL, input=[query])
    vec = emb.data[0].embedding

    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_DIR)
    collection = chroma_client.get_or_create_collection(name="mtg_rules")

    results = collection.query(query_embeddings=[vec], n_results=top_k)

    docs = []
    for i, doc in enumerate(results["documents"][0]):
        docs.append({
            "text": doc,
            "meta": results["metadatas"][0][i]
        })
    return docs

In [34]:
# -------- HELPER ANSWER QUESTION --------
def answer_question(query):
    """Retrieve context and ask the LLM for an MTG answer (TLDR + Explanation + Sources)."""
    results = search_index(query)
    context_blocks = []

    for i, r in enumerate(results, 1):
        text = r["text"]
        meta = r["meta"]
        source = meta.get("source", "Unknown")
        context_blocks.append(f"[{i}] ({source}) {text}")

    context = "\n\n".join(context_blocks)

    user_prompt = f"""
You are an expert Magic: The Gathering judge assistant.
A user has a question about card interactions or rules.

Question:
{query}

Use these sources (rules + card texts):
{context}

Answer format:
1. **Question** - A short rephrasing of the user question.
1. **Short Answer** – a very short and direct answer (1 sentense if possible).
2. **Explanation** – detailed reasoning about interactions, order of operations, triggers and if other situations would change the outcome.
3. **Sources** – cite specific rule numbers or card names from the context.
"""

    client = OpenAI(api_key=OPENAI_API_KEY)
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful MTG rules assistant. Always explain clearly."},
            {"role": "user", "content": user_prompt}
        ]
    )
    return resp.choices[0].message.content


In [27]:
# -------- BUILDING INDEX --------
build_index()  # only first time

Loading rules...
Loading cards...
Loaded 90 cards from ./clean-standard-cards.json
Total chunks: 91
Index built and saved with ChromaDB!


In [35]:
# -------- TESTING --------
question = "If I imprint Time Walk on Panoptic Mirror, do I get infinite turns?"
print(answer_question(question)) # right answer: YES, infinite turns

1. **Question** - Does imprinting Time Walk on Panoptic Mirror grant me infinite turns?  
2. **Short Answer** – No, you do not get infinite turns from imprinting Time Walk on Panoptic Mirror.  
3. **Explanation** – Panoptic Mirror allows you to copy a spell you have imprinted on it (in this case, Time Walk) and cast that copy for no cost during your upkeep. However, Time Walk only gives you an extra turn after the current one, and after you cast the imprinted Time Walk from the Mirror, it goes to exile again. You will need to pay its mana cost again to imprint and cast it a second time, which means you can only take one extra turn at a time, not infinite turns. You cannot repeatedly activate Panoptic Mirror without recurring the spell, which would require additional resources each time. Infinite turns would only be possible if you could continuously cast Time Walk without additional costs or conditions, which the Mirror does not facilitate.  
4. **Sources** – Comprehensive Rules, speci

In [36]:
question = "I am being attacked by an 8/8 creature with trample and I defend with a 2/2 creature. But before assigning damage, I play Bounce Off on the defending creature. Do I prevent the damage or will the attacking creature still deal damage to me?"
print(answer_question(question)) # right answer: Full damage is received because of trample.

1. **Question** - Does casting Bounce Off on my defending creature prevent the damage from an attacking 8/8 creature with trample?

2. **Short Answer** – No, casting Bounce Off will not prevent the damage; you will still take damage from the attacking creature.

3. **Explanation** – When an attacking creature with trample is blocked by a creature, it assigns enough damage to the blocker to satisfy lethal damage (in this case, that's 2 damage to the 2/2 blocker) before dealing any remaining damage to the defending player. However, if you cast Bounce Off on your 2/2 creature after it's been declared as a blocker, the attacking creature will already have assigned damage during the combat damage step before the 2/2 creature returns to your hand. The trample mechanic dictates that if the attacker assigns lethal damage to the blocker (which it will, even if it’s only there temporarily), any additional damage is dealt to the defending player. Since the creature is still there for the damage a

In [37]:
question = "I am being attacked by an 8/8 creature and I defend with a 2/2 creature. But before assigning damage, I play Bounce Off on the defending creature. Do I prevent the damage or will the attacking creature still deal damage to me?"
print(answer_question(question)) # right answer: Damage is fully prevented because it doesn't have trample.

1. **Question** - If I use Bounce Off on my defending 2/2 creature during an attack, does that prevent the attacking 8/8 creature from dealing damage to me?

2. **Short Answer** – No, using Bounce Off does not prevent the attacking creature from dealing damage to you.

3. **Explanation** – When you are attacked by an 8/8 creature and you declare your 2/2 creature as a blocker, the combat damage step follows. If you cast **Bounce Off** on the 2/2 creature, it will be returned to your hand before combat damage is assigned. According to the rules of combat, once a creature is declared as a blocker and is then removed from the battlefield (in this case, by being bounced to your hand), the attack still goes through and damage is dealt to you. The rules specify that combat damage is assigned based on which creatures are still on the battlefield for that damage step, and since the defending creature is no longer there to absorb damage, the 8/8 creature will deal its full damage to you. 

4. *

In [None]:
question = "in a combat between two 3/3 creatures, can I sac one of the creatures as a result of it taking letal damage to activate an ability? will it still kill the other creature?"
print(answer_question(question)) # right answer: No, you cannot sacrifice a creature that has been dealt lethal damage. You could sacrifice before assigning damage but the other creature would not receive damage.

1. **TL;DR** – Yes, you can sacrifice one of the creatures after it is assigned lethal damage to activate an ability; however, if it is sacrificed before state-based actions are checked, the other creature will still die from lethal damage.

2. **Explanation** – In Magic: The Gathering, when two 3/3 creatures deal damage to each other in combat, any creature that receives damage equal to or greater than its toughness is considered to have lethal damage. After the combat damage is dealt, state-based actions are checked, and any creature with lethal damage will be put into the graveyard. However, if you have an ability that allows you to sacrifice a creature (for example, an activated ability on another card that allows you to sacrifice a creature), you can choose to use that ability after damage has been dealt but before state-based actions are fully resolved. In that scenario, if you sacrifice your creature after it has received damage (but before it goes to the graveyard), it will not