# Summary

An introduction to the entire project.

# Table of Contents

Refer to the [Project ToC](../00-intro/00-00-project-table-of-contents.ipynb) file for the table of contents for each notebook in the project.

---

# Python

Refer to the [README.md](../../README.md) for the python environment setup.

---

# Introduction

This project is an exploration of applying data science to Magic: The Gathering (MTG), a collectible card game known for its popularity, long history, and intricate gameplay. MTG offers a wealth of data and countless opportunities for analysis.

Although I'm not an MTG expert, I played briefly in 1999-2000 and recently returned to the game after my son became interested. This project is a way for me to deepen my understanding of MTG and provide insights that others might find useful.

I'll analyze three main aspects of MTG: __gameplay__, __collecting__, and __economics__.

Given the complexity of gameplay, my analysis will cover various perspectives: aggregate win rates, micro-level decisions like combat resolution and mulligan choices, and evaluations of components such as card mana efficiency. I'll also explore the __metagame__, including deck building, play formats, and drafting strategies.

The collecting aspect of MTG is vast, with nearly 100,000 cards printed over the game's history. I’ll examine this by looking into aspects like the probability distributions of collector booster packs.

Finally, the economics of MTG, particularly the secondary market for singles, is of great interest to me. I'll focus on predicting card prices and analyzing price trends for new sets.

---

# Data Science

MTG can be fodder for many aspects of data science.  Here is a sampler of concepts I would like to explore:

- __Regression:__ Predict a card's expected mana cost given its attributes and keywords.
- __LLMs:__ Generate numeric representations from card descriptions with encoder LLMs.  Use representations for predictive tasks.
- __Network Science:__ Bipartite graphs of deck-card relationships.  Identify communities in card one-mode projection.
- __Graph Neural Networks:__ Predict deck win rate with given card compositions.
- __Bayesian Inference:__ Posterior distribution of a booster pack's value given the pack composition and secondary market prices.
- __Time-Series Analysis:__ Predict card price on secondary market $d$ days after set release.
- __Hidden Markov Models:__ Estimation of board state.  For example, estimate the states from [Quadrant Theory](https://magic.wizards.com/en/news/feature/quadrant-theory-2014-08-20).  These states are 'opening', 'parity', 'winning', or 'losing' state.
- __Reinforcement Learning:__ Maximize the outcome of combat stage, given the board state of potentially attacking and defending creatures.
- __Optimization:__ Optimize win rate of a deck, given constraints such as maximum market cost or number of mythic rares.
- __Game Theory:__ For mulligans, calculate the expected value of hands using utility theory.
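
As a taste of the network science item above, here is a minimal sketch of a one-mode projection: cards become linked whenever they appear in the same deck, weighted by how many decks they share.  The function name `project_cards` and the deck contents are hypothetical placeholders, not real OTJ data.

```python
from collections import Counter
from itertools import combinations


def project_cards(decks):
    """Return co-occurrence weights between cards that share a deck.

    `decks` maps a deck id to the card names it contains, i.e. the
    bipartite deck-card graph stored as an adjacency mapping.
    """
    weights = Counter()
    for cards in decks.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(cards)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical decks, for illustration only.
decks = {
    "deck_1": ["Card A", "Card B", "Card C"],
    "deck_2": ["Card A", "Card B"],
    "deck_3": ["Card B", "Card C", "Card D"],
}
projection = project_cards(decks)
```

The resulting weighted edge list could then be handed to a community detection routine; which one to use is left open here.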

---

## Card Scope

### OTJ Set Description

To keep the problem space tractable, I'll limit the analysis to the [Outlaws of Thunder Junction](https://mtg.fandom.com/wiki/Outlaws_of_Thunder_Junction) (OTJ) set, which was released on April 19, 2024.  

The set contains 276 regular cards, comprising:

- 91 Commons
- 100 Uncommons
- 60 Rares
- 20 Mythic rares 
- 5 Basic lands

Additional "booster fun" cards include:
- 13 Showcase "Wanted Poster" cards
- 60 Extended Art cards
- 13 Borderless cards
- 6 Bundle basic lands
- 7 Promos

For gameplay analysis, I'll focus on the 276 regular cards.  For market analysis, I'll include the booster fun cards, as the scarcity and desirability of these cards may affect the secondary market prices and will be interesting to explore.
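
The counts listed above can be tallied as a quick sanity check.  This sketch only re-adds the numbers quoted in this notebook; the dictionary keys are my own labels.

```python
# Card counts as listed in this notebook for OTJ.
regular = {
    "common": 91,
    "uncommon": 100,
    "rare": 60,
    "mythic rare": 20,
    "basic land": 5,
}
booster_fun = {
    "showcase wanted poster": 13,
    "extended art": 60,
    "borderless": 13,
    "bundle basic land": 6,
    "promo": 7,
}

regular_total = sum(regular.values())
booster_fun_total = sum(booster_fun.values())
market_scope_total = regular_total + booster_fun_total

print(f"regular: {regular_total}, booster fun: {booster_fun_total}, "
      f"market scope: {market_scope_total}")
```

The gameplay analysis therefore covers 276 cards, while the market analysis covers those plus 99 booster fun treatments.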

### Card Data Sources

I'll use the card data generously made available by the tireless folks at the open-source project [MTGJSON](https://mtgjson.com/).  MTGJSON provides a comprehensive database of MTG cards, including card attributes, card text, and card prices.  The data is available in JSON format, which I'll convert to a pandas DataFrame for analysis.
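
A minimal sketch of the first step, flattening a set file into one row per card.  It assumes the set file keeps its cards under `data` then `cards`, and that cards carry `name`, `rarity`, and `manaValue` fields; treat that layout as an assumption to verify against the MTGJSON documentation.  The helper `extract_card_rows` and the sample payload are hypothetical.

```python
import json


def extract_card_rows(payload, fields=("name", "rarity", "manaValue")):
    """Flatten an MTGJSON-style set payload into one dict per card."""
    cards = payload["data"]["cards"]
    # .get() keeps the row even when a card lacks one of the fields.
    return [{field: card.get(field) for field in fields} for card in cards]


# Tiny stand-in payload with placeholder values; a real set file would be
# parsed with json.load() from a downloaded file instead.
sample = json.loads("""
{"data": {"code": "OTJ", "cards": [
    {"name": "Card A", "rarity": "common", "manaValue": 2.0},
    {"name": "Card B", "rarity": "rare"}
]}}
""")
rows = extract_card_rows(sample)
```

Passing `rows` to `pandas.DataFrame(rows)` then yields the DataFrame used for analysis, with missing fields showing up as null values.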

MTGJSON sources a lot of data from [Scryfall](https://scryfall.com/docs/api), which has an excellent webapp for exploring MTG card data.

MTGJSON sources its booster pack composition data from the [magic-search-engine](https://github.com/taw/magic-search-engine?tab=readme-ov-file) project by [taw](https://github.com/taw) on GitHub, whose source code contains estimated booster pack composition probabilities.  He also provides a webapp at [mtg.wtf](https://mtg.wtf/).  Note that the exact booster pack composition is proprietary information of Wizards of the Coast, so these probabilities are estimates.

### Card EDA

Refer to the [UPDATE.ipynb](UPDATE.ipynb) for the exploratory data analysis of the card data.  


## Play Format Scope

### Limited Play

For gameplay, I will look at limited formats, such as Draft and Sealed Deck.  Constructed formats, such as Standard, Modern, and Legacy, are out of scope for now.  This is due to the complexity of the metagame and the vast number of cards available for deck construction.  

Draft play also allows us to study three types of player skill.  First, there is the skill of drafting the best cards from the draft pool based on one's currently drafted cards.  Second, there is the skill of deck construction from the drafted cards.  Third, there is the skill of playing the deck in a tournament setting. 

Sealed and constructed formats are also interesting, but exclude the drafting skill.  Constructed formats also require a deep understanding of the metagame, which is out of scope for now.


### MTG Arena Draft

I will specifically look at the Draft format on [MTG Arena](https://magic.wizards.com/en/mtgarena).  MTG Arena is an online platform for playing MTG.  The Draft format on MTG Arena is a popular way to play limited MTG.  In the Draft format, players open booster packs and pick one card from the pack.  The remaining cards are passed to the next player.  This process continues until all cards are picked.  Players then construct a deck from the picked cards and play a tournament with the constructed deck.
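
The pick-and-pass mechanic described above can be sketched as a small simulation.  This is an illustration under assumptions, not a model of MTG Arena itself: the function names, the random picking strategy, the pack size of 14, and the card labels are all placeholders, and only a single round of packs is simulated.

```python
import random


def simulate_draft_round(packs, pick, rng):
    """Run one pick-and-pass round and return each player's picks.

    `packs` holds one list of cards per player; `pick(pack, rng)` returns
    the card that the player takes from the pack in front of them.
    """
    packs = [list(pack) for pack in packs]  # copy so the caller's packs are untouched
    pools = [[] for _ in packs]
    while any(packs):
        for seat, pack in enumerate(packs):
            if pack:
                choice = pick(pack, rng)
                pack.remove(choice)
                pools[seat].append(choice)
        # Pass every pack one seat along; the last seat passes to the first.
        packs = packs[-1:] + packs[:-1]
    return pools


def random_pick(pack, rng):
    """Placeholder strategy: take a uniformly random card."""
    return rng.choice(pack)


PACK_SIZE = 14  # illustrative value, not taken from this notebook
rng = random.Random(0)
packs = [[f"seat{seat}-card{i}" for i in range(PACK_SIZE)] for seat in range(8)]
pools = simulate_draft_round(packs, random_pick, rng)
```

Swapping `random_pick` for a strategy that looks at the player's current pool is where the drafting skill discussed earlier would enter.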

MTG Arena offers several Draft formats.  I will focus on the [Premier Draft](https://magicarena.fandom.com/wiki/Premier_Draft) format, which is a best-of-one (Bo1) format.  In the Premier Draft format, players draft against other players in a pod of 8 players.  After the draft, players play games with the drafted deck until they reach 7 wins or 3 losses.  Players have a sideboard and are allowed to change the deck composition between games.
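
The 7-wins-or-3-losses stopping rule implies a distribution over final records.  The sketch below computes it under a strong simplifying assumption of my own: every game is independent and won with the same probability `p`.  Real matchmaking does not guarantee this, so treat the numbers as a rough baseline.

```python
from math import comb


def record_distribution(p, max_wins=7, max_losses=3):
    """Probability of each final (wins, losses) record for a fixed win rate p."""
    q = 1.0 - p
    dist = {}
    # Runs that end on the final loss: the last game is a loss, preceded by
    # `wins` wins and max_losses - 1 losses in any order.
    for wins in range(max_wins):
        ways = comb(wins + max_losses - 1, wins)
        dist[(wins, max_losses)] = ways * p**wins * q**max_losses
    # Runs that end on the final win: the last game is a win, preceded by
    # max_wins - 1 wins and `losses` losses in any order.
    for losses in range(max_losses):
        ways = comb(max_wins - 1 + losses, losses)
        dist[(max_wins, losses)] = ways * p**max_wins * q**losses
    return dist


dist = record_distribution(0.5)
expected_wins = sum(wins * prob for (wins, _), prob in dist.items())
trophy_probability = sum(prob for (wins, _), prob in dist.items() if wins == 7)
```

With `p = 0.5`, the chance of reaching 7 wins works out to $(1 + 7/2 + 28/4) / 2^7 = 11.5 / 128 \approx 0.09$.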

Other formats include Quick Draft, which is also Bo1, but the drafting process is against a pool of bots, and play is against players with independent draft pools.  Traditional Draft is a best-of-three (Bo3) format.  Sealed Deck is another limited format where players open 6 booster packs and construct a deck from the opened cards.  These are out of scope for now.

### Draft Play Data Sources

For draft play, I will use data from [17lands](https://www.17lands.com/public_datasets).  They compile data from their user base to provide draft pick data.  The data includes the draft pick order, the cards picked, and the win rate of the deck.
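
As a sketch of what can be done with pick-order data, the snippet below averages the pick position at which each card was taken.  The column names `draft_id`, `pick_number`, and `pick` are assumptions about the file layout and should be checked against the actual 17lands dataset; the rows are made-up placeholders.

```python
import csv
import io
from collections import defaultdict


def average_pick_number(rows, card_col="pick", pick_col="pick_number"):
    """Return the mean pick position per card from an iterable of row dicts."""
    totals = defaultdict(lambda: [0, 0])  # card -> [sum of pick numbers, count]
    for row in rows:
        entry = totals[row[card_col]]
        entry[0] += int(row[pick_col])
        entry[1] += 1
    return {card: total / count for card, (total, count) in totals.items()}


# Made-up rows standing in for a downloaded CSV file.
sample_csv = """draft_id,pick_number,pick
d1,1,Card A
d1,2,Card B
d2,1,Card B
d2,2,Card A
d3,1,Card A
"""
rows = csv.DictReader(io.StringIO(sample_csv))
averages = average_pick_number(rows)
```

A lower average suggests that drafters value the card more highly, which gives a simple baseline for later modeling of pick decisions.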

### Draft EDA

Refer to the [UPDATE.ipynb](UPDATE.ipynb) for the exploratory data analysis of the draft data.