This page will describe in detail what techniques were used to let the AI be both fast and challenging. To have such a balanced AI, trade offs are needed which sometimes sacrifice making the best choice for making a good enough choice at a much higher speed.
These are the characteristics of the AI:
Minimax game tree :
- Alpha-beta pruning and score caching
- Local selection of best choices
- Find the best choice for the time AI can lookahead
- Difficulty level : number of main phases used for lookahead
- In function of the number of choices
- Game state can be fully copied to evaluate choices in parallel
Every game state change is done with simple actions :
- Moderate memory consumption for deep trees
- Fast backward tree traversing thanks to undo support
- Iterative scoring instead of scoring the full game situation
- The scoring system has a big influence on how the AI plays
Specific algorithms for :
- Paying mana costs
- Playing spells and abilities
- Picking targets and colors
- Declaring attackers and blockers
Ignores unknown or fuzzy facts :
- Cards in opponent's hand or library
- Untapped lands of opponent
Unknown cards for the AI are all cards in the libraries and in the hand of opponent. All unknown cards are treated as a colorless 1/1 creature token with defender, cannot be countered and shroud and with a mana cost of 8.
The game progress is controlled with a simple state machine that keeps the current phase and step of the game. In a loop until the game is finished, depending on the current state, the game progresses to the next state. Notice that a phase and step do not have exactly the same meaning as in the actual game rules.
These are the phases in order :
- First Main
- Begin of Combat
- Declare Attackers
- Declare Blockers
- Combat Damage
- End of Combat
- Second Main
- End of Turn
These are the possible steps in each phase, the first two always occur a single time, the other three can occur multiple times :
- Next Phase
- Active Player
- Other Player
Next to this, the game also keeps an ordered list of events. When there are events in this list, the game will execute the oldest event until there are no more events left. An event is always composed out of the following :
- optional choice picker
- optional data
- event action
Here is where the real work of the AI comes into play. When the player controlling the event is an AI player and a choice must be made, then it must try to find the best choice. Limiting the number of choices that can be made has a big influence on the speed of the AI. These are some of the decisions made to accomplish this :
- Avoid cards that have multiple targets like Incremental Blight or Arc Lightning
- No Planeswalker cards because each creature can choose to either attack the opponent or a Planeswalker controlled by the opponent and combat is already stressful
For each possible type of choice that can be made, there is a specific AI to determine and pick choices. The choice of an event can be composed out of one or more of these choices.
Passing priority is always an option. When the top item on the stack is already controlled by the AI, it will always let that item resolve first and pick passing priority as an option.
The other options come from cards or abilities that can be played. There are however a lot of optional checks to limit the valid options :
- timing : determines in function of game state if a card or ability should be played
- priority : determines in what order cards and abilities should be played
- independent abilities : when two permanents have a common activated ability that does not influence the permanent, then only pick one
- maximum number of times an activated ability should be played by AI in a turn, very useful for abilities without cost
Some examples of timings assigned to cards and abilities :
- Main : only play during a main phase
- First or Second Main : only play during first or second main phase
- Flash : play on opponent's turn in declare attackers or end of turn phase
- Removal : play in main phase, declare blockers phase or when there is an item of opponent on top of stack
- Counter : play when there is a spell of opponent on stack
The priority of cards and abilities is determined by :
- Cards have a sequence number that determines a fixed order in which they can be played
- Timings also have a priority that imposes the order in which timings can be played
The combination of these features seriously limits the number of options at the cost of not always making the best choice.
In Magarena, there is no mana pool. This means that mana abilities can only be played when paying a mana cost. A mana ability can also only produce a single mana, so there is no support for e.g. the Ravnica bounce lands. When the AI has to pay a mana cost, it will only generate all possible unique options when the choice is made at the top level of the tree. With unique options is meant that when one red mana must be payed and the AI controls ten Mountains, it will have a single option with one of the Mountains. When the choice is not at the top level, a system called delayed paying of mana costs is used. This system will check if the AI can still pay the mana cost, and instead of actually paying it, it is added to a combined delayed mana cost for the turn. So instead of evaluating all options, the AI can merely check if it can still pay for the cost at that time. The code to do this check is very fast. Then at the end of the turn the most optimal way to pay for the combined mana cost is executed. This system works very well for most lands, but when man lands or creature permanents have a mana ability, things become trickier. The creature could become tapped by e.g. attacking because the cost was not paid yet. To prevent this the game also generates hidden exclude choices at the begin of a turn, which chooses for such permanents to either generate mana or to be able to tap to attack.
Magarena supports variable costs with a single X, where X is always minimum one, which is in most cases necessary to produce an effect. For such costs, all possible values for X are generated, also when paying delayed costs. Good thing is that this generates still much less options than considering every possible way to pay for an X cost when traversing the tree.
Adding a mana pool would certainly be possible for the player. The player would then be allowed to play mana abilities at any time. The AI would still only play mana abilities when paying for a mana cost. Cards like Dark Ritual would add mana to the pool and then that mana would be used by the AI before paying the remaining mana cost with mana abilities. When using delayed costs, that mana would remain in the pool until end of turn when the combined mana cost is payed. But the current approach actually makes the game play pretty smooth so this might not be worth adding for just a couple of cards.
Targets can be labeled as neutral, positive or negative. Neutral means that the AI can pick all legal targets. Positive means the AI will only pick permanents it controls or itself, for instance for preventing damage. Negative means the AI will only pick permanents the opponent controls or the opponent, for instance for dealing damage. In some limited cases, this will prevent the AI from for instance dealing damage to a Stuffy Doll it controls which deals the same damage to the opponent.
A target choice can also be given a target choice picker. This picker will be used to locally generate a single optimal option when the choice is not made at the top level of the tree. This greatly reduces the number of game permutations in the tree, but will sometimes pick a suboptimal option. Example of such pickers are for destroying creatures or to give a creature haste until end of turn.
When choosing is not at random, the AI will basically consider all possible combinations of the amount of cards that must be chosen. When choosing is at random, the AI will always pick the first cards in its hand, while the human player picks random cards. This is done to have determinism in evaluating the tree. That is also the reason why effects with a random outcome are avoided in Magarena.
For chosing one of the five colors, there should be always 5 options. But in some cases, custom color pickers are used to locally pick the best color, e.g. the color is determined that is the most common among permanents controlled by the AI for Treva or the opponent for Dromar.
This choice is mostly used for conditional triggers containing may. Usually it is combined with other choices. For no, only a single option must be considered. For yes, all combinations of the optional other options are considered.
For kicker and multi kicker choices, the options are between 0 and the number of times the kicker cost can be payed.
Suppose there are eight candidate attackers 1,2,3,4,5,6,7,8. When the AI orders these by strength, using their power, toughness and some relevant combat abilities, this becomes from best to worst 7,4,3,6,8,2,1,5. Now creature 5 has a power of zero or less so it is removed and we get 7,4,3,6,8,2,1. Creature 8 must attack if able so it is also removed 7,4,3,6,2,1. It will however be present in all possible options. Depending on the relative turn from the starting turn of the tree, the AI will consider a maximum number of attackers to limit the options, for instance 4. First it will create alpha strike options, always removing the worst creature to lower the number of attackers :
6 optional attackers :
5 optional attackers :
Then it creates all other possible options with up to 4 optional attackers :
4 optional attackers :
3 optional attackers :
2 optional attackers :
1 optional attacker :
no optional attackers :
As can be seen from this example, there are already 18 possible options, even when limiting the number of optional attackers.
This is a lot tougher than the declare attackers AI to avoid having too much options to evaluate. The reason is simple, for each attacker there can be multiple blockers and the order in which they block is also important. At the top level of the tree multiple blocking options will be evaluated. At the other levels, only the best option will be retained giving a single option. To find the best blocking options, a scoring system is used which has two variations to calculate the score for the outcome of the combat :
- fast combat scoring : the score is calculated with custom code which only uses power, toughness and some abilities and ignores everything else, meaning it is only partially accurate
- game combat scoring : to calculate the score the actual combat is executed with the game engine, which is more accurate but much slower
When calculating these scores, because they need multiple times the same properties like abilities of the creatures, caching of these properties in the game engine is enabled. The fast combat scoring is used in function of the number of attackers and the relative turn from the starting turn of the tree.
There is also an algorithm to obtain the best candidate blocking options instead of just trying everything that is possible.
First for each attacker, the optimal order of all candidate blockers is determined as follows :
- creatures that cannot be dealt lethal damage by the attacker are in front in order of the amount of lethal damage needed where higher is better
- then creatures are ordered in function of their power where higher is better
Finally for each attacker, the following blocking options are considered :
- no blocking creature
- every single blocking creature which does not deal lethal damage to attacker
- all possible permutations of minimal number of blocking creatures in above order that deal lethal damage to the attacker
As example for the algorithm the following situation is used :
- attacking player attacks with creature A
- defending player can block A with creatures 1,2,3
- the optimal blocking order is determined to be 3,2,1
- creature 3 can deal lethal damage to the attacker
- creature 1 + 2 can deal lethal damage to the attacker
This will give the following blocking options :
no blocker :
single blocker, no lethal damage to attacker :
- A : 2
- A : 1
lethal damage to attacker :
- A : 3
- A : 2,1
This gives 5 blocking options. Compare this with all 11 possible blocking options :
- no blockers : 1 option
- 1 blocker : 3 options (1 or 2 or 3)
- 2 blockers : 6 options (1,2 or 1,3 or 2,1 or 2,3 or 3,1 or 3,2)
- 3 blockers : 1 option
This difference becomes very quickly significant when the number of creatures on the battlefield increases.
When the AI needs to find the best option for a given choice, it will try to evaluate all possible actions in the game until a certain period in time and then give each option the score at that time. The option with the best score or where that score is reached the fastest will be selected as the best option. At the highest difficulty level, Magarena will try to lookahead for 6 main phases which is roughly the equivalent of three turns. This is in many cases an appropriate time frame to make good decisions. For instance creatures that are played in turn N can only attack in turn N + 2 due to the summoning sickness rule, so this already requires a 3 turns time frame. While it would be possible to let Magarena lookahead even more main phases, 6 seems like a good tradeof between speed and strength. The reason why the tree is called Minimax is quite straightforward, when the AI must make a choice it will take the maximum score, but when the opponent must make a choice it will take the minimum score, because obviously the opponent will also try to maximize the score. More on this algorithm can be found on for instance the Wiki page.
When parallel evaluation must be performed by multiple threads, it is very important that the full game state can be copied so that the threads can work on different copies.
The AI will always first evaluate the first option in a single thread. While this is not actually a must, it is very beneficial due to two reasons :
- the game will try to find out the optimal number of main phases, because in some cases the maximum number would require too much time, this works by stopping the evaluation at a certain depth or number of game permutations
- the score pruning and caching for the other options works much better
Notice that the order of the options will certainly have an influence on the speed of the evaluation. When the first option is the best option, the score pruning will work better than when the last option is the best one. Magarena currently does not try to sort the options first to benefit from this. When the first option is evaluated, then all other options are evaluated concurrently with multiple threads, equal to the number of processors. When there are four processors and only three choices, then one processor will be idle. So it is always a single thread per choice.
During evaluation of a choice, all options are iterated. For each option, the event is executed with that option and then the evaluation continues until the end state is reached. This would however mean that to evaluate each option, a copy must be made from the game state at that time. This would be both memory and time consuming. Instead the game engine does all state changes with simple actions like changing a player's life. These actions are recorded in a list and when the game state must be restored, they are undone in reverse order. Keeping this list and the state each action changes requires much less memory and is faster. Another benefit of these actions is that they can return a score. The score accumulated by all actions up to that time is then the score for any given time. This allows very fast score calculation at any given time in the evaluation. To calculate the score for each action, a scoring system is used. This is a component that can be tweaked significantly to determine how well the AI plays.
This technique is used to stop the evaluation in the tree as soon as a score at a given level in the tree can no longer win at a previous level of the tree. Take for instance this example :
- lowest score at level 9 is 2000, option with minimum score is searched (opponent)
- highest score at level 10 is 1800, option with maximum score is searched (AI), this score can still win at the previous level because it is lower than 2000
- highest score at level 10 is 2500, option with maximum score is searched (AI), this score can no longer win at the previous level so stop evaluation
This is a simple but very effective technique. During the evaluation, at the end of certain phases (First Main, End of Combat and Cleanup), a 64 bits game identifier is calculated which reflects the game state at that time. Then a score cache is checked if this state already occurred during a previous evaluation and when it is found the best score for that state is obtained from the cache and must no longer be searched. This technique has however a drawback, namely that different game states can map to the same identifier. During testing the speed benefit was however so significant that this drawback is overshadowed. The score cache is shared between all threads.
This page describes what techniques were used to make Minimax a feasible approach for a game of this complexity. It shows that this requires several compromises to limit the number of options that must be evaluated. Also the game engine must be designed from up front to handle this algorithm efficiently.