##### Welcome to the project showcase file for the Rock-Paper-Scissors-like Neural Networks project.

Before continuing, make sure you read **start_here.ipynb** first.

**This project focuses on the development of a way to train neural networks to fight against opponents in games that resemble rock-paper-scissors.**

This is how we define "**rock-paper-scissors-like**":
- Turn-based game.
- At least 2 players (we allow more than 2).
- Players start with a positive, whole-number score.
    - If their score drops to 0, they are eliminated (at the end of the turn).
- On each turn, players select one move each.
    - There is a fixed selection of moves (e.g. rock, paper, scissors, lizard, spock).
    - Any player can pick any move.
- Each move is used simultaneously (e.g. everyone throws their moves at the same time in rock paper scissors).
    - Moves target every other player, once each.
        - e.g. imagine a game with 3 people. If two people play rock, and one plays paper, the paper hits the rock players once each (eliminating both!).
    - The move changes the score of the target based on the target's move that turn.
        - e.g. if you play rock, and they play scissors, they lose a point... But if you *both* play rock, they don't lose a point.
- The game continues until either 1 or 0 players remain.
    - If one player remains, they win.
    - If no players remain, nobody wins (yes, this can happen; in plain rock-paper-scissors you need at least 3 players for it, though).
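
The rules above can be sketched in a few lines of plain Python. This is only an illustration of the definition, not the project's own engine: the names `BEATS`, `damage`, `play_turn` and `play_game` are made up for this example, and the damage rule is the classic rock-paper-scissors one.

```python
# Illustrative only: these names are not part of rpsnetworks.
# In classic rock-paper-scissors a move only hurts the move it beats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def damage(attacker_move, target_move):
    """Points the target loses when hit by attacker_move while playing target_move."""
    return 1 if BEATS[attacker_move] == target_move else 0


def play_turn(scores, moves):
    """Resolve one simultaneous turn.

    scores: dict of player name -> current score (players still in the game)
    moves:  dict of player name -> move chosen this turn
    Returns the scores of the players who survive the turn.
    """
    new_scores = dict(scores)
    for attacker in scores:
        for target in scores:
            if attacker != target:
                # Every move hits every other player once.
                new_scores[target] -= damage(moves[attacker], moves[target])
    # Eliminations happen only after all hits have been applied.
    return {name: s for name, s in new_scores.items() if s > 0}


def play_game(scores, choose_move, max_turns=1000):
    """Play until at most one player remains.

    Returns the winner's name, or None if nobody wins
    (or if the game is still undecided after max_turns).
    """
    for _ in range(max_turns):
        if len(scores) <= 1:
            break
        moves = {name: choose_move(name) for name in scores}
        scores = play_turn(scores, moves)
    return next(iter(scores)) if len(scores) == 1 else None
```

With three players at 1 point each, `play_turn` given two rocks and one paper leaves only the paper player, and given one rock, one paper and one scissors it leaves nobody, which is the "nobody wins" case.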

As a sanity check, we can see that rock-paper-scissors is, indeed, RPS-like: players start with a score of 1. They may choose Rock, Paper, or Scissors. They throw their moves at the same time, and players are eliminated if they play the wrong move (their score drops from 1 to 0).

Another simple example, which we will make extensive use of: a best-of-three RPS game is also RPS-like. It is rock-paper-scissors, but players start with a score of 2 instead of 1 (you have to play the wrong move two times out of three to lose).

---

We will now showcase the applications of this project.

In [1]:
import rpsnetworks.training_manager, rpsnetworks.player_templates, rpsnetworks.schema_templates, rpsnetworks.controller_templates

The following code randomly trains a neural network to play a game of rock paper scissors against an opponent who plays rock half the time and paper the other half (and never scissors).<br>
It will try to converge on an optimal strategy to beat its opponent the highest % of the time.

In [2]:
RPS_SCHEMA = rpsnetworks.schema_templates.RockPaperScissors()
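# Opponent arguments: name, starting score, move probabilities (rock, paper, scissors)
opponents = [rpsnetworks.player_templates.mixedStrategyPlayer("player1", 1, [0.5, 0.5, 0])]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 10, 10,  # generations, children tested per generation, base test-games per child
    [2, 2, 3], rpsnetworks.player_templates.basicNetworkPlayer, 1,  # the last value is the network's starting score
    opponents, RPS_SCHEMA, verbose=True
)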

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 10
Base # of test-games per child: 10
---------------
Schema used: Rock Paper Scissors
Opponent count: 1
---------------
RESULTS:
Network proficiency: 1.000
Move distribution:
rock: 0.000
paper: 1.000
scissors: 0.000


The block above also outputs the results.<br>
Network proficiency is the proportion of games the network won in a final test of 10,000 games against its opponent.
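
As a rough illustration of what this number measures, the sketch below estimates a win proportion by playing many two-player games between two fixed move distributions. It is not how `trainNetwork` computes proficiency internally; `play_duel` and `estimate_proficiency` are hypothetical helpers, and a fixed distribution stands in for the trained network.

```python
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def play_duel(my_weights, opp_weights, start_score, rng, max_turns=1000):
    """Play one two-player game; return True if we eliminate the opponent."""
    mine = theirs = start_score
    for _ in range(max_turns):
        my_move = rng.choices(MOVES, weights=my_weights)[0]
        opp_move = rng.choices(MOVES, weights=opp_weights)[0]
        if BEATS[my_move] == opp_move:
            theirs -= 1
        elif BEATS[opp_move] == my_move:
            mine -= 1
        if theirs == 0:
            return True
        if mine == 0:
            return False
    return False  # games still undecided after max_turns count as not won


def estimate_proficiency(my_weights, opp_weights, start_score=1, games=10_000, seed=0):
    """Fraction of games won over a fixed number of test games."""
    rng = random.Random(seed)
    wins = sum(play_duel(my_weights, opp_weights, start_score, rng) for _ in range(games))
    return wins / games


# Always playing paper against the rock-or-paper opponent never loses a point.
print(estimate_proficiency([0, 1, 0], [0.5, 0.5, 0]))  # 1.0
```

Always-paper either beats rock or ties with paper, so every game ends in a win, which matches the move distribution the training run settled on.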

Now we'll train one to play against two players at once:<br>
The first player always plays rock, the second always plays paper.<br>
Both our network and the opponents start with 3 points.

In [16]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 3, [1, 0, 0]),
    rpsnetworks.player_templates.mixedStrategyPlayer("player2", 3, [0, 1, 0])
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 10, 10,
    [2, 2, 3], rpsnetworks.player_templates.basicNetworkPlayer, 3,
    opponents, RPS_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 10
Base # of test-games per child: 10
---------------
Schema used: Rock Paper Scissors
Opponent count: 2
---------------
RESULTS:
Network proficiency: 0.976
Move distribution:
rock: 0.000
paper: 0.709
scissors: 0.291


Not perfect.<br>
We can swap to another model that is aware of both its own health and its opponents' health.<br>
This will allow the network to develop a more advanced strategy.

In [17]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 3, [1, 0, 0]),
    rpsnetworks.player_templates.mixedStrategyPlayer("player2", 3, [0, 1, 0])
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 10, 10,
    [4, 2, 3], rpsnetworks.player_templates.playerAwareNetworkPlayer, 3,
    opponents, RPS_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 10
Base # of test-games per child: 10
---------------
Schema used: Rock Paper Scissors
Opponent count: 2
---------------
RESULTS:
Network proficiency: 1.000
Move distribution:
rock: 0.000
paper: 0.250
scissors: 0.750


Typically, the network will train itself here to play paper until the first opponent is defeated. Then, it will play scissors until the second opponent is defeated.

However, it sometimes develops...alternative strategies. For example, it's possible to sometimes play rock and still win 100% of battles.<br>
The fitness function does not punish these silly moves. We are optimizing only for a winning strategy, so there is no need. Introducing extra complexity into the scoring algorithm also reduces performance, and risks introducing biases that make for suboptimal training.
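
The typical paper-then-scissors strategy described above can be traced by hand. The sketch below replays it with a hand-written stand-in for the network (`network_move` is hypothetical, as is `play_turn`); it assumes the turn rules from the definition at the top of this file and is not the project's own simulator.

```python
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def play_turn(scores, moves):
    """Apply every move to every other player, then drop eliminated players."""
    new_scores = dict(scores)
    for attacker in scores:
        for target in scores:
            if attacker != target and BEATS[moves[attacker]] == moves[target]:
                new_scores[target] -= 1
    return {name: s for name, s in new_scores.items() if s > 0}


def network_move(scores):
    # Stand-in for the trained network: paper while player1 is alive, then scissors.
    return "paper" if "player1" in scores else "scissors"


FIXED = {"player1": "rock", "player2": "paper"}  # the two fixed opponents
scores = {"network": 3, "player1": 3, "player2": 3}
history = []
while len(scores) > 1 and "network" in scores:
    moves = {name: FIXED[name] for name in scores if name in FIXED}
    moves["network"] = network_move(scores)
    history.append(moves["network"])
    scores = play_turn(scores, moves)

print(history)  # ['paper', 'paper', 'scissors', 'scissors', 'scissors']
print(scores)   # {'network': 3}
```

Player 1 is hit by two papers per turn and is gone after two turns, after which scissors removes player 2 in three more turns without the network losing a point.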

A more nuanced example:<br>
Player 1 plays Rock 50% of the time, Paper 25%, and Scissors 25%.<br>
Player 2 plays Paper 50% of the time, and Scissors 50%.

In [5]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 3, [0.5, 0.25, 0.25]),
    rpsnetworks.player_templates.mixedStrategyPlayer("player2", 3, [0, 0.5, 0.5])
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 40, 20,
    [4, 2, 3], rpsnetworks.player_templates.playerAwareNetworkPlayer, 3,
    opponents, RPS_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 40
Base # of test-games per child: 20
---------------
Schema used: Rock Paper Scissors
Opponent count: 2
---------------
RESULTS:
Network proficiency: 0.644
Move distribution:
rock: 0.000
paper: 0.253
scissors: 0.747


The optimal strategy here is harder to find - and there is no strategy for a 100% win rate.<br>
Training results will vary. Typically, network proficiency is greater than 60% - but it will sometimes get stuck in bad strategies, like always playing scissors.

Care has been taken to design the training algorithm to reduce the frequency of these sorts of results.

Rock paper scissors is an RPS-like, but many other games are, as well.

We will play a custom game:<br>
There are 5 moves. You can *Attack*, *Defend*, *Special*, *Deflect*, or *Heal*.<br>
If you attack, and your opponent does not defend, they lose 1 point.<br>
If you defend, and your opponent uses special, both of you lose 1 point.<br>
If you use special, and your opponent uses attack, special, or heal, they lose 2 points.<br>
If you deflect, and your opponent uses special, they lose 2 points.<br>
If you heal, you gain 1 point on top of any losses inflicted by the opponent that turn.<br>
The same goes for your opponent for all these rules. (e.g. if you both use special, then you both lose 2 points)<br>
This is an RPS-like. It's a more intricate system to train AI on, and helped expose shortcomings in the training process while developing it.
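
One way to write these rules down is as a damage table plus a self-heal. The sketch below is an interpretation of the rules as stated here, not the contents of the schema class itself: `DAMAGE`, `SELF_GAIN` and `resolve` are hypothetical names, the "both of you lose 1 point" rule is encoded as defend and special each dealing 1 to the other, and heal is assumed to apply once per turn.

```python
MOVES = ["attack", "defend", "special", "deflect", "heal"]

# DAMAGE[my_move][their_move] = points THEY lose because of my move.
DAMAGE = {
    "attack":  {"attack": 1, "defend": 0, "special": 1, "deflect": 1, "heal": 1},
    "defend":  {"attack": 0, "defend": 0, "special": 1, "deflect": 0, "heal": 0},
    "special": {"attack": 2, "defend": 1, "special": 2, "deflect": 0, "heal": 2},
    "deflect": {"attack": 0, "defend": 0, "special": 2, "deflect": 0, "heal": 0},
    "heal":    {"attack": 0, "defend": 0, "special": 0, "deflect": 0, "heal": 0},
}

# Points a move gives back to the player who used it (assumed: once per turn).
SELF_GAIN = {"heal": 1}


def resolve(move_a, move_b):
    """Return the net score change (delta_a, delta_b) for one two-player turn."""
    delta_a = SELF_GAIN.get(move_a, 0) - DAMAGE[move_b][move_a]
    delta_b = SELF_GAIN.get(move_b, 0) - DAMAGE[move_a][move_b]
    return delta_a, delta_b


print(resolve("special", "special"))  # (-2, -2)
print(resolve("defend", "special"))   # (-1, -1)
print(resolve("heal", "attack"))      # (0, 0)
```

Keeping damage and self-gain separate makes it easy to see that heal never hurts the opponent; it only offsets whatever the healer takes that turn.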

An opponent who only attacks:

In [6]:
RPG_SCHEMA = rpsnetworks.schema_templates.RPGFascimile()

opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 4, [1, 0, 0, 0, 0]),
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 10, 10,
    [2, 2, 5], rpsnetworks.player_templates.basicNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 10
Base # of test-games per child: 10
---------------
Schema used: RPG Fascimile
Opponent count: 1
---------------
RESULTS:
Network proficiency: 1.000
Move distribution:
attack: 0.000
defend: 0.000
special: 0.308
deflect: 0.000
heal: 0.692


Typically the training process discovers a strategy involving frequent use of the special move.

We train a network aware of its opponent's health against an opponent who attacks 50% of the time, specials 25% and deflects 25%.

In [7]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 4, [0.50, 0, 0.25, 0.25, 0]),
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 40, 30,
    [3, 2, 5], rpsnetworks.player_templates.playerAwareNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 40
Base # of test-games per child: 30
---------------
Schema used: RPG Fascimile
Opponent count: 1
---------------
RESULTS:
Network proficiency: 0.791
Move distribution:
attack: 0.000
defend: 0.604
special: 0.000
deflect: 0.000
heal: 0.396


Often, the training process discovers the following simple but effective strategy:<br>
*Heal* until I have more health than my opponent.<br>
*Defend*, and the recoil from my opponent using special will eventually cause me to win.
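
To see why this works, the strategy can be written as a hand-coded policy and simulated against the same opponent distribution. Everything below is an assumption-laden sketch rather than the project's code: the damage table is one reading of the custom game's rules, health has no upper cap, and `heal_then_defend`, `net_change`, `play_game` and `win_rate` are hypothetical names. Under these assumptions the healing phase is a symmetric random walk that starts at 4 and ends at 5 (advantage gained) or 0 (defeat), so the policy should win about 4/5 of its games, which is in line with the 0.791 reported above.

```python
import random

# Points the target loses: DAMAGE[attacker_move][target_move]; missing pairs mean 0.
DAMAGE = {
    "attack": {"attack": 1, "special": 1, "deflect": 1, "heal": 1},
    "defend": {"special": 1},
    "special": {"attack": 2, "defend": 1, "special": 2, "heal": 2},
    "deflect": {"special": 2},
    "heal": {},
}


def net_change(my_move, their_move):
    """Net score change for the player using my_move this turn."""
    gain = 1 if my_move == "heal" else 0
    return gain - DAMAGE[their_move].get(my_move, 0)


def heal_then_defend(my_hp, opp_hp):
    """Heal until strictly ahead on health, then defend."""
    return "heal" if my_hp <= opp_hp else "defend"


def play_game(rng, start_hp=4, max_turns=10_000):
    """Return True if the policy wins one game against the fixed opponent."""
    my_hp = opp_hp = start_hp
    for _ in range(max_turns):
        mine = heal_then_defend(my_hp, opp_hp)
        # Opponent: attack 50%, special 25%, deflect 25%.
        theirs = rng.choices(["attack", "special", "deflect"], weights=[0.5, 0.25, 0.25])[0]
        my_hp += net_change(mine, theirs)
        opp_hp += net_change(theirs, mine)
        if my_hp <= 0 or opp_hp <= 0:
            return my_hp > 0 and opp_hp <= 0
    return False  # undecided games count as not won


def win_rate(games=20_000, seed=0):
    rng = random.Random(seed)
    return sum(play_game(rng) for _ in range(games)) / games
```

Once the policy is ahead, every special from the opponent costs both players 1 point, so the opponent always reaches 0 first; all of the risk sits in the healing phase.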

We'll train a network aware of its opponent's and its own health against an opponent who is equally likely to use any move.

In [8]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 4, [0.2, 0.2, 0.2, 0.2, 0.2]),
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 20, 20,
    [3, 3, 5], rpsnetworks.player_templates.playerAwareNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 20
Base # of test-games per child: 20
---------------
Schema used: RPG Fascimile
Opponent count: 1
---------------
RESULTS:
Network proficiency: 0.797
Move distribution:
attack: 0.000
defend: 0.000
special: 0.300
deflect: 0.000
heal: 0.700


Typically the training process discovers a strategy pretty similar to the last time. It heals until it has the HP advantage (something between 1 and 3, depending on the network), then attacks or specials repeatedly. If it loses the HP advantage, it heals until it gets it back.

Some networks also learn to use special while they have the advantage and their opponent is low on health.

The network might be defeated if the opponent uses special enough times while the network is trying to get the health advantage - and can also be defeated if it gets greedy (uses special while the opponent is at low health), and the opponent happens to deflect.
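
Written out by hand, the behaviour described above looks roughly like the rule below. This is a hypothetical hand-written policy for illustration, not something extracted from a trained network; the function name and the `margin` and `finish_at` thresholds are assumptions.

```python
def advantage_policy(my_hp, opp_hp, margin=2, finish_at=2):
    """Heal until ahead by `margin`, then press the advantage.

    margin:    health lead required before going on the offensive
               (trained networks settle somewhere between 1 and 3)
    finish_at: opponent health at or below which the greedy special is used
    """
    if my_hp - opp_hp < margin:
        return "heal"
    if opp_hp <= finish_at:
        # Greedy finisher: costs us 2 points if the opponent happens to deflect.
        return "special"
    return "attack"


print(advantage_policy(4, 4))  # heal
print(advantage_policy(6, 4))  # attack
print(advantage_policy(6, 2))  # special
```

The two failure modes mentioned above map onto the two branches: the `heal` branch can be overrun by repeated specials, and the `special` branch backfires against a deflect.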

Now let's train against an opponent who heals 50% of the time, and does another random move the other 50%.

In [9]:
opponents = [
    rpsnetworks.player_templates.mixedStrategyPlayer("player1", 4, [0.125, 0.125, 0.125, 0.125, 0.5]),
]

train_out = rpsnetworks.training_manager.trainNetwork(
    10, 20, 20,
    [3, 3, 5], rpsnetworks.player_templates.playerAwareNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 10
Children tested per generation: 20
Base # of test-games per child: 20
---------------
Schema used: RPG Fascimile
Opponent count: 1
---------------
RESULTS:
Network proficiency: 0.888
Move distribution:
attack: 0.000
defend: 0.000
special: 0.435
deflect: 0.000
heal: 0.565


Now, let's see what happens when we put a pre-programmed, strategic AI up against our neural network.<br>
This AI likes to attack and special when it has the advantage, deflect and heal when at a disadvantage, and also heal if it has the same health as any of its opponents.

In [14]:
opponents = [
    rpsnetworks.player_templates.Player("player1", 4,
                                        rpsnetworks.controller_templates.customStrategy1Controller("player1", "cool_strat", [0.2, 0.2, 0.2, 0.2, 0.2]))
]

train_out = rpsnetworks.training_manager.trainNetwork(
    15, 64, 20,
    [3, 3, 5], rpsnetworks.player_templates.playerAwareNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 15
Children tested per generation: 64
Base # of test-games per child: 20
---------------
Schema used: RPG Fascimile
Opponent count: 1
---------------
RESULTS:
Network proficiency: 0.597
Move distribution:
attack: 0.207
defend: 0.025
special: 0.172
deflect: 0.000
heal: 0.596


The network can end up training itself in a wide variety of ways. However, it struggles to break a 65% win rate.

In [11]:
opponents = [
    rpsnetworks.player_templates.Player("player1", 4,
                                        rpsnetworks.controller_templates.customStrategy1Controller("player1", "cool_strat", [0.2, 0.2, 0.2, 0.2, 0.2])),
    rpsnetworks.player_templates.Player("player2", 4,
                                        rpsnetworks.controller_templates.customStrategy1Controller("player2", "cool_strat", [0.2, 0.2, 0.2, 0.2, 0.2]))
]

train_out = rpsnetworks.training_manager.trainNetwork(
    15, 64, 20,
    [4, 3, 5], rpsnetworks.player_templates.playerAwareNetworkPlayer, 4,
    opponents, RPG_SCHEMA, verbose=True
)

TRAINING PARAMETERS:
Generations: 15
Children tested per generation: 64
Base # of test-games per child: 20
---------------
Schema used: RPG Fascimile
Opponent count: 2
---------------
RESULTS:
Network proficiency: 0.916
Move distribution:
attack: 0.353
defend: 0.008
special: 0.015
deflect: 0.004
heal: 0.619


As it turns out, the neural network has a much easier time winning if it's up against 2 players (win rates of >85%).<br>
One strategy is as follows: it sits out the fight for a bit by healing at the start, preserving its own strength while the other two players lower each other's health, until it has a good advantage over both of them.<br>
Once it has the advantage, it attacks repeatedly - healing if necessary.