#### Continuous Training
The agent periodically updates its strategy from recent gameplay experience, so it adapts as opponent strategies change over time.  
For background on the Q-table algorithm, see [this article](https://towardsdatascience.com/math-of-q-learning-python-code-5dcbdc49b6f6).
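
To make the update rule concrete, here is a minimal sketch of tabular Q-learning for rock-paper-scissors. It is not the `QLearning` class from `rps_agent` (whose source is not shown here). The names `TabularQ`, `reward`, `ACTIONS` and `BEATS`, the hyperparameter values, and the always-rock opponent are all assumptions made for illustration. The state is taken to be the pair (own last action, opponent's previous play).

```python
import random
from collections import defaultdict

ACTIONS = ["R", "P", "S"]
BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value


def reward(my_move, opp_move):
    """Return +1 for a win, -1 for a loss and 0 for a tie."""
    if my_move == opp_move:
        return 0
    return 1 if BEATS[my_move] == opp_move else -1


class TabularQ:
    """Hypothetical minimal Q-table agent (illustrative only)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Unseen states start with a value of 0.0 for every action
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
        self.rng = random.Random(seed)

    def get_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=values.get)

    def update(self, state, action, r, next_state):
        # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(self.q[next_state].values())
        td_target = r + self.gamma * best_next
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


# Continuous training: the table is updated after every round played
agent = TabularQ()
state = ("", "")
for _ in range(500):
    action = agent.get_action(state)
    opp_move = "R"  # assumed opponent that always plays rock
    next_state = (action, opp_move)
    agent.update(state, action, reward(action, opp_move), next_state)
    state = next_state
```

Because `update` runs after every round rather than only in a separate training phase, the table keeps tracking the opponent if its behaviour drifts.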


In [3]:
from rps_game import *
from rps_agent import *

agent = QLearning()
set_agent(agent)

play(player, quincy, 1000)
play(player, mrugesh, 1000)
play(player, kris, 1000)
play(player, abbey, 1000)

Final results: {'p1': 561, 'p2': 47, 'tie': 392}
Player 1 win rate: 92.26973684210526%
Final results: {'p1': 830, 'p2': 166, 'tie': 4}
Player 1 win rate: 83.33333333333334%
Final results: {'p1': 497, 'p2': 46, 'tie': 457}
Player 1 win rate: 91.52854511970534%
Final results: {'p1': 467, 'p2': 280, 'tie': 253}
Player 1 win rate: 62.51673360107095%


62.51673360107095
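
The reported win rates appear to exclude ties: for the first opponent, 561 / (561 + 47) ≈ 92.27%. The sketch below reproduces that figure, assuming this is how `play` computes it (its source is not shown here). `win_rate` is a hypothetical helper, not part of `rps_game`.

```python
def win_rate(results):
    """Percentage of decisive games won by player 1 (ties are ignored)."""
    decisive = results["p1"] + results["p2"]
    if decisive == 0:
        return 0.0
    return results["p1"] / decisive * 100


quincy_results = {"p1": 561, "p2": 47, "tie": 392}
print(round(win_rate(quincy_results), 2))  # 92.27
```

Read this way, the roughly 92% figures against `quincy` and `kris` come with a large share of ties (392 and 457 out of 1000), so the agent wins about half of all rounds played against them.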

#### Ensemble Learning
Multiple instances of the agent are each trained against a different opponent, and the actions they choose are combined by majority vote.

In [1]:
import random
from collections import Counter

def ensemble_choose_action(agent_ensemble, state):
    # Voting mechanism: each agent votes, and the most common action wins.
    # Counter.most_common resolves ties in favour of the vote cast first.
    votes = [agent.get_action(state) for agent in agent_ensemble]
    return Counter(votes).most_common(1)[0][0]

def player(prev_play, opponent_history=None):
    global agent_ensemble

    # First move of a match: there is no state yet, so play a random action
    if prev_play == '' or any(agent.last_action is None for agent in agent_ensemble):
        action = random.choice(agent_ensemble[0].actions)
        for agent in agent_ensemble:
            agent.step += 1
    else:
        # Every agent saw the same ensemble move, so they share one state
        state = (agent_ensemble[0].last_action, prev_play)
        action = ensemble_choose_action(agent_ensemble, state)

    # Record the move actually played so the next state is built correctly
    for agent in agent_ensemble:
        agent.last_action = action
    return action
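
The voting step can be tried in isolation with stub agents. This is a sketch only: `StubAgent` and `majority_vote` are hypothetical names, and the stubs simply return a fixed action instead of consulting a Q-table.

```python
from collections import Counter


class StubAgent:
    """Hypothetical stand-in that always votes for one fixed action."""

    def __init__(self, action):
        self.action = action

    def get_action(self, state):
        return self.action


def majority_vote(agents, state):
    # Collect one vote per agent and return the most common action
    votes = [agent.get_action(state) for agent in agents]
    return Counter(votes).most_common(1)[0][0]


ensemble = [StubAgent("R"), StubAgent("P"), StubAgent("P"), StubAgent("S")]
print(majority_vote(ensemble, ("R", "S")))  # P
```

With four voters a 2-2 split is possible. This sketch resolves it in favour of the action voted for first, which is one arbitrary choice among several (a random pick among the tied actions would be another).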

In [41]:
from rps_game import *
from rps_agent_ens import *

opponents = [quincy, mrugesh, kris, abbey]
agent_ensemble = [QLearningAgent() for _ in range(len(opponents))]

for i, opponent in enumerate(opponents):
    agent_ensemble[i].train(opponent, num_episodes=6000) 

In [None]:
play(player, quincy, 1000)