<a href="https://colab.research.google.com/github/spicecat/Haxballers/blob/main/Haxballers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Haxballers


## Summary

Our project idea is to develop and train a multi-agent system to play Haxball, a simplified soccer simulation game. The behaviors we want the agents to learn are scoring goals, defending against goals, and passing to teammates. Agents will take the current game state as input, including the positions and velocities of all players and the ball, and output either a cardinal direction to move in or a kick action.
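As a rough illustration of the action interface described above, the sketch below maps a discrete action index to a movement-or-kick command. The `Action` enum, the `(dx, dy, kick)` tuple layout, and the axis orientation are all assumptions made for illustration; they are not taken from HaxballGym's actual action encoding.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical discrete actions: four cardinal moves plus a kick."""
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3
    KICK = 4


# Assumed command layout: (dx, dy, kick), with +x to the right and +y up.
_ACTION_TO_COMMAND = {
    Action.UP: (0, 1, 0),
    Action.DOWN: (0, -1, 0),
    Action.LEFT: (-1, 0, 0),
    Action.RIGHT: (1, 0, 0),
    Action.KICK: (0, 0, 1),
}


def decode_action(index: int) -> tuple[int, int, int]:
    """Convert a policy's discrete output index into a (dx, dy, kick) command."""
    return _ACTION_TO_COMMAND[Action(index)]
```

With this layout, a policy head only needs five output logits per agent, one per entry of `Action`.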

## Project Goals

- Minimum Goal: Develop an agent that can move to the ball.
- Realistic Goal: Develop an agent that can score and defend goals.
- Moonshot Goal: Develop an agent that can coordinate with teammates by passing the ball.

## Algorithms

We anticipate using model-free on-policy multi-agent reinforcement learning to train the Haxball agents. The training environment will progress through the following stages: 1v0, 1v1, 2v1, 2v2, and 3v3.
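One possible way to drive the stage progression is sketched below. The stage names come from the list above, but the promotion rule (advance once a success rate reaches a threshold) and the default threshold of 0.8 are assumptions for illustration, not a decided part of the plan.

```python
# Curriculum stages in the order listed in the proposal.
STAGES = ["1v0", "1v1", "2v1", "2v2", "3v3"]


def parse_stage(stage: str) -> tuple[int, int]:
    """Split a stage name such as "2v1" into (team size, opponent team size)."""
    own, opponents = stage.split("v")
    return int(own), int(opponents)


def next_stage(current: str, success_rate: float, threshold: float = 0.8) -> str:
    """Return the stage to train on next.

    Hypothetical rule: move to the following stage once the success rate
    reaches the threshold; otherwise stay. The final stage never advances.
    """
    index = STAGES.index(current)
    if success_rate >= threshold and index < len(STAGES) - 1:
        return STAGES[index + 1]
    return current
```

For example, `next_stage("1v0", 0.9)` moves on to `"1v1"`, while `next_stage("1v1", 0.5)` keeps training at `"1v1"`.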

## Evaluation Plan

For quantitative evaluation, agents at different levels of training will compete against each other and be ranked with an Elo rating system. Metrics we may measure include the number of games won, the number of goals scored, the number of goals defended, and pass frequency. As a baseline approach, we will train agents to move behind the ball and kick it towards the opposing goal. We estimate that successful training will improve the win rate metric by 90%.
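For reference, the standard Elo update that such a rating system would rely on can be sketched as follows. The K-factor of 32 is a common default and an assumption here; the proposal does not fix a value.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, and 0.0 for a loss.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated checkpoints; A wins and gains k/2 = 16 points.
print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Each training checkpoint would hold one rating, updated after every evaluation match against another checkpoint.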

For qualitative analysis, we will use the 1v0 training environment for sanity checks. We will visualize the results externally by reviewing game replays. A successful result is expected to display agents moving efficiently, kicking the ball towards the opposing goal, and passing the ball towards open teammates.
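The behavior described in the evaluation plan (move behind the ball, then kick it towards the opposing goal) could also be written as a hand-scripted heuristic to compare against during 1v0 sanity checks. The sketch below is one such heuristic under stated assumptions: positions are `(x, y)` tuples, the command is a `(dx, dy, kick)` tuple, and `kick_range` and `offset` are hypothetical tuning values. None of these names come from HaxballGym.

```python
import math


def _sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def baseline_action(player, ball, goal, kick_range=30.0, offset=20.0):
    """Return a (dx, dy, kick) command for a simple scripted player.

    The player walks to a point `offset` units behind the ball on the
    ball-to-goal line and kicks once it is within `kick_range` of the ball
    with the ball lying on the goal side of the player.
    """
    # Unit vector pointing from the ball to the opposing goal.
    gx, gy = goal[0] - ball[0], goal[1] - ball[1]
    norm = math.hypot(gx, gy) or 1.0
    ux, uy = gx / norm, gy / norm

    # Kick if close enough and the ball is between the player and the goal.
    bx, by = ball[0] - player[0], ball[1] - player[1]
    if math.hypot(bx, by) <= kick_range and bx * ux + by * uy > 0:
        return (0, 0, 1)

    # Otherwise step along the dominant axis towards the point behind the ball.
    dx = ball[0] - ux * offset - player[0]
    dy = ball[1] - uy * offset - player[1]
    if abs(dx) >= abs(dy):
        return (_sign(dx), 0, 0)
    return (0, _sign(dy), 0)
```

This heuristic ignores collisions: a player that starts on the goal side of the ball walks straight back through it rather than around it, so it is only a crude reference point.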

## AI Usage

AI was used for coding assistance.


## Setup

- HaxballGym: https://github.com/HaxballGym/HaxballGym
- Ursinaxball: https://github.com/HaxballGym/Ursinaxball

In [1]:
# @title Set up virtual display
%pip install -q screeninfo
import os
os.environ['DISPLAY'] = ':99'
!Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &

In [2]:
# @title Install HaxballGym
!git clone --recursive https://github.com/spicecat/HaxballGym.git
%cd HaxballGym/
%pip install -e .
%cd ..
!mkdir -p /content/recordings

fatal: destination path 'HaxballGym' already exists and is not an empty directory.
/content/HaxballGym
Obtaining file:///content/HaxballGym
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Processing ./packages/ursinaxball (from haxballgym==0.6.0)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: haxballgym, ursinaxball
  Building editable for haxballgym (pyproject.toml) ... done
  Created wheel for haxballgym: filename=haxballgym-0.6.0-py3-none-any.whl size=1617 sha256=c3c204ad18aa632b83dd36ceb42a6120bb38d7026fb0db9f85eb7d15835da8eb
  Stored in directory: /tmp/pip-ephem-wheel-cache-p9etssxb/wheels/30/d2/e1/ee5da

## Training

In [3]:
import time
from ursinaxball import Game
import haxballgym

# Headless game instance; recordings are written to /content/recordings/.
game = Game(
    folder_rec="/content/recordings/",
    enable_renderer=False,
    enable_vsync=True,
)
env = haxballgym.make(game=game)

ep_reward = 0

# Random-action rollouts for two players. The loop runs until interrupted.
while True:
    # Request a saved recording when the previous episode's |reward| exceeded 1.
    save_rec = abs(ep_reward) > 1
    obs, info = env.reset(options={"save_recording": save_rec})
    # Per-player observations (not used by the random policy below).
    obs_1 = obs[0]
    obs_2 = obs[1]
    done = False
    steps = 0
    ep_reward = 0
    t0 = time.time()
    while not done:
        # Sample one random action per player.
        actions_1 = env.action_space.sample()
        actions_2 = env.action_space.sample()
        actions = [actions_1, actions_2]
        new_obs, reward, terminated, truncated, info = env.step(actions)
        done = terminated or truncated
        # Track the first player's reward only.
        ep_reward += reward[0]
        obs_1 = new_obs[0]
        obs_2 = new_obs[1]
        steps += 1

    # Report wall-clock timing and the first player's episode reward.
    length = time.time() - t0
    print(
        f"Step time: {length / steps:1.5f} | "
        f"Episode time: {length:.2f} | "
        f"Episode Reward: {ep_reward:.2f}"
    )

Step time: 0.01559 | Episode time: 3.74 | Episode Reward: 1.80
Step time: 0.01888 | Episode time: 4.53 | Episode Reward: 0.10
Step time: 0.01320 | Episode time: 3.17 | Episode Reward: 0.00
Step time: 0.01683 | Episode time: 1.28 | Episode Reward: 1.10
Step time: 0.01293 | Episode time: 3.10 | Episode Reward: 0.00
Step time: 0.01616 | Episode time: 3.88 | Episode Reward: 0.00
Step time: 0.01488 | Episode time: 3.57 | Episode Reward: 0.00
Step time: 0.01302 | Episode time: 3.12 | Episode Reward: 0.00
Step time: 0.01343 | Episode time: 3.22 | Episode Reward: 0.00
Step time: 0.01904 | Episode time: 4.57 | Episode Reward: 0.00
Step time: 0.01346 | Episode time: 3.23 | Episode Reward: 0.00
Step time: 0.01361 | Episode time: 3.27 | Episode Reward: 0.00
Step time: 0.01728 | Episode time: 4.15 | Episode Reward: 0.10
Step time: 0.01649 | Episode time: 3.96 | Episode Reward: 0.00
Step time: 0.01308 | Episode time: 3.14 | Episode Reward: 0.00
Step time: 0.01300 | Episode time: 3.12 | Episode Rewar

KeyboardInterrupt: 

In [4]:
# @title View Replays
from IPython.display import IFrame

IFrame("https://wazarr94.github.io/", "100%", "720")