# Optimizing Reinforcement Algorithms & Comparing Performances with Street Fighter II: Special Champion Edition

*by Josiah Hegarty, Everett Lewark, and Austin Youngren*

## Introduction

## Methods

### Designing Reward Functions

At each time step, the game environment returns a dictionary containing information about the current game state. The variables exposed to the program vary by game. The default Street Fighter II environment exposes the following:

- player score
- player health
- enemy health
- player round wins
- enemy round wins
- the countdown timer

Our initial reward function was fairly simple: reward the model when it damages the opponent, and penalize it when it takes damage. However, this simple reward function has a problem: how, exactly, does a player deal damage? There are some ranged attacks in the game, but for the most part, dealing damage takes more than a single action. For a melee attack, the player must first walk toward the opponent before it can land a hit, and that is a whole task unto itself.
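
A minimal sketch of such a damage-based reward, computed from two consecutive info dictionaries, is shown below. The key names `health` and `enemy_health` are assumptions made for illustration and should be checked against the integration's own variable names; the numbers are made up.

```python
def damage_reward(prev_info, info,
                  player_key="health", enemy_key="enemy_health"):
    """Reward damage dealt and penalize damage taken between two time steps.

    The key names are assumptions for illustration.
    """
    # Clamp at zero so that a health reset between rounds is not
    # mistaken for healing (which would flip the sign of the reward).
    damage_dealt = max(prev_info[enemy_key] - info[enemy_key], 0)
    damage_taken = max(prev_info[player_key] - info[player_key], 0)
    return damage_dealt - damage_taken


# Illustrative values only, not taken from the game.
before = {"health": 100, "enemy_health": 100}
after = {"health": 90, "enemy_health": 75}
print(damage_reward(before, after))  # 25 dealt - 10 taken = 15
```

Note that this reward is zero on every step where neither health value changes, which is most of them.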

Reinforcement learning models [tend to perform better](https://www.youtube.com/watch?v=IdJL9rcQrFU) when they are given some sort of continuous function they can try to optimize, but in the case of damage our rewards are much more sparse. The model would have to stumble into the other player and then happen to make a move that deals damage. To make matters worse, this would need to occur enough times for the model to learn a pattern from it, including both the ability to visually recognize where the player is relative to the enemy and the actions necessary to move the player closer. This seems like a tall order.

As a result, Everett tried introducing another term into the reward function that rewards the player for moving closer to the opponent. To do this, the X and Y coordinates of both the player and the enemy must be exposed from the environment so that they can be used in the reward function. The stable-retro library provides [documentation](https://stable-retro.farama.org/integration/) on integrating a new game environment, and modifying an existing integration is similar. It involved compiling and running a specialized integration UI, which looked like this:

![stable-retro integration UI](images/integration-ui.png)

Using this interface, which functions similarly to other memory-inspection tools like Cheat Engine, Everett located variables within the game's RAM using an iterative process. For instance, to locate the player X coordinate, he first searched for variables that were marked as unchanged. He then moved the player to the right and narrowed the current set of variables by searching for ones that had increased. Repeating steps like these gradually reduced the console's entire RAM to a few candidate memory locations, which he checked manually using the automatically updating table in the sidebar. The same strategy was then used to locate the memory locations for the player Y coordinate and the enemy coordinates.
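
The narrowing idea can be sketched in a few lines of Python. This is not how the integration UI is implemented; it is a toy model in which each "RAM snapshot" is a short list of made-up values, every address holds a single number, and the helper names (`narrow`, `unchanged`, `increased`) are hypothetical.

```python
def narrow(candidates, old_snapshot, new_snapshot, keep):
    """Keep only the addresses whose (old, new) values satisfy `keep`."""
    return {a for a in candidates if keep(old_snapshot[a], new_snapshot[a])}


def unchanged(old, new):
    return new == old


def increased(old, new):
    return new > old


# Toy 8-cell "RAM" snapshots with made-up values. In this example,
# address 2 plays the role of the player X coordinate.
idle_1 = [5, 0, 100, 200, 3, 9, 0, 42]
idle_2 = [5, 0, 100, 201, 3, 9, 0, 42]    # player did not move
right_1 = [5, 0, 108, 202, 3, 10, 0, 42]  # player moved right
right_2 = [5, 0, 115, 203, 3, 10, 0, 42]  # player moved right again

candidates = set(range(len(idle_1)))
candidates = narrow(candidates, idle_1, idle_2, unchanged)    # drops address 3
candidates = narrow(candidates, idle_2, right_1, increased)   # leaves {2, 5}
candidates = narrow(candidates, right_1, right_2, increased)  # leaves {2}
print(sorted(candidates))  # [2]
```

Each pass only ever shrinks the candidate set, which is why a handful of well-chosen movements is enough to isolate a location even when the starting set is very large.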

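Once the coordinates are available in the info dictionary, a reward term for moving closer to the opponent might look like the sketch below. The key names (`x`, `y`, `enemy_x`, `enemy_y`) and the `weight` value are hypothetical choices for illustration, not the names or tuning used in the integration.

```python
import math


def distance_to_enemy(info):
    """Euclidean distance between the player and the enemy."""
    return math.hypot(info["enemy_x"] - info["x"], info["enemy_y"] - info["y"])


def approach_reward(prev_info, info, weight=0.1):
    """Positive when the player moved closer, negative when it moved away."""
    return weight * (distance_to_enemy(prev_info) - distance_to_enemy(info))


# Illustrative coordinates only.
before = {"x": 100, "y": 50, "enemy_x": 300, "enemy_y": 50}
after = {"x": 120, "y": 50, "enemy_x": 300, "enemy_y": 50}
print(approach_reward(before, after))  # positive: the gap shrank from 200 to 180
```

Because this term changes on nearly every step, it gives the model a signal even before it lands a hit. It would be added to the damage-based reward, with the weight kept small enough that approaching never matters more than dealing damage.
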
## Results

## Conclusions

# Word Count

```python
import io
from nbformat import current  # legacy nbformat API (v3 layout with worksheets)
import glob

# Locate the report notebook in the current directory.
nbfile = glob.glob('report.ipynb')
if len(nbfile) > 1:
    print('More than one ipynb file. Using the first one.  nbfile=', nbfile)
with io.open(nbfile[0], 'r', encoding='utf-8') as f:
    nb = current.read(f, 'json')

# Count space-separated words in markdown cells only, ignoring '#' characters.
word_count = 0
for cell in nb.worksheets[0].cells:
    if cell.cell_type == "markdown":
        word_count += len(cell['source'].replace('#', '').lstrip().split(' '))
print('Word count for file', nbfile[0], 'is', word_count)
```

Word count for file report.ipynb is 475
