# Deep Reinforcement Learning for Automated Testing

To explore automated application testing with deep reinforcement learning (DRL), two types of agents were trained using the **PPO (Proximal Policy Optimization)** and **A2C (Advantage Actor-Critic)** algorithms in a custom "Bubble Game" environment. The reward function incentivized desired behaviours, such as shooting (regardless of whether the shot hit a target), aligning with bubbles, and successfully popping bubbles, while penalizing undesirable ones, such as colliding with bubbles, idling, wall-camping, and bubble drag.
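As a rough illustration of this kind of reward shaping, the sketch below sums a weight for each event that occurs in a single step. The event names mirror the behaviours listed above, but the `RewardWeights` class, the `step_reward` helper, and every numeric weight are hypothetical placeholders, not the values used in the actual environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights, chosen only for illustration.
    shot: float = 0.5          # firing, whether or not it hits
    aligned: float = 0.25      # lined up with a bubble
    pop: float = 10.0          # bubble successfully popped
    collision: float = -20.0   # agent touched a bubble
    idle: float = -0.5         # no movement this step
    wall_camp: float = -1.0    # lingering at a wall
    bubble_drag: float = -1.0  # the "bubble drag" penalty named in the text


def step_reward(events: dict, weights: RewardWeights = RewardWeights()) -> float:
    """Sum the weight of every event flagged True for this step."""
    return sum(getattr(weights, name) for name, fired in events.items() if fired)


# A step where the agent shoots while aligned and pops a bubble.
good_step = step_reward({"shot": True, "aligned": True, "pop": True})
# A step where the agent idles at a wall.
bad_step = step_reward({"idle": True, "wall_camp": True})
```

Keeping the weights in one immutable object makes it easy to swap in a different reward profile without touching the environment logic.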

Two reward configurations were designed to operationalize distinct testing strategies. **Survivor mode** prioritizes conservative navigation and sustained episode length by imposing heavier penalties on risky actions, whereas **Speedrunner mode** incentivizes aggressive, high-throughput interaction via larger rewards for rapid bubble pops and frequent movement.
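To make the contrast between the two modes concrete, here is a minimal sketch in which the same two episodes are scored under each configuration. The weight dictionaries, the event names (`pop`, `collision`, `move`), and all numbers are assumptions made up for this example; the real configurations may differ in both structure and magnitude.

```python
# Hypothetical per-event weights for each mode (illustrative values only).
SURVIVOR = {"pop": 5.0, "collision": -50.0, "move": 0.0}
SPEEDRUNNER = {"pop": 20.0, "collision": -10.0, "move": 0.25}


def episode_reward(counts: dict, weights: dict) -> float:
    """Weight each event count and sum; events without a weight score zero."""
    return sum(weights.get(event, 0.0) * n for event, n in counts.items())


# Two made-up episodes: one cautious, one aggressive.
cautious = {"pop": 1, "collision": 0, "move": 10}
aggressive = {"pop": 4, "collision": 1, "move": 40}

for mode_name, weights in (("survivor", SURVIVOR), ("speedrunner", SPEEDRUNNER)):
    print(mode_name,
          episode_reward(cautious, weights),
          episode_reward(aggressive, weights))
```

Under these assumed weights, the Survivor profile scores the cautious episode higher, while the Speedrunner profile favours the aggressive one, which is the behavioural split the two modes are meant to produce.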

To ensure fair and reproducible comparisons, four models (PPO and A2C, each under the Survivor and Speedrunner configurations) were trained in a virtual environment with an identical random seed and matched environment settings and hyperparameters. Following training, a dedicated evaluation script recorded performance metrics, which were subsequently parsed into CSV files for this analysis.
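One way to keep the four runs matched is to generate their configurations from a single shared seed and hyperparameter set. The sketch below does only that; the seed value, the hyperparameter names and values, and the `build_runs` helper are all hypothetical, and the actual training code is not shown in this notebook.

```python
from itertools import product

SEED = 42  # hypothetical; the study only states that the seed was identical
SHARED_HYPERPARAMS = {  # hypothetical values shared by every run
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "total_timesteps": 100_000,
}


def build_runs(algos=("ppo", "a2c"), modes=("survivor", "speedrunner")):
    """Return one config per (algorithm, mode) pair, all with the same seed."""
    return [
        {"name": f"{algo}_{mode}", "algo": algo, "mode": mode,
         "seed": SEED, **SHARED_HYPERPARAMS}
        for algo, mode in product(algos, modes)
    ]


runs = build_runs()
for run in runs:
    print(run["name"], run["seed"], run["learning_rate"])
```

Generating the configurations from one place means that any change to the seed or a hyperparameter applies to all four models at once, so the only things that vary between runs are the algorithm and the reward mode.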

#### Imports + Cleaning

In [None]:
import pandas as pd

In [None]:
ppo_survivor_data = pd.read_csv('logs/ppo_survivor.csv')
ppo_speedrunner_data = pd.read_csv('logs/ppo_speedrunner.csv')
a2c_survivor_data = pd.read_csv('logs/a2c_survivor.csv')
a2c_speedrunner_data = pd.read_csv('logs/a2c_speedrunner.csv')


    episode       reward  shots  pops  deaths  frames_alive  wall_ratio  \
0         1  2072.563546      8     1       1          1703    0.560188   
1         2    32.009345      0     0       1           109    0.000000   
2         3    61.959087      0     0       1           116    0.931034   
3         4  1039.152390      1     0       1          1008    0.690476   
4         5   699.644869      2     1       1           552    0.588768   
5         6   300.398562      0     0       1           354    0.471751   
6         7  2383.205031      4     1       0          2000    0.606500   
7         8   948.324027      1     1       1           726    0.648760   
8         9   877.317851      0     0       1           833    0.786315   
9        10  2000.709654      0     0       1          1693    0.943296   
10       11   -30.283749      0     0       1            50    0.000000   
11       12    -7.983874      0     0       1            60    0.000000   
12       13  1269.225799 
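For side-by-side comparison, it can help to stack the four logs into one labelled table. The following is a minimal sketch, assuming each log has the columns shown in the output above (`reward`, `pops`, `frames_alive`, and so on); the `combine_runs` and `summarize` helpers and the tiny in-memory frames are hypothetical stand-ins, not part of the original analysis.

```python
import pandas as pd


def combine_runs(frames: dict) -> pd.DataFrame:
    """Stack per-run logs, tagging each row with its algorithm and mode."""
    labelled = [
        df.assign(algo=algo, mode=mode)
        for (algo, mode), df in frames.items()
    ]
    return pd.concat(labelled, ignore_index=True)


def summarize(combined: pd.DataFrame) -> pd.DataFrame:
    """Mean reward, pops, and frames alive per (algo, mode) pair."""
    metrics = ["reward", "pops", "frames_alive"]
    return combined.groupby(["algo", "mode"])[metrics].mean()


# Tiny made-up frames standing in for the real CSV logs.
demo = {
    ("PPO", "survivor"): pd.DataFrame(
        {"reward": [1.0, 3.0], "pops": [0, 2], "frames_alive": [100, 300]}),
    ("A2C", "survivor"): pd.DataFrame(
        {"reward": [4.0], "pops": [1], "frames_alive": [50]}),
}
combined = combine_runs(demo)
summary = summarize(combined)
print(summary)
```

With the real data, the dictionary would map pairs such as `("PPO", "survivor")` to `ppo_survivor_data`, and likewise for the other three frames loaded earlier.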

### Bubble Game: PPO vs A2C