# CSPB 3202 Final Project #

Tyler Kinkade, jaki9292@colorado.edu

GitHub: [https://github.com/jaki9292/rl-project](https://github.com/jaki9292/rl-project)

## Overview ##

This write-up reports on a small project to train and test a reinforcement learning algorithm (Russell & Norvig, 2022; Sutton & Barto, 2018) with the Gymnasium (2022) Python software package. 

This report is divided into the following sections: approach, results, discussion, and suggestions for future research.

## Approach ##

This section is divided into the following subsections: environment and game rules, models, methods and purpose, and problem-solving procedure.

### Environment and Game Rules ###

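The environment is the discrete-action version of Gymnasium's Lunar Lander (Gymnasium, 2022), a Box2D simulation in which an agent must land a spacecraft on a landing pad located at the origin, (0, 0). At each timestep, the agent receives an 8-dimensional observation and chooses one of four discrete actions. According to the Gymnasium documentation, the reward increases as the lander moves closer to the pad, slows down, and stays level. Each leg in contact with the ground adds 10 points, firing the main engine costs 0.3 points per frame, and firing a side engine costs 0.03 points per frame. The episode ends with an additional -100 points for crashing or +100 points for landing safely. An episode terminates when the lander crashes, leaves the viewport, or comes to rest, and an episode that scores at least 200 points is considered a solution. The code below configures the environment with a gravity of -10.0 and wind disabled, then runs a random agent as a baseline.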
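As a quick reference for the interface of the discrete Lunar Lander environment, the sketch below names the four action indices and the eight observation components in the order given by the Gymnasium documentation. The `LanderState` tuple and `decode_observation` helper are hypothetical conveniences introduced here for illustration and are not part of Gymnasium; the sample observation values are made up.

```python
from typing import NamedTuple, Sequence

# Index-to-name mapping for the discrete action space, following the
# Gymnasium Lunar Lander documentation.
ACTION_NAMES = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


class LanderState(NamedTuple):
    """Hypothetical helper: a named view of the 8-element observation."""
    x: float
    y: float
    x_velocity: float
    y_velocity: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def decode_observation(observation: Sequence[float]) -> LanderState:
    """Convert a raw 8-element observation into a LanderState."""
    if len(observation) != 8:
        raise ValueError(f"expected 8 values, got {len(observation)}")
    # The first six entries are continuous; the last two are 0/1 contact flags
    *continuous, left, right = observation
    return LanderState(*(float(v) for v in continuous), bool(left), bool(right))


# Illustrative (made-up) observation: lander above the pad, legs not touching
state = decode_observation([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 0.0])
print(state.y, state.left_leg_contact, ACTION_NAMES[2])
```

Naming the components this way can make hand-written features or logging code easier to read than indexing into the raw array.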


```python
# Set up and display a random agent in the Lunar Lander environment
# Adapted from: https://gymnasium.farama.org/
# References:
# https://gymnasium.farama.org/environments/box2d/lunar_lander/
# https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py

# Install dependencies
# pip install gymnasium
# pip install "gymnasium[box2d]"

import gymnasium as gym

# Initialize environment
env = gym.make("LunarLander-v2",
               continuous=False,        # Discrete version
               gravity=-10.0,
               enable_wind=False,
               wind_power=0.0,
               turbulence_power=0.0,
               render_mode="human")     # Render for humans

# Reset environment with random number generator seed for reproducibility
observation, info = env.reset(seed=21)

# Accumulator variables
reward_total = 0.0
reward_totals = []
episode = 0

# Run the random agent for 500 timesteps
for step_index in range(500):

    # Get random action from action space
    action = env.action_space.sample()

    # Obtain observation, reward, terminated status, truncated status,
    # and environment info for given action
    observation, reward, terminated, truncated, info = env.step(action)

    # Accumulate reward total
    reward_total += reward

    # If episode ends (terminal state or time limit reached), restart
    if terminated or truncated:
        # Append total to list of reward totals
        reward_totals.append(reward_total)

        # Report result
        print(f"Episode {episode} total rewards: {reward_total}")

        # Increment episode count
        episode += 1

        # Reset reward total
        reward_total = 0.0

        # Start new episode
        observation, info = env.reset()

env.close()
```


```
Episode 0 total rewards: -131.77339963398612
Episode 1 total rewards: -392.1633176088692
Episode 2 total rewards: -532.852776846538
Episode 3 total rewards: -680.4815247340057
Episode 4 total rewards: -933.3632637277645
```
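To put the random agent's output above in context, the following sketch summarizes the five reported episode returns and compares them with the 200-point score that the Gymnasium documentation treats as a solution. The variable names are illustrative, and the returns are copied from the output rather than recomputed.

```python
from statistics import mean

# Episode returns copied from the random-agent output above
random_returns = [
    -131.77339963398612,
    -392.1633176088692,
    -532.852776846538,
    -680.4815247340057,
    -933.3632637277645,
]

# Per the Gymnasium documentation, an episode scoring at least 200 points
# is considered a solution
SOLVED_THRESHOLD = 200.0

baseline_mean = mean(random_returns)
best_return = max(random_returns)

print(f"Episodes: {len(random_returns)}")
print(f"Mean return: {baseline_mean:.2f}")
print(f"Best return: {best_return:.2f}")
print(f"Gap from best episode to solved threshold: {SOLVED_THRESHOLD - best_return:.2f}")
```

A mean of this kind gives a simple baseline against which a trained agent's returns can later be compared.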


### Models ###


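The model of choice is approximate reinforcement learning (Russell & Norvig, 2022; Sutton & Barto, 2018), in which value estimates are represented by a parameterized function of state features rather than by a lookup table.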
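As a rough illustration of what approximate reinforcement learning can look like, the sketch below implements one common variant, Q-learning with a linear function approximator, in which Q(s, a) is a weighted sum of features and each weight moves in proportion to the temporal-difference error. This is a generic sketch and not necessarily the model used in this project: the `LinearQAgent` class, the `toy_features` function, and the hyperparameter values are all assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable

Features = Dict[str, float]


class LinearQAgent:
    """Q-learning with Q(s, a) = sum_i w_i * f_i(s, a) (illustrative sketch)."""

    def __init__(self,
                 feature_fn: Callable[[object, Hashable], Features],
                 actions: Iterable[Hashable],
                 alpha: float = 0.1,
                 gamma: float = 0.99) -> None:
        self.feature_fn = feature_fn    # Maps (state, action) to named features
        self.actions = list(actions)    # Available discrete actions
        self.alpha = alpha              # Learning rate (assumed value)
        self.gamma = gamma              # Discount factor (assumed value)
        self.weights: Dict[str, float] = {}

    def q_value(self, state, action) -> float:
        """Return the current linear estimate of Q(state, action)."""
        features = self.feature_fn(state, action)
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

    def update(self, state, action, reward, next_state, done) -> float:
        """Apply one Q-learning update and return the TD error."""
        # No bootstrapping from the next state once the episode has ended
        best_next = 0.0 if done else max(
            self.q_value(next_state, a) for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q_value(state, action)
        # Move each active weight in proportion to its feature value
        for name, value in self.feature_fn(state, action).items():
            self.weights[name] = (self.weights.get(name, 0.0)
                                  + self.alpha * td_error * value)
        return td_error


def toy_features(state, action) -> Features:
    """Hypothetical features: a bias term plus a one-hot action indicator."""
    return {"bias": 1.0, f"action_{action}": 1.0}


# Single update on a made-up terminal transition
agent = LinearQAgent(toy_features, actions=range(4), alpha=0.5, gamma=0.9)
td_error = agent.update(state=None, action=2, reward=10.0,
                        next_state=None, done=True)
print(td_error, agent.weights)
```

For Lunar Lander, the feature function would instead be built from the 8-dimensional observation, and the choice of features largely determines how well a linear approximator can perform.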



### Methods and Purpose ###






### Problem Solving Procedure ###


## Results ## 


demo clips


## Discussion ## 


## Suggestions for Future Research ##




## References ##

Gymnasium. (2022). _Gymnasium documentation._ Farama Foundation. [https://gymnasium.farama.org/](https://gymnasium.farama.org/)

Russell, S., & Norvig, P. (2022). _Artificial intelligence: A modern approach_ (4th ed.). Pearson.

Sutton, R. S., & Barto, A. G. (2018). _Reinforcement learning: An introduction_ (2nd ed.). MIT Press.