In [1]:
import gymnasium as gym
import support_modules as sm

# Acrobot

## Description
<div style="text-align: justify">    
The Acrobot environment is based on Sutton’s work in “Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding” and Sutton and Barto’s book. The system consists of two links connected linearly to form a chain, with one end of the chain fixed. The joint between the two links is actuated. The goal is to apply torques on the actuated joint to swing the free end of the linear chain above a given height while starting from the initial state of hanging downwards.

In the animation on the Gymnasium documentation page linked below, the system appears as two blue links connected by two green joints. The joint between the two links is actuated. The goal is to swing the free end of the outer link up to the target height (the black horizontal line above the system) by applying torque at the actuated joint.
</div>

https://gymnasium.farama.org/environments/classic_control/acrobot/
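The goal condition in the description can be written out directly from an observation. This is a minimal sketch, assuming the Acrobot-v1 observation layout `[cos θ1, sin θ1, cos θ2, sin θ2, θ1_dot, θ2_dot]`, link lengths of 1.0, the documented termination test `-cos(θ1) - cos(θ1 + θ2) > 1.0`, and the documented action meanings (0, 1, 2 apply torques of -1, 0, +1). The names `tip_height`, `reached_goal` and `ACTION_TORQUE` are illustrative and not part of Gymnasium.

```python
# Sketch: the Acrobot-v1 goal test computed from the observation vector.
# Assumed layout (per the Gymnasium docs): [cos θ1, sin θ1, cos θ2, sin θ2, θ1_dot, θ2_dot]

# Discrete action -> torque applied at the actuated (middle) joint (assumed from the docs)
ACTION_TORQUE = {0: -1.0, 1: 0.0, 2: 1.0}


def tip_height(obs):
    """Height of the free end relative to the fixed pivot (both links assumed 1.0 long)."""
    c1, s1, c2, s2 = obs[:4]
    c12 = c1 * c2 - s1 * s2  # cos(θ1 + θ2) via the angle-sum identity
    return -c1 - c12


def reached_goal(obs, target=1.0):
    """True when the free end is above the target height, i.e. the episode terminates."""
    return tip_height(obs) > target


hanging = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # θ1 = θ2 = 0: both links point down
upright = [-1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # θ1 = π, θ2 = 0: both links point up
print(tip_height(hanging), reached_goal(hanging))  # -2.0 False
print(tip_height(upright), reached_goal(upright))  # 2.0 True
```

The free end therefore moves between a height of -2 (hanging) and +2 (fully upright), and the episode ends once it passes +1.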

## Random policy

### Single episode

In [2]:
env = gym.make('Acrobot-v1', render_mode='human')  # 'human' mode draws a frame on every step
state, _ = env.reset()
done = False

while not done:
    action = env.action_space.sample()  # random action: 0, 1 or 2
    state, reward, terminated, truncated, info = env.step(action)
    # terminated: free end reached the target height; truncated: 500-step limit reached
    done = terminated or truncated

env.close()

### 1000 episodes

In [3]:
env = gym.make('Acrobot-v1', render_mode=None)  # no rendering, for speed

rewards = []

for episode in range(1000):
    state, _ = env.reset()
    ep_reward = 0
    done = False

    while not done:
        action = env.action_space.sample()
        state, reward, terminated, truncated, info = env.step(action)
        ep_reward += reward
        done = terminated or truncated

    rewards.append(ep_reward)  # total (undiscounted) reward of this episode

env.close()
print(f'Average reward: {sum(rewards)/len(rewards)}')

Average reward: -498.525
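Why the average sits just above -500 can be checked against the reward scheme described in the Gymnasium documentation: every step that does not reach the goal is rewarded -1, the step that reaches it is rewarded 0, and Acrobot-v1 episodes are truncated after 500 steps. A minimal sketch under those assumptions (the function name `episode_return` is illustrative):

```python
MAX_STEPS = 500  # Acrobot-v1 time limit (max_episode_steps), assumed from the docs


def episode_return(steps, reached_goal):
    """Undiscounted return of one episode under the -1-per-step reward.

    steps: number of env.step calls in the episode (1..MAX_STEPS)
    reached_goal: True if the last step terminated the episode at the goal
    """
    if reached_goal:
        return -(steps - 1)  # the terminating step itself is rewarded 0
    return -MAX_STEPS        # truncated: every one of the 500 steps cost -1


print(episode_return(500, False))  # -500
print(episode_return(120, True))   # -119
```

If every episode were truncated, the average would be exactly -500, so -498.525 suggests the random policy reaches the target height rarely, or only late in the 500-step window.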


## Q-Learning

In [4]:
env = gym.make('Acrobot-v1', render_mode=None)

disccrete_state_space,state_intervals = sm.generate_discrete_states(20,env)
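The implementation of `sm.generate_discrete_states` lives in the local `support_modules` file and is not shown here. The outputs printed below are consistent with a helper that uses the same number of bins for each of the six observation dimensions and divides each dimension's range `high - low` by that number. A hypothetical equivalent, written against the observation bounds (assumed to match Acrobot-v1's `observation_space`) so that it runs without Gymnasium:

```python
import numpy as np


def generate_discrete_states_sketch(n_bins, low, high):
    """Hypothetical stand-in for sm.generate_discrete_states.

    Returns the number of bins per observation dimension and the width
    of one bin in each dimension, (high - low) / n_bins.
    """
    low = np.asarray(low, dtype=np.float32)
    high = np.asarray(high, dtype=np.float32)
    bins_per_dim = [n_bins] * len(low)
    intervals = (high - low) / n_bins
    return bins_per_dim, intervals


# Assumed Acrobot-v1 bounds: cosines/sines in [-1, 1],
# angular velocities in [-4π, 4π] and [-9π, 9π]
high = np.array([1, 1, 1, 1, 4 * np.pi, 9 * np.pi], dtype=np.float32)
low = -high

bins, widths = generate_discrete_states_sketch(20, low, high)
print(bins)    # [20, 20, 20, 20, 20, 20]
print(widths)  # approximately [0.1, 0.1, 0.1, 0.1, 1.2566371, 2.8274334]
```

With a real environment, the bounds would come from `env.observation_space.low` and `env.observation_space.high`.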

In [5]:
disccrete_state_space

[20, 20, 20, 20, 20, 20]

In [6]:
state_intervals

array([0.1      , 0.1      , 0.1      , 0.1      , 1.2566371, 2.8274334])
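With 20 bins in each of the six dimensions, a dense Q-table would hold 20^6 = 64,000,000 states times 3 actions = 192,000,000 entries (about 1.5 GB as float64), so one option is a table that only stores visited states. The sketch below assumes that a bin index is `floor((obs - low) / interval)`, clipped to `0..19`, and uses the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') - Q(s, a)]. The function names and the values of α and γ are illustrative and not taken from `support_modules`.

```python
from collections import defaultdict

import numpy as np

N_BINS = 20
N_ACTIONS = 3
# Assumed Acrobot-v1 observation bounds
HIGH = np.array([1, 1, 1, 1, 4 * np.pi, 9 * np.pi], dtype=np.float32)
LOW = -HIGH
INTERVALS = (HIGH - LOW) / N_BINS  # same widths as state_intervals above


def discretize(obs):
    """Map a continuous observation to a tuple of bin indices, one per dimension."""
    idx = np.floor((np.asarray(obs, dtype=np.float32) - LOW) / INTERVALS).astype(int)
    # Values at the upper bound (e.g. cos θ1 = 1.0) land in bin N_BINS;
    # clip them, and any out-of-range values, into 0..N_BINS-1
    return tuple(np.clip(idx, 0, N_BINS - 1).tolist())


# Sparse Q-table: an entry is created only when a state is first accessed
Q = defaultdict(lambda: np.zeros(N_ACTIONS))


def q_update(s, a, r, s_next, terminated, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on discretized states s and s_next."""
    # Do not bootstrap from a terminal state
    target = r if terminated else r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


s = discretize([1.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # hanging at rest
print(s)  # (19, 10, 19, 10, 10, 10)
q_update(s, 2, -1.0, s, terminated=False)
print(float(Q[s][2]))  # -0.1
```

Note that only `terminated` stops bootstrapping; a `truncated` episode was cut off by the time limit, not by reaching the goal, so its last state still has a future value.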