### Documentation

Interesting problems for reinforcement learning:
 * Gymnasium: https://gymnasium.farama.org/environments/box2d/

## Installation

!pip install gymnasium  
!pip install gymnasium[box2d] 

## Additional steps

### On macOS

pip uninstall swig  
xcode-select --install (if the command line tools are not already installed)  
pip install swig  (or, with MacPorts: sudo port install swig-python)  
pip install 'gymnasium[box2d]' # in zsh the quotes are required  

### On Windows

If the installation fails, the likely cause is a missing or mismatched version of Microsoft Visual C++ Build Tools, which Box2D depends on. To fix it:  
 * Download Microsoft Visual C++ Build Tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
 * In the installer, select the "C++ build tools" option and install it.
 * Restart your Jupyter Notebook session.
 * Run the command !pip install gymnasium[box2d] again in a notebook cell.
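
After following the steps above, it can help to confirm that the packages are actually importable before running the rest of the notebook. The following is a minimal sketch using only the standard library. The import names `Box2D` and `pygame` are assumed to be the ones provided by the `gymnasium[box2d]` extra, and `check_modules` is a hypothetical helper, not part of Gymnasium.

```python
from importlib.util import find_spec

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ("gymnasium", "Box2D", "pygame")


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If any entry is reported as missing, repeating the platform-specific steps above is the first thing to try.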

In [1]:
import gymnasium as gym
import gymnasium.utils.play
import numpy as np
import pygame

from MLP import MLP

## **Human play**

In [13]:
# Play Lunar Lander manually with the keyboard
env = gym.make("LunarLander-v3", render_mode="rgb_array")

lunar_lander_keys = {
    (pygame.K_UP,): 2,
    (pygame.K_LEFT,): 1,
    (pygame.K_RIGHT,): 3,
}
gymnasium.utils.play.play(env, zoom=1.5, keys_to_action=lunar_lander_keys, noop=0)
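
The key mapping above relies on Lunar Lander's four discrete actions. For reference, here is a small lookup with their meanings as described in the Gymnasium documentation. The names `LUNAR_LANDER_ACTIONS` and `describe_action` are illustrative and not part of Gymnasium.

```python
# Meaning of each discrete action in LunarLander (per the Gymnasium docs).
LUNAR_LANDER_ACTIONS = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def describe_action(action):
    """Return a human-readable label for a LunarLander action index."""
    return LUNAR_LANDER_ACTIONS[int(action)]


if __name__ == "__main__":
    for index in sorted(LUNAR_LANDER_ACTIONS):
        print(index, "->", describe_action(index))
```

With this table, the mapping above reads as: up arrow fires the main engine, left and right arrows fire the orientation engines, and releasing all keys (`noop=0`) does nothing.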

## **Auto-play**

In [14]:
env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()

----

## **Agent play**

In [2]:
# Build the model
model = MLP(layers=(8,16,4))
ch = model.to_chromosome()
model.from_chromosome(ch)

# Load into the model the weights of the best chromosome found by neuroevolution

# Define the policy
def policy(observation):
    s = model.forward(observation)
    action = np.argmax(s)
    return action
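
The `MLP` class is imported from a local module that is not shown here. To make the chromosome round trip concrete, below is a hypothetical stand-in, not the original class. It assumes dense layers with biases, a `tanh` hidden activation, and a chromosome that is simply all weights and biases flattened in layer order. With `layers=(8,16,4)`, the 8 inputs match Lunar Lander's 8 observation variables and the 4 outputs match its 4 actions.

```python
import numpy as np


class SketchMLP:
    """Hypothetical stand-in for the notebook's MLP (assumed design, not the original)."""

    def __init__(self, layers=(8, 16, 4), seed=0):
        rng = np.random.default_rng(seed)
        self.layers = tuple(layers)
        pairs = list(zip(self.layers[:-1], self.layers[1:]))
        # One weight matrix and one bias vector per pair of consecutive layers.
        self.weights = [rng.normal(0.0, 0.5, size=(n_in, n_out)) for n_in, n_out in pairs]
        self.biases = [np.zeros(n_out) for _, n_out in pairs]

    def forward(self, x):
        a = np.asarray(x, dtype=float)
        last = len(self.weights) - 1
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            a = a @ w + b
            if i < last:  # assumed: tanh on hidden layers, linear output
                a = np.tanh(a)
        return a

    def to_chromosome(self):
        # Flatten as w0, b0, w1, b1, ... into a single 1-D vector.
        parts = [p.ravel() for pair in zip(self.weights, self.biases) for p in pair]
        return np.concatenate(parts)

    def from_chromosome(self, chromosome):
        chromosome = np.asarray(chromosome, dtype=float)
        pos = 0
        for i, (n_in, n_out) in enumerate(zip(self.layers[:-1], self.layers[1:])):
            size = n_in * n_out
            self.weights[i] = chromosome[pos:pos + size].reshape(n_in, n_out).copy()
            pos += size
            self.biases[i] = chromosome[pos:pos + n_out].copy()
            pos += n_out


def greedy_action(model, observation):
    """Pick the action with the highest network output, as in `policy` above."""
    return int(np.argmax(model.forward(observation)))
```

Under these assumptions the chromosome for `(8,16,4)` has 8*16 + 16 + 16*4 + 4 = 212 genes, which is the vector a neuroevolution algorithm would search over.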

In [3]:
# Run Lunar Lander with the agent's policy
env = gym.make("LunarLander-v3", render_mode="human")

def run():
    """Play one episode with `policy` and return the accumulated reward."""
    # observation, info = env.reset(seed=42)  # uncomment for a reproducible episode
    observation, info = env.reset()
    racum = 0  # accumulated (undiscounted) reward
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)

        racum += reward

        if terminated or truncated:
            # Rescale the return so that -200 maps to 0 and 300 maps to 1
            r = (racum+200) / 500
            print("racum:", racum)
            print("reward:", r)
            return racum
run()

racum: -183.26768556205383
reward: 0.03346462887589235


np.float64(-183.26768556205383)
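
The `reward` value printed above comes from the linear rescaling `(racum + 200) / 500`, which maps a return of -200 to 0 and a return of 300 to 1. The sketch below writes that mapping as a standalone function. The name `normalized_fitness` and the `low`/`high` parameters are illustrative choices. Like the notebook's formula, it does not clip values outside the range.

```python
def normalized_fitness(total_reward, low=-200.0, high=300.0):
    """Linearly map a return in [low, high] to [0, 1] (no clipping outside the range)."""
    return (total_reward - low) / (high - low)


if __name__ == "__main__":
    # Return of the episode shown in the output above.
    print(normalized_fitness(-183.26768556205383))
```

A rescaled value like this can be convenient as a fitness score in neuroevolution, since most episodes then fall roughly between 0 and 1. Very bad crashes can still produce negative values.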

In [4]:
N = 10
r = 0
for _ in range(N):
    r += run()
    
print('Mean reward', r/N)

racum: -156.09736370590616
reward: 0.08780527258818768
racum: -185.63222638293763
reward: 0.028735547234124737
racum: -142.63723876920005
reward: 0.1147255224615999
racum: -107.58233224627372
reward: 0.18483533550745254
racum: -106.8148285502296
reward: 0.1863703428995408
racum: -140.14049188636073
reward: 0.11971901622727853
racum: -89.3440127365786
reward: 0.2213119745268428
racum: -13.946091151867904
reward: 0.37210781769626416
racum: -91.77715274461056
reward: 0.21644569451077889
racum: -158.52779515534505
reward: 0.08294440968930991
Mean reward -119.249953332931
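
A single average hides how much the episodes vary: the ten returns above range from about -186 to about -14. As a small sketch, the same returns can be summarized with the standard library. `summarize_returns` is a hypothetical helper, and the list is copied from the output above.

```python
from statistics import mean, stdev


def summarize_returns(returns):
    """Return basic statistics for a list of episode returns."""
    return {
        "n": len(returns),
        "mean": mean(returns),
        "stdev": stdev(returns),  # sample standard deviation
        "min": min(returns),
        "max": max(returns),
    }


# Episode returns copied from the ten runs printed above.
returns = [
    -156.09736370590616, -185.63222638293763, -142.63723876920005,
    -107.58233224627372, -106.8148285502296, -140.14049188636073,
    -89.3440127365786, -13.946091151867904, -91.77715274461056,
    -158.52779515534505,
]

summary = summarize_returns(returns)

if __name__ == "__main__":
    for key, value in summary.items():
        print(key, value)
```

When comparing chromosomes, reporting the spread alongside the mean gives a better sense of whether a difference between two agents is meaningful or just episode-to-episode noise.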


#### Still want more?

Try controlling Flappy Bird: https://github.com/markub3327/flappy-bird-gymnasium

pip install flappy-bird-gymnasium

import flappy_bird_gymnasium  
env = gym.make("FlappyBird-v0")

State (12 variables):
  * the last pipe's horizontal position
  * the last top pipe's vertical position
  * the last bottom pipe's vertical position
  * the next pipe's horizontal position
  * the next top pipe's vertical position
  * the next bottom pipe's vertical position
  * the next next pipe's horizontal position
  * the next next top pipe's vertical position
  * the next next bottom pipe's vertical position
  * player's vertical position
  * player's vertical velocity
  * player's rotation

Actions:
  * 0 -> do nothing
  * 1 -> flap
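
The episode loop used for Lunar Lander should carry over to Flappy Bird, since both expose the standard Gymnasium `reset`/`step` interface. Below is a minimal environment-agnostic sketch. `run_episode` and its `max_steps` parameter are illustrative names. It takes any environment object and any `policy(observation) -> action` callable, so it can be tried without committing to a particular network.

```python
def run_episode(env, policy, max_steps=10_000, seed=None):
    """Play one episode with `policy` and return the accumulated reward.

    `env` is any object following the Gymnasium API:
    reset(seed=...) -> (observation, info)
    step(action)    -> (observation, reward, terminated, truncated, info)
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

For Flappy Bird, a network with 12 inputs and 2 outputs would play the role that `MLP(layers=(8,16,4))` plays for Lunar Lander. One assumption worth checking against the installed version: recent releases of flappy-bird-gymnasium appear to default to a LIDAR-based observation, with the 12-variable state listed above selected by passing `use_lidar=False` to `gym.make`.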