### Documentation

Interesting problems for reinforcement learning:
 * Gymnasium: https://gymnasium.farama.org/environments/box2d/

## Installation

!pip install gymnasium  
!pip install gymnasium[box2d] 

## Additional steps

### On macOS

pip uninstall swig  
xcode-select --install (if the command line tools are not already installed)  
pip install swig  (or, with MacPorts: sudo port install swig-python)  
pip install 'gymnasium[box2d]' # the quotes are required in zsh  

### On Windows

If the installation fails, the likely cause is a missing or incorrect version of Microsoft Visual C++ Build Tools, which Box2D depends on. To fix it:  
 * Download Microsoft Visual C++ Build Tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
 * In the installer, select the "C++ build tools" option and install it.
 * Restart your Jupyter Notebook session.
 * Run !pip install gymnasium[box2d] again in a notebook cell.
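After installing, a quick check can confirm that the packages are visible to the notebook kernel. The following is a minimal sketch using only the standard library; the helper name `check_packages` is made up for this example, and the module names (`gymnasium`, `Box2D`, `pygame`) are the ones the `gymnasium[box2d]` extra is expected to provide.

```python
import importlib.util


def check_packages(names=("gymnasium", "Box2D", "pygame")):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report which modules the current interpreter can locate.
for name, found in check_packages().items():
    print(f"{name}: {'OK' if found else 'MISSING'}")
```

If `Box2D` shows up as missing, the build step (swig on macOS, the C++ build tools on Windows) most likely did not complete.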

In [None]:
# Lunar Lander test with a human player

import gymnasium as gym

# Note: Gymnasium 1.0 and later register this environment as "LunarLander-v3".
env = gym.make("LunarLander-v2", render_mode="rgb_array")

import numpy as np
import pygame
import gymnasium.utils.play

lunar_lander_keys = {
    (pygame.K_UP,): 2,
    (pygame.K_LEFT,): 1,
    (pygame.K_RIGHT,): 3,
}
gymnasium.utils.play.play(env, zoom=3, keys_to_action=lunar_lander_keys, noop=0)

### Genetic Algorithm

In [None]:
from GA import Genetic_Algorithm

GA = Genetic_Algorithm(N=100, pcross=0.3, pmut=0.6, sigma=0.1, n_tour=20, n_iter=500, num_exp=30, bounds=[-2, 2])
chromosome, fitness = GA.evolve()

### Hyperparameter Tuning

In [10]:
N = [100]
pcross= [0.3, 0.4, 0.5]
pmut = [0.5, 0.6, 0.7]
n_iter= [150]
n_tour = [15, 20, 25]
sigma = [0.1]
num_exp = [25]
bounds = [[-2, 2]]
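To see how large this grid is before launching it, the lists can be expanded with `itertools.product`. This sketch repeats the values from the cell above inside a dict (the name `grid` is introduced here only for illustration) and keeps them in the same order as the `Genetic_Algorithm` arguments.

```python
import itertools

# Same values as the cell above, in the order N, pcross, pmut, sigma,
# n_tour, n_iter, num_exp, bounds.
grid = {
    "N": [100],
    "pcross": [0.3, 0.4, 0.5],
    "pmut": [0.5, 0.6, 0.7],
    "sigma": [0.1],
    "n_tour": [15, 20, 25],
    "n_iter": [150],
    "num_exp": [25],
    "bounds": [[-2, 2]],
}

combos = list(itertools.product(*grid.values()))
print(len(combos))  # 27 = 3 * 3 * 3
print(combos[0])    # (100, 0.3, 0.5, 0.1, 15, 150, 25, [-2, 2])
```

With 4 repetitions per combination, as configured further down, that amounts to 108 GA runs.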

In [None]:
# To try later
import math
N = [100]
pcross = []  # still to be filled in; an empty list makes itertools.product yield no combinations
pmut = []
n_iter = [150]
n_tour = []
sigma = [0.1, 0.05, 0.15]
num_exp = [25]
bounds = [[-2, 2], [-1, 1], [-math.pi, math.pi]]

In [None]:
#! Try decreasing sigma as the experiment progresses
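One possible way to decrease sigma over time is an exponential schedule. This is only a sketch: `decayed_sigma` and `sigma_min` are hypothetical names, and `Genetic_Algorithm` is not known to expose a hook for it, so the mutation step would have to call something like this itself.

```python
def decayed_sigma(sigma0, iteration, n_iter, sigma_min=0.01):
    """Decay sigma exponentially from sigma0 (iteration 0) to sigma_min (last iteration)."""
    if n_iter <= 1:
        return sigma0
    frac = iteration / (n_iter - 1)  # 0.0 at the start, 1.0 at the end
    return sigma0 * (sigma_min / sigma0) ** frac


# Example: sigma at the start, middle and end of a 150-iteration run.
for it in (0, 75, 149):
    print(it, decayed_sigma(0.1, it, 150))
```

A large sigma early on favours exploration, while a small sigma near the end lets the search refine the best individuals.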

In [11]:
from GA import Genetic_Algorithm

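def run_experiment(combination, num_experiments=5):
    """Run the GA num_experiments times with one hyperparameter combination.

    Returns the mean fitness, the combination, and the chromosome of the last run.
    """
    total_fitness = 0
    for _ in range(num_experiments):
        ga = Genetic_Algorithm(*combination, verbose=False)
        chromosome, fitness = ga.evolve()
        total_fitness += fitness

    mean_fitness = total_fitness / num_experiments

    # Note: chromosome comes from the last run, not necessarily the best one.
    return mean_fitness, combination, chromosome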
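Since the function above returns whichever chromosome the last repetition produced, it may be useful to keep the best one instead. The sketch below shows the idea on plain `(chromosome, fitness)` pairs; `summarize_runs` is a hypothetical helper, and it assumes that a higher fitness is better unless `maximize=False` is passed.

```python
def summarize_runs(runs, maximize=True):
    """Given (chromosome, fitness) pairs, return (mean_fitness, best_chromosome, best_fitness)."""
    if not runs:
        raise ValueError("runs must not be empty")
    mean_fitness = sum(fitness for _, fitness in runs) / len(runs)
    pick = max if maximize else min
    best_chromosome, best_fitness = pick(runs, key=lambda run: run[1])
    return mean_fitness, best_chromosome, best_fitness


# Toy values for illustration only.
runs = [([1], 10.0), ([2], 30.0), ([3], 20.0)]
print(summarize_runs(runs))  # (20.0, [2], 30.0)
```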

In [12]:
import itertools
from concurrent.futures import ProcessPoolExecutor, as_completed

# Number of times each configuration is run
num_experiments = 4


# Build every possible hyperparameter combination
all_combinations = list(itertools.product(
    N, pcross, pmut, sigma, n_tour, n_iter, num_exp, bounds
))

# Store the results
results = []

# Run the experiments in parallel
with ProcessPoolExecutor() as executor:
    # Submit all runs to the pool
    future_to_combination = {
        executor.submit(run_experiment, combination, num_experiments): combination
        for combination in all_combinations
    }

    total_combinations = len(all_combinations)
    completed_combinations = 0

    for future in as_completed(future_to_combination):
        mean_fitness, combination, chromosome = future.result()
        results.append((mean_fitness, combination, chromosome))

        completed_combinations += 1
        print(f"Combination completed. Progress: {completed_combinations}/{total_combinations}")
        print(f"Parameter combination: {combination}")
        print(f"Mean fitness: {mean_fitness}")
        print(f"Best individual: {chromosome}")

Combination completed. Progress: 1/27
Parameter combination: (100, 0.4, 0.5, 0.1, 20, 150, 25, [-2, 2])
Mean fitness: 76.53383628385181
Best individual: [3.1494882533308495, -1.4050634291021904, 1.5041180757569572, -1.8810074580620455, -1.0071227082053478, 3.295899611478477, -1.9550939209028948, 0.4395840350924578, 1.1933436459912723, -4.715148867351629, -1.7201342779184585, 1.7406290018044748, 1.7841047528046619, 3.0653608492859195, 0.4087348302556544, -0.9304935249269837, -0.2912850614123682, 1.9406015771173262, -0.38083952395436066, -4.362905516895165, 2.9964800697387384, 3.5012735607807577, 0.4051876322286954, -2.4640779564657533, 2.133296814895779, 1.412541883602255, 2.8353733112649535, -0.4011666716082073, 5.536459089401663, -0.6165347659501073, -3.3923160978448794, -2.9385364552150426, -1.75330356026792, -0.9764480948699813, 4.196493601548106, 0.16443592286583703, -4.057246807266355, -1.4371511036653235, 0.3212877932165328, 2.6523338655762054, -0.9611469932821571, 1.572484
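Once the loop has finished, the collected tuples can be ranked to find the most promising configuration. This is a small sketch with a hypothetical helper, `rank_results`; it assumes each entry is `(mean_fitness, combination, chromosome)` and that a higher mean fitness is better.

```python
def rank_results(results, maximize=True):
    """Sort (mean_fitness, combination, chromosome) tuples, best first."""
    return sorted(results, key=lambda item: item[0], reverse=maximize)


# Toy values for illustration only; they are not results from the notebook.
toy_results = [
    (10.0, ("combo-a",), [0.1]),
    (30.0, ("combo-b",), [0.2]),
    (20.0, ("combo-c",), [0.3]),
]
best_fitness, best_combination, best_chromosome = rank_results(toy_results)[0]
print(best_combination, best_fitness)
```

In the notebook, `rank_results(results)[0]` would play the same role for the real `results` list.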

### Test best result of GA

In [None]:
chromosome = [-5.1557750797652355, -7.980841742847936, -13.163908631368175, -3.7457309819000195, -8.18898413840871, -4.872006813596256, 13.991734931310354, -8.356878946457561, 8.38089353366159, 6.078935817191149, -9.487176681508071, 3.029574157970837, -13.820013892079945, -10.006371949751829, -11.216150583249334, -0.6590412635596098, -0.8315393670344533, -4.116855203639449, 3.1647444175241475, -0.8735228934970257, 2.5629252115160215, 7.931768054124031, -9.933120701796021, 6.2203459228961275, -1.6504193744167428, -3.3407521515035468, -3.8958760575609817, -0.11988148170118623, 0.6746965115662906, 3.3678316885261985, 10.315084749184617, 6.723320494495934, -6.017050974048589, 0.26124708130640284, 14.367420733360625, 7.739783436607821, 0.45398462638793796, -1.2797265398008815, -4.310356822602239, -0.7057149276846835, 7.905378315256625, -9.877879356120989, 15.91713636133181, 5.386944564550635, -3.462766914836761, 7.705603019251466, -4.952930250219165, 4.2987471689882355, 1.649741701143725, -4.046242351659431, -1.3619661766784603, 2.6486541165533617, 8.671259406147879, 12.927455987073701, -6.601154262362282, -11.139166678250085, 1.1503658860775783, -13.279556849970124, 4.845557595614942, -6.790825056611087, 11.608359907566719, 11.244917721387866, -5.061570439374675, -9.437455030858743, -1.932178536368972, -4.16771989851538, -4.9936099976462565, -1.8068451025788548, 3.8966980452321742, 6.816712963071394, 1.991974541804312, 13.400839064918687, 4.6136942954593865, 7.2389641896429175, 4.285791677477841, 7.521949330603041, 4.039725210851329, -6.1825100995238405, 7.1642549690913935, 4.779652342278313, -3.612127037879377, -6.644734917508778]

# Note: MLP is assumed to be defined or imported in an earlier cell; its source is not shown here.
model = MLP([8, 6, 4])

model.from_chromosome(chromosome)

# Define the policy
def policy(observation):
    s = model.forward(observation)
    action = np.argmax(s)
    return action

# Lunar Lander test driven by the agent

import gymnasium as gym

env = gym.make("LunarLander-v2", render_mode="human")

def run():
    #observation, info = env.reset(seed=42)
    observation, info = env.reset()
    racum = 0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)

        racum += reward

        if terminated or truncated:
            # r rescales the return so that -200 maps to 0 and 300 maps to 1
            r = (racum + 200) / 500
            print(racum, r)
            return racum

In [None]:
# Run episodes until the kernel is interrupted
while True:
    run()
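As an alternative to the endless loop, the agent can be scored over a fixed number of episodes. The sketch below assumes only the Gymnasium-style `reset()` / `step()` interface used above; `run_episode`, `evaluate`, and `max_steps` are names introduced for this example.

```python
def run_episode(env, policy, max_steps=1000):
    """Play one episode and return the accumulated reward."""
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


def evaluate(env, policy, episodes=10):
    """Return (mean_return, list_of_returns) over a fixed number of episodes."""
    returns = [run_episode(env, policy) for _ in range(episodes)]
    return sum(returns) / len(returns), returns


# Usage with the objects defined in the cells above:
# mean_return, returns = evaluate(env, policy, episodes=20)
# env.close()
```

Averaging over several episodes gives a steadier estimate than a single landing, because each reset starts from different initial conditions.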

#### Haven't had enough?

Try controlling Flappy Bird: https://github.com/markub3327/flappy-bird-gymnasium

pip install flappy-bird-gymnasium

import flappy_bird_gymnasium  
env = gym.make("FlappyBird-v0")

State (12 variables):
  * the last pipe's horizontal position
  * the last top pipe's vertical position
  * the last bottom pipe's vertical position
  * the next pipe's horizontal position
  * the next top pipe's vertical position
  * the next bottom pipe's vertical position
  * the next next pipe's horizontal position
  * the next next top pipe's vertical position
  * the next next bottom pipe's vertical position
  * player's vertical position
  * player's vertical velocity
  * player's rotation

Actions:
  * 0 -> do nothing
  * 1 -> flap

In [None]:
len(MLP([8, 6, 4]).to_chromosome())
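The chromosome length can also be estimated without the `MLP` class. This sketch assumes a fully connected network with one bias per neuron, which may differ from how `MLP.to_chromosome()` actually lays out its parameters; `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(layers):
    """Count weights plus biases of a fully connected network with the given layer sizes."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


print(mlp_param_count([8, 6, 4]))   # 82 = (8 + 1) * 6 + (6 + 1) * 4
print(mlp_param_count([12, 6, 2]))  # 92, e.g. 12 Flappy Bird inputs and 2 actions
```

Under that assumption the Lunar Lander network needs 82 genes, which appears to match the length of the chromosome tested above. The `[12, 6, 2]` layout is only an example; the hidden layer size is a free choice.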