Calling the gym-env by name #73
Comments
Hey @lolanchen! Thanks for opening this issue. Revamping the gym API is next on the project to-do list, after doubles support (#71). I'll take a deeper look at RLlib's API and examples later today and let you know if I think of a workaround.
Thanks for the prompt reply.
I was able to run this adapted version of the example:

```python
import asyncio
import numpy as np
import ray
import ray.rllib.agents.ppo as ppo

from asyncio import ensure_future, new_event_loop, set_event_loop
from gym.spaces import Box, Discrete

from poke_env.player.env_player import Gen8EnvSinglePlayer
from poke_env.player.random_player import RandomPlayer


class SimpleRLPlayer(Gen8EnvSinglePlayer):
    def __init__(self, *args, **kwargs):
        # args / kwargs (e.g. RLlib's env_config) are intentionally ignored here
        Gen8EnvSinglePlayer.__init__(self)
        self.observation_space = Box(low=-10, high=10, shape=(10,))

    @property
    def action_space(self):
        return Discrete(22)

    def embed_battle(self, battle):
        # -1 indicates that the move does not have a base power
        # or is not available
        moves_base_power = -np.ones(4)
        moves_dmg_multiplier = np.ones(4)
        for i, move in enumerate(battle.available_moves):
            moves_base_power[i] = (
                move.base_power / 100
            )  # Simple rescaling to facilitate learning
            if move.type:
                moves_dmg_multiplier[i] = move.type.damage_multiplier(
                    battle.opponent_active_pokemon.type_1,
                    battle.opponent_active_pokemon.type_2,
                )

        # We count how many pokemons have not fainted in each team
        remaining_mon_team = (
            len([mon for mon in battle.team.values() if not mon.fainted]) / 6
        )
        remaining_mon_opponent = (
            len([mon for mon in battle.opponent_team.values() if not mon.fainted]) / 6
        )

        # Final vector with 10 components
        return np.concatenate(
            [
                moves_base_power,
                moves_dmg_multiplier,
                [remaining_mon_team, remaining_mon_opponent],
            ]
        )

    def compute_reward(self, battle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2, hp_value=1, victory_value=30
        )


ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 0  # Training will not work with poke-env if this value != 0
config["framework"] = "tfe"

trainer = ppo.PPOTrainer(config=config, env=SimpleRLPlayer)


def ray_training_function(player):
    for i in range(2):
        result = trainer.train()
        print(result)
    player.complete_current_battle()


env_player = trainer.workers.local_worker().env
opponent = RandomPlayer()

env_player.play_against(
    env_algorithm=ray_training_function,
    opponent=opponent,
)
```

Let me know if this workaround works for you! Regardless, please do not close this issue, as I would like to come back to it once #71 is done - there is a lot of work to do to make this kind of experiment easier to run :)
Hey! Your code does run on my server. By modifying the range in the training function I can train for more iterations, but it is really hard to tell whether the agent is learning or not. The reward does seem to increase steadily, but the evaluation result against RandomPlayer is much worse than your keras-rl-based example.
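For what it's worth, one way to run such an evaluation against RandomPlayer with the RLlib workaround above is a plain gym-style rollout loop. This is just a sketch: it reuses `trainer` and `env_player` from the earlier snippet, and the episode count is arbitrary.

```python
def ray_evaluating_function(player):
    # Reset the win counter, then run plain gym-style episodes,
    # letting the trained RLlib trainer pick every action.
    player.reset_battles()
    for _ in range(100):
        obs = player.reset()
        done = False
        while not done:
            action = trainer.compute_action(obs)
            obs, reward, done, _ = player.step(action)
    print(f"Won {player.n_won_battles} / 100 battles against the opponent")


env_player.play_against(
    env_algorithm=ray_evaluating_function,
    opponent=RandomPlayer(),
)
```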
Hey @lolanchen, thanks for keeping me posted! Edit: regarding reward values, you can customize how they are computed (see `compute_reward` above).
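For instance, a minimal sketch of re-weighting the helper used in the snippet above - the values here are placeholders, not tuned recommendations:

```python
class TunedRewardPlayer(SimpleRLPlayer):
    def compute_reward(self, battle) -> float:
        # Same helper as above, with different illustrative weights:
        # a larger victory_value makes winning dominate, while hp_value and
        # fainted_value control the per-turn shaping signal.
        return self.reward_computing_helper(
            battle,
            fainted_value=5,
            hp_value=0.5,
            victory_value=100,
        )
```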
Yes, just changing those parameters. Thanks a lot for all the advice! I'll try to play more with the parameters tomorrow.
Hey @lolanchen, did you manage to make it work with other RL libraries?
@mancho1987
@lolanchen thanks for the fast reply :) I am just starting in RL, so my knowledge is quite limited. I found acme from DeepMind and was wondering if it could be used here, but after checking RLlib it seems to me that there is no need for acme. I will try it with RLlib as mentioned in the posts above to see if I can make it work :) By the way, what features are you selecting? I read about embeddings - are they useful for things like moves?
Hi all, I was able to run the Stable Baselines (https://stable-baselines.readthedocs.io/en/master/index.html) examples with some small changes.

What I am trying to do now is to train an agent and use it to play against humans on the Showdown server. I am facing some problems when starting the game, and I am still not sure how to use a loaded agent to select moves. I am thinking of loading the saved model inside the SimpleRLPlayer and using it to select a move via the `choose_move` and `embed_battle` functions. The problem I have is how to turn the predicted action (one of 22 values) into the string choice the method needs to return. Has anyone ever tried something like that? Cheers, Pablo
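For reference, the kind of adaptation involved looks roughly like this - a sketch assuming Stable Baselines' DQN and the `play_against` pattern from the snippets above; the policy, hyperparameters and save path are purely illustrative:

```python
from stable_baselines import DQN

from poke_env.player.random_player import RandomPlayer

# SimpleRLPlayer is the Gen8EnvSinglePlayer subclass defined earlier in this thread
env_player = SimpleRLPlayer()
opponent = RandomPlayer()


def training_function(player):
    # Stable Baselines trains directly against the gym-style env player
    model = DQN("MlpPolicy", player, learning_rate=1e-4, verbose=1)
    model.learn(total_timesteps=10000)
    model.save("dqn_poke_env")
    player.complete_current_battle()


env_player.play_against(
    env_algorithm=training_function,
    opponent=opponent,
)
```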
A quick update: I checked the closed issues and got my answer there. I would implement `choose_move` like this:

```python
def choose_move(self, battle):
    # Embed the current battle and let the loaded model pick one of the 22 actions
    observations = self.embed_battle(battle)
    # Stable Baselines' predict returns (action, state), so keep only the action
    action, _ = self.model.predict(observations)
    # _action_to_move turns the action index into the move / switch order poke-env expects
    return self._action_to_move(action, battle)
```

in case anyone is looking for it :)
@pablovin I assume you meant this: #119 (comment)
@mnguyen0226
@hsahovic Hmmmm, I see. So I approached training with Stable Baselines. What I am trying to do is to have 2 trained models fight each other on 2 different laptops/accounts. I was able to do this with MaxDamage vs Random agents. My plan now is to have a TrainedRLAgent (with DQN) vs a Random agent. I trained the simple RL DQN with the Stable Baselines setup provided above, with a good evaluation score. The Random agent was able to take moves automatically, but the TrainedRLAgent loaded from the trained model was not able to select actions automatically. Here is my script for running the TrainedRLPlayer.

^ Did I miss anything to get the loaded & trained DQN agent to automatically take actions?
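For anyone reading along, a minimal hypothetical sketch of such a TrainedRLPlayer, built only from pieces already shown in this thread (the Stable Baselines DQN, the `choose_move` override, and the Showdown player configuration); the model path and account names are placeholders:

```python
import asyncio

from gym.spaces import Box
from stable_baselines import DQN
from poke_env.player.env_player import Gen8EnvSinglePlayer
from poke_env.player_configuration import PlayerConfiguration
from poke_env.server_configuration import ShowdownServerConfiguration


# SimpleRLPlayer is the Gen8EnvSinglePlayer subclass defined earlier in this thread
class TrainedRLPlayer(SimpleRLPlayer):
    def __init__(self, *args, **kwargs):
        # Unlike the RLlib snippet above, forward the player / server configuration
        # so the bot can log in to a real Showdown server
        Gen8EnvSinglePlayer.__init__(self, *args, **kwargs)
        self.observation_space = Box(low=-10, high=10, shape=(10,))
        self.model = DQN.load("dqn_poke_env")  # placeholder path to the saved model

    def choose_move(self, battle):
        observations = self.embed_battle(battle)
        # predict returns (action, state); only the action index is needed
        action, _ = self.model.predict(observations)
        return self._action_to_move(int(action), battle)


async def main():
    player = TrainedRLPlayer(
        player_configuration=PlayerConfiguration("bot_1_account", "bot_1_pw"),
        server_configuration=ShowdownServerConfiguration,
    )
    # Wait for one incoming challenge from any user and play it out
    await player.accept_challenges(None, 1)


asyncio.get_event_loop().run_until_complete(main())
```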
Is any error raised? If not, can you set:

```python
simplerl_player = TrainedRLPlayer(
    player_configuration=PlayerConfiguration("bot_1_account", "bot_1_pw"),
    server_configuration=ShowdownServerConfiguration,
    log_level=20,  # logging.INFO
)
```

and see where the logs stop?
There is no error raised. It sets up and plays 1 game (I still have to choose the action manually instead of the RL agent choosing it). No logs are produced. Oddly, the terminal stated that the bot I sent the challenge to was not found, although I invited one to challenge the other and got them into the same fight.
Can you open a separate issue, with the logs (e.g. the output on the terminal)?
@hsahovic I was able to load the model and have the agent automatically pick actions now. I figured out the issue: the "action" variable returned a tuple instead of an int. Thank you for the help!
This should be fixed by upgrading to the latest version of poke-env.
Hi Haris! First of all thank you for putting in the effort of making poke-env.
I ran your `rl_with_open_ai_gym_wrapper.py` and tried a bunch of other RL algorithms in keras-rl2 on the `play_against` method, and they worked just fine. Then, naturally, I would like to get poke-env working on newer and better-maintained RL libraries than keras-rl2.
I tried to get RLlib working with poke-env, specifically with the `play_against` method, but couldn't get it to work. RLlib's training flow goes like this (code copied from RLlib's docs), where the whole gym-env class is passed to the trainer object.
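The pattern looks roughly like this (a sketch based on the standard RLlib docs example; `CartPole-v0` stands in for any registered environment name, and the config values are placeholders):

```python
import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print

ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1

# The environment is passed to the trainer by class or by registered name -
# there is no place to call play_against yourself.
trainer = ppo.PPOTrainer(config=config, env="CartPole-v0")

for i in range(10):
    result = trainer.train()
    print(pretty_print(result))
```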
Three days of trial and error lead me to conclude that there is no workaround between this syntax and the `play_against` method in poke-env. I wonder if it's possible to wrap poke-env into a registered gym-env and make it callable by its name, like `env = gym.make('poke_env-v0')`?