Sending or accepting challenges gives an error about coroutines #43
Hi @Gummygamer! Thanks for opening this issue. I do not think that we have a final_tests function in poke-env; can you share how it is defined?
I just defined that function in the RL test as a placeholder for sending or accepting challenges; it's just this:

async def final_tests():

If I change it to accepting challenges, I get the same issue. I can send the whole code for further inspection, but it's almost identical to the RL example in the documentation.
Thanks, I'll take a look at it!
Hi! I took a look at the code you sent: the problem is that you are trying to send challenges with an agent using the OpenAI Gym API. This API is only meant for training models and currently does not support this use case directly; it is planned for later, but requires additional work. You can work around that by defining an agent similar to this one:

class EmbeddedRLPlayer(Player):
def choose_move(self, battle):
if np.random.rand() < 0.01: # avoids infinite loops
return self.choose_random_move(battle)
embedding = SimpleRLPlayer.embed_battle(self, battle)
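        # note: self.dqn must be assigned a trained DQNAgent after instantiation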
action = self.dqn.forward(embedding)
        return SimpleRLPlayer._action_to_move(self, action, battle)

Once instantiated, you can use this agent to challenge a human with send_challenges.
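For instance, a minimal sketch (assuming a trained DQNAgent named dqn and a placeholder username; the full, confirmed version is further down in this thread):

emb_player = EmbeddedRLPlayer(
    player_configuration=PlayerConfiguration("Embedded RL Player", None),
    battle_format="gen8randombattle",
    server_configuration=LocalhostServerConfiguration,
)
emb_player.dqn = dqn  # attach the trained DQNAgent
asyncio.get_event_loop().run_until_complete(
    emb_player.send_challenges('username', 1)
)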
It still happens the same way after embedding.
@Gummygamer that's unexpected; I'll post a full version later today |
I confirm that I was able to battle the dqn agent using the embedded player; here is the exact code (up to the username) that I used:

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
from poke_env.player_configuration import PlayerConfiguration
from poke_env.player.env_player import Gen8EnvSinglePlayer
from poke_env.player.random_player import RandomPlayer
from poke_env.player.player import Player
from poke_env.server_configuration import LocalhostServerConfiguration
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
import asyncio
# We define our RL player
# It needs a state embedder and a reward computer, hence these two methods
class SimpleRLPlayer(Gen8EnvSinglePlayer):
def embed_battle(self, battle):
# -1 indicates that the move does not have a base power
# or is not available
moves_base_power = -np.ones(4)
moves_dmg_multiplier = np.ones(4)
for i, move in enumerate(battle.available_moves):
moves_base_power[i] = (
move.base_power / 100
) # Simple rescaling to facilitate learning
if move.type:
moves_dmg_multiplier[i] = move.type.damage_multiplier(
battle.opponent_active_pokemon.type_1,
battle.opponent_active_pokemon.type_2,
)
        # We count how many pokemons have not fainted in each team
        remaining_mon_team = (
            len([mon for mon in battle.team.values() if not mon.fainted]) / 6
        )
        remaining_mon_opponent = (
            len([mon for mon in battle.opponent_team.values() if not mon.fainted])
            / 6
        )
# Final vector with 10 components
return np.concatenate(
[
moves_base_power,
moves_dmg_multiplier,
[remaining_mon_team, remaining_mon_opponent],
]
)
def compute_reward(self, battle) -> float:
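        # reward_computing_helper folds hp changes, faints and the battle
        # outcome into a single scalar reward, weighted by the values below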
return self.reward_computing_helper(
battle, fainted_value=2, hp_value=1, victory_value=30
)
class MaxDamagePlayer(RandomPlayer):
def choose_move(self, battle):
# If the player can attack, it will
if battle.available_moves:
# Finds the best move among available ones
best_move = max(battle.available_moves, key=lambda move: move.base_power)
return self.create_order(best_move)
# If no attack is available, a random switch will be made
else:
return self.choose_random_move(battle)
NB_TRAINING_STEPS = 1
NB_EVALUATION_EPISODES = 1
tf.random.set_seed(0)
np.random.seed(0)
# This is the function that will be used to train the dqn
def dqn_training(player, dqn, nb_steps):
dqn.fit(player, nb_steps=nb_steps)
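    # finish the ongoing battle, since dqn.fit can stop mid-battle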
player.complete_current_battle()
def dqn_evaluation(player, dqn, nb_episodes):
# Reset battle statistics
player.reset_battles()
dqn.test(player, nb_episodes=nb_episodes, visualize=False, verbose=False)
print(
"DQN Evaluation: %d victories out of %d episodes"
% (player.n_won_battles, nb_episodes)
)
async def final_tests():
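    # send_challenges(opponent, n_challenges) is a coroutine and must be awaited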
    await emb_player.send_challenges('username', 100)
if __name__ == "__main__":
env_player = SimpleRLPlayer(
player_configuration=PlayerConfiguration("RL Player", None),
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
)
opponent = RandomPlayer(
player_configuration=PlayerConfiguration("Random player", None),
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
)
second_opponent = MaxDamagePlayer(
player_configuration=PlayerConfiguration("Max damage player", None),
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
)
# Output dimension
n_action = len(env_player.action_space)
model = Sequential()
model.add(Dense(128, activation="elu", input_shape=(1, 10)))
    # Our embedding has shape (1, 10), which affects our hidden layer
    # dimension and output dimension
    # Flattening resolves potential issues that would arise otherwise
model.add(Flatten())
model.add(Dense(64, activation="elu"))
model.add(Dense(n_action, activation="linear"))
memory = SequentialMemory(limit=10000, window_length=1)
    # Simple epsilon greedy
policy = LinearAnnealedPolicy(
EpsGreedyQPolicy(),
attr="eps",
value_max=1.0,
value_min=0.05,
value_test=0,
nb_steps=10000,
)
# Defining our DQN
dqn = DQNAgent(
model=model,
nb_actions=len(env_player.action_space),
policy=policy,
memory=memory,
nb_steps_warmup=1000,
gamma=0.5,
target_model_update=1,
delta_clip=0.01,
enable_double_dqn=True,
)
dqn.compile(Adam(lr=0.00025), metrics=["mae"])
class EmbeddedRLPlayer(Player):
def choose_move(self, battle):
if np.random.rand() < 0.01: # avoids infinite loops
return self.choose_random_move(battle)
embedding = SimpleRLPlayer.embed_battle(self, battle)
action = dqn.forward(embedding)
return SimpleRLPlayer._action_to_move(self, action, battle)
emb_player = EmbeddedRLPlayer(
player_configuration=PlayerConfiguration("Embedded RL Player", None),
battle_format="gen8randombattle",
server_configuration=LocalhostServerConfiguration,
)
# Training
env_player.play_against(
env_algorithm=dqn_training,
opponent=opponent,
env_algorithm_kwargs={"dqn": dqn, "nb_steps": NB_TRAINING_STEPS},
)
model.save("model_%d" % NB_TRAINING_STEPS)
    asyncio.get_event_loop().run_until_complete(final_tests())

My virtualenv runs python 3.6 with the following packages:
It worked just fine! It was the way I called the coroutine, it seems.
If you call player.send_challenges or player.accept_challenges, you get this error:
RuntimeWarning: coroutine 'final_tests' was never awaited
final_tests()
If you wrap it in an async function and call it with await, you get this:
RuntimeError: Task <Task pending coro=<final_tests() running at pokerl.py:98> cb=[_run_until_complete_cb() at C:\Users\Username\Anaconda3\lib\asyncio\base_events.py:158]> got Future attached to a different loop
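For reference, the second error usually means the coroutine was run on a different event loop than the one the players were created on. A minimal sketch of the pattern that works (matching the script above; the username is a placeholder):

import asyncio  # already imported in the script above

# poke_env players attach their networking tasks to the event loop that
# is current when they are constructed, so the challenge coroutine must
# run on that same loop. Creating and running a brand-new loop instead
# is what raises "got Future attached to a different loop".
asyncio.get_event_loop().run_until_complete(
    emb_player.send_challenges('username', 1)
)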