
Sending or accepting challenges gives an error about coroutines #43

Closed · Gummygamer opened this issue May 6, 2020 · 9 comments
Labels: bug (Something isn't working)

Gummygamer commented May 6, 2020

If you call player.send_challenges or player.accept_challenges, you get this error:

RuntimeWarning: coroutine 'final_tests' was never awaited
final_tests()

If you wrap it in an async function and call it with await, you get this:

RuntimeError: Task <Task pending coro=<final_tests() running at pokerl.py:98> cb=[_run_until_complete_cb() at C:\Users\Username\Anaconda3\lib\asyncio\base_events.py:158]> got Future attached to a different loop
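
For reference, a rough sketch of the two call patterns (env_player and the challenge arguments follow the RL example in the documentation):

# Calling the coroutine directly does not run it and only triggers the RuntimeWarning:
# env_player.send_challenges('Gummygamer', 100)

# Awaiting it from an async wrapper can still fail with the "different loop"
# error if the coroutine ends up running on a different event loop than the
# one the player was created on:
async def final_tests():
    await env_player.send_challenges('Gummygamer', 100)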

@hsahovic hsahovic self-assigned this May 6, 2020
@hsahovic hsahovic added the bug Something isn't working label May 6, 2020
hsahovic (Owner) commented May 6, 2020

Hi @Gummygamer ! Thanks for opening this issue.

I do not think that we have a final_tests coroutine in poke-env. Would you mind sharing a snippet to reproduce the issue?

Gummygamer (Author) commented:

I just defined that function in the RL test as a placeholder for sending or accepting challenges; it's just this:

async def final_tests():
    await env_player.send_challenges('Gummygamer', 100)

If I change it to accepting challenges, I get the same issue. I can send the whole code for further inspection, but it's almost identical to the RL example in the documentation.

Gummygamer (Author) commented:

pokerl.zip

hsahovic (Owner) commented May 6, 2020

Thanks, I'll take a look at it!

@hsahovic hsahovic added this to To do in Poke-env - general May 7, 2020
hsahovic (Owner) commented May 8, 2020

Hi!

I took a look at the code you sent: the problem is that you are trying to send challenges with an agent that uses the OpenAI Gym API. This API is only meant for training models and currently does not support this use case directly; it is planned for later, but requires additional work.

You can work around that by defining an agent similar to this one:

class EmbeddedRLPlayer(Player):
    def choose_move(self, battle):
        if np.random.rand() < 0.01:  # avoids infinite loops
            return self.choose_random_move(battle)
        # Reuse SimpleRLPlayer's battle embedding without inheriting from the env player
        embedding = SimpleRLPlayer.embed_battle(self, battle)
        # self.dqn is expected to hold the trained DQNAgent
        action = self.dqn.forward(embedding)
        return SimpleRLPlayer._action_to_move(self, action, battle)

Once instantiated, you can use this agent to challenge a human with send_challenges.
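
For example, a minimal usage sketch, assuming the trained dqn agent from the RL script, the same configuration objects used elsewhere in this thread, and a placeholder username; the trained DQNAgent is attached as self.dqn since the class above reads it from the instance:

import asyncio

emb_player = EmbeddedRLPlayer(
    player_configuration=PlayerConfiguration("Embedded RL Player", None),
    battle_format="gen8randombattle",
    server_configuration=LocalhostServerConfiguration,
)
emb_player.dqn = dqn  # assumed: the trained DQNAgent produced by the training script

async def challenge_human():
    # Send a single challenge to a human account on the same server
    await emb_player.send_challenges('your_username', 1)

# Run on the default event loop, i.e. the loop the player was created on
asyncio.get_event_loop().run_until_complete(challenge_human())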

@hsahovic hsahovic closed this as completed May 8, 2020
Poke-env - general automation moved this from To do to Done May 8, 2020
Gummygamer (Author) commented:

It still happens the same way with the embedded player.
pokerl.zip

@hsahovic hsahovic reopened this May 9, 2020
Poke-env - general automation moved this from Done to In progress May 9, 2020
hsahovic (Owner) commented May 9, 2020

@Gummygamer that's unexpected; I'll post a full version later today.

hsahovic (Owner) commented May 9, 2020

I can confirm that I was able to battle the DQN agent using the embedded player; here is the exact code (up to the username) that I used:

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf

from poke_env.player_configuration import PlayerConfiguration
from poke_env.player.env_player import Gen8EnvSinglePlayer
from poke_env.player.random_player import RandomPlayer
from poke_env.player.player import Player
from poke_env.server_configuration import LocalhostServerConfiguration

from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

import asyncio


# We define our RL player
# It needs a state embedder and a reward computer, hence these two methods
class SimpleRLPlayer(Gen8EnvSinglePlayer):
    def embed_battle(self, battle):
        # -1 indicates that the move does not have a base power
        # or is not available
        moves_base_power = -np.ones(4)
        moves_dmg_multiplier = np.ones(4)
        for i, move in enumerate(battle.available_moves):
            moves_base_power[i] = (
                move.base_power / 100
            )  # Simple rescaling to facilitate learning
            if move.type:
                moves_dmg_multiplier[i] = move.type.damage_multiplier(
                    battle.opponent_active_pokemon.type_1,
                    battle.opponent_active_pokemon.type_2,
                )

        # We count how many pokemons have not fainted in each team
        remaining_mon_team = (
            len([mon for mon in battle.team.values() if not mon.fainted]) / 6
        )
        remaining_mon_opponent = (
            len([mon for mon in battle.opponent_team.values() if not mon.fainted]) / 6
        )

        # Final vector with 10 components
        return np.concatenate(
            [
                moves_base_power,
                moves_dmg_multiplier,
                [remaining_mon_team, remaining_mon_opponent],
            ]
        )

    def compute_reward(self, battle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2, hp_value=1, victory_value=30
        )


class MaxDamagePlayer(RandomPlayer):
    def choose_move(self, battle):
        # If the player can attack, it will
        if battle.available_moves:
            # Finds the best move among available ones
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            return self.create_order(best_move)

        # If no attack is available, a random switch will be made
        else:
            return self.choose_random_move(battle)


NB_TRAINING_STEPS = 1
NB_EVALUATION_EPISODES = 1

tf.random.set_seed(0)
np.random.seed(0)


# This is the function that will be used to train the dqn
def dqn_training(player, dqn, nb_steps):
    dqn.fit(player, nb_steps=nb_steps)
    player.complete_current_battle()


def dqn_evaluation(player, dqn, nb_episodes):
    # Reset battle statistics
    player.reset_battles()
    dqn.test(player, nb_episodes=nb_episodes, visualize=False, verbose=False)

    print(
        "DQN Evaluation: %d victories out of %d episodes"
        % (player.n_won_battles, nb_episodes)
    )


async def final_tests():
    await emb_player.send_challenges('username', 100)


if __name__ == "__main__":
    env_player = SimpleRLPlayer(
        player_configuration=PlayerConfiguration("RL Player", None),
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
    )

    opponent = RandomPlayer(
        player_configuration=PlayerConfiguration("Random player", None),
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
    )

    second_opponent = MaxDamagePlayer(
        player_configuration=PlayerConfiguration("Max damage player", None),
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
    )

    # Output dimension
    n_action = len(env_player.action_space)

    model = Sequential()
    model.add(Dense(128, activation="elu", input_shape=(1, 10)))

    # Our embedding has shape (1, 10), which affects our hidden layer
    # dimension and output dimension
    # Flattening resolves potential issues that would arise otherwise
    model.add(Flatten())
    model.add(Dense(64, activation="elu"))
    model.add(Dense(n_action, activation="linear"))

    memory = SequentialMemory(limit=10000, window_length=1)

    # Simple epsilon greedy
    policy = LinearAnnealedPolicy(
        EpsGreedyQPolicy(),
        attr="eps",
        value_max=1.0,
        value_min=0.05,
        value_test=0,
        nb_steps=10000,
    )

    # Defining our DQN
    dqn = DQNAgent(
        model=model,
        nb_actions=len(env_player.action_space),
        policy=policy,
        memory=memory,
        nb_steps_warmup=1000,
        gamma=0.5,
        target_model_update=1,
        delta_clip=0.01,
        enable_double_dqn=True,
    )

    dqn.compile(Adam(lr=0.00025), metrics=["mae"])

    class EmbeddedRLPlayer(Player):
        def choose_move(self, battle):
            if np.random.rand() < 0.01:  # avoids infinite loops
                return self.choose_random_move(battle)
            embedding = SimpleRLPlayer.embed_battle(self, battle)
            action = dqn.forward(embedding)
            return SimpleRLPlayer._action_to_move(self, action, battle)

    emb_player = EmbeddedRLPlayer(
        player_configuration=PlayerConfiguration("Embedded RL Player", None),
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
    )

    # Training
    env_player.play_against(
        env_algorithm=dqn_training,
        opponent=opponent,
        env_algorithm_kwargs={"dqn": dqn, "nb_steps": NB_TRAINING_STEPS},
    )
    model.save("model_%d" % NB_TRAINING_STEPS)

    asyncio.get_event_loop().run_until_complete(final_tests())
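
One note on the final call: asyncio.get_event_loop().run_until_complete drives final_tests on the default event loop, which is also the loop the players were constructed on. Running the coroutine on a separate loop would likely reproduce the "Future attached to a different loop" error; a small sketch of the distinction (the second variant is shown only as the pattern to avoid, as an assumption about how the original error can arise):

# Works here: the coroutine runs on the default loop the players already use
asyncio.get_event_loop().run_until_complete(final_tests())

# Would likely reproduce "got Future attached to a different loop", because
# the coroutine runs on a freshly created loop instead:
# asyncio.new_event_loop().run_until_complete(final_tests())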

My virtualenv runs Python 3.6 with the following packages:

absl-py==0.9.0
aiologger==0.5.0
alabaster==0.7.12
appdirs==1.4.3
astor==0.8.1
astunparse==1.6.3
asynctest==0.13.0
attrs==19.3.0
Babel==2.8.0
black==19.10b0
bleach==3.1.5
cachetools==4.1.0
certifi==2020.4.5.1
cfgv==3.1.0
chardet==3.0.4
click==7.1.2
cloudpickle==1.3.0
coverage==5.1
dataclasses==0.7
distlib==0.3.0
docutils==0.16
entrypoints==0.3
filelock==3.0.12
flake8==3.7.9
future==0.18.2
gast==0.2.2
google-auth==1.14.2
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.28.1
gym==0.17.2
h5py==2.10.0
identify==1.4.15
idna==2.9
imagesize==1.2.0
importlib-metadata==1.6.0
importlib-resources==1.5.0
Jinja2==2.11.2
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keras-rl2==1.0.3
keyring==21.2.1
libcst==0.3.4
Markdown==3.2.2
MarkupSafe==1.1.1
mccabe==0.6.1
more-itertools==8.2.0
mypy-extensions==0.4.3
nodeenv==1.3.5
numpy==1.18.4
oauthlib==3.1.0
opt-einsum==3.2.1
packaging==20.3
pathspec==0.8.0
pkginfo==1.5.0.1
pluggy==0.13.1
pre-commit==2.3.0
protobuf==3.11.3
psutil==5.7.0
py==1.8.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.5.0
pyflakes==2.1.1
pyglet==1.5.0
Pygments==2.6.1
pyparsing==2.4.7
pyre-check==0.0.46
pyre-extensions==0.0.18
pytest==5.4.2
pytest-asyncio==0.12.0
pytest-cov==2.8.1
pytest-timeout==1.3.4
pytz==2020.1
pywatchman==1.4.1
PyYAML==5.3.1
readme-renderer==26.0
regex==2020.5.7
requests==2.23.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
rsa==4.0
scipy==1.4.1
six==1.14.0
snowballstemmer==2.0.0
Sphinx==3.0.3
sphinx-rtd-theme==0.4.3
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
tabulate==0.8.7
tb-nightly==1.14.0a20190603
tensorboard==2.1.1
tensorboard-plugin-wit==1.6.0.post3
tensorflow==2.0.0b1
tensorflow-estimator==2.1.0
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019060501
toml==0.10.0
tqdm==4.46.0
twine==3.1.1
typed-ast==1.4.1
typing-extensions==3.7.4.2
typing-inspect==0.6.0
urllib3==1.25.9
virtualenv==20.0.20
wcwidth==0.1.9
webencodings==0.5.1
websockets==8.1
Werkzeug==1.0.1
wrapt==1.12.1
zipp==3.1.0

Gummygamer (Author) commented:

It worked just fine! It was the way I called the coroutine, it seems.

Poke-env - general automation moved this from In progress to Done May 10, 2020