
Which level OBL was uploaded in the March 2021 push? #28

Open
hoseasiu opened this issue Sep 1, 2021 · 8 comments

Comments

@hoseasiu

hoseasiu commented Sep 1, 2021

Hi Hengyuan,

We've been trying out the OBL model that you had uploaded, and it's a very good agent - certainly the most human-like and performant of the learning-based agents I've played with. Two questions came up when we tried it that we were hoping you could clarify.

  1. The paper refers to multiple levels of OBL bots, but only one was uploaded, and it wasn't clear from the README or the bot name which level it is. Which level was it? In our (human) interactions with it, it occasionally played cards without full information, especially when given a hint on a newly drawn card, which seems to indicate a deviation from the optimal grounded policy and suggests higher-level OBL behavior to me?

  2. We also noticed that the bot sometimes makes incorrect play attempts on cards with full information, again typically when the cards are newly drawn and hinted at. This seems to be a case where a convention learned at the higher levels is overriding the optimal grounded policy? Is that consistent with your experience?

Thanks!
Hosea

@hengyuan-hu
Contributor

The OBL agent available here is a level 4 agent, so it is not the grounded policy.

That is very abnormal. How are you playing with the agent? If you are using the UI in the SPARTA repo and the convert_model.py script here, then the converted model is wrong. The OBL agents use a public-private network, which is different from the model architecture assumed in convert_model.py.

@0xJchen

0xJchen commented Sep 2, 2021

Hi Hengyuan. Does it make sense to interpret the public-private network as a way of accelerating training & inference?
In the LBS paper, the model is described as follows:

To avoid having to re-unroll the policies for the other agents from the beginning of the game for each of the sampled τ , LBS uses a specific RNN architecture.

My understanding is that, instead of each agent individually unrolling their priv_s through their own LSTM (like in SAD), the agents can now share the encoding of the public observation, and their private observations are simply forwarded through an MLP. I wonder if my understanding is correct.
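
To make sure I'm picturing it right, here is a minimal sketch of what I mean (all the names below are mine for illustration, not the actual classes in this repo):

```python
import torch
import torch.nn as nn

# Illustrative public-private encoder (hypothetical names, not the repo's code).
# The LSTM only ever consumes the public observation, so its hidden state is the
# same regardless of the private hand; the private observation only passes
# through a feed-forward MLP.
class PublicPrivateNet(nn.Module):
    def __init__(self, pub_dim: int, priv_dim: int, hid_dim: int, num_action: int):
        super().__init__()
        self.pub_lstm = nn.LSTM(pub_dim, hid_dim, batch_first=True)
        self.priv_mlp = nn.Sequential(nn.Linear(priv_dim, hid_dim), nn.ReLU())
        self.head = nn.Linear(hid_dim, num_action)

    def forward(self, publ_s, priv_s, hidden=None):
        # publ_s: [batch, seq, pub_dim], priv_s: [batch, seq, priv_dim]
        pub_o, hidden = self.pub_lstm(publ_s, hidden)
        # combine the shared public encoding with the per-agent private encoding
        # (element-wise product chosen here purely for illustration)
        joint = pub_o * self.priv_mlp(priv_s)
        return self.head(joint), hidden
```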

@hoseasiu
Author

hoseasiu commented Sep 2, 2021

@hengyuan-hu Thanks for the response on the OBL level, that makes sense to me.

@keenlooks did most of the work to make the OBL model work with a slightly modified version of the webapp from the SPARTA repo, but from what I gather, he didn't use the convert_model.py script. It was based on the code from https://github.com/facebookresearch/hanabi_SAD/blob/master/pyhanabi/tools/obl_model.py @keenlooks - any other details there?

@keenlooks

keenlooks commented Sep 2, 2021

@hoseasiu that's correct. I created a forward method in the class in obl_model.py with the inputs/outputs SPARTA expected, loaded the weight values from obl.pthw, then exported the class via torch.jit.save.
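
Roughly along these lines (a simplified sketch from memory; names like OBLModel, act, and the exact forward signature below are placeholders, and the real wrapper has more plumbing):

```python
import torch

from obl_model import OBLModel  # placeholder import; the real class lives in obl_model.py

# Sketch of the conversion: wrap the OBL net in a module whose forward matches what the
# SPARTA webapp expects, load the released weights, and export the result with TorchScript.
class SpartaWrapper(torch.nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, priv_s, publ_s, hid):
        # illustrative signature: return per-action values and the new recurrent state
        adv, new_hid = self.net.act(priv_s, publ_s, hid)
        return adv, new_hid

net = OBLModel()
net.load_state_dict(torch.load("obl.pthw", map_location="cpu"))
wrapper = torch.jit.script(SpartaWrapper(net))
torch.jit.save(wrapper, "obl_for_sparta.pt")
```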

@hengyuan-hu
Contributor

@keenlooks That sounds right. Have you checked the self-play score of the converted JIT model?
@hoseasiu Can you elaborate a bit more on "makes incorrect play attempts on cards with full information, again typically when the cards are newly drawn and hinted towards"? If the card is newly drawn, how does the bot know the full info? Have you hinted both color and rank? For this bot, I think if you hint the color of the newly drawn card it will have a very high tendency to play it, a convention learned from the previous-level OBL belief.

@hengyuan-hu
Contributor

@peppacat If we use a private LSTM, then we need to sample not only a hand (o_private), but also the entire history of my hand (tau_private). Therefore we have to use a network structure where the recurrent part does not depend on tau_private; both the feed-forward and the public-private network satisfy this requirement.
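
As a rough illustration (hypothetical code, not from this repo): with a public LSTM, the public history is unrolled once and each sampled hand only costs a feed-forward pass, whereas a private LSTM would have to be re-unrolled over tau_private for every sample.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of why the recurrent part must not depend on
# tau_private when we sample hands from the learned belief.
pub_dim, priv_dim, hid_dim, num_action, T, K = 64, 128, 512, 21, 40, 10

pub_lstm = nn.LSTM(pub_dim, hid_dim, batch_first=True)
priv_mlp = nn.Sequential(nn.Linear(priv_dim, hid_dim), nn.ReLU())
head = nn.Linear(hid_dim, num_action)

pub_history = torch.rand(1, T, pub_dim)     # shared public trajectory, unrolled only once
pub_out, _ = pub_lstm(pub_history)
pub_now = pub_out[:, -1]                    # [1, hid_dim], reused for every sampled hand

sampled_priv = torch.rand(K, priv_dim)      # K hands sampled from the belief
values = head(pub_now * priv_mlp(sampled_priv))  # one cheap pass per sample, no re-unroll

# With a private LSTM the hidden state would depend on tau_private, so every one of the
# K samples would require re-unrolling the LSTM over the full private history
# (O(K*T) LSTM steps instead of O(T) LSTM steps plus K MLP passes).
```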

@hoseasiu
Author

hoseasiu commented Sep 4, 2021

By "newly drawn," I just mean that it's the newest card in the bot's hand. It will have been there for at least long enough for it to receive two hints that give it full information on the card, but in the interim, the bot didn't draw anything new, so it's still the newest card. In the cases we saw, the bot would play that newest card after the second applicable hint, even though when taken together, the revealed information on that card gave it perfect information that the card was in fact not playable. We can post some examples the next time we test.

@keenlooks

@keenlooks I have not checked the self-play score of the converted JIT model. @hoseasiu do you know if you all have been able to check that yet?
