-
Notifications
You must be signed in to change notification settings - Fork 895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new game: 2048 #894
Add new game: 2048 #894
Conversation
…e game to never end
Very nice, thanks @Jazeem !! I'm on vacation at the moment but will take a look when I'm back next week. |
Sure @lanctot I'm curious to see if any algorithms implemented in open_spiel would be able to reach 2048. I tried the default mcts implementation, but it was not faring better than a random player. |
Hi @Jazeem, Hmm.. MCTS performing similarly to random-- even for MCTS with random rollouts-- I'm surprised. I suspect a bug in the game implementation. How many simulations did MCTS have? And how long is the maximum length of each game? Other algorithms: you could also try some RL algorithms like DQN. Or, expectimax with a heuristic evaluation function. |
Oh wait, depends on what you decided for the reward structure. If there are no intermediate rewards then I can see why it would be super hard for vanilla MCTS but still I would expect it to be better than random. |
@lanctot yeah you are correct. I think it's because there's no intermediate rewards. When I ran the simulation over 100 games, MCTS and random couldn't win once. |
This is really impressive! I suspect the terminal condition and reward function still need some tweaking. 2048 doesn't end when the player reaches 2048, it keeps going forever. And the score isn't win/lose, it increases based on the number and size of the merged tiles. The reason I mention this is that current SOTA algorithms can get scores of ~500k, which IIUC is far beyond reaching 2048. Disclaimer: I am not an owner of OpenSpiel |
I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve. I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score. |
I recommend we make this as consistent with the original scoring (or with other RL implementations) as possible.
I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value. |
Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import. (And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences). |
open_spiel/games/2048.cc
Outdated
return BoardAt(r, c).value == 0; | ||
} | ||
|
||
Coordinate GetVector(int direction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure you can annotate this function with constexpr: https://en.cppreference.com/w/cpp/language/constexpr , which will execute it at compile time instead of at run time. It might be simple enough that the compiler will do that anyway, but if not, this could speed things up slightly.
open_spiel/games/2048.cc
Outdated
GameType::ChanceMode::kExplicitStochastic, | ||
GameType::Information::kPerfectInformation, | ||
GameType::Utility::kGeneralSum, | ||
GameType::RewardModel::kTerminal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer a terminal reward, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Forgot to change this, initially the code was written as a win/loss game. Fixed it now.
void TwoZeroFourEightState::DoApplyAction(Action action) { | ||
if (IsChanceNode()) { | ||
// The original 2048 game starts with two random tiles | ||
if (!extra_chance_turn_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think you're right. This is needed due to the explicit stochastic actions. Can you add that as a comment? If the action was applied implicitly then you wouldn't need this.
constexpr int kMoveRight = 1; | ||
constexpr int kMoveDown = 2; | ||
constexpr int kMoveLeft = 3; | ||
inline const std::vector<Action> kPlayerActions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could statically define a std::array<Action, 4> instead of generating a vector each time this is called (which is tight in a loop in a few places!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LegalActions() expects a vector as returns. That's why kept this as vector too.
open_spiel/games/2048.cc
Outdated
current_player_ = 0; | ||
for (int r = 0; r < kRows; r++) { | ||
for (int c = 0; c < kColumns; c++) { | ||
SetBoard(r, c, Tile(board_seq[r * kRows + c], false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this should be r * kCols + c
. It doesn't matter now because kRows == kCols, but this is a bug waiting to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Yes, you are correct. Fixed it now.
reverse(y.begin(), y.end()); | ||
break; | ||
case kMoveLeft: | ||
case kMoveDown: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right? Shouldn't kMoveLeft and kMoveDown do different things? As in shouldn't one reverse x and the other y? Or can you add a comment about what this does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Simplified this.
return {}; | ||
} | ||
if (IsChanceNode()) { | ||
return LegalChanceOutcomes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LegalChanceOutcomes doesn't seem to be defined anywhere. Is this ChanceOutcomes()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referencing backgammon.cc for writing this.
As I understand, LegalChanceOutcomes() is defined in spiel.h and it uses the output of ChanceOutcomes() to return vector<Action>
Hi @Jazeem, something changes on the MacOS behavior via GitHub Actions, can you apply these changes to trigger the tests again: https://github.com/deepmind/open_spiel/pull/907/files |
Yeah it'll take more than just that one line, now there seems to be issues with the Jax versions .. :( Keep an eye on #907, might only be resolved tomorrow. |
Sure. Will do 👍 |
Ok tests are passing, so you just need to update the Jax and etc. package versions (one line change in install.sh) |
@lanctot |
Hi @Jazeem, it's now been merged, but a fair bit changed from internal review. Please take a look through it let us know what you think. You can still do the almost-infinite version by setting max_tile to a very high value. |
@lanctot It looks great 👍 Only thing I noticed is that if a player/agent makes a move that doesn't change anything on the board, then additional chance tiles are not added. So there could be a few inconsequential moves like that, so not sure if those should be counted in
|
Clone of https://github.com/gabrielecirulli/2048