Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new game: 2048 #894

Merged
merged 48 commits into from
Aug 20, 2022
Merged

Add new game: 2048 #894

merged 48 commits into from
Aug 20, 2022

Conversation

Jazeem
Copy link
Contributor

@Jazeem Jazeem commented Jul 29, 2022

@lanctot
Copy link
Collaborator

lanctot commented Jul 30, 2022

Very nice, thanks @Jazeem !!

I'm on vacation at the moment but will take a look when I'm back next week.

@lanctot lanctot mentioned this pull request Jul 30, 2022
@Jazeem
Copy link
Contributor Author

Jazeem commented Jul 30, 2022

Sure @lanctot

I'm curious to see if any algorithms implemented in open_spiel would be able to reach 2048. I tried the default mcts implementation, but it was not faring better than a random player.

@lanctot
Copy link
Collaborator

lanctot commented Jul 31, 2022

Hi @Jazeem,

Hmm.. MCTS performing similarly to random-- even for MCTS with random rollouts-- I'm surprised. I suspect a bug in the game implementation. How many simulations did MCTS have? And how long is the maximum length of each game?

Other algorithms: you could also try some RL algorithms like DQN. Or, expectimax with a heuristic evaluation function.

@lanctot
Copy link
Collaborator

lanctot commented Jul 31, 2022

Oh wait, depends on what you decided for the reward structure. If there are no intermediate rewards then I can see why it would be super hard for vanilla MCTS but still I would expect it to be better than random.

@Jazeem
Copy link
Contributor Author

Jazeem commented Jul 31, 2022

@lanctot yeah you are correct. I think it's because there's no intermediate rewards.

When I ran the simulation over 100 games, MCTS and random couldn't win once.
But when I reduced the winning condition to tile 128 instead of 2048, MCTS could win 100 games vs 48 games for random.

@steventrouble
Copy link

steventrouble commented Aug 2, 2022

This is really impressive!

I suspect the terminal condition and reward function still need some tweaking. 2048 doesn't end when the player reaches 2048, it keeps going forever. And the score isn't win/lose, it increases based on the number and size of the merged tiles.

The reason I mention this is that current SOTA algorithms can get scores of ~500k, which IIUC is far beyond reaching 2048.

Disclaimer: I am not an owner of OpenSpiel

@Jazeem
Copy link
Contributor Author

Jazeem commented Aug 3, 2022

Hi @steventrouble

I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.

I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.

@lanctot
Copy link
Collaborator

lanctot commented Aug 11, 2022

Hi @steventrouble

I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.

I recommend we make this as consistent with the original scoring (or with other RL implementations) as possible.

I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.

I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value.

@lanctot
Copy link
Collaborator

lanctot commented Aug 11, 2022

Clone of https://github.com/gabrielecirulli/2048

Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import. (And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences).

return BoardAt(r, c).value == 0;
}

Coordinate GetVector(int direction) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure you can annotate this function with constexpr: https://en.cppreference.com/w/cpp/language/constexpr , which will execute it at compile time instead of at run time. It might be simple enough that the compiler will do that anyway, but if not, this could speed things up slightly.

GameType::ChanceMode::kExplicitStochastic,
GameType::Information::kPerfectInformation,
GameType::Utility::kGeneralSum,
GameType::RewardModel::kTerminal,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer a terminal reward, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Forgot to change this, initially the code was written as a win/loss game. Fixed it now.

void TwoZeroFourEightState::DoApplyAction(Action action) {
if (IsChanceNode()) {
// The original 2048 game starts with two random tiles
if (!extra_chance_turn_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think you're right. This is needed due to the explicit stochastic actions. Can you add that as a comment? If the action was applied implicitly then you wouldn't need this.

constexpr int kMoveRight = 1;
constexpr int kMoveDown = 2;
constexpr int kMoveLeft = 3;
inline const std::vector<Action> kPlayerActions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could statically define a std::array<Action, 4> instead of generating a vector each time this is called (which is tight in a loop in a few places!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LegalActions() expects a vector as returns. That's why kept this as vector too.

current_player_ = 0;
for (int r = 0; r < kRows; r++) {
for (int c = 0; c < kColumns; c++) {
SetBoard(r, c, Tile(board_seq[r * kRows + c], false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this should be r * kCols + c. It doesn't matter now because kRows == kCols, but this is a bug waiting to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes, you are correct. Fixed it now.

reverse(y.begin(), y.end());
break;
case kMoveLeft:
case kMoveDown:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? Shouldn't kMoveLeft and kMoveDown do different things? As in shouldn't one reverse x and the other y? Or can you add a comment about what this does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Simplified this.

return {};
}
if (IsChanceNode()) {
return LegalChanceOutcomes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LegalChanceOutcomes doesn't seem to be defined anywhere. Is this ChanceOutcomes()?

Copy link
Contributor Author

@Jazeem Jazeem Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referencing backgammon.cc for writing this.
As I understand, LegalChanceOutcomes() is defined in spiel.h and it uses the output of ChanceOutcomes() to return vector<Action>

@lanctot
Copy link
Collaborator

lanctot commented Aug 17, 2022

Hi @Jazeem, something changes on the MacOS behavior via GitHub Actions, can you apply these changes to trigger the tests again: https://github.com/deepmind/open_spiel/pull/907/files

@lanctot
Copy link
Collaborator

lanctot commented Aug 17, 2022

Yeah it'll take more than just that one line, now there seems to be issues with the Jax versions .. :(

Keep an eye on #907, might only be resolved tomorrow.

@Jazeem
Copy link
Contributor Author

Jazeem commented Aug 17, 2022

Keep an eye on #907, might only be resolved tomorrow.

Sure. Will do 👍

@lanctot
Copy link
Collaborator

lanctot commented Aug 17, 2022

Keep an eye on #907, might only be resolved tomorrow.

Sure. Will do 👍

Ok tests are passing, so you just need to update the Jax and etc. package versions (one line change in install.sh)

@lanctot
Copy link
Collaborator

lanctot commented Aug 18, 2022

@Jazeem thanks, looking good! Are you done fixing things up for the moment?

I've contacted @tewalds who will review the import, would be good to pull this one in now if you're done making changes.

@Jazeem
Copy link
Contributor Author

Jazeem commented Aug 18, 2022

@lanctot
Yes, done fixing things up for now.

@lanctot lanctot added imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! merged internally The code is now submitted to our internal repo and will be merged in the next github sync. and removed merged internally The code is now submitted to our internal repo and will be merged in the next github sync. labels Aug 18, 2022
@OpenSpiel OpenSpiel merged commit f4121ea into google-deepmind:master Aug 20, 2022
@lanctot
Copy link
Collaborator

lanctot commented Aug 20, 2022

Hi @Jazeem, it's now been merged, but a fair bit changed from internal review. Please take a look through it let us know what you think. You can still do the almost-infinite version by setting max_tile to a very high value.

@Jazeem
Copy link
Contributor Author

Jazeem commented Aug 22, 2022

@lanctot It looks great 👍

Only thing I noticed is that if a player/agent makes a move that doesn't change anything on the board, then additional chance tiles are not added. So there could be a few inconsequential moves like that, so not sure if those should be counted in

int MaxGameLength() const override { return 2 * 2 * max_tile_; }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! merged internally The code is now submitted to our internal repo and will be merged in the next github sync.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants