Improved algorithm reaching 32k tile #27

Merged
merged 6 commits into nneonneo:master from xificurk/ng on Jul 5, 2014

Conversation

@xificurk
Contributor

xificurk commented Jun 7, 2014

This PR improves the algorithm so that it consistently reaches the 32k tile. Here (https://www.dropbox.com/s/9463l6ztznj3zy0/2048.png) is a "Game over" screenshot of one of the best games I've seen it play.

First, there are a couple of minor code tweaks. I have also moved the caching and search-termination code into the score_tilechoose_node method. IMHO it makes more sense to apply it after the player's deterministic move rather than after the random computer move, although the benchmarking I tried did not show a significant improvement. The two most important commits are the new heuristic scoring and the adaptive search depth limit.

New heuristic scoring

It uses a combination of three factors: the number of empty tiles, the number of available merge moves, and the monotonicity of each row/column.
An empty tile is in fact as important as the ability to create one by merging tiles of the same value.
The monotonicity of a row/column enforces a generally good position for building up higher and higher tiles. Look at the code for more details; it's pretty self-explanatory. The calculation uses the third powers of tile values, so that higher tiles get a more severe penalty if they are moved into the middle of the board.
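
For illustration, here is a minimal sketch of how such a per-row heuristic could be computed. The function name and the weights are placeholders rather than the exact values from this patch, and the real code precomputes these scores into lookup tables instead of evaluating rows at runtime:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    typedef uint16_t row_t;  // one row: 4 tiles, 4 bits each (stored rank; tile value = 2^rank)

    // Hypothetical weights, for illustration only.
    static const float EMPTY_WEIGHT = 270.0f;
    static const float MERGE_WEIGHT = 700.0f;
    static const float MONO_WEIGHT  = 47.0f;

    static float score_row_heuristic(row_t row) {
        int rank[4];
        for (int i = 0; i < 4; ++i)
            rank[i] = (row >> (4 * i)) & 0xf;

        int empty = 0, merges = 0;
        for (int i = 0; i < 4; ++i)
            if (rank[i] == 0) ++empty;

        float mono_left = 0.0f, mono_right = 0.0f;
        for (int i = 0; i < 3; ++i) {
            // Adjacent equal non-empty tiles can be merged.
            if (rank[i] != 0 && rank[i] == rank[i + 1]) ++merges;

            // Monotonicity: accumulate violations in each direction, using third
            // powers of the ranks so a misplaced high tile costs much more.
            if (rank[i] > rank[i + 1])
                mono_left += std::pow((float)rank[i], 3.0f) - std::pow((float)rank[i + 1], 3.0f);
            else
                mono_right += std::pow((float)rank[i + 1], 3.0f) - std::pow((float)rank[i], 3.0f);
        }

        // Reward empties and merge opportunities; penalize only the smaller of the
        // two monotonicity violations, so a row sorted in either direction is fine.
        return EMPTY_WEIGHT * empty
             + MERGE_WEIGHT * merges
             - MONO_WEIGHT * std::min(mono_left, mono_right);
    }

The full-board score would then be the sum of this value over all four rows and all four columns (after transposing the board), which is what makes the table-based precomputation possible.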

Adaptive search depth limit

This gives a huge speed boost, especially early in the game. The idea is that you only need to search the game tree really deep when the board gets complicated, and a good measure of board complexity is the number of distinct tiles. With this patch the AI alternates between extremely fast segments (right after it builds a new highest tile) and slow segments (right before it builds a new highest tile).
I've measured the time it took the original algorithm and the new one to reach a certain tile; here is how much time this patch saves:
2k: 50%
4k: 45%
8k: 30%
16k: 5%
As you can see, the time to reach the winning 2k tile was cut in half, and the AI stays faster roughly up to the point where it builds the 16k tile.
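
A minimal sketch of the idea (not the exact code from the patch): derive the depth limit for each move from the number of distinct tiles, clamped to reasonable bounds. count_distinct_tiles() and board_t are the helpers from 2048.cpp; the offset and bounds below are placeholders, not the committed values.

    // Illustrative only: choose the search depth from the board complexity.
    static int adaptive_depth_limit(board_t board) {
        int depth = count_distinct_tiles(board) - 2;  // more distinct tiles -> search deeper
        if (depth < 3) depth = 3;                     // floor for simple boards
        if (depth > 8) depth = 8;                     // cap, comparable to SEARCH_DEPTH_LIMIT
        return depth;
    }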

Benchmark results

(for final results see the last post)

There is certainly room for tweaking the code further, but I doubt it will be possible to get another significant improvement while keeping the computational cost at a sane level.

@@ -213,17 +236,32 @@ static inline int get_max_rank(board_t board) {
return maxrank;
}
static inline int count_distinct_tiles(board_t board) {

@nneonneo

nneonneo Jun 7, 2014

Owner

A slightly faster approach would be to create a 16-bit value uint16_t bitset = 0, with bit i set if tile i is in the input. You can build it in one pass over the board (while(board) { bitset |= 1<<(board & 0xf); board >>= 4; }) and then count the number of bits set in the word (int count = 0; while(bitset) { count += bitset & 1; bitset >>= 1; }).
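
Put together, the suggested version would look roughly like this (a sketch of the comment above; whether to skip empty cells, i.e. bit 0, is a separate choice, and a hardware popcount could replace the counting loop):

    #include <cstdint>

    typedef uint64_t board_t;  // 16 tiles, 4 bits each

    static inline int count_distinct_tiles(board_t board) {
        uint16_t bitset = 0;
        while (board) {
            bitset |= 1 << (board & 0xf);  // mark this tile rank as present
            board >>= 4;
        }

        int count = 0;
        while (bitset) {
            count += bitset & 1;  // count the set bits
            bitset >>= 1;
        }
        return count;
    }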

@xificurk

xificurk Jun 8, 2014

Contributor

Oh, that's clever, thanks! I'll update the code incorporating your suggestions and force push the PR later today.

@nneonneo

Owner

nneonneo commented Jun 7, 2014

This is really nice work! I only had a few comments on the code (above). Otherwise it looks very good.

I took it for a test run, and on this first run it beat my previous record already (382960 to 377792). Later today I'll run my automated test harness to gather statistics on the new approach, and let you know how it fares!

@nneonneo nneonneo referenced this pull request Jun 7, 2014

Closed

road to 32768 #26

@nemesix2001

nemesix2001 commented Jun 11, 2014

Good stuff! I can confirm similar results with a different scoring function but with the same adaptive search strategy... very good idea indeed ;)

@xificurk

Contributor

xificurk commented Jun 11, 2014

@nemesix2001 Interesting - what were the results of your scoring before this patch? Did it help just a little or was there a significant jump to higher scores?

@nemesix2001

nemesix2001 commented Jun 11, 2014

With a fixed depth of 7 I was able to get these results:
2048: 100%
4096: 100%
8192: 96%
16384: 53%

while with adaptive search I now reach the 32k tile ~5% of the time and the 16k rate jumps up to ~80%.

@rpdelaney

rpdelaney commented Jun 12, 2014

I'd also be interested to know how each of these two unrelated changes affects scoring performance on its own. That is, we want to make sure that each change improves performance over the status quo by itself; this helps avoid merging code that actually weakens scoring performance when it is introduced in the same pull request as strong code that obscures the weakness.

In this case, I'm suspicious that the adaptive search is improving performance more than the changes to the evaluation. It is possible that adaptive searching with a simpler evaluation would score even higher if that meant the search tree could be built deeper or wider.

@xificurk

Contributor

xificurk commented Jun 12, 2014

@rpdelaney I don't have exactly the numbers you're asking for. To save time I did benchmarks with CPROB_THRESH_BASE = 0.001f and SEARCH_DEPTH_LIMIT = 7.

Original heuristic scoring (153 runs): [image]

New heuristic scoring (1000 runs): [image]

New heuristic scoring + CPROB_THRESH_BASE = 0.0001f + adaptive search depth limit (362 runs, the updated version from my first post): [image]

As you can see, the wider search and the higher (or adaptive) depth limit give it only a rather small nudge; the main improvement comes from the new heuristic scoring. Also note that the changes to the heuristic scoring are only in the initial table building, so that part of the patch does NOT affect speed at all (well, OK, maybe by a couple of milliseconds :-)).

@rpdelaney

rpdelaney commented Jun 12, 2014

Well, this looks promising indeed. Thank you.

Now I'm just musing, so take this as you will... Vasik Rajlich said he would test small changes to Rybka by having it play against itself with a fixed time control of 0.0005 seconds per move, or even faster. He found that evaluating tweaks based on millions of games played at these infinitesimally small time controls was much more reliable than fewer games at slower time controls: that is, a million games played at a faster-than-light time control approximates performance over an infinite number of games better than 1,000 games played at a standard time control.

We can take him as an authority on chess engines, and it has some plausibility here too. Running more games would go a long way to eliminating statistical noise from new tile randomness. Your win percentages in testing might be lower, but the resolution of the measurement would be higher. Just a thought :)

Thanks again. This looks like great work on a fascinating problem.

@nneonneo

Owner

nneonneo commented Jun 27, 2014

Has the patch been updated for my comments? Once that's done I think it is ready to merge.

@xificurk

Contributor

xificurk commented Jun 27, 2014

@nneonneo Yes, it was. You can merge it as it is now, but I want to give you a heads-up: I ran a couple of optimization algorithms to find a better heuristic scoring and the results look promising. I think within a week or two (after a bit more benchmarking) I'll have another patch ready.

Heuristic scoring:
- Take into account the average value of tiles on the board, thus forcing
  earlier merges of high tiles.
- Tune individual parameters for better results; the chosen values
  are based on the results of CMA-ES.
@xificurk

Contributor

xificurk commented Jul 5, 2014

As I mentioned in my previous post, I did further refinement of the heuristic scoring. Since this PR had not yet been merged, I've pushed the additional commit directly here.

Here are the results of the benchmark (246 runs in total): [image]

The best run scored 829,300 points in 29,205 moves, which is pretty good considering that you should reach the 65k tile around move 29,800 (each spawn adds 2.2 points of tile value on average, so a 65536 tile needs roughly 65536 / 2.2 ≈ 29,800 moves).

@nneonneo

Owner

nneonneo commented Jul 5, 2014

This is really impressive. Nice job! I will definitely merge this PR. You've made quite an improvement!

Writing up what you did would make a very cool blog post or Stack Overflow answer, I'm sure :)

nneonneo added a commit that referenced this pull request Jul 5, 2014

Merge pull request #27 from xificurk/ng
Improved algorithm reaching 32k tile

@nneonneo nneonneo merged commit 7dca304 into nneonneo:master Jul 5, 2014

@tnmichael309

tnmichael309 commented Jul 8, 2014

@xificurk: Hi, I'm new to CMA-ES and I'm wondering how you applied this method to parameter tuning.

How do you define the solution points, and how do you choose things like the population size?

It would be a great help for me to understand how to tune the 2048 heuristic weights as an example.

Thank you in advance :)

kcwu added a commit to kcwu/2048-c that referenced this pull request Jul 30, 2014

@tnmichael309

tnmichael309 commented May 22, 2015

We've used MS-TD learning and TD(lambda) to improve our 2048 AI.

The performance, source code and paper references can be found here:
https://github.com/CGI-LAB/Taiwan_Bot_Tournament_2048
https://github.com/tnmichael309/2048AI

(though the code is ugly...)

Performance over 1000 games:
Average: 446,116
Max: 833,300
32768 rate: 33.5%
Speed: 500 moves/sec

Similar to the results here, with a slightly better avg/max/32768 rate but a large speed boost.
(no multithreading, Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz)

@nneonneo

Owner

nneonneo commented May 22, 2015

That's very impressive! Clearly the heuristics are very strong.

How well can it perform with deeper searching? If you make it work harder (maybe down to 20 moves/sec), can it sometimes get to 65536?

@nneonneo

Owner

nneonneo commented May 22, 2015

Also, I see that there was a Threes! bot developed. I wonder if my threes bot (https://github.com/nneonneo/threes-ai) would have performed better or not. It is based on the same kind of deep search + heuristic approach, plus a bit of heuristic optimization with CMA-ES.

@tnmichael309

tnmichael309 commented May 23, 2015

Hi, @nneonneo

Well, the bot for Threes we developed was not further trained with TD(lambda), but it was still trained with the MS-TD learning we proposed at TAAI. (Paper on Springer: http://download-v2.springer.com/static/pdf/229/chp%253A10.1007%252F978-3-319-13987-6_34.pdf?token2=exp=1432349702~acl=%2Fstatic%2Fpdf%2F229%2Fchp%25253A10.1007%25252F978-3-319-13987-6_34.pdf*~hmac=a1b5d1ef89c00ffbfdecc464bf9f5a3e82d53c420b4961a10b41242991e48a7c)

Current results for Threes (the source code is not updated yet):

Reaching rate       1-ply      2-ply      3-ply   (search depth)
1536                28.4%      93.8%      97.4%
3072                 0.8%      48.5%      65.8%
6144                 0.0%       1.0%       9.0%
Maximum score      207,096    709,341    771,108
Average score       35,763    158,151    221,326
Speed (moves/sec)   21,537     14,099        298

For 2048, we ran ten thousand games using 3-ply expectimax search (actually play -> rand -> play -> rand -> play -> leaf, so more like 2.5-ply...), and we got one 65536 tile (the score was 1,064,160).

However, since I used a bitboard with 'f' representing 32768, I did not have time to revise it to display 65536, and I did not run a deeper search.

If we want the 65536 tile, we should keep adding more stages to the MS-TD learning (as mentioned in the paper); then we can get better performance at the same search depth and can even try a deeper search.

Thank you.

@nneonneo

Owner

nneonneo commented May 23, 2015

I just used the trick of having F+F=F in my AI, so that if two 32768 tiles are touching, it merges them.

Very cool that you got 65536. That's a big achievement.
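
For anyone curious, the F+F=F trick could look roughly like this in the merge logic (an illustrative helper with a made-up name; the actual change lives in the row-merge table construction): when two tiles are already at the maximum 4-bit rank 0xf (32768), merging them yields 0xf again instead of overflowing the field.

    #include <cstdint>

    // Illustrative only: merge two equal ranks, capping at 0xf so that F + F = F
    // (two 32768 tiles merge into a tile that is still stored and shown as 32768).
    static inline uint8_t merged_rank(uint8_t rank) {
        uint8_t merged = rank + 1;
        return merged > 0xf ? 0xf : merged;
    }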


@tnmichael309

tnmichael309 commented May 24, 2015

We've put our AI on the website: http://2048.aigames.nctu.edu.tw/

This is the record of the 65536 tile: [image]

And thanks for sharing the trick :)
