Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gate timeout error messages #125

Closed
OhJayGee opened this issue Jun 8, 2018 · 16 comments
Closed

gate timeout error messages #125

OhJayGee opened this issue Jun 8, 2018 · 16 comments
Labels
bug Something isn't working

Comments

@OhJayGee
Copy link

OhJayGee commented Jun 8, 2018

I am seeing a number of errors from lichess-bot following the same pattern (see below).
All of them lead to a abort/game over or loss on time.

Does anybody else see that? is that actual connectivity, because the URLs seem correct, and the machines' internet connectivity was fine as well...

best regards,
OJG

+++ https://lichess.org/bG43Aj81/black Bullet vs BOT F_Me_Im_Famous(1500?) ERROR:backoff:Backing off api_post(...) for 0.9s (requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://lichess.org/api/bot/game/bG43Aj81/move/d1e2) ERROR:backoff:Giving up api_post(...) after 2 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/bG43Aj81/move/d1e2) Abandoning game due to connection error Traceback (most recent call last): File "lichess-bot.py", line 150, in play_game li.make_move(game.id, best_move) File "/ChessEngines/lichess-bot/lichess.py", line 71, in make_move return self.api_post(ENDPOINTS["move"].format(game_id, move)) File "/ChessEngines/lichess-bot/.venv/lib/python3.6/site-packages/backoff/_sync.py", line 99, in retry ret = target(*args, **kwargs) File "/ChessEngines/lichess-bot/lichess.py", line 61, in api_post response.raise_for_status() File "/ChessEngines/lichess-bot/.venv/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/bG43Aj81/move/d1e2 --- https://lichess.org/bG43Aj81/black Game over

@ddugovic
Copy link
Contributor

ddugovic commented Jun 8, 2018

I've seen similar messages, but only the case where the opponent hastily aborts the game before the first move.

@ShailChoksi ShailChoksi added the bug Something isn't working label Jun 9, 2018
@ShailChoksi
Copy link
Collaborator

This happens because:

  1. Lichess sends game started notification
  2. Lichess-bot starts the engine. While engine is starting:
    a. Opponent aborts
    b. Lichess send game ended notification
  3. Engine boots up and tries to play a move

If anyone has a good idea of how to fix this, make a PR or I will get to it once I have some time.

@OhJayGee
Copy link
Author

OhJayGee commented Jun 9, 2018

I am rather seeing this in the middle of a game, not at the beginning... I'll try to get some examples with actual timestamps. Is there a way I can turn more verbose logging?

@ShailChoksi
Copy link
Collaborator

@OhJayGee you can pass the -v flag: python lichess-bot.py -v for verbose logging

@gbtami
Copy link
Contributor

gbtami commented Jun 10, 2018

This can happen if game is finishing by any reason while BOT is thinking and then wants to li.make_move().

@careless25 maybe I'm wrong but I think 2.b. (Lichess send game ended notification) is a non existing thing. If this statement is correct I can imagine a workaround like this:

If HTTPError raised by li.make_move() we can look at BOTs ongoing games (using "/api/account/playing" endpoint) and if our current game is not in the ongoing games list then we know it's over.
Otherwise re raise the exception.

What do you think, is this a usable idea or there is something better?

@ddugovic
Copy link
Contributor

It seems the workaround is necessary in any event, since Lichess is online and messages take time to propagate.

@OhJayGee
Copy link
Author

newer observation: in the last week I did not observe any timeout flags caused by this error. Did anything change on the backend?

@drbeco
Copy link

drbeco commented Jun 26, 2018

Thank you @gbtami , for linking bug #128. Here it is a log and a picture that express exactly that:

Here a complete log, since BOOT until error. Very small, because it is in the first move.

# [DEBUG file:xadreco.c line:596]: Starting optarg loop...
# DEBUG MACRO compiled value: 1
# Debug verbose level set at: 1
# play random: no. seed: -r 0
# connection: -c lichess
# wait: -x wait_xboard
# book: -b livro.txt
# Xadreco version 5.84 build 180626.010155 (C) 1994-2018, by Dr. Beco
# Xadreco comes with ABSOLUTELY NO WARRANTY;
# This is free software, and you are welcome to redistribute it
# under certain conditions; Please, visit http://www.fsf.org/licenses/gpl.html
# for details.

# scanf: xboard
# 
# feature done=0
# feature myname="Xadreco 5.84"
# feature ping=0 setboard=1 playother=1 san=0 usermove=0 time=1 draw=1 sigint=0 sigterm=1 reuse=0 analyze=1 variants="normal" colors=0 ics=0 name=0 pause=0 nps=0 debug=1 memory=0 smp=0 exclude=0 setscore=0
# feature done=1
# scanf: protover
# xboard: ignoring protover
# scanf: 2
# xboard: ignoring 2
# scanf: post
# xboard: post. Xadreco will show what its thinking. (1)
# scanf: easy
# xboard: easy. Xadreco stop thinking. (1)
# scanf: ping
# scanf: 123
# pong 123
# scanf: force
# scanf: setboard
# xboard: setboard. Xadreco will set a board position.
# scanf: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
# xboard: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
# scanf: w
# xboard: w
# scanf: KQkq
# xboard: KQkq
# scanf: -
# xboard: -
# scanf: 0
# xboard: 0
# scanf: 1
# xboard: 1
# scanf: force
# xboard: force. Xadreco is in force mode.
# scanf: setboard
# xboard: setboard. Xadreco will set a board position.
# scanf: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
# xboard: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
# scanf: w
# xboard: w
# scanf: KQkq
# xboard: KQkq
# scanf: -
# xboard: -
# scanf: 0
# xboard: 0
# scanf: 1
# xboard: 1
# scanf: level
# scanf: 0
# scanf: 0:10
# scanf: 0
# xadreco level: 10.0s+0.0s por 60 lances: ajustado para st 0.166667 s por lance
# scanf: go
# xboard: go. Xadreco is now white.
# xadreco : Error. I don't know what to play... Playing a random move (compjoga)!
# move f2f4
ERROR:backoff:Giving up api_post(...) after 1 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/QzqnWDGC/move/f2f4)
ERROR:backoff:Giving up play_game(...) after 12 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/QzqnWDGC/move/f2f4)
Traceback (most recent call last):
  File "/home/beco/Documents/fontes/python/lichess-bot/logging_pool.py", line 16, in __call__
    result = self.__callable(*args, **kwargs)
  File "/home/beco/Documents/fontes/python/lichess-bot/.venv/lib/python3.5/site-packages/backoff/_sync.py", line 99, in retry
    ret = target(*args, **kwargs)
  File "lichess-bot.py", line 126, in play_game
    play_first_move(game, engine, board, li)
  File "lichess-bot.py", line 173, in play_first_move
    li.make_move(game.id, best_move)
  File "/home/beco/Documents/fontes/python/lichess-bot/lichess.py", line 71, in make_move
    return self.api_post(ENDPOINTS["move"].format(game_id, move))
  File "/home/beco/Documents/fontes/python/lichess-bot/.venv/lib/python3.5/site-packages/backoff/_sync.py", line 99, in retry
    ret = target(*args, **kwargs)
  File "/home/beco/Documents/fontes/python/lichess-bot/lichess.py", line 61, in api_post
    response.raise_for_status()
  File "/home/beco/Documents/fontes/python/lichess-bot/.venv/lib/python3.5/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/QzqnWDGC/move/f2f4

You can see, it is the start position. Xadreco sent f2f4 as first move by the time opponent had already aborted the game:

image

lichess-bot tried to forward this move to the site and got a 400 error

400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/QzqnWDGC/move/f2f4

It seems that the bot needs to check if the game is still ongoing before trying to forward the move to the engine. The engine has no way to know that it don't need to move.

To the engine, just send a quit command and it will rest again.

@ddugovic
Copy link
Contributor

ddugovic commented Jun 26, 2018

Arguably HTTP 409 Conflict makes more sense than HTTP 400 Bad Request due to the opponent "editing" the game by aborting it.

Either way, 12 retries in the face of an HTTP 4xx response seems weird since those indicate Client Errors.

Hmm... maybe something like the following would help:
https://github.com/litl/backoff/blob/a2a0132eed09d8e3e8329c09014e6689fd8ef4b1/README.rst#L95

@drbeco
Copy link

drbeco commented Jun 29, 2018

Here another twist:

1. Engine has only 1 second before losing on time
2. Engine spends 3 seconds thinking
3. Web page gives victory and a rematch button is enabled, clicked, and no response
4. Log shows engine finally decided to play, bot gave 400 error and hangs
5. Bot asks engine to quit after error (probably it was on buffer already)
6. Engine quits and say good bye
7. Bot tries something else I don't get, and get a 404 (accept, probably when I clicked rematch)

Here the complete log:

# xadreco : N.80. Pretas. Tempo 3.000000s. Acumulado: 3.000000s. Hora: 2018-06-29 13:33:03
# xadreco : offer draw (1) value: -3.15
# move c1e3
# offer draw
# xadreco : N.81. Brancas. Tempo 178.000000s. Acumulado: 178.000000s. Hora: 2018-06-29 13:36:01
DEBUG:chess.engine:<PopenProcess at 0x75624770 (pid=28212)> >> move c1e3
DEBUG:chess.engine:<PopenProcess at 0x75624770 (pid=28212)> >> offer draw
DEBUG:urllib3.connectionpool:Resetting dropped connection: lichess.org
DEBUG:urllib3.connectionpool:https://lichess.org:443 "POST /api/bot/game/eNJKQTr6/move/c1e3 HTTP/1.1" 400 None
ERROR:backoff:Giving up api_post(...) after 1 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/eNJKQTr6/move/c1e3)
Traceback (most recent call last):
  File "lichess-bot.py", line 151, in play_game
    li.make_move(game.id, best_move)
  File "/home/beco/Documents/fontes/python/lichess-bot/lichess.py", line 71, in make_move
    return self.api_post(ENDPOINTS["move"].format(game_id, move))
  File "/home/beco/Documents/fontes/python/lichess-bot/.venv/lib/python3.5/site-packages/backoff/_sync.py", line 99, in retry
    ret = target(*args, **kwargs)
  File "/home/beco/Documents/fontes/python/lichess-bot/lichess.py", line 61, in api_post
    response.raise_for_status()
  File "/home/beco/Documents/fontes/python/lichess-bot/.venv/lib/python3.5/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/eNJKQTr6/move/c1e3
DEBUG:chess.engine:<PopenProcess at 0x75624770 (pid=28212)> << quit
# scanf: quit
# xadreco : # Thanks for playing Xadreco.
# xadreco : sai ( 0 )
DEBUG:urllib3.connectionpool:Resetting dropped connection: lichess.org
DEBUG:urllib3.connectionpool:https://lichess.org:443 "POST /api/challenge/eNJKQTr6/accept HTTP/1.1" 404 None
ERROR:backoff:Giving up api_post(...) after 1 tries (requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://lichess.org/api/challenge/eNJKQTr6/accept)
DEBUG:urllib3.connectionpool:https://lichess.org:443 "POST /api/challenge/TgesCs9v/accept HTTP/1.1" 404 None
ERROR:backoff:Giving up api_post(...) after 1 tries (requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://lichess.org/api/challenge/TgesCs9v/accept)

So, it is kind of the same error, but instead of "abort" it was a game that ended due to normal reasons (timeout in this case).

Solution: Bot should see the game is ended, discard any output from engine and ask engine to quit.

@ddugovic
Copy link
Contributor

ddugovic commented Jun 30, 2018

@drbeco I agree, but isn't that more related to #131 ?

UCI specifies how much time the engine has remaining on the clock, and Stockfish already uses that clock time (via python-chess, via this library) to protect against a time loss.

@drbeco
Copy link

drbeco commented Jul 2, 2018

Hi @ddugovic , I think it is not related to protocol.

Both Xboard and UCI specifiees how much time the engine has left on the clock. Xboard actually have many ways to deliver time to the engine.

  • It is the engines job to play as it wishes with the time it have.
  • It is the communication bot's task to keep things smooth in any case, even if the engine lose in time.

To tell that the engine author should use better the time control and never loses on time is to displace responsibilities from what each program have, mixing things.

Of course I understand engine authors that would not even "imagine" an engine that may lose on time, or that will resign without a fight, or whatever. Most of engine authors simply tries to deliver the most "powerful" move, whatever that means.

Xadreco is much more like a human, it does lose on time, it resigns, it offer draws, if sacrifices without sound continuation; man, it even play random moves every now and them (literally "random").

So, if a engine loses on time, the communication bot should command it (whatever protocol) to stop.

  • Xboard: the engine should receive a force (if reuse is set) or a quit (if xreuse or feature reuse=0) is set.
  • UCI: once lost on time, it should receive a command stop or quit

Also, if the engine reply with a move before it could process the command to stop, lichess-bot should just ignore it, instead of trying to post it and get an 404 error in a finished game.

ShailChoksi added a commit that referenced this issue Jul 4, 2018
Do not retry on HTTPError due to opponent abort #125
@ddugovic
Copy link
Contributor

ddugovic commented Jul 5, 2018

@drbeco :

  • I agree that engines should be able to lose on time & that if the server notifies the client that the game has ended, the client should ignore engine moves. (Arguably, the client may be able to detect a time loss, except an opponent may add time to the clock so the client would need to check with the server anyway.) If the engine believes it has lost on time, it should be permitted to resign (even if server-side, the opponent added time to the clock so the engine hasn't actually lost on time). Currently I don't think the Lichess server notifies the client about a time loss.
  • I agree that if the engine is commanded to stop, and it replies with a move, the move should be ignored. I'm not aware of what would command the engine to stop.
  • Recently e2c9ad1 was merged so the client wouldn't repeatedly submit the engine move after a time loss (or other reason the server would send an HTTP 4xx-class "client error" response).

@OhJayGee
Copy link
Author

OhJayGee commented Jul 6, 2018

Got a new incident:
https://lichess.org/ovJEbL91/white#130
ERROR:backoff:Backing off api_post(...) for 0.6s (requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://lichess.org/api/bot/game/ovJEbL91/move/d5b4) ERROR:backoff:Giving up api_post(...) after 2 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/ovJEbL91/move/d5b4)
The question here is: did the bot lose on time and didn't know (like you guys talked about earlier on this thread?) - or did the bot lose on time because of 504 Server Error - wherever that came from?

In a 3'+1 game the bot normally should never lose on time as it can move within 20msec, plus 37ms ping time to the lichess server, and I have configured a 500ms overhead time, so 1 second is plenty. Yet 0.6s penalty for server error could have triggered loss on time.

@ddugovic
Copy link
Contributor

ddugovic commented Jul 6, 2018

I too (running in OVH!) have observed a single HTTP 504 response:

$ sudo journalctl -u lichess-bot.service | grep -A 2 -B 10 504
Jul 04 15:36:32 vps178028 python3[27801]: +++ Process Free. Total Queued: 0. Total Used: 0
Jul 04 15:36:46 vps178028 python3[27801]:     Accept Bullet casual challenge from Purusa(2242)
Jul 04 15:36:46 vps178028 python3[27801]: --- Process Queue. Total Queued: 1. Total Used: 0
Jul 04 15:36:46 vps178028 python3[27801]: --- Process Used. Total Queued: 0. Total Used: 1
Jul 04 15:36:47 vps178028 python3[27801]: +++ https://lichess.org/LYwbnM4e/white Bullet vs Purusa(2242)
Jul 04 15:37:54 vps178028 python3[27801]: --- https://lichess.org/LYwbnM4e/white Game over
Jul 04 15:37:54 vps178028 python3[27801]: +++ Process Free. Total Queued: 0. Total Used: 0
Jul 04 16:15:00 vps178028 python3[27801]: Something went wrong. Game is starting and we don't have a queued process
Jul 04 16:15:00 vps178028 python3[27801]: --- Process Used. Total Queued: 0. Total Used: 1
Jul 04 16:15:00 vps178028 python3[27801]: +++ https://lichess.org/tfiUOVkq/white UltraBullet vs AI level 8
Jul 04 16:16:35 vps178028 python3[27801]: ERROR:backoff:Backing off api_post(...) for 0.6s (requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://lichess.org/api/bot/game/tfiUOVkq/move/g1g5)
Jul 04 16:16:37 vps178028 python3[27801]: ERROR:backoff:Giving up api_post(...) after 2 tries (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://lichess.org/api/bot/game/tfiUOVkq/move/g1g5)
Jul 04 16:16:37 vps178028 python3[27801]: Abandoning game due to connection error

@niklasf
Copy link

niklasf commented Jul 6, 2018

This is very likely a problem of Lichess itself, not of the bot. One of the database servers is slower than usual, leading to timeouts every now and then. We're working on it ...

Edit: The issue I am talking about started on the 23rd of June. Timeouts should be much less frequent before that.

ShailChoksi added a commit that referenced this issue Aug 27, 2018
Fix #125 gate timeout error messages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants