New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async readAll in httpclient produces garbled output. #8994

Closed
treeform opened this Issue Sep 17, 2018 · 28 comments

Comments

Projects
None yet
8 participants
@treeform
Copy link
Contributor

treeform commented Sep 17, 2018

When I run this code it crashes with json error about 25% of the time:

import httpclient, asyncdispatch, json

proc test*() {.async.} =
  var client = newAsyncHttpClient()
  var response = await client.get("https://raw.githubusercontent.com/treeform/chrono/master/tzdata/dstchanges.json")
  echo parseJson(await response.body)

waitFor test()

Json url endpoint is static, but if you print out the json body it looks like chunks are missing.

If you run the script do you get the same error?

I think some thing is wrong with readAll + async in https://github.com/nim-lang/Nim/blob/master/lib/pure/httpclient.nim#L196

Nim Compiler Version 0.18.1 [Windows: amd64]
Compiled at 2018-09-07
Copyright (c) 2006-2018 by Andreas Rumpf

git hash: e81fe6d32f485f721fccd87e922addc5715a5fdd
active boot switches: -d:release
@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

The missing data can happen at the end, but usually it appears to be in the middle, maybe the data is just out of order, its hard to tell.

I am trying to look into: if the async loop does read callbacks out of order it might case the data to be out of order.

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

It also fails on mac:

Nim Compiler Version 0.18.1 [MacOSX: amd64]
Compiled at 2018-09-07
Copyright (c) 2006-2018 by Andreas Rumpf

git hash: 90025e242db1f4b2f2bca18
@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

On mac it fails 100% of the time.

@dm1try

This comment has been minimized.

Copy link

dm1try commented Sep 17, 2018

On mac it fails 100% of the time.

confirmed for devel branch
but it works without a problem on osx with 0.18.0; so it looks like some regression

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

@dm1try Have you tried running it over and over. It might have a small chance of failing?

@dm1try

This comment has been minimized.

Copy link

dm1try commented Sep 17, 2018

I've just tried 50 times with repeat 50 ./app and there is no any fail =) on devel it fails 100% of the time. I can do some bisect a little bit later

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

Cool. Thank you for verifying!

@skilchen

This comment has been minimized.

Copy link
Contributor

skilchen commented Sep 17, 2018

This regression was introduced by:

Author: Emery Hemingway <ehmry@posteo.net>
Date:   Tue Aug 7 17:36:56 2018 +0200

  AsyncHttpClient: return from requests before body completion
  
  Store the body completion future at the client and wait for it to
  complete before issuing additional requests. This allows the body
  FutureStream reader to drain the stream and read buffers to be freed
  asynchronously.
  
  Fix #8109

It seems clear that

  when client is AsyncHttpClient:
    if not client.parseBodyFut.isNil:
      # let the current operation finish before making another request
      await client.parseBodyFut
      client.parseBodyFut = nil

is at the wrong place in httpclient. If you perform only a single request (which you always should, but this might be controversial...) then this await client.parseBodyFut will not be executed.

If you move the block when client is AsyncHttpClient... below the assignment to result (ie. to the end of requestAux), then your issue should go away.

I don't know if this move reintroduces #8109 which @ehmry tried to fix.

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Sep 17, 2018

Wow thank you for the detective work!

I think it mostly works with smaller bodies, this is why the bug has gone unnoticed. When the body is large and requires a ton of reads this happens.

Do you have a patch that fixes it?

@dom96 did the patch that fixed the prev bug #8109 and created this one https://github.com/nim-lang/Nim/pull/8559/files . I hope he can help us.

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Sep 17, 2018

It's been a long day so I don't quite understand the cause of this as explained by @skilchen.

But the bug that was fixed has a repro so you should be able to verify whatever solution you come up with.

@dm1try

This comment has been minimized.

Copy link

dm1try commented Sep 17, 2018

I don't quite understand the cause of this

+1

If you move the block when client is AsyncHttpClient...

this action will revert (in the weird way :) ) the previous work; in result you will have same blocking behaviour for parsing a body; so this solution is not so good

interesting thing that this is only reproducible with enabled ssl; any thoughts?(some limit values?)

@skilchen

This comment has been minimized.

Copy link
Contributor

skilchen commented Sep 17, 2018

maybe we could add something like:

proc responseCompleted*(client: AsyncHttpClient) {.async.} = 
  if not client.parseBodyFut.isNil:
    # let the client parse the complete body
    await client.parseBodyFut
    client.parseBodyFut = nil

to httpclient.nim. Then you would have to call:

await client.responseCompleted()

before doing anything with the response.body:

import httpclient, asyncdispatch, json

proc test*() {.async.} =
  var client = newAsyncHttpClient()
  var response = await client.get("https://raw.githubusercontent.com/treeform/chrono/master/tzdata/dstchanges.json")
  await client.responseCompleted()
  echo parseJson(await response.body)

waitFor test()

Something like this (adapted from #8109) would then continue to work:

import asyncdispatch
import httpclient
import strutils

proc test*() {.async.} =
  var client = newAsyncHttpClient()
  let r = await client.get("http://stream.meetup.com/2/rsvps")
  for i in 1..10:
    let b = await r.bodyStream.read()
    echo intToStr(i, 2), " ", b

waitFor test()

ehmry added a commit to ehmry/Nim that referenced this issue Sep 18, 2018

ehmry added a commit to ehmry/Nim that referenced this issue Sep 18, 2018

ehmry added a commit to ehmry/Nim that referenced this issue Sep 18, 2018

AsyncHttpClient: new procedures for downloading to streams
Add "getStream" for "HttpClient" and "AsyncHttpClient" to download
request body into a stream provided by the caller. In the async case the
body may begin processing before "getStream" completes.

Ref nim-lang#8994
Fix nim-lang#8109
@ehmry

This comment has been minimized.

Copy link
Contributor

ehmry commented Sep 18, 2018

My bad, I say revert the commit for #8109 and nobody has to figure out what went wrong and we keep the old behavior, in the case of processing bodies asynchronously (the low-memory, large body situation) procedures may be provided to GET into streams prepared by the caller, see 72106d8.

ehmry added a commit to ehmry/Nim that referenced this issue Sep 19, 2018

ehmry added a commit to ehmry/Nim that referenced this issue Sep 19, 2018

AsyncHttpClient: new procedures for downloading to streams
Add "getStream" for "HttpClient" and "AsyncHttpClient" to download
request body into a stream provided by the caller. In the async case the
body may begin processing before "getStream" completes.

Ref nim-lang#8994
Fix nim-lang#8109
@skilchen

This comment has been minimized.

Copy link
Contributor

skilchen commented Sep 19, 2018

Could you please check, if adding

await sleepAsync(1)

after Line 1002 in httpclient.nim

fixes the garbling issue #8994 without reintroducing #8109?

@dm1try

This comment has been minimized.

Copy link

dm1try commented Sep 20, 2018

It seems to me that the problem in theFutureStream implementation.
complete method assumes that he calls some "client" callback(?):

proc complete*[T](future: FutureStream[T]) =
## Completes a ``FutureStream`` signalling the end of data.
future.finished = true
if not future.cb.isNil:
future.cb()

while "under pressure"(when a bunch of read callbacks still in a dispatcher queue) it calls some of scheduled read callback; in result the client code misses this peace of stream data(in our case this is 4KB of HTTP response body)
in the context of current issue FutureStream.complete proc is called by httpclient here

there is some controversial guard in read callback implementation:

if not resFut.finished:
resFut.complete(res)

which assumes that this callback can be called many times(?). while the comment above it says
# We don't want this callback called again.
future.cb = nil

anyway by removing the guard that mentioned above you can check that some of read callbacks are called twice

Error: unhandled exception: An attempt was made to complete a Future more than once. Details:
  Future ID: 3802

# first completion
Traceback (most recent call last)
hey.nim(25)              hey
asyncdispatch.nim(1658)  waitFor
asyncdispatch.nim(1515)  poll
asyncdispatch.nim(1281)  runOnce
asyncdispatch.nim(189)   processPendingCallbacks
asyncmacro.nim(36)       parseBody_continue
httpclient.nim(1008)     parseBodyIter
asyncstreams.nim(35)     complete
asyncstreams.nim(47)     :anonymous
asyncstreams.nim(99)     :anonymous
asyncfutures.nim(137)    complete

# second completion
Traceback (most recent call last)
hey.nim(25)              hey
asyncdispatch.nim(1658)  waitFor
asyncdispatch.nim(1515)  poll
asyncdispatch.nim(1281)  runOnce
asyncdispatch.nim(189)   processPendingCallbacks
asyncstreams.nim(47)     :anonymous
asyncstreams.nim(99)     :anonymous
asyncfutures.nim(137)    complete

one call from FutureStream#complete, one call from pending dispatcher queue.

ehmry added a commit to ehmry/Nim that referenced this issue Sep 24, 2018

AsyncHttpClient: queue pending read futures locally
Maintain a queue of pending read futures at the FutureStream object.
This keeps the code simple and allows readers to be completed in order,
but most often only a single reader will be queued at a time.

Ref nim-lang#7126
Fix nim-lang#8994
Fix nim-lang#9024

ehmry added a commit to ehmry/Nim that referenced this issue Sep 24, 2018

AsyncHttpClient: queue pending read futures locally
Maintain a queue of pending read futures at the FutureStream object.
This keeps the code simple and allows readers to be completed in order,
but most often only a single reader will be queued at a time.

Ref nim-lang#7126
Fix nim-lang#8994
Fix nim-lang#9004
Fix nim-lang#9024

ehmry added a commit to ehmry/Nim that referenced this issue Sep 24, 2018

AsyncHttpClient: queue pending read futures locally
Maintain a queue of pending read futures at the FutureStream object.
This keeps the code simple and allows readers to be completed in order,
but most often only a single reader will be queued at a time.

Ref nim-lang#7126
Fix nim-lang#8994
Fix nim-lang#9024
@chrisokuda

This comment has been minimized.

Copy link
Contributor

chrisokuda commented Sep 27, 2018

Is there any work-around at this time?

If not, was the previous bug a showstopper? This bug loses data non-deterministically for large-ish async requests, so it is definitely a showstopper for my usage of 0.19.0.

@Araq

This comment has been minimized.

Copy link
Member

Araq commented Sep 28, 2018

@mcintyrechris Sorry about this, but we intend to release 0.19.2 with the most pressing regressions fixed.

@ehmry

This comment has been minimized.

Copy link
Contributor

ehmry commented Sep 28, 2018

PR #9003 is a fix for this.

@skilchen

This comment has been minimized.

Copy link
Contributor

skilchen commented Sep 28, 2018

@mcintyrechris you could try if my proposal from above works as a temporary workaround until PR #9003 has been merged or another solution has been implemented:

add

await sleepAsync(1)

after Line 1002 in httpclient.nim so that you will have:

  when client is AsyncHttpClient:
    await sleepAsync(1)
    client.bodyStream.complete()
  else:
    client.bodyStream.setPosition(0)
@chrisokuda

This comment has been minimized.

Copy link
Contributor

chrisokuda commented Sep 28, 2018

@skilchen I gave it a shot (the await sleepAsync(1) line), but it did not fix the issue, I still got only a partial JSON blob for several requests. I also added a "sleeping".debugEcho just below the line above, to make sure it was actually using my code (I use choosenim so I wasn't sure I was editing in the right file location), and confirmed it was sleeping.

@Araq No worries, I can still use 0.18.0 (only disappointed I can't use the new features), I was just worried that with all the recent "newbies" to Nim from all the recent HN posts, I didn't want anyone to get frustrated right away and give up on Nim (this language is great for my usage). Personally, I think a non-deterministic data-loss bug is way worse than completely breaking "infinite streams" because I imagine more people will be using non-infinite requests most of the time, which means most people are possibly getting this bug and may not know it. For example, when donwloading a CSV blob, I am missing several lines but there is no way to know it except that my unit tests count the lines for a known request and tell me that I'm missing several. I do, however, understand that bugs are an expected part of new releases, so thanks to you and the team for working on them so quickly!

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Sep 28, 2018

@skilchen

This comment has been minimized.

Copy link
Contributor

skilchen commented Sep 28, 2018

The right thing to do, would be to come up with a solution similar to Python's http.client, i.e. expose the body as a readable stream as soon as the headers from the request have been received. Then it would be possible to read from that stream in whatever mode the client prefers, e.g. per line, per fixed blocksize or all at once.

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Oct 2, 2018

The right thing to do, would be to come up with a solution similar to Python's http.client, i.e. expose the body as a readable stream as soon as the headers from the request have been received.

Unless I'm terribly misunderstanding you, that's exactly what we have right now. The Response object contains a FutureStream which represents the body and you can read from it whenever you want. The only problem with this is that the stream gets filled even if nobody is reading from it.

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Oct 2, 2018

I think we should revert this to last working point like @ehmry said. We can't seem to figure out how to fix it. We can fix the streaming stuff later when the right person comes along...

Most people don't want to stream HTTP get requests, but everyone wants to get the correct data back.

@ehmry

This comment has been minimized.

Copy link
Contributor

ehmry commented Oct 3, 2018

There is a fix at fd539d2 / #9003 that uses a Deque for serializing read futures.

@dm1try

This comment has been minimized.

Copy link

dm1try commented Oct 3, 2018

@treeform waiting for a right person is a good proposal =)

anyway I politely disagree with the suggestion about reverting. #9024 shows that the issue lies in the FutureStream implementation. It means that any client code that uses it will have the same problem.
ex. downloadFile will be broken. so the code based on the snippet that provided by you

import httpclient, asyncdispatch, json

proc test*() {.async.} =
  var client = newAsyncHttpClient()
  await client.downloadFile("https://raw.githubusercontent.com/treeform/chrono/master/tzdata/dstchanges.json", "my.json")
  echo parseJson(readFile("my.json"))

waitFor test()

will lead to the same error even on version 0.18

as a short-term solution, I suggest #9159 (or #9003 mentioned above many times)
in the long term, async core might be updated with some synchronization primitives(as example, some of them already implemented in asyncdispatch2)

dom96 added a commit that referenced this issue Oct 4, 2018

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Oct 4, 2018

Either I'm missing something fundamental or all of the attempted fixes so far managed to work around this bug instead of actually fixing it. See my fix here: #9183.

@treeform

This comment has been minimized.

Copy link
Contributor

treeform commented Oct 4, 2018

Wow your fix is amazing. Everything works.

dom96 added a commit that referenced this issue Oct 4, 2018

@Araq Araq closed this in 21ecf64 Oct 9, 2018

krux02 added a commit to krux02/Nim that referenced this issue Oct 15, 2018

Fixes nim-lang#8994. FutureStream read procedure data loss no longer …
…occurs. (nim-lang#9183)

* Fixes nim-lang#8994. FutureStream read procedure data loss no longer occurs.

* Optimises the fix for nim-lang#8994.

narimiran added a commit to narimiran/Nim that referenced this issue Oct 31, 2018

Fixes nim-lang#8994. FutureStream read procedure data loss no longer …
…occurs. (nim-lang#9183)

* Fixes nim-lang#8994. FutureStream read procedure data loss no longer occurs.

* Optimises the fix for nim-lang#8994.

narimiran added a commit to narimiran/Nim that referenced this issue Nov 1, 2018

Fixes nim-lang#8994. FutureStream read procedure data loss no longer …
…occurs. (nim-lang#9183)

* Fixes nim-lang#8994. FutureStream read procedure data loss no longer occurs.

* Optimises the fix for nim-lang#8994.

narimiran added a commit that referenced this issue Nov 1, 2018

Fixes #8994. FutureStream read procedure data loss no longer occurs. (#…
…9183)

* Fixes #8994. FutureStream read procedure data loss no longer occurs.

* Optimises the fix for #8994.

narimiran added a commit that referenced this issue Nov 1, 2018

Fixes #8994. FutureStream read procedure data loss no longer occurs. (#…
…9183)

* Fixes #8994. FutureStream read procedure data loss no longer occurs.

* Optimises the fix for #8994.

(cherry picked from commit 21ecf64)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment