Exchange#finished returns true before response body is ready #197

ttilberg · 2021-09-28T15:30:15Z

Hello, I'm trying to intercept the response body of an XHR request. After looking through the spec files, I saw that Ferrum keeps track of all of the network traffic, which is convenient for this case. (I tried using on('Network.responseReceived') hooks, but couldn't work it out.)

While waiting for the correct Exchange to report finished?, I kept running into Ferrum::BrowserError: No data found for resource with given identifier. Example code:

    response = Timeout.timeout(15) do
      # Find ajax request for search results
      until xhr = browser.network.traffic.find { |exchange| exchange.request.url =~ /search\/bySize/ }
        print '.'
        sleep 0.2
      end
      # Wait for response to complete
      until xhr.finished?
        print 'x'
        sleep 0.2
      end
      xhr.response.body
    end

After some research, I learned that you should instead wait for Network.loadingFinished before trying to get the response body.

I wanted to highlight a few parts of Ferrum's API, as I wonder whether Exchange#finished? should wait for that event, or whether the Response object could/should be more clever (or both).

    # network.rb #subscribe
    # ...
      @page.on("Network.responseReceived") do |params|
        if exchange = select(params["requestId"]).last
          response = Network::Response.new(@page, params)
          exchange.response = response
        end
      end

      @page.on("Network.loadingFinished") do |params|
        exchange = select(params["requestId"]).last
        if exchange && exchange.response
          exchange.response.body_size = params["encodedDataLength"]
        end
      end

    # exchange.rb
    # ...
      def finished?
        blocked? || response || error
      end
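To make the ordering in the two excerpts above concrete, here is a minimal sketch using stand-in objects. FakeExchange and FakeResponse are hypothetical names for illustration only; they are not Ferrum's classes, and the event handling is simulated by plain assignments.

```ruby
# Stand-in objects only; these are NOT Ferrum's classes.
FakeResponse = Struct.new(:body_size)

class FakeExchange
  attr_accessor :response, :error

  def blocked?
    false
  end

  # Same shape as the excerpt above: true as soon as a response object exists.
  def finished?
    !!(blocked? || response || error)
  end
end

exchange = FakeExchange.new

# Simulates Network.responseReceived: the response object is attached,
# but the body may still be loading.
exchange.response = FakeResponse.new(nil)
after_received = [exchange.finished?, exchange.response.body_size]

# Simulates Network.loadingFinished: only now is body_size filled in.
exchange.response.body_size = 1024
after_finished = [exchange.finished?, exchange.response.body_size]

p after_received # => [true, nil]
p after_finished # => [true, 1024]
```

In this model, finished? is already true in the window between the two simulated events, which is the window where asking for the body can fail.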

Note how the exchange is finished? when it has a response object. But also notice that this response attribute is assigned in the Network.responseReceived hook, not the Network.loadingFinished hook. There is, however, a property that gets set at Network.loadingFinished, body_size, so I tried keying off that:

    response = Timeout.timeout(15) do
      # Find ajax request for search results
      until xhr = browser.network.traffic.find { |exchange| exchange.request.url =~ /search\/bySize/ }
        print '.'
        sleep 0.2
      end
      # Wait for response to complete
      until xhr&.response&.body_size
        print 'x'
        sleep 0.2
      end
      xhr.response.body
    end

With this change, this code no longer raises the noted exception, though the API is being used in a strange way.
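The two polling loops above could be folded into one small helper. This is a sketch under assumptions: wait_until and WaitTimeout are hypothetical names, not part of Ferrum, and the 15 second and 0.2 second defaults simply mirror the values used in the snippet.

```ruby
# Hypothetical helper, not part of Ferrum: polls the block until it returns a
# truthy value and raises if the deadline passes first.
class WaitTimeout < StandardError; end

def wait_until(timeout: 15, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end

    sleep interval
  end
end

# Usage against the objects from the snippet above (needs a live browser):
#
#   xhr = wait_until do
#     browser.network.traffic.find { |e| e.request.url =~ /search\/bySize/ }
#   end
#   wait_until { xhr.response&.body_size }
#   body = xhr.response.body
```

Compared with wrapping everything in Timeout.timeout, the deadline here is only checked between polls, so the block is never interrupted partway through a call.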

I feel that Exchange#finished? should account for the response being fully loaded and ready to query. Maybe this isn't sensible given existing usages of #finished? and things like streamed content. So, worst case, maybe something new?
- I wonder about an attribute such as Response#finished?, loaded?, or ready?
- I also wonder if the API for querying response.body should wait for an attribute signalling that the response has finished loading, leveraging the standard Ferrum timeouts, similar to how other calls on the browser block until CDP reports it's ready.
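One way the two ideas above might fit together is sketched below with stand-in classes. Everything here is hypothetical: SketchResponse, SketchExchange, the loaded flag, and the timeout and interval defaults are illustrative and not taken from Ferrum, and the body lookup is replaced by a plain callable.

```ruby
# Hypothetical sketch of the proposal; none of these names come from Ferrum.
class SketchResponse
  attr_accessor :body_size, :loaded

  def initialize(fetch_body)
    @fetch_body = fetch_body # callable standing in for the real body lookup
    @loaded = false
  end

  def loaded?
    @loaded
  end

  # Blocks until loading has finished (or the timeout passes), then fetches.
  def body(timeout: 5, interval: 0.05)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    until loaded?
      if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
        raise "response body not ready after #{timeout}s"
      end
      sleep interval
    end
    @fetch_body.call
  end
end

class SketchExchange
  attr_accessor :response, :error

  def blocked?
    false
  end

  # Finished only once the response has been fully loaded (or on error).
  def finished?
    blocked? || !!response&.loaded? || !error.nil?
  end
end

exchange = SketchExchange.new
exchange.response = SketchResponse.new(-> { "payload" })

# Stands in for the Network.loadingFinished handler firing a little later.
loader = Thread.new do
  sleep 0.1
  exchange.response.loaded = true
end

puts exchange.response.body # waits for the flag, then prints "payload"
loader.join
```

In this sketch a loadingFinished handler would set loaded alongside body_size, so callers could rely on either finished? or a blocking body without polling body_size themselves.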

route · 2021-10-21T05:49:31Z

Great research! I'm open to all options. Everything you said sounds correct; we just need to pick one direction.

As for me I usually start to play with code and then after a few implementations see a solution, so I cannot tell right now what's better or best.

Are you going to work on a PR?

ttilberg · 2021-10-21T15:07:14Z

If you don't know of a reason not to, I'll change the #finished? flag to represent when the response is ready, not just received. PR to come.

route · 2021-10-21T15:14:19Z

Ok! Sounds right

hkmaly · 2021-12-27T04:30:39Z

Grrr ... I tried to use it on something else and got Ferrum::BrowserError: No resource with given identifier found ... AND when I tried to use inspect, I found the resource was loaded from cache, although it's not mentioned anywhere in that response object (all from*Cache are false).

rmarot · 2022-02-11T23:07:35Z

@ttilberg I am also facing this issue. Do you still plan to open a PR for your suggestions (which look good!)? Thanks!

ttilberg · 2022-02-12T06:35:35Z

I was having a hard time figuring out how to create the tests to set up the delayed payload, and I kind of got discouraged and tabled it. Please feel free!

ttilberg mentioned this issue Oct 21, 2021

Document how to download files #169

Closed

route added a commit that referenced this issue Nov 20, 2022

fix: finished? should return true only when response is fully loaded

c5408be

fixes #197

route mentioned this issue Nov 20, 2022

fix: finished? should return true only when response is fully loaded #309

Merged

route closed this as completed in #309 Nov 20, 2022

route added a commit that referenced this issue Nov 20, 2022

fix: finished? should return true only when response is fully loaded (#…

81ab27a

…309) fixes #197

route added a commit that referenced this issue Nov 20, 2022

fix: Setting content directly does not allow for use of DOM queries

c45d1af

fixes #197

route added a commit that referenced this issue Nov 20, 2022

fix: Setting content directly does not allow for use of DOM queries

8486d1a

fixes #197

route added a commit that referenced this issue Nov 20, 2022

fix: Setting content directly does not allow for use of DOM queries

3d6ffae

fixes #197

route added a commit that referenced this issue Nov 20, 2022

fix: Setting content directly does not allow for use of DOM queries (#…

277f04e

…310) fixes #197

ttilberg mentioned this issue Feb 7, 2023

Add support for browser.on(:response). #294

Draft
