No content returned for no-cache pages #120

Open
thetylerhayes opened this Issue Sep 9, 2013 · 5 comments

It looks like the assumption behind @arikfr's workaround is incorrect. The workaround assumes that bytesAvailable will return > 0 for no-cache pages, since Ghost.py won't find their QNetworkReply in the cache object. As best I can tell, there's no data in the cache and bytesAvailable == 0. Here's the debug logging when trying ghost.open('https://twitter.com'):

DEBUG:ghost:Ghost: [PySide.QtCore.QUrl('https://twitter.com/')] bytesAvailable()= 0
INFO:ghost:Ghost: Resource loaded: https://twitter.com/ 200
DEBUG:ghost:Ghost: [PySide.QtCore.QUrl('https://abs.twimg.com/a/1378257772/t1/css/t1_core_logged_out.bundle.css')] bytesAvailable()= 0
INFO:ghost:Ghost: Resource loaded: https://abs.twimg.com/a/1378257772/t1/css/t1_core_logged_out.bundle.css 200
DEBUG:ghost:Ghost: [PySide.QtCore.QUrl('https://abs.twimg.com/a/1378257772/t1/css/t1_more.bundle.css')] bytesAvailable()= 0
INFO:ghost:Ghost: Resource loaded: https://abs.twimg.com/a/1378257772/t1/css/t1_more.bundle.css 200

This may have been what @arikfr was talking about in his fix (#79) when he discussed the async nature of Qt potentially being a problem.
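
To make the failure mode concrete, here is a toy Python model. It is not Ghost.py or Qt code, and every name in it (ToyReply, load, the example.com URLs) is hypothetical. It assumes the renderer drains each reply as the data arrives and that only cacheable responses get stored, which would be consistent with the bytesAvailable()= 0 lines in the log above.

```python
class ToyReply:
    """Hypothetical stand-in for a network reply; not a Qt class."""

    def __init__(self, url, chunks, cacheable):
        self.url = url
        self.cacheable = cacheable
        self._pending = list(chunks)  # chunks not yet delivered
        self._buffer = b""            # delivered but unread bytes

    def deliver_next(self):
        """Move one chunk into the read buffer; return False when done."""
        if not self._pending:
            return False
        self._buffer += self._pending.pop(0)
        return True

    def bytes_available(self):
        return len(self._buffer)

    def read_all(self):
        data, self._buffer = self._buffer, b""
        return data


def load(reply, cache):
    """Simulate the renderer draining the reply, then a late handler."""
    rendered = b""
    while reply.deliver_next():
        rendered += reply.read_all()  # renderer consumes data immediately
    if reply.cacheable:
        cache[reply.url] = rendered   # only cacheable responses are stored
    # What a handler that runs after completion can still see:
    return reply.bytes_available(), cache.get(reply.url)


cache = {}
chunks = [b"<html>", b"</html>"]
print(load(ToyReply("https://example.com/cached", chunks, True), cache))
print(load(ToyReply("https://example.com/no-cache", chunks, False), cache))
```

In this model the cacheable page can still be recovered from the cache, while the no-cache page leaves nothing behind in either place.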

If anybody else can take a look at this, it'd be a huge bit of awesome. I dug around for a while (#33) but came up empty-handed.

Contributor

arikfr commented Oct 4, 2013

@thetylerhayes can you try opening the following pages and report if you experience the same issue:

  1. www.etsy.com
  2. https://twitter.com/arikfr

@arikfr I included quite a bit of extra reporting for multiple sites in #33, as I mentioned. I've also already moved on to using other projects to accomplish my goals because the wait was too long. Feel free to close.

Has there been any progress on this? page.content is still returning "None" for no-cache pages on my side as of today, and I would love to know if there is a fix.

Owner

jeanphix commented Feb 7, 2014

@ChrisTruncer still no progress on this. If you want to get the current frame content, you can use: https://github.com/jeanphix/Ghost.py/blob/master/ghost/ghost.py#L502

I found some code that seems to be helpful here: https://gitorious.org/qtwebkit/performance/source/50304e2aaf75529f4891d6a67d0f24c1773d98a8:host-tools/mirror/main.cpp

I've forked the repo and added the relevant code. It's not pretty and it's not passing all the tests, but I have been able to use this to read a bunch of no-cache pages. My diffs:
rfmcpherson/Ghost.py@jeanphix:master...master
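
One general way to read bodies that never reach the cache is to keep a copy of the data as it streams past the consumer. The sketch below illustrates only that idea in plain Python. It is not taken from the linked mirror tool or from the fork, and BufferingReply is a hypothetical name; a real version would have to hook into Qt's network reply handling instead of wrapping an io.BytesIO.

```python
import io


class BufferingReply:
    """Wraps a readable stream and keeps a copy of every byte read through it."""

    def __init__(self, source):
        self._source = source
        self._copy = bytearray()

    def read(self, size=-1):
        data = self._source.read(size)
        self._copy.extend(data)  # remember what the consumer just took
        return data

    def captured(self):
        """Return everything that has been read through the wrapper so far."""
        return bytes(self._copy)


# The consumer drains the stream in small chunks, as a renderer might.
reply = BufferingReply(io.BytesIO(b"<html>no-cache page</html>"))
while reply.read(8):
    pass

print(reply.captured())
```

Because the copy is made at read time, the body stays available after the consumer has emptied the stream, regardless of whether the response was cacheable.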
