Refresh http header is not followed #92

Closed
rbirkby opened this Issue · 8 comments

5 participants

@rbirkby

Some websites still send the (non-standard) Netscape-era Refresh HTTP header. This header is supported by all browsers (IE, Firefox, Chrome, etc.).

E.g., WebKit's implementation can be found here:
http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/HTTPParsers.cpp#L106

The websites that send this Refresh HTTP header are not obscure sites that can be ignored. For example, I've received the Refresh header from microsoft.com:

*   Trying 207.46.19.254... connected
* Connected to www.microsoft.com (207.46.19.254) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: www.microsoft.com
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Connection: Close
< Pragma: no-cache
< cache-control: no-cache
< Refresh: 0.1
< Content-Type: text/html; charset=iso-8859-1
< 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd">
<!-- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"> -->
<HTML>
<HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0.1">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Expires" CONTENT="-1">
<TITLE></TITLE>
</HEAD>
<BODY><P></BODY>
</HTML>
@mikeal
request member

What does the refresh header do?

Why does it not work in request? You should just be able to request({headers:{refresh:0.1}})

@rbirkby
@mikeal
request member

so, refresh === Connection: close?

if that is the case then it'll need to be fixed in core.

@rbirkby

I see it as similar to a 302 redirect. However, this only really makes sense for a Refresh with a time of 0. In the transcript above, should request pause for 0.1s before redirecting/refreshing? That would seem a bit crazy.
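
To make the "treat it like a redirect" idea concrete, here is a minimal sketch of parsing a Refresh value and deciding whether to follow it right away. `parseRefresh`, `shouldFollowNow` and the one-second threshold are hypothetical names and choices for illustration, not part of request, and the pattern is far stricter than what browsers accept.

```javascript
// Hypothetical helper (not part of request): parse a Refresh header value
// such as "0.1" or "70; url=/finalresultspage".
function parseRefresh(value) {
  const match = /^\s*(\d*\.?\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i.exec(String(value));
  if (!match) return null; // not a value this simplified pattern understands

  let url = match[2] ? match[2].trim() : '';
  // Drop one pair of surrounding quotes, if present.
  if (/^(['"]).*\1$/.test(url)) url = url.slice(1, -1);

  // url is null when the header only asks for a reload of the same page.
  return { delay: parseFloat(match[1]), url: url || null };
}

// Assumed policy: only treat the refresh like a redirect when the delay is
// short enough that waiting would be pointless.
function shouldFollowNow(refresh, maxDelaySeconds) {
  return refresh !== null && refresh.delay <= maxDelaySeconds;
}

const fromTranscript = parseRefresh('0.1');
console.log(fromTranscript, shouldFollowNow(fromTranscript, 1));
```

With a threshold like this, the `Refresh: 0.1` from the transcript would be followed without pausing, while a long delay would be left to the caller.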

@katiecrain

I'm having this issue as well.

I'm trying to crawl a site that has an interim loading page. The response has a Refresh header that is not being followed. I've looked at the code and I'm considering a workaround for this, but would like to know if any attempt to fix this is in the works.

Here's the debug info.

DEBUG: 200 http://www.mysite.com/blarg (response 44716)
DEBUG: | Date: Tue, 08 Nov 2011 19:32:11 GMT
DEBUG: | Server: Apache
DEBUG: | Vary: *
DEBUG: | Cache-control: max-age=86400
DEBUG: | Expires: Wed, 09 Nov 2011 19:32:11 GMT
DEBUG: | Refresh: 70; url=/finalresultspage
DEBUG: | Set-cookie: wosid=5KA7AdtwpyYFxtMtnDLtr0; version="1"; path=/WebObjects/mysite.woa,woinst=427; version="1"; path=/WebObjects/mysite.woa
DEBUG: | Content-length: 37900
DEBUG: | Content-type: text/html
DEBUG: | Connection: close
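
A workaround along these lines might resolve the Refresh target against the page URL before issuing a second request. This is only a sketch: `resolveRefreshTarget` is a hypothetical helper, the parsing is deliberately simplified, and it assumes lower-cased header names as Node's http module provides them.

```javascript
// Hypothetical helper: turn a response's Refresh header into an absolute URL.
// `headers` is assumed to use lower-cased names; `pageUrl` is the URL that
// produced the response.
function resolveRefreshTarget(headers, pageUrl) {
  const value = headers.refresh;
  if (typeof value !== 'string') return null;

  const match = /^\s*(\d*\.?\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i.exec(value);
  if (!match) return null;

  // Without a url part the header means "reload this page".
  const target = match[2] && match[2].trim() ? match[2].trim() : pageUrl;

  return {
    delay: parseFloat(match[1]),
    // WHATWG URL resolution handles relative targets such as "/finalresultspage".
    url: new URL(target, pageUrl).href
  };
}

console.log(resolveRefreshTarget(
  { refresh: '70; url=/finalresultspage' },
  'http://www.mysite.com/blarg'
));
```

Inside a request callback, `response.headers` could be passed as `headers`, and the returned `url` handed to a second request call. Whether to honor the 70 second delay or skip it is left to the caller.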

@pedrofaustino

@mikeal when you said "needs to be fixed in core" do you mean http?

@spollack

Any news on this one? Handling of meta-refresh as a redirect within request would be fantastic.
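
For the meta-refresh case, a caller could pull the `CONTENT` value out of the response body and then treat it like a Refresh header value. The sketch below is a rough illustration only: `findMetaRefresh` is a hypothetical helper, and regex scanning is an assumption that will miss cases a real HTML parser would handle.

```javascript
// Hypothetical helper: return the CONTENT value of the first
// <meta http-equiv="refresh"> tag in an HTML string, or null if none is found.
function findMetaRefresh(html) {
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const isRefresh = /http-equiv\s*=\s*["']?\s*refresh\b/i.test(tag);
    // Accept double-quoted, single-quoted, or unquoted attribute values.
    const content = /content\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(tag);
    if (isRefresh && content) {
      return content[1] || content[2] || content[3] || '';
    }
  }
  return null;
}

const body = [
  '<HEAD>',
  '<META HTTP-EQUIV="Refresh" CONTENT="0.1">',
  '<META HTTP-EQUIV="Pragma" CONTENT="no-cache">',
  '</HEAD>'
].join('\n');

console.log(findMetaRefresh(body));
```

The returned string has the same shape as the header value (a delay, optionally followed by `; url=...`), so the same parsing and delay policy could be applied to both.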

@mikeal
request member

Is this still an issue?

This is so old that I'm closing it. If it is actually still an issue, just let me know and I'll re-open.

@mikeal mikeal closed this