Refresh http header is not followed #92

Closed
rbirkby opened this Issue Oct 24, 2011 · 10 comments

@rbirkby
rbirkby commented Oct 24, 2011

Some websites still send the (non-standard) Netscape-era Refresh HTTP header. This header is supported by all major browsers (IE, Firefox, Chrome, etc.).

For example, WebKit's implementation can be found here:
http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/HTTPParsers.cpp#L106

The websites which send this Refresh HTTP header are not obscure sites which can be ignored. For example, I've received the refresh header from microsoft.com:

*   Trying 207.46.19.254... connected
* Connected to www.microsoft.com (207.46.19.254) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: www.microsoft.com
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Connection: Close
< Pragma: no-cache
< cache-control: no-cache
< Refresh: 0.1
< Content-Type: text/html; charset=iso-8859-1
< 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd">
<!-- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"> -->
<HTML>
<HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0.1">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Expires" CONTENT="-1">
<TITLE></TITLE>
</HEAD>
<BODY><P></BODY>
</HTML>
@mikeal

Member

mikeal commented Oct 24, 2011

What does the refresh header do?

Why does it not work in request? You should just be able to request({headers:{refresh:0.1}})

@rbirkby

Author

rbirkby commented Oct 24, 2011

It's a response header, not a request header. See the transcript.


@mikeal

Member

mikeal commented Oct 24, 2011

so, refresh === Connection: close?

if that is the case then it'll need to be fixed in core.

@rbirkby

Author

rbirkby commented Oct 24, 2011

I see it as similar to a 302 redirect. However, this only really makes sense for a Refresh with a time of 0. In the transcript above, should request pause for 0.1s before redirecting/refreshing? That would seem a bit crazy.
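
To make the 302 comparison concrete, here is a minimal sketch of splitting a Refresh value into its delay and optional target. The helper name `parseRefresh` and the returned shape are hypothetical, and the accepted syntax (a number, then an optional `;` or `,` separator, an optional `url=` prefix, optional quotes) is an assumption loosely modelled on what browsers tolerate, not a specification.

```javascript
// Hypothetical helper: parse a Refresh header value such as "0.1" or
// "70; url=/finalresultspage". Returns null if the value does not start
// with a non-negative number.
function parseRefresh(value) {
  const match = /^\s*(\d+(?:\.\d*)?|\.\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i
    .exec(String(value));
  if (!match) return null;

  let url = (match[2] || '').trim();
  // Strip one pair of matching quotes around the target, if present.
  const first = url[0];
  if (url.length >= 2 && (first === '"' || first === "'") &&
      url[url.length - 1] === first) {
    url = url.slice(1, -1);
  }
  // A missing target is reported as null (meaning "reload the same page").
  return { delaySeconds: parseFloat(match[1]), url: url || null };
}

console.log(parseRefresh('0.1'));
// { delaySeconds: 0.1, url: null }
console.log(parseRefresh('70; url=/finalresultspage'));
// { delaySeconds: 70, url: '/finalresultspage' }
```

Whether a client should actually wait for `delaySeconds` before following the target is a separate policy decision that this sketch leaves to the caller.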

@katiecrain


katiecrain commented Nov 9, 2011

I'm having this issue as well.

I'm trying to crawl a site that has an interim loading page; the response includes a Refresh header that is not being followed. I've looked at the code and I'm considering a workaround, but would like to know whether a fix is in the works.

Here's the debug info:

DEBUG: 200 http://www.mysite.com/blarg (response 44716)
DEBUG: | Date: Tue, 08 Nov 2011 19:32:11 GMT
DEBUG: | Server: Apache
DEBUG: | Vary: *
DEBUG: | Cache-control: max-age=86400
DEBUG: | Expires: Wed, 09 Nov 2011 19:32:11 GMT
DEBUG: | Refresh: 70; url=/finalresultspage
DEBUG: | Set-cookie: wosid=5KA7AdtwpyYFxtMtnDLtr0; version="1"; path=/WebObjects/mysite.woa,woinst=427; version="1"; path=/WebObjects/mysite.woa
DEBUG: | Content-length: 37900
DEBUG: | Content-type: text/html
DEBUG: | Connection: close
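
One possible workaround for a crawler, sketched under stated assumptions: once the Refresh value has been split into a delay and a target, resolve the target against the URL that was requested and issue the next request manually. The function name `refreshRedirect`, the `{ delaySeconds, url }` input shape, and the `maxDelaySeconds` cut-off are all hypothetical; only the built-in `URL` class is a real API here.

```javascript
// Hypothetical helper: decide which URL (if any) to request next.
// `refresh` is an already-parsed value of the shape { delaySeconds, url };
// that shape and the `maxDelaySeconds` cut-off are assumptions of this sketch.
function refreshRedirect(refresh, requestUrl, maxDelaySeconds = Infinity) {
  if (!refresh || refresh.delaySeconds > maxDelaySeconds) return null;
  // A missing target means "reload the current page": an empty relative
  // reference resolves to the request URL itself.
  return new URL(refresh.url || '', requestUrl).href;
}

// Values taken from the debug output above.
console.log(refreshRedirect(
  { delaySeconds: 70, url: '/finalresultspage' },
  'http://www.mysite.com/blarg'
));
// http://www.mysite.com/finalresultspage
```

With the default cut-off the delay is ignored entirely, which suits a crawler that only wants the final page; passing a small `maxDelaySeconds` would instead skip long refreshes such as the 70-second one shown here. Node lowercases incoming header names, so the raw value would normally be read from `response.headers.refresh`.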

@pedrofaustino


pedrofaustino commented Dec 31, 2012

@mikeal when you said "needs to be fixed in core" do you mean http?

@spollack

Contributor

spollack commented Nov 21, 2013

Any news on this one? Handling meta-refresh as a redirect within request would be fantastic.
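
For the meta-refresh variant, the value lives in the response body rather than in a header. The following is a deliberately naive sketch that pulls it out with regular expressions; `findMetaRefresh` is a hypothetical name, and a real crawler would more likely use a proper HTML parser, since this approach will also match tags inside comments or scripts.

```javascript
// Naive sketch: return the content value of the first
// <meta http-equiv="refresh" ...> tag in an HTML string, or null.
function findMetaRefresh(html) {
  const tags = String(html).match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    // Skip meta tags whose http-equiv is not "refresh" (case-insensitive).
    if (!/\bhttp-equiv\s*=\s*["']?refresh["']?/i.test(tag)) continue;
    // The content attribute may be double-quoted, single-quoted, or bare.
    const content = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(tag);
    if (content) {
      const value = content.slice(1).find((group) => group !== undefined);
      return value === undefined ? null : value;
    }
  }
  return null;
}

console.log(findMetaRefresh('<HEAD><META HTTP-EQUIV="Refresh" CONTENT="0.1"></HEAD>'));
// 0.1
```

The returned string has the same delay-and-target syntax as the Refresh header value, so it can be handled by the same logic afterwards.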

@mikeal

Member

mikeal commented Aug 27, 2014

Is this still an issue?

This is so old that I'm closing it. If it is actually still an issue, just let me know and I'll re-open.

@mikeal mikeal closed this Aug 27, 2014

@polomoshnov


polomoshnov commented Jun 27, 2016

@teebu


teebu commented Sep 21, 2017

Has any progress been made on this?
