User agent not set correctly when specifying a base url #290

Closed
dvdbng opened this issue Sep 3, 2015 · 10 comments · Fixed by #375

Comments

@dvdbng
Contributor

dvdbng commented Sep 3, 2015

Test script:

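function main(splash)
  -- Random query parameter so the response is never served from a cache.
  local url = 'http://httpbin.org/headers?nocache=' .. math.random()
  splash:set_user_agent('FooBar/1.0')
  -- The second argument is baseurl; passing it triggers the problem.
  assert(splash:go(url, 'http://httpbin.org'))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end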

The request is sent with a Mozilla/5.0 user agent instead of FooBar/1.0.

Specifying the user agent as a header works.
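
For reference, the workaround mentioned above might look like the sketch below. It assumes the table-call form of splash:go with its baseurl and headers arguments; the URL and the FooBar/1.0 value are taken from the test script, and only html is returned to keep it short.

```lua
function main(splash)
  local url = 'http://httpbin.org/headers?nocache=' .. math.random()
  -- Pass the user agent as an explicit request header instead of
  -- relying on splash:set_user_agent, which is ignored with baseurl.
  assert(splash:go{
    url,
    baseurl = 'http://httpbin.org',
    headers = { ['User-Agent'] = 'FooBar/1.0' },
  })
  assert(splash:wait(0.5))
  return { html = splash:html() }
end
```

With this form, the headers echoed by httpbin should include the custom User-Agent.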

@pawelmhm
Member

You're right, and the "Accept" header seems to be missing too. It looks like no headers are set on the request at all. A simple test reproducing this:

pawelmhm@a84c3d7

Without baseurl there are "Accept" and "User-Agent" headers.

@pawelmhm
Member

Aha! This happens because user_agent is set on the web_page object, and when we pass "base_url" we never load the URL into the web page; we bypass the web page completely.

@pawelmhm
Member

Hey @kmike @Youwotma, it seems we don't set a UA at all for "manual" requests (those not loaded via QWebPage). E.g. if you call splash:http_get() or splash:http_post(), the request from Splash will not have a UA. If you call go(), the request will have the default Splash UA. Should we consider this a bug?

By default, if you use the HTTP API or Lua splash:go() without a base url, you get the UA "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) splash Safari/538.1", so maybe we could set this as the default UA for all requests.
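
Until such a default exists, a script can pass the UA to splash:http_get itself. The sketch below assumes the headers argument of splash:http_get and a status field on its result; DEFAULT_UA is a name made up for this example and holds the UA string quoted above.

```lua
-- Assumption: this mirrors the default Splash UA quoted in this thread.
local DEFAULT_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 '
  .. '(KHTML, like Gecko) splash Safari/538.1'

function main(splash)
  -- http_get does not go through QWebPage, so the UA is set by hand.
  local response = splash:http_get{
    'http://httpbin.org/headers',
    headers = { ['User-Agent'] = DEFAULT_UA },
  }
  return { status = response.status }
end
```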

@kmike
Member

kmike commented Jan 22, 2016

I think it makes sense to use default headers in http_get and http_post.

@pawelmhm
Member

And the same for go with base url, right? So we should have a default UA for all those requests. I'm not sure how websites would handle requests without any UA; perhaps it's no big deal, but it could cause trouble in some cases.

@kmike
Member

kmike commented Jan 22, 2016

yeah, right

@pawelmhm
Member

And did you mean the UA or all headers? QWebPage also adds an "Accept: text/html" header, which is also missing.

@kmike
Member

kmike commented Jan 22, 2016

I think Accept: text/html is not a good default for http_get/http_post. E.g. the requests library uses Accept: */*, which makes more sense.

Re other headers: I think it is fine to use headers set by splash:set_custom_headers in http_get/http_post, but there should be a way to override it just for a single request. Currently the headers argument only adds or replaces headers; this works because the default set of headers is empty.
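
The add-or-replace behaviour described above could be sketched in plain Lua as follows. merge_headers is a hypothetical helper, not part of Splash, and the default values are only illustrative. Header names are compared case-insensitively so that a per-request "accept" replaces a default "Accept".

```lua
-- Hypothetical helper (not a Splash API): overrides win over defaults,
-- and header names are matched case-insensitively.
function merge_headers(defaults, overrides)
  local merged = {}
  local names = {}  -- lower-cased name -> key currently used in merged
  local function put(name, value)
    local key = string.lower(name)
    if names[key] then merged[names[key]] = nil end
    names[key] = name
    merged[name] = value
  end
  for name, value in pairs(defaults or {}) do put(name, value) end
  for name, value in pairs(overrides or {}) do put(name, value) end
  return merged
end

function main(splash)
  -- Illustrative defaults; Splash itself does not define these.
  local defaults = { ['User-Agent'] = 'FooBar/1.0', ['Accept'] = '*/*' }
  local response = splash:http_get{
    'http://httpbin.org/headers',
    headers = merge_headers(defaults, { ['accept'] = 'application/json' }),
  }
  return { status = response.status }
end
```

This only adds or replaces headers. Removing a default header for a single request would need an extra convention, which is the gap the comment above points at.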

@pawelmhm
Member

> there should be a way to override it just for a single request

I see in the docs that there is request:set_header(name, value), so that would probably be the way to do it?
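
A rough sketch of that idea, assuming request:set_header is called from a splash:on_request callback and that the request object exposes a url field. The callback fires for every request, so the sketch filters on the URL to touch only one of them. Whether this also covers the initial request when baseurl is given has not been checked here.

```lua
function main(splash)
  local url = 'http://httpbin.org/headers'
  splash:on_request(function(request)
    -- Plain-text match (no Lua patterns) so only this request is changed.
    if string.find(request.url, 'httpbin.org/headers', 1, true) then
      request:set_header('User-Agent', 'FooBar/1.0')
    end
  end)
  assert(splash:go(url))
  return { html = splash:html() }
end
```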

@pawelmhm
Member

It would be nice to have some settings where users could set default headers, but I see that #279 is stalled now. Perhaps it's worth prioritizing; it sounds like a really useful PR with lots of work already done.
