Add support for setting headers #63

Merged
merged 4 commits into from Apr 30, 2014
8 changes: 8 additions & 0 deletions README.md
@@ -191,6 +191,14 @@ However, you can tell MetaInspector to allow these redirections with the option
     # And this will allow HTTP => HTTPS ("safe") and HTTPS => HTTP ("unsafe") redirections
     page = MetaInspector.new('facebook.com', :allow_redirections => :all)

+### Headers
+
+By default, MetaInspector sets no custom request headers.
+To set custom headers, pass a hash of header names and values with the `:headers` option:
+
+    # Set the User-Agent header
+    page = MetaInspector.new('facebook.com', :headers => {'User-Agent' => 'My custom User-Agent'})
+
 ### HTML Content Only

 MetaInspector will try to parse all URLs by default. If you want to raise an exception when trying to parse a non-html URL (one that has a content-type different than text/html), you can state it like this:
7 changes: 5 additions & 2 deletions lib/meta_inspector/document.rb
@@ -3,7 +3,7 @@
 module MetaInspector
   # A MetaInspector::Document knows about its URL and its contents
   class Document
-    attr_reader :timeout, :html_content_only, :allow_redirections, :warn_level
+    attr_reader :timeout, :html_content_only, :allow_redirections, :warn_level, :headers

     include MetaInspector::Exceptionable

@@ -14,18 +14,21 @@ class Document
     # => allow_redirections: when :safe, allows HTTP => HTTPS redirections. When :all, it also allows HTTPS => HTTP
     # => document: the html of the url as a string
     # => warn_level: what to do when encountering exceptions. Can be :warn, :raise or nil
+    # => headers: object containing custom headers for the request
     def initialize(initial_url, options = {})
       options = defaults.merge(options)
       @timeout = options[:timeout]
       @html_content_only = options[:html_content_only]
       @allow_redirections = options[:allow_redirections]
       @document = options[:document]
+      @headers = options[:headers]
       @warn_level = options[:warn_level]
       @exception_log = options[:exception_log] || MetaInspector::ExceptionLog.new(warn_level: warn_level)
       @url = MetaInspector::URL.new(initial_url, exception_log: @exception_log)
       @request = MetaInspector::Request.new(@url, allow_redirections: @allow_redirections,
                                             timeout: @timeout,
-                                            exception_log: @exception_log) unless @document
+                                            exception_log: @exception_log,
+                                            headers: @headers) unless @document
       @parser = MetaInspector::Parser.new(self, exception_log: @exception_log)
     end

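The `initialize` change threads `:headers` through two steps: caller options are merged over `defaults`, and the result is forwarded to `MetaInspector::Request` only when no `:document` was supplied. The plain-Ruby sketch below reproduces just that flow so it runs without the gem or the network. `SketchDocument`, `SketchRequest`, and the contents of `defaults` are hypothetical stand-ins; the real `defaults` method is not shown in this diff.

```ruby
# Hypothetical stand-in for MetaInspector::Request: it only records the
# headers it was given instead of performing an HTTP request.
class SketchRequest
  attr_reader :url, :headers

  def initialize(url, options = {})
    @url = url
    @headers = options[:headers]
  end
end

# Hypothetical stand-in for MetaInspector::Document, reduced to option handling.
class SketchDocument
  attr_reader :headers, :request

  def initialize(initial_url, options = {})
    options = defaults.merge(options) # keys passed by the caller win over defaults
    @document = options[:document]
    @headers = options[:headers]
    # As in the diff, no request object is built when the HTML is passed in directly.
    @request = SketchRequest.new(initial_url, headers: @headers) unless @document
  end

  private

  # Assumed defaults for illustration only; not MetaInspector's real values.
  def defaults
    { headers: nil, document: nil }
  end
end

headers = { 'User-Agent' => 'metainspector v0.1' }
fetched = SketchDocument.new('http://example.com', headers: headers)
offline = SketchDocument.new('http://example.com', headers: headers, document: '<html></html>')

p fetched.request.headers == headers # => true
p offline.request                    # => nil
p offline.headers == headers         # => true
```

Because `@headers` is assigned before the `unless @document` check, `Document#headers` still returns the hash when the HTML is supplied directly; the values just never reach an HTTP request in that case.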
5 changes: 4 additions & 1 deletion lib/meta_inspector/request.rb
@@ -17,6 +17,7 @@ def initialize(initial_url, options = {})
       @allow_redirections = options[:allow_redirections]
       @timeout = options[:timeout]
       @exception_log = options[:exception_log]
+      @headers = options[:headers]

       response # as soon as it is set up, we make the request so we can fail early
     end
@@ -43,7 +44,9 @@ def response
     end

     def fetch
-      request = open(url, {:allow_redirections => @allow_redirections})
+      options = {:allow_redirections => @allow_redirections}
+      options.merge(@headers) if @headers.is_a?(Hash)
+      request = open(url, options)

       @url.url = request.base_uri.to_s

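One thing worth checking in the `fetch` hunk: `Hash#merge` returns a new hash and leaves the receiver untouched, so `options.merge(@headers)` on a line of its own discards its result, and `open` still receives only `:allow_redirections`. `Hash#merge!` (or assigning the result back to `options`) updates the hash in place. The comparison below isolates that behaviour; both method names are hypothetical, and `:safe` is just one of the `allow_redirections` values documented in the README.

```ruby
# Mirrors the merged diff: the return value of Hash#merge is thrown away.
def open_options_as_written(allow_redirections, headers)
  options = { allow_redirections: allow_redirections }
  options.merge(headers) if headers.is_a?(Hash)
  options
end

# Same logic using merge!, which mutates `options` so the headers survive.
def open_options_with_merge_bang(allow_redirections, headers)
  options = { allow_redirections: allow_redirections }
  options.merge!(headers) if headers.is_a?(Hash)
  options
end

headers = { 'User-Agent' => 'metainspector v0.1' }
as_written = open_options_as_written(:safe, headers)
fixed = open_options_with_merge_bang(:safe, headers)

p as_written.key?('User-Agent') # => false
p fixed.key?('User-Agent')      # => true
```

open-uri treats String keys in its options hash as request header fields, which is why the header hash can share a hash with the Symbol option `:allow_redirections` (provided by the open_uri_redirections gem). On Ruby 3.0 and later, open-uri no longer overrides `Kernel#open`, so the equivalent call there is `URI.open(url, options)`.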
8 changes: 8 additions & 0 deletions spec/document_spec.rb
@@ -89,4 +89,12 @@
       tar_url.title
     end
   end
+
+  describe 'headers' do
+    it "should set the User-Agent header" do
+      headers = {'User-Agent' => 'metainspector v0.1'}
+      page_request = MetaInspector::Document.new('http://pagerankalert.com', headers: headers)
+      page_request.headers.should eq(headers)
+    end
+  end
 end
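The new spec asserts only that `Document#headers` echoes its input, so it would still pass if the headers never reached the HTTP call (which is what happens with the non-bang `merge` in `fetch`). A stronger check observes the options actually handed to `open`. The plain-Ruby sketch below does that by replacing `open` on a single object with a recording stub. `FetchSketch` is a hypothetical, trimmed-down stand-in for `MetaInspector::Request#fetch`; in the real suite, an RSpec message expectation could play the role of the stub.

```ruby
# Hypothetical stand-in for MetaInspector::Request, reduced to #fetch's option handling.
class FetchSketch
  def initialize(url, allow_redirections: nil, headers: nil)
    @url = url
    @allow_redirections = allow_redirections
    @headers = headers
  end

  def fetch
    options = { allow_redirections: @allow_redirections }
    options.merge!(@headers) if @headers.is_a?(Hash)
    open(@url, options) # resolves to the singleton stub below, so no network access happens
  end
end

request = FetchSketch.new('http://example.com', headers: { 'User-Agent' => 'metainspector v0.1' })

# Record the arguments instead of opening a connection.
calls = []
request.define_singleton_method(:open) do |target, opts|
  calls << [target, opts]
  :stubbed_response
end

request.fetch
url, options = calls.first
p url                   # => "http://example.com"
p options['User-Agent'] # => "metainspector v0.1"
```

The sketch uses `merge!`; with the diff's plain `merge`, the recorded options would lack `'User-Agent'` and this check would fail, which is the gap the attribute-only spec cannot detect.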