
Ronin Web

Description

{Ronin::Web} is a Ruby library for Ronin that provides support for web scraping and spidering functionality.

Features

  • HTML/XML parsing/building (using Nokogiri).
  • Automated Web Browsing (using Mechanize).
  • Provides popular User Agent strings.
  • Integrates Spidr into {Ronin::Web::Spider}.
  • Provides {Ronin::Web::Server}, a Sinatra-based Web Server.
  • Provides {Ronin::Web::Proxy}, a Sinatra-based Web Proxy.

Synopsis

Start the Ronin console with Ronin Web preloaded:

$ ronin-web

Examples

Get a web-page:

Web.get('http://www.rubyinside.com/')

Get only the body of the web-page:

Web.get_body('http://www.rubyinside.com/')

Get a Mechanize agent:

agent = Web.agent

Parse HTML:

Web.html(open('some_file.html'))
# => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# <html>
#   <head>
#     <script type="text/javascript" src="redirect.js"></script>
#   </head>
# </html>

Build an HTML document:

doc = Web.build_html do
  html {
    head {
      script(:type => 'text/javascript', :src => 'redirect.js')
    }
  }
end

puts doc.to_html
# <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# <html><head><script src="redirect.js" type="text/javascript"></script></head></html>

Parse XML:

Web.xml(some_text)
# => <?xml version="1.0"?>
# <users>
#   <user>
#     <name>admin</name>
#     <password>0mni</password>
#   </user>
# </users>

Build an XML document:

doc = Web.build_xml do
  playlist {
    mp3 {
      file { text('02 THE WAIT.mp3') }
      artist { text('Evil Nine') }
      track { text('The Wait feat David Autokratz') }
      duration { text('1000000000') }
    }
  }
end

puts doc.to_xml
# <?xml version="1.0"?>
# <playlist>
#   <mp3>
#     <file>02 THE WAIT.mp3</file>
#     <artist>Evil Nine</artist>
#     <track>The Wait feat David Autokratz</track>
#     <duration>1000000000</duration>
#   </mp3>
# </playlist>

Spider a web site:

Web::Spider.host('www.example.com') do |spider|
  spider.every_url do |url|
    # ...
  end

  spider.every_page do |page|
    # ...
  end
end
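
To illustrate what spidering a single host means, here is a small standard-library sketch of a breadth-first crawl that only follows links on one host. It is not how Spidr or {Ronin::Web::Spider} is implemented; the `PAGES` table and the `crawl_host` method are hypothetical, and the in-memory table replaces real HTTP requests so that the example runs offline.

```ruby
require 'uri'
require 'set'

# Hypothetical in-memory "site": each URL maps to the links found on that page.
PAGES = {
  'http://www.example.com/'        => ['/about', 'http://other.example.org/', '/contact'],
  'http://www.example.com/about'   => ['/', '/contact'],
  'http://www.example.com/contact' => []
}

# Breadth-first crawl that skips URLs on other hosts and URLs already visited.
# Yields each visited URL to the block, loosely mirroring an every_url callback.
def crawl_host(host, start_url, pages)
  visited = Set.new
  queue   = [URI(start_url)]

  until queue.empty?
    url = queue.shift
    next if url.host != host || visited.include?(url.to_s)

    visited << url.to_s
    yield url if block_given?

    # Resolve each link against the current page and queue it for later.
    pages.fetch(url.to_s, []).each { |link| queue << url.merge(link) }
  end

  visited.to_a
end

visited = crawl_host('www.example.com', 'http://www.example.com/', PAGES) do |url|
  puts url
end
```

The link to `other.example.org` is queued but never visited, which is the behaviour the `host` restriction is meant to convey.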

Serve files via a Web Server:

require 'ronin/web/server'

Web.server do
  file '/opensearch.xml', '/tmp/test.xml'
  directory '/downloads/', '/tmp/downloads/'
end

Web.server.get '/test' do
  'Test 1 2 1 2'
end

Requirements

Install

$ gem install ronin-web

Edge

$ git clone https://github.com/ronin-ruby/ronin-web.git
$ cd ronin-web/
$ bundle install
$ ./bin/ronin-web

License

Ronin Web - A Ruby library for Ronin that provides support for web scraping and spidering functionality.

Copyright (c) 2006-2013 Hal Brodigan (postmodern.mod3 at gmail.com)

This file is part of Ronin Web.

Ronin is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Ronin is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Ronin. If not, see http://www.gnu.org/licenses/.