DSL-ify the tarantula API #9

glv opened this Issue Jun 21, 2009 · 2 comments



glv commented Jun 21, 2009

The way tests configure the Tarantula crawler is way too Java-like. It should be more declarative and Rubyish. This will make Tarantula easier to use and document, and also easier to extend and enhance. (The fact that the refactoring described in #8 causes an incompatible change to the Tarantula API is due to the low level of abstraction of the current API.)


glv commented Sep 4, 2009

Here are some examples of what a revised API might look like.

This is not a promise to implement all of these features. I think all of these feature ideas would be useful, but I already know some of them will be very costly and difficult to implement. I'm including them all here because I want to come up with an API syntax and style that will accommodate a lot of different things.

I don't yet know whether I'll maintain the current association with test cases; it seems that it might be better to have a standalone tarantula_config.rb file or something like that, with a custom Tarantula runner that doesn't depend on test/unit or RSpec.

Basic crawl, starting from '/'


Basic crawl, starting from '/' and '/admin'

Tarantula.crawl('both') do |t|
  t.root_page '/'
  t.root_page '/admin'
end
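To make the block style concrete, here is a minimal sketch of how such a declarative block could collect configuration. The `CrawlSketch` module and its `Config` class are hypothetical names invented for illustration, not Tarantula's actual classes, and the fallback to '/' when no root page is declared is an assumption.

```ruby
# Hypothetical sketch: none of these names are Tarantula's real classes.
module CrawlSketch
  # Plain data holder that the block fills in declaratively.
  class Config
    attr_reader :label, :root_pages, :handlers

    def initialize(label)
      @label = label
      @root_pages = []
      @handlers = []
    end

    # Each declaration only records intent; nothing is crawled here.
    def root_page(path)
      @root_pages << path
    end

    def add_handler(name)
      @handlers << name
    end
  end

  # Builds a Config, yields it to the block, and returns it.
  # Assumption: with no declared root pages, the crawl starts at '/'.
  def self.crawl(label = nil)
    config = Config.new(label)
    yield config if block_given?
    config.root_pages << '/' if config.root_pages.empty?
    config
  end
end

both = CrawlSketch.crawl('both') do |t|
  t.root_page '/'
  t.root_page '/admin'
end
```

Because the block only records declarations, the runner that eventually performs the crawl can change without touching the configuration syntax.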

Crawl with the Tidy handler

# the operand to the crawl method, if supplied, will be used
# as the tab label in the report.
Tarantula.crawl("tidy") do |t|
  t.add_handler :tidy
end

Reorder requests on the queue

This is necessary to fix this bug

Tarantula.crawl do |t|
  # Treat the following controllers as "resourceful",
  # reordering appropriately (see my comment on
  # <http://github.com/relevance/tarantula/issues#issue/3>)
  t.resources 'post', 'comment'

  # For the 'news' controller, order the actions this way:
  t.reorder_for 'news', :actions => %w{show read unread mark_read}

  # For the 'history' controller, supply a comparison function:
  t.reorder_for 'history', :compare => lambda{|x, y| ... }
end

(Unlike most of the declarations in this example document, these will need to be reusable across multiple crawl blocks somehow.)
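As a rough illustration of the `:actions` form of reordering, the sketch below reorders only the queued requests for one controller and leaves everything else in place. It assumes a queued request can be modelled as a `[controller, action]` pair; the `reorder_for` function here is a hypothetical standalone helper, not the proposed API method itself.

```ruby
# Hypothetical sketch: a queued request is modelled as [controller, action].
# Requests for the given controller are re-sorted into the declared action
# order; requests for other controllers keep their positions in the queue.
def reorder_for(queue, controller, actions)
  # Queue positions currently occupied by this controller's requests.
  slots = queue.each_index.select { |i| queue[i][0] == controller }

  # Sort those requests by declared action rank; unknown actions go last,
  # and the original position breaks ties so the sort is stable.
  ranked = slots.map { |i| queue[i] }.each_with_index.sort_by { |(_, action), i|
    [actions.index(action) || actions.size, i]
  }
  sorted = ranked.map(&:first)

  result = queue.dup
  slots.zip(sorted) { |slot, request| result[slot] = request }
  result
end

queue = [
  ['news', 'unread'],
  ['post', 'index'],
  ['news', 'show'],
  ['news', 'mark_read'],
  ['news', 'read']
]
reordered = reorder_for(queue, 'news', %w{show read unread mark_read})
```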

Selectively allowing errors

Tarantula.crawl("ignoring not-found users") do |t|
  t.allow_errors :not_found, %r{/users/\d+/}
  # or
  t.allow_errors :not_found, :controller => 'users', :action => 'show'
end
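One way the regexp form of `allow_errors` might be evaluated is sketched below. The `AllowedError` struct and the `STATUS_CODES` table are hypothetical, and the mapping from `:not_found` to HTTP 404 is an assumption about what the symbol would mean.

```ruby
# Hypothetical sketch; the symbol-to-status mapping is an assumption.
STATUS_CODES = { :not_found => 404, :forbidden => 403 }

# One "allowed error" rule: a status name plus a pattern for the request path.
AllowedError = Struct.new(:status, :pattern) do
  # True when the response code is the allowed one and the path matches.
  def allows?(path, code)
    STATUS_CODES[status] == code && pattern === path
  end
end

rule = AllowedError.new(:not_found, %r{/users/\d+/})
```

Note that the pattern from the example requires a trailing slash after the id, so a path such as '/users/42' would not match it.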


Tarantula.crawl("attacks") do |t|
  t.attack :xss, :input => "<script>gotcha!</script>", :output => :input
  t.attack :sql_injection, :input => "a'; DROP TABLE posts;"
  t.times_to_crawl 2
end
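The sketch below shows one possible reading of `:output => :input` for the XSS attack: the attack counts as successful when the submitted string comes back unescaped in a later response. That interpretation is an assumption, and `xss_reflected?` is a hypothetical helper rather than anything in Tarantula.

```ruby
# Hypothetical helper, not part of Tarantula. Assumes the attack "succeeds"
# when the raw input appears verbatim in the response body.
def xss_reflected?(response_body, attack_input)
  response_body.include?(attack_input)
end

attack = "<script>gotcha!</script>"

# A page that echoes the input as-is, and one that HTML-escapes it.
unsafe_page = "<p>Hello #{attack}</p>"
safe_page   = "<p>Hello &lt;script&gt;gotcha!&lt;/script&gt;</p>"
```

Under this reading, crawling twice makes sense: the first pass submits the attack input, and the second pass can find it reflected on pages that render stored data.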

We should have prepackaged attack suites that understand various techniques.

Tarantula.crawl("xss suite") do |t|
  t.attack :xss, :suite => 'standard'
end

Tarantula.crawl("sql injection suite") do |t|
  t.attack :sql_injection, :suite => 'standard'
end


Tarantula.crawl do |t|
  t.times_to_crawl 2
  t.stop_after 2.minutes
end
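A crawl bounded by both a pass count and a time limit could be driven by a loop like the one sketched here. The `run_bounded` function is hypothetical, takes the limit in plain seconds rather than an ActiveSupport duration, and assumes the deadline is only checked between passes, so a pass already in progress is not interrupted.

```ruby
# Hypothetical sketch: run up to `times` crawl passes, but start no new pass
# once `stop_after_seconds` have elapsed. Returns the number of passes run.
def run_bounded(times, stop_after_seconds)
  clock = lambda { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + stop_after_seconds
  passes = 0
  while passes < times && clock.call < deadline
    yield passes          # one full crawl pass would happen here
    passes += 1
  end
  passes
end

visited = []
completed = run_bounded(2, 120) { |pass| visited << pass }
```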


Tarantula.crawl do |t|
  # :valid input uses SQL types and knowledge of model validations
  # to attempt to generate valid input.  You can override the defaults.
  t.fuzz_with :valid_input do |f|
    f.fuzz(Post, :title) { random_string(1..40) }
    f.fuzz(Person, :phone) { random_string("%03d-%03d-%04d") }
  end

  # The point of fuzzing is to keep trying a lot of things to 
  # see if you can find breakage.
  t.crawl_for 45.minutes
end
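The example above calls `random_string` with either a length range or a format string, without defining it. The sketch below is one guess at what such a helper might do; the behavior for both argument types is an assumption, including the choice of lowercase letters and the handling of only `%Nd` / `%0Nd` directives.

```ruby
LETTERS = ('a'..'z').to_a

# Hypothetical helper. Given a Range, returns a random lowercase string whose
# length falls within the range. Given a format String, replaces each %Nd or
# %0Nd directive with N random digits.
def random_string(spec)
  case spec
  when Range
    Array.new(rand(spec)) { LETTERS.sample }.join
  when String
    spec.gsub(/%0?(\d+)d/) { Array.new($1.to_i) { rand(10) }.join }
  else
    raise ArgumentError, "expected a Range or a format String"
  end
end

title = random_string(1..40)
phone = random_string("%03d-%03d-%04d")
```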

Tarantula.crawl do |t|
  # :typed_input uses SQL types to generate "reasonable" but probably
  # invalid input (e.g., numeric fields will get strings of digits, 
  # but they'll be too large or negative; date fields will get dates,
  # but very far in the past or future; string fields will get very 
  # large strings.)
  t.fuzz_with :typed_input
  t.crawl_for 30.minutes
end

Tarantula.crawl do |t|
  # :random_input just plugs in random strings everywhere.
  t.fuzz_with :random_input
  t.crawl_for 2.hours
end

glv commented Sep 26, 2009

As of commit a99e859, a very basic version of this is in place on the 'dslify' branch.

There is a new command, tarantula, that works like this:

$ tarantula <filenames>

It processes all the files, runs the crawls as directed, and exits with a nonzero status if there were any problems. (A rake task wrapper is not in place yet.)

The configuration language is very simple. Here's an example:

fixtures :all


crawl('admin') do |t|
  t.post '/session', :login => 'admin', :password => 'admin'

  t.root_page '/admin'
end

Under the covers, things are still based on test/unit, to piggy-back on the Rails integration test support. The configuration object passed into the crawl block is a testcase, so that things like post and follow_redirect! can still work. The particulars of that will probably change, but I expect the basic idea to remain; part of the strength of Tarantula is that it works within the app using the integration testing interface, and there's no sense rewriting all of that to be independent of test/unit. The goal of this new interface is to get all of that out of the developers' faces, rather than to eliminate it altogether.
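For illustration only, here is a sketch of how a runner could evaluate configuration files like the one above and turn the outcome into an exit status. It is not the code on the 'dslify' branch, which builds on test/unit: `RunnerSketch` and `Recorder` are hypothetical, the object yielded to the block merely records calls instead of driving an integration test, and a crawl counts as a "problem" only when its block raises.

```ruby
# Hypothetical sketch, independent of test/unit and of Tarantula itself.
class RunnerSketch
  # Stand-in for the object yielded to crawl blocks; it only records calls.
  class Recorder
    attr_reader :calls

    def initialize
      @calls = []
    end

    def post(path, params = {})
      @calls << [:post, path, params]
    end

    def root_page(path)
      @calls << [:root_page, path]
    end
  end

  attr_reader :problems, :crawls

  def initialize
    @problems = []
    @crawls = {}
  end

  # Placeholder: a real runner would load the named fixtures.
  def fixtures(*names); end

  # Runs one crawl block; an exception inside it is recorded as a problem.
  def crawl(label = nil)
    recorder = Recorder.new
    yield recorder
    @crawls[label] = recorder.calls
  rescue StandardError => e
    @problems << "#{label}: #{e.message}"
  end

  # Evaluates each config source in the runner's context and returns
  # a process exit status: 0 when no problems were seen, 1 otherwise.
  def run(sources)
    sources.each { |name, source| instance_eval(source, name) }
    @problems.empty? ? 0 : 1
  end
end

good_config = <<-CONFIG
fixtures :all

crawl('admin') do |t|
  t.post '/session', :login => 'admin', :password => 'admin'
  t.root_page '/admin'
end
CONFIG

runner = RunnerSketch.new
status = runner.run('admin_crawl.rb' => good_config)
```

A command-line wrapper would then read each file named in its arguments, pass the contents to `run`, and call `exit` with the returned status.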

At the moment, some things are hacked together, it won't work under Ruby 1.9, and test coverage is weak. I plan to fix those problems first, and then start fleshing out the API.
