Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Python
tree: 2aac43ce0a

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
burp
scripts
weka
.gitignore
README.md
distribute_setup.py
error_log_fetcher.txt
process_urls.py
setup.py
validate_urls.py

README.md

BURP

The Better URL Reputation Platform

By Khalid Aziz, Peter Li, Christopher Moran, and Ethan Romba

Installation

  1. Install the BURP fork of python-whois

  2. Install Weka 3.7.7+

  3. Add your Weka installation directory to your CLASSPATH (instructions)

  4. Execute the following command:

    python setup.py install
    

Usage

BURP

Run BURP from the command line, passing the URL you would like to classify:

burp [URL]

HTML Analyzer

The BURP HTML analyzer is optimized for retrieving and analyzing HTML from URLs:

from burp.html import HTMLAnalyzer
analyzer = HTMLAnalyzer(url)
analysis = analyzer.analyze()
...
analyzer.loadUrl(url2)
analysis2 = analyzer.analyze()
...

To analyze an HTML string directly, be sure to call the setUrl() method with the URL where the HTML originated from:

from burp.html import HTMLAnalyzer
html = '<html>Hello World!</html>'
analyzer = HTMLAnalyzer()
analyzer.loadHtml(html)
analyzer.setUrl('http://www.example.com')
analysis = analyzer.analyze()

The analyze() method returns a dictionary with the following format:

{
  "numCharacters":                Int,
  "percentWhitespace":            Float,
  "percentScriptContent":         Float,
  "numIframes":                   Int,
  "numScripts":                   Int,
  "numScriptsWithWrongExtension": Int,
  "numEmbeds":                    Int,
  "numObjects":                   Int,
  "numSuspiciousObjects":         Int,
  "numHyperlinks":                Int,
  "numMetaRefresh":               Int,
  "numHiddenElements":            Int,
  "numSmallElements":             Int,
  "hasDoubleDocuments":           Bool,
  "numUnsafeIncludedUrls":        Int,
  "numExternalUrls":              Int,
  "percentUnknownElements":       Float
}

Running the HTML Test Suite

python setup.py test
Something went wrong with that request. Please try again.