Updated README

commit b9b3a86e9aed7d83c3e2d26c235dd78776c49335 (1 parent: f645d53), authored by @eromba

Showing 1 changed file with 77 additions and 0 deletions: `README.md`

BURP
====

The Better URL Reputation Platform

By Khalid Aziz, Peter Li, Christopher Moran, and Ethan Romba

Installation
------------

1. Install the [BURP fork of python-whois](https://github.com/eecs-354-burp/python-whois).

2. Install [Weka](http://www.cs.waikato.ac.nz/ml/weka/) 3.7.7 or later.

3. Add your Weka installation directory to your `CLASSPATH` ([instructions](http://weka.wikispaces.com/CLASSPATH)).

4. From the project's root directory, run:

        python setup.py install
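
Before running `setup.py`, it can help to confirm that Weka is visible on `CLASSPATH`. The sketch below is a hypothetical helper (`weka_on_classpath` is not part of BURP). It only inspects entry names, assumes the entry you added is either the Weka directory or `weka.jar`, and does not verify that Weka actually loads.

```python
import os


def weka_on_classpath(classpath=None):
    """Return True if some CLASSPATH entry looks like a Weka jar or directory.

    Heuristic only (hypothetical helper): it checks entry names and never
    tries to load Weka itself.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        # Strip trailing separators so "/opt/weka-3-7-7/" yields "weka-3-7-7"
        name = os.path.basename(entry.rstrip("/\\")).lower()
        # Matches entries such as ".../weka.jar" or ".../weka-3-7-7"
        if "weka" in name:
            return True
    return False
```

Calling `weka_on_classpath()` with no argument reads the current environment; passing a string lets you check a candidate value before exporting it.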

Usage
-----

### BURP

Run BURP from the command line, passing the URL you would like to classify:

    burp [URL]
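
To call BURP from a Python script, one option is to shell out to the `burp` command shown above. This is a minimal sketch, assuming `burp` is on your `PATH` and prints its classification to standard output; `build_burp_command` and `classify` are hypothetical helpers, and the http(s)-only check is an assumption rather than a documented BURP requirement.

```python
import shutil
import subprocess
from urllib.parse import urlparse


def build_burp_command(url):
    """Return the argument list for classifying `url` with the `burp` CLI.

    Rejects strings without an http(s) scheme and a host; this check is an
    assumption, and BURP itself may accept other forms.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return ["burp", url]


def classify(url):
    """Run `burp URL` and return whatever it prints to standard output."""
    if shutil.which("burp") is None:
        raise FileNotFoundError("the `burp` command is not on PATH; install BURP first")
    # Passing a list (no shell) avoids quoting problems with '&' or '?' in URLs
    result = subprocess.run(build_burp_command(url),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a shell string means URLs containing `&` or `?` need no extra quoting.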
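
### HTML Analyzer

The BURP HTML analyzer is designed for retrieving and analyzing HTML from URLs. Pass a URL to the constructor, and call `loadUrl()` to reuse the same analyzer for another URL:

    from burp.html import HTMLAnalyzer

    url = 'http://www.example.com'
    analyzer = HTMLAnalyzer(url)
    analysis = analyzer.analyze()
    # ... use analysis ...

    # Reuse the same analyzer for a second URL
    url2 = 'http://www.example.org'
    analyzer.loadUrl(url2)
    analysis2 = analyzer.analyze()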
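
For readers curious what the retrieval step involves, here is a minimal standard-library sketch of fetching a page before analysis. It is not BURP's implementation: `fetch_html` is a hypothetical helper, and both the UTF-8 fallback and the decision to return the post-redirect URL are assumptions about what an analyzer would sensibly need.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download `url` and return (final_url, html_text).

    The final URL can differ from `url` after redirects, which matters when
    deciding later which links on the page are external.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read()
        return response.geturl(), body.decode(charset, errors="replace")
```

`errors="replace"` keeps a page with a few mis-encoded bytes analyzable instead of raising `UnicodeDecodeError`.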

To analyze an HTML string directly, load it with `loadHtml()` and be sure to call `setUrl()` with the URL the HTML originated from:

    from burp.html import HTMLAnalyzer

    html = '<html>Hello World!</html>'
    analyzer = HTMLAnalyzer()
    analyzer.loadHtml(html)
    analyzer.setUrl('http://www.example.com')
    analysis = analyzer.analyze()
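
The origin URL matters because relative links in an HTML string cannot be classified without it. As an illustration (an assumption about why BURP asks for `setUrl()`, suggested by the `numExternalUrls` field below), this hypothetical `count_external_urls` helper resolves each link against the page URL and counts those pointing at a different host.

```python
from urllib.parse import urljoin, urlparse


def count_external_urls(page_url, hrefs):
    """Count links in `hrefs` whose host differs from that of `page_url`.

    Relative links are resolved against `page_url` first, which is why the
    origin of the HTML must be known. Any hostname difference, including a
    different subdomain, counts as external in this sketch.
    """
    page_host = urlparse(page_url).hostname
    external = 0
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Skip non-web schemes such as mailto: or javascript:
        if parsed.scheme not in ("http", "https"):
            continue
        if parsed.hostname != page_host:
            external += 1
    return external
```

With `page_url='http://www.example.com/index.html'`, a link such as `/about` resolves to the same host, while `//cdn.example.org/lib.js` resolves to a different one and is counted.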

The `analyze()` method returns a dictionary with the following format:

    {
        "numCharacters": Int,
        "percentWhitespace": Float,
        "percentScriptContent": Float,
        "numIframes": Int,
        "numScripts": Int,
        "numScriptsWithWrongExtension": Int,
        "numEmbeds": Int,
        "numObjects": Int,
        "numSuspiciousObjects": Int,
        "numHyperlinks": Int,
        "numMetaRefresh": Int,
        "numHiddenElements": Int,
        "numSmallElements": Int,
        "hasDoubleDocuments": Bool,
        "numUnsafeIncludedUrls": Int,
        "numExternalUrls": Int,
        "percentUnknownElements": Float
    }
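
To make the format concrete, here is a rough sketch of how a few of these fields could be computed with Python's built-in `html.parser`. It is not BURP's implementation: `FeatureCounter` and `analyze_html` are hypothetical, only six tag counts plus `numCharacters` and `percentWhitespace` are covered, and treating `percentWhitespace` as a 0-100 percentage of all characters (markup included) is an assumption.

```python
from html.parser import HTMLParser


class FeatureCounter(HTMLParser):
    """Count a few tag-based features named in the analyze() output.

    Hypothetical sketch: BURP's own definitions of these fields may differ.
    HTMLParser lowercases tag and attribute names before calling us.
    """

    def __init__(self):
        super().__init__()
        self.counts = {"numIframes": 0, "numScripts": 0, "numEmbeds": 0,
                       "numObjects": 0, "numHyperlinks": 0, "numMetaRefresh": 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "iframe":
            self.counts["numIframes"] += 1
        elif tag == "script":
            self.counts["numScripts"] += 1
        elif tag == "embed":
            self.counts["numEmbeds"] += 1
        elif tag == "object":
            self.counts["numObjects"] += 1
        elif tag == "a" and "href" in attrs:
            self.counts["numHyperlinks"] += 1
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.counts["numMetaRefresh"] += 1


def analyze_html(html):
    """Return a partial feature dictionary for an HTML string."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    features = dict(parser.counts)
    features["numCharacters"] = len(html)
    whitespace = sum(1 for ch in html if ch.isspace())
    # Assumed to be a 0-100 percentage over the raw string, markup included
    features["percentWhitespace"] = 100.0 * whitespace / len(html) if html else 0.0
    return features
```

An event-driven parser like this makes a single pass over the document, which keeps the cost linear in the page size even when many fields are counted.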

Running the HTML Test Suite
---------------------------

From the project's root directory, run:

    python setup.py test