Comparing changes

base fork: eecs-354-burp/BURP
base: 2aac43ce0a
...
head fork: eecs-354-burp/BURP
compare: 053014fdc2
  • 2 commits
  • 3 files changed
  • 0 commit comments
  • 1 contributor
Showing with 82 additions and 75 deletions.
  1. +82 −0 README.md
  2. +0 −36 burp/html/.gitignore
  3. +0 −39 burp/url/.gitignore
82 README.md
@@ -0,0 +1,82 @@
+BURP
+====
+
+The Better URL Reputation Platform
+
+By Khalid Aziz, Peter Li, Christopher Moran, and Ethan Romba
+
+Installation
+------------
+
+1. Install [Weka](http://www.cs.waikato.ac.nz/ml/weka/) 3.7.7+
+
+2. Follow [these instructions](http://weka.wikispaces.com/CLASSPATH) to add the weka.jar file to your CLASSPATH
+
+3. Install the [BURP fork of python-whois](https://github.com/eecs-354-burp/python-whois):
+
+    git clone https://github.com/eecs-354-burp/python-whois
+    cd python-whois
+    python setup.py install
+
+4. Install BURP:
+
+    git clone https://github.com/eecs-354-burp/BURP
+    cd BURP
+    python setup.py install
+
+Usage
+-----
+
+### BURP
+
+Run BURP from the command line, passing the URL you would like to classify:
+
+    burp [URL]
+
+### HTML Analyzer
+
+The BURP HTML analyzer is optimized for retrieving and analyzing HTML from URLs:
+
+    from burp.html import HTMLAnalyzer
+    analyzer = HTMLAnalyzer(url)
+    analysis = analyzer.analyze()
+    ...
+    analyzer.loadUrl(url2)
+    analysis2 = analyzer.analyze()
+    ...
+
+To analyze an HTML string directly, load it with `loadHtml()` and be sure to call the `setUrl()` method with the URL the HTML originated from:
+
+    from burp.html import HTMLAnalyzer
+    html = '<html>Hello World!</html>'
+    analyzer = HTMLAnalyzer()
+    analyzer.loadHtml(html)
+    analyzer.setUrl('http://www.example.com')
+    analysis = analyzer.analyze()
+
+The `analyze()` method returns a dictionary with the following format:
+
+    {
+      "numCharacters": Int,
+      "percentWhitespace": Float,
+      "percentScriptContent": Float,
+      "numIframes": Int,
+      "numScripts": Int,
+      "numScriptsWithWrongExtension": Int,
+      "numEmbeds": Int,
+      "numObjects": Int,
+      "numSuspiciousObjects": Int,
+      "numHyperlinks": Int,
+      "numMetaRefresh": Int,
+      "numHiddenElements": Int,
+      "numSmallElements": Int,
+      "hasDoubleDocuments": Bool,
+      "numUnsafeIncludedUrls": Int,
+      "numExternalUrls": Int,
+      "percentUnknownElements": Float
+    }
+
+Running the HTML Test Suite
+---------------------------
+
+    python setup.py test
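
Note on the README hunk above: it lists the keys and types of the dictionary returned by `analyze()`. A caller might want to sanity-check such a dictionary before feeding it to a classifier. The sketch below is one way to do that; `EXPECTED_TYPES`, `check_analysis`, and `_matches` are hypothetical names and not part of BURP, and mapping the README's Int, Float, and Bool to Python's `int`, `float`, and `bool` is an assumption.

```python
# Hypothetical helper, not part of BURP: checks a dictionary against the
# key/type layout listed in the README for HTMLAnalyzer.analyze().
EXPECTED_TYPES = {
    "numCharacters": int,
    "percentWhitespace": float,
    "percentScriptContent": float,
    "numIframes": int,
    "numScripts": int,
    "numScriptsWithWrongExtension": int,
    "numEmbeds": int,
    "numObjects": int,
    "numSuspiciousObjects": int,
    "numHyperlinks": int,
    "numMetaRefresh": int,
    "numHiddenElements": int,
    "numSmallElements": int,
    "hasDoubleDocuments": bool,
    "numUnsafeIncludedUrls": int,
    "numExternalUrls": int,
    "percentUnknownElements": float,
}


def _matches(value, expected):
    """Type check that keeps bool and numeric fields apart."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it for numeric fields.
        return False
    if expected is float:
        # Assumption: a whole-number percentage may arrive as an int.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def check_analysis(analysis):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in analysis:
            problems.append("missing key: " + key)
        elif not _matches(analysis[key], expected):
            problems.append(
                "wrong type for %s: expected %s, got %s"
                % (key, expected.__name__, type(analysis[key]).__name__)
            )
    return problems


# Placeholder values (0, 0.0, False) only, not real analysis output.
sample = {key: expected() for key, expected in EXPECTED_TYPES.items()}
print(check_analysis(sample))
```

In real use, `sample` would be replaced by the result of `analyzer.analyze()`; extra keys are ignored so the check keeps working if the analyzer later reports more features.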
36 burp/html/.gitignore
@@ -1,36 +0,0 @@
-*.py[cod]
-
-# C extensions
-*.so
-
-# Packages
-*.egg
-*.egg-info
-dist
-build
-eggs
-parts
-bin
-var
-sdist
-develop-eggs
-.installed.cfg
-lib
-lib64
-
-# Installer logs
-pip-log.txt
-
-# Unit test / coverage reports
-.coverage
-.tox
-nosetests.xml
-
-# Translations
-*.mo
-
-# Mr Developer
-.mr.developer.cfg
-.project
-.pydevproject
-
39 burp/url/.gitignore
@@ -1,39 +0,0 @@
-# ubuntu saves
-*~
-
-*.py[cod]
-
-# C extensions
-*.so
-
-# Packages
-*.egg
-*.egg-info
-dist
-build
-eggs
-parts
-bin
-var
-sdist
-develop-eggs
-.installed.cfg
-lib
-lib64
-
-# Installer logs
-pip-log.txt
-
-# Unit test / coverage reports
-.coverage
-.tox
-nosetests.xml
-
-# Translations
-*.mo
-
-# Mr Developer
-.mr.developer.cfg
-.project
-.pydevproject
-
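
Note on the two removed .gitignore files: both rely on glob patterns such as `*.py[cod]`, where the bracket matches exactly one of `c`, `o`, or `d`. The sketch below illustrates that matching with Python's `fnmatch` module. It is only an approximation of Git's rules (it ignores directories, slashes, and negation), and `PATTERNS` and `ignored` are hypothetical names chosen for the example.

```python
import fnmatch

# A few patterns copied from the removed .gitignore files.
PATTERNS = ["*.py[cod]", "*.so", "*.egg-info", "*~"]


def ignored(filename, patterns=PATTERNS):
    """Approximate check: does any pattern match the bare file name?"""
    return any(fnmatch.fnmatchcase(filename, p) for p in patterns)


print(ignored("module.pyc"))  # compiled bytecode matches *.py[cod]
print(ignored("module.py"))   # source file matches none of the patterns
```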
