N.B.: I'm currently going through, organising, and tidying my code as I upload it, so this repo is still a work in progress that I update (reasonably) regularly.


Codebase for Phishalytics: the measurement infrastructure system I designed and built to research phishing and malware attacks on Twitter during my PhD studies at Royal Holloway, University of London.



phishalytics terminal screenshot

Phishalytics is operated over an SSH connection in a terminal window. The server-side interface uses GNU Screen. The screenshot above shows Phishalytics during one of our measurement studies. The layout consists of 18 windows: 16 small and 2 large. The two larger windows display a development area and the system monitor (the htop command, showing CPU and RAM usage, top processes, etc.).

The 16 smaller windows in the above screenshot, labelled s1 to s16, show the following:



  • gglsbl: Python client library for Google Safe Browsing Update API v4
  • tweepy: Python client library for Twitter API


Core Services

Used to run the main measurement study experiments:

| Service name | Description | File |
| --- | --- | --- |
| Twitter Stream | Streams public tweets that contain URLs via Twitter's filter stream API and saves them to a local database | |
| Twitter Sample Stream | Streams a small random sample of all public tweets via Twitter's sample stream API and saves them to a local database | |
| Update GSB | Updates our local copy of the GSB blacklist using the Safe Browsing Update API (v4) | |
| Update Phishtank and Openphish | Updates our local copies of the Openphish and Phishtank blacklists | |
| Comprehensive GSB Twitter Lookup | Looks up all URLs tweeted since the measurement experiment began in the GSB blacklist. Gets progressively slower as the experiment duration increases | |
| Fast GSB Twitter Lookup | Looks up all URLs tweeted in the past 24 hours (approx. 1 million) in the GSB blacklist | |
| Comprehensive PT and OP Twitter Lookup | Looks up all URLs tweeted since the measurement experiment began in both the Openphish and Phishtank blacklists | |
| Fast PT and OP Twitter Lookup | Looks up all URLs tweeted in the past 24 hours (approx. 1 million) in the Openphish and Phishtank blacklists | |
| GSB Timestamp Lookup | Looks up timestamps for when URLs were added to GSB | |
| Twitter Search API Lookup | Determines when blacklisted URLs were first tweeted, using Twitter's search API | |
| Trending Hashtags | Retrieves and saves the current global trending hashtags from Twitter's trends/place API. Uses WOEID=1 for the global location | |
| Post Twitter Collection Processing | Computes and saves metadata such as redirection chains, number of URL hops, landing page URL, Levenshtein distance, whether trending hashtags were used, etc. | |
| Compare GSB Updates | Calculates the size of the GSB blacklist on each update and across versions | |
| Status Monitor | Checks that everything is functioning correctly, that all feeds are live, etc. Sends error notification emails to the admin if there is any problem | |
| Trending Hashtags London | Prints a list of currently trending hashtags in London, UK. Updates every 30 seconds | |
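
To make the post-collection metadata more concrete, here is a minimal sketch of how the number of URL hops, the landing page URL, and a Levenshtein distance could be derived from a redirection chain that has already been resolved. It is not the code used in Phishalytics: the function names `levenshtein` and `chain_metadata` are hypothetical, and comparing the first and last hostnames in the chain is an assumption, since the table does not say which strings the distance is computed between.

```python
from urllib.parse import urlsplit


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def chain_metadata(chain: list[str]) -> dict:
    """Summarise a redirection chain given as an ordered list of URLs.

    Assumption: chain[0] is the tweeted URL and chain[-1] is the landing page.
    """
    first_host = urlsplit(chain[0]).hostname or ""
    landing_host = urlsplit(chain[-1]).hostname or ""
    return {
        "num_hops": len(chain) - 1,
        "landing_url": chain[-1],
        # Assumption: distance is measured between the two hostnames.
        "host_distance": levenshtein(first_host, landing_host),
    }
```

For example, `chain_metadata(["https://t.co/abc", "https://bit.ly/xyz", "https://example.com/login"])` reports two hops and `https://example.com/login` as the landing page.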

Other Services

Experimental setups, tests, supporting system, etc:

| Service name | Description | File |
| --- | --- | --- |
| ASCII Text | Used at the ISG open day stall to showcase my measurement infrastructure. Displays ASCII text of the project title and authors in the main GNU Screen window, whilst experiments ran in other windows. Requires asciimatics | |
| Bitly Click Stats | Leverages the Bitly API to access click stats for Bitly URLs collected via Twitter's Stream API | |
| CertStream | Leverages CertStream-Python (a library to see SSL certs as they're issued live) to create a dataset of potentially suspicious SSL domain certificates. For later verification with | |
| CertStream Blacklist URL Lookup | Checks the existing dataset of potentially suspicious SSL domain certificates for blacklist membership | |
| Compare GSB URL Hash Prefixes | Compares URL hash prefixes in the GSB blacklist to determine hash collisions and unique URL hashes | |
| Compare to Blacklists | Counts the total number of blocked URLs that also appear in GSB, OP, or PT | |
| Count Num Domains | Counts (and extracts) the total domain names in the tweeted URLs dataset | |
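
As background for the hash prefix comparison: the Safe Browsing Update API (v4) distributes truncated SHA-256 hashes of URL expressions, with prefixes between 4 and 32 bytes long, so two different expressions can share the same prefix. The sketch below groups expressions by prefix to find such collisions. It is an illustration only, not the Phishalytics implementation: `hash_prefix` and `prefix_collisions` are hypothetical names, and the inputs are assumed to be already canonicalised host/path expressions (canonicalisation is not shown).

```python
import hashlib
from collections import defaultdict


def hash_prefix(expression: str, length: int = 4) -> bytes:
    """First `length` bytes of the SHA-256 digest of a URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:length]


def prefix_collisions(expressions, length: int = 4) -> dict:
    """Map each shared prefix to the distinct expressions that produce it.

    Assumption: `expressions` are already canonicalised, e.g. "example.com/".
    """
    groups = defaultdict(set)
    for expression in expressions:
        groups[hash_prefix(expression, length)].add(expression)
    # Only prefixes produced by more than one distinct expression collide.
    return {
        prefix: sorted(members)
        for prefix, members in groups.items()
        if len(members) > 1
    }
```

With the default 4-byte prefixes, collisions are rare in small inputs. Passing a shorter `length` (for instance 1) is a convenient way to see the grouping behaviour on a handful of test strings.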


Research papers that feature results obtained with Phishalytics:

Winner of the Best Paper and Best Student Paper awards:
BELL, S., AND KOMISARCZUK, P. "Measuring the Effectiveness of Twitter’s URL Shortener (t.co) at Protecting Users from Phishing and Malware Attacks". In Proceedings of the Australasian Computer Science Week Multiconference (2020). Link to paper.

BELL, S., AND KOMISARCZUK, P. "An Analysis of Phishing Blacklists: Google Safe Browsing, OpenPhish, and PhishTank". In Proceedings of the Australasian Computer Science Week Multiconference (2020). Link to paper.

BELL, S., PATERSON, K., AND CAVALLARO, L. "Catch Me (On Time) If You Can: Understanding the Effectiveness of Twitter URL Blacklists". arXiv preprint arXiv:1912.02520 (2019). Link to paper.

