Traptor -- A distributed Twitter feed
For Easy Integration of DataDog and LogFactory
This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
Master Ansible repository
Package for topic modelling plus automated text preprocessing
An Arabic Snowball stemmer suited to light stemming.
Kafka Consumer Lag Checking
Web interface to bumblebee app
Docker with Polipo proxy and Tor
Pics or it didn't happen. A robust library for rendering HTML to PNG/JPG using PhantomJS.
This repository hosts code and schema information related to the Memex Crawl Data Repository (CDR)
Tool to view the current Pulse Theme Bootstrap components, and easily modify the theme if necessary.
Simple webapp for mocking requests
Package to facilitate URL clustering
Access Java classes from Python
Automated regex generator for URL groups
A project that attempts to automatically log in to a website given a single seed
Simple bootstrap for a basic Storm development platform
Source code for readthedocs.org
Memex Data Dashboard
Interactive Graph Visualization
GitHub repo for automated builds of phpMyAdmin on Docker Hub.