Richmond Sunlight Video Processor

The video OCR processor for Richmond Sunlight.



This downloads video from the Virginia General Assembly's floor-session video archive and subjects it to various types of analysis. At this writing, that includes OCRing the on-screen chyrons, facial recognition, and closed-caption extraction. To come: voice pitch analysis and improved facial recognition.


The video processor was put together, piece by piece, over a decade, as a series of Bash and PHP scripts. This is an effort to consolidate those and turn them into their own project. At the moment, it's still a series of Bash and PHP scripts, lashed together with twine, but isolating them as their own project will make it easier to standardize and improve them.


It lives on a compute-optimized EC2 instance. Source updates are delivered via Travis CI -> S3, from which the instance pulls updates on boot. (Note that the includes/ directory is pulled from the deploy branch of repository on each build.) The instance is stopped by default, and is only started once rs-machine identifies that a new video is available. rs-machine communicates this information via SQS, though it starts the rs-video-processor EC2 instance directly. rs-video-processor takes the first entry from SQS, runs it through its processing pipeline, and keeps looping over SQS entries for as long as any remain. When the queue is empty, it shuts itself down.
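
The queue-draining loop described above can be sketched roughly as follows. This is an illustration only, not the project's code: the real processor is Bash and PHP, and the names `drain_queue`, `process_video`, `shut_down`, and `InMemoryQueue` are all hypothetical. The `receive_message` and `delete_message` keyword arguments follow the shape of the boto3 SQS client, and the sketch assumes each message body identifies one video. `InMemoryQueue` is a stand-in so the sketch runs without AWS credentials.

```python
from typing import Callable


def drain_queue(sqs, queue_url: str,
                process_video: Callable[[str], None],
                shut_down: Callable[[], None]) -> int:
    """Process queue entries one at a time until none remain, then shut down.

    `sqs` is any object exposing receive_message/delete_message with
    boto3-style keyword arguments. Returns the number of entries processed.
    """
    processed = 0
    while True:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,  # take only the first available entry
            WaitTimeSeconds=0,
        )
        # An empty queue yields a response with no "Messages" key.
        messages = response.get("Messages", [])
        if not messages:
            break
        message = messages[0]
        process_video(message["Body"])
        # Delete only after processing succeeds, so a failed video is retried.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    shut_down()  # hypothetical hook; on the instance this would stop the machine
    return processed


class InMemoryQueue:
    """Hypothetical stand-in for an SQS client, for local experimentation."""

    def __init__(self, bodies):
        # Pair each body with a unique receipt handle.
        self._pending = [(str(i), body) for i, body in enumerate(bodies)]

    def receive_message(self, **kwargs) -> dict:
        if not self._pending:
            return {}
        handle, body = self._pending[0]
        return {"Messages": [{"Body": body, "ReceiptHandle": handle}]}

    def delete_message(self, **kwargs) -> dict:
        handle = kwargs["ReceiptHandle"]
        self._pending = [(h, b) for h, b in self._pending if h != handle]
        return {}


if __name__ == "__main__":
    queue = InMemoryQueue(["house-session.mp4", "senate-session.mp4"])
    drain_queue(queue, "https://example.invalid/queue",
                process_video=lambda body: print("processing", body),
                shut_down=lambda: print("queue empty, shutting down"))
```

Deleting a message only after `process_video` returns means that an entry whose processing raises an error stays in the queue. With a real SQS queue it would become visible again after the visibility timeout, rather than being lost.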
