PDF Liberation Hackathon - Federal Communications Commission Challenge
As part of its regular business process, the Federal Communications Commission writes and releases many documents. These documents are public notices, rule-makings, proposed rules and many other prose-based discussions of technical issues relating to spectrum, broadcasting, broadband, media and other communications issues. In general, the legal industry needs these documents to contain not only the proper history, content and technical discussions, but also the standard formatting that the legal industry has developed. This combination of content and formatting fundamentally requires the FCC to release PDF documents. PDF documents, however, offer less than desirable search, retrieval and display.
For this challenge, the FCC is less concerned about data contained within PDFs and more interested in developing conversion techniques that result in a) the exact same text content in a text document (say, Markdown) and b) an ability to save the formatting (e.g., line formatting, legal footnoting, legal notation, etc.) so that it can be applied to said text document. We hope to generate a complete library of all FCC legal documents currently in PDF and restore these as fully web-enabled documents.
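To make the two goals concrete, here is a minimal Python sketch of one possible approach. It is not an FCC-endorsed method. It assumes that some PDF extraction step (not shown) has already produced a list of text runs. The Span class, its kind and indent fields, and both function names are hypothetical and exist only for illustration. The sketch keeps the exact text in one string, stores the formatting separately as character-offset records, and then re-applies a small part of that formatting as Markdown.

```python
import json
from dataclasses import dataclass


@dataclass
class Span:
    """One run of text already pulled from a PDF (hypothetical shape)."""
    text: str
    kind: str = "body"   # assumed labels: "body", "heading", "footnote_ref"
    indent: int = 0      # assumed left indent, in points


def split_content_and_format(spans):
    """Return (plain_text, records).

    plain_text contains only the document's characters, unchanged.
    records list the formatting that was set aside, keyed by character
    offsets into plain_text.
    """
    parts, records, offset = [], [], 0
    for span in spans:
        start = offset
        parts.append(span.text)
        offset += len(span.text)
        # Only record spans that carry formatting beyond plain body text.
        if span.kind != "body" or span.indent:
            records.append({"start": start, "end": offset,
                            "kind": span.kind, "indent": span.indent})
    return "".join(parts), records


def to_markdown(plain_text, records):
    """Re-apply a small subset of the saved formatting as Markdown."""
    out, cursor = [], 0
    for rec in sorted(records, key=lambda r: r["start"]):
        out.append(plain_text[cursor:rec["start"]])
        chunk = plain_text[rec["start"]:rec["end"]]
        if rec["kind"] == "heading":
            out.append("## " + chunk)
        elif rec["kind"] == "footnote_ref":
            out.append("[^" + chunk + "]")
        else:
            out.append(chunk)  # formatting with no Markdown equivalent here
        cursor = rec["end"]
    out.append(plain_text[cursor:])
    return "".join(out)


# Illustrative input only; not taken from a real FCC document.
example_spans = [
    Span("Introduction\n", kind="heading"),
    Span("The Commission adopts this rule."),
    Span("1", kind="footnote_ref"),
    Span("\n"),
]
example_text, example_records = split_content_and_format(example_spans)
example_sidecar = json.dumps(example_records)  # formatting saved on its own
example_markdown = to_markdown(example_text, example_records)
```

Keeping the formatting in a separate record means the plain text can be checked character for character against the source, while the same record could later drive a different output format such as HTML.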