The Patent-Analytics project demonstrates how to use the HPCC Systems
platform to build an application that analyzes USPTO patent filings.
The data was obtained by downloading the USPTO Patent Filings from the
Google repository. See:
The bibliography files are small and redundant, but they provide another
list so that I can check for completeness.
Optional early patents (back to 1921), estimated at about 30 GBytes;
the data is not compressed. This is very dirty data, from an OCR of paper
Currently, only the machine-readable filings are used.