Scraper and parser of Cleveland City Council's records and the produced text.

drocer

Cleveland City Council's records are available as PDFs from 1996 to the present.

As PDFs, there are many things you cannot do with them: search across multiple files at once, easily extract data for further analysis, browse legislation by topic, and more.

In short, PDFs make it very difficult for programmers, journalists, and researchers to use the data inside them.

What's here:

The year folders (e.g. 2017) contain plain text files extracted from the PDFs of The City Record, Cleveland City Council's legislative notes. These files are not recommended for reading. They contain small errors, such as misspellings and improper spacing (especially in lists), and the tables and graphics in The City Record are missing from them. Check the original PDFs, or download them in bulk by year, to verify whether any information is missing.
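Because the year folders hold plain text, the multi-file search that PDFs make difficult takes only a few lines of code. The sketch below is an illustration, not part of this repository: the function name `search_city_record` is hypothetical, and it assumes that four-digit year folders sit directly under the repository root and that every file inside them is UTF-8 plain text.

```python
import re
from pathlib import Path


def search_city_record(root, pattern, years=None):
    """Yield (path, line_number, line) for each line matching `pattern`.

    Hypothetical helper. Assumes each year's extracted text lives in a
    folder named after the year (e.g. "2017") directly under `root`, and
    treats every regular file in those folders as plain text.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for year_dir in sorted(Path(root).iterdir()):
        # Only descend into four-digit year folders such as "1996" or "2017";
        # code folders like "depts" or "pdfminer" are skipped.
        if not (year_dir.is_dir() and re.fullmatch(r"\d{4}", year_dir.name)):
            continue
        # Optionally restrict the search to a set of years, e.g. {2016, 2017}.
        if years is not None and int(year_dir.name) not in years:
            continue
        for path in sorted(p for p in year_dir.rglob("*") if p.is_file()):
            # errors="replace" keeps the search going past stray bytes
            # left over from PDF extraction.
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield path, lineno, line.strip()
```

For example, `search_city_record(".", r"demolition", years={2016, 2017})` would yield every line mentioning demolition in those two years, case-insensitively. Any match should still be checked against the original PDF, since the extracted text has the errors described above.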

Additional documentation can be found in:

city-record-contents-formatting.md - the boilerplate text used in The City Record and what it means.

Learn more about our goals, project roadmap, and how you can help at:

http://www.opencleveland.org/projects/drocer/

https://github.com/opencleveland/drocer/wiki