NYCEDC Newsletter Prose Data Scraper

This project uses regular expressions (written in PHP, though any language would work) to extract data from the NYC EDC newsletters.

See script run.

Process:

First, we extracted the text from the PDF files using a Mac "Get Text" tool. This was for expediency; the project was originally intended to run in Ruby against text returned from Tabula that was not converted into charts.

Second, a set of regular expressions was written (and then converted to PHP) to convert the textual indicators in the monthly reports into CSV output that can be useful to the EDC team and the larger community.
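As a rough illustration of this step, the sketch below pulls one indicator out of a sentence and writes it as a CSV row. The sample sentence, its figures, the pattern, and the column names are all made up for illustration; they are not taken from the newsletters or from getdatafromEDCnewsletters.php.

```php
<?php
// Hypothetical sample sentence with made-up figures; real newsletter wording will differ.
$text = "The unemployment rate in New York City was 8.4 percent in June 2013.";

// Illustrative pattern: capture the percentage value, the month, and the year.
$pattern = '/unemployment rate .*? was (\d+(?:\.\d+)?) percent in ([A-Z][a-z]+) (\d{4})/';

$rows = [];
if (preg_match($pattern, $text, $m) === 1) {
    // Column names here are assumptions, not the script's actual output schema.
    $rows[] = [
        'indicator' => 'unemployment_rate',
        'value'     => $m[1],
        'month'     => $m[2],
        'year'      => $m[3],
    ];
}

// Write a header and the rows as CSV to an in-memory stream, then read it back.
$out = fopen('php://memory', 'w+');
fputcsv($out, ['indicator', 'value', 'month', 'year'], ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, array_values($row), ',', '"', '\\');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```

In practice the stream would presumably point at a file rather than `php://memory`, and one pattern would be needed per indicator.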

We analyzed the discrepancies in descriptions from year to year to account for changes in the descriptions and summaries. Coverage included 2005-2013.
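One way to cope with wording that shifts between years is to keep several patterns per indicator and use the first one that matches. The sketch below assumes two invented phrasings and made-up figures; the helper name `extractFirst` is hypothetical and this is not necessarily how the script handles it.

```php
<?php
// Hypothetical wording variants for a single indicator; actual phrasings are assumptions.
$variants = [
    '/unemployment rate (?:was|stood at) (\d+(?:\.\d+)?) ?(?:percent|%)/i',
    '/jobless rate of (\d+(?:\.\d+)?) ?(?:percent|%)/i',
];

// Try each pattern in order and return the first captured value, or null if none match.
function extractFirst(array $patterns, string $text): ?string
{
    foreach ($patterns as $p) {
        if (preg_match($p, $text, $m) === 1) {
            return $m[1];
        }
    }
    return null;
}

// Made-up sample sentences standing in for two different years of newsletters.
$samples = [
    2005 => "The city's unemployment rate was 5.8 percent.",
    2013 => "NYC reported a jobless rate of 8.4%.",
];

$results = [];
foreach ($samples as $year => $text) {
    $results[$year] = extractFirst($variants, $text);
}

print_r($results);
```

Returning `null` for unmatched text makes it easy to list the reports whose wording is not yet covered by any variant.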

Thanks for the opportunity.