Implement name conversion strategy for raw results files #5

zstumgoren · 2013-07-26T18:15:35Z

Create a module to standardize names of raw result files. Raw results will be stored on S3 using the standardized name.

Standardized names should:

be resolvable back to raw file names
encapsulate enough information about the contained results to link up to metadata via API

Naming Convention

See #4 for details on naming convention

Standardization should generate a composite file name that reflects metadata captured in our data admin.

File name components should include:

election date - YYYYMMDD
state - postal code
race type - general, primary-dem, primary runoff-dem, etc.
jurisdiction - OCD ID of the jurisdiction, or geographic area, for which results are provided. For example, a file for MD that contains precinct-level results for Anne Arundel County could use a slugified version of the plain OCD name (e.g. anne_arundel_county).
race_code - denotes the type of race covered in the data file. Optional element that should only be used when a state provides data for a single race in a distinct file. For example, Louisiana provides precinct-level results, by parish, for each race. This field could also be expanded, on a state-by-state basis, to handle arbitrary groupings of results (e.g. separate files for state leg., federal, local).
reporting level - precinct, city, county, state, etc.
file type extension - db, csv, html, json, xml, etc.

Format

File name components separated by double underscores; component sub-parts separated by single underscores.

<YYYYMMDD>__<state>__<race_type>__<jurisdiction>__[<race_code>__]<level>.<ext>

Examples

Louisiana Congressional District 1 precinct level results, Jefferson Davis Parish
20121106__la__general__jefferson_davis_parish__cd_1__precinct.html

Allegany County precinct results for general election (contains multiple race types)
20121106__md__general__allegany_county__precinct.csv
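
A minimal sketch of how the format above could be assembled in Python. The function names (slugify, standardized_name) and their parameters are hypothetical and not part of any existing module; the sketch assumes the caller already has the metadata values in hand and that race_type is passed in its final form (e.g. primary-dem).

```python
import re


def slugify(text):
    """Lowercase text and collapse each run of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def standardized_name(election_date, state, race_type, jurisdiction,
                      level, ext, race_code=None):
    """Build <YYYYMMDD>__<state>__<race_type>__<jurisdiction>__[<race_code>__]<level>.<ext>.

    Hypothetical helper: components are joined with double underscores,
    and the optional race_code is included only when one is given.
    """
    parts = [election_date, state.lower(), race_type.lower(), slugify(jurisdiction)]
    if race_code:
        parts.append(slugify(race_code))
    parts.append(level.lower())
    return "__".join(parts) + "." + ext.lower()


if __name__ == "__main__":
    print(standardized_name("20121106", "MD", "general",
                            "Allegany County", "precinct", "csv"))
```

Because slugify never emits consecutive underscores, the double underscore stays unambiguous as the component separator, which is what would let a name be split back into its parts.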

Implementation

Standardized name should be generated during file download process (in state-specific fetch.py modules).

Each state directory should have a 2-column mappings.txt file that contains standardized name and link to raw result file. The raw link should point to result file located at source agency or to copy of raw file archived on S3. The latter would be used in cases where result files are not scrapable (e.g. if agency provided a database dump).

## mappings.txt ##
standard_name, raw_source_name
20121106__md__general__anne_arundel_county__precinct.csv, http://www.elections.state.md.us/elections/2012/election_data/Anne_Arundel_By_Precinct_2012_General.csv
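
One possible way to read and update such a file, sketched with Python's csv module. The helper names (load_mappings, add_mapping) are hypothetical, and the sketch assumes the file holds a standard_name, raw_source_name header row followed by one mapping per line. Checking for an existing entry before appending means re-running a fetcher would not write duplicates.

```python
import csv
import os


def load_mappings(path):
    """Return {standard_name: raw_source} from a two-column mappings file."""
    mappings = {}
    if not os.path.exists(path):
        return mappings
    with open(path, newline="") as f:
        # skipinitialspace tolerates the space after the comma
        for row in csv.reader(f, skipinitialspace=True):
            # Skip the header row and anything that is not a two-column line
            if len(row) != 2 or row[0] == "standard_name":
                continue
            mappings[row[0]] = row[1]
    return mappings


def add_mapping(path, standard_name, raw_source):
    """Append a mapping unless the standardized name is already recorded.

    Returns True if a line was written, False if the name was already present.
    """
    if standard_name in load_mappings(path):
        return False
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        if is_new:
            writer.writerow(["standard_name", "raw_source_name"])
        writer.writerow([standard_name, raw_source])
    return True
```

Looking up a standardized name in the dict returned by load_mappings gives the raw source link, which is one way to keep standardized names resolvable back to raw file names.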

dwillis · 2013-07-26T18:20:35Z

Would specify that race_code should only be used when files are specific to single race.

zstumgoren · 2013-07-26T18:36:26Z

Yep, for the time being that's the way to go. If necessary down the road, we could expand its usage to account for arbitrary partitioning of results, for example if a state partitioned results by race type into separate files for state leg, federal, local. Can't think of any examples of that right now, so we can deal with that on a state-by-state basis if the need arises. Meantime, I'll tweak the note next to the race_code field.

dwillis · 2013-08-09T22:00:11Z

What about where a file contains not a single jurisdiction but multiple ones? For example, MD's results by state legislative district are in a single file with all districts. Proposing something like: 20121106__md__general__state_legislative.csv

dwillis · 2013-08-09T22:43:23Z

Where in the process should the writing to mappings.txt occur? What happens when we run the fetcher again - should the script check the file to see if the mapping is already in there? If so, should we think about moving away from text files?

ghing · 2014-10-03T02:13:33Z

@zstumgoren, @dwillis It seems like this can be closed as it's been addressed by the Datasource API.

ghost assigned zstumgoren Jul 26, 2013

dwillis closed this as completed Oct 3, 2014
