Standardization should generate a composite file name that reflects metadata captured in our data admin.
File name components should include:
election date - YYYYMMDD
state - postal code
race type - general, primary-dem, primary runoff-dem, etc.
jurisdiction - OCD id of the jurisdiction, or geographic area, for which results are provided. For example, a file for MD that contains precinct-level results for Anne Arundel County could use a slugified version of the plain OCD name
race_code - denotes the type of race covered in the data file. Optional element that should only be used when a state provides data for a single race in a distinct file. For example, Louisiana provides precinct-level results, by parish, for each race. This field could also be expanded, on a state-by-state basis, to handle arbitrary groupings of results (e.g. separate files for state leg., federal, local).
reporting level - precinct, city, county, state, etc.
file type extension - db, csv, html, json, xml, etc.
Format
File name components separated by double underscores; component sub-parts separated by single underscores.
Louisiana Congressional District 1 precinct level results, Jefferson Davis Parish
20121106__la__general__jefferson_davis_parish__cd_1__precinct.html
Allegany County precinct results for general election (contains multiple race types)
20121106__md__general__allegany_county__precinct.csv
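As a rough illustration of the format described above, the components could be assembled along these lines. This is only a sketch: the function names `slugify` and `standardized_name` and their parameters are hypothetical, not part of any existing module. The race type is passed through unchanged because the convention does not say how values such as primary-dem should be written inside a file name.

```python
import re


def slugify(text):
    """Lowercase text and join runs of letters/digits with single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def standardized_name(election_date, state, race_type, jurisdiction,
                      reporting_level, extension, race_code=None):
    """Build a standardized file name (hypothetical helper, not project code)."""
    if not re.fullmatch(r"\d{8}", election_date):
        raise ValueError("election_date must be YYYYMMDD")
    components = [
        election_date,           # YYYYMMDD
        state.lower(),           # postal code
        race_type,               # assumption: passed through unchanged
        slugify(jurisdiction),   # sub-parts joined by single underscores
    ]
    if race_code:                # optional component
        components.append(slugify(race_code))
    components.append(reporting_level)
    # Components are joined by double underscores, then the extension is added.
    return "__".join(components) + "." + extension


name = standardized_name("20121106", "MD", "general", "Allegany County",
                         "precinct", "csv")
print(name)  # 20121106__md__general__allegany_county__precinct.csv
```

Keeping race_code as an optional keyword argument mirrors its optional status in the component list.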
Implementation
Standardized name should be generated during file download process (in state-specific fetch.py modules).
Each state directory should have a 2-column mappings.txt file that contains standardized name and link to raw result file. The raw link should point to result file located at source agency or to copy of raw file archived on S3. The latter would be used in cases where result files are not scrapable (e.g. if agency provided a database dump).
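A minimal sketch of how the 2-column mappings.txt file might be maintained follows. The tab delimiter, the column order (standardized name first, raw link second), and the function names `load_mappings` and `record_mapping` are all assumptions made for illustration. Because the fetch step may be run more than once, the sketch rewrites the file only when a mapping is new or has changed.

```python
import csv
import os


def load_mappings(path):
    """Read mappings.txt into a dict of {standardized_name: raw_url}."""
    mappings = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) == 2:  # skip blank or malformed lines
                    mappings[row[0]] = row[1]
    return mappings


def record_mapping(path, standardized_name, raw_url):
    """Add or update one mapping; return True if the file was changed."""
    mappings = load_mappings(path)
    if mappings.get(standardized_name) == raw_url:
        return False  # already recorded, so a repeated fetch is a no-op
    mappings[standardized_name] = raw_url
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for name in sorted(mappings):
            writer.writerow([name, mappings[name]])
    return True
```

Rewriting the whole file keeps one line per standardized name, which is reasonable while each state has a modest number of result files.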
Yep, for the time being that's the way to go. If necessary down the road, we could expand its usage to account for arbitrary partitioning of results, for example if a state partitioned results by race type into separate files for state leg., federal, and local. I can't think of any examples of that right now, so we can deal with it on a state-by-state basis if the need arises. In the meantime, I'll tweak the note next to the race_code field.
What about where a file contains not a single jurisdiction but multiple ones? For example, MD's results by state legislative district are in a single file with all districts. Proposing something like: 20121106__md__general__state_legislative.csv
Where in the process should the writing to mappings.txt occur? What happens when we run the fetcher again - should the script check the file to see if the mapping is already in there? If so, should we think about moving away from text files?
Create a module to standardize names of raw result files. Raw results will be stored on S3 using the standardized name.
Standardized names should:
Naming Convention
See #4 for details on naming convention