Implement name conversion strategy for raw results files #5
Create a module to standardize names of raw result files. Raw results will be stored on S3 using the standardized name.
Standardized names should:
See #4 for details on naming convention
Standardization should generate a composite file name that reflects metadata captured in our data admin.
File name components should include:
File name components should be separated by double underscores; sub-parts within a component should be separated by single underscores.
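As a rough illustration of the separator rule, here is a minimal sketch. The function names, the cleaning rule (lowercase, letters and digits only), and the example components are all assumptions made for illustration; the actual components and their order are defined by the naming convention in #4, not by this snippet.

```python
import re


def clean_subpart(text):
    # Assumption: sub-parts are lowercased and stripped of anything that is
    # not a letter or digit, so a sub-part never contains an underscore.
    return re.sub(r"[^a-z0-9]+", "", str(text).lower())


def build_component(subparts):
    # Sub-parts of a single component are joined with ONE underscore.
    cleaned = [clean_subpart(s) for s in subparts]
    return "_".join(c for c in cleaned if c)


def standardized_name(components, extension):
    # Components are joined with TWO underscores; empty ones are skipped.
    built = [build_component(c) for c in components]
    return "__".join(b for b in built if b) + "." + extension.lstrip(".")


# Hypothetical components, for illustration only.
example = standardized_name(
    [["20121106"], ["MD"], ["general"], ["State", "Senate"]], "csv"
)
print(example)
```

Because sub-parts can never contain an underscore after cleaning, a name built this way can be split back into components on `__` and into sub-parts on `_` without ambiguity.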
The standardized name should be generated during the file download process (in the state-specific fetch.py modules).
Each state directory should have a two-column mappings.txt file that contains the standardized name and a link to the raw result file. The raw link should point either to the result file at the source agency or to a copy of the raw file archived on S3. The latter would be used in cases where result files are not scrapable (e.g. if the agency provided a database dump).
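One possible shape for mappings.txt is sketched below. The issue does not specify a delimiter or column order, so the tab separator, the name-then-link order, the helper names, and the example URLs are all assumptions.

```python
import csv
import io


def write_mappings(rows, fileobj):
    # Assumption: one tab-separated line per file, standardized name first,
    # raw link second.
    writer = csv.writer(fileobj, delimiter="\t", lineterminator="\n")
    for name, raw_link in rows:
        writer.writerow([name, raw_link])


def read_mappings(fileobj):
    # Return {standardized_name: raw_link}, ignoring malformed lines.
    reader = csv.reader(fileobj, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}


# Placeholder names and links, for illustration only.
rows = [
    ("20121106__md__general.csv", "https://example.gov/results/general.csv"),
    ("20120403__md__primary.csv", "s3://example-bucket/raw/md/primary_dump.zip"),
]

buffer = io.StringIO()  # stands in for a state's mappings.txt
write_mappings(rows, buffer)
buffer.seek(0)
mappings = read_mappings(buffer)
```

A fetch.py module could then loop over such a mapping and save each raw link under its standardized name, whether the link points at the source agency or at an S3 copy.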
Yep, for the time being that's the way to go. If necessary down the road, we could expand its usage to account for arbitrary partitioning of results, for example if a state partitioned results by race type into separate files for state legislative, federal, and local races. I can't think of any examples of that right now, so we can deal with it on a state-by-state basis if the need arises. In the meantime, I'll tweak the note next to the