An extensive data cleaning/engineering project. Final result used to create /coloradoplot dashboard.

ryayoung/crime-data-engineering

Crime data engineering

(Combined notebook code here - auto-generated file with all notebooks combined)

The goal is to create a single pair of datasets (one by county, one by school district) that a web front-end (docs & source here, live app here) can use to visualize a wide variety of geocoded public Colorado data on a map. The resulting data has ~350 geocoded metrics for each county for 8 consecutive years, and ~140 geocoded metrics for each school district.

The problem

  • Public data for Colorado on data.colorado.gov comes from a variety of sources.
  • It is NOT clean
  • It does NOT have consistently formatted keys upon which we can join
  • Most of it is NOT yet geocoded
  • Each dataset has some version of a 'county' or 'district' column, but the formatting differs from one dataset to the next
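
To make the key problem concrete, here is a minimal sketch of what happens when two sources label the same county differently. The rows, column names, and values are invented for illustration and are not taken from the project's data.

```python
# Hypothetical rows from two sources. The labels and numbers are made up;
# they only illustrate inconsistent county formatting.
crime_rows = [{"county": "ADAMS COUNTY", "year": 2015, "crimes": 120}]
income_rows = [{"County Name": "Adams", "year": 2015, "median_income": 61000}]

# Build the (county, year) join keys exactly as they appear in each source.
crime_keys = {(row["county"], row["year"]) for row in crime_rows}
income_keys = {(row["County Name"], row["year"]) for row in income_rows}

# The raw keys share nothing, so a join on them would match no rows.
shared_keys = crime_keys & income_keys
print(shared_keys)  # set()
```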

This project will:

  • Create a consistent naming format for counties and districts, and a repeatable way to convert new columns to this format
  • Get geographic boundary vectors and center coordinates for each district and each county
  • Clean and engineer source data for analysis/visualization
  • Join all data for counties and districts respectively by year, including geographic vectors
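
The first and last steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the notebooks' actual code: `normalize_county` and `join_by_county_year` are hypothetical helpers, the normalization rule (lowercase, drop a "county" suffix) is a guess at one reasonable format, and the sample rows are invented.

```python
import re


def normalize_county(name: str) -> str:
    """Hypothetical rule: lowercase, drop non-letters and the word 'county'."""
    cleaned = re.sub(r"[^a-z\s]", " ", name.lower())
    cleaned = re.sub(r"\bcounty\b", " ", cleaned)
    return " ".join(cleaned.split())  # collapse repeated whitespace


def join_by_county_year(sources):
    """Merge (rows, county_column) pairs into one record per (county, year)."""
    merged = {}
    for rows, county_column in sources:
        for row in rows:
            key = (normalize_county(row[county_column]), row["year"])
            record = merged.setdefault(key, {"county": key[0], "year": key[1]})
            # Copy every metric column; the key columns are already stored.
            for column, value in row.items():
                if column not in (county_column, "year"):
                    record[column] = value
    return merged


# Invented sample rows with differently formatted county columns.
crime_rows = [{"county": "ADAMS COUNTY", "year": 2015, "crimes": 120}]
income_rows = [{"County Name": "Adams", "year": 2015, "median_income": 61000}]

merged = join_by_county_year([(crime_rows, "county"), (income_rows, "County Name")])
print(merged[("adams", 2015)])
```

Keeping the normalization in one function means any new source column can be passed through the same rule before joining. Geographic vectors could be attached the same way, as one more source keyed by the normalized name.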

To run this project yourself, run the notebooks in numeric order, 1 through 7.
