Analyzing Chicago Crimes

In this recipe, we'll learn how to analyze the Chicago Crimes dataset with Apache Pinot and Streamlit.

Pinot Version: 0.9.0
Schema: config/schema.json
Table Config: config/table.json
Ingestion Job: config/job-spec.yml

Clone this repository and navigate to this recipe:

git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/analyzing-chicago-crimes

Download the Chicago Crimes dataset:

curl "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*" -o data/Crimes_-_2001_to_Present.csv

Set up the Python environment:

pipenv shell
pipenv install

Clean up the data so that it's sorted by the Beat column:

python data_cleanup.py
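The contents of data_cleanup.py aren't shown in this recipe. As a rough sketch of what "sorted by the Beat column" involves, the stdlib-only function below reads the CSV and writes it back ordered by Beat. The name `sort_by_beat` is hypothetical, and treating Beat as an integer code is an assumption. A column that is sorted within a segment lets Pinot use a sorted index for it, which is presumably the motivation for this step.

```python
import csv
from typing import TextIO


def sort_by_beat(src: TextIO, dst: TextIO, column: str = "Beat") -> int:
    """Copy CSV rows from src to dst, ordered by the given column.

    Hypothetical helper, not the recipe's data_cleanup.py. Assumes the
    column holds integer beat codes; rows with a blank value go last.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    rows = list(reader)  # loads everything into memory

    def beat_key(row: dict) -> tuple:
        value = (row[column] or "").strip()
        # (0, n) sorts before (1, 0), so blank beats end up at the bottom
        return (0, int(value)) if value else (1, 0)

    rows.sort(key=beat_key)  # stable: ties keep their original order
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

To run it on the downloaded file, open the source with `encoding="utf-8-sig"` (the download URL requests a byte-order mark) and pass `newline=""` when opening both files. Because it holds every row in memory, a chunked or external sort may be a better fit for the full dataset.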

Spin up a Pinot cluster using Docker Compose:

docker-compose up
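The cluster can take a while to start, and the next commands fail if the controller isn't ready. A minimal sketch for waiting on it, assuming the controller's `/health` endpoint is published on `localhost:9000` by the compose file (check the port mapping before relying on it). `wait_for_pinot` is a hypothetical helper; the `opener` parameter exists only so the HTTP call can be swapped out.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_for_pinot(
    url: str = "http://localhost:9000/health",
    timeout: float = 120.0,
    interval: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll a Pinot health endpoint until it returns HTTP 200 or time runs out.

    The default URL assumes the controller's port 9000 is exposed to the host.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused and socket timeouts all derive
            # from OSError: the controller isn't accepting requests yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_pinot()` before AddTable gives up after two minutes by default and returns `False` rather than raising, so a script can decide how to report the failure.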

In another terminal tab, add the crimes table:

docker exec -it manual-pinot-controller-chicago bin/pinot-admin.sh AddTable   \
  -tableConfigFile /config/table.json   \
  -schemaFile /config/schema.json \
  -exec
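AddTable registers the schema and then the table config with the controller. If you'd rather do this from Python, the sketch below builds roughly equivalent requests against the controller's REST endpoints `POST /schemas` and `POST /tables`, assuming both accept a JSON body and that the controller is reachable on `localhost:9000`. `build_add_table_requests` is a hypothetical helper; it only builds the requests and sends nothing.

```python
import json
import urllib.request
from pathlib import Path


def build_add_table_requests(
    controller: str, schema_path: str, table_path: str
) -> list:
    """Build POST requests that register a schema first, then a table config."""
    requests = []
    # Order matters: the table config refers to a schema that must already exist
    for endpoint, path in (("schemas", schema_path), ("tables", table_path)):
        body = Path(path).read_text(encoding="utf-8")
        json.loads(body)  # fail early on malformed JSON before touching the cluster
        requests.append(
            urllib.request.Request(
                f"{controller.rstrip('/')}/{endpoint}",
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests
```

To send them, loop over `build_add_table_requests("http://localhost:9000", "config/schema.json", "config/table.json")` and pass each request to `urllib.request.urlopen`, stopping at the first error.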

Import the Chicago Crimes CSV file into Pinot:

docker exec -it manual-pinot-controller-chicago bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /config/job-spec.yml \
  -values pinot-controller
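Once the ingestion job finishes, a quick row count confirms the data landed. This sketch posts SQL to a Pinot broker's `/query/sql` endpoint and returns the result rows. It assumes the broker is published on `localhost:8099` and that the table is named `crimes` (inferred from the step above; check config/table.json). `query_pinot` is a hypothetical helper, and `opener` is there only so the HTTP call can be replaced.

```python
import json
import urllib.request
from typing import Callable


def query_pinot(
    sql: str,
    broker: str = "http://localhost:8099",
    opener: Callable = urllib.request.urlopen,
) -> list:
    """Send SQL to a Pinot broker's /query/sql endpoint and return the rows."""
    req = urllib.request.Request(
        f"{broker.rstrip('/')}/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        payload = json.load(resp)
    # Query errors come back in an "exceptions" list rather than as HTTP errors
    if payload.get("exceptions"):
        raise RuntimeError(payload["exceptions"])
    return payload["resultTable"]["rows"]
```

For example, `query_pinot("SELECT COUNT(*) FROM crimes")` returns a single-row list such as `[[n]]`, where `n` should match the number of data rows in the CSV.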

Run the Streamlit app:

streamlit run app.py

Navigate to http://localhost:8501/ in your browser.