In [1]:
# HIDDEN
Base.displaysize() = (5, 80)
using DataFrames
using CSV

## Granularity

The granularity of a dataset is what each record in it represents. For example, in the Calls dataset each record represents a single case of a police call.

In [3]:
# HIDDEN
calls = CSV.read("data/calls_julia.csv", DataFrame)

Row,Day,CASENO,OFFENSE,CVLEGEND,BLKADDR
,String,Int64,String,String,String⍰
1,Sunday,17091420,BURGLARY AUTO,BURGLARY - VEHICLE,2500 LE CONTE AVE
2,Sunday,17038302,BURGLARY AUTO,BURGLARY - VEHICLE,BOWDITCH STREET & CHANNING WAY
3,Sunday,17049346,THEFT MISD. (UNDER $950),LARCENY,2900 CHANNING WAY
4,Sunday,17091319,THEFT MISD. (UNDER $950),LARCENY,2100 RUSSELL ST
5,Sunday,17044238,DISTURBANCE,DISORDERLY CONDUCT,TELEGRAPH AVENUE & DURANT AVE
⋮,⋮,⋮,⋮,⋮,⋮


In the Stops dataset, each record represents a single incident of a police stop.

In [4]:
# HIDDEN
stops = CSV.read("data/stops_julia.csv", DataFrame)

Row,Incident Number,Call Date/Time,Location,Incident Type
,String,Dates…,String,String
1,2015-00004825,2015-01-26T00:10:00,SAN PABLO AVE / MARIN AVE,T
2,2015-00004829,2015-01-26T00:50:00,SAN PABLO AVE / CHANNING WAY,T
3,2015-00004831,2015-01-26T01:03:00,UNIVERSITY AVE / NINTH ST,T
4,2015-00004848,2015-01-26T07:16:00,2000 BLOCK BERKELEY WAY,1194
5,2015-00004849,2015-01-26T07:43:00,1700 BLOCK SAN PABLO AVE,1194
⋮,⋮,⋮,⋮,⋮


On the other hand, we could have received the Stops data in the following format:

In [19]:
# HIDDEN
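using Dates
# Keep only the date part of each call timestamp
stops[!, :Call_Date] = Date.(stops[:, Symbol("Call Date/Time")])
# Count the number of stops recorded on each date
combine(groupby(stops, :Call_Date), :Call_Date => length)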

Row,Call_Date,Call_Date_length
,Date,Int64
1,2015-01-26,46
2,2015-01-27,57
3,2015-12-04,41
4,2015-01-28,56
5,2015-11-06,52
⋮,⋮,⋮


In this case, each record in the table corresponds to a single date instead of a single incident. We would describe this table as having a coarser granularity than the one above. It's important to know the granularity of your data because it determines what kinds of analyses you can perform. Generally speaking, a granularity that is too fine is better than one that is too coarse: we can use grouping and pivoting to go from a fine granularity to a coarse one, but we have few tools to go from coarse to fine.
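As a minimal sketch of moving from fine to coarse granularity, the snippet below builds a small made-up incident-level table and collapses it to one row per date. The `incidents` table, its column names, and its values are hypothetical and are not taken from the Stops data.

```julia
using DataFrames
using Dates

# Hypothetical incident-level table: one row per incident (fine granularity).
incidents = DataFrame(
    id = ["A-1", "A-2", "A-3", "A-4", "A-5"],
    time = DateTime.(["2015-01-26T00:10:00", "2015-01-26T00:50:00",
                      "2015-01-26T07:16:00", "2015-01-27T09:30:00",
                      "2015-01-27T21:05:00"]),
)

# Derive the calendar date of each incident.
incidents.date = Date.(incidents.time)

# Aggregate to one row per date (coarse granularity).
daily = combine(groupby(incidents, :date; sort = true), nrow => :n_incidents)
```

The reverse direction has no counterpart: given only `daily`, the individual identifiers and times in `incidents` cannot be recovered.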

## Granularity Checklist

You should have answers to the following questions after looking at the granularity of your datasets. We will answer them for the Calls and Stops datasets.

**What does a record represent?**

In the Calls dataset, each record represents a single case of a police call. In the Stops dataset, each record represents a single incident of a police stop.

**Do all records capture granularity at the same level? (Sometimes a table will contain summary rows.)**

Yes, for both the Calls and Stops datasets.
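One way to probe this question is sketched below on a made-up table. It assumes each record carries an identifier column (here a hypothetical `id` with a letter-dash-digits pattern) that should be unique when every row sits at the same level. Rows that break the assumed pattern are flagged as possible summary rows.

```julia
using DataFrames

# Hypothetical table with a summary row mixed in with incident-level rows.
records = DataFrame(
    id = ["A-1", "A-2", "A-3", "TOTAL"],
    count = [1, 1, 1, 3],
)

# If each row is a single incident, the identifier should never repeat.
ids_unique = allunique(records.id)

# Rows whose identifier does not match the assumed pattern are candidates
# for summary rows and deserve a closer look.
suspect = filter(:id => id -> !occursin(r"^[A-Z]-\d+$", id), records)
```

Unique identifiers alone do not settle the question, which is why the sketch also inspects the identifier format. A real dataset would need its own pattern or rule.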

**If the data were aggregated, how was the aggregation performed? Sampling and averaging are common aggregations.**

As far as we can tell, no aggregations were performed on either dataset. We do keep in mind that both datasets record the location as a block rather than a specific address.

**What kinds of aggregations can we perform on the data?**

For example, it's often useful to aggregate individual people to demographic groups or individual events to totals across time.

In this case, we can aggregate across various granularities of date or time. For example, by aggregating we can find the hour of day with the most incidents. We might also be able to aggregate across event locations to find the regions of Berkeley with the most incidents.
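The hour-of-day aggregation could look roughly like the following sketch. It uses a made-up `incidents` table with a hypothetical `time` column, so the counts here say nothing about the actual Stops data.

```julia
using DataFrames
using Dates

# Hypothetical incident times, one row per incident.
incidents = DataFrame(
    time = DateTime.(["2015-01-26T00:10:00", "2015-01-26T00:50:00",
                      "2015-01-26T07:16:00", "2015-01-27T00:05:00",
                      "2015-01-27T07:43:00"]),
)

# Extract the hour of day (0-23) and count incidents per hour.
incidents.hour = hour.(incidents.time)
by_hour = combine(groupby(incidents, :hour; sort = true), nrow => :n_incidents)

# The hour with the largest count (the first such hour if there is a tie).
busiest_hour = by_hour.hour[argmax(by_hour.n_incidents)]
```

The same pattern applies to other time granularities by swapping `hour` for another accessor from `Dates`, such as `dayofweek` or `month`.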