# pickup_dropoff_counts.pkl Data Explorer

In [1]:
import pickle

# Top-level dictionary:
Each key is a tuple representing a single state in the complete FAMAIL data model state space: `(x_grid, y_grid, time_of_day, day_of_week)`
### Key index definition
- 0: x_grid index
- 1: y_grid index
- 2: time bucket $\in$ [1, 288]
- 3: day-of-week index
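The index layout above can be made easier to read by giving each position a name. The sketch below is illustrative only: `StateKey` and `bucket_to_clock` are hypothetical helpers, not part of the FAMAIL code, and the clock conversion assumes 1-indexed, 5-minute buckets with bucket 1 starting at 00:00.

```python
from typing import NamedTuple


class StateKey(NamedTuple):
    """Named view of an (x_grid, y_grid, time_of_day, day_of_week) key."""
    x_grid: int
    y_grid: int
    time_bucket: int
    day_of_week: int


def bucket_to_clock(time_bucket: int, minutes_per_bucket: int = 5) -> str:
    """Return the HH:MM start time of a 1-indexed time bucket.

    Assumption: bucket 1 covers 00:00 to 00:05.
    """
    start_minute = (time_bucket - 1) * minutes_per_bucket
    hours, minutes = divmod(start_minute, 60)
    return f"{hours:02d}:{minutes:02d}"


# Wrap a plain key tuple so its fields can be read by name
key = StateKey(*(4, 22, 201, 2))
print(key.time_bucket, bucket_to_clock(key.time_bucket))  # 201 16:40
```

Because `StateKey` is a tuple subclass, a `StateKey` instance hashes and compares equal to the plain tuple, so it can be used directly to index the dictionary.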

In [2]:
# Load the counts dictionary; the context manager closes the file handle
with open('../datasets/pickup_dropoff_counts.pkl', 'rb') as f:
    pickup_dropoff_counts = pickle.load(f)

pickup_dropoff_keys = list(pickup_dropoff_counts.keys())
print(f'Number of keys: {len(pickup_dropoff_keys)}')

# Print the first 10 keys
for key in pickup_dropoff_keys[:10]:
    print(key)

Number of keys: 7464960
(1, 1, 1, 1)
(1, 1, 1, 2)
(1, 1, 1, 3)
(1, 1, 1, 4)
(1, 1, 1, 5)
(1, 1, 1, 6)
(1, 1, 2, 1)
(1, 1, 2, 2)
(1, 1, 2, 3)
(1, 1, 2, 4)


# Each (x_grid, y_grid, time, day) key contains:
The number of pickups and dropoffs for the state, aggregated from `taxi_record_07_50drivers.pkl`, `taxi_record_08_50drivers.pkl`, and `taxi_record_09_50drivers.pkl`.

### Example
`(x_grid, y_grid, time, day)` $\rightarrow$ `(pickups, dropoffs)` <br>
`(4, 22, 201, 2)` $\rightarrow$ `(20, 14)`
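A lookup like the one in the example might be written as follows. This is a minimal sketch on a tiny stand-in dictionary; `sample_counts` and `lookup_counts` are hypothetical names, and returning `(0, 0)` for a missing key is an assumption made for convenience.

```python
# Tiny stand-in for the real dictionary, using the example mapping above
sample_counts = {
    (4, 22, 201, 2): (20, 14),
    (1, 1, 1, 1): (0, 0),
}


def lookup_counts(counts, x_grid, y_grid, time_bucket, day):
    """Return (pickups, dropoffs) for a state, or (0, 0) if the key is absent."""
    return counts.get((x_grid, y_grid, time_bucket, day), (0, 0))


# Unpack the value tuple into its two named parts
pickups, dropoffs = lookup_counts(sample_counts, 4, 22, 201, 2)
print(f"pickups={pickups}, dropoffs={dropoffs}")  # pickups=20, dropoffs=14
```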

### Key Space Details

**Dimensions**:
- x_grid: 48 unique values (latitude dimension)
- y_grid: 90 unique values (longitude dimension)
- time: 288 unique values (5-min buckets, 1-indexed)
- day: 6 unique values (Mon-Sat, 1-indexed)

**Theoretical Maximum**: 48 × 90 × 288 × 6 = 7,464,960 possible keys

**Observed Coverage** (sparse):
- Non-zero keys: ~234,000 (~3.1% of maximum)
- The dictionary stores all 7,464,960 keys, so the remaining ~96.9% hold zero counts: most spatiotemporal cells have no events
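The arithmetic behind these figures can be checked with a short sketch. It runs on a small synthetic dictionary rather than the real file; `nonzero_coverage` is a hypothetical helper, and treating a key as non-zero when either count is positive is an assumption about how the figure above was derived.

```python
import math

# Dimension sizes from the list above: x_grid, y_grid, time, day
GRID_SHAPE = (48, 90, 288, 6)
max_keys = math.prod(GRID_SHAPE)


def nonzero_coverage(counts, total_keys):
    """Return (number of non-zero keys, fraction of total_keys).

    Assumption: a key is non-zero if it has at least one pickup or dropoff.
    """
    nonzero = sum(
        1 for pickups, dropoffs in counts.values() if pickups or dropoffs
    )
    return nonzero, nonzero / total_keys


# Synthetic stand-in data: two of four cells have activity
sample = {
    (1, 1, 1, 1): (0, 0),
    (1, 1, 1, 2): (2, 0),
    (1, 1, 1, 3): (0, 1),
    (1, 1, 1, 4): (0, 0),
}
nonzero, fraction = nonzero_coverage(sample, len(sample))
print(max_keys)           # 7464960
print(nonzero, fraction)  # 2 0.5
```

On the real data, the call would be `nonzero_coverage(pickup_dropoff_counts, max_keys)`.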

### Count Distribution

**Pickup Counts per Cell** (observed):
- Mean: ~1.39 pickups per non-zero cell
- Median: 1 pickup
- Max: Several hundred (at high-traffic locations)
- Distribution: Heavy right tail (typical for count data)
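Summary statistics of this kind could be computed along the following lines. This is a sketch on synthetic data, not the code that produced the numbers above; `pickup_stats` is a hypothetical helper, and it assumes "non-zero cell" means a cell with at least one pickup.

```python
import statistics


def pickup_stats(counts):
    """Summarize pickup counts over cells with at least one pickup.

    Raises statistics.StatisticsError if no cell has a pickup.
    """
    nonzero_pickups = [
        pickups for pickups, _ in counts.values() if pickups > 0
    ]
    return {
        "mean": statistics.mean(nonzero_pickups),
        "median": statistics.median(nonzero_pickups),
        "max": max(nonzero_pickups),
    }


# Synthetic stand-in data with a small right tail
sample = {
    (1, 1, 1, 1): (0, 0),
    (1, 1, 1, 2): (1, 0),
    (1, 1, 1, 3): (1, 2),
    (1, 1, 1, 4): (1, 1),
    (1, 1, 1, 5): (2, 0),
    (1, 1, 1, 6): (5, 3),
}
stats = pickup_stats(sample)
print(stats)
```

A single large value pulls the mean above the median here, which is the same pattern the heavy right tail produces in the observed counts.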

### Value index definition
- Value[0]: number of pickups
- Value[1]: number of dropoffs

In [7]:
val_element_list = ['Number of pickups', 'Number of dropoffs']

example_key = pickup_dropoff_keys[0]
print(f'For example key {example_key}')
# Label each element of the value tuple with its index and meaning
for i, value in enumerate(pickup_dropoff_counts[example_key]):
    print(f'Index [{i}] = {value} -- {val_element_list[i]}')

For example key (1, 1, 1, 1)
Index [0] = 0 -- Number of pickups
Index [1] = 0 -- Number of dropoffs
