# Momma Tables.where() and all her little predicates

Filtering datatables is done with the .where() method, which accepts a whole slew of "predicates" that specify the filtering condition. As always, you learn best by practicing.

In [1]:
# Load the required modules
import numpy as np
from datascience import * 

## Philadelphia Services
To keep things interesting, we'll take a look at a dataset available on [OpenDataPhilly](https://opendataphilly.org/datasets/) that contains services available in the city.

In [2]:
path = 'data/'
data = path + "philly_services.csv"
services = Table.read_table(data)
services

## Predicates
The .where() method of datatables takes a variety of "predicates," which express the condition you wish to use to filter the data.

## Services in the Temple zip code

Temple's zip code is 19122. Let's find the services within the Temple zip code.

The general format of the .where() method is .where(column_name, predicate), so we want to filter on the 'Zip Code' column and use the predicate are.equal_to(). A new table is returned with just the rows that match our condition; the original table is left unchanged.

In [None]:
temple_zipcode = services.where('Zip Code', are.equal_to(19122))
temple_zipcode
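
To make the idea of a predicate concrete, here is a minimal plain-Python sketch of what happens conceptually. It is not how the datascience library is implemented; `equal_to`, `where_rows`, and the sample rows are hypothetical names and made-up data used only for illustration.

```python
# Conceptual sketch only: NOT the datascience implementation.
# A predicate is a function that takes one value and returns True or False.
def equal_to(target):
    """Build a predicate that checks whether a value equals target."""
    def predicate(value):
        return value == target
    return predicate

def where_rows(rows, column_name, predicate):
    """Return a new list holding only the rows whose column passes the predicate."""
    return [row for row in rows if predicate(row[column_name])]

# Made-up rows standing in for the services table
rows = [
    {'Organization Name': 'Service A', 'Zip Code': 19122},
    {'Organization Name': 'Service B', 'Zip Code': 19107},
    {'Organization Name': 'Service C', 'Zip Code': 19122},
]

temple_rows = where_rows(rows, 'Zip Code', equal_to(19122))
print([row['Organization Name'] for row in temple_rows])
```

The filter builds a new list rather than editing `rows`, which mirrors how .where() hands back a new table.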

**Challenge:** Try finding all services with a zipcode of 19107

## Looking for Food Services
Now suppose we want all the services where "Food" is part of the Category column. We need a different predicate. are.containing() works when the specified column contains strings, and it checks whether each string contains the desired substring. To see if the text in the "Category" column contains the substring "Food" we do this:

In [None]:
food = services.where('Category', are.containing("Food"))
food
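
A substring test like this is usually case-sensitive. The sketch below uses Python's `in` operator on made-up category strings to show the effect; it assumes are.containing() behaves the same way, which is worth confirming on your own data.

```python
# Made-up category strings, for illustration only
categories = ['Food', 'Food, Housing', 'food pantry', 'Health']

# Python's `in` operator checks for a substring and is case-sensitive
matches = [c for c in categories if 'Food' in c]
print(matches)

# Lowercasing the text first gives a case-insensitive match
matches_any_case = [c for c in categories if 'food' in c.lower()]
print(matches_any_case)
```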

**Challenge:** Find the places serving food that serve children

## Chaining multiple conditions
Suppose we want to find just the food services in the Temple zipcode. You can "chain" the where conditions, applying each to the result of the previous.

In [None]:
services.where('Zip Code', are.equal_to(19122)).where('Category', are.containing("Food"))
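
Chaining works because each .where() call returns a table, so the next .where() has something to operate on. Here is a minimal sketch of that idea using a hypothetical `TinyTable` class and made-up rows; it is a stand-in for illustration, not the datascience Table.

```python
# Conceptual sketch only: TinyTable is a hypothetical stand-in, not the datascience Table.
class TinyTable:
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, column_name, predicate):
        # Return a NEW TinyTable and leave this one unchanged
        kept = [row for row in self.rows if predicate(row[column_name])]
        return TinyTable(kept)

def is_temple_zip(zip_code):
    return zip_code == 19122

def mentions_food(category):
    return 'Food' in category

# Made-up rows
tiny = TinyTable([
    {'Zip Code': 19122, 'Category': 'Food'},
    {'Zip Code': 19122, 'Category': 'Health'},
    {'Zip Code': 19107, 'Category': 'Food'},
])

# The second where() runs on the result of the first
both = tiny.where('Zip Code', is_temple_zip).where('Category', mentions_food)
print(both.rows)
print(len(tiny.rows))  # the original still has all of its rows
```

Because both conditions must hold, swapping the order of the two where() calls gives the same rows.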

## Filtering by locations
The dataset includes a column "LatLon," but we really need two separate columns: one with the latitude and one with the longitude. Splitting this column requires a few advanced tricks we haven't covered yet, so I'm just going to show you how I did it, with some comments as explanation. You should come back to this after we learn how to define functions.

Note: This technique may prove useful in the future if you use other datasets from OpenDataPhilly.

In [None]:
# Create a function that splits a string on the comma and returns either the first or second piece.
def string_split(string, col):
    return float(string.split(',')[col])

# Vectorize the function so that we can apply it to a NumPy array and it will operate on every element of the array
v_string_split = np.vectorize(string_split)

In [None]:
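# Split the strings, creating two new NumPy arrays.
# Note: this assumes each string is longitude-first (index 0 = lon, index 1 = lat);
# swap the indices if your data is latitude-first.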
lat_lon = services.column('LatLon')
lon = v_string_split(lat_lon, 0)
lat = v_string_split(lat_lon, 1)
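
To see what the helper does to a single value, here is a small sketch on made-up strings. They are written longitude-first only because that is the order the cell above assumes, and they are not taken from the dataset. A list comprehension plays the role of np.vectorize here so the example needs nothing beyond plain Python.

```python
# Same helper as above, repeated so this cell runs on its own
def string_split(string, col):
    return float(string.split(',')[col])

# Made-up sample strings (not taken from the dataset)
samples = ['-75.1544,39.9813', '-75.1600,39.9500']

# Apply the helper to every string, once per piece
first_pieces = [string_split(s, 0) for s in samples]
second_pieces = [string_split(s, 1) for s in samples]
print(first_pieces)
print(second_pieces)
```

If a string has no comma or a piece is not a number, the helper raises an error, so it is worth checking a few raw values before splitting a whole column.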

In [None]:
# We chain three methods:
# First, drop the old LatLon column, then add each of the new columns
services = services.drop('LatLon').with_column('Lat', lat).with_column('Lon', lon)

## Making a Map
Following [this example](https://inferentialthinking.com/chapters/08/5/Bike_Sharing_in_the_Bay_Area.html?highlight=map) in our textbook, let's use the new Lat and Lon columns to make a map showing the locations of these services.

You can zoom in, zoom out, pan, or click on a marker to see the label.

In [None]:
# Marker.map_table(stations.select('lat', 'long', 'name').relabel('name', 'labels'))
Marker.map_table(services.select('Lat', 'Lon', 'Organization Name').relabel('Organization Name', 'labels'))

## All services North of Temple

The coordinates of the Temple Bell Tower are 39.9813° N, 75.1544° W. Let's use the where filter to find only the services north of the bell tower, that is, those whose latitude is above 39.9813.

In [None]:
bell_tower_lat = 39.9813

northern_services = services.where('Lat', are.above(bell_tower_lat))
Marker.map_table(northern_services.select('Lat', 'Lon', 'Organization Name').relabel('Organization Name', 'labels'))

**Challenge:** Plot just the services in the Temple zipcode.

# Summary
I hope you had fun. The where() method is not that hard once you understand the basic idea. I encourage you to explore this data set and see if you can discover any interesting patterns!