# City Maps

---

Overall, we want to:

1. Get location data in the form of points
2. Filter the locations to Johnson County
3. Assign variables to the extracted data
4. Visualize the data using maps and our dimensions

## Setup

Ensure the libraries have been installed and can be found without errors:

In [26]:
import numpy as np
import pandas as pd
import geopandas as gpd
import json
from shapely.geometry import Point
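
If any of the imports above fail, it can help to check which packages Python is unable to locate before debugging further. A minimal sketch, assuming the package names match the imports in this notebook; `missing_modules` is a hypothetical helper written for illustration, not part of any library:

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]

# Top-level packages imported in the Setup cell
required = ['numpy', 'pandas', 'geopandas', 'shapely']
print('Missing:', missing_modules(required) or 'none')
```

Anything listed as missing would need to be installed into the environment that the notebook kernel uses.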

## Data Acquisition

---

**Google Takeout**: Our parsed location data

*Parsing* (@todo)
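
Until the parsing step is written up, the sketch below shows roughly what it could look like. It is not the parser that produced the CSV used in this notebook. It assumes the older Takeout `Location History.json` layout, with a top-level `locations` list whose records carry `timestampMs`, `latitudeE7` and `longitudeE7`; those field names are assumptions and should be checked against the actual export. `parse_takeout` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

def parse_takeout(raw_json):
    """Flatten a Takeout location export into a list of row dicts."""
    rows = []
    for loc in json.loads(raw_json).get('locations', []):
        # Skip records that lack a position fix
        if 'latitudeE7' not in loc or 'longitudeE7' not in loc:
            continue
        rows.append({
            # Assumed: epoch milliseconds stored as a string
            'timestamp': datetime.fromtimestamp(
                int(loc['timestampMs']) / 1000, tz=timezone.utc),
            # Assumed: degrees multiplied by 10**7
            'latitude': loc['latitudeE7'] / 1e7,
            'longitude': loc['longitudeE7'] / 1e7,
        })
    return rows

# Illustrative record only, shaped like the assumed export format
sample = json.dumps({'locations': [
    {'timestampMs': '1539284911352',
     'latitudeE7': 302357549,
     'longitudeE7': -977816923},
]})
rows = parse_takeout(sample)
print(rows[0]['latitude'], rows[0]['longitude'])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` and written out with `to_csv` to produce a file with `timestamp`, `latitude` and `longitude` columns.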

For each datapoint we want the following attributes:

1. In Johnson County
2. Weekend
3. Night
4. Business hours (M-F, 8:00-5:30)
5. Campus building (not required)
6. Year of school
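
The time-based attributes in the list above could be derived from each timestamp along these lines. This is a sketch under stated assumptions: the business-hours window comes from the list, but the night cut-offs (10 pm to 6 am) are a placeholder chosen for illustration, and timestamps are assumed to already be in local time. `time_flags` is a hypothetical helper name.

```python
from datetime import datetime, time

# Business hours come from the attribute list; the night window is an assumption
BUSINESS_START, BUSINESS_END = time(8, 0), time(17, 30)
NIGHT_START, NIGHT_END = time(22, 0), time(6, 0)  # hypothetical cut-offs

def time_flags(ts):
    """Return weekend / night / business-hours booleans for a local datetime."""
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    t = ts.time()
    # The night window wraps past midnight, hence the "or"
    is_night = t >= NIGHT_START or t < NIGHT_END
    is_business = (not is_weekend) and BUSINESS_START <= t <= BUSINESS_END
    return {'weekend': is_weekend, 'night': is_night,
            'business_hours': is_business}

print(time_flags(datetime(2018, 10, 11, 19, 8, 31)))
```

Because `pd.Timestamp` is a `datetime` subclass, the same function can be applied to a parsed timestamp column, for example with `pd.to_datetime(locations_df['timestamp']).apply(time_flags)`.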

In [22]:
locations_f = '../data/Takeout/Location History Parsed.csv'
locations_df = pd.read_csv(locations_f)
locations_df.head()

| | timestamp | latitude | longitude |
|---|---|---|---|
| 0 | 2018-10-11 19:08:31.352000 | 30.235755 | -97.781692 |
| 1 | 2018-10-11 19:07:21.212000 | 30.235777 | -97.781689 |
| 2 | 2018-10-11 19:05:21.175000 | 30.235777 | -97.781689 |
| 3 | 2018-10-11 19:03:21.147000 | 30.235777 | -97.781689 |
| 4 | 2018-10-11 19:01:20.980000 | 30.235777 | -97.781689 |


In [36]:
### Function to format our coordinates dataframe to Shapely/GPS points

"""
From GeoPandas docs:

df['Coordinates'] = list(zip(df.Longitude, df.Latitude))
df['Coordinates'] = df['Coordinates'].apply(Point)
gdf = geopandas.GeoDataFrame(df, geometry='Coordinates')
"""

def convert_coords(coords_df):
    # Create a copy so we don't mutate original dataframe
    copy_df = coords_df.copy()
    
    # Zip the lat/lngs and create points from the ensuing list
    copy_df['coordinates'] = list(zip(copy_df.longitude, copy_df.latitude))
    copy_df['coordinates'] = copy_df['coordinates'].apply(Point)
    
    # Create and return a geodataframe. The raw lat/lngs are already WGS84,
    # so declare EPSG:4326 here; to_crs() would fail on a frame with no CRS set
    gdf = gpd.GeoDataFrame(copy_df, geometry='coordinates', crs='EPSG:4326')
    return gdf

coord_df = convert_coords(locations_df)
coord_df.head()

| | timestamp | latitude | longitude | coordinates |
|---|---|---|---|---|
| 0 | 2018-10-11 19:08:31.352000 | 30.235755 | -97.781692 | POINT (-97.78169229999999 30.2357549) |
| 1 | 2018-10-11 19:07:21.212000 | 30.235777 | -97.781689 | POINT (-97.7816885 30.2357765) |
| 2 | 2018-10-11 19:05:21.175000 | 30.235777 | -97.781689 | POINT (-97.7816885 30.2357765) |
| 3 | 2018-10-11 19:03:21.147000 | 30.235777 | -97.781689 | POINT (-97.7816885 30.2357765) |
| 4 | 2018-10-11 19:01:20.980000 | 30.235777 | -97.781689 | POINT (-97.7816885 30.2357765) |


---

**County Files**: GIS

- Read in the files using GeoPandas
- Take a peek at the county names
- Filter down to Johnson County (var)
- Write a function that takes a lat/lng point and returns a boolean indicating whether it is contained within the Johnson County polygon
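
To make the last bullet concrete, the sketch below shows what a point-in-polygon test does under the hood. The notebook itself relies on Shapely/GeoPandas for containment; this example swaps in a plain-Python ray-casting test so the logic is visible without GIS dependencies. `point_in_polygon` and the rectangle `box` are hypothetical: the rectangle is a stand-in, not the real county outline.

```python
def point_in_polygon(lng, lat, ring):
    """Ray-casting test: True if (lng, lat) lies inside the closed ring.

    `ring` is a list of (lng, lat) vertices. Points exactly on an edge
    are not handled specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangle used as a stand-in, NOT the real county outline
box = [(-91.9, 41.4), (-91.3, 41.4), (-91.3, 41.9), (-91.9, 41.9)]
print(point_in_polygon(-91.504861, 41.683614, box))
print(point_in_polygon(-97.781692, 30.235755, box))
```

With Shapely, the equivalent check on a real polygon is `polygon.contains(Point(lng, lat))`. Note the argument order: longitude is x and latitude is y.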

In [33]:
counties_f = '../data/counties/county.shp'
counties_df = gpd.read_file(counties_f)

print(counties_df.columns.tolist())
print('Projection:\t{}'.format(counties_df.crs))
counties_df.head()

['AREA', 'PERIMETER', 'COUNTY_', 'COUNTY_ID', 'CO_NUMBER', 'CO_FIPS', 'ACRES_SF', 'ACRES', 'FIPS', 'COUNTY', 'ST', 'geometry']
Projection:	{'init': 'epsg:26915'}


| | AREA | PERIMETER | COUNTY_ | COUNTY_ID | CO_NUMBER | CO_FIPS | ACRES_SF | ACRES | FIPS | COUNTY | ST | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1523795000.0 | 193975.5 | 2 | 1 | 60 | 119 | 376536.4 | 376538.0 | 19119 | Lyon | IA | POLYGON ((209020.1228789756 4822673.786538936,... |
| 1 | 1034539000.0 | 130929.6 | 3 | 2 | 72 | 143 | 255639.1 | 255640.2 | 19143 | Osceola | IA | POLYGON ((268707.9952103209 4820315.573744762,... |
| 2 | 1045942000.0 | 131290.5 | 4 | 3 | 30 | 59 | 258456.8 | 258457.9 | 19059 | Dickinson | IA | POLYGON ((306957.0767378386 4819138.180611289,... |
| 3 | 1042278000.0 | 130956.8 | 5 | 4 | 32 | 63 | 257551.4 | 257552.5 | 19063 | Emmet | IA | POLYGON ((345203.3379296674 4818218.465900645,... |
| 4 | 1707005000.0 | 172890.3 | 6 | 5 | 3 | 5 | 421808.4 | 421810.2 | 19005 | Allamakee | IA | POLYGON ((612312.4661374119 4817345.066712167,... |


In [32]:
# Function to reproject IA Counties and extract Johnson County
def reproject_and_extract_counties(gdf):
    johnson = gdf[gdf.COUNTY == 'Johnson']
    proj_johnson = johnson.to_crs(epsg=4326)
    return proj_johnson

# Save a reference to Johnson County (contains Iowa City) to filter location points
johnson_cnty = reproject_and_extract_counties(counties_df)
johnson_cnty

| | AREA | PERIMETER | COUNTY_ | COUNTY_ID | CO_NUMBER | CO_FIPS | ACRES_SF | ACRES | FIPS | COUNTY | ST | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 1612965000.0 | 177786.3 | 62 | 61 | 52 | 103 | 398570.7 | 398572.4 | 19103 | Johnson | IA | POLYGON ((-91.83149274268779 41.86183093260607... |


In [49]:
# Merge points
def join_and_filter_coords(county, coords):
    # Keep all of the coords contained within the single county geodataframe
    # (`predicate` replaced the older `op` keyword in geopandas 0.10)
    cnty_coords = gpd.sjoin(county, coords, how='inner', predicate='contains')
    
    # Slim down the columns, as the join adds the county attributes
    cnty_coords = cnty_coords[['timestamp', 'latitude', 'longitude']]
    
    return cnty_coords

johnson_cnty_coords = join_and_filter_coords(johnson_cnty, coord_df)

print('Old coords:\t{}'.format(len(coord_df)))
print('New coords:\t{}'.format(len(johnson_cnty_coords)))
johnson_cnty_coords.head()

Old coords:	644421
New coords:	571882


| | timestamp | latitude | longitude |
|---|---|---|---|
| 60 | 2016-07-27 15:35:24 | 41.683614 | -91.504861 |
| 60 | 2017-03-17 17:24:27 | 41.683686 | -91.503358 |
| 60 | 2016-07-27 15:37:24 | 41.683698 | -91.50488 |
| 60 | 2016-07-27 15:38:25 | 41.68371 | -91.504871 |
| 60 | 2016-07-27 15:28:39.467000 | 41.683757 | -91.504225 |


In [50]:
# Write out to file
johnson_cnty_coords.to_csv('test.csv', index=False)