# Community Space Distribution and Attendance


## Introduction
As digital connectivity increases and face-to-face interaction declines, the sense of community fostered by public spaces and gatherings is gradually fading. Community centers are one way of preserving it: as public venues for socialization and recreation, their active use within a neighborhood can signal a close-knit community. Here, we analyze the distribution of these centers, along with their attendance counts, as a metric of community.

### Datasets Used
- [Community Center Attendance (WPRDC)](https://data.wprdc.org/dataset/daily-community-center-attendance)

Details the attendance for a number of community centers throughout Pittsburgh.

- [City of Pittsburgh Facilities (WPRDC)](https://data.wprdc.org/dataset/city-of-pittsburgh-facilities/resource/fbb50b02-2879-47cd-abea-ae697ec05170)

Lists a number of public facilities around Pittsburgh.

#### Density of Community Spaces
We will begin by importing pandas, geopandas, and our plotting libraries, then load the neighborhood shapefile.

In [56]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import fpsnippets
from shapely.geometry import Point
import seaborn as sns

# Loading neighborhood shape file.
neighborhoods = gpd.read_file('Datasets/neighborhoods/Neighborhoods_.shp')
neighborhoods.head(10)

Unnamed: 0,objectid,fid_blockg,statefp10,countyfp10,tractce10,blkgrpce10,geoid10,namelsad10,mtfcc10,funcstat10,...,shape_ar_1,page_numbe,plannerass,created_us,created_da,last_edite,last_edi_1,Shape__Are,Shape__Len,geometry
0,1,0.0,42,3,40500,1,420030405001,Block Group 1,G5030,S,...,7843108.0,15,Derek Dauphin,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,7842997.0,11525.904546,"POLYGON ((-79.95304 40.44203, -79.95302 40.442..."
1,2,1.0,42,3,40400,1,420030404001,Block Group 1,G5030,S,...,13904630.0,15,Derek Dauphin,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,13904690.0,20945.56257,"POLYGON ((-79.95455 40.45882, -79.95427 40.458..."
2,3,2.0,42,3,40200,2,420030402002,Block Group 2,G5030,S,...,5999801.0,15,Derek Dauphin,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,5998649.0,18280.484515,"POLYGON ((-79.96230 40.44294, -79.96220 40.442..."
3,4,3.0,42,3,30500,2,420030305002,Block Group 2,G5030,S,...,7202139.0,15,Derek Dauphin,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,7203337.0,15697.914337,"POLYGON ((-79.98275 40.44641, -79.98273 40.446..."
4,5,5.0,42,3,20300,1,420030203001,Block Group 1,G5030,S,...,16947850.0,15,Andrea Lavin Kossis,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,16948750.0,24019.532672,"POLYGON ((-79.97494 40.45629, -79.97484 40.456..."
5,6,6.0,42,3,20100,4,420030201004,Block Group 4,G5030,S,...,17846690.0,15,Derek Dauphin,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,17845910.0,23034.929056,"POLYGON ((-79.99238 40.44484, -79.99233 40.444..."
6,7,7.0,42,3,262000,1,420032620001,Block Group 1,G5030,S,...,17550590.0,15,Stephanie Joy Everett,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,17543400.0,18197.706073,"POLYGON ((-79.99761 40.47460, -79.99761 40.474..."
7,8,8.0,42,3,261500,1,420032615001,Block Group 1,G5030,S,...,25220620.0,15,Stephanie Joy Everett,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,25224180.0,26390.538103,"POLYGON ((-80.01456 40.47727, -80.01462 40.477..."
8,9,10.0,42,3,261200,1,420032612001,Block Group 1,G5030,S,...,12232020.0,15,Stephanie Joy Everett,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,12232530.0,20906.829409,"POLYGON ((-80.01087 40.50097, -80.01073 40.499..."
9,10,11.0,42,3,260900,1,420032609001,Block Group 1,G5030,S,...,8739570.0,15,Stephanie Joy Everett,pgh.admin,2019-10-23T14:17:16.403Z,pgh.admin,2019-10-23T14:17:16.403Z,8727371.0,13757.331946,"POLYGON ((-80.00327 40.48271, -80.00326 40.482..."


With our libraries available for use, we will load the City of Pittsburgh Facilities dataset into a data frame named `facilities`.

In [15]:
facilities = pd.read_csv('Datasets/city-facilities.csv')
facilities.head(3)

Unnamed: 0,_id,id,parcel_id,inactive,name,rentable,type,primary_user,address_number,street,...,neighborhood,council_district,ward,tract,public_works_division,pli_division,police_zone,fire_zone,latitude,longitude
0,1,650726265,120-J-300,f,57th Street Park Building,f,Storage,Department of Public Works,,57TH ST,...,Upper Lawrenceville,7,10.0,42003101100,2.0,10.0,2.0,3-5,40.485666,-79.94645
1,2,783044037,2-H-284,f,Albert Turk Graham Park Shelter,f,Shelter,Department of Public Works,39.0,VINE ST,...,Crawford-Roberts,6,3.0,42003030500,3.0,3.0,2.0,2-1,40.440458,-79.984104
2,3,1997158435,23-R-157,f,Allegheny Northside Senior Center and Hazlett ...,t,Senior,CitiParks,5.0,ALLEGHENY SQ E,...,Allegheny Center,1,22.0,42003562700,1.0,22.0,1.0,1-6,40.453099,-80.005343


Even from a preliminary look at 3 rows, note that the data contains many columns, as well as facility types, that are irrelevant to our metric. We would like to sift through the dataset and target recreational spaces instead. To understand what data we are working with, we will first list the facility types to decide how to filter the set.

In [4]:
# List the distinct values in the 'type' column
print(facilities['type'].unique())

['Storage' 'Shelter' 'Senior' 'Pool' 'Utility' 'Activity' 'Restrooms'
 'Service' 'Concession' 'Dugout' 'Pool/Rec' 'Rec Center' 'Office'
 'Pool Closed' 'Firehouse' 'Community' 'Vacant' 'Cabin' 'Medic Station'
 'Training' 'Police' 'Salt Dome' 'Recycling' 'SERVICE' 'STORAGE' 'POLICE'
 'TRAINING' 'OFFICE']



We can now see what types of facilities the dataset has provided us! To determine the density of community spaces, we will extract the entries of type 'Pool', 'Pool/Rec', 'Rec Center', and 'Community' into their own data frame.

To do this, we will create a query mask from these types and apply it.


In [5]:
spaces = ['Pool', 'Pool/Rec', 'Rec Center', 'Community']
typemask = facilities['type'].isin(spaces)
filtered = facilities[typemask]
filtered.sample(3)

Unnamed: 0,_id,id,parcel_id,inactive,name,rentable,type,primary_user,address_number,street,...,neighborhood,council_district,ward,tract,public_works_division,pli_division,police_zone,fire_zone,latitude,longitude
369,370,410766333,25-R-106-0-1,f,West Penn Recreation Center,t,Pool/Rec,CitiParks,450.0,30TH ST,...,Polish Hill,7,6.0,42003060500,6.0,6.0,2.0,2-6,40.455652,-79.969544
40,41,73475635,96-G-1,f,Brookline Pool Guard Room,f,Pool,CitiParks,1400.0,OAKRIDGE ST,...,Brookline,4,32.0,42003320600,5.0,32.0,6.0,4-26,40.390749,-80.007548
41,42,2070467175,96-G-1,f,Brookline Recreation Center,t,Rec Center,CitiParks,1400.0,OAKRIDGE ST,...,Brookline,4,32.0,42003320600,5.0,32.0,6.0,4-26,40.39125,-80.008373
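
Note that `isin` matches strings exactly, and the type list printed earlier contains case variants such as 'Service' and 'SERVICE'. None of the four selected types appears there in a second spelling, so the filter above is unaffected. Still, a minimal sketch of normalizing labels before filtering could look like the following; it uses a small made-up frame rather than the real dataset, and the names `toy`, `wanted`, and `toy_filtered` are purely illustrative.

```python
import pandas as pd

# Toy data for illustration only; these are not rows from the real facilities file.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type": ["Pool", "POOL", "Storage", "rec center"],
})

spaces = ["Pool", "Pool/Rec", "Rec Center", "Community"]

# Compare lowercase, whitespace-stripped labels on both sides of the match.
wanted = {s.lower() for s in spaces}
normalized = toy["type"].str.strip().str.lower()
toy_filtered = toy[normalized.isin(wanted)]

print(toy_filtered["name"].tolist())
```

Whether this is worth doing depends on the data: if a differently cased label really denotes a different category, normalizing would merge rows that should stay separate.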


This query mask has allowed us to create a filtered data frame of community spaces. Still, to tidy up the data even further, let's form an entirely new frame with only the data we need -- we will include the name of the community space as well as its type and coordinates.

In [21]:
# Keep only the columns we need and give them display-friendly names.
facil = filtered[['name', 'type', 'latitude', 'longitude']].rename(
    columns={'name': 'Name', 'type': 'Type',
             'latitude': 'Latitude', 'longitude': 'Longitude'})

print(len(facil))
facil.head(10)


41


Unnamed: 0,Name,Type,Latitude,Longitude
3,Ammon Recreation Center,Pool,40.448735,-79.977856
6,Arlington Pool Restrooms,Pool,40.417753,-79.974688
16,Banksville Pool Building and Shelter,Pool,40.414784,-80.040176
19,Bloomfield Pool and Recreation Center,Pool/Rec,40.462133,-79.953589
29,Brighton Heights Pool Building,Pool,40.486857,-80.030997
30,Brighton Heights Pool Guard Room,Pool,40.486976,-80.031012
39,Brookline Pool Building,Pool,40.390651,-80.007492
40,Brookline Pool Guard Room,Pool,40.390749,-80.007548
41,Brookline Recreation Center,Rec Center,40.39125,-80.008373
47,Burgwin Pool Building,Pool,40.405,-79.937155


This is a much cleaner representation! We will now seek to determine the distribution of these 41 facilities to see which neighborhoods contain more spaces.

In [66]:
# Map each facility's coordinates to its neighborhood.
# (Latitude and longitude are used because zip codes are too broad.)
nlist = []
for index, row in facil.iterrows():
    lat = row["Latitude"]
    lon = row["Longitude"]
    nlist.append(fpsnippets.geo_to_neighborhood(lat, lon))

density = pd.DataFrame(nlist, columns=["Neighborhood"])

# Count facilities per neighborhood and keep the 20 highest counts.
# value_counts() skips missing values, so unmatched points are left out.
neighbors = (
    density["Neighborhood"]
    .value_counts()
    .head(20)
    .rename_axis("Neighborhood")
    .reset_index(name="Number of community spaces")
)

neighbors_plot = sns.barplot(
    data=neighbors,
    x="Neighborhood",
    y="Number of community spaces")
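
The `fpsnippets.geo_to_neighborhood` helper is used here as a black box, and its implementation is not shown. A lookup of this kind generally amounts to a point-in-polygon test against each neighborhood boundary. The following is a minimal pure-Python sketch of that idea using the ray-casting rule. The function names and the two square shapes are hypothetical; they are not part of `fpsnippets` or the real shapefile.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside polygon, a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point toward larger longitudes.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def lookup_neighborhood(lat, lon, shapes):
    """Return the name of the first shape containing the point, else None."""
    for name, polygon in shapes.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical square "neighborhoods", not real Pittsburgh boundaries.
shapes = {
    "West Square": [(-80.02, 40.40), (-80.00, 40.40), (-80.00, 40.42), (-80.02, 40.42)],
    "East Square": [(-80.00, 40.40), (-79.98, 40.40), (-79.98, 40.42), (-80.00, 40.42)],
}

print(lookup_neighborhood(40.41, -80.01, shapes))
print(lookup_neighborhood(40.41, -79.99, shapes))
print(lookup_neighborhood(40.50, -79.99, shapes))
```

A point outside every shape yields `None`, which is why it helps that `value_counts()` ignores missing values when the per-neighborhood counts are computed.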