# Notebook 2: Data Wrangling & Cleaning

Now that we have successfully connected to the Flickr API and tested how to wrangle the data for our test park, MacArthur Park, the next step was to apply this process to eight parks across Los Angeles. We selected these parks from a Flickr search using the following criteria: 1) more than 50 photos associated with the park; 2) a variety of topics in the tags, based on a brief review; and 3) locations spread throughout the city.

This notebook is broken into four sections:
1. Connect to the Flickr API
2. Data Wrangling: All Parks
3. Data Wrangling: Individual Parks 
4. Coding Methodology

## Connect to the Flickr API

In [1]:
#The first step was to once again call the API function

import flickrapi
import json

api_key = u'f950122b83b682c546201f10d33edffe'
api_secret = u'057c65cd7fe1b2c8'

#flickr = flickrapi.FlickrAPI(api_key, api_secret)
#for json format
flickr = flickrapi.FlickrAPI(api_key, api_secret, format='parsed-json')

Scaling up our data wrangling and cleaning process, we reference our list of eight pre-selected parks to search by tags identifying each park and then save the output into a dataframe.

## Data Wrangling: All Parks

In [2]:
import pandas as pd

def get_parks(num_pages):
    park_list = []
    for i in range(1, num_pages+1): #range() stops one short of its end value, so +1 ensures we pull through the last page number we feed the function below
        extras = 'geo,description,tags'
        tags = ["MacArthur Park, Woodley Avenue Park, Rio de Los Angeles State Park, Runyon Canyon, Temescal Gateway, Heidelberg Park, Hancock Park, Franklin Canyon Park, Angels Gate, Coldwater Canyon, Chatsworth Park South, Cheviot Hills, O'Melveny Park"]
        parks_LA = flickr.photos.search(tags=tags, bbox = '-118.898278,33.704902,-118.161021,34.32848',
                                        method_name='flickr', page=i, per_page=500, extras=extras)  
        
        #.extend combines each page of search results together: https://www.programiz.com/python-programming/methods/list/extend
        park_list.extend(parks_LA['photos']['photo']) #pulls data from each individual photo 
        
    #reorients the data and converts to pandas df: https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-a-pandas-dataframe
    df = pd.DataFrame.from_dict(park_list, orient='columns') 
    
    return df

parks_data = get_parks(12) #pulls data from all 12 pages of photos
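
To see what the paging loop does without calling the API, the sketch below runs the same extend-then-convert pattern on mocked pages. The `fake_pages` dictionary and the `collect_pages` helper are hypothetical names used only for illustration, and the mocked response is a simplified assumption about the parsed-JSON layout (a `photos` object holding a `photo` list), matching the keys indexed in `get_parks`.

```python
import pandas as pd

# Hypothetical stand-in for two pages of a parsed-JSON photo search response
fake_pages = {
    1: {'photos': {'page': 1, 'pages': 2, 'photo': [
        {'id': '1', 'title': 'Lake', 'tags': 'macarthurpark lake'},
        {'id': '2', 'title': 'Trail', 'tags': 'runyoncanyon hiking'}]}},
    2: {'photos': {'page': 2, 'pages': 2, 'photo': [
        {'id': '3', 'title': 'Mural', 'tags': 'coldwatercanyon mural'}]}},
}

def collect_pages(fetch_page, num_pages):
    """Concatenate the per-photo dictionaries from pages 1..num_pages into one dataframe."""
    photos = []
    for page in range(1, num_pages + 1):
        # each page contributes a list of photo dictionaries
        photos.extend(fetch_page(page)['photos']['photo'])
    return pd.DataFrame(photos)

demo_df = collect_pages(lambda page: fake_pages[page], num_pages=2)
print(demo_df.shape)
```

Passing the page-fetching step in as a function keeps the flattening logic testable offline; in the notebook that role is played by the `flickr.photos.search` call.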

In [3]:
#checking the dataframe 
parks_data

Unnamed: 0,id,owner,secret,server,farm,title,ispublic,isfriend,isfamily,description,...,latitude,longitude,accuracy,context,place_id,woeid,geo_is_public,geo_is_contact,geo_is_friend,geo_is_family
0,49856813066,9771767@N04,83838b7d7b,65535,66,Angel's Gate Park,1,0,0,"{'_content': 'Have a wonderful day, everyone...'}",...,33.709932,-118.293882,16,0,,5392528,1,0,0,0
1,51050150612,192454804@N08,d07257b1b3,65535,66,English Home,1,0,0,{'_content': 'English style house in Los Angel...,...,34.074580,-118.334275,16,0,,5355165,1,0,0,0
2,50952829406,77318907@N08,6fe5b8ef3a,65535,66,Angel's Gate Cloudscape,1,0,0,"{'_content': 'San Pedro, CA 01-02-21'}",...,33.721867,-118.271759,16,0,,5392528,1,0,0,0
3,50952896937,77318907@N08,ebe224b984,65535,66,Harbor Entrance at Sunrise,1,0,0,"{'_content': 'The Angel's Gate in San Pedro, C...",...,33.721707,-118.271791,16,0,,5392528,1,0,0,0
4,50474879407,54718757@N00,a011599de5,65535,66,22834,1,0,0,"{'_content': '22,834 smoke on the water, in th...",...,33.732833,-118.317989,16,0,,5392544,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2710,2396232,26829571@N00,0354fe9c73,1,1,Buddy and Franklin Drinking,1,0,0,{'_content': ''},...,34.115553,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0
2711,2396203,26829571@N00,36d59da346,1,1,Franklin,1,0,0,{'_content': ''},...,34.115553,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0
2712,2396212,26829571@N00,bde5937b1c,1,1,Buddy,1,0,0,{'_content': ''},...,34.115553,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0
2713,2396193,26829571@N00,522a741b6f,3,1,Franklin Digging,1,0,0,{'_content': ''},...,34.115553,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0


The next step was to create a new column and assign the park name to that column so that we could more easily track tags as they relate to specific parks.

In [4]:
# Create a new column and assign the park name 
park_names = ['macarthur', 'woodley', 'riodelosangeles', 'runyoncanyon', 'temescalgateway', 'heidelbergpark', 'hancockpark', 'franklincanyonpark', 'angelsgate', 'coldwatercanyon', 'chatsworthparksouth','cheviothills']

def get_park_name(row):
    for park in park_names:
        if park in row['tags']:
            return park
    return 'Unknown'

parks_data['parkname'] = parks_data.apply(lambda row: get_park_name(row), axis=1)
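
The matching rule above is a substring test, so the first name in the list that occurs anywhere in a photo's tag string wins. The toy example below is a sketch of that behavior under assumed inputs; `park_names_demo`, `first_matching_park`, and the sample tag strings are made up for illustration.

```python
import pandas as pd

park_names_demo = ['macarthur', 'runyoncanyon', 'hancockpark']  # shortened list for illustration

def first_matching_park(tag_string, names):
    """Return the first name that occurs as a substring of the tag string."""
    for name in names:
        if name in tag_string:
            return name
    return 'Unknown'

demo = pd.DataFrame({'tags': ['macarthurpark lake westlake',
                              'hollywood runyoncanyonpark hike',
                              'sunset beach',
                              'hancockpark macarthur']})
demo['parkname'] = demo['tags'].apply(lambda t: first_matching_park(t, park_names_demo))
print(demo)
```

The last row carries tags for two parks and is assigned to whichever name comes first in the list, so the order of the list matters for photos tagged with more than one park.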

In [5]:
#checking our work; see new parkname column
parks_data

Unnamed: 0,id,owner,secret,server,farm,title,ispublic,isfriend,isfamily,description,...,longitude,accuracy,context,place_id,woeid,geo_is_public,geo_is_contact,geo_is_friend,geo_is_family,parkname
0,49856813066,9771767@N04,83838b7d7b,65535,66,Angel's Gate Park,1,0,0,"{'_content': 'Have a wonderful day, everyone...'}",...,-118.293882,16,0,,5392528,1,0,0,0,angelsgate
1,51050150612,192454804@N08,d07257b1b3,65535,66,English Home,1,0,0,{'_content': 'English style house in Los Angel...,...,-118.334275,16,0,,5355165,1,0,0,0,hancockpark
2,50952829406,77318907@N08,6fe5b8ef3a,65535,66,Angel's Gate Cloudscape,1,0,0,"{'_content': 'San Pedro, CA 01-02-21'}",...,-118.271759,16,0,,5392528,1,0,0,0,angelsgate
3,50952896937,77318907@N08,ebe224b984,65535,66,Harbor Entrance at Sunrise,1,0,0,"{'_content': 'The Angel's Gate in San Pedro, C...",...,-118.271791,16,0,,5392528,1,0,0,0,angelsgate
4,50474879407,54718757@N00,a011599de5,65535,66,22834,1,0,0,"{'_content': '22,834 smoke on the water, in th...",...,-118.317989,16,0,,5392544,1,0,0,0,angelsgate
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2710,2396232,26829571@N00,0354fe9c73,1,1,Buddy and Franklin Drinking,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0,runyoncanyon
2711,2396203,26829571@N00,36d59da346,1,1,Franklin,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0,runyoncanyon
2712,2396212,26829571@N00,bde5937b1c,1,1,Buddy,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0,runyoncanyon
2713,2396193,26829571@N00,522a741b6f,3,1,Franklin Digging,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302,1,0,0,0,runyoncanyon


In [6]:
#save to a CSV file so we do not have to keep pulling from Flickr
parks_data.to_csv('parks_data.csv', index=False)  

In [7]:
# Upload the csv with all the parks pulled from the Flickr API to continue analysis
import pandas as pd

fn = 'parks_data.csv'
parks_data = pd.read_csv(fn)
print(len(parks_data)) #internal check

parks_data

2715


Unnamed: 0,id,owner,secret,server,farm,title,ispublic,isfriend,isfamily,description,...,longitude,accuracy,context,place_id,woeid,geo_is_public,geo_is_contact,geo_is_friend,geo_is_family,parkname
0,49856813066,9771767@N04,83838b7d7b,65535,66,Angel's Gate Park,1,0,0,"{'_content': 'Have a wonderful day, everyone...'}",...,-118.293882,16,0,,5392528.0,1,0,0,0,angelsgate
1,51050150612,192454804@N08,d07257b1b3,65535,66,English Home,1,0,0,{'_content': 'English style house in Los Angel...,...,-118.334275,16,0,,5355165.0,1,0,0,0,hancockpark
2,50952829406,77318907@N08,6fe5b8ef3a,65535,66,Angel's Gate Cloudscape,1,0,0,"{'_content': 'San Pedro, CA\n01-02-21'}",...,-118.271759,16,0,,5392528.0,1,0,0,0,angelsgate
3,50952896937,77318907@N08,ebe224b984,65535,66,Harbor Entrance at Sunrise,1,0,0,"{'_content': ""The Angel's Gate in San Pedro, C...",...,-118.271791,16,0,,5392528.0,1,0,0,0,angelsgate
4,50474879407,54718757@N00,a011599de5,65535,66,22834,1,0,0,"{'_content': '22,834 smoke on the water, in th...",...,-118.317989,16,0,,5392544.0,1,0,0,0,angelsgate
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2710,2396232,26829571@N00,0354fe9c73,1,1,Buddy and Franklin Drinking,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302.0,1,0,0,0,runyoncanyon
2711,2396203,26829571@N00,36d59da346,1,1,Franklin,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302.0,1,0,0,0,runyoncanyon
2712,2396212,26829571@N00,bde5937b1c,1,1,Buddy,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302.0,1,0,0,0,runyoncanyon
2713,2396193,26829571@N00,522a741b6f,3,1,Franklin Digging,1,0,0,{'_content': ''},...,-118.351421,16,0,6RAY6t1TWr2KLaELHw,28751302.0,1,0,0,0,runyoncanyon
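
One thing to keep in mind when reloading from CSV is that values do not always come back with their original type (in the reloaded table above, for instance, the woeid values gain a trailing .0). Nested values such as the description dictionaries come back as plain strings. The sketch below shows this on a two-row toy dataframe, using an in-memory buffer instead of a file; the sample data is made up, and `ast.literal_eval` is one possible way to restore the dictionaries, not something the notebook itself does.

```python
import ast
import io
import pandas as pd

original = pd.DataFrame({
    'id': [1, 2],
    'description': [{'_content': 'San Pedro, CA'}, {'_content': ''}],
})

# Round trip through CSV using an in-memory buffer instead of a file on disk
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(original.loc[0, 'description']))  # dictionary before the round trip
print(type(reloaded.loc[0, 'description']))  # string after the round trip

# ast.literal_eval parses the string form back into a dictionary
reloaded['description'] = reloaded['description'].apply(ast.literal_eval)
```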


Next, we do an initial cleanup of the photo tags.

In [8]:
#Clean the tags by removing stopwords and additional words as defined
#Note: NLTK's stopwords corpus and tokenizer data must be downloaded once with nltk.download() before this cell will run
import nltk
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

#strip punctuation from the stopwords so they match the cleaned tags
#[A-Za-z] is used because the range A-z also matches the symbols [ \ ] ^ _ `
swords = [re.sub(r"[^A-Za-z\s]", "", sword) for sword in stopwords.words('english')]
swords += ['losangeles', 'la', 'losangelesca', 'ca', 'macarthur', 'macarthurpark', 'woodley', 'riodelosangeles', 'runyoncanyon', 
           'temescalgateway', 'heidelbergpark', 'hancockpark', 'franklincanyonpark', 'angelsgate', 
           'coldwatercanyon', 'chatsworthparksouth','cheviothills', 'california', 'usa', 'southerncalifornia', 'park', 'parklabrea', 
          'unitedstates', 'america']

def clean_string(text):
    # remove punctuation and digits, keeping only letters and whitespace
    text = re.sub(r"[^A-Za-z\s]", "", text)
    
    # lowercase, tokenize, and drop stopwords; returns a list of words
    cleaned_list_of_words = [word for word in word_tokenize(text.lower()) if word not in swords]
    
    return cleaned_list_of_words

#calling the function to only apply to the tags column 
parks_data['tags'] = parks_data['tags'].apply(clean_string)
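
A note on the character class used for stripping punctuation: in ASCII, the range `A-z` spans not only the letters but also the six symbols that sit between `Z` and `a`. The short sketch below compares the two spellings on a made-up sample string (the string is an assumption for illustration, not taken from the Flickr data).

```python
import re

sample = "la_brea tar-pits [2019] ^_^"

loose = re.sub(r"[^A-z\s]", "", sample)      # A-z also keeps [ \ ] ^ _ `
strict = re.sub(r"[^A-Za-z\s]", "", sample)  # keeps letters and whitespace only

print(repr(loose))
print(repr(strict))
```

With the stricter class, underscores, brackets, and carets are removed along with digits and hyphens, so only letters reach the tokenizer.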


In [9]:
# Source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html
# Source groupby documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
# Source sort_values documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

cols = ['tags', 'parkname']
tag_park_all = parks_data[cols].explode('tags', ignore_index=True)

#create a new column, value, and fill each row with 1 so that summing it counts tag occurrences
tag_park_all['value'] = 1

#return top 200 most used tags sorted by value (only the value column is summed, so the parkname text is left out)
top_200_tags_all = tag_park_all.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(200)

top_200_tags_all

tags,value
tarpits,536
labreatarpits,450
agcc,370
labrea,344
museum,316
...,...
immigration,22
detetioncenter,22
pacificocean,22
mural,22


In [10]:
# exporting top 200 tags to a csv for hand coding 
top_200_tags_all.to_csv('top_200_tags.csv', index=True)

## Data Wrangling: Individual Parks

### MacArthur Park

In [11]:
#Filter the dataframe by park name
macarthur = parks_data['parkname']=='macarthur'

macarthur_tags = parks_data[macarthur]
macarthur_tags.parkname.unique()
cols = ['tags', 'parkname']

tag_park_mac = macarthur_tags[cols].explode('tags', ignore_index=True)

tag_park_mac

Unnamed: 0,tags,parkname
0,downtownlosangeles,macarthur
1,losangelesskyline,macarthur
2,twilight,macarthur
3,dtla,macarthur
4,downtownlosangeles,macarthur
...,...,...
3893,cultural,macarthur
3894,monument,macarthur
3895,bench,macarthur
3896,westlake,macarthur


In [12]:
#create a column with count of each tag 
tag_park_mac['value'] = [1] * tag_park_mac.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_mac = tag_park_mac.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

#so we can view all tags
pd.set_option('display.max_rows', 100)

top_100_tags_mac

tags,value
westlake,178
lake,86
ciclavia,80
fountain,62
palmtrees,61
socal,53
dtla,47
architecture,47
downtownlosangeles,44
urban,43


In [13]:
# exporting top 100 tags to a csv for hand coding & reference for All Parks analysis
top_100_tags_mac.to_csv('top_tags_macarthur.csv', index=True)

### Rio de Los Angeles

In [14]:
#Filter the dataframe by park name
riodelosangeles = parks_data['parkname']=='riodelosangeles'

#Save the output to a new variable and disaggregate the tag lists associated with the park into individual tags 
riodelosangeles_tags = parks_data[riodelosangeles]
cols = ['tags', 'parkname']
tag_park_rio = riodelosangeles_tags[cols].explode('tags', ignore_index=True)

tag_park_rio

Unnamed: 0,tags,parkname
0,flowers,riodelosangeles
1,plants,riodelosangeles
2,leaves,riodelosangeles
3,flora,riodelosangeles
4,riodelosangelesstatepark,riodelosangeles
...,...,...
105,riodelosangelesstatepark,riodelosangeles
106,statepark,riodelosangeles
107,lariver,riodelosangeles
108,tayloryard,riodelosangeles


In [15]:
#create a column with count of each tag 
tag_park_rio['value'] = [1] * tag_park_rio.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_rio = tag_park_rio.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

#so we can view all tags
pd.set_option('display.max_rows', 100)

top_100_tags_rio

tags,value
riodelosangelesstatepark,12
lariver,7
anahuakyouthsportsassocation,6
mmf,6
cityproject,6
cityprojectca,6
earthday,6
environmentaljustice,6
wcvi,6
walkathon,6


In [16]:
# exporting top 100 tags to a csv for hand coding & reference for All Parks analysis
top_100_tags_rio.to_csv('top_tags_riodelosangeles.csv', index=True)

### Runyon Canyon

In [17]:
runyoncanyon = parks_data['parkname']=='runyoncanyon'

runyoncanyon_tags = parks_data[runyoncanyon]
cols = ['tags', 'parkname']
tag_park_rc = runyoncanyon_tags[cols].explode('tags', ignore_index=True)

tag_park_rc

Unnamed: 0,tags,parkname
0,hollywood,runyoncanyon
1,runyon,runyoncanyon
2,hollywood,runyoncanyon
3,runyon,runyoncanyon
4,hollywood,runyoncanyon
...,...,...
1854,runyon,runyoncanyon
1855,weimardoodle,runyoncanyon
1856,dog,runyoncanyon
1857,runyon,runyoncanyon


In [18]:
#create a column with count of each tag 
tag_park_rc['value'] = [1] * tag_park_rc.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_rc = tag_park_rc.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

#so we can view all tags
#pd.set_option('display.max_rows', 100)

top_100_tags_rc

tags,value
hollywood,115
runyon,71
hiking,55
hike,46
canyon,39
runyoncanyonpark,37
sunset,27
city,23
hollywoodhills,23
mountains,22


In [19]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_rc.to_csv('top_tags_runyoncanyon.csv', index=True)

### Hancock Park

In [20]:
hancock = parks_data['parkname']=='hancockpark'

hancock_tags = parks_data[hancock]
cols = ['tags', 'parkname']
tag_park_hancock = hancock_tags[cols].explode('tags', ignore_index=True)

tag_park_hancock

Unnamed: 0,tags,parkname
0,englishhome,hancockpark
1,architecture,hancockpark
2,bw,hancockpark
3,blackandwhite,hancockpark
4,monochrome,hancockpark
...,...,...
13805,accident,hancockpark
13806,clinton,hancockpark
13807,hollywood,hancockpark
13808,intersection,hancockpark


In [21]:
#create a column with count of each tag 
tag_park_hancock['value'] = [1] * tag_park_hancock.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_hancock = tag_park_hancock.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

#so we can view all tags
#pd.set_option('display.max_rows', 100)

top_100_tags_hancock

tags,value
tarpits,536
labreatarpits,450
labrea,344
museum,316
fossils,217
bones,194
paleontology,170
pagemuseum,165
socal,155
estate,149


In [22]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_hancock.to_csv('top_tags_hancockpark.csv', index=True)

### Franklin Canyon Park

In [23]:
franklincanyonpark = parks_data['parkname']=='franklincanyonpark'

franklincanyonpark_tags = parks_data[franklincanyonpark]
cols = ['tags', 'parkname']
tag_park_franklin = franklincanyonpark_tags[cols].explode('tags', ignore_index=True)

tag_park_franklin

Unnamed: 0,tags,parkname
0,beverlyhills,franklincanyonpark
1,franklincanyon,franklincanyonpark
2,santamonicamountains,franklincanyonpark
3,ducks,franklincanyonpark
4,lake,franklincanyonpark
...,...,...
298,mayberrylake,franklincanyonpark
299,tvshowlocations,franklincanyonpark
300,californiadreamsphotographycom,franklincanyonpark
301,losangelestvlocations,franklincanyonpark


In [24]:
#create a column with count of each tag 
tag_park_franklin['value'] = [1] * tag_park_franklin.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_franklin = tag_park_franklin.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

#so we can view all tags
pd.set_option('display.max_rows', 100)

top_100_tags_franklin

tags,value
santamonicamountains,11
nature,8
myerslake,7
andygriffithshow,7
losangelesmountains,7
losangelestvlocations,7
mayberrylake,7
mayberrync,7
mutt,7
andygriffith,7


In [25]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_franklin.to_csv('top_tags_franklincanyonpark.csv', index=True)

### Angels Gate

In [26]:
angelsgate = parks_data['parkname']=='angelsgate'

angelsgate_tags = parks_data[angelsgate]
cols = ['tags', 'parkname']
tag_park_angels = angelsgate_tags[cols].explode('tags', ignore_index=True)

#tag_park_angels

In [27]:
#create a column with count of each tag 
tag_park_angels['value'] = [1] * tag_park_angels.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_angels = tag_park_angels.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

top_100_tags_angels

tags,value
agcc,370
angelsgateculturalcenter,271
openstudios,239
art,207
gallerya,114
artgallery,97
gallery,88
artonthewaterfront,83
allankaprow,81
happening,81


In [28]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_angels.to_csv('top_tags_angelsgate.csv', index=True)

### Coldwater Canyon

In [29]:
coldwatercanyon = parks_data['parkname']=='coldwatercanyon'

coldwatercanyon_tags = parks_data[coldwatercanyon]
cols = ['tags', 'parkname']
tag_park_coldwater = coldwatercanyon_tags[cols].explode('tags', ignore_index=True)

#tag_park_coldwater

In [30]:
#create a column with count of each tag 
tag_park_coldwater['value'] = [1] * tag_park_coldwater.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_coldwater = tag_park_coldwater.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

top_100_tags_coldwater

tags,value
socal,19
lavc,18
vannuys,18
tujungawashgreenway,18
tujungawash,18
coldwatercanyonblvd,18
mural,16
greatwalloflosangeles,16
lahistory,16
californiahistory,15


In [31]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_coldwater.to_csv('top_tags_coldwatercanyon.csv', index=True)

### Cheviot Hills

In [32]:
cheviothills = parks_data['parkname']=='cheviothills'

cheviothills_tags = parks_data[cheviothills]
cols = ['tags', 'parkname']
tag_park_cheviot = cheviothills_tags[cols].explode('tags', ignore_index=True)

In [33]:
#create a column with count of each tag 
tag_park_cheviot['value'] = [1] * tag_park_cheviot.shape[0]

#return top 100 most used tags sorted by value
top_100_tags_cheviot = tag_park_cheviot.groupby('tags')[['value']].sum().sort_values('value', ascending=False).head(100)

top_100_tags_cheviot

tags,value
westerncup,13
sports,13
parlance,13
quidditch,13
harrypotter,13
sharibellis,13
nikond,13
geek,13
geeks,13
nikon,13


In [34]:
# exporting top 100 tags to a csv for hand coding 
top_100_tags_cheviot.to_csv('top_tags_cheviothills.csv', index=True)
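
The per-park cells above repeat the same filter, explode, count, and export steps. As a possible consolidation, the sketch below wraps those steps in one helper and loops over park names. The function name `top_tags_for_park` and the toy dataframe are hypothetical; the helper assumes the tags column already holds lists of cleaned words, as it does after the cleaning step earlier in this notebook.

```python
import pandas as pd

def top_tags_for_park(df, park, n=100):
    """Return the n most frequent tags for one park as a dataframe indexed by tag."""
    park_rows = df.loc[df['parkname'] == park, ['tags', 'parkname']]
    long_form = park_rows.explode('tags', ignore_index=True)
    # size() counts the rows per tag, which equals summing a column of ones
    counts = long_form.groupby('tags').size().rename('value')
    return counts.sort_values(ascending=False).head(n).to_frame()

# Toy data standing in for parks_data after the tags have been cleaned into lists
demo = pd.DataFrame({
    'tags': [['lake', 'ducks'], ['lake'], ['hiking', 'sunset'], ['hiking']],
    'parkname': ['macarthur', 'macarthur', 'runyoncanyon', 'runyoncanyon'],
})

for park in ['macarthur', 'runyoncanyon']:
    top = top_tags_for_park(demo, park, n=100)
    # top.to_csv(f'top_tags_{park}.csv', index=True)  # uncomment to export one file per park
    print(park, top['value'].to_dict())
```

The commented-out export line follows the `top_tags_<park>.csv` naming used by the cells above.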

# Preparing the individual park data for spatial analysis

After exporting a CSV for each park with its top 100 tags, we wanted to analyze the spatial distribution of the most popular cultural ecosystem services (CES) by park. To achieve this, we executed the following steps:
1. Get CES frequencies by park
2. Convert the list of CES frequencies into a dataframe
3. Create a list of the park names as they appear in the parks shapefile in order to do a join later
4. Create a list with the top CES for each park and add it as a column
5. Import the parks shapefile as a geopandas dataframe and left join on the "PARK_NAME" column
6. Export the new dataframe as a CSV for visualization in the third notebook

The parks shapefile was created in ArcGIS using the [California Protected Areas Database GIS dataset](https://www.calands.org/), which depicts lands that are owned in fee and protected for open space purposes by over 1,000 public agencies or non-profit organizations. The eight parks selected for the study were filtered into a new layer and exported as a polygon shapefile.

In [1]:
import pandas as pd

category_map = { 'Existence': ['westlake', 'lake','palmtrees','palms','elks','parkplaza','birds','palmtree', 'santamonicamountains','franklincanyonlake','losangelesmountains','mayberrylake','myerslake','grass','lake','trees','ducks','water','evergreens','frog','woods', 'ice', 'fluids','iceblocks','blocksofice','wallofice','harbor', 'sky', 'weather','tree','cloudy','garden','parks','lariver','losangelesriver','grass','mountains','canyons','hills','mountains','hill','horse','tujungawashgreenway','tujungawash','pacificocean','sky','weather','tree','cloudy'], 
                'Recreation': ['music','bikes','lilihaydn','bicycles','violin','ciclovia','loslobos','event', 'rustythedog','canine','chihuahuamix','mutt','dog','pet','weeksfordogs','urbanhiking', 'costumes', 'costume', 'cosplay', 'boomerang', 'lighthouse','westerncup','sports', 'quidditch', 'dog', 'puggle', 'puppy', 'referee', 'boat', 'nikon', 'nikond','gardentour','fish','campout','hiking','hike','sunriserunyon','sunriseinrunyon','sunriseinrunyoncanyonpark','trail','observatory','run','jog','summerolympics','westerncup','sports','quidditch','nikon','nikond','dog','puggle','puppy','referee','boat'], 
                'Social Relations': ['ciclavia','rally','protest','keepfamliestogethor','tamale','asada','alpastor','march','carnitas','eltaurino','burrito','thegreattacohunt','lasantacon','people','tacos','food','harrypotter', 'wand', 'geeks', 'geek','gardenparty','people','picnik','walkathon','earthday','zurbulon','harrypotter','wand'],
                'Aesthetics': ['colorful','green', 'landscape','textures','texturemaps','texturemap','texture','sunset','sky','skyline','clouds','sun','weather','sunrise','pacificocean','panorama','color','overlook','sunset','viewpoint','scenicoverview','mulhollandscenicoverview','brown','barbaraafineoverlook','lasunset','green','landscape'],
                'Spiritual': ['signs','sanity','nature','neature','harborinterfaith','outside','church','littlebrownchurchinthevalley'], 
                'Inspiration':['art','portraitsofhope','publicart', 'artonthewaterfront', 'artist', 'sculpture', 'printmaking', 'portrait', 'contemporaryart', 'painting', 'polaroid', 'draw', 'photographer', 'studioartist', 'prints','mural'],
                'Cultural Heritage': ['landmark','monument', 'curlettandbeelman','fortmacarthur','warreinactment', 'agcc', 'angelsgateculturalcenter', 'openstudios', 'allankaprow', 'gallerya', 'artgallery', 'gallery', 'artexhibition', 'slobodandimitrov', 'culturalcenter', 'exhibition', 'downstairsgallery', 'installation', 'hillarybradfield', 'festival','parlance','sculpture','art','statue','treasuresoflosangelesarchitecture','losangelesstatehistoricpark','midcenturymodernhomes','charliechaplin','parlance'],
                'Sense of Place': ['neighborhood','community','eccideasclub', 'sanpedro', 'neighborhood'], 
                'Cultural Diversity': ['mexican','lengua','march','immigration','czechart','lapride','westhollywoodpride','lagaypride','westhollywoodgaypride','losangelespride','losangelesgaypride','pride','gaypride'],
                'Knowledge Systems': ['historyofsanpedropunk', 'belleepoque','lahistory','californiahistory'],
                'Education': ['portoflosangeles', 'port','portofla','marshallastor','berth','georgecpagemuseum','museums','losangelescountymuseumofart','iceage','pleistocene','skeletons','skulls','pit','tarpits','labreatarpits','labrea','museum','pagemuseum','fossils','bones','paleontology','lacma','animalsmammoths','excavation','sabretooth','tigers','giantgroundsloths','gettyhouse','tar','sabretoothtigers','olympusem','skeleton','fossil','mammoth','mastodon','environmentaljustice','urbanparkmovement','losangelespubliclibrary'
                             ]}

# Import the csv of frequencies for each park under consideration
fns = ['top_tags_angelsgate.csv','top_tags_cheviothills.csv','top_tags_coldwatercanyon.csv',
       'top_tags_franklincanyonpark.csv','top_tags_hancockpark.csv','top_tags_macarthur.csv',
       'top_tags_riodelosangeles.csv','top_tags_runyoncanyon.csv']
parks_frequency = []
for fn in fns: 
    parks_frequency.append(pd.read_csv(fn))

#print(parks_frequency)

# Create a function to loop over the categories and sum the words associated with each category  
def getCategories(frequencyDf):
    
    category_frequencies = dict.fromkeys(category_map,0)
    
    for index, row in frequencyDf.iterrows():
        #print(row['tags'],row['value'])
        for category_name in category_map:
            wordlist = category_map[category_name]
            if row['tags'] in wordlist:
                category_frequencies[category_name]+=row['value']
            
    return category_frequencies
    
# Create a list of dictionaries with the frequencies by category for each park
cat_frequencies = []
for park in parks_frequency:
    cat_frequencies.append(getCategories(park))

print(cat_frequencies)

[{'Existence': 344, 'Recreation': 51, 'Social Relations': 0, 'Aesthetics': 0, 'Spiritual': 14, 'Inspiration': 517, 'Cultural Heritage': 1698, 'Sense of Place': 63, 'Cultural Diversity': 13, 'Knowledge Systems': 30, 'Education': 119}, {'Existence': 4, 'Recreation': 73, 'Social Relations': 40, 'Aesthetics': 5, 'Spiritual': 0, 'Inspiration': 0, 'Cultural Heritage': 13, 'Sense of Place': 2, 'Cultural Diversity': 0, 'Knowledge Systems': 0, 'Education': 0}, {'Existence': 37, 'Recreation': 1, 'Social Relations': 2, 'Aesthetics': 9, 'Spiritual': 2, 'Inspiration': 16, 'Cultural Heritage': 2, 'Sense of Place': 0, 'Cultural Diversity': 0, 'Knowledge Systems': 31, 'Education': 1}, {'Existence': 56, 'Recreation': 44, 'Social Relations': 0, 'Aesthetics': 1, 'Spiritual': 11, 'Inspiration': 4, 'Cultural Heritage': 0, 'Sense of Place': 0, 'Cultural Diversity': 0, 'Knowledge Systems': 0, 'Education': 0}, {'Existence': 39, 'Recreation': 30, 'Social Relations': 197, 'Aesthetics': 0, 'Spiritual': 0, 'Inspi

In [2]:
parksdf = pd.DataFrame(cat_frequencies)
print(parksdf)

   Existence  Recreation  Social Relations  Aesthetics  Spiritual  \
0        344          51                 0           0         14   
1          4          73                40           5          0   
2         37           1                 2           9          2   
3         56          44                 0           1         11   
4         39          30               197           0          0   
5        468         187               415          25         39   
6         14           6                13           0          1   
7         82         169                 0         178          8   

   Inspiration  Cultural Heritage  Sense of Place  Cultural Diversity  \
0          517               1698              63                  13   
1            0                 13               2                   0   
2           16                  2               0                   0   
3            4                  0               0                   0   
4           7

In [3]:
parknames = ['Angels Gate Park','Cheviot Hills Park and Recreation Center','Coldwater Canyon Park','Franklin Canyon Park','Hancock Park','MacArthur Park','Rio de Los Angeles State Park State Recreation Area','Runyon Canyon Park']

parknamesDf = pd.DataFrame(parknames)
parknamesDf['PARK_NAME']=parknames
parknamesDf

Unnamed: 0,0,PARK_NAME
0,Angels Gate Park,Angels Gate Park
1,Cheviot Hills Park and Recreation Center,Cheviot Hills Park and Recreation Center
2,Coldwater Canyon Park,Coldwater Canyon Park
3,Franklin Canyon Park,Franklin Canyon Park
4,Hancock Park,Hancock Park
5,MacArthur Park,MacArthur Park
6,Rio de Los Angeles State Park State Recreation...,Rio de Los Angeles State Park State Recreation...
7,Runyon Canyon Park,Runyon Canyon Park


In [4]:
parksjoinDf = parksdf.join(parknamesDf, how = 'left')
parksjoinDf

Unnamed: 0,Existence,Recreation,Social Relations,Aesthetics,Spiritual,Inspiration,Cultural Heritage,Sense of Place,Cultural Diversity,Knowledge Systems,Education,0,PARK_NAME
0,344,51,0,0,14,517,1698,63,13,30,119,Angels Gate Park,Angels Gate Park
1,4,73,40,5,0,0,13,2,0,0,0,Cheviot Hills Park and Recreation Center,Cheviot Hills Park and Recreation Center
2,37,1,2,9,2,16,2,0,0,31,1,Coldwater Canyon Park,Coldwater Canyon Park
3,56,44,0,1,11,4,0,0,0,0,0,Franklin Canyon Park,Franklin Canyon Park
4,39,30,197,0,0,76,154,0,1168,0,3988,Hancock Park,Hancock Park
5,468,187,415,25,39,76,169,112,85,0,0,MacArthur Park,MacArthur Park
6,14,6,13,0,1,0,6,0,0,0,12,Rio de Los Angeles State Park State Recreation...,Rio de Los Angeles State Park State Recreation...
7,82,169,0,178,8,0,0,0,0,0,0,Runyon Canyon Park,Runyon Canyon Park


In [5]:
Top_CES = ["Recreation","Existence","Cultural_Heritage","Existence","Education","Social Relations","Aesthetics","Existence"]
parksjoinDf['Top_CES']=Top_CES
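
A hand-typed list like the one above only works if its order matches the row order of the dataframe. Comparing it with the frequency table suggests the labels may be in a different park order than the rows (the first row, Angels Gate Park, has its largest count under Cultural Heritage, for example). One alternative, sketched below, is to derive the top category from the counts with `idxmax`. The three rows are copied from the table above and trimmed to four categories for brevity; `freq_demo` is a hypothetical name.

```python
import pandas as pd

# Three rows copied from the frequency table above, trimmed to four categories for brevity
freq_demo = pd.DataFrame(
    {'Existence': [344, 4, 82],
     'Recreation': [51, 73, 169],
     'Aesthetics': [0, 5, 178],
     'Cultural Heritage': [1698, 13, 0]},
    index=['Angels Gate Park', 'Cheviot Hills Park and Recreation Center', 'Runyon Canyon Park'],
)

# idxmax(axis=1) returns, for each row, the label of the column holding the largest value
freq_demo['Top_CES'] = freq_demo.idxmax(axis=1)
print(freq_demo['Top_CES'])
```

On the full table, the same call would need to be limited to the numeric category columns, for example `parksjoinDf[list(category_map)].idxmax(axis=1)`, since the park name columns are text.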

In [None]:
# Import necessary modules
import geopandas as gpd

# Set the filepath to the parks shapefile (adjust the path to match your local copy)
fp = '/Users/jacquelineadams/Documents/GitHub/LaParks_NLP_6/ces_laparks.shp'

# Read the shapefile into a geopandas dataframe using gpd.read_file()
shapefile = gpd.read_file(fp)

In [None]:
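# Left join keeps every park in the shapefile, even if its name has no match in the frequency table
park_shapes = shapefile.merge(parksjoinDf, on='PARK_NAME', how='left')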

park_shapes.head()
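
Because the join depends on park names matching exactly, it can help to check which names failed to match. The sketch below uses the `indicator=True` option of the pandas `merge` method on two small plain dataframes. Both dataframes and their values are hypothetical stand-ins, and the sketch assumes the geopandas dataframe's `merge` accepts the same arguments as the pandas one.

```python
import pandas as pd

# Hypothetical stand-ins: shapes_demo plays the role of the shapefile's attribute table
shapes_demo = pd.DataFrame({'PARK_NAME': ['Hancock Park', 'MacArthur Park', 'Runyon Canyon Park']})
freq_demo = pd.DataFrame({'PARK_NAME': ['Hancock Park', 'MacArthur Park', 'Runyon Canyon'],
                          'Top_CES': ['Education', 'Existence', 'Aesthetics']})

# indicator=True adds a _merge column recording where each row found its match
checked = shapes_demo.merge(freq_demo, on='PARK_NAME', how='left', indicator=True)

# rows marked left_only exist in the shapefile table but have no frequency data
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PARK_NAME'].tolist()
print(unmatched)
```

Here the shortened name "Runyon Canyon" in the frequency table does not match "Runyon Canyon Park", so that park is reported as unmatched.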

In [None]:
park_shapes.to_csv('park_shapes.csv', index=False)  

## Coding Methodology

The next step in our process was to scale down our analysis to the park level and examine the CES offered by each park. Since our methodology closely follows the Hale (2019) article, we needed to hand-code a select set of the tags into Hale's predefined CES buckets. To do this, we first coded the top 200 tags across all parks to get a sense of the types of services offered across all parks. The top 100 tags did not yield enough codeable tags, so we expanded that selection for a more robust sample. The next step was to download separate CSV files for each park we intended to hand-code. We decided to only code parks that had 50 or more unique tags. The parks that fit this criterion were: MacArthur Park, Runyon Canyon, Angels Gate, Cheviot Hills, Coldwater Canyon, Franklin Canyon, Hancock Park, and Rio de Los Angeles. We coded each of these CSV files separately and added them, by hand, to a dictionary that contained each tag and its respective CES bucket (see the category_map dictionary above).

Tags themselves are coded into one of the following categories (we did not cross-code, for parsimony, though we recognize that future analysis would yield more comprehensive results if tags could be cross-coded): existence, recreation, social relations, aesthetics, spiritual, knowledge systems, inspiration, cultural heritage, education, sense of place, and cultural diversity.
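
Since the coding scheme is meant to assign each tag to a single category, a quick check for tags listed under more than one category can be useful. In the category_map dictionary in this notebook, tags such as 'sky' and 'art' appear in two lists, and the counting loop in getCategories adds their frequency to both. The sketch below shows one way to surface such cases; `demo_map` is a shortened, hypothetical map and `find_cross_coded` is a name introduced here for illustration.

```python
from collections import defaultdict

# Shortened, hypothetical category map; the real one is category_map in this notebook
demo_map = {
    'Existence': ['lake', 'sky', 'trees'],
    'Aesthetics': ['sunset', 'sky'],
    'Inspiration': ['art', 'mural'],
    'Cultural Heritage': ['art', 'monument'],
}

def find_cross_coded(cat_map):
    """Return {tag: sorted list of categories} for tags listed under more than one category."""
    seen = defaultdict(set)
    for category, tags in cat_map.items():
        for tag in tags:
            seen[tag].add(category)
    return {tag: sorted(cats) for tag, cats in seen.items() if len(cats) > 1}

cross_coded = find_cross_coded(demo_map)
print(cross_coded)
```

Running the same function on the full dictionary would list every tag that needs a decision about which single category it belongs to.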

**Existence**: relates to wildlife or natural phenomena, such as wild animals, trees or lakes.

**Recreation**: relates to tags that encompass hobbies or leisure activities, such as boating, sports, or photography.

**Social Relations**: consists of activities that promote social cohesion and camaraderie, such as weddings or family activities.

**Aesthetics**: relates to tags that encompass beauty, such as scenic overlooks, sunsets, or architecture.

**Spiritual**: tags are bucketed based on religious or faith-based activities, such as attending church or a worship ceremony.

**Knowledge Systems**: encompasses collective learning or educational activities, such as birdwatching or visiting a museum.

**Inspiration**: relates to activities that promote self-reflection, such as examining a work of art or reflecting on one's surroundings.

**Cultural Heritage**: relates to activities centered on ethnic identities and groups, such as festivals or looking at historic monuments.

**Education**: relates to learning, and may center on visiting a library or an exhibit in a museum.

**Sense of Place**: refers to activities that connect communities and neighborhoods through cultural activities or shared history.

**Cultural Diversity**: relates to activities that bring together specific groups, such as LGBTQIA+ events or Native American cultural events.

We coded tags that fit into each of the above categories, but did not code tags that did not equate to a CES (i.e., nonsensical tags, vague tags, or tags for unidentifiable built infrastructure). Notebook 3 references the coded tags by category for visualization and analysis.