# Food Safety and Community Food Production

An exploration of how narratives of "safety" in land use policy impact urban agriculture in select CA Counties: LA, Ventura, Sonoma, Mendocino, and Lake Counties

## The Data

112 total general plans from the five CA Counties (LA, Ventura, Sonoma, Mendocino, Lake)


1. Basic word counts:

2. Modified/precision topic modeling: How often do mentions of food, agriculture, soil health, food safety feature in plans? And as important, where are they mentioned?

In [2]:
# libraries
import pandas as pd
import geopandas as gpd
import numpy as np

# visualizations
import contextily as ctx
import matplotlib.pyplot as plt
import seaborn as sn

# census
import cenpy
from cenpy import products

# set display
pd.options.display.max_columns = 150
#pd.options.display.max_rows = 150

In [3]:
# libraries
import os

# extract text
from pdfminer.high_level import extract_text
import re

# stop words; split sentences; stems
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

# topic modeling
import gensim
help(gensim.models.LdaMulticore)
import pyLDAvis
import pyLDAvis.gensim_models

Help on class LdaMulticore in module gensim.models.ldamulticore:

class LdaMulticore(gensim.models.ldamodel.LdaModel)
 |  LdaMulticore(corpus=None, num_topics=100, id2word=None, workers=None, chunksize=2000, passes=1, batch=False, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, random_state=None, minimum_probability=0.01, minimum_phi_value=0.01, per_word_topics=False, dtype=<class 'numpy.float32'>)
 |  
 |  An optimized implementation of the LDA algorithm, able to harness the power of multicore CPUs.
 |  Follows the similar API as the parent class :class:`~gensim.models.ldamodel.LdaModel`.
 |  
 |  Method resolution order:
 |      LdaMulticore
 |      gensim.models.ldamodel.LdaModel
 |      gensim.interfaces.TransformationABC
 |      gensim.utils.SaveLoad
 |      gensim.models.basemodel.BaseTopicModel
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, corpus=None, num_topics=100, id2word=None, workers=

#### The Plan Elements

After downloading all plans, the plans were split by element and only the "Land Use" and "Open Space/Conservation" Elements will be included.

In [4]:
# loading plans
planlist = os.listdir('C:/Users/Urna6/OneDrive/Melody/GitHub/homeworks/project-ngmelo/General Plans')
planlist

['.ipynb_checkpoints',
 'Alhambra General Plan.August 2019.lowres (PDF).pdf',
 'City of Commerce 2020 General Plan.PDF',
 'Vernon_General_Plan.pdf']

In [12]:
# extracting text

# function
def readPDF(planname):
    txt = extract_text('C:/Users/Urna6/OneDrive/Melody/GitHub/homeworks/project-ngmelo/General Plans/'+planname)
    
    # remove punctuation, numbers, etc.
    txt = re.sub(r"[^A-z\s]", "", txt)
    # remove whitepace
    txt = re.sub(r"\s+", " ", txt) 
    
    # cleaning up name
    planname = planname.split(".")[0]
    
    # insert muni(planname) to the beginning of plan: creating indexable muni ID within each plan string
    txt = planname+", "+txt
    
    print('Finished {}'.format(planname))
    return txt

# read in all pdf files
suffixes = ('.pdf', '.PDF')
genplan = [readPDF(pn) for pn in planlist if pn.endswith(suffixes)]

Finished Alhambra General Plan
Finished City of Commerce 2020 General Plan
Finished Vernon_General_Plan


Creating a municipality list to append to tables.

In [13]:
# establish list of stopwords to exclude
swords = stopwords.words('english')

In [14]:
# TEST: wordcounts for one plan
Alhambra = genplan[0]
Commerce = genplan[1]
Vernon = genplan[2]

list1 = [word for word in word_tokenize(Alhambra.lower()) 
                 if word not in swords]

#### Keyword Count (LIST VERSION: Word positions)

Loop to return only count of select keywords for each plan in genplan and store in stacked df.

In [15]:
# return only count of select keywords for each plan in genplan and store in stacked df

# keywords dict frame to store results from funtion
topics = {'food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'carbon', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice'}

plankeywords = []
    
# function/loop
for plan in genplan:
    
    # create list of wordlists for each plan
    wordlist = [word for word in word_tokenize(plan.lower()) 
                 if word not in swords]
   
    # create empty dictionary to store topic mention counts
    keydict = {}

    for topic in topics:
        # generate topic counts to populate dictionary frame indexed to topic
        keydict[topic] = wordlist.count(topic)
        
    # turn dicts into dfs
    keywords = pd.DataFrame.from_dict(keydict, orient='index', columns=['counts'])
    keywords.index.name = 'keyword'
    
    # column ID
    muniID = plan.lower().find("city of ")
    #keywords['municipality'] = plan[muniID+8:muniID+20] # longest muni name in muni list has 20 chars
    keywords['municipality'] = plan.split(", ")[0]
    
    # append dfs onto master df
    plankeywords.append(keywords)
    
# master df with all keywordcounts dfs combined
keywordsall = pd.concat(plankeywords)

# inspect
keywordsall

Unnamed: 0_level_0,counts,municipality
keyword,Unnamed: 1_level_1,Unnamed: 2_level_1
animal,1,Alhambra General Plan
contaminate,0,Alhambra General Plan
food,10,Alhambra General Plan
garden,9,Alhambra General Plan
fruit,0,Alhambra General Plan
sustainability,2,Alhambra General Plan
environment,33,Alhambra General Plan
agriculture,1,Alhambra General Plan
remediation,1,Alhambra General Plan
health,85,Alhambra General Plan


#### Keyword Count (RAW TEXT VERSION: Space positions)

Loop to return only count of select keywords for each plan in genplan and store in stacked df.

In [16]:
# saving positions of keywords in raw plan text to a dictionary: Alhambra
textdict = {}

keywords = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'carbon', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice']

for key in keywords:
    # text positions
    textpositions = [i for i in range(len(Alhambra)) if Alhambra.startswith(key, i)] # adapted: https://www.geeksforgeeks.org/python-all-occurrences-of-substring-in-string/#
    textdict[key] = textpositions
    
textdict

{'food': [28264,
  92775,
  102247,
  107926,
  109065,
  113179,
  178165,
  198174,
  198866,
  209796,
  209884,
  210207],
 'agriculture': [144819],
 'garden': [37125,
  39045,
  65339,
  97617,
  97662,
  97853,
  98611,
  102899,
  116871,
  118153,
  166246,
  166449,
  210144],
 'farm': [11630],
 'fruit': [101651],
 'vegetable': [98417, 101658, 210134],
 'animal': [124235],
 'soil': [18274,
  27578,
  121375,
  125060,
  178858,
  178959,
  178977,
  179005,
  179188,
  180326,
  180400,
  180684,
  180861,
  181002,
  182443,
  186816,
  203699,
  203796],
 'remediation': [],
 'contaminate': [],
 'carbon': [134877, 135016, 137259],
 'sustainability': [140202],
 'climate': [19215,
  19282,
  25232,
  25435,
  28188,
  33765,
  121436,
  121768,
  125225,
  125404,
  134458,
  134596,
  134781,
  137219,
  137295,
  137388,
  138817,
  138949,
  139884,
  149838,
  151106,
  152013,
  152585,
  199507,
  199858,
  200014,
  200077,
  200141,
  200369,
  201329,
  201949,
  20197

In [51]:
# converting dictionary into table counts: Alhambra
textdict2 = {}

for keyword in textdict:
    for position in keyword:
        textdict2[keyword] = len(textdict[keyword])

textdict2

{'food': 12,
 'agriculture': 1,
 'garden': 13,
 'farm': 1,
 'fruit': 1,
 'vegetable': 3,
 'animal': 1,
 'soil': 18,
 'remediation': 0,
 'contaminate': 0,
 'carbon': 3,
 'sustainability': 1,
 'climate': 39,
 'environment': 73,
 'health': 60,
 'safety': 32,
 'justice': 11}

In [17]:
# saving positions of keywords in raw plan text to a dictionary: all plans
textdictall = {}

keywords = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'carbon', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice']

for plan in genplan:
    for key in keywords:
        # text positions:
        textpositionsall = [i for i in range(len(plan)) if plan.startswith(key, i)] # adapted: https://www.geeksforgeeks.org/python-all-occurrences-of-substring-in-string/#
        #textdictall[key, plan[8:25]] = textpositionsall 
        textdictall[key, plan.split(", ")[0]] = textpositionsall

textdictall

{('food', 'Alhambra General Plan'): [28264,
  92775,
  102247,
  107926,
  109065,
  113179,
  178165,
  198174,
  198866,
  209796,
  209884,
  210207],
 ('agriculture', 'Alhambra General Plan'): [144819],
 ('garden', 'Alhambra General Plan'): [37125,
  39045,
  65339,
  97617,
  97662,
  97853,
  98611,
  102899,
  116871,
  118153,
  166246,
  166449,
  210144],
 ('farm', 'Alhambra General Plan'): [11630],
 ('fruit', 'Alhambra General Plan'): [101651],
 ('vegetable', 'Alhambra General Plan'): [98417, 101658, 210134],
 ('animal', 'Alhambra General Plan'): [124235],
 ('soil', 'Alhambra General Plan'): [18274,
  27578,
  121375,
  125060,
  178858,
  178959,
  178977,
  179005,
  179188,
  180326,
  180400,
  180684,
  180861,
  181002,
  182443,
  186816,
  203699,
  203796],
 ('remediation', 'Alhambra General Plan'): [],
 ('contaminate', 'Alhambra General Plan'): [],
 ('carbon', 'Alhambra General Plan'): [134877, 135016, 137259],
 ('sustainability', 'Alhambra General Plan'): [140202]

In [52]:
# turning dictionary with all keyword positions for all plans into dataframe
#textdictalldf = pd.DataFrame.from_dict(textdictall, orient = 'columns')
#textdictalldf

In [19]:
# converting dictionary into table counts: all plans

plankeywordstxt = []

textdict3 = {}

for keyword in textdictall:
    for position in keyword:
        textdict3[keyword] = len(textdictall[keyword])

# turn dicts into dfs
    keywordscount = pd.DataFrame.from_dict(textdict3, orient='index', columns=['counts'])
    keywordscount.index.name = 'keyword'

# creating muni ID column
keywordscount['municipality'] = keywordscount.index
keywordscount['municipality'] = keywordscount['municipality'].astype('str')
keywordscount['municipality'] = keywordscount['municipality'].str.replace('(','').str.replace(')','')
keywordscount[['keyword', 'municipality']] = keywordscount['municipality'].str.split(',', expand = True)
keywordscount['municipality'] = keywordscount['municipality'].str.replace("' ",'').str.replace("'",'')
keywordscount['keyword'] = keywordscount['keyword'].str.replace("' ",'').str.replace("'",'')

# inspect
keywordscount

  keywordscount['municipality'] = keywordscount['municipality'].str.replace('(','').str.replace(')','')


Unnamed: 0_level_0,counts,municipality,keyword
keyword,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"(food, Alhambra General Plan)",12,Alhambra General Plan,food
"(agriculture, Alhambra General Plan)",1,Alhambra General Plan,agriculture
"(garden, Alhambra General Plan)",13,Alhambra General Plan,garden
"(farm, Alhambra General Plan)",1,Alhambra General Plan,farm
"(fruit, Alhambra General Plan)",1,Alhambra General Plan,fruit
"(vegetable, Alhambra General Plan)",3,Alhambra General Plan,vegetable
"(animal, Alhambra General Plan)",1,Alhambra General Plan,animal
"(soil, Alhambra General Plan)",18,Alhambra General Plan,soil
"(remediation, Alhambra General Plan)",0,Alhambra General Plan,remediation
"(contaminate, Alhambra General Plan)",0,Alhambra General Plan,contaminate


Visualizing mentions.

In [79]:
# uploading file with municipalities matched to Counties and assigning counties to municipalities in keyword df through join

# loading file
#counties = pd.read_csv('')

# resetting index for keyword df
#keywordcounts1 = keywordcounts.set_index('municipality')

# joining
#keywordlocassgnd = keywordcounts1.join(counties, how= 'left', rsuffix = '_counties')

#keywordscounts = 

In [82]:
# individual barplots (color-coded by north or south)

# municipalities = []

#fig ax, = plt.subplots(size= (8, 12))

#keywordscount.plot (ax=ax, column = 'keyword', size = 'counts', legend = True, legendkwds = {orientation: ''})

In [81]:
# aggregated: by keyword clusters: frequency north vs. south


### (B) Proximity of Keywords in Plans: How is Food and Urban Agriculture Being Talked About?

This next exercise aims to make inferences about how topics like food are being talked about based on the proximity (distance of words) between keywords (e.g. "food" and "soil" or "food" and "safety"). 

For each plan, keyword set pairs will be selected and matrix math will be used to compare the distance between the individual words in those keyword set pairs. The goal is to produce 28 matrices for word pairs for each plan for the following:

    A. FOOD: safety, soil, sustainability, climate, environment, remediation, justice
    B. AGRICULTURE: safety, soil, sustainability, climate, environment, remediation, justice
    C. GARDEN: safety, soil, sustainability, climate, environment, remediation, justice
    D. FARM: safety, soil, sustainability, climate, environment, remediation, justice
    
-- and then to identify the frequency of word pairs that are within 100 or less words of each other in each municipality/plan. This would need to be done with a cleaned wordlist instead of the raw text, which might produce approximate word count distances for unique word pairs that exclude a lot of (filler) words, but would ultimately be more meaningful than distance measured by spaces in the raw text.

The hope is that this method will allow me to identify where (which municipalities) are talking about food and community agriculture in conjunction with environmental health and safety more generally.

In [225]:
# dictionary of clean plan wordlist positions for keywords: Alhambra

keywords = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'carbon', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice']

list1 = [word for word in word_tokenize(Alhambra.lower()) 
                 if word not in swords]

# create empty dictionary to store topic mention counts
lposdict = {}

for key in keywords:
    # generate topic counts to populate dictionary frame indexed to topic
    positions = [i for i,w in enumerate(list1) if w.lower() == key]
    lposdict[key] = positions

lposdict

{'food': [9459, 9605, 10460, 11031, 11132, 11562, 18122, 20098, 20172, 21304],
 'agriculture': [14791],
 'garden': [4045, 9973, 9979, 9987, 10000, 10051, 10525, 12691, 14314],
 'farm': [1289],
 'fruit': [],
 'vegetable': [21296],
 'animal': [12717],
 'soil': [189,
  2841,
  11547,
  12421,
  12754,
  12803,
  18324,
  18333,
  18362,
  18378,
  18391,
  18959,
  20607,
  20622,
  20632,
  20643],
 'remediation': [9617],
 'contaminate': [],
 'carbon': [13830, 13842, 14068],
 'sustainability': [2017, 14360],
 'climate': [224,
  2013,
  2019,
  2612,
  2634,
  2909,
  12462,
  12819,
  12839,
  13045,
  13120,
  13793,
  13805,
  13822,
  13837,
  14065,
  14071,
  14081,
  14222,
  14236,
  14331,
  15284,
  15413,
  15509,
  15570,
  20216,
  20219,
  20269,
  20281,
  20286,
  20293,
  20311,
  20407,
  20461,
  20465,
  20620,
  21148,
  21169,
  21190,
  21200,
  21213,
  21217,
  21230],
 'environment': [939,
  1227,
  1518,
  2158,
  2161,
  2284,
  2662,
  2732,
  2753,
  4334,
  

In [21]:
# dictionary of clean plan wordlist positions for keywords: all plans

keywords = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'carbon', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice']

# create empty dictionary to store topic mention counts
lposdictall = {}

for plan in genplan:
    wordlist = [word for word in word_tokenize(plan.lower()) 
                 if word not in swords and len(word)>2]
    
    for key in keywords:
        # generate topic counts to populate dictionary frame indexed to topic
        positions = [i for i,w in enumerate(wordlist) if w.lower() == key]
        lposdict[key, plan[8:25]] = positions

lposdictall

{}

Turning wordlists into arrays in order to do matrix math.

In [226]:
# create a numpy array for each keyword set: Alhambra

# pair: food and safety

#array of food positions
food = []

for keyword, positions in lposdict.items():
    for position in positions:
        if "food" in keyword:
            food = positions
            food = np.array(food)

#array of safety positions
safety = []

for keyword, positions in lposdict.items():
    for position in positions:
        if "safety" in keyword:
            safety = positions
            safety = np.array(safety)

# creating matrix with arrays of different shapes
foodsafety = abs(np.subtract(food, safety[:,None]))
print(len(foodsafety))

62


In [227]:
# identifying number of occurences in the array where distance <100 words
close = np.count_nonzero(foodsafety < 100)
close

# provide summary: ratio of close mentions to total mentions
print('Mentions of "food": {}'.format(len(food)))
print('Mentions of "safety": {}'.format(len(safety)))
print('Mentions of "food" within 100 words of "safety": {}'.format(close))
## duplicates: some singular instances of a word might be close to multiple mentions of another word
## e.g. one mention of food might be within 100 words of two mentions of safety

Mentions of "food": 10
Mentions of "safety": 62
Mentions of "food" within 100 words of "safety": 12


In [24]:
# (A) turning the whole dictionary into an array
Alhambrapositions = np.array(lposdict)
print(type(Alhambrapositions))

<class 'numpy.ndarray'>


In [229]:
## TEST CELL
keywords = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

for keyword, positions in lposdict.items():
    for key in keywords:
        if key in keyword:
            print(key)

soil
remediation
sustainability
climate
environment
safety
justice


In [230]:
# create matrices from all keyword np arrays and food
# build into loop matrix subtraction and IDing within matrices frequencies of words w/i 100 words of each other

keywords = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

for keyword, positions in lposdict.items():
    for key in keywords:
        key_list = []
        if key in keyword:
            key_list = positions
            key_array = np.array(key_list)
                
            # create matrices for keyword pairs
            food_key = abs(np.subtract(food, key_array[:,None])) # with different array shapes
            print('"{}" and "FOOD"'.format(key.upper()))
                    
            # identifying number of occurences in the array where distance <100 words
            close = np.count_nonzero(food_key < 100)
                
            # provide summary: ratio of close mentions to total mentions
            print('Mentions of "food": {}'.format(len(food)))
            print('Mentions of {}: {}'.format(key, len(key_array)))
            print('Mentions of "food" within 100 words of "{}": {}'.format(key, close))

"SOIL" and "FOOD"
Mentions of "food": 10
Mentions of soil: 16
Mentions of "food" within 100 words of "soil": 1
"REMEDIATION" and "FOOD"
Mentions of "food": 10
Mentions of remediation: 1
Mentions of "food" within 100 words of "remediation": 1
"SUSTAINABILITY" and "FOOD"
Mentions of "food": 10
Mentions of sustainability: 2
Mentions of "food" within 100 words of "sustainability": 0
"CLIMATE" and "FOOD"
Mentions of "food": 10
Mentions of climate: 43
Mentions of "food" within 100 words of "climate": 6
"ENVIRONMENT" and "FOOD"
Mentions of "food": 10
Mentions of environment: 33
Mentions of "food" within 100 words of "environment": 2
"SAFETY" and "FOOD"
Mentions of "food": 10
Mentions of safety: 62
Mentions of "food" within 100 words of "safety": 12
"JUSTICE" and "FOOD"
Mentions of "food": 10
Mentions of justice: 22
Mentions of "food" within 100 words of "justice": 13


In [232]:
# create matrices from all keyword np arrays and food and store in a df

keywords = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

foodclosedict = {}

for keyword, positions in lposdict.items():
    for key in keywords:
        key_list = []
        if key in keyword:
            key_list = positions
            key_array = np.array(key_list)
                
            # create matrix for each keyword pair
            food_key = abs(np.subtract(food, key_array[:,None])) # with different array shapes
                    
            # identifying number of occurences in the array where distance <100 words
            close = np.count_nonzero(food_key < 100)
                
            # store in dictionary
            foodclosedict[key] = np.count_nonzero(food_key < 100)
                
    foodprox = pd.DataFrame.from_dict(foodclosedict, orient='index', columns=['food to keyword'])
        
foodprox

Unnamed: 0,food to keyword
soil,1
remediation,1
sustainability,0
climate,6
environment,2
safety,12
justice,13


In [256]:
# (B) turn each keyset list in Alhambra dict into separate array that can be indexed/called in a loop

keywords1 = ['food', 'agriculture', 'garden', 'farm'] 
keywords2 = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

for keyword, positions in lposdict.items():
    # turn position list in each keyword in keyset1 into an array
    for key1 in keywords1:
        key_list1 = {}
        if key1 in keyword:
            key_list1 = positions
            key_array1 = np.array(key_list1)
        # turn position list in each keyword in keyset2 into an array
    for key2 in keywords2:
        key_list2 = {}
        if key2 in keyword:
            key_list2 = positions
            key_array2 = np.array(key_list2)
                
            # create a matrix from each unique keyset array pair
            key1_key2 = abs(np.subtract(key_array1, key_array2[:,None])) # with different array shapes
                
            # identify the unique keyword pair
            print('"{}" and "{}"'.format(key1.upper(), key2.upper()))
                
            # identify the number of occurences in the keyword pair where distance between them <100 words
            close = np.count_nonzero(key1_key2 < 100)
                
            # provide summary: ratio of close mentions to total mentions of each word
            print('Mentions of "{}": {}'.format(key1, len(key_array1)))
            print('Mentions of "{}": {}'.format(key2, len(key_array2)))
            print('Mentions of "{}" within 100 words of "{}": {}'.format(key1, key2, close))

"FARM" and "SOIL"
Mentions of "farm": 1
Mentions of "soil": 16
Mentions of "farm" within 100 words of "soil": 0
"FARM" and "REMEDIATION"
Mentions of "farm": 1
Mentions of "remediation": 1
Mentions of "farm" within 100 words of "remediation": 0
"FARM" and "SUSTAINABILITY"
Mentions of "farm": 1
Mentions of "sustainability": 2
Mentions of "farm" within 100 words of "sustainability": 0
"FARM" and "CLIMATE"
Mentions of "farm": 1
Mentions of "climate": 43
Mentions of "farm" within 100 words of "climate": 0
"FARM" and "ENVIRONMENT"
Mentions of "farm": 1
Mentions of "environment": 33
Mentions of "farm" within 100 words of "environment": 1
"FARM" and "SAFETY"
Mentions of "farm": 1
Mentions of "safety": 62
Mentions of "farm" within 100 words of "safety": 1
"FARM" and "JUSTICE"
Mentions of "farm": 1
Mentions of "justice": 22
Mentions of "farm" within 100 words of "justice": 0


In [267]:
# (B) turn each keyset list in Alhambra dict into separate array that can be indexed/called in a loop

keywords1 = ['food', 'agriculture', 'garden', 'farm'] 
keywords2 = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

for keyword, positions in lposdict.items():
    # turn position list in each keyword in keyset1 into an array
    for key1 in keywords1:
        key_list1 = []
        if key1 in keyword:
            key_list1 = positions
            key_array1 = np.array(key_list1)
    
        # turn position list in each keyword in keyset2 into an array
        for key2 in keywords2:
            key_list2 = []
            if key2 in keyword:
                key_list2 = positions
                key_array2 = np.array(key_list2)

                # create a matrix from each unique keyset array pair
                key1_key2 = abs(np.subtract(key_array1, key_array2[:,None])) # with different array shapes
    
                # identify the unique keyword pair
                print('"{}" and "{}"'.format(key1.upper(), key2.upper()))

                # identify the number of occurences in the keyword pair where distance between them <100 words
                close = np.count_nonzero(key1_key2 < 100)

                # provide summary: ratio of close mentions to total mentions of each word
                print('Mentions of "{}": {}'.format(key1, len(key_array1)))
                print('Mentions of "{}": {}'.format(key2, len(key_array2)))
                print('Mentions of "{}" within 100 words of "{}": {}'.format(key1, key2, close))

"FOOD" and "SOIL"
Mentions of "food": 1
Mentions of "soil": 16
Mentions of "food" within 100 words of "soil": 0
"AGRICULTURE" and "SOIL"
Mentions of "agriculture": 1
Mentions of "soil": 16
Mentions of "agriculture" within 100 words of "soil": 0
"GARDEN" and "SOIL"
Mentions of "garden": 1
Mentions of "soil": 16
Mentions of "garden" within 100 words of "soil": 0
"FARM" and "SOIL"
Mentions of "farm": 1
Mentions of "soil": 16
Mentions of "farm" within 100 words of "soil": 0
"FOOD" and "REMEDIATION"
Mentions of "food": 1
Mentions of "remediation": 1
Mentions of "food" within 100 words of "remediation": 0
"AGRICULTURE" and "REMEDIATION"
Mentions of "agriculture": 1
Mentions of "remediation": 1
Mentions of "agriculture" within 100 words of "remediation": 0
"GARDEN" and "REMEDIATION"
Mentions of "garden": 1
Mentions of "remediation": 1
Mentions of "garden" within 100 words of "remediation": 0
"FARM" and "REMEDIATION"
Mentions of "farm": 1
Mentions of "remediation": 1
Mentions of "farm" within 

In [279]:
# (B) turn each keyset list in Alhambra dict into separate array that can be indexed/called in a loop

keywords1 = ['food', 'agriculture', 'garden', 'farm'] 
keywords2 = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice']

store = {}
closedict = []

for keyword, positions in lposdict.items():
    # turn position list in each keyword in keyset1 into an array
    for key1 in keywords1:
        key_list1 = []
        if key1 in keyword:
            key_list1 = positions
            key_array1 = np.array(key_list1)
    
    # turn position list in each keyword in keyset2 into an array
        for key2 in keywords2:
            key_list2 = []
            if key2 in keyword:
                key_list2 = positions
                key_array2 = np.array(key_list2)

                # create a matrix from each unique keyset array pair
                key1_key2 = abs(np.subtract(key_array1, key_array2[:,None])) # with different array shapes

                # identify the number of occurences in the keyword pair where distance between them <100 words
                close = np.count_nonzero(key1_key2 < 100)

                # store in dictionary
                store[key2] = np.count_nonzero(key1_key2 < 100)
                
                # dict to df
                prox = pd.DataFrame.from_dict(store, orient='index', columns=['key1distance'])
           
        # append dfs onto master df
        closedict.append(store)
        
#proximities = pd.concat(closedict)

#proximities
store

{'soil': 0,
 'remediation': 0,
 'sustainability': 0,
 'climate': 0,
 'environment': 1,
 'safety': 1,
 'justice': 0}

### (C) Modified Topic Modeling: How Are Municipalities Treating Food & Urban Agriculture as Policy Priority?

The last exercise focused on identifying/verifying whether food and urban agriculture was being talked about in relation to a specific set of topics and keywords (health, safety, sustainability, climate). This is a variation on topic modeling that focuses more agnostically on finding the common words/topics around mentions of food and urban agriculture in general plans to gauge how food is talked about in municipalities more generally. 

In [60]:
# creating a keyword subset dictionary for topic modeling purposes
textdictsub = {}

keyloc = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal']

for plan in genplan:
    for key in keyloc:
        # text positions
        textpositionsub = [i for i in range(len(plan)) if plan.startswith(key, i)] # adapted: https://www.geeksforgeeks.org/python-all-occurrences-of-substring-in-string/#
        textdictsub[key, plan.split(", ")[0]] = textpositionsub

textdictsub

{('food', 'Alhambra General Plan'): [28264,
  92775,
  102247,
  107926,
  109065,
  113179,
  178165,
  198174,
  198866,
  209796,
  209884,
  210207],
 ('agriculture', 'Alhambra General Plan'): [144819],
 ('garden', 'Alhambra General Plan'): [37125,
  39045,
  65339,
  97617,
  97662,
  97853,
  98611,
  102899,
  116871,
  118153,
  166246,
  166449,
  210144],
 ('farm', 'Alhambra General Plan'): [11630],
 ('fruit', 'Alhambra General Plan'): [101651],
 ('vegetable', 'Alhambra General Plan'): [98417, 101658, 210134],
 ('animal', 'Alhambra General Plan'): [124235],
 ('food', 'City of Commerce 2020 General Plan'): [8724, 98911, 575597, 621711],
 ('agriculture', 'City of Commerce 2020 General Plan'): [120021],
 ('garden', 'City of Commerce 2020 General Plan'): [],
 ('farm', 'City of Commerce 2020 General Plan'): [193528,
  318124,
  318336,
  318611],
 ('fruit', 'City of Commerce 2020 General Plan'): [],
 ('vegetable', 'City of Commerce 2020 General Plan'): [],
 ('animal', 'City of Com

In [132]:
# loop: try getting all segs for one keyword in Alhambra plan

# keyword
key = 'food'

# empty frame to store keys and created word segments for key
foodsegs = {}

# list: keyword positions for "food" in Alhambra
foodpositionsr = textdict['food']

# loop: for ea position of key in plan, create segment = +/- 200 words of key mention
for foodpr in foodpositionsr: 
    # add row to dict for ea unique key mention and corresponding word segments
    foodsegs[key, foodpr] = Alhambra[foodpr-800:foodpr+800]
    
    # turn dict into df: store ea segment for keyword mention
    foodsegsdf = pd.DataFrame.from_dict(foodsegs, orient='index', columns=['foodseg'])
    foodsegsdf.index.name = 'key'
    
    # add muni ID column
    muniID = Alhambra.lower().find("city of ")
    foodsegsdf['municipality'] = Alhambra.split(", ")[0]
    
foodsegsdf

Unnamed: 0_level_0,foodseg,municipality
key,Unnamed: 1_level_1,Unnamed: 2_level_1
"(food, 28264)",ications system accessible throughout the comm...,Alhambra General Plan
"(food, 92775)",y Crafting land use policies that allow for fi...,Alhambra General Plan
"(food, 102247)",ultural amenities Alhambra hosts a number of c...,Alhambra General Plan
"(food, 107926)",ate environmental justice policies into their ...,Alhambra General Plan
"(food, 109065)",o CalEnviroscreen a program associated with th...,Alhambra General Plan
"(food, 113179)",cs Policy Numbers Land Use and Community Desig...,Alhambra General Plan
"(food, 178165)",resources available to respond when a public h...,Alhambra General Plan
"(food, 198174)",he community and input received from the publi...,Alhambra General Plan
"(food, 198866)",the communitys quality of life and economic v...,Alhambra General Plan
"(food, 209796)",tion of local health care facilities that meet...,Alhambra General Plan


In [134]:
# getting all segs for all keywords in Alhambra plan

# empty frame to store keys and created word segments for key
segs = {}

# for ea key in keylist, for ea position of key in plan, create segment = +/- 200 words of key mention
# using ea position for ea unique mention of key:
for keyword, positions in textdict.items():
    for position in positions:
        # add row to dict for ea unique key mention and corresponding word segments
        segs[keyword, position] = Alhambra[position-800:position+800]
            
    # turn dict into df: store ea segment for ea keyword in keyloc (for ea plan in genplan)
    segsdf = pd.DataFrame.from_dict(segs, orient='index', columns=['seg'])
    segsdf.index.name = 'key'
   
    # add muni ID column
    muniID = Alhambra.lower().find("city of ")
    segsdf['municipality'] = Alhambra.split(", ")[0]

segsdf

Unnamed: 0_level_0,seg,municipality
key,Unnamed: 1_level_1,Unnamed: 2_level_1
"(food, 28264)",ications system accessible throughout the comm...,Alhambra General Plan
"(food, 92775)",y Crafting land use policies that allow for fi...,Alhambra General Plan
"(food, 102247)",ultural amenities Alhambra hosts a number of c...,Alhambra General Plan
"(food, 107926)",ate environmental justice policies into their ...,Alhambra General Plan
"(food, 109065)",o CalEnviroscreen a program associated with th...,Alhambra General Plan
...,...,...
"(justice, 108710)",blic facilities food access safe and sanitary ...,Alhambra General Plan
"(justice, 109345)",nmental justice communities can be defined bot...,Alhambra General Plan
"(justice, 110740)",igure Disadvantaged Communities ALHAMBRA Gener...,Alhambra General Plan
"(justice, 111856)",eles County and in the state The percentage of...,Alhambra General Plan


In [66]:
# getting all segs for all keywords in all plans

# empty frame to store keys and created word segments for key
allsegs = {}

# for ea key in keylist, for ea position of key in plan, create segment = +/- 200 words of key mention
# using ea position for ea unique mention of key:
for plan in genplan:
    for keyword, positions in textdictsub.items():
        for position in positions:
            # add row to dict for ea unique key mention and corresponding word segments
            allsegs[keyword, position] = plan[position-800:position+800]
            
        # turn dict into df: store ea segment for ea keyword in keyloc (for ea plan in genplan)
        allsegsdf = pd.DataFrame.from_dict(allsegs, orient='index', columns=['seg'])
        allsegsdf.index.name = 'key'
    
allsegsdf

Unnamed: 0_level_0,seg
key,Unnamed: 1_level_1
"((food, Alhambra General Plan), 28264)",rial and residential uses Vernons city boundar...
"((food, Alhambra General Plan), 92775)",that the City owns of these units original dat...
"((food, Alhambra General Plan), 102247)",between and percent of the Los Angeles County ...
"((food, Alhambra General Plan), 107926)",tardation but shall not include other handicap...
"((food, Alhambra General Plan), 109065)",ersons with special needs and disabilities Lar...
...,...
"((animal, Vernon_General_Plan), 164757)",s located near the intersection of nd Street a...
"((animal, Vernon_General_Plan), 290685)",broad homogeneous area Glossary Vernon General...
"((animal, Vernon_General_Plan), 290784)",The Regional Housing Needs Assessment RHNA is...
"((animal, Vernon_General_Plan), 290872)",of population growth and housing unit demand a...


In [67]:
# converting the column containing the segments in the dataframe series (seg) into a list of strings
allsegslist = allsegsdf.seg.values

# turning list of segment strings into a list of segment lists
# list of strings: ea string is a narrow word segment surrounding each unique keyword mention
planseglists = [[word for word in word_tokenize(seg.lower())
                 if word not in swords and len(word)>2] for seg in allsegslist]

In [68]:
print('There are {} total segments in planseglists.'.format(len(planseglists)))

There are 66 total segments in planseglists.


Taking the list of segments for all unique keyword mentions for each keyword for each plan and passing it through GENSIM for topic modeling.

In [None]:
# generating topic model: start w/ 5 topics
dictionary = gensim.corpora.Dictionary(planseglists)
corpus = [dictionary.doc2bow(planseg) for planseg in planseglists]
# LdaMulticore uses multiple cores (thus, it runs faster); if you have problems, try replacing LdaMulticore with LdaModel
model = gensim.models.LdaMulticore(corpus, id2word=dictionary, num_topics=5)

# show topics
model.show_topics()

### (D) Environmental Justice at the Municipal Level: Are Food Systems Part of the Picture?

This last exercise is focused on replicating the previous exercises exclusively on the Environmental Justice Elements of General Plans (for those plans that contain them). In 2016, the state of California passed legislation requiring all municipalities to include environmental justice planning in their overall city planning.