# Food Safety and Community Food Production

An exploration of how narratives of "safety" in land use policy impact urban agriculture in select CA Counties: LA, Ventura, Sonoma, Mendocino, and Lake Counties

## The Data

112 total general plans from the five CA Counties (LA, Ventura, Sonoma, Mendocino, Lake)


1. Basic word counts:

2. Modified/precision topic modeling: How often do mentions of food, agriculture, soil health, food safety feature in plans? And as important, where are they mentioned?

In [2]:
# libraries
import pandas as pd
import geopandas as gpd
import numpy as np

# visualizations
import contextily as ctx
import matplotlib.pyplot as plt
import seaborn as sn

# census
import cenpy
from cenpy import products

# set display
pd.options.display.max_columns = 150
#pd.options.display.max_rows = 150

In [3]:
# libraries
import os

# extract text
from pdfminer.high_level import extract_text
import re

# stop words; split sentences; stems
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

# topic modeling
import gensim
help(gensim.models.LdaMulticore)
import pyLDAvis
import pyLDAvis.gensim_models

Help on class LdaMulticore in module gensim.models.ldamulticore:

class LdaMulticore(gensim.models.ldamodel.LdaModel)
 |  LdaMulticore(corpus=None, num_topics=100, id2word=None, workers=None, chunksize=2000, passes=1, batch=False, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, random_state=None, minimum_probability=0.01, minimum_phi_value=0.01, per_word_topics=False, dtype=<class 'numpy.float32'>)
 |  
 |  An optimized implementation of the LDA algorithm, able to harness the power of multicore CPUs.
 |  Follows the similar API as the parent class :class:`~gensim.models.ldamodel.LdaModel`.
 |  
 |  Method resolution order:
 |      LdaMulticore
 |      gensim.models.ldamodel.LdaModel
 |      gensim.interfaces.TransformationABC
 |      gensim.utils.SaveLoad
 |      gensim.models.basemodel.BaseTopicModel
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, corpus=None, num_topics=100, id2word=None, workers=

#### The Plan Elements

After downloading all plans, the plans were split by element and only the "Land Use" and "Open Space/Conservation" Elements will be included.

In [4]:
# loading plans
planlist = os.listdir('C:/Users/melod/Documents/data science/Food-Systems-Policy-Research/Food Systems and General Plans/General Plans')
planlist

['.ipynb_checkpoints',
 'Agoura Hills.pdf',
 'Alhambra.pdf',
 'Arcadia.pdf',
 'Artesia.pdf',
 'Avalon.pdf',
 'Azusa.pdf',
 'Baldwin Park.pdf',
 'Bell Gardens.pdf',
 'Bell.pdf',
 'Bellflower.pdf',
 'Beverly Hills.pdf',
 'Bradbury.pdf',
 'Burbank.pdf',
 'Calabasas.pdf',
 'Camarillo.pdf',
 'Carson.pdf',
 'Cerritos.pdf',
 'Claremont.pdf',
 'Clearlake.pdf',
 'Commerce.PDF',
 'Compton.pdf',
 'Cotati.pdf',
 'Covina.pdf',
 'Cudahy.pdf',
 'Culver City.pdf',
 'Diamond Bar.pdf',
 'Downey.pdf',
 'Duarte.pdf',
 'El Monte.pdf',
 'El Segundo.pdf',
 'Filmore.pdf',
 'Fort Bragg.pdf',
 'Glendale.pdf',
 'Hawaiian Gardens.pdf',
 'Hawthorne.pdf',
 'Healdsburg.pdf',
 'Hermosa Beach.pdf',
 'Hidden Hills.pdf',
 'Huntington Park.pdf',
 'Industry.pdf',
 'Inglewood.pdf',
 'Irwindale.pdf',
 'La Canada Flintridge.pdf',
 'La Habra Heights.pdf',
 'La Mirada.pdf',
 'La Puente.pdf',
 'La Verne.pdf',
 'Lakeport.pdf',
 'Lancaster.pdf',
 'Lawndale.pdf',
 'Lomita.pdf',
 'Long Beach.pdf',
 'Lynwood.pdf',
 'Malibu.pdf',
 'M

In [None]:
# extracting text

# function
def readPDF(planname):
    txt = extract_text('C:/Users/melod/Documents/data science/Food-Systems-Policy-Research/Food Systems and General Plans/General Plans/'+planname)
    
    # remove punctuation, numbers, etc.
    txt = re.sub(r"[^A-z\s]", "", txt)
    # remove whitepace
    txt = re.sub(r"\s+", " ", txt) 
    
    # cleaning up name
    planname = planname.split(".")[0]
    
    # insert muni(planname) to the beginning of plan: creating indexable muni ID within each plan string
    txt = planname+", "+txt
    
    print('Finished {}'.format(planname))
    return txt

# read in all pdf files
suffixes = ('.pdf', '.PDF')
genplan = [readPDF(pn) for pn in planlist if pn.endswith(suffixes)]

Finished Agoura Hills
Finished Alhambra
Finished Arcadia
Finished Artesia
Finished Avalon
Finished Azusa
Finished Baldwin Park
Finished Bell Gardens
Finished Bell
Finished Bellflower
Finished Beverly Hills
Finished Bradbury
Finished Burbank
Finished Calabasas
Finished Camarillo
Finished Carson
Finished Cerritos
Finished Claremont
Finished Clearlake
Finished Commerce
Finished Compton
Finished Cotati
Finished Covina
Finished Cudahy
Finished Culver City
Finished Diamond Bar
Finished Downey
Finished Duarte
Finished El Monte
Finished El Segundo
Finished Filmore
Finished Fort Bragg
Finished Glendale
Finished Hawaiian Gardens
Finished Hawthorne
Finished Healdsburg
Finished Hermosa Beach
Finished Hidden Hills
Finished Huntington Park
Finished Industry
Finished Inglewood
Finished Irwindale
Finished La Canada Flintridge
Finished La Habra Heights
Finished La Mirada
Finished La Puente
Finished La Verne
Finished Lakeport
Finished Lancaster
Finished Lawndale
Finished Lomita
Finished Long Beach
Finis

In [None]:
# save list of plans
genplan.to_file()

In [7]:
# establish list of stopwords to exclude
swords = stopwords.words('english')

#### Keyword Count (RAW TEXT VERSION: Space positions)

Loop to return only count of select keywords for each plan in genplan and store in stacked df.

In [12]:
# saving positions of keywords in raw plan text to a dictionary: all plans
textdictall = {}
plan_cols = []

keywords = ['food', 'agriculture', 'garden', 'farm', 'fruit', 'vegetable', 'animal', 'soil', 
            'remediation', 'contaminate', 'sustainability', 'climate', 'environment', 
            'health', 'safety', 'justice']

for plan in genplan:
    for key in keywords:
        # identify text positions of keyword mentions
        textpositionsall = [i for i in range(len(plan)) if plan.startswith(key, i)] # adapted: https://www.geeksforgeeks.org/python-all-occurrences-of-substring-in-string/#
        # save list of positions for each keyword in a dictionary
        textdictall[key] = textpositionsall

    textdict = {}

    # generate counts for each keyword by summing the number of positions in its position list
    for keyword in textdictall:
        for position in keyword:
            # store counts in dictionary
            textdict[keyword] = len(textdictall[keyword])

        # turn dicts into dfs
        keywordscount = pd.DataFrame.from_dict(textdict, orient='index', columns=[plan.split(", ")[0]]) # muni plan name as series name
    
    # add each municipality column generated to a single dataframe
    plan_cols.append(keywordscount)
    munikeys = pd.concat(plan_cols, axis = 1)

# inspect/show
munikeys

{('food', 'Alhambra General Plan'): [28264,
  92775,
  102247,
  107926,
  109065,
  113179,
  178165,
  198174,
  198866,
  209796,
  209884,
  210207],
 ('agriculture', 'Alhambra General Plan'): [144819],
 ('garden', 'Alhambra General Plan'): [37125,
  39045,
  65339,
  97617,
  97662,
  97853,
  98611,
  102899,
  116871,
  118153,
  166246,
  166449,
  210144],
 ('farm', 'Alhambra General Plan'): [11630],
 ('fruit', 'Alhambra General Plan'): [101651],
 ('vegetable', 'Alhambra General Plan'): [98417, 101658, 210134],
 ('animal', 'Alhambra General Plan'): [124235],
 ('soil', 'Alhambra General Plan'): [18274,
  27578,
  121375,
  125060,
  178858,
  178959,
  178977,
  179005,
  179188,
  180326,
  180400,
  180684,
  180861,
  181002,
  182443,
  186816,
  203699,
  203796],
 ('remediation', 'Alhambra General Plan'): [],
 ('contaminate', 'Alhambra General Plan'): [],
 ('carbon', 'Alhambra General Plan'): [134877, 135016, 137259],
 ('sustainability', 'Alhambra General Plan'): [140202]

In [None]:
# reorienting table the keyword counts so they can be joined to other gdfs
munikeyst = munikeys.transpose()

# naming index column
munikeyst.index.name = "City"

# show
munikeyst

In [None]:
# saving keyword counts table
munikeyst.to_csv("Plan Keyword Counts.csv")

Now to visualize which cities potentially include community food production in their long-range land use planning  goals and strategies, the keyword count table will be merged with the table including municipal boundaries and preliminary information about the density of urban agriculture sites and facilities that produce toxic waste in a given municipality (created in the "LAC Map" notebook).

In [79]:
# joining keyword counts table to table with geocoded municipalities with preliminary UA site and Toxic Release densities

# loading file
LAC = gpd.read_file('C:/Users/melod/Documents/data science/Food-Systems-Policy-Research/Food Systems and General Plans/LACfinal.json')

# resetting index for keyword df
keywordcounts1 = keywordcounts.set_index('municipality')

# joining
LACmunikeys = LAC.join(munikeyst, how= 'left', rsuffix = '_counties')

In [82]:
# individual barplots (color-coded by north or south)

# municipalities = []

#fig ax, = plt.subplots(size= (8, 12))

#keywordscount.plot (ax=ax, column = 'keyword', size = 'counts', legend = True, legendkwds = {orientation: ''})

In [81]:
# aggregated: by keyword clusters: frequency north vs. south


### (B) Proximity of Keywords in Plans: How is Food and Urban Agriculture Being Talked About?

This next exercise aims to make inferences about how topics like food are being talked about based on the proximity (distance of words) between keywords (e.g. "food" and "soil" or "food" and "safety"). 

For each plan, keyword set pairs will be selected and matrix math will be used to compare the distance between the individual words in those keyword set pairs. The goal is to produce 28 matrices for word pairs for each plan for the following:

    Set 1: FOOD, AGRICULTURE, GARDEN, FARM
    Set 2: safety, soil, sustainability, climate, environment, remediation, justice, public, residential
    
-- and then to identify the frequency of word pairs that are within 100 or less words of each other in each municipality/plan. This would need to be done with a cleaned wordlist instead of the raw text, which might produce approximate word count distances for unique word pairs that exclude a lot of (filler) words, but would ultimately be more meaningful than distance measured by spaces in the raw text.

The hope is that this method will allow me to identify where (which municipalities) are talking about food and community agriculture in conjunction with environmental health and safety more generally.

Turning wordlists into arrays in order to do matrix math.

In [27]:
# for each plan
keywords1 = ['food', 'garden', 'farm', 'agriculture']  
keywords2 = ['soil', 'remediation', 'sustainability', 'climate', 'environment', 'safety', 'justice',
            'public', 'residential']

proxim = {}
plan_col = []

for plan in genplan:
    
    # create a wordlist for each plan
    wordlist = [word for word in word_tokenize(plan.lower()) 
                 if word not in swords]
    
    # create dictionaries for keyword set 1 for each plan
    lpostdict1 = {}
    for key1 in keywords1:
        # generate topic counts to populate dictionary frame indexed to topic
        positions1 = [i for i,w in enumerate(wordlist) if w.lower() == key1]
        lposdict1[key1] = positions1

    # create dictionaries for keyword set 2 for each plan
    lpostdict2 = {}
    for key2 in keywords2:
        # generate topic counts to populate dictionary frame indexed to topic
        positions2 = [i for i,w in enumerate(wordlist) if w.lower() == key2]
        lposdict2[key2] = positions2
    
    for key1, positions1 in lposdict1.items():
        for key2, positions2 in lposdict2.items():

            # create subtraction matrix for each keyword pair
            key1_key2 = abs(np.subtract(np.array(positions1), np.array(positions2)[:,None])) # with different array shapes
            # count occurences in each matrix where distance between keywords is <100 words + store in dictionary 
            proxim[key1, key2] = np.count_nonzero(key1_key2 < 100)
            # convert dictionary to dataframe series/column for each municipality
            prox = pd.DataFrame.from_dict(proxim, orient='index', columns=[plan.split(", ")[0]])
    
    # add each muni column generated to a single dataframe
    plan_col.append(prox)
    muniprox = pd.concat(plan_col, axis=1)

muniprox

Unnamed: 0,Alhambra
"(food, soil)",1
"(food, remediation)",1
"(food, sustainability)",0
"(food, climate)",6
"(food, environment)",2
"(food, safety)",12
"(food, justice)",13
"(food, public)",15
"(food, residential)",0
"(garden, soil)",1


In [None]:
# reorienting table the keyword pair frequencies can be joined to other gdfs
muniproxt = muniprox.transpose()
muniproxt

In [None]:
# saving keyword pairs table
#muniproxt.to_csv("Plan Keyword Proximities.csv")

### (C) Modified Topic Modeling: How Are Municipalities Treating Food & Urban Agriculture as Policy Priority?

The last exercise focused on identifying/verifying whether food and urban agriculture was being talked about in relation to a specific set of topics and keywords (health, safety, sustainability, climate). This is a variation on topic modeling that focuses more agnostically on finding the common words/topics around mentions of food and urban agriculture in general plans to gauge how food is talked about in municipalities more generally. 

In [60]:
# creating a keyword subset dictionary for topic modeling purposes

# create empty dictionary to store keyword positions lists
textdictsub = {}
# create empty dictionary to store created word segments around each keyword

keyloc = ['food', 'garden', 'farm', 'agriculture', 'fruit', 'vegetable', 'animal']

for plan in genplan:
    for key in keyloc:
        # identify position for each keyword in the raw text text
        textpositionsub = [i for i in range(len(plan)) if plan.startswith(key, i)] # adapted: https://www.geeksforgeeks.org/python-all-occurrences-of-substring-in-string/#
        # save to a dictionary
        textdictsub[key, plan.split(", ")[0]] = textpositionsub

    # for each unique keyword mention in each plan, create a segment of +/- 200 words around the keyword
    for keyword, positions in textdictsub.items():
        for position in positions:
            # add row to dict for ea unique key mention and corresponding word segments
            allsegs[keyword, position] = plan[position-800:position+800]
            
        # turn dict into df: store ea segment for ea keyword in keyloc (for ea plan in genplan)
        allsegsdf = pd.DataFrame.from_dict(allsegs, orient='index', columns=['segment'])

# inspect/show    
allsegsdf

{('food', 'Alhambra General Plan'): [28264,
  92775,
  102247,
  107926,
  109065,
  113179,
  178165,
  198174,
  198866,
  209796,
  209884,
  210207],
 ('agriculture', 'Alhambra General Plan'): [144819],
 ('garden', 'Alhambra General Plan'): [37125,
  39045,
  65339,
  97617,
  97662,
  97853,
  98611,
  102899,
  116871,
  118153,
  166246,
  166449,
  210144],
 ('farm', 'Alhambra General Plan'): [11630],
 ('fruit', 'Alhambra General Plan'): [101651],
 ('vegetable', 'Alhambra General Plan'): [98417, 101658, 210134],
 ('animal', 'Alhambra General Plan'): [124235],
 ('food', 'City of Commerce 2020 General Plan'): [8724, 98911, 575597, 621711],
 ('agriculture', 'City of Commerce 2020 General Plan'): [120021],
 ('garden', 'City of Commerce 2020 General Plan'): [],
 ('farm', 'City of Commerce 2020 General Plan'): [193528,
  318124,
  318336,
  318611],
 ('fruit', 'City of Commerce 2020 General Plan'): [],
 ('vegetable', 'City of Commerce 2020 General Plan'): [],
 ('animal', 'City of Com

In [67]:
# converting the list of segments in the dataframe (the segment series) into a list of strings
allsegslist = allsegsdf.seg.values

# turning list of segment strings into a list of segment lists
# list of strings: ea string is a narrow word segment surrounding each unique keyword mention
planseglists = [[word for word in word_tokenize(seg.lower())
                 if word not in swords and len(word)>2] for seg in allsegslist]

In [68]:
print('There are {} total segments in planseglists.'.format(len(planseglists)))

There are 66 total segments in planseglists.


Taking the list of segments for all unique keyword mentions for each keyword for each plan and passing it through GENSIM for topic modeling.

In [None]:
# generating topic model: start w/ 5 topics
dictionary = gensim.corpora.Dictionary(planseglists)
corpus = [dictionary.doc2bow(planseg) for planseg in planseglists]
# LdaMulticore uses multiple cores (thus, it runs faster); if you have problems, try replacing LdaMulticore with LdaModel
model = gensim.models.LdaMulticore(corpus, id2word=dictionary, num_topics=5)

# show topics
model.show_topics()

### (D) Future Analysis: Environmental Justice at the Municipal Level: Are Food Systems Part of the Picture?

This last exercise is focused on replicating the previous exercises exclusively on the Environmental Justice Elements of General Plans (for those plans that contain them). In 2016, the state of California passed legislation requiring all municipalities to include environmental justice planning in their overall city planning.