# Capstone Project: U.S. Farmers Markets and County Income Analysis
### Applied Data Science Course from IBM/Coursera

<img src="https://www.usda.gov/sites/default/files/ams-fm-produce-release-042919.jpg">

**Abstract**: Nutrition and health in the United States are closely linked in the study of public health. Local farmers markets have proliferated as a means of distributing fresh produce directly to consumers, skipping the costly distribution and packaging steps. However, one criticism of farmers markets is that their locations make them largely inaccessible to many Americans, especially those of low income and socioeconomic status. This project draws upon all the registered farmers markets in the United States in 2020, together with economic statistics for each county, to investigate the correlation between income and access to farmers markets. This preliminary study used multiple regression modeling and found a positive correlation between the density of farmers markets in a county and its per capita income. However, there was a negative correlation between the density of farmers markets and the county’s median household/family income. Overall, there was no clear trend showing that greater income in a given county affected the density of farmers markets. Furthermore, county segmentation by K-means, with the elbow method, identified 3 clusters of counties as optimal. Four counties (Maricopa County, AZ; Los Angeles County, CA; Harris County, TX; and Cook County, IL) were identified as having great potential for new farmers markets due to low market density, high population, and relatively high income.

> ## Table of contents:
* [Introduction](#first-section)
* [Data Sources](#second-section)
* [Methodology](#third-section)
* [Results](#fourth-section)
* [Discussion](#fifth-section)
* [Conclusion](#sixth-section)
* [References](#seventh-section)

## Introduction<a class="anchor" id="first-section"></a>

<img src="https://helpingpublicmarketsgrow.files.wordpress.com/2018/08/why-fm-graphics-2018.jpg?w=500">

**Background:** Over the past century, public health in the United States has achieved tremendous success in increasing the longevity and productivity of life. At the same time, due in part to changes in lifestyle behaviors, the rates of non-communicable diseases, specifically chronic diet-related diseases, have risen. The most common of these include cardiovascular disease, high blood pressure, type 2 diabetes, some cancers, and poor bone health. Alarmingly, roughly half of all American adults (117 million individuals) have one or more preventable chronic diseases. [1] The impact of chronic disease is also felt economically: in 2008, the medical costs associated with obesity were estimated at \\$147 billion a year, and in 2017 the total estimated cost of diagnosed diabetes was \\$327 billion in medical costs and lost productivity. [2-4]

**Problem:** Having access to healthy, safe, and affordable food choices is crucial for an individual to live a healthy lifestyle. Food access is influenced by diverse factors, including proximity to food retail outlets (e.g., distance to a grocery store/supermarket/market or overall density of markets), individual resources (e.g., income to dictate spending), and neighborhood-level resources (e.g., average income of the neighborhood/county to subsidize and incentivize agricultural infrastructure). Innovative approaches to food access such as farmers markets, mobile markets, shelters, food banks and community gardens/cooperatives have emerged as an alternative to, and improvement on, grocery stores. However, the limited number of locations restricts the accessibility of farmers markets. It should be noted that race/ethnicity, socioeconomic status, and the presence of a disability may also affect an individual’s ability to access foods that support healthy eating patterns. However, an in-depth study of these other contributing factors is outside the scope of this project.

**Interest:** This project seeks to visualize and communicate the relationship between the location and density of farmers markets and the economic status of U.S. counties. Knowing which regions and counties are lacking in farmers markets is critical both from a public health perspective and from an economic opportunity perspective. Local municipalities and city councils may use this information to organize and petition for additional farmers markets to meet the needs of their constituents. From a purely business perspective, new vendors and farmers can find target areas lacking in competition in order to sell to underserved communities.

## Data Sources<a class="anchor" id="second-section"></a>

The data sources were uploaded to [Kaggle](https://www.kaggle.com/) by [Madeleine Ferguson](https://www.kaggle.com/madeleineferguson/farmers-markets-in-the-united-states). [5] The farmers market data is maintained by the USDA Agricultural Marketing Service. [6] The farmers market listings include market locations, directions, operating times, product offerings, accepted forms of payment, and more. The list of United States counties by per capita income is from the 2009-2013 American Community Survey 5-Year Estimates; data for Puerto Rico is from the 2013-2017 American Community Survey 5-Year Estimates, and data for the other U.S. territories is from the 2010 U.S. Census. [7-10] The data contains the name of the county (or equivalent), state or territory, per capita income, median household income, median family income, population, and number of households. An alternative method was considered: obtaining farmers market information from the Foursquare API using a free developer account. However, the resulting searches are limited to 50 responses and the maximum search radius is 100,000 meters. The Foursquare API is well suited to tabulating all the venues near a particular position, but is cumbersome for acquiring a clean, complete dataset such as all farmers markets in the United States. Thus, farmers market data was taken from ams.usda.gov.

## Methodology<a class="anchor" id="third-section"></a>

In [None]:
# Let's start off importing some python tools
import pandas as pd
import numpy as np
import random as rnd
from tabulate import tabulate

# Visualization libraries 
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import plotly as py
import plotly.express as px
import plotly.graph_objects as go

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Library to handle JSON files
import json

# Converting an address into latitude and longitude values
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

# Library to handle requests
import requests 
# Transforming JSON file into a pandas dataframe
# (json_normalize is exposed at the top level in pandas >= 1.0; the pandas.io.json path is deprecated)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# ML tools: preprocessing and k-means for the clustering stage
from sklearn import preprocessing
from sklearn.cluster import KMeans

# Visualization of maps with Folium library
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

In [None]:
# Reading in the CSV data for farmers markets and U.S. counties
farmers_market_df = pd.read_csv("../input/farmers-markets-in-the-united-states/farmers_markets_from_usda.csv")
us_counties_df = pd.read_csv("../input/farmers-markets-in-the-united-states/wiki_county_info.csv")

After initial analysis of the farmers_market_df dataframe, there are 36 rows with NaN for "County" but only 28 rows with NaN for the longitude and latitude coordinates. Looking into some of the NaN values reveals that they correspond to mobile farmers markets, which may operate in multiple counties and cities. We will exclude these from our analysis. The 28 rows without coordinates represent only 0.3% of the total dataset (8804 farmers markets), so they can be dropped without significantly changing the overall analysis.

    farmers_market_df[farmers_market_df['x'].isnull()]
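
The missing-value counts quoted above can be reproduced with a few pandas calls. The following is a minimal sketch on a made-up four-row table (`toy_markets` is a hypothetical stand-in for `farmers_market_df`, and its values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for farmers_market_df; the values are made up for illustration.
toy_markets = pd.DataFrame({
    "MarketName": ["A", "B", "C", "D"],
    "County": ["Cook", np.nan, "Harris", np.nan],
    "x": [-87.6, np.nan, -95.4, -118.2],   # longitude
    "y": [41.9, np.nan, 29.8, 34.1],       # latitude
})

# Number of missing values in each column of interest
missing_counts = toy_markets[["County", "x", "y"]].isna().sum()

# Share of rows without coordinates, as a percentage
missing_coord_pct = 100 * toy_markets["x"].isna().mean()

print(missing_counts)
print(f"{missing_coord_pct:.1f}% of rows lack coordinates")
```

Running the same two expressions on the real dataframe should give the 36 and 28 counts reported in the text, assuming the Kaggle file has not changed.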

For the U.S. county dataframe, we will clean up the data by excluding two rows that don't contain any data. There also appear to be duplicates for New Jersey, Maryland, and Puerto Rico: three rows contain values summed across the entire state/territory, while all three also have separate rows showing the income and population of each county.

    us_counties_df[us_counties_df['county'].isnull()]

The 89 permanently-inhabited county-equivalents in the territories of the United States (such as the municipalities of Puerto Rico) are also listed (but are not ranked in the dataframe). The dataset excludes the 8 county-equivalents in the U.S. territories that have zero people (Baker Island, Howland Island, Jarvis Island, Johnston Atoll, Kingman Reef, Navassa Island, Northern Islands Municipality and Rose Atoll). The 3 semi-populated county-equivalents in the U.S. Minor Outlying Islands (Midway Atoll, Palmyra Atoll and Wake Island) are also excluded.

In [None]:
# Dropping mobile farmers markets that did not provide permanent longitude and latitude coordinates
farmers_market_df = farmers_market_df[farmers_market_df['x'].notna()]

In [None]:
# Dropping U.S. counties with NaN as their name in the dataframe since they are errors or duplicates
us_counties_df = us_counties_df[us_counties_df['county'].notna()]

# Removing "$" signs and thousands separators from every column after the first three (income, population, households)
# and converting them to floats; the raw string avoids an invalid-escape warning for "\$"
us_counties_df[us_counties_df.columns[3:]] = us_counties_df[us_counties_df.columns[3:]].replace(r'[\$,]', '', regex=True).astype(float)
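
As a small illustration of the cleaning step above, the sketch below strips currency formatting from a toy table and checks the resulting dtypes. The frame `toy_income` and its population figures are invented for the example; `pd.to_numeric` with `errors="coerce"` is used here as one possible alternative to `.astype(float)`, because it turns unparseable cells into NaN instead of raising.

```python
import pandas as pd

# Hypothetical two-row table with string-formatted numbers
toy_income = pd.DataFrame({
    "county": ["Alpha", "Beta"],
    "per capita income": ["$62,498", "$5,441"],
    "population": ["1,605,272", "1,143"],
})

numeric_cols = ["per capita income", "population"]
cleaned = toy_income.copy()
for col in numeric_cols:
    # Strip dollar signs and thousands separators, then parse as numbers
    stripped = cleaned[col].str.replace(r"[\$,]", "", regex=True)
    cleaned[col] = pd.to_numeric(stripped, errors="coerce")

print(cleaned.dtypes)
print(cleaned)
```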

### Visualization of the U.S. counties data by per capita income

In [None]:
# Plotting the Per Capita Income aggregated across each state/territory
# Figure is ordered by each state's highest-income county, descending
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
state_names_by_max = us_counties_df.groupby("State")["per capita income"].max().sort_values(ascending=False).index.values
plt.figure(figsize=(14,20))

ax = sns.boxplot(x="per capita income", y="State", order = state_names_by_max, data=us_counties_df).set_title('Per Capita County Income grouped by U.S. States (descending by highest county)')

In [None]:
# Plotting the Per Capita Income aggregated across each state/territory
# Figure is organized by descending in the order of median "per capita income" value by state
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
sort_median_counties = us_counties_df.groupby("State")['per capita income'].median().sort_values(ascending=False)
state_names_by_median = sort_median_counties.index.values
plt.figure(figsize=(14,20))
ax = sns.boxplot(x="per capita income", y="State", order = state_names_by_median, data=us_counties_df).set_title('Per Capita County Income grouped by U.S. States (descending by state median)')
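
The box-plot comments above mention the 1.5 × IQR rule. As a worked example on made-up numbers, the sketch below computes the quartiles, the two fences, and the whisker ends. It assumes the default matplotlib/seaborn behaviour, in which each whisker stops at the most extreme data point that still lies inside its fence, and anything beyond the fence is drawn as an outlier.

```python
import numpy as np

# Made-up sample with one obvious outlier
values = np.array([10, 12, 13, 14, 15, 16, 18, 40], dtype=float)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences of the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are plotted individually
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, q3, iqr)                # 12.75 16.5 3.75
print(whisker_low, whisker_high)  # 10.0 18.0
print(outliers)                   # [40.]
```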

In [None]:
# Plotting the Median Household Income aggregated across each state/territory
# Figure is organized by descending in the order of highest Median Household county
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
sort_median_counties = us_counties_df.groupby("State")['median household income'].max().sort_values(ascending=False)
state_names_by_median = sort_median_counties.index.values
plt.figure(figsize=(14,20))
ax = sns.boxplot(x="median household income", y="State", order = state_names_by_median, data=us_counties_df).set_title('Median Household Income grouped by U.S. States (descending by state max)')

In [None]:
# Plotting the Median Household Income aggregated across each state/territory
# Figure is organized by descending in the order of highest median "Median Household Income" value by state
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
sort_median_counties = us_counties_df.groupby("State")['median household income'].median().sort_values(ascending=False)
state_names_by_median = sort_median_counties.index.values
plt.figure(figsize=(14,20))
ax = sns.boxplot(x="median household income", y="State", order = state_names_by_median, data=us_counties_df).set_title('Median Household Income grouped by U.S. States (descending by state median)')

In [None]:
# Plotting the Median Family Income aggregated across each state/territory
# Figure is organized by descending in the order of highest Median Family Income in each state
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
sort_median_counties = us_counties_df.groupby("State")['median family income'].max().sort_values(ascending=False)
state_names_by_median = sort_median_counties.index.values
plt.figure(figsize=(14,20))
ax = sns.boxplot(x="median family income", y="State", order = state_names_by_median, data=us_counties_df).set_title('Median Family Income grouped by U.S. States (descending by state max)')

In [None]:
# Plotting the Median Family Income aggregated across each state/territory
# Figure is organized by descending in the order of highest median value of "Median Family Income" by state
# Box plots highlight the median, interquartile range, min (Q1-1.5*IQR) and max (Q3 + 1.5*IQR), and outliers. 
sort_median_counties = us_counties_df.groupby("State")['median family income'].median().sort_values(ascending=False)
state_names_by_median = sort_median_counties.index.values
plt.figure(figsize=(14,20))
ax = sns.boxplot(x="median family income", y="State", order = state_names_by_median, data=us_counties_df).set_title('Median Family Income grouped by U.S. States (descending by state median)')

In [None]:
# Per capita income plotted against median household income by county, size of scatter dot is based on population
import plotly.express as px
fig = px.scatter(us_counties_df, x="median household income", y="per capita income", color="State",
                 size='population', hover_data=['county'],size_max=30)
fig.show()

### Visualization of the Farmers Markets datasets

In [None]:
# Correcting an error in the dataset for the Derwood Farmers Market being at (39.126442, -77.150267), and not (1.7081209, -3.4606929)

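# Use .loc with the row label to avoid chained assignment, which may silently write to a copy
farmers_market_df.loc[1898, 'x'] = -77.150267
farmers_market_df.loc[1898, 'y'] = 39.126442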
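
An error like the Derwood one can be found programmatically by flagging coordinates that fall outside a plausible region. The sketch below is only an illustration: the table is a toy, and the bounding box is a rough assumption covering the 50 states, Puerto Rico, and the U.S. Virgin Islands. Western Pacific territories and the far Aleutians fall outside it and would need separate handling.

```python
import pandas as pd

# Toy rows: one with the implausible coordinates mentioned above, one corrected
toy_markets = pd.DataFrame({
    "MarketName": ["Suspicious entry", "Plausible entry"],
    "x": [-3.4606929, -77.150267],   # longitude
    "y": [1.7081209, 39.126442],     # latitude
})

# Rough bounding box (an assumption, not an official boundary)
LAT_MIN, LAT_MAX = 17.0, 72.0
LON_MIN, LON_MAX = -180.0, -64.0

in_box = (
    toy_markets["y"].between(LAT_MIN, LAT_MAX)
    & toy_markets["x"].between(LON_MIN, LON_MAX)
)
suspect = ~in_box

print(toy_markets[suspect])
```

Rows flagged this way still need a manual look, since a flag only says the point is outside the box, not what the correct location is.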

In [None]:
# Create a map of U.S. farmers markets using latitude and longitude values
map_markets = folium.Map(location=[39.8283, -98.5795], zoom_start=4)

# add markers to map
for lat, lng, market, state in zip(farmers_market_df['y'], farmers_market_df['x'], farmers_market_df['MarketName'], farmers_market_df['State']):
    label = '{}, {}'.format(market, state)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='green',
        fill=True,
        fill_color='#31cc34',
        fill_opacity=0.7,
        parse_html=False).add_to(map_markets)  
    
map_markets

In [None]:
# Load the shape of the states (US states) from USA States Geodata kaggle dataset:
with open('../input/usa-states/usa-states.json') as json_data:
    usa_states = json.load(json_data)
    
# Load the shape of the counties from us-counties
with open('../input/uscounties/geojson-counties-fips.json') as json_data2:
    usa_counties = json.load(json_data2)

In [None]:
# Removing the phrase "County" from the U.S. counties to match json format
us_counties_df = us_counties_df.replace(" County", "",regex=True)
us_counties_df = us_counties_df.replace(" Census Area", "",regex=True)

farmers_market_df = farmers_market_df.replace(" County", "",regex=True)

In [None]:
# Count the number of farmers markets by State
farmers_market_state_count = farmers_market_df.groupby('State').count()
farmers_market_state_count['State'] = farmers_market_state_count.index.values 

# Dropping non-state territories in US to match json state names
farmers_market_state_count = farmers_market_state_count.drop(['District of Columbia', 'Puerto Rico', 'Virgin Islands'], axis=0)

# Count the number of farmers markets by County
farmers_market_county_count = farmers_market_df.groupby('County').count()
farmers_market_county_count['county'] = farmers_market_county_count.index.values 

# Add up population by state
state_pop = us_counties_df.groupby('State')[['population']].sum()
state_pop['State'] = state_pop.index.values 

# Add up population by county name
# Note: grouping on the name alone pools same-named counties from different states
county_pop = us_counties_df.groupby('county')[['population']].sum()
county_pop['county'] = county_pop.index.values 
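
One caveat of grouping by county name alone is that several states share county names, so their rows end up pooled. The toy example below (invented populations, hypothetical frame `toy_counties`) contrasts grouping by name with grouping by the (county, State) pair.

```python
import pandas as pd

# Made-up populations; two different states both have a "Washington" county
toy_counties = pd.DataFrame({
    "county": ["Washington", "Washington", "Cook"],
    "State": ["Oregon", "Utah", "Illinois"],
    "population": [500, 100, 5000],
})

# Grouping on the name alone merges the two Washington counties
by_name = toy_counties.groupby("county")["population"].sum()

# Grouping on the (county, State) pair keeps them apart
by_name_state = toy_counties.groupby(["county", "State"])["population"].sum()

print(by_name)
print(by_name_state)
```

The county-level choropleth further down keys on the county name as well, so the same pooling would affect that map; the later merge on a combined `County-State` key avoids it.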

In [None]:
# Normalized by population:
farmers_market_state_count2 = farmers_market_state_count.copy()
farmers_market_state_count2['FMID'] = (farmers_market_state_count2['FMID'].div(state_pop['population']))
farmers_market_state_count2['FMID'] = farmers_market_state_count2['FMID']*100000
# Initialize the map:
map_states = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
 
# Add the color for the choropleth:
state_fm = folium.Choropleth(
 geo_data=usa_states,
 name='choropleth',
 data=farmers_market_state_count2,
 columns=['State','FMID'],
 key_on='feature.properties.name',
 fill_color='YlGn',
 fill_opacity=0.7,
 line_opacity=0.2,
 legend_name='Number of Farmers Markets per 100k Population'
).add_to(map_states)

 
map_states
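
The per-100k normalization relies on `Series.div` aligning the two series on their index labels. A minimal sketch with invented numbers shows the arithmetic, and also what happens when a label is missing from the population table:

```python
import pandas as pd

# Invented counts and populations, indexed by state name
market_counts = pd.Series({"Vermont": 90, "Texas": 200, "Guam": 2})
population = pd.Series({"Texas": 25_000_000, "Vermont": 600_000})

# div matches rows by label, not by position
per_100k = market_counts.div(population) * 100_000

print(per_100k)
```

Here Vermont comes out at 15 markets per 100k and Texas at 0.8, while Guam becomes NaN because it has no population entry. NaN rows are worth checking before mapping, since they usually point to a naming mismatch between the two tables.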

In [None]:
# Normalized by population:
farmers_market_county_count2 = farmers_market_county_count.copy()
farmers_market_county_count2['FMID'] = (farmers_market_county_count2['FMID'].div(county_pop['population']))
farmers_market_county_count2['FMID'] = farmers_market_county_count2['FMID']*100000

# Initialize the map:
map_states2 = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
 
# Add the color for the choropleth:
county_fm = folium.Choropleth(
 geo_data=usa_counties,
 name='choropleth',
 data=farmers_market_county_count2,
 columns=['county','FMID'],
 key_on='feature.properties.NAME',
 fill_color='YlGn',
 fill_opacity=0.7,
 line_opacity=0.2,
 threshold_scale=[0,10,20,30,40,70,141],
 legend_name='Number of Farmers Markets by County per 100k Population'
).add_to(map_states2)

 
map_states2

In [None]:
# Initialize the map:
map_states3 = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
 
# Add the color for the choropleth:
county_income = folium.Choropleth(
 geo_data=usa_counties,
 name='choropleth',
 data=us_counties_df,
 columns=['county','per capita income'],
 key_on='feature.properties.NAME',
 fill_color='Spectral',
 fill_opacity=0.7,
 line_opacity=0.2,
 legend_name='Per Capita Income by County'
).add_to(map_states3)

 
map_states3

## Results<a class="anchor" id="fourth-section"></a>

### Plotting income vs farmers market density

In [None]:
farmers_market_df2 = farmers_market_df.copy()
farmers_market_df2['County-StateM']=farmers_market_df2['County'].astype(str) + ',' + farmers_market_df2['State'].astype(str)

us_counties_df2 = us_counties_df.copy()
us_counties_df2['County-State']=us_counties_df2['county'].astype(str) + ',' + us_counties_df2['State'].astype(str)

# Count the number of farmers markets by county-state pair
farmers_market_state_count2 = farmers_market_df2.groupby('County-StateM').count()
farmers_market_state_count2['County-State'] = farmers_market_state_count2.index.values 

merged_county_df = pd.merge(left=us_counties_df2, right=farmers_market_state_count2, left_on='County-State', right_on='County-State')

merged_county_df = merged_county_df.rename(columns={"State_x": "state", "FMID": "Farmers Market Count"})

# Creating a clean merged dataframe with only the relevant columns
clean_merged_county_df = merged_county_df[['County-State','state','per capita income','median household income','median family income','population','number of households','Farmers Market Count']].copy()
clean_merged_county_df['Household Density']=(clean_merged_county_df['population'].div(clean_merged_county_df['number of households']))


clean_merged_county_df = clean_merged_county_df.rename(columns={"per capita income": "Per Capita Income", "median household income": "Median Household Income", "median family income": "Median Family Income", "population": "Population", "number of households":"Number of Households","state":"State"})

clean_merged_county_df['Farmers Market 100k Density']=clean_merged_county_df['Farmers Market Count'].div(clean_merged_county_df['Population'])*100000

clean_merged_county_df.head()
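
`pd.merge` defaults to an inner join, so counties that have no matching row in the market counts drop out of `merged_county_df`. The sketch below, on toy frames with invented values, shows how a left join with `indicator=True` could be used to see which counties would be lost. Whether an unmatched county truly has zero markets or merely a differently spelled name is something the data alone does not settle, so filling with 0 is an assumption.

```python
import pandas as pd

# Toy stand-ins for the two frames being merged
toy_counties = pd.DataFrame({
    "County-State": ["Cook,Illinois", "Harris,Texas", "Loving,Texas"],
    "Population": [5000, 4000, 100],
})
toy_counts = pd.DataFrame({
    "County-State": ["Cook,Illinois", "Harris,Texas"],
    "Farmers Market Count": [6, 3],
})

# Default inner join: the unmatched county disappears
inner = pd.merge(toy_counties, toy_counts, on="County-State")

# Left join keeps every county and records where each row came from
left = pd.merge(toy_counties, toy_counts, on="County-State",
                how="left", indicator=True)
unmatched = left.loc[left["_merge"] == "left_only", "County-State"].tolist()

# Assumption: an unmatched county has no registered markets
left["Farmers Market Count"] = left["Farmers Market Count"].fillna(0).astype(int)

print(len(inner), len(left), unmatched)
```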

In [None]:
# Multiple Regression Model
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(clean_merged_county_df[['Per Capita Income','Median Household Income','Median Family Income','Population','Number of Households']])
y = np.asanyarray(clean_merged_county_df[['Farmers Market 100k Density']])
regr.fit (x, y)
# The coefficients
print ('Coefficients: ', regr.coef_)

In [None]:
# Reuse the same feature matrix and target that the model was fitted on
x = np.asanyarray(clean_merged_county_df[['Per Capita Income','Median Household Income','Median Family Income','Population','Number of Households']])
y = np.asanyarray(clean_merged_county_df[['Farmers Market 100k Density']])
y_hat = regr.predict(x)
# Mean squared error: the mean (not the sum) of the squared residuals
print("Mean squared error: %.2f"
      % np.mean((y_hat - y) ** 2))

# R^2 (coefficient of determination): 1 is perfect prediction
print('R^2 score: %.2f' % regr.score(x, y))
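
The quantities printed above are easy to confuse, so here is a minimal NumPy-only sketch on synthetic data that computes the residual sum of squares, the mean squared error, and R² side by side. The data and coefficients are invented, and `np.linalg.lstsq` stands in for scikit-learn's `LinearRegression` purely to keep the example dependency-light.

```python
import numpy as np

# Synthetic data: y depends linearly on two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Ordinary least squares with an explicit intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
mse = rss / len(y)                   # mean squared error
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1.0 - rss / tss                 # coefficient of determination

print(f"RSS={rss:.2f}  MSE={mse:.3f}  R^2={r2:.3f}")
```

Note that raw regression coefficients depend on the units of each feature, so comparing their magnitudes across income and population columns is only meaningful after the features are put on a common scale.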

In [None]:
# Scatter plots of income and farmers market relationships
fig = px.scatter(clean_merged_county_df, x="Per Capita Income", y="Farmers Market 100k Density", color="State",
                 size="Population", hover_data=['County-State'],size_max=50)
fig.show()

In [None]:
# Scatter plots of income and farmers market relationships
fig = px.scatter(clean_merged_county_df, x="Median Household Income", y="Farmers Market 100k Density", color="State",
                 size="Population", hover_data=['County-State'],size_max=50)
fig.show()

In [None]:
# Scatter plot of per capita income against county population
fig = px.scatter(clean_merged_county_df, x="Per Capita Income", y="Population", color="State",
                 size="Population", hover_data=['County-State'],size_max=30)
fig.show()

### Unsupervised Machine Learning: County Segmentation with K-Means 

Pre-processing: I drop the columns with discrete (categorical) variables, because Euclidean distance is not meaningful for them when clustering. I also drop Number of Households, Household Density, Farmers Market Count, and Median Family Income, as they largely duplicate information already carried by other columns in the dataframe. 

In [None]:
counties_df = clean_merged_county_df.drop(['State','Number of Households','Household Density','Farmers Market Count','Median Family Income'], axis=1)
counties_df.describe()

Let's normalize the dataset. Why is normalization needed in the first place? It puts features with different magnitudes and distributions on a comparable scale, so that distance-based algorithms such as K-means weigh them equally. We use StandardScaler(), which rescales each feature to zero mean and unit variance.

In [None]:
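from sklearn.preprocessing import StandardScaler
# Skip the first column (the County-State label) and keep the numeric features as floats
X = counties_df.values[:,1:].astype(float)
X = np.nan_to_num(X)
# Clus_dataSet is the standardized copy of X; X itself stays unscaled
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet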
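
To see why scaling matters for a distance-based method, the sketch below z-scores three made-up rows by hand (the same zero-mean, unit-variance transform that `StandardScaler` applies by default) and compares Euclidean distances before and after. The numbers are invented; only the pattern is the point.

```python
import numpy as np

# Columns: per capita income (USD), population, markets per 100k (made-up rows)
X = np.array([
    [25_000.0,    50_000.0, 8.0],
    [30_000.0,   120_000.0, 2.0],
    [28_000.0, 4_000_000.0, 1.0],
])

# z-score each column: subtract the column mean, divide by the column std
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

raw_01, raw_02 = dist(X[0], X[1]), dist(X[0], X[2])
scaled_01, scaled_02 = dist(X_scaled[0], X_scaled[1]), dist(X_scaled[0], X_scaled[2])

print(f"raw:    d(0,1)={raw_01:,.0f}  d(0,2)={raw_02:,.0f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,2)={scaled_02:.2f}")
```

On the raw values the population column dominates every distance, so a clustering fitted on unscaled features is driven almost entirely by population. After scaling, all three features contribute on comparable terms.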

For the K-means algorithm, we need to input a K value. The *KElbowVisualizer* from Yellowbrick implements the “Elbow” method to select the optimal number of clusters by fitting the model with a range of values for K. The optimal K value occurs at the elbow of the curve, where adding more clusters stops reducing the score substantially, and is shown with a dashed line.

In [None]:
from sklearn.cluster import KMeans

from yellowbrick.cluster import KElbowVisualizer

# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,10))

# Note: X is the unscaled feature matrix; Clus_dataSet holds the standardized version
visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure
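
For readers without Yellowbrick, the idea behind the elbow plot can be sketched directly with scikit-learn by recording the inertia (within-cluster sum of squared distances) for a range of K. This is a hand-rolled illustration on synthetic blobs with three hypothetical centers, not the county data and not Yellowbrick's own implementation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups (centers chosen for illustration)
X_toy, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.6,
    random_state=0,
)
X_toy = StandardScaler().fit_transform(X_toy)

inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    km.fit(X_toy)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 1))
```

The inertia falls steeply up to K = 3 and flattens afterwards, which is the elbow. On real data the bend is usually less sharp, so the chosen K is a judgment call rather than a hard result.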

In [None]:
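clusterNum = 3
k_means = KMeans(init = "k-means++", n_clusters = clusterNum, n_init = 12)
# Note: the model is fitted on X (unscaled); fitting on Clus_dataSet would cluster on standardized features
k_means.fit(X)
labels = k_means.labels_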

In [None]:
# Visualizing the 3 clusters
counties_df["Cluster"] = labels
counties_df.groupby('Cluster').mean()
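
Cluster means alone hide how many counties sit in each group. A small sketch with invented rows (`toy_clusters` is a hypothetical stand-in for `counties_df` after the labels are attached) shows how the size of each cluster could be reported next to its mean:

```python
import pandas as pd

# Made-up rows standing in for counties_df with cluster labels
toy_clusters = pd.DataFrame({
    "County-State": ["A,X", "B,X", "C,Y", "D,Z", "E,Z"],
    "Population": [50_000, 90_000, 120_000, 4_000_000, 6_000_000],
    "Cluster": [0, 0, 1, 2, 2],
})

# Named aggregation: one row per cluster with its size and mean population
profile = toy_clusters.groupby("Cluster").agg(
    n_counties=("County-State", "count"),
    mean_population=("Population", "mean"),
)

print(profile)
```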

In [None]:
# Scatter plots of income and farmers market relationships
counties_df["Cluster"] = counties_df["Cluster"].astype(str)
fig = px.scatter(counties_df, x="Per Capita Income", y="Farmers Market 100k Density",color="Cluster",size="Population", hover_data=['County-State'],size_max=50)
fig.show()

In [None]:
# Scatter plots of income and farmers market relationships
counties_df["Cluster"] = counties_df["Cluster"].astype(str)
fig = px.scatter(counties_df, x="Median Household Income", y="Farmers Market 100k Density",color="Cluster",
                 size="Population", hover_data=['County-State'],size_max=50)
fig.show()

## Discussion<a class="anchor" id="fifth-section"></a>

Overall, per capita income in the U.S. territories tends to be lower than per capita income in the 50 states and District of Columbia. Excluding the uninhabited county-equivalents, the county or county-equivalent with the highest per capita income is New York County, New York (Manhattan) (\\$62,498), and the county or county-equivalent with the lowest per capita income is Manu'a District, American Samoa (\\$5,441). Puerto Rico is also much poorer in comparison to the 50 states.

The U.S. Census Bureau defines a family as two or more people related by birth, marriage, or adoption residing in the same housing unit. A household consists of all people who occupy a housing unit regardless of relationship. A household may consist of a person living alone or multiple unrelated individuals or families living together. Median family income is typically higher than median household income because of the composition of households. Family households tend to have more people, and more of those members are in their prime earning years; as contrasted with members who have lesser incomes because they are very young or elderly. Areas with a wide disparity between the two measures have an excess of nonfamily households: single persons or otherwise.
    
K-means partitioned the counties into 3 mutually exclusive groups. The counties in each cluster are similar to each other demographically and economically. Now I can create a profile for each group, considering the common characteristics of each cluster. The 3 clusters are:

•	Low-population counties (mean population \~73,900), with an average median household income of \\$46,122 and Per Capita Income of \\$23,696. High density of farmers markets (8.4/100k population). These are rural areas in the U.S. which, while lower in economic status, have many farmers markets relative to their population.

•	Medium-population counties (mean population \~109,000), with an average median household income of \\$60,734 and Per Capita Income of \\$31,568. Medium density of farmers markets (2/100k population).

•	High-population counties (mean population \~579,000), with an average median household income of \\$54,297 and Per Capita Income of \\$28,271. Low density of farmers markets (1.2/100k population). 

Only 4 counties are assigned to cluster 2: Maricopa County, Arizona; Los Angeles County, California; Harris County, Texas; and Cook County, Illinois. These would be the best places for new farmers markets to open because of their low density of existing markets combined with relatively high income and population.

## Conclusion<a class="anchor" id="sixth-section"></a>

Using multiple regression modeling and unsupervised machine learning (K-Means clustering), I analyzed the relationship between the density of farmers markets and economic income indicators in the United States by county. Four counties (Maricopa County, AZ; Los Angeles County, CA; Harris County, TX; and Cook County, IL) were identified as having great potential for new farmers markets because of their low farmers market density, high population, and relatively high median income. It should be mentioned that an individual’s access to a farmers market depends on more factors, such as individual income and means of transportation. While a county may be above average in economic terms, each county contains multiple cities and towns, and even individual cities have income inequality between neighborhoods. Also, this report did not assess the size of the farmers markets or the availability of foods for specific diets. Nevertheless, it is reassuring to find no substantial evidence at the county level that farmers markets are disproportionately distributed by per capita and median income. 


## References<a class="anchor" id="seventh-section"></a>

[1] U.S. Department of Health and Human Services and U.S. Department of Agriculture. 2015 – 2020 Dietary Guidelines for Americans. 8th Edition. December 2015. Available at https://health.gov/our-work/food-and-nutrition/2015-2020-dietary-guidelines/.

[2] Centers for Disease Control and Prevention (CDC). About Chronic Diseases. October 23, 2019. Available at https://www.cdc.gov/chronicdisease/about/index.htm. Accessed June 23, 2018. 

[3] American Diabetes Association. Economic Costs of Diabetes in the U.S. in 2017. Diabetes Care 2018;41(5):917-928. [PubMed abstract](https://pubmed.ncbi.nlm.nih.gov/29567642/)

[4] Finkelstein EA, Trogdon JG, Cohen JW, Dietz W. Annual medical spending attributable to obesity: payer- and service-specific estimates. Health Aff 2009;28(5):w822-31. [PubMed abstract](https://pubmed.ncbi.nlm.nih.gov/19635784/)

[5] https://www.kaggle.com/madeleineferguson/farmers-markets-in-the-united-states

[6] https://www.ams.usda.gov/local-food-directories/farmersmarkets

[7] "SELECTED ECONOMIC CHARACTERISTICS 2009-2013 American Community Survey 5-Year Estimates". U.S. Census Bureau. Archived from the original on 2015-01-17. Retrieved 2015-01-12.

[8] "ACS DEMOGRAPHIC AND HOUSING ESTIMATES 2009-2013 American Community Survey 5-Year Estimates". U.S. Census Bureau. Archived from the original on 2015-01-05. Retrieved 2015-01-12.

[9] "HOUSEHOLDS AND FAMILIES 2009-2013 American Community Survey 5-Year Estimates". U.S. Census Bureau. Archived from the original on 2020-02-12. Retrieved 2015-01-12.

[10] U.S. Census Bureau: American FactFinder. 2013-2017 American Community Survey 5-Year Estimates (Puerto Rico) and "Profile of selected economic characteristics: 2010" (American Samoa / Guam / Northern Mariana Islands / U.S. Virgin Islands).