# <img style="float: right; border: 1px solid black;" src="LFL.jpg" width=390> DATS6103 - Data Mining: Individual Project 3
# Data Mining Little Free Library Locations

Evan Carraway

Nima Zahadat, Ph.D.

The George Washington University

May 1, 2019

Objective: The purpose of this project is to continue learning and improving skills in data mining and extracting information from raw data using Python by analyzing locations of free libraries along with other demographic data.

Due Date: May 4, 2019 by 4:30 PM

## Tasks
1. Scrape library locations from online map
2. Aggregate and map libraries by city
3. Correlate library density to other city demographics 

## Subtasks
1. Create an effective presentation describing your data set of choice
2. Present the topic and the rationale on why you are picking that topic
3. Topic and its data MUST be complex and not at all obvious
4. The topic will be checked for plagiarism
5. State where the data came from
6. State your data cleaning and preprocessing
7. Your code needs to be working and be well commented
8. Your analysis needs to be clear, organized, and in depth
9. Indicate any classifications, clusters, associations/correlations, etc.
10. Discuss your predictions you might have and your reasoning behind them
11. Describe your learning processes
12. Present an overall conclusion

## Abstract

May 2019 will mark the 10-year anniversary of Little Free Library (LFL), a non-profit, community-driven program to spread literacy and learning through free book exchanges. As of this year, there are over 80,000 such libraries in over 90 countries (LFL, 2019). Previous research has shown that book access from sources like LFLs can lead to higher literacy rates (Rebori, 2017). Knowing more about the locations of these libraries and the communities that build them can give us an understanding of the effectiveness of this movement, as well as other insights. Using littlefreelibrary.org's database of over 30,000 registered LFLs, which contains zip code, state, country and coordinates, and combining it with other demographic data, we seek to uncover new insights about LFLs and their popularity.

## Assumptions and Questions

We would expect to see more libraries in higher population areas, but it is unclear whether having more little free libraries in an area correlates with academic success. We would predict that cities such as Ann Arbor, MI, Washington, DC and Madison, WI might have a higher density of libraries, since they are frequently ranked among the most educated cities in the U.S. (Sonnenberg, 2017). Washington, Seattle and Minneapolis would also be contenders for having more libraries, given that they are frequently rated among the most literate cities (McClurg, 2017). Studies of certain cities have shown that LFLs are built more often in high-affluence, low-poverty areas, but there are also differences from city to city in who builds the libraries, whether individuals, governmental organizations or NGOs (Sarmiento, 2017).

## Tools Used

For this exercise, we use Python 3 in Jupyter for interactive computing, with the Pandas data manipulation and analysis library and the Plotly data visualization library to generate interactive charts and figures. For data retrieval and parsing we use requests and json. For storage and retrieval of data sets we use SQLite. Lastly, we use Folium to create an interactive map of cities and LFLs.

## Little Free Library Website

The primary dataset we will be using for this project is the Little Free Library website. Locations of little free libraries can be found at https://littlefreelibrary.org/

In [1]:
# Import required modules
import requests
import json
import time
import math
import numpy as np
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import HeatMap
from sqlalchemy import create_engine
from scipy.optimize import curve_fit
from IPython.display import display, HTML
import plotly
import plotly.graph_objs as go
plotly.offline.init_notebook_mode(connected=True)
pd.options.mode.chained_assignment = None

# Demo flag. Set to False to turn off screenshot and dataframe feedback
demo=True

# Hides code in HTML output
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

## Data Retrieval (LFL Website)

The LFL website stores locations of libraries including charter number, name, story, steward information, address, city, region (state), zip code, country and latitude/longitude. Looking at the search page, we see that this website does not have a documented application programming interface (API), so instead we use our browser's developer network tool to capture the POST request that is sent when executing a search on the map. What we learn is that the site uses XMLHttpRequest (XHR) to request results for a given search, which are returned as JavaScript Object Notation (JSON). Using this is faster than loading the entire page and scraping the data, because it returns only a lightweight JSON results dictionary.

Taking advantage of this requires documenting what the request payload looks like as well as any required headers. Once we have that information, we can use the Python requests library to send the request. To simplify lookups, we'll create a custom function that builds the request based on a user-defined list of countries, states, cities or zip codes, in addition to other parameters. The request body must be JSON, so our function builds a JSON payload as part of the request. Once this is complete, we effectively have our own custom API for this database.
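
As a minimal sketch of the request-building step, the payload can be assembled as a Python dictionary instead of by string concatenation. The field layout and the `SEPERATE` markers mirror the POST captured in the browser; `encode_search` and `build_payload` are hypothetical helper names used only for illustration, and the `csrf` value is a placeholder rather than a working token.

```python
import json

def encode_search(value, mode):
    """Map a user-facing search mode to the (searchvalue, searchmode) pair.

    The marker strings are copied from the captured request, spelling included.
    """
    value = str(value)
    if mode == "State":
        return "BLANKSEPERATE" + value, "CityState"
    if mode == "City":
        return value + "SEPERATEBLANK", "CityState"
    if mode == "Zip":
        return value, "ZipCode"
    if mode == "Country":
        return value, "Country"
    raise ValueError("Unsupported mode: " + mode)

def build_payload(searchvalue, searchmode, csrf="<token-from-browser-session>"):
    """Build the remote-search request body as a plain dict.

    The csrf default is a placeholder (an assumption of this sketch).
    """
    return {
        "action": "MapPageController",
        "method": "remoteSearch",
        "data": [searchvalue, searchmode, None, None],
        "type": "rpc",
        "tid": 5,
        "ctx": {"csrf": csrf, "vid": "066d00000027Meh", "ns": "", "ver": 29},
    }

payload = build_payload(*encode_search("DC", "State"))
print(json.dumps(payload, indent=2))
```

Building a dictionary keeps the payload valid even if a search value contains quotation marks, and the dictionary can be passed directly to `requests.post(url, json=payload)`, which serializes it.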

In [2]:
# Define dictionary to convert result JSON keys to dataframe column names
coldict = {'Official_Charter_Number__c':'charter', 
           'Library_Name__c':'name', 
           'Library_Story__c':'story', 
           'Primary_Steward_s_Name__c':'steward', 
           'Primary_Steward_s_Email__c':'email', 
           'Street__c':'address', 
           'City__c':'city', 
           'State_Province_Region__c':'region', 
           'Postal_Zip_Code__c':'zip', 
           'Country__c':'country', 
           'latitude':'latitude', 
           'longitude':'longitude'}

# Define dictionary of states for querying API
statesdict = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'}
states = list(statesdict.keys())
statesrepdict = dict((v,k) for k,v in statesdict.items())

citiesrepdict = {'Saint Paul':'St. Paul',
                 'St Paul':'St. Paul',
                 'Saint Louis':'St. Louis',
                 'St Louis':'St. Louis',
                 'Brooklyn':'New York',
                 'Staten Island':'New York',
                 'Bronx':'New York',
                 'Boise':'Boise City',
                 'Winston Salem':'Winston-Salem'}

charterrepdict = {'M':'',
                 '#':'',
                 'G':'',
                 ' ':'',
                 ',':''}

# Define list of countries for querying API
countries = ['Afghanistan', 'Albania', 'Algeria', 'Argentina', 'Australia', 'Bahamas', 
             'Belarus', 'Belgium', 'Brazil', 'Canada', 'China', 'Colombia', 'Costa Rica', 
             'Cuba', 'Estonia', 'Finland', 'France', 'Germany', 'Ghana', 'Greece', 
             'Grenada', 'Guatemala', 'Honduras', 'Hungary', 'India', 'Indonesia', 
             'Ireland', 'Israel', 'Italy', 'Japan', 'Kyrgyzstan', 'Lebanon', 'Malaysia', 
             'Mexico', 'Micronesia', 'Netherlands', 'New Zealand', 'Nigeria', 'Norway', 
             'Oman', 'Pakistan', 'Philippines', 'Poland', 'Portugal', 'Romania', 'Slovakia', 
             'Slovenia', 'South Korea', 'Spain', 'Switzerland', 'Taiwan', 'Tanzania', 
             'Ukraine', 'United Arab Emirates', 'United Kingdom', 'Venezuela']

# Define function to convert result JSON to dataframe
def ResultsToDF(results):
    df = pd.DataFrame(results).fillna('')
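    try:
        # Flatten the nested geolocation dictionary into two columns
        df['latitude'] = df['Library_Geolocation__c'].apply(lambda x: x.get('latitude'))
        df['longitude'] = df['Library_Geolocation__c'].apply(lambda x: x.get('longitude'))
    except (KeyError, AttributeError):
        # Geolocation column missing (empty result) or a blank geolocation value
        pass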
    df.rename(columns=coldict, inplace=True)
    for colname in list(coldict.values()):
        if colname not in df:
            df[colname] = ''
    df = df[list(coldict.values())]
    return df

# Define function to search Little Free Library map service
def LFLsearch(values, mode='Name', wait=5, 
              keepcols=['charter','city','region','zip','country','latitude','longitude'],
              demo=False):
    # Define endpoint URL and header variables
    url = 'https://littlefreelibrary.secure.force.com/apexremote'
    header = json.loads('{"Referer":"https://littlefreelibrary.secure.force.com/mapPage"}')
    # Convert values to list if not already a list
    if not isinstance(values, list):
        values = [values]
    # Loop through search values
    for value in values:
        results = []
        if mode == 'Country':
            searchvalue = str(value)
            searchmode = 'Country'
        elif mode == 'Zip':
            searchvalue = str(value)
            searchmode = 'ZipCode'
        elif mode == 'State':
            searchvalue = 'BLANKSEPERATE' + str(value)
            searchmode = 'CityState'
        elif mode == 'City':
            searchvalue = str(value) + 'SEPERATEBLANK'
            searchmode = 'CityState'
        elif mode == 'CityState':
            searchvalue = str(value).split(',')[0].strip() + 'SEPERATE' + str(value).split(',')[-1].strip()
            searchmode = 'CityState'
        elif mode == 'Charter':
            searchvalue = str(value)
            searchmode = 'CharterNumber'
        elif mode == 'Name':
            searchvalue = str(value)
            searchmode = 'StewardsName'
        else:
            print('Invalid search parameter: ' + mode + '. Valid modes: Country, Zip, State, City, CityState, Charter, Name')
            return
        # Create query parameter JSON for POST request
        rjson = json.loads('{"action":"MapPageController","method":"remoteSearch","data":["' + searchvalue + 
                           '","' + searchmode + '",null,null],"type":"rpc","tid":5,"ctx":{"csrf":"VmpFPSxNakF4T1Mwd05DMHlNbFF4T0Rvek1Eb3lNeTR6TXpoYSxoQXk2UThwcnRVdDJKa2luVmxDNnF3LFpUTmhZelkw","vid":"066d00000027Meh","ns":"","ver":29}}')
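        try:
            # Send POST with parameters
            r = requests.post(url, json=rjson, headers=header)
            # Extract and append result library JSONs
            for r1 in r.json()[0]['result']:
                results.append(r1['library'])
        except (requests.RequestException, ValueError, KeyError, IndexError, TypeError) as err:
            # Report the failure instead of silently storing an empty result
            print('Request failed for ' + str(value) + ': ' + str(err))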
        time.sleep(wait)
        # Convert result JSON to dataframe
        df = ResultsToDF(results)
        df = df[keepcols]
        if len(values) < 10:
            display(HTML(df.head(1).to_html()))
        # Store found libraries in SQL database
        engine=create_engine('sqlite:///libraries.db', echo=False)
        if demo == False:
            df.to_sql("libraries", engine, if_exists="append", index=False)
        else:
            df.to_sql("libraries_demo", engine, if_exists="append", index=False)
        print(mode + ' : ' + str(value) + ' - ' + str(len(df)) + ' results stored in database')
        
def intRgb(mag, cmin, cmax):
    try: x = (.8 + float(mag-cmin)/(cmax-cmin))/2
    except ZeroDivisionError: x = 0.5 # cmax == cmin
    blue  = 70 # int(min((max((4*(0.75-x), 0.)), 1.))*255)
    red   = 50 # int(min((max((4*(x-0.25), 0.)), 1.))*255)
    green = int(min((max((x, 0.1)), .9))*254)
    return red, green, blue

Once the functions are defined, we can use them to search and store data from a defined list of countries or states. There are limits on how much data can be retrieved in a single request, so we cannot simply query for all results in the USA.
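
One way to work within the per-request limit is to split the query into smaller regions and merge the results. The sketch below assumes a hypothetical `fetch` callable that returns a list of library dictionaries for one region; it stands in for the network call, and the canned responses (including the duplicated record) are illustrative only.

```python
import time

def collect_by_region(regions, fetch, wait=0.0):
    """Query each region separately and merge results, dropping duplicates.

    `fetch` is a stand-in for the real network call (an assumption of this
    sketch); it takes a region code and returns a list of dicts that each
    carry a 'charter' key.
    """
    seen = set()
    merged = []
    for region in regions:
        for lib in fetch(region):
            if lib["charter"] not in seen:
                seen.add(lib["charter"])
                merged.append(lib)
        time.sleep(wait)  # pause between requests to be polite to the server
    return merged

# Hypothetical canned responses used in place of live requests
fake_data = {
    "DC": [{"charter": 49093, "city": "Washington"}],
    "MD": [{"charter": 66110, "city": "Rockville"},
           {"charter": 49093, "city": "Washington"}],
}
libraries = collect_by_region(["DC", "MD"], lambda region: fake_data.get(region, []))
print(len(libraries), "unique libraries collected")
```

Passing the fetch function as an argument keeps the batching logic testable without touching the live site.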

In [101]:
# Search for LFLs in DC, Maryland and Virginia
LFLsearch(['DC','MD','VA'], mode='State', wait=7, demo=True)

| | charter | city | region | zip | country | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | 49093 | Washington | DC | 20016 | USA | 38.95235 | -77.09458 |


State : DC - 110 results stored in database


| | charter | city | region | zip | country | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | 66110 | Rockville | MD | 20852 | USA | 39.075195 | -77.153304 |


State : MD - 615 results stored in database


| | charter | city | region | zip | country | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | 67157 | Alexandria | VA | 22314 | USA | 38.79757 | -77.04996 |


State : VA - 922 results stored in database


Once we have queried the locations we need, we can reload the data from the local database to begin our analysis.
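
The store-then-reload pattern can be sketched with an in-memory SQLite database. Because every search appends to the same table, re-running a search stores the same libraries twice, which is why the reload uses `select distinct`. The two sample rows are taken from the search output; the in-memory connection and the repeated append are assumptions made for this illustration.

```python
import sqlite3
import pandas as pd

# In-memory database stands in for libraries.db
conn = sqlite3.connect(":memory:")

batch = pd.DataFrame({
    "charter": ["49093", "66110"],
    "city": ["Washington", "Rockville"],
    "region": ["DC", "MD"],
})

# Appending the same batch twice mimics re-running a search
batch.to_sql("libraries", conn, if_exists="append", index=False)
batch.to_sql("libraries", conn, if_exists="append", index=False)

raw = pd.read_sql_query("select * from libraries", conn)
unique = pd.read_sql_query("select distinct * from libraries", conn)
conn.close()

print(len(raw), "rows stored,", len(unique), "distinct rows")
```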

In [3]:
# Query local database for all library data
engine = create_engine('sqlite:///libraries.db', echo=False)
df = pd.read_sql_query("select distinct * FROM libraries", engine)
print(str(len(df)) + ' libraries retrieved from database.')

# Format coordinates and charter as numbers
df['charter'] = df['charter'].replace(charterrepdict, regex=True)
df['charter'] = pd.to_numeric(df['charter'],errors='coerce')
df = df[(df['charter'] > 0) & (df['charter'] < 87000)]
df = df.astype({"latitude": float, "longitude": float, "charter": int})

# Sort dataframe by charter number
df = df.sort_values(by=['charter'],ascending=True)
display(HTML(df.head().to_html()))

33333 libraries retrieved from database.


| | charter | city | region | zip | country | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 29961 | 1 | Hudson | WI | 54016 | USA | 44.98783 | -92.75851 |
| 5648 | 2 | Boulder | CO | 80302 | USA | 40.006829 | -105.287075 |
| 30666 | 3 | Verona | WI | 53593 | USA | 42.98805 | -89.53376 |
| 30667 | 4 | Madison | WI | 5370 | | 43.07203 | -89.47988 |
| 30668 | 5 | Madison | WI | 53704-5453 | | 43.09549 | -89.34656 |


## Data Retrieval (American Community Survey)

To provide context to the LFL locations, we should incorporate demographic data about the cities in the United States. The best source of information for our purposes is the U.S. Census Bureau, which produces American Community Survey (ACS) 5-Year Estimates on locations across the country. The data for 2013-2017 is searchable at https://factfinder.census.gov/.

We will use Pandas to load only the columns we need from the ACS datasets for population, education and income, then convert the values to numeric types and reformat the place names into a consistent "City, ST" format.
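
The place-name reformatting can be illustrated with a small standalone function. This is a simplified sketch: it assumes ACS labels of the form "New York city, New York", keeps only rows labeled as cities, and uses a three-state lookup table in place of the full state dictionary. `normalize_place` and `STATE_ABBREV` are hypothetical names.

```python
# Subset of state names for illustration; the notebook uses a full lookup
STATE_ABBREV = {"New York": "NY", "Minnesota": "MN", "Wisconsin": "WI"}

def normalize_place(label, abbrev=STATE_ABBREV):
    """Convert an ACS place label to 'City, ST', or None for non-city rows."""
    if " city, " not in label:
        return None
    city, state = label.split(" city, ", 1)
    # Fall back to the full state name when no abbreviation is known
    return city.title() + ", " + abbrev.get(state, state)

for label in ["New York city, New York", "Madison city, Wisconsin", "Reston CDP, Virginia"]:
    print(label, "->", normalize_place(label))
```

Producing the same "City, ST" key on both the ACS side and the LFL side is what makes the later merge on `city` possible.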

In [5]:
# Load city education and population data 
edf = pd.read_csv('ACS_17_5YR_S1501_with_ann.csv', encoding='iso-8859-1', low_memory=False, usecols=['GEO.display-label', 'HC02_EST_VC17', 'HC02_EST_VC18', 'HC01_EST_VC80'])
edf['pop'] = pd.read_csv('ACS_17_5YR_B01003_with_ann.csv', encoding='iso-8859-1', low_memory=False, usecols=['HD01_VD01'])['HD01_VD01']

# Rename and reformat column names and values
edf.columns = ['city','highschool','bachelor','income','pop']
edf = edf[edf['city'].str.contains('city, ')]
edf['state']=edf['city'].str.split(', ', n = 1, expand = True)[1]
edf['city']=edf['city'].str.split(' city, ', n = 1, expand = True)[0]
edf['state'].replace(statesrepdict,inplace=True)
edf['city'].replace(citiesrepdict,inplace=True)
edf['city'] = edf['city'].apply(lambda x: x.title())
edf['city'] = edf['city'].map(str) + ', ' + edf['state'].map(str)
edf['income'].replace('-',0,inplace=True)
edf['income'] = pd.to_numeric(edf['income'],errors='coerce')
edf['bachelor'] = pd.to_numeric(edf['bachelor'],errors='coerce')
edf['highschool'] = pd.to_numeric(edf['highschool'],errors='coerce')
edf['pop'] = pd.to_numeric(edf['pop'],errors='coerce')
edf = edf.sort_values(by=['pop'],ascending=False)
edf.head()

| | city | highschool | bachelor | income | pop | state |
|---|---|---|---|---|---|---|
| 17551 | New York, NY | 81.1 | 36.7 | 41098.0 | 8560072 | NY |
| 2724 | Los Angeles, CA | 76.4 | 33.0 | 32284.0 | 3949776 | CA |
| 6283 | Chicago, IL | 83.8 | 37.5 | 39917.0 | 2722586 | IL |
| 25203 | Houston, TX | 77.9 | 31.7 | 33521.0 | 2267336 | TX |
| 1208 | Phoenix, AZ | 81.2 | 27.8 | 35624.0 | 1574421 | AZ |


The next step is to group the LFL dataframe by city and then combine that data with the demographic data from the ACS dataframe. Once we have all of our data in a single table, we can use Plotly to create an interactive web table. We can add some basic sort views to the table to support further exploratory data analysis (EDA).

Note: LFL per capita is measured in libraries per 100,000 residents. This table can be opened full screen by clicking <a href="table.html">here</a>.
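
As a quick worked example of the per capita measure, the sketch below computes libraries per 100,000 residents. The counts and populations are hypothetical figures chosen for easy arithmetic, and `lfl_per_100k` is an illustrative helper name.

```python
def lfl_per_100k(count, population):
    """Libraries per 100,000 residents, rounded to two decimals."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(count / population * 100000, 2)

# Hypothetical city: 50 libraries serving 250,000 residents
print(lfl_per_100k(50, 250000))

# Hypothetical small town: 3 libraries serving 1,200 residents
print(lfl_per_100k(3, 1200))
```

The second case shows why small towns can top a per capita ranking with only a handful of libraries, which is one reason the analysis filters out places with very small populations or a single library.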

In [14]:
# Create city column in LFL dataframe
col = 'city'
df2 = df[df[col] != '']
df2['region'].replace(statesrepdict,inplace=True)
df2['city'].replace(citiesrepdict,inplace=True)
df2['city'] = df2['city'].apply(lambda x: x.title())
df2['city'] = df2['city'].map(str) + ', ' + df2['region'].map(str)

# Group by specified column and create sorted top n counts
dfg = df2[[col,'charter']].groupby(col).count()
dfg.columns = ['count']
dfg2 = df2.groupby(col)[['latitude','longitude']].median().round(6)
dfg = pd.concat([dfg, dfg2], axis=1, sort=False)

# Merge LFL dataframe to education/population dataframe
dfg = pd.merge(dfg, edf, on='city', how='outer')
dfg = dfg[dfg['pop'] > 1000]
dfg = dfg[dfg['count'] > 1]
dfg['lflpercap'] = (dfg['count']/dfg['pop']*100000).round(2)

# Select top N cities based on LFL count
dfg = dfg.sort_values(by=['count'],ascending=False).reset_index()
dfg = dfg[['city','count','pop','lflpercap','highschool','bachelor','income','latitude','longitude']]
dfgp = dfg.sort_values(by=['pop'],ascending=False).reset_index()
dfgpc = dfg.sort_values(by=['lflpercap'],ascending=False).reset_index()
dfgmax = dfg
dfg1000 = dfg.head(1000)
dfg200 = dfg.head(200)

# Define function to create Plotly interactive table traces
def tracedf(dfg, sortby):
    dfg = dfg.sort_values(by=[sortby],ascending=False).reset_index()
    trace = go.Table(
        header=dict(values=['city','count','pop','lflpercap','highschool','bachelor','income'],
                    fill = dict(color='#C2D4FF'),
                    align = ['left'] * 2,
                    font = dict(color = 'black', size = 20)),
        cells=dict(values=[dfg[col], dfg['count'], dfg['pop'], dfg['lflpercap'], dfg['highschool'], dfg['bachelor'], dfg['income']],
                   fill = dict(color='#F5F8FF'),
                   align = ['left'] * 2))
    return trace

# Create traces for sorted tables
trace0 = tracedf(dfg,'count')
trace1 = tracedf(dfg,'pop')
trace2 = tracedf(dfg,'lflpercap')
trace3 = tracedf(dfg,'highschool')
trace4 = tracedf(dfg,'bachelor')
trace5 = tracedf(dfg,'income')

# Add layout and buttons to sort by different columns
layout = dict(width=985, height=600, title='Little Free Library Distribution by U.S. City')
updatemenus = list([
        dict(
            buttons=list([   
                dict(label = 'Sort (Libraries)',
                     method = 'animate', 
                     args=[{'data' : [trace0]}]
                ),
                dict(label = 'Sort (Population)',
                     method = 'animate', 
                     args=[{'data' : [trace1]}]
                ),
                dict(label = 'Sort (LFL Per Cap)',
                     method = 'animate', 
                     args=[{'data' : [trace2]}]
                ),
                dict(label = 'Sort (HS Grad)',
                     method = 'animate', 
                     args=[{'data' : [trace3]}]
                ),
                dict(label = 'Sort (Col Grad)',
                     method = 'animate', 
                     args=[{'data' : [trace4]}]
                ),
                dict(label = 'Sort (Income)',
                     method = 'animate', 
                     args=[{'data' : [trace5]}]
                ),          
            ]),
            direction = 'left',
            pad = {'r': 10, 't': 10},
            showactive = True,
            type = 'buttons',
            x = .5,
            xanchor = 'center',
            y = 1.11,
            yanchor = 'top' 
        )
    ])
layout['updatemenus'] = updatemenus

# Plot table
data = [trace0]
fig = dict(data=data, layout=layout)
plotly.offline.iplot(fig,show_link=True, filename='table.html')

## Mapping Distribution of LFLs in the U.S.

Since our data has geolocational information, we can visualize how many LFLs are in major cities in the U.S. This is useful for identifying outliers. We would expect more LFLs in areas with higher population density, but there may be cities with lower populations that have more LFLs and vice versa. To create our map, we will use the Folium library, which can create an embedded HTML Leaflet map showing multiple layers, base maps and points/polygons. We will map cities with circles sized by the total number of libraries, and modify color and opacity using the LFL per capita data point.

Note: the map layers can be controlled using the layer control box in the top right of the map. The map can be opened full screen by clicking <a href="map.html">here</a>.
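
The circle sizing and shading rules can be sketched as two small functions. The constants (a 5,000 m base radius, 300 m per library, and opacity clamped between 0.2 and 0.8) follow the values used in the mapping code; the function names are hypothetical, and returning the lower bound when all values are equal is a choice made for this sketch.

```python
def circle_radius(count, base=5000, per_library=300):
    """Circle radius in meters: a fixed base plus an increment per library."""
    return base + int(count) * per_library

def fill_opacity(value, vmin, vmax, lo=0.2, hi=0.8):
    """Min-max normalize a per capita value, then clamp it to [lo, hi]."""
    if vmax == vmin:
        return lo  # avoid division by zero when every city has the same value
    scaled = (value - vmin) / (vmax - vmin)
    return max(min(scaled, hi), lo)

# Hypothetical city with 10 libraries and a mid-range per capita value
print(circle_radius(10))
print(fill_opacity(50, 0, 100))
```

Clamping keeps the sparsest cities faintly visible and prevents the densest ones from completely hiding the base map underneath.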

In [7]:
# Make a map with tile layers centered on U.S.
dfg = dfg1000
m = folium.Map(location=[39.143,-98.485], tiles="CartoDB positron", zoom_start=4)
folium.TileLayer('Mapbox Bright').add_to(m)
folium.TileLayer('CartoDB Dark_Matter').add_to(m)
folium.TileLayer('openstreetmap').add_to(m)
folium.TileLayer('stamenterrain').add_to(m)

# Add groups for showing more or fewer circles
fg1=folium.FeatureGroup(name="1-10", show=True)
fg2=folium.FeatureGroup(name="10-30", show=True)
fg3=folium.FeatureGroup(name="30-100", show=True)
fg4=folium.FeatureGroup(name="100-1000", show=True)
fg5=folium.FeatureGroup(name="Heatmap", show=False)

# Find upper and lower bounds for scaling color/opacity
cmin,cmax = dfg['lflpercap'].min(),dfg['lflpercap'].max()

# Add markers one by one to the map (loop over the rows actually present)
for i in range(len(dfg)):
    color = 'rgb' + str(intRgb(dfg.iloc[i]['lflpercap'],cmin,cmax))
    ptext = '<b>' + str(dfg.iloc[i]['city']) + '</b><br>' + str(int(dfg.iloc[i]['count'])) + ' libraries' + '<br>' + str(dfg.iloc[i]['lflpercap']) + ' / 100k pop'
    popup = folium.Popup(ptext, max_width=300)
    radius = 5000+(int(dfg.iloc[i]['count'])*300)
    fillo = max(min(float(dfg.iloc[i]['lflpercap']-cmin)/(cmax-cmin),.8),.2)
    c1 = folium.Circle(
        location=[float(dfg.iloc[i]['latitude']), float(dfg.iloc[i]['longitude'])],
        popup=popup, 
        tooltip=ptext,
        radius=radius,
        weight=.6,
        color=color,
        opacity=(fillo+1)/2,
        fill=True,
        fill_color=color,
        fill_opacity=fillo,
        highlight=True
       )
    if i < 10:
        fg1.add_child(c1).add_to(m)
    elif i < 30:
        fg2.add_child(c1).add_to(m)
    elif i < 100:
        fg3.add_child(c1).add_to(m)
    else:
        fg4.add_child(c1).add_to(m)

# Create heatmap of cities with library count
max_amount = float(dfg['count'].max())
hm_wide = folium.plugins.HeatMap( list(zip(dfg['latitude'], dfg['longitude'], dfg['count'])),
                   min_opacity=0.1,
                   max_val=max_amount,
                   radius=15, blur=15, 
                   max_zoom=.0001, 
                 )
fg5.add_child(hm_wide).add_to(m)

# Add layer controls and display map
m.add_child(folium.LayerControl())
m.save('map.html')
m

What we observed in our map is a high concentration of LFLs in the upper Midwest, particularly the Minneapolis / St. Paul, MN region. This makes sense given that the first LFL was built in nearby Hudson, WI. In addition, large cities like Seattle, WA and Portland, OR also have a large number of LFLs.

## Visualizing Little Free Library Density and Academic Achievement

Since we combined our dataframes for U.S. cities to include population, LFLs per capita, and percentages of high school and college graduates, we can plot those values in a scatter plot. We will use Plotly to make the visualizations more dynamic and interactive. To identify where larger cities fall in this distribution, we will color code and size our points by population. This will help us determine whether there is any correlation between the distribution of LFLs and education level/achievement.

Note: This chart can be opened full screen by clicking <a href="chart.html">here</a>.

In [11]:
# Subset to 200 cities with most LFLs
dfg = dfg200

# Define exponential function
def func(x, a, b, c):
        return a*(1+np.exp(-b*x))+c

# Create label field for chart
dfg['text'] = ('City: ' + dfg['city'] + '<br>' + 
               'Population: ' + dfg['pop'].map(int).map(str) + '<br>' + 
               dfg['lflpercap'].map(str) + ' lib/100k, ' + dfg['highschool'].map(str) + '% HS grad rate<br>' + 
               dfg['bachelor'].map(str) + '% bachelor grad rate<br>income: ' + dfg['income'].map(str))

# Create high school and lfl scatter plot trace
trace0 = go.Scatter(
    x= dfg['lflpercap'],
    y= dfg['highschool'],
    mode= 'markers',
    marker= dict(size= (np.log(dfg['pop'])**1.5)/3,
                line= dict(width=1),
                opacity= 0.4,
                color=np.log(dfg['pop']),
                colorscale='Portland'
               ),
    text= dfg['text'])

# Create high school and lfl line plot trace
x = dfg['lflpercap']
y = dfg['highschool']
popt, pcov = curve_fit(func,x, y,p0=(300,0.1,1))
xx = np.linspace(0, 300, 1000,)
yy = func(xx, *popt)
line0 = go.Scatter(
    x= xx,
    y= yy,
    line= dict(width=5,
                color='rgba(0,0,0,.2)'
               ))

# Create bachelors and lfl line plot trace
y = dfg['bachelor']
popt, pcov = curve_fit(func,x, y,p0=(300,0.1,1))
yy = func(xx, *popt)
line1 = go.Scatter(
    x= xx,
    y= yy,
    line= dict(width=5,
                color='rgba(0,0,0,.2)'
               ))

# Create bachelors and lfl scatter plot trace
trace1 = go.Scatter(
    x= dfg['lflpercap'],
    y= dfg['bachelor'],
    mode= 'markers',
    marker= dict(size= (np.log(dfg['pop'])**1.5)/3,
                line= dict(width=1),
                opacity= 0.4,
                color=np.log(dfg['pop']),
                colorscale='Portland'
               ),
    text= dfg['text'])

# Create income and lfl scatter plot trace
trace2 = go.Scatter(
    x= dfg['lflpercap'],
    y= dfg['income'],
    mode= 'markers',
    marker= dict(size= (np.log(dfg['pop'])**1.5)/3,
                line= dict(width=1),
                opacity= 0.4,
                color=np.log(dfg['pop']),
                colorscale='Portland'
               ),
    text= dfg['text'])

# Add custom layout
layout0 = go.Layout(
        title= 'Little Free Libraries and City Demographics',
        hovermode= 'closest',
        xaxis= dict(
            title= 'Little Free Libraries per 100,000 Residents',
            ticklen= 5,
            zeroline= False,
            gridwidth= 2,
            range=[-10,310], 
            autorange=False,
        ),
        yaxis=dict(
            title= 'Graduation Rate (High School)',
            ticklen= 5,
            gridwidth= 2,
            range=[66,104],
            autorange=False,
        ),
        showlegend=False
    )

# Add custom layout
layout1 = go.Layout(
        title= 'Little Free Libraries and City Demographics',
        hovermode= 'closest',
        xaxis= dict(
            title= 'Little Free Libraries per 100,000 Residents',
            ticklen= 5,
            zeroline= False,
            gridwidth= 2,
            range=[-10,310], 
            autorange=False,
        ),
        yaxis=dict(
            title= 'Graduation Rate (Bachelors)',
            ticklen= 5,
            gridwidth= 2,
            range=[0,90],
            autorange=False,
        ),
        showlegend=False
    )

# Add custom layout
layout2 = go.Layout(
        title= 'Little Free Libraries and City Demographics',
        hovermode= 'closest',
        xaxis= dict(
            title= 'Little Free Libraries per 100,000 Residents',
            ticklen= 5,
            zeroline= False,
            gridwidth= 2,
            range=[-10,310], 
            autorange=False,
        ),
        yaxis=dict(
            title= 'Median Income',
            ticklen= 5,
            gridwidth= 2,
            range=[20000,100000],
            autorange=False,
        ),
        showlegend=False
    )

# Add buttons to show different education levels
updatemenus = list([
        dict(
            buttons=list([   
                dict(label = 'High School Degrees',
                     method = 'animate', 
                     args=[{'data' : [trace0,line0],'layout':layout0}]
                ),
                dict(label = 'Bachelors Degrees',
                     method = 'animate', 
                     args=[{'data' : [trace1,line1],'layout':layout1}]
                ),     
                dict(label = 'Income',
                     method = 'animate', 
                     args=[{'data' : [trace2],'layout':layout2}]
                ),  
            ]),
            direction = 'left',
            pad = {'r': 10, 't': 10},
            showactive = True,
            type = 'buttons',
            x = 0.1,
            xanchor = 'left',
            y = 1.1,
            yanchor = 'top' 
        )
    ])

layout0['updatemenus'] = updatemenus
layout1['updatemenus'] = updatemenus
layout2['updatemenus'] = updatemenus

# Plot scatter chart
fig = go.Figure(data=[trace0,line0], layout=layout0)
plotly.offline.iplot(fig,show_link=True, filename='chart.html')

Based on our scatter plot, it appears there is a strong positive correlation between LFLs per capita and high school graduation rates in U.S. cities. That said, the direction of causation is unclear based on this limited data, without other education investment data points and a good way to isolate variables. It is just as likely, or even more likely, that cities with more graduates are more inclined to install LFLs, given a pre-existing interest in reading. LFL distribution and income did not appear to be strongly correlated at the city level, but there may be less spatial equality within individual cities.
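
One way to put a number on the visual impression from the scatter plot is a Pearson correlation coefficient. The sketch below uses `numpy.corrcoef` on small hypothetical samples, not the project data, so the resulting value says nothing about the actual cities. A high coefficient would still only describe association, not which variable drives the other.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical values: libraries per 100k residents and HS graduation rate (%)
lfl_per_cap = [5.0, 12.0, 20.0, 35.0, 60.0]
hs_grad_rate = [78.0, 82.0, 85.0, 90.0, 93.0]

r = pearson_r(lfl_per_cap, hs_grad_rate)
print(round(r, 3))
```

Applied to the merged city table, the same function could be called with the `lflpercap` and `highschool` columns after dropping rows with missing values.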

## Conclusion

By analyzing LFL locations, we were able to determine which cities in the U.S. had more LFLs and correlate those to academic achievement and population. There is a positive correlation between LFLs and academic achievement, but more studies would be needed to validate that access to LFLs improves academic outcomes. While LFLs are a way to increase availability of reading materials, local governments, NGOs and individuals should seek to ensure LFLs and other similar resources are effectively located to promote equal access.

## Bibliography

Little Free Library. (2019, April 02). 10th Anniversary. Retrieved April 18, 2019, from https://littlefreelibrary.org/10years/

McClurg, J. (2017, March 31). Washington is nation's 'most literate' city. Retrieved April 30, 2019, from https://www.usatoday.com/story/life/books/2017/03/31/americas-most-literate-cities-washington-dc/99874942/

Rebori, M. K. (2017, June). *Using Geospatial Analysis to Align Little Free Library Locations with Community Literacy Needs*. Retrieved April 18, 2019, from https://www.joe.org/joe/2017june/tt3.php

Sarmiento, C. S., Sims, J. R., & Morales, A. (2017). Little Free Libraries: An examination of micro-urbanist interventions. *Journal of Urbanism: International Research on Placemaking and Urban Sustainability*, 11(2), 233-253. doi:10.1080/17549175.2017.1387588

Sonnenberg, L. (2017, August 03). The Most And Least Educated American Cities In 2017. Retrieved April 30, 2019, from https://www.forbes.com/sites/laurensonnenberg/2017/07/28/the-most-and-least-educated-american-cities-in-2017/

U.S. Census Bureau (2018). Selected population and educational attainment characteristics, 2013-2017 American Community Survey 5-year estimates. Retrieved from http://factfinder2.census.gov/