## Harvard's Physical Impact
How much of the United States does Harvard own?

In [1]:
import cityscraper as cs
import numpy as np
import pandas as pd
import math

In [2]:
import matplotlib.pyplot as plt
import plotly
import plotly.plotly as py
import plotly.graph_objs as go

In [3]:
plotly.tools.set_credentials_file(username='hangulu', api_key='78VR3oagCeoHkdYiKB4b')

The available data for cities abroad has gaps, so this analysis focuses on Harvard's impact within the United States and excludes buildings abroad.

In [4]:
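# Column names sit on the fourth row of the spreadsheet (header=3 is zero-indexed)
buildings = pd.read_excel("2018_building_reference_list.xlsx", header=3)
# Drop buildings located outside the United States
buildings = buildings[~buildings['City'].isin(['Sardis', 'Fiesole', 'Florence', 'Shanghai', 'London'])]
# Total gross square footage, and the number of buildings with a recorded footage
footage = buildings['GSF SF Total'].sum()
count = buildings['GSF SF Total'].count()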

Harvard owns {{count}} buildings in the following cities in the United States: Jamaica Plain, Roslindale, Cambridge, Washington DC, Ledyard, Allston, Hamilton, Petersham, Harvard, Bedford, Southborough, Shrewsbury, Kittery Point, Boston, Los Angeles, San Francisco, Holyoke, Somerville, Menlo Park, Burlingame, Lowell and New York.

Most of the data available on cities can be found at http://www.city-data.com, so I built a web scraper, `cityscraper.py`, to pull data for every city in which Harvard has buildings.
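`cityscraper.py` itself is not shown here. As a rough illustration only, the sketch below pulls a land-area figure out of a page's HTML with a regular expression. The function name `parse_land_area`, the sample markup, and the `Land area: ... square miles` phrasing are all assumptions made for this example; they are not the actual module, and the site's real layout may differ.

```python
import re

# Hypothetical helper, NOT the author's cityscraper code.
# Assumes the page contains text such as "Land area: 6.43 square miles",
# possibly with HTML tags between the label and the number.
AREA_PATTERN = re.compile(
    r"Land area:\s*(?:<[^>]+>\s*)*([\d.,]+)\s*square miles",
    re.IGNORECASE,
)

def parse_land_area(html):
    """Return the land area in square miles as a float, or None if not found."""
    match = AREA_PATTERN.search(html)
    if match is None:
        return None
    # Remove thousands separators before converting
    return float(match.group(1).replace(",", ""))

# Made-up sample markup, for demonstration only
sample = "<p><b>Land area:</b> 6.43 square miles.</p>"
record = {"City": "Cambridge", "Area": parse_land_area(sample)}
print(record)
```

Returning one dictionary per city with `City` and `Area` keys would line up with the columns that the later cells read from `city_data`.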

In [5]:
df = pd.read_excel("city_url.xlsx", header=0)
results = []

# Scrape one record per row of the URL spreadsheet
for row in df.itertuples(index=False):
    results.append(cs.scraper(row.City, row.Type, row.Endpoint))

city_data = pd.DataFrame(results)

To facilitate comparison, the `Area` column of the `city_data` dataframe is converted from square miles to square feet by multiplying by $27878000$, a rounded form of the exact factor $5280^2 = 27878400$.

In [6]:
city_data['Square Footage'] = city_data['Area'] * 27878000
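As a quick check on the conversion factor used above: a mile is 5280 feet, so a square mile is $5280^2$ square feet. The sketch below compares that exact value with the rounded $27878000$. The names `SQFT_PER_SQMI` and `sq_miles_to_sq_feet` are introduced here for illustration only.

```python
FEET_PER_MILE = 5280
SQFT_PER_SQMI = FEET_PER_MILE ** 2  # exact number of square feet in a square mile

def sq_miles_to_sq_feet(area_sq_mi):
    """Convert an area in square miles to square feet using the exact factor."""
    return area_sq_mi * SQFT_PER_SQMI

# Compare with the rounded factor used in the notebook
rounded_factor = 27878000
relative_error = (SQFT_PER_SQMI - rounded_factor) / SQFT_PER_SQMI

print(SQFT_PER_SQMI)
print(f"relative error of the rounded factor: {relative_error:.6%}")
```

The rounded factor understates each city's area by well under a hundredth of a percent, so it has no visible effect on the rankings.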

Now, we can easily determine how much of a given city Harvard owns.

In [7]:
footage_results = []

for city, city_footage in zip(city_data['City'], city_data['Square Footage']):
    # Total gross square footage of Harvard's buildings in this city
    harv_footage = buildings.loc[buildings['City'] == city, 'GSF SF Total'].sum()
    perc_harv = (harv_footage / city_footage) * 100
    footage_results.append({
        'City': city,
        'Harvard Footage': harv_footage,
        'City Footage': city_footage,
        'Percentage Harvard': perc_harv,
    })

final = pd.DataFrame(footage_results)
final.sort_values('Percentage Harvard', ascending=False, inplace=True)
final.reset_index(inplace=True, drop=True)
print(final)

             City  City Footage  Harvard Footage  Percentage Harvard
0       Cambridge  1.792555e+08       17151456.3            9.568160
1         Allston  1.014202e+08        5777374.0            5.696475
2          Boston  1.349295e+09        3651048.7            0.270589
3      Somerville  1.145786e+08         166018.0            0.144894
4   Jamaica Plain  8.851265e+07         122906.0            0.138857
5          Lowell  3.847164e+08         352000.0            0.091496
6    Southborough  3.930798e+08         269812.0            0.068641
7      Washington  1.711709e+09         256038.0            0.014958
8   Kittery Point  5.296820e+07           5500.0            0.010384
9        Hamilton  4.070188e+08          27809.0            0.006832
10      Petersham  1.510988e+09          84358.0            0.005583
11     Shrewsbury  5.770746e+08          16478.0            0.002855
12        Bedford  3.819286e+08          10507.0            0.002751
13        Harvard  7.359792e+08   
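The same per-city table can also be built without an explicit loop. The sketch below is one possible vectorized version using `groupby` and `merge`. It runs on small made-up frames (`toy_buildings`, `toy_cities`) rather than the real spreadsheets, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up stand-ins for the `buildings` and `city_data` frames
toy_buildings = pd.DataFrame({
    'City': ['Cambridge', 'Cambridge', 'Boston'],
    'GSF SF Total': [1000.0, 500.0, 200.0],
})
toy_cities = pd.DataFrame({
    'City': ['Cambridge', 'Boston', 'Lowell'],
    'Square Footage': [30000.0, 100000.0, 50000.0],
})

# Sum building footage per city
harvard = (
    toy_buildings.groupby('City', as_index=False)['GSF SF Total'].sum()
    .rename(columns={'GSF SF Total': 'Harvard Footage'})
)

# Left merge keeps every city; cities without buildings get a footage of 0
summary = toy_cities.merge(harvard, on='City', how='left')
summary = summary.fillna({'Harvard Footage': 0.0})
summary['Percentage Harvard'] = summary['Harvard Footage'] / summary['Square Footage'] * 100

summary = summary.sort_values('Percentage Harvard', ascending=False).reset_index(drop=True)
print(summary)
```

Both approaches depend on the city names matching exactly between the two frames, so a name spelled differently in each source would show up with zero footage.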

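Measured as building square footage relative to city area, the three cities where Harvard has the largest presence are:
1. Cambridge, where Harvard's buildings amount to {{math.ceil(final.iloc[0]['Percentage Harvard'] * 100) / 100}}% of the city's area.
1. Allston, where Harvard's buildings amount to {{math.ceil(final.iloc[1]['Percentage Harvard'] * 100) / 100}}% of the city's area.
1. Boston, where Harvard's buildings amount to {{math.ceil(final.iloc[2]['Percentage Harvard'] * 100) / 100}}% of the city's area.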
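The expression `math.ceil(x * 100) / 100` in the list above rounds up to two decimal places rather than to the nearest value. The sketch below applies it to the three percentages printed in the table and compares the result with `round`. The helper name `ceil_2dp` is introduced here for illustration.

```python
import math

def ceil_2dp(x):
    """Round x up to two decimal places."""
    return math.ceil(x * 100) / 100

# Percentages taken from the printed table above
percentages = {"Cambridge": 9.568160, "Allston": 5.696475, "Boston": 0.270589}

for city, pct in percentages.items():
    print(city, ceil_2dp(pct), round(pct, 2))
```

For Boston the two methods disagree: rounding up gives 0.28, while rounding to nearest gives 0.27.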

In [8]:
labels1 = ['Cambridge','Harvard']
labels2 = ['Allston', 'Harvard']
labels3 = ['Boston', 'Harvard']
# Harvard's share of each city, hard-coded from the rounded percentages above
values1 = [100 - 9.57, 9.57]
values2 = [100 - 5.7, 5.7]
values3 = [100 - 0.28, 0.28]

fig = {
    'data': [
        {
            'labels': labels1,
            'values': values1,
            'type': 'pie',
            'name': 'Cambridge',
            'marker': {'colors': ['rgb(46, 204, 113)',
                                  'rgb(164, 16, 52)']},
            'domain': {'x': [0, .32]},
            'hole': .4,
            'hoverinfo':'percent+name',
            'textinfo':'none'
        },
        {
            'labels': labels2,
            'values': values2,
            'type': 'pie',
            'name': 'Allston',
            'marker': {'colors': ['rgb(52, 73, 94)',
                                  'rgb(164, 16, 52)']},
            'domain': {'x': [.33, .65]},
            'hole': .4,
            'hoverinfo':'percent+name',
            'textinfo':'none'

        },
        {
            'labels': labels3,
            'values': values3,
            'type': 'pie',
            'name': 'Boston',
            'marker': {'colors': ['rgb(241, 196, 15)',
                                  'rgb(164, 16, 52)']},
            'domain': {'x': [.66, 1]},
            'hole': .4,
            'hoverinfo':'percent+name',
            'textinfo':'none'
        }
    ],
    'layout': {'title': "Harvard's Presence In Cities",
              "annotations": [
                {
                    "font": {
                        "size": 40
                    },
                    "showarrow": False,
                    "text": "C",
                    "x": 0.143,
                    "y": 0.5
                },
                {
                    "font": {
                        "size": 40
                    },
                    "showarrow": False,
                    "text": "A",
                    "x": 0.49,
                    "y": 0.5
                },
                {
                    "font": {
                        "size": 40
                    },
                    "showarrow": False,
                    "text": "B",
                    "x": 0.85,
                    "y": 0.5
                }
        ]}
}


py.iplot(fig, filename='cambridge_harvard')

The plot is saved to the plot.ly account under the name 'cambridge_harvard' and can be viewed at https://plot.ly/~hangulu/0.
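The three pie entries in the figure dictionary differ only in city name, percentage, city colour, and horizontal domain. A small helper could build them from those four inputs. The sketch below is one way to do that with plain dictionaries. The names `donut_trace` and `HARVARD_COLOR` are hypothetical, and the values are the same hard-coded ones used in the cell above.

```python
HARVARD_COLOR = 'rgb(164, 16, 52)'  # the colour used for Harvard's slice in every chart

def donut_trace(city, harvard_pct, city_color, x_domain):
    """Build one donut-chart trace dictionary for a city."""
    return {
        'labels': [city, 'Harvard'],
        'values': [100 - harvard_pct, harvard_pct],
        'type': 'pie',
        'name': city,
        'marker': {'colors': [city_color, HARVARD_COLOR]},
        'domain': {'x': x_domain},
        'hole': .4,
        'hoverinfo': 'percent+name',
        'textinfo': 'none',
    }

traces = [
    donut_trace('Cambridge', 9.57, 'rgb(46, 204, 113)', [0, .32]),
    donut_trace('Allston', 5.7, 'rgb(52, 73, 94)', [.33, .65]),
    donut_trace('Boston', 0.28, 'rgb(241, 196, 15)', [.66, 1]),
]
```

The resulting list could be passed as the `'data'` entry of the figure dictionary, leaving the layout and annotations unchanged.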
