# Visualizing the Growth of Skyscrapers

In [1]:
import pandas as pd

skyscrapers = pd.read_csv('csvs/skyscrapers.csv')
print(len(skyscrapers))
print(skyscrapers.iloc[0])

2427
name                   1 Bank of America Center
rank                                        720
functions                                office
structural_material                   composite
status                                      COM
city                                  Charlotte
city_locode                                 CLT
start                                      2007
completed                                  2010
floors_above                                 32
height                                   147.52
longitude                              -80.8408
latitude                                35.2265
dem                                         NaN
Name: 0, dtype: object


## Clean the data

According to [Wikipedia](https://en.wikipedia.org/wiki/Skyscraper), a skyscraper is a building taller than 50 meters (164 feet).

I'll drop all buildings that have not yet been completed, as well as any building that is 50 meters or shorter.

In [3]:
skyscrapers = skyscrapers.dropna(subset=['completed'])
skyscrapers = skyscrapers[skyscrapers['height'] > 50]
print(len(skyscrapers))

2337
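
To make the effect of the two filters concrete, here is a minimal sketch on a made-up three-row frame (the `toy` frame and its values are invented for illustration and are not rows from the dataset). It assumes the same column names as above.

```python
import numpy as np
import pandas as pd

# Made-up rows: one unfinished building, one too short, one that qualifies.
toy = pd.DataFrame({
    "name": ["Unfinished", "Short", "Tall"],
    "completed": [np.nan, 1990, 2010],
    "height": [200.0, 42.0, 147.52],
})

# Drop rows with no completion year, then keep heights above 50 m.
cleaned = toy.dropna(subset=["completed"])
cleaned = cleaned[cleaned["height"] > 50]

print(cleaned["name"].tolist())
```

Only the "Tall" row survives both steps. Note that the comparison is strict, so a building of exactly 50 meters would also be dropped.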


Let's see the breakdown of how many skyscrapers each city in the database has.

In [20]:
skyscrapers_counts = skyscrapers['city'].value_counts()
print(skyscrapers_counts)


New York City        639
Chicago              391
Miami                117
San Francisco         74
Seattle               69
Houston               66
Honolulu              57
Los Angeles           51
Las Vegas             45
Minneapolis           39
Pittsburgh            38
Boston                38
Philadelphia          37
Atlanta               37
Dallas                25
Sunny Isles Beach     25
Detroit               24
San Diego             23
Austin                22
Jersey City           21
Miami Beach           19
Cleveland             18
Indianapolis          16
Baltimore             16
Phoenix               15
Milwaukee             14
St. Louis             14
Cincinnati            13
Denver                12
Columbus              12
                    ... 
Oakbrook               1
Ventnor City           1
Portsmouth (RI)        1
Waco                   1
McAllen                1
Bismarck               1
Lubbock                1
Orange Beach           1
Portland (ME)          1


_How many skyscrapers are in the United States currently?_

In [21]:
# Buildings with no demolition year ('dem' is NaN) are counted as still standing
print(len(skyscrapers[skyscrapers['dem'].isnull()]))

2296


## Create JSON

In [12]:
# Average latitude and longitude per city, for plotting each city on the map.
# Rows whose latitude or longitude is 0 are excluded from the average.
city_info = skyscrapers[(skyscrapers['latitude'] != 0) & (skyscrapers['longitude'] != 0)].pivot_table(index=['city_locode', 'city'], values=['latitude', 'longitude'], aggfunc='mean')
city_info.reset_index(inplace=True)
city_info.set_index('city_locode', inplace=True)
city_info = city_info.rename(columns={'latitude': 'lat', 'longitude': 'lon'})

# Rename the building columns to one-letter keys for the JSON output
skyscraper_info = skyscrapers[['name', 'completed', 'dem', 'city_locode', 'height']].rename(columns={'name': 'n', 'completed': 's', 'dem': 'e', 'city_locode': 'c', 'height': 'h'})
skyscraper_info['s'] = skyscraper_info['s'].astype(int)

# Serialize each building, leaving out the demolition year 'e' when it is missing
records = []
for i, r in skyscraper_info.iterrows():
    if pd.isnull(r['e']):
        records.append(r[['n', 'h', 's', 'c']].to_json())
    else:
        records.append(r[['n', 'h', 's', 'c', 'e']].to_json())

with open('json/skyscrapers.json', 'w') as f:
    f.write('{"city_info": ')
    f.write(city_info.to_json(orient='index'))
    f.write(', "skyscrapers": [')
    # Join with commas so no trailing comma is written, which would be invalid JSON
    f.write(','.join(records))
    f.write(']}')
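
Writing commas and brackets by hand is easy to get wrong, so as an alternative sketch the same structure could be assembled as a plain Python object and serialized in one call with the standard `json` module. The records below and the helper name `build_payload` are hypothetical stand-ins, not rows from the dataset or code from this notebook (only the Charlotte coordinates echo the first row shown earlier).

```python
import json
import math

# Hypothetical records standing in for rows of skyscraper_info;
# 'e' (demolition year) is NaN for buildings that still stand.
rows = [
    {"n": "Tower A", "h": 147.52, "s": 2010, "c": "CLT", "e": float("nan")},
    {"n": "Tower B", "h": 61.0, "s": 1925, "c": "CHI", "e": 1990.0},
]

# Hypothetical per-city coordinates keyed by locode, shaped like city_info.
cities = {
    "CLT": {"city": "Charlotte", "lat": 35.2265, "lon": -80.8408},
    "CHI": {"city": "Chicago", "lat": 41.88, "lon": -87.63},
}


def build_payload(cities, rows):
    """Return a dict shaped like skyscrapers.json, omitting NaN 'e' values."""
    buildings = []
    for row in rows:
        record = {key: row[key] for key in ("n", "h", "s", "c")}
        if not math.isnan(row["e"]):
            record["e"] = row["e"]
        buildings.append(record)
    return {"city_info": cities, "skyscrapers": buildings}


payload = build_payload(cities, rows)
text = json.dumps(payload)

# Round-tripping through json.loads confirms the text is valid JSON.
assert json.loads(text) == payload
```

Loading the finished file back with `json.load` is a cheap way to confirm that whatever reads it later will be able to parse it.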


## References

* [http://skyscrapercenter.com](http://skyscrapercenter.com/)
* [https://en.wikipedia.org/wiki/Skyscraper](https://en.wikipedia.org/wiki/Skyscraper)