In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np

In [2]:
# read csv file into pandas dataframe

listings = pd.read_csv('listings.csv')
listings.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2318,Casa Madrona - Urban Oasis 1 block from the park!,2536,Megan,Central Area,Madrona,47.61082,-122.29082,Entire home/apt,296,7,32,2020-02-01,0.23,2,86
1,6606,"Fab, private seattle urban cottage!",14942,Joyce,Other neighborhoods,Wallingford,47.65411,-122.33761,Entire home/apt,90,30,150,2019-09-28,1.15,3,45
2,9419,Glorious sun room w/ memory foambed,30559,Angielena,Other neighborhoods,Georgetown,47.55017,-122.31937,Private room,62,2,148,2019-12-27,1.26,8,365
3,9460,Downtown Convention Center B&B -- Free Minibar,30832,Siena,Downtown,First Hill,47.61265,-122.32936,Private room,79,3,466,2020-03-07,3.63,4,10
4,9531,The Adorable Sweet Orange Craftsman,31481,Cassie,West Seattle,Fairmount Park,47.55539,-122.38474,Entire home/apt,165,3,40,2019-12-30,0.4,2,276


We only need a few of these columns: 'neighbourhood', 'latitude', 'longitude' and 'price'. So let's create a new dataframe with just those columns.

In [4]:
# dataframe only with the columns neighbourhood, latitude, longitude and price
listings = listings[['neighbourhood', 'latitude', 'longitude', 'price']]
listings.head()

Unnamed: 0,neighbourhood,latitude,longitude,price
0,Madrona,47.61082,-122.29082,296
1,Wallingford,47.65411,-122.33761,90
2,Georgetown,47.55017,-122.31937,62
3,First Hill,47.61265,-122.32936,79
4,Fairmount Park,47.55539,-122.38474,165
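
As an alternative to selecting columns after loading, the same subset could be requested at read time through the `usecols` parameter of `pd.read_csv`. The sketch below is only an illustration: it reads from an in-memory string (`csv_text`, made-up rows) instead of `listings.csv`, and the name `subset` is hypothetical.

```python
import io
import pandas as pd

# Made-up rows standing in for listings.csv (assumption, for illustration only)
csv_text = """id,name,neighbourhood,latitude,longitude,price
1,Example A,Madrona,47.61,-122.29,296
2,Example B,Wallingford,47.65,-122.34,90
"""

# usecols keeps only the listed columns while parsing
wanted = ['neighbourhood', 'latitude', 'longitude', 'price']
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(subset.head())
```

For a large file this avoids holding the unused columns in memory at all.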


Now let's see if there are any NaN values in our new dataframe.

In [23]:
listings.isnull().values.any()


False

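It returns False, so our dataframe has no NaN values and we can continue the analysis without dropping any rows. Next we will create a new dataframe with the average price for each neighborhood. Grouping also averages the latitude and longitude, which gives us one representative point per neighborhood to use as a marker position later.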

In [29]:
# average latitude, longitude and price per neighbourhood
neighborhoods = listings.groupby('neighbourhood', as_index=False).mean()
neighborhoods.head()


Unnamed: 0,neighbourhood,latitude,longitude,price
0,Adams,47.671661,-122.385505,150.292035
1,Alki,47.575465,-122.407382,151.793478
2,Arbor Heights,47.510568,-122.380087,117.45
3,Atlantic,47.595194,-122.304142,217.438095
4,Belltown,47.615327,-122.345001,181.958025
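
An average is easier to judge when we also know how many listings it is based on. The sketch below shows one possible way to get both with named aggregation. It uses a small made-up dataframe (`toy`) rather than the real listings, and the column name `n_listings` is an assumption introduced here.

```python
import pandas as pd

# Made-up data for illustration, not taken from listings.csv
toy = pd.DataFrame({
    'neighbourhood': ['A', 'A', 'B'],
    'latitude': [47.60, 47.62, 47.55],
    'longitude': [-122.30, -122.32, -122.38],
    'price': [100, 200, 60],
})

# Mean position and price per neighbourhood, plus the number of listings behind each mean
summary = toy.groupby('neighbourhood', as_index=False).agg(
    latitude=('latitude', 'mean'),
    longitude=('longitude', 'mean'),
    price=('price', 'mean'),
    n_listings=('price', 'size'),
)
print(summary)
```

A neighborhood with only a handful of listings can show an extreme average, so the count is worth checking before comparing prices.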


__Create a map of Seattle with neighborhoods.__

We will create a folium map of Seattle in order to see the neighborhoods on the map. Feel free to click on the markers to see the neighborhood names.

In [30]:
import folium

# Seattle latitude and longitude values
from geopy.geocoders import Nominatim
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="seattle_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Seattle are {}, {}.'.format(latitude, longitude))


# create map of Seattle using latitude and longitude values
map_seattle = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle marker per neighborhood, with the neighborhood name as popup
for lat, lng, neighborhood in zip(neighborhoods['latitude'], neighborhoods['longitude'], neighborhoods['neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_seattle)

map_seattle

The geographical coordinates of Seattle are 47.6038321, -122.3300624.
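
The geocoding step depends on a network service, and `geocode` returns `None` when it finds no match. A minimal sketch of a fallback is to center the map on the mean of the neighborhood coordinates instead. The helper name `map_center` and the `toy` dataframe are hypothetical and only serve as an illustration.

```python
import pandas as pd

def map_center(location, neighborhoods):
    """Return [lat, lon] from a geocoder result, or the mean of the
    neighborhood coordinates when the lookup returned None."""
    if location is not None:
        return [location.latitude, location.longitude]
    return [neighborhoods['latitude'].mean(), neighborhoods['longitude'].mean()]

# Made-up coordinates standing in for the neighborhoods dataframe
toy = pd.DataFrame({'latitude': [47.5, 47.7], 'longitude': [-122.4, -122.2]})

# Simulate a failed lookup by passing None as the geocoder result
center = map_center(None, toy)
print(center)
```

The returned list could then be passed as the `location` argument of `folium.Map`.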
