# Exploratory data analysis (EDA)

You will learn how to approach an unfamiliar dataset systematically while keeping a creative, open mind in the search for insights.

## Context
Airbnb is an online marketplace for people to rent places to stay. 

Airbnb has rolled out a new service to help listers set prices. Airbnb earns a percentage commission on each listing, so it has an incentive to help listers price optimally; that is, at the highest price at which they will still close a deal. You are an Airbnb consultant helping with this new pricing service.

## Goal

We will focus on one question: which features help determine an appropriate listing price?

## Load Data

In [2]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import folium

In [3]:
listings = pd.read_csv('data/airbnb_nyc.csv')

In [4]:
listings

Unnamed: 0,id,name,summary,description,experiences_offered,neighborhood_overview,transit,house_rules,host_id,host_since,...,hot_tub_sauna_or_pool,internet,long_term_stays,pets_allowed,private_entrance,secure,self_check_in,smoking_allowed,accessible,event_suitable
0,2539,Clean & quiet apt home by the park,Renovated apt home in elevator building.,Renovated apt home in elevator building. Spaci...,none,Close to Prospect Park and Historic Ditmas Park,Very close to F and G trains and Express bus i...,-The security and comfort of all our guests is...,2787,39698.0,...,-1,1,1,-1,-1,1,1,-1,1,1
1,3647,THE VILLAGE OF HARLEM....NEW YORK !,,WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY T...,none,,,Upon arrival please have a legibile copy of yo...,4632,39777.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
2,7750,Huge 2 BR Upper East Cental Park,,Large Furnished 2BR one block to Central Park...,none,,,,17985,39953.0,...,-1,1,-1,1,-1,-1,-1,-1,-1,-1
3,8505,Sunny Bedroom Across Prospect Park,Just renovated sun drenched bedroom in a quiet...,Just renovated sun drenched bedroom in a quiet...,none,Quiet and beautiful Windsor Terrace. The apart...,Ten minutes walk to the 15th sheet F&G train s...,- No shoes in the house - Quiet hours after 11...,25326,40006.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
4,8700,Magnifique Suite au N de Manhattan - vue Cloitres,Suite de 20 m2 a 5 min des 2 lignes de metro a...,Suite de 20 m2 a 5 min des 2 lignes de metro a...,none,,Metro 1 et A,,26394,40014.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30174,36484363,QUIT PRIVATE HOUSE,THE PUBLIC TRANSPORTATION: THE TRAIN STATION I...,THE PUBLIC TRANSPORTATION: THE TRAIN STATION I...,none,QUIT QUIT QUIT !!!!!!,TRAIN STATION 5 MINUTE UBER OR 15 MINUTE WALK ...,"Guest should not wear shoes, no smoking mariju...",107716952,42722.0,...,-1,1,-1,-1,-1,-1,1,-1,-1,1
30175,36484665,Charming one bedroom - newly renovated rowhouse,"This one bedroom in a large, newly renovated r...","This one bedroom in a large, newly renovated r...",none,"There's an endless number of new restaurants, ...",We are three blocks from the G subway and abou...,,8232441,41504.0,...,-1,1,-1,-1,1,-1,1,-1,-1,-1
30176,36485057,Affordable room in Bushwick/East Williamsburg,,,none,,,,6570630,41419.0,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
30177,36485609,43rd St. Time Square-cozy single bed,,,none,,,,30985759,42104.0,...,-1,1,-1,-1,-1,-1,1,-1,-1,-1


In [5]:
listings.columns

Index(['id', 'name', 'summary', 'description', 'experiences_offered',
       'neighborhood_overview', 'transit', 'house_rules', 'host_id',
       'host_since', 'host_response_time', 'host_response_rate',
       'host_is_superhost', 'host_listings_count', 'host_identity_verified',
       'street', 'neighbourhood', 'latitude', 'longitude', 'property_type',
       'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
       'bed_type', 'amenities', 'price', 'guests_included', 'extra_people',
       'minimum_nights', 'calendar_updated', 'has_availability',
       'availability_30', 'availability_60', 'availability_90',
       'availability_365', 'number_of_reviews', 'number_of_reviews_ltm',
       'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'instant_bookable', 'cancellation_policy',
       'calculated_host_listings_count',


See the data dictionary [here](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit#gid=982310896) for column descriptions.
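With the columns known, one way to start on the goal question is to rank the numeric columns by how strongly they correlate with `price`. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset) and a hypothetical helper, `rank_by_price_correlation`; on the real data you would pass `listings` instead. Correlation only captures linear relationships, so treat the ranking as a first pass rather than an answer.

```python
import pandas as pd

# Toy stand-in for `listings`; the values are invented for illustration only.
toy = pd.DataFrame({
    "price": [50, 80, 120, 200, 310],
    "accommodates": [1, 2, 3, 4, 6],
    "minimum_nights": [3, 1, 2, 1, 2],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Entire home/apt"],
})

def rank_by_price_correlation(df, target="price"):
    """Return numeric columns sorted by absolute Pearson correlation with `target`."""
    numeric = df.select_dtypes(include="number")  # text columns are ignored
    corr = numeric.corr()[target].drop(target)
    # Order by strength, ignoring sign, strongest first
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

ranking = rank_by_price_correlation(toy)
print(ranking)
```

Categorical columns such as `room_type` are skipped by this helper; comparing price distributions per category (for example with a box plot) would be the analogous first look for them.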

## Activities

**Q**: Pick a `latitude` and `longitude` pair from the dataset and create a map centered on those coordinates. Check that you can zoom in and out on the Leaflet map.

In [7]:
listings[['latitude','longitude']].head()

Unnamed: 0,latitude,longitude
0,40.64749,-73.97237
1,40.80902,-73.9419
2,40.79685,-73.94872
3,40.65599,-73.97519
4,40.86754,-73.92639


In [13]:
# Coordinates of the first listing in the table above
lat = 40.64749
long = -73.97237

m = folium.Map(location=[lat, long], zoom_start=10)

m

**Q**: Folium maps come in different styles. Play around with the maps by passing one of the following values to the `tiles` argument:

* `Stamen Toner`
* `Stamen Terrain`
* `Stamen Watercolor`
* `CartoDB positron`
* `CartoDB dark_matter`

In [19]:
lat = 40.64749
long = -73.97237

# One map per tile style; a cell only displays the expression on its last line,
# so change the final line to view a different style
m = folium.Map(location=[lat, long], zoom_start=10, tiles='Stamen Toner')
a = folium.Map(location=[lat, long], zoom_start=10, tiles='Stamen Terrain')
p = folium.Map(location=[lat, long], zoom_start=10, tiles='Stamen Watercolor')
l = folium.Map(location=[lat, long], zoom_start=10, tiles='CartoDB positron')
e = folium.Map(location=[lat, long], zoom_start=10, tiles='CartoDB dark_matter')

m



**Q**: Using a heat map, explore how price is distributed across locations.

Folium's heat map accepts several parameters, such as:
- `min_opacity`
- `radius`
- `blur`

Play around with these parameters.

In [None]:
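# Adding one folium.Marker per row (about 30,000 rows) made this cell crash,
# so draw a single HeatMap layer weighted by price instead.
from folium.plugins import HeatMap

m = folium.Map(location=[listings['latitude'].mean(), listings['longitude'].mean()], zoom_start=10)

# Keep rows with complete coordinates and price (assumes `price` is numeric)
heat = listings[['latitude', 'longitude', 'price']].dropna()

# Scale price to [0, 1] so it can serve as the heat intensity
weights = heat['price'] / heat['price'].max()
heat_data = list(zip(heat['latitude'], heat['longitude'], weights))

HeatMap(heat_data, min_opacity=0.2, radius=8, blur=10).add_to(m)

m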
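Scaling by the maximum price has a weakness: if a few listings are far more expensive than the rest, every other listing is squeezed toward zero. One option is to cap prices at a high percentile before scaling. The sketch below uses made-up prices and a hypothetical helper, `price_weights`; the 90th-percentile cap is an arbitrary choice, not something the dataset prescribes.

```python
import pandas as pd

def price_weights(prices, cap_quantile=0.9):
    """Scale prices to [0, 1] after capping values above the given quantile."""
    cap = prices.quantile(cap_quantile)
    capped = prices.clip(upper=cap)
    return capped / cap

# Invented prices with one extreme value, for illustration only
prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 120, 130, 150, 5000], dtype=float)

naive = prices / prices.max()   # the outlier pushes everything else near 0
weights = price_weights(prices)  # capped version keeps the spread visible
print(pd.DataFrame({"price": prices, "naive": naive, "capped": weights}))
```

The capped weights can be used wherever the price-based weight or opacity is computed; listings above the cap all receive the maximum weight of 1.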

**Q**: Using a heat map, explore how `review_scores_rating` is distributed across locations. What problem did you encounter, and how did you solve it?

In [None]:
m = folium.Map(location=[listings['latitude'].mean(), listings['longitude'].mean()], zoom_start=10)

# Rating bands as (low, high, color); the first matching band wins
color_bands = [
    (0, 20, 'blue'),
    (20, 40, 'green'),
    (40, 60, 'red'),
    (60, 80, 'purple'),
    (80, 100, 'orange'),
]

def rating_color(score):
    """Return the marker color for a rating, or None if no band matches."""
    for low, high, color in color_bands:
        if low <= score <= high:
            return color
    return None  # a missing (NaN) rating fails every comparison

# Plot a random sample: one Marker per row for the full dataset is too heavy to render
for index, row in listings.sample(n=min(1000, len(listings)), random_state=0).iterrows():
    color = rating_color(row['review_scores_rating'])
    if color:
        folium.Marker([row['latitude'], row['longitude']], icon=folium.Icon(color=color)).add_to(m)

m
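A likely answer to the question above is that some listings have no rating, so `review_scores_rating` is missing for them; that is an assumption about the data, and `isna()` is a quick way to check it. The sketch below, on a few invented ratings, counts the missing values and assigns a color band with `pd.cut`, which avoids looping over the bands by hand. The band edges and colors are illustrative choices.

```python
import numpy as np
import pandas as pd

# Invented ratings for illustration; NaN stands for a listing with no rating
ratings = pd.Series([95.0, 20.0, np.nan, 61.0, 100.0, np.nan],
                    name="review_scores_rating")

print("missing ratings:", ratings.isna().sum())

# Bin ratings into five bands; include_lowest=True keeps a score of 0 in the first band
bands = pd.cut(
    ratings,
    bins=[0, 20, 40, 60, 80, 100],
    labels=["blue", "green", "red", "purple", "orange"],
    include_lowest=True,
)

# Rows with a missing rating get no band, so drop them before plotting
colors = bands.dropna().astype(str)
print(colors.tolist())
```

Dropping unrated listings is only one option; plotting them in a separate neutral color would keep them visible on the map.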

## References

"New York", Inside Airbnb, http://insideairbnb.com/get-the-data.html

