## Final Proposal: Airbnb Case Study in New York City ##
#### Principal Investigator: Nan Jiang <br> Email: [nancy.jiang@nyu.edu](mailto:nancy.jiang@nyu.edu)

This project will study how the pricing of Airbnb listings in New York City depends on location, room type and number of reviews, as well as how it compares with neighboring hotels. Because of its enormous traffic flow and skyrocketing rental fees, New York City accommodates a huge Airbnb market across its 5 distinct boroughs. This project will group current Airbnb listings by room type and borough and visualize the differences in prices.

The key element of the project is the use of the [NYC Airbnb Listing Dataset](http://data.insideairbnb.com/united-states/ny/new-york-city/2018-03-04/visualisations/listings.csv) and [Yelp's API](https://api.yelp.com/v3/businesses/{id}) (I still need some time to figure out this part). The dataset is a thorough collection of data on the price, location and room type of each listing, while the API provides access to the price levels of a selection of hotels in each neighborhood, which gives a rough sense of how Airbnb prices compare. Details of the dataset are described below in the data report.

I anticipate that the project will have three sections.

- Basic statistics about the breakdown of room types and the difference in average pricing across boroughs will be reported. What is the cheapest/most expensive combination of borough and room type to stay at?

- An analysis of how Airbnb accommodation compares to neighboring hotels in terms of price. Specifically, I would like to compute the average Airbnb price within each borough and compare it to the average hotel price within the same neighbourhood.

- An analysis of how the occupancy rate differs with the number of reviews and the time of year within the borough that has the most listings.
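
The first section could be sketched roughly as follows. This is only a minimal illustration on a hypothetical six-row sample (the prices are made up); the column names `neighbourhood_group`, `room_type` and `price` are the ones used in the Inside Airbnb listings file.

```python
import pandas as pd

# Hypothetical sample rows; the real analysis would use the full listings file.
sample = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Bronx", "Bronx", "Brooklyn", "Manhattan", "Manhattan"],
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Private room", "Entire home/apt", "Private room"],
    "price": [100, 40, 60, 70, 220, 110],
})

# Mean price for every (borough, room type) combination.
mean_price = sample.groupby(["neighbourhood_group", "room_type"])["price"].mean()

# Boroughs as rows, room types as columns, for easy reading.
table = mean_price.unstack("room_type")

# Labels of the cheapest and most expensive combinations.
cheapest = mean_price.idxmin()
priciest = mean_price.idxmax()

print(table)
print("cheapest:", cheapest)
print("most expensive:", priciest)
```

Combinations that do not occur in the data show up as `NaN` in `table`, which is worth keeping in mind for small boroughs.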

## Data Report ##

**Overview:** The data behind my project comes from [Inside Airbnb](http://insideairbnb.com/), an independent, non-commercial data set that aggregates publicly available information on Airbnb listings across the world. Its [NYC Airbnb Listing Dataset](http://data.insideairbnb.com/united-states/ny/new-york-city/2018-03-04/visualisations/listings.csv) provides key metrics and information for each listing, including price, location, room type, number of reviews and availability.

**Important Variables:** The key variables I will focus on in this project are the price and the neighborhood of each listing.

In my report I will also try to get access to Yelp's API to understand average hotel prices within different neighborhoods. 

The *Geography* that I will work with is New York City. However, the same analysis could be easily applied to other cities as well.

**Access** I will use [Inside Airbnb](http://insideairbnb.com/) to download and access the core dataset. Below I demonstrate that I have the ability to access the data.

**Requisite Packages** Below I bring in the packages I need...

In [13]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import numpy as np
import requests

%matplotlib inline

**Grabbing the Data:** Below I use the Yelp API to grab price and location information for hotels. First, I create a `YelpAPI` client object from my API key to pass requests to Yelp.

In [1]:
from yelpapi import YelpAPI

In [17]:
my_api_key = "V9KIR9KSlZgIunfSgwFNUY6DW1GBr-MJrwQJepeswI0ASq3koN6eWgijk2KywRxwbrecey"
my_api_key = my_api_key + "7O39IFr1DXXz-FnLjA844Wb99UKL4dh7XbCeWRiGUg_y2SOb1gyJ7mWnYx"

Y = YelpAPI(my_api_key)

type(Y)

yelpapi.yelpapi.YelpAPI

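
Yelp reports hotel prices as levels ("$" to "$$$$") rather than dollar amounts, so comparing them with Airbnb prices needs a numeric scale first. The sketch below is one possible way to do that; `sample_response` is a hypothetical, made-up excerpt shaped like a Yelp business-search response, and `mean_price_level` is a helper name introduced here for illustration only.

```python
# Hypothetical excerpt shaped like a Yelp business-search response;
# the names and price levels are invented for illustration.
sample_response = {
    "businesses": [
        {"name": "Hotel A", "price": "$$"},
        {"name": "Hotel B", "price": "$$$$"},
        {"name": "Hotel C"},  # assumption: "price" may be missing for some businesses
    ]
}


def mean_price_level(response):
    """Average the "$"-style price levels (1 to 4), skipping entries without one."""
    levels = [
        len(business["price"])
        for business in response.get("businesses", [])
        if business.get("price")
    ]
    return sum(levels) / len(levels) if levels else None


print(mean_price_level(sample_response))  # 3.0
```

The result is an ordinal score, not a dollar figure, so it only supports a rough ranking of neighborhoods against the Airbnb averages.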

In [120]:
url = "http://data.insideairbnb.com/united-states/ny/new-york-city/2018-03-04/visualisations/listings.csv"
ablist = pd.read_csv(url)

To clean up the dataset, I first removed all listings priced below 10 USD because, according to [Airbnb](https://www.airbnb.com), 10 USD is the minimum starting price.

In [126]:
ablist["price"] = pd.to_numeric(ablist["price"])
ablist = ablist[ablist.price >= 10]
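
If the price column ever arrives as text (for example with currency symbols), `pd.to_numeric` raises an error by default. The following is a minimal sketch of a more defensive version of the same cleaning step, run on hypothetical raw values; it is an assumption that such values occur, since the file used here already stores prices as numbers.

```python
import pandas as pd

# Hypothetical raw values: plain numbers, a currency-formatted string, a bad entry.
raw = pd.DataFrame({"price": ["99", "$1,200.00", "5", "n/a"]})

# Strip "$" and "," before converting; unparseable entries become NaN instead of raising.
stripped = raw["price"].astype(str).str.replace(r"[$,]", "", regex=True)
raw["price"] = pd.to_numeric(stripped, errors="coerce")

# Keep listings priced at 10 USD or more; NaN fails the comparison and is dropped too.
filtered = raw[raw["price"] >= 10]
print(filtered)
```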

I then picked out the columns of interest to me:

In [127]:
dfhost = ablist[["id","host_name"]]
dflist = ablist[["price","minimum_nights"]]
dfreview = ablist[["number_of_reviews","reviews_per_month"]]
dfavail = ablist[["availability_365"]]

By concatenating them and setting the index, I arrived at a cleaned-up dataframe that I can readily use to create pivot tables for section one.

In [133]:
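# join_axes was removed in pandas 1.0; the pieces share ablist's index, so a plain concat is equivalent
ablists = pd.concat([dfhost, dflist, dfreview, dfavail], axis=1)
# Note: the index is set on the full `ablist`, which overwrites the concatenated subset,
# so every original column is kept (as the output below shows)
ablists = ablist.set_index(["neighbourhood_group", "room_type"])
newab = ablists.sort_index(ascending=True)
newab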

| neighbourhood_group | room_type | id | name | host_id | host_name | neighbourhood | latitude | longitude | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bronx | Entire home/apt | 63610 | DOMINIQUE'S NY CHIC SUITE-wifi/metro/sleeps 3*... | 310670 | Vie. | Eastchester | 40.880573 | -73.835723 | 99 | 2 | 15 | 2018-01-01 | 0.25 | 9 | 365 |
| Bronx | Entire home/apt | 149777 | Artsy, one bedroom 20 min from 42nd Grand Cent... | 716306 | Dee, Dre & Mama Shelley | Woodlawn | 40.897475 | -73.863897 | 77 | 1 | 127 | 2018-02-15 | 2.02 | 1 | 331 |
| Bronx | Entire home/apt | 182177 | PRIVATE FLAT / APARTMENT- $SPECIAL$ | 873273 | Christian & Carla | Allerton | 40.864658 | -73.857087 | 125 | 2 | 236 | 2018-02-19 | 2.98 | 4 | 328 |
| Bronx | Entire home/apt | 385824 | New York City- Riverdale Modern two bedrooms unit | 1931205 | Orit | Spuyten Duyvil | 40.879914 | -73.916725 | 120 | 2 | 22 | 2018-01-02 | 0.99 | 1 | 345 |
| Bronx | Entire home/apt | 449680 | Artfully Decorated 2 Bedroom Apt | 1812871 | Karen | Longwood | 40.816108 | -73.899094 | 100 | 5 | 67 | 2018-01-02 | 0.97 | 1 | 40 |
| Bronx | Entire home/apt | 525293 | Yankee Nest | 2556498 | Chris | Concourse | 40.828225 | -73.924394 | 250 | 3 | 96 | 2018-01-02 | 1.41 | 1 | 299 |
| Bronx | Entire home/apt | 755528 | PRIVATE BATH/TONS OF SUNLIGHT/SAFE | 3684360 | Enrique | Allerton | 40.858397 | -73.869686 | 49 | 2 | 127 | 2018-02-22 | 1.95 | 4 | 251 |
| Bronx | Entire home/apt | 791452 | condominium near Manhattan, free gas and electric | 2556784 | Claudia | Fieldston | 40.887570 | -73.905217 | 60 | 1 | 8 | 2017-08-14 | 0.39 | 1 | 365 |
| Bronx | Entire home/apt | 958444 | Great 1BD waterfront City Island NY | 5214644 | Noelva | City Island | 40.852350 | -73.788728 | 84 | 3 | 67 | 2018-01-31 | 1.18 | 1 | 31 |
| Bronx | Entire home/apt | 1390964 | 1 bdr Duplex with Garden and roof | 1651080 | Nicole | Mott Haven | 40.809252 | -73.917528 | 180 | 7 | 1 | 2014-09-26 | 0.02 | 1 | 358 |


**Problem: I tried using groupby, but had difficulty converting the resulting DataFrameGroupBy object into a DataFrame that I could work with.**

In [131]:
ablists.groupby(["neighbourhood_group","room_type"]).count()

| neighbourhood_group | room_type | id | name | host_id | host_name | neighbourhood | latitude | longitude | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bronx | Entire home/apt | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 198 | 197 | 258 | 258 |
| Bronx | Private room | 581 | 578 | 581 | 579 | 581 | 581 | 581 | 581 | 581 | 581 | 447 | 447 | 581 | 581 |
| Bronx | Shared room | 37 | 36 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 22 | 22 | 37 | 37 |
| Brooklyn | Entire home/apt | 9050 | 9050 | 9050 | 9036 | 9050 | 9050 | 9050 | 9050 | 9050 | 9050 | 7556 | 7544 | 9050 | 9050 |
| Brooklyn | Private room | 10956 | 10949 | 10956 | 10922 | 10956 | 10956 | 10956 | 10956 | 10956 | 10956 | 8045 | 8032 | 10956 | 10956 |
| Brooklyn | Shared room | 420 | 420 | 420 | 419 | 420 | 420 | 420 | 420 | 420 | 420 | 294 | 293 | 420 | 420 |
| Manhattan | Entire home/apt | 12853 | 12844 | 12853 | 12830 | 12853 | 12853 | 12853 | 12853 | 12853 | 12853 | 10244 | 10229 | 12853 | 12853 |
| Manhattan | Private room | 8721 | 8715 | 8721 | 8699 | 8721 | 8721 | 8721 | 8721 | 8721 | 8721 | 6620 | 6612 | 8721 | 8721 |
| Manhattan | Shared room | 565 | 565 | 565 | 564 | 565 | 565 | 565 | 565 | 565 | 565 | 387 | 387 | 565 | 565 |
| Queens | Entire home/apt | 1786 | 1786 | 1786 | 1785 | 1786 | 1786 | 1786 | 1786 | 1786 | 1786 | 1460 | 1455 | 1786 | 1786 |
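
One way around the groupby difficulty is to remember that a `DataFrameGroupBy` object is only a lazy grouping: applying an aggregation returns an ordinary DataFrame, and `reset_index()` or `unstack()` then reshapes it. The sketch below shows this on a hypothetical five-row sample (ids and prices are made up); the output column names `listings` and `mean_price` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample with the same column names as the listings file.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "neighbourhood_group": ["Bronx", "Bronx", "Bronx", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room"],
    "price": [100, 80, 50, 60, 80],
})

# The groupby call alone only describes the grouping; nothing is computed yet.
grouped = sample.groupby(["neighbourhood_group", "room_type"])

# Aggregating yields a regular DataFrame indexed by (borough, room type).
summary = grouped.agg(listings=("id", "count"), mean_price=("price", "mean"))

# reset_index() turns the group keys back into ordinary columns.
flat = summary.reset_index()

# unstack() gives a borough-by-room-type grid; missing combinations become 0.
wide = summary["listings"].unstack("room_type", fill_value=0)

print(flat)
print(wide)
```

The `flat` form is convenient for merging or plotting, while `wide` reads like a pivot table.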


As shown below, by manipulating the dataframe, I am able to quickly figure out the relative popularity of the 5 boroughs in terms of Airbnb listings.

In [113]:
cab = ablists.groupby(["neighbourhood_group"]).count()

In [114]:
cab

| neighbourhood_group | id | host_name | price | minimum_nights | number_of_reviews | reviews_per_month | availability_365 |
|---|---|---|---|---|---|---|---|
| Bronx | 876 | 874 | 876 | 876 | 876 | 666 | 876 |
| Brooklyn | 20426 | 20377 | 20426 | 20426 | 20426 | 15869 | 20426 |
| Manhattan | 22139 | 22093 | 22139 | 22139 | 22139 | 17228 | 22139 |
| Queens | 5073 | 5064 | 5073 | 5073 | 5073 | 3900 | 5073 |
| Staten Island | 295 | 295 | 295 | 295 | 295 | 215 | 295 |
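
To make "relative popularity" concrete, the raw counts can be turned into percentage shares. This is a minimal sketch that re-enters the `id` counts from the table above by hand; in the notebook itself the same Series would presumably come from `cab["id"]`.

```python
import pandas as pd

# Listing counts per borough, copied from the `id` column of the table above.
counts = pd.Series(
    {"Bronx": 876, "Brooklyn": 20426, "Manhattan": 22139,
     "Queens": 5073, "Staten Island": 295},
    name="listings",
)

# Each borough's share of all listings, in percent, largest first.
share = (counts / counts.sum() * 100).round(1).sort_values(ascending=False)
print(share)
```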


Below is an initial attempt to find the neighborhoods with the largest number of Airbnb listings within each borough. Ultimately, I would like to map out the concentration of Airbnb listings in New York.

In [75]:
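# Count listings per (borough, neighbourhood); `cab` above is grouped by borough only
ablist.groupby(["neighbourhood_group", "neighbourhood"]).id.count().sort_values(ascending=False)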

neighbourhood_group  neighbourhood             
Brooklyn             Williamsburg                  4325
                     Bedford-Stuyvesant            3571
Manhattan            Harlem                        2884
Brooklyn             Bushwick                      2489
Manhattan            East Village                  2096
                     Upper West Side               2037
                     Hell's Kitchen                1869
                     Upper East Side               1783
Brooklyn             Crown Heights                 1687
Manhattan            Midtown                       1279
                     East Harlem                   1223
                     Chelsea                       1174
Brooklyn             Greenpoint                    1152
Manhattan            Lower East Side               1034
                     Washington Heights             941
Queens               Astoria                        940
Manhattan            West Village                   852
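
The sorted list above mixes boroughs together. To pick out the single busiest neighborhood in each borough, one option is to sort first and then keep the first row of every borough group. The sketch below uses a handful of counts copied from the output above; it is an illustration, not the full result.

```python
import pandas as pd

# A few (borough, neighbourhood) listing counts copied from the output above.
index = pd.MultiIndex.from_tuples(
    [("Brooklyn", "Williamsburg"), ("Brooklyn", "Bedford-Stuyvesant"),
     ("Manhattan", "Harlem"), ("Manhattan", "East Village"),
     ("Queens", "Astoria")],
    names=["neighbourhood_group", "neighbourhood"],
)
counts = pd.Series([4325, 3571, 2884, 2096, 940], index=index, name="listings")

# Sort descending, then keep the first (largest) entry within each borough.
top = (
    counts.sort_values(ascending=False)
    .groupby(level="neighbourhood_group")
    .head(1)
)
print(top)
```

Replacing `head(1)` with `head(3)` would give the three busiest neighborhoods per borough instead.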
