## Project Proposal

Airbnb is an online marketplace that connects property owners with people looking to rent a space. With listings in more than 220 countries and regions, it is no surprise that the company has grown rapidly in popularity. Airbnb has had a substantial economic impact on many countries, regions, and cities. In the United States alone, Airbnb traffic has generated over 33.8 billion dollars.

New York City is among the most popular cities both in the world and on Airbnb. Drawing over 65 million visitors per year, NYC attracts tourists with its art, history, entertainment, food, architecture, and energy. For this reason, this project explores an NYC Airbnb dataset to learn more about the distribution of rental properties, the impact of location on price, and other features that affect a property’s price, number of reviews, or availability.
 
Sources:
- https://www.osc.state.ny.us/reports/osdc/tourism-industry-new-york-city#:~:text=New%20York%20City%20hosted%2066.6,reduction%20(see%20Figure%201).
- https://www.stratosjets.com/blog/airbnb-statistics/#:~:text=Airbnb%20has%20listings%20in%20more%20than%20220%20countries%20and%20regions.&text=People%20stay%20an%20average%20of,Airbnbs%20than%20at%20hotel%20stays.





## Questions

- How are Airbnb properties distributed across NYC boroughs and neighborhoods?
- How do factors like location, number of reviews, and availability affect the price of an Airbnb property?
- Which neighborhoods have the most listings?
- Which borough or neighborhood has the most affordable properties?


These are interesting questions to consider because they have the potential to reveal which features a customer finds most valuable. For example, will a customer choose a property at a higher price point because of its location, or will they trade off a popular location for a lower-priced property?
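Most of these questions reduce to grouped counts and grouped price summaries. The following is a minimal sketch rather than the project's final analysis: it uses five rows copied from the dataset as a stand-in for the full file, the name `sample` is hypothetical, and taking the median price as the measure of "affordable" is an assumption (the mean or a price quantile would be equally defensible).

```python
import pandas as pd

# Stand-in sample: five rows copied from the dataset. The full analysis would
# use the DataFrame loaded from AB_NYC_2019.csv instead.
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "neighbourhood": ["Kensington", "Midtown", "Harlem", "Clinton Hill", "East Harlem"],
    "price": [149, 225, 150, 89, 80],
    "number_of_reviews": [9, 45, 0, 270, 9],
    "availability_365": [365, 355, 365, 194, 0],
})

# Distribution of listings: counts per borough and per neighbourhood.
listings_per_borough = sample["neighbourhood_group"].value_counts()
listings_per_neighbourhood = sample["neighbourhood"].value_counts()

# Affordability (assumed here to mean median price), cheapest borough first.
median_price = sample.groupby("neighbourhood_group")["price"].median().sort_values()

# Linear association of price with number of reviews and availability.
price_corr = sample[["price", "number_of_reviews", "availability_365"]].corr()["price"]

print(listings_per_borough)
print(median_price)
print(price_corr)
```

With only five rows the numbers themselves carry no meaning; the point is the shape of each computation, which applies unchanged to the full DataFrame.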


## Data

- https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data

## Variables

| Variable Name | Type | Description |
|:- |:- |:- |
| id | int64 | ID of listing |
| name | object | Name of listing |
| host_id | int64 | ID of property owner |
| host_name | object | Name of property owner |
| neighbourhood_group | object | Borough |
| neighbourhood | object | Neighbourhood / area |
| latitude | float64 | Latitude of listing |
| longitude | float64 | Longitude of listing |
| room_type | object | Listing space type |
| price | int64 | Price in dollars |
| minimum_nights | int64 | Minimum number of nights |
| number_of_reviews | int64 | Number of reviews |
| last_review | object | Date of latest review |
| reviews_per_month | float64 | Number of reviews per month |
| calculated_host_listings_count | int64 | Number of listings per host |
| availability_365 | int64 | Number of days when listing is available for booking |


In [1]:
#importing the modules, libraries and packages
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
#importing csv file
df = pd.read_csv('AB_NYC_2019.csv')

In [4]:
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0
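The `last_review` column holds date strings and is empty for listings without reviews. If review recency becomes relevant, one option is to parse it as a datetime while loading. The sketch below is only an illustration: it reads three of the rows above from an in-memory string instead of the real file, and the names `csv_text` and `sample` are hypothetical.

```python
import io

import pandas as pd

# Three rows from the preview above, reduced to a few columns for brevity.
csv_text = """id,neighbourhood_group,price,last_review
2539,Brooklyn,149,2018-10-19
2595,Manhattan,225,2019-05-21
3647,Manhattan,150,
"""

# parse_dates converts the column to datetime64; empty cells become NaT.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["last_review"])

print(sample.dtypes)
print(sample["last_review"].max())
```

The same `parse_dates=["last_review"]` argument could be passed to the `pd.read_csv('AB_NYC_2019.csv')` call used in this notebook.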


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     

In [8]:
df.describe()

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,38843.0,48895.0,48895.0
mean,19017140.0,67620010.0,40.728949,-73.95217,152.720687,7.029962,23.274466,1.373221,7.143982,112.781327
std,10983110.0,78610970.0,0.05453,0.046157,240.15417,20.51055,44.550582,1.680442,32.952519,131.622289
min,2539.0,2438.0,40.49979,-74.24442,0.0,1.0,0.0,0.01,1.0,0.0
25%,9471945.0,7822033.0,40.6901,-73.98307,69.0,1.0,1.0,0.19,1.0,0.0
50%,19677280.0,30793820.0,40.72307,-73.95568,106.0,3.0,5.0,0.72,1.0,45.0
75%,29152180.0,107434400.0,40.763115,-73.936275,175.0,5.0,24.0,2.02,2.0,227.0
max,36487240.0,274321300.0,40.91306,-73.71299,10000.0,1250.0,629.0,58.5,327.0,365.0
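
Two details in this summary may matter for the price analysis: `reviews_per_month` has a count of 38,843 against 48,895 rows, so about 10,000 values are missing, and the minimum `price` is 0. The sketch below shows one possible way to inspect and handle such rows. It is an assumption-laden illustration, not the project's chosen cleaning procedure: `sample` is a hypothetical stand-in frame, its last row is made up to demonstrate a zero price, and filling with 0 or dropping zero-price listings are choices that should be checked against the full data first.

```python
import numpy as np
import pandas as pd

# Stand-in frame: the first three rows come from the dataset preview; the last
# row (id -1) is hypothetical and exists only to illustrate a zero price.
sample = pd.DataFrame({
    "id": [2539, 2595, 3647, -1],
    "price": [149, 225, 150, 0],
    "number_of_reviews": [9, 45, 0, 3],
    "reviews_per_month": [0.21, 0.38, np.nan, 0.05],
})

# Count missing values per column.
missing_counts = sample.isna().sum()

# Check whether every missing reviews_per_month belongs to an unreviewed listing.
is_missing = sample["reviews_per_month"].isna()
all_missing_unreviewed = sample.loc[is_missing, "number_of_reviews"].eq(0).all()

# One possible cleanup: treat "no reviews" as 0 per month, drop zero-price rows.
cleaned = sample.assign(reviews_per_month=sample["reviews_per_month"].fillna(0))
cleaned = cleaned[cleaned["price"] > 0]

print(missing_counts)
print(all_missing_unreviewed)
print(cleaned)
```

Running the same missing-value check on the full DataFrame would show whether the missing `reviews_per_month` values all correspond to listings with zero reviews, which is what the preview rows suggest.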
