<img src="nyc.jpg" width="98%" />
Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like `.csv`, `.tsv`, and `.xlsx`.

Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:

- **data/airbnb_price.csv** This is a CSV file containing data on Airbnb listing prices and locations.  
  - `listing_id` : unique identifier of listing  
  - `price` : nightly listing price in USD  
  - `nbhood_full` : name of borough and neighborhood where listing is located  

- **data/airbnb_room_type.xlsx** This is an Excel file containing data on Airbnb listing descriptions and room types.  
  - `listing_id` : unique identifier of listing  
  - `description` : listing description  
  - `room_type` : Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments  

- **data/airbnb_last_review.tsv** This is a TSV file containing data on Airbnb host names and review dates.  
  - `listing_id` : unique identifier of listing  
  - `host_name` : name of listing host  
  - `last_review` : date when the listing was last reviewed  


##### Import CSV for prices

In [12]:
import pandas as pd
airbnb_price = pd.read_csv('data/airbnb_price.csv')
print(airbnb_price.shape)
airbnb_price.head()

(25209, 3)


Unnamed: 0,listing_id,price,nbhood_full
0,2595,225 dollars,"Manhattan, Midtown"
1,3831,89 dollars,"Brooklyn, Clinton Hill"
2,5099,200 dollars,"Manhattan, Murray Hill"
3,5178,79 dollars,"Manhattan, Hell's Kitchen"
4,5238,150 dollars,"Manhattan, Chinatown"


#### Import Excel file for room types

In [13]:

airbnb_room_types = pd.read_excel('data/airbnb_room_type.xlsx')
print(airbnb_room_types.shape)
airbnb_room_types.head()

(25209, 3)


Unnamed: 0,listing_id,description,room_type
0,2595,Skylit Midtown Castle,Entire home/apt
1,3831,Cozy Entire Floor of Brownstone,Entire home/apt
2,5099,Large Cozy 1 BR Apartment In Midtown East,Entire home/apt
3,5178,Large Furnished Room Near B'way,private room
4,5238,Cute & Cozy Lower East Side 1 bdrm,Entire home/apt


#### Import TSV for review dates

In [21]:
airbnb_last_review = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')
print(airbnb_last_review.shape)
airbnb_last_review.head()

(25209, 3)


Unnamed: 0,listing_id,host_name,last_review
0,2595,Jennifer,May 21 2019
1,3831,LisaRoxanne,July 05 2019
2,5099,Chris,June 22 2019
3,5178,Shunichi,June 24 2019
4,5238,Ben,June 09 2019


#### Join the three data frames together into one

In [34]:
listings = pd.merge(airbnb_price, airbnb_room_types, on='listing_id')
listings = pd.merge(listings, airbnb_last_review, on='listing_id')
listings.head()

Unnamed: 0,listing_id,price,nbhood_full,description,room_type,host_name,last_review
0,2595,225 dollars,"Manhattan, Midtown",Skylit Midtown Castle,Entire home/apt,Jennifer,May 21 2019
1,3831,89 dollars,"Brooklyn, Clinton Hill",Cozy Entire Floor of Brownstone,Entire home/apt,LisaRoxanne,July 05 2019
2,5099,200 dollars,"Manhattan, Murray Hill",Large Cozy 1 BR Apartment In Midtown East,Entire home/apt,Chris,June 22 2019
3,5178,79 dollars,"Manhattan, Hell's Kitchen",Large Furnished Room Near B'way,private room,Shunichi,June 24 2019
4,5238,150 dollars,"Manhattan, Chinatown",Cute & Cozy Lower East Side 1 bdrm,Entire home/apt,Ben,June 09 2019


#### What are the dates of the earliest and most recent reviews? To use a function like max()/min() on last_review date column, it needs to be converted to datetime type

In [31]:
listings['last_review_date'] = pd.to_datetime(listings['last_review'], format='%B %d %Y')
print(listings['last_review_date'])
first_reviewed = listings['last_review_date'].min()
last_reviewed = listings['last_review_date'].max()
first_reviewed


0       2019-05-21
1       2019-07-05
2       2019-06-22
3       2019-06-24
4       2019-06-09
           ...    
25204   2019-07-07
25205   2019-07-07
25206   2019-07-07
25207   2019-07-07
25208   2019-07-08
Name: last_review_date, Length: 25209, dtype: datetime64[ns]


Timestamp('2019-01-01 00:00:00')

#### How many of the listings are private rooms? Since there are differences in capitalization, make capitalization consistent

In [33]:
listings['room_type'] = listings['room_type'].str.lower()
private_room_count = listings[listings['room_type'] == 'private_room'].shape[0]

##### What is the average listing price? To convert price to numeric, remove " dollars" from each value

In [39]:
listings['price_clean'] = listings['price'].str.replace(' dollars', '').astype(float)
avg_price = listings['price_clean'].mean()
avg_price

np.float64(141.7779364512674)

In [40]:
review_dates = pd.DataFrame({
    'first_reviewed': [first_reviewed],
    'last_reviewed': [last_reviewed],
    'nb_private_rooms': [private_room_count],
    'avg_price': [round(avg_price, 2)]
})

print(review_dates)

  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09                 0     141.78
