![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. Many Airbnb listings there meet the high demand for temporary lodging, with stays ranging from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: type of room; Airbnb has three: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
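
Because all three files share the `listing_id` column, they can be combined with a key-based merge once loaded. The following is only a minimal sketch of that idea, not necessarily the approach this project takes: the rows are made-up sample values held in memory, and the Excel table is represented by a hand-built DataFrame so that no file on disk is needed.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real files.
csv_text = "listing_id,price,nbhood_full\n1,100 dollars,Sample Area A\n2,80 dollars,Sample Area B\n"
tsv_text = "listing_id\thost_name\tlast_review\n1\tAlex\t2019-05-01\n2\tSam\t2019-06-15\n"

# CSV is comma-separated (the default); TSV needs an explicit tab separator.
prices = pd.read_csv(io.StringIO(csv_text))
reviews = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Stand-in for the DataFrame that pd.read_excel would return for the .xlsx file.
room_types = pd.DataFrame(
    {
        "listing_id": [1, 2],
        "description": ["Sample listing one", "Sample listing two"],
        "room_type": ["Private room", "Entire home/apt"],
    }
)

# Inner joins on the shared key keep only listings present in every table.
listings = prices.merge(room_types, on="listing_id").merge(reviews, on="listing_id")
print(listings.columns.tolist())
```

An inner join is used here as an assumption; if some listings are missing from one file, `how="left"` or `how="outer"` may be the better choice.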

In [256]:
# Import NumPy and pandas
import numpy as np
import pandas as pd

In [257]:
# Load listing prices and locations from the CSV file
airbnb_price = pd.read_csv("data/airbnb_price.csv")
airbnb_price.dtypes

listing_id      int64
price          object
nbhood_full    object
dtype: object

In [258]:
# Load listing descriptions and room types from the Excel file
airbnb_room_type = pd.read_excel("data/airbnb_room_type.xlsx")
airbnb_room_type.dtypes

listing_id      int64
description    object
room_type      object
dtype: object

In [259]:
# Load host names and last-review dates from the TSV file (tab-separated)
airbnb_last_review = pd.read_csv("data/airbnb_last_review.tsv", sep='\t')
airbnb_last_review.dtypes

listing_id      int64
host_name      object
last_review    object
dtype: object

In [260]:
# Converting the last_review column to datetime format
airbnb_last_review['last_review'] = pd.to_datetime(airbnb_last_review['last_review'])
airbnb_last_review['last_review'].value_counts()

2019-06-23    1413
2019-07-01    1359
2019-06-30    1341
2019-06-24     875
2019-07-07     718
              ... 
2019-02-04       8
2019-02-27       8
2019-02-05       7
2019-02-13       6
2019-07-09       1
Name: last_review, Length: 190, dtype: int64
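
The conversion above raises an error if any value cannot be parsed. As a hedged sketch of a more defensive variant, `pd.to_datetime` can be told to coerce bad values to `NaT` so they can be counted first. The sample values and the `%Y-%m-%d` format string are assumptions for illustration; the raw date format in the TSV file is not shown here.

```python
import pandas as pd

# Hypothetical sample values; the last entry is deliberately malformed.
raw = pd.Series(["2019-06-23", "2019-07-01", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Count how many values failed to parse before trusting min() and max().
n_bad = int(parsed.isna().sum())
print(n_bad)
```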

In [261]:
# Dates of the earliest and most recent reviews
first_reviewed = airbnb_last_review['last_review'].min()
last_reviewed = airbnb_last_review['last_review'].max()

print('Earliest Review :', first_reviewed)
print('Latest Review :', last_reviewed)

Earliest Review : 2019-01-01 00:00:00
Latest Review : 2019-07-09 00:00:00
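
The printed timestamps carry a midnight time component that adds nothing for date-only data. One possible way to show only the date, sketched with the two values rebuilt by hand so the snippet runs on its own:

```python
import pandas as pd

# Same values as printed above, rebuilt here so the example is self-contained.
first_reviewed = pd.Timestamp("2019-01-01")
last_reviewed = pd.Timestamp("2019-07-09")

# .date() drops the time-of-day part, leaving a plain datetime.date.
print("Earliest Review :", first_reviewed.date())
print("Latest Review :", last_reviewed.date())
```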


In [262]:
# Checking room_type values
airbnb_room_type['room_type'].value_counts()

Entire home/apt    8458
Private room       7241
entire home/apt    2665
private room       2248
ENTIRE HOME/APT    2143
PRIVATE ROOM       1867
Shared room         380
shared room         110
SHARED ROOM          97
Name: room_type, dtype: int64

In [263]:
# Normalize capitalization so each room type has a single spelling
# (str.capitalize() uppercases the first character and lowercases the rest)
airbnb_room_type['room_type'] = airbnb_room_type['room_type'].str.capitalize()
airbnb_room_type['room_type'].value_counts()

Entire home/apt    13266
Private room       11356
Shared room          587
Name: room_type, dtype: int64
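
The counts above show that case was the only inconsistency in this column. If a column also contained stray whitespace (an assumption, not something observed in this data), the same cleanup could be extended as in the sketch below, which also stores the result as a categorical column. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample with mixed case and stray whitespace.
room_type = pd.Series(["Private room", "PRIVATE ROOM ", " shared room", "Entire home/apt"])

# Strip whitespace, normalize case, then store as a categorical column.
cleaned = room_type.str.strip().str.capitalize().astype("category")

counts = cleaned.value_counts()
print(counts)
```

A categorical dtype is optional; it mainly saves memory and makes the small, fixed set of room types explicit.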

In [264]:
# Number of private room listings
nb_private_rooms = airbnb_room_type[airbnb_room_type['room_type'] == 'Private room'].shape[0]

print('Number of private rooms:', nb_private_rooms)

Number of private rooms: 11356


In [265]:
# Checking price values
airbnb_price['price'].value_counts()

150 dollars     982
100 dollars     891
60 dollars      717
50 dollars      709
75 dollars      691
               ... 
1250 dollars      1
555 dollars       1
689 dollars       1
394 dollars       1
323 dollars       1
Name: price, Length: 536, dtype: int64

In [266]:
# Strip the " dollars" suffix (literal match, not a regex), then convert to integer
airbnb_price['price'] = airbnb_price['price'].str.replace(' dollars', '', regex=False)
airbnb_price['price'] = airbnb_price['price'].astype(int)
airbnb_price['price'].value_counts()

150     982
100     891
60      717
50      709
75      691
       ... 
1250      1
555       1
689       1
394       1
323       1
Name: price, Length: 536, dtype: int64
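
The cleaning step above works because every price follows the "`<number> dollars`" pattern; `astype(int)` would raise on anything else. As a hedged alternative for messier data, the digits can be extracted with a regular expression and converted with `pd.to_numeric`. The sample values below, including the unparseable one, are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the last value has no digits and cannot be parsed.
price = pd.Series(["150 dollars", "60 dollars", "unknown"])

# Pull out the leading run of digits; rows without a match become missing.
digits = price.str.extract(r"^(\d+)", expand=False)

# to_numeric with errors="coerce" keeps unparseable rows as missing values
# instead of raising the way astype(int) would.
price_num = pd.to_numeric(digits, errors="coerce")
print(price_num.tolist())
```

Note that the result is a float column when any value is missing, since NaN cannot be stored in a plain integer column.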

In [267]:
# Checking airbnb_price dtype
airbnb_price.dtypes

listing_id      int64
price           int64
nbhood_full    object
dtype: object

In [268]:
# Average listing price, rounded to the nearest cent
avg_price = round(airbnb_price['price'].mean(), 2)

print('Average listing price :', avg_price, 'dollars')

Average listing price : 141.78 dollars


In [269]:
# Build the one-row review_dates DataFrame with four columns, in this order:
# first_reviewed, last_reviewed, nb_private_rooms, avg_price
review_dates = pd.DataFrame({
    'first_reviewed': [first_reviewed],
    'last_reviewed': [last_reviewed],
    'nb_private_rooms': [nb_private_rooms],
    'avg_price': [avg_price]
})

# Print the DataFrame
print(review_dates)

  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09             11356     141.78
