![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. Many Airbnb listings in New York City serve the high demand for temporary lodging, with stays ranging from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
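
Because all three files share `listing_id`, they can be joined on that key once loaded. The sketch below is only an illustration under stated assumptions: it uses two rows of this dataset as small inline stand-ins for the real files, and the names `csv_text`, `tsv_text`, `prices`, `rooms`, `reviews`, and `listings` are hypothetical.

```python
import io

import pandas as pd

# Inline stand-ins for the real files (hypothetical samples, two rows each)
csv_text = '''listing_id,price,nbhood_full
2595,225 dollars,"Manhattan, Midtown"
3831,89 dollars,"Brooklyn, Clinton Hill"
'''
tsv_text = (
    "listing_id\thost_name\tlast_review\n"
    "2595\tJennifer\t2019-05-21\n"
    "3831\tLisaRoxanne\t2019-07-05\n"
)

# CSV: comma-separated, so the defaults of read_csv apply
prices = pd.read_csv(io.StringIO(csv_text))

# TSV: same reader, but with a tab separator; parse the date column
reviews = pd.read_csv(io.StringIO(tsv_text), sep="\t", parse_dates=["last_review"])

# Excel: built by hand here so the sketch runs without a workbook on disk
rooms = pd.DataFrame(
    {
        "listing_id": [2595, 3831],
        "description": ["Skylit Midtown Castle", "Cozy Entire Floor of Brownstone"],
        "room_type": ["Entire home/apt", "Entire home/apt"],
    }
)

# Inner joins on the shared key give one row per listing
listings = prices.merge(rooms, on="listing_id").merge(reviews, on="listing_id")
print(listings.shape)
```

With the real files, the `io.StringIO(...)` arguments would be replaced by the file paths listed above, and `rooms` would come from `pd.read_excel('data/airbnb_room_type.xlsx')`.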

In [71]:
# Import necessary packages
import pandas as pd
import numpy as np

# Load the CSV file of listing prices and locations
air_bnb_price_data = pd.read_csv('data/airbnb_price.csv')
print(air_bnb_price_data.head(5))

# Open the Excel workbook, list its sheets, and parse the room-type sheet
air_bnb_room_workbook = pd.ExcelFile('data/airbnb_room_type.xlsx')
air_bnb_room_sheetnames = air_bnb_room_workbook.sheet_names
print(air_bnb_room_sheetnames)
air_bnb_room_data = air_bnb_room_workbook.parse('airbnb_room_type')
print(air_bnb_room_data.head(5))

# Load the TSV file of host names and review dates, parsing last_review as a datetime
air_bnb_review_data = pd.read_csv('data/airbnb_last_review.tsv', sep='\t', parse_dates=['last_review'])
print(air_bnb_review_data.head(5))

print('\n What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names. \n')
# Sort by review date in each direction and take the first value
earliest_dates = air_bnb_review_data.sort_values(by='last_review', ascending=True)['last_review'].values[0]
print(earliest_dates)
most_recent_dates = air_bnb_review_data.sort_values(by='last_review', ascending=False)['last_review'].values[0]
print(most_recent_dates)



   listing_id        price                nbhood_full
0        2595  225 dollars         Manhattan, Midtown
1        3831   89 dollars     Brooklyn, Clinton Hill
2        5099  200 dollars     Manhattan, Murray Hill
3        5178   79 dollars  Manhattan, Hell's Kitchen
4        5238  150 dollars       Manhattan, Chinatown
['airbnb_room_type']
   listing_id                                description        room_type
0        2595                      Skylit Midtown Castle  Entire home/apt
1        3831            Cozy Entire Floor of Brownstone  Entire home/apt
2        5099  Large Cozy 1 BR Apartment In Midtown East  Entire home/apt
3        5178            Large Furnished Room Near B'way     private room
4        5238         Cute & Cozy Lower East Side 1 bdrm  Entire home/apt
   listing_id    host_name last_review
0        2595     Jennifer  2019-05-21
1        3831  LisaRoxanne  2019-07-05
2        5099        Chris  2019-06-22
3        5178     Shunichi  2019-06-24
4        5238   
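
Sorting the whole table just to read one value works, but once `last_review` is a datetime column the same answers can be read directly with `min()` and `max()`, which skip missing dates by default. The sketch below is an alternative and not the approach used in the cell above. It assumes a four-row sample taken from the preview, and the names `reviews`, `first_reviewed`, and `last_reviewed` are hypothetical.

```python
import pandas as pd

# Small sample standing in for the review table (hypothetical subset of the data)
reviews = pd.DataFrame(
    {
        "listing_id": [2595, 3831, 5099, 5178],
        "last_review": pd.to_datetime(
            ["2019-05-21", "2019-07-05", "2019-06-22", "2019-06-24"]
        ),
    }
)

# min() and max() on a datetime column return pandas Timestamp objects
first_reviewed = reviews["last_review"].min()
last_reviewed = reviews["last_review"].max()

print(first_reviewed.date(), last_reviewed.date())
```

These dates describe the sample only. On the full file the result is a `Timestamp`, not the `numpy.datetime64` value that `.values[0]` yields.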

In [72]:


print('\n How many of the listings are private rooms? Save this into any variable.\n')

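# Inspect the raw labels: each category appears in several capitalizations
print(air_bnb_room_data['room_type'].unique())

# Normalize to lowercase so each category has a single label
air_bnb_room_data['room_type'] = air_bnb_room_data['room_type'].str.lower()

# Confirm the labels collapsed to three categories and that none are missing
print(air_bnb_room_data['room_type'].unique())
print(air_bnb_room_data['room_type'].isna().sum())

# Keep only the private rooms and count the rows
private_rooms_data = air_bnb_room_data.loc[air_bnb_room_data['room_type'] == 'private room']
print(private_rooms_data.shape)
total_private_rooms = private_rooms_data.shape[0]
print(total_private_rooms)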


 How many of the listings are private rooms? Save this into any variable.

['Entire home/apt' 'private room' 'Private room' 'entire home/apt'
 'PRIVATE ROOM' 'shared room' 'ENTIRE HOME/APT' 'Shared room'
 'SHARED ROOM']
['entire home/apt' 'private room' 'shared room']
0
(11356, 3)
11356
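
The same normalization can feed `value_counts()`, which reports every room type at once instead of filtering for a single label. This is a minimal sketch and not the method used above. It assumes a six-value sample built from the label variants printed in the output, and the names `room_type`, `counts`, and `nb_private` are hypothetical.

```python
import pandas as pd

# Sample labels with the inconsistent capitalization seen in the raw data
room_type = pd.Series(
    [
        "Entire home/apt",
        "private room",
        "Private room",
        "PRIVATE ROOM",
        "shared room",
        "Shared room",
    ]
)

# Strip stray whitespace and lowercase before counting
counts = room_type.str.strip().str.lower().value_counts()

# .get() returns 0 when the label is absent instead of raising a KeyError
nb_private = int(counts.get("private room", 0))

print(counts)
print(nb_private)
```

The extra `str.strip()` is a defensive step. The output above shows no labels with stray spaces, so it may be unnecessary for this file.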


In [73]:
print('What is the average listing price? Round to the nearest two decimal places and save into a variable.\n')
air_bnb_price_data = pd.read_csv('data/airbnb_price.csv')
print(air_bnb_price_data.head(5))
# Check for missing prices, then strip the 'dollars' suffix and surrounding whitespace
print(air_bnb_price_data['price'].isna().sum())
air_bnb_price_data['price'] = air_bnb_price_data['price'].str.replace('dollars', '')
air_bnb_price_data['price'] = air_bnb_price_data['price'].str.strip()
print(air_bnb_price_data.head(5))
# Convert the cleaned strings to floats so they can be averaged
air_bnb_price_data['price'] = air_bnb_price_data['price'].astype('float')

# Mean nightly price, rounded to two decimal places
avg_list_price = air_bnb_price_data['price'].mean().round(2)

# Collect the answers in a one-row DataFrame; earliest_dates, most_recent_dates,
# and total_private_rooms come from the earlier cells
d = {
    'first_reviewed': earliest_dates,
    'last_reviewed': most_recent_dates,
    'nb_private_rooms': total_private_rooms,
    'avg_price': avg_list_price
}
review_dates = pd.DataFrame(data=d,index=[0])
print(review_dates)

What is the average listing price? Round to the nearest two decimal places and save into a variable.

   listing_id        price                nbhood_full
0        2595  225 dollars         Manhattan, Midtown
1        3831   89 dollars     Brooklyn, Clinton Hill
2        5099  200 dollars     Manhattan, Murray Hill
3        5178   79 dollars  Manhattan, Hell's Kitchen
4        5238  150 dollars       Manhattan, Chinatown
0
   listing_id price                nbhood_full
0        2595   225         Manhattan, Midtown
1        3831    89     Brooklyn, Clinton Hill
2        5099   200     Manhattan, Murray Hill
3        5178    79  Manhattan, Hell's Kitchen
4        5238   150       Manhattan, Chinatown
  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09             11356     141.78
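
Removing the literal word `dollars` works for this file, but it depends on every value using exactly that suffix. A more defensive variant, sketched below, extracts the numeric part with a regular expression and converts it with `pd.to_numeric`. It is an alternative, not the approach used above. The sample holds the five prices from the preview, and the names `price`, `amount`, and `avg_price` are hypothetical.

```python
import pandas as pd

# The five prices shown in the preview, in their raw string form
price = pd.Series(
    ["225 dollars", "89 dollars", "200 dollars", "79 dollars", "150 dollars"]
)

# Capture the first integer or decimal number in each string;
# values without a number become NaN instead of raising an error
amount = pd.to_numeric(
    price.str.extract(r"(\d+(?:\.\d+)?)", expand=False),
    errors="coerce",
)

# Mean of the sample, rounded to two decimal places
avg_price = round(float(amount.mean()), 2)
print(avg_price)
```

The printed value is the mean of these five rows only, so it differs from the full-dataset average reported above.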
