![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging from travelers, whose stays can range from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
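All three files share the `listing_id` key, so they can be joined into a single table. The tasks below answer each question from the individual tables, but the sketch here shows one way to combine them with `DataFrame.merge`. It is a minimal sketch: the tiny in-memory frames (`price_sample`, `room_sample`, `review_sample`, hypothetical names) reuse the first two rows displayed later in this notebook and stand in for the real files.

```python
import pandas as pd

# Hypothetical stand-ins for the three files (first two rows of each, as previewed below)
price_sample = pd.DataFrame({
    "listing_id": [2595, 3831],
    "price": ["225 dollars", "89 dollars"],
    "nbhood_full": ["Manhattan, Midtown", "Brooklyn, Clinton Hill"],
})
room_sample = pd.DataFrame({
    "listing_id": [2595, 3831],
    "description": ["Skylit Midtown Castle", "Cozy Entire Floor of Brownstone"],
    "room_type": ["Entire home/apt", "Entire home/apt"],
})
review_sample = pd.DataFrame({
    "listing_id": [2595, 3831],
    "host_name": ["Jennifer", "LisaRoxanne"],
    "last_review": ["May 21 2019", "July 05 2019"],
})

# Inner joins on the shared key keep only listings that appear in all three files
listings = (
    price_sample.merge(room_sample, on="listing_id", how="inner")
                .merge(review_sample, on="listing_id", how="inner")
)
print(listings.columns.tolist())
```

An inner join silently drops listings missing from any one file; `how="outer"` with `indicator=True` is an option if you need to see which listings fail to match.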

### Analysis

#### Reading in data and preparation for the tasks

In [7]:
# Import necessary packages
import pandas as pd

In [3]:
# read in the price dataset
price = pd.read_csv("data/airbnb_price.csv")
price.head()

|   | listing_id | price | nbhood_full |
|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown |


In [4]:
# read in the room type dataset
room_type = pd.read_excel("data/airbnb_room_type.xlsx")
room_type.head()

|   | listing_id | description | room_type |
|---|---|---|---|
| 0 | 2595 | Skylit Midtown Castle | Entire home/apt |
| 1 | 3831 | Cozy Entire Floor of Brownstone | Entire home/apt |
| 2 | 5099 | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt |
| 3 | 5178 | Large Furnished Room Near B'way | private room |
| 4 | 5238 | Cute & Cozy Lower East Side 1 bdrm | Entire home/apt |


In [5]:
# read in the reviews dataset
review = pd.read_table("data/airbnb_last_review.tsv")
review.head()

|   | listing_id | host_name | last_review |
|---|---|---|---|
| 0 | 2595 | Jennifer | May 21 2019 |
| 1 | 3831 | LisaRoxanne | July 05 2019 |
| 2 | 5099 | Chris | June 22 2019 |
| 3 | 5178 | Shunichi | June 24 2019 |
| 4 | 5238 | Ben | June 09 2019 |
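`pd.read_table` is a convenience wrapper whose default separator is a tab, so it reads the `.tsv` file the same way `pd.read_csv` does when given `sep="\t"`. A minimal sketch, using a two-row in-memory sample (hypothetical, mirroring the preview above) in place of `data/airbnb_last_review.tsv`:

```python
import io

import pandas as pd

# Two-row tab-separated sample with the same columns as the reviews file
tsv_text = (
    "listing_id\thost_name\tlast_review\n"
    "2595\tJennifer\tMay 21 2019\n"
    "3831\tLisaRoxanne\tJuly 05 2019\n"
)

# read_table defaults to sep="\t"; read_csv needs the separator spelled out
via_table = pd.read_table(io.StringIO(tsv_text))
via_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(via_table.equals(via_csv))  # True
```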


In [6]:
# parse the last_review strings into dates so they can be compared chronologically
review['dates'] = pd.to_datetime(review['last_review']).dt.date
review.head()

|   | listing_id | host_name | last_review | dates |
|---|---|---|---|---|
| 0 | 2595 | Jennifer | May 21 2019 | 2019-05-21 |
| 1 | 3831 | LisaRoxanne | July 05 2019 | 2019-07-05 |
| 2 | 5099 | Chris | June 22 2019 | 2019-06-22 |
| 3 | 5178 | Shunichi | June 24 2019 | 2019-06-24 |
| 4 | 5238 | Ben | June 09 2019 | 2019-06-09 |
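The cell above lets pandas infer the date format. Passing an explicit `format` string makes the parse stricter, and `errors="coerce"` turns any value that does not match into `NaT` instead of raising. A minimal sketch, assuming every `last_review` value follows the `Month DD YYYY` pattern seen in the preview (not verified for the full file); the `"not a date"` entry is a hypothetical malformed value:

```python
import pandas as pd

raw = pd.Series(["May 21 2019", "July 05 2019", "June 09 2019"])

# %B = full month name, %d = zero-padded day, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%B %d %Y")
print(parsed.dt.date.tolist())

# Values that do not match the format become NaT rather than raising an error
checked = pd.to_datetime(pd.Series(["May 21 2019", "not a date"]),
                         format="%B %d %Y", errors="coerce")
print(checked.isna().tolist())  # [False, True]
```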


### Task 1 

Airbnb hosts a large and constantly changing set of listings in New York City, which makes rental activity hard to track. A useful first step is to establish the time span that the review data covers, which gives us our first task:

**What are the dates of the earliest and most recent reviews?**

In [20]:
# work with the parsed dates column on its own
date_df = review[['dates']]

# earliest review: first value after sorting in ascending order
earliest = date_df.sort_values(by=['dates'], ascending=True)
earliest = earliest['dates'].iloc[0]

# most recent review: first value after sorting in descending order
recent = date_df.sort_values(by=['dates'], ascending=False)
recent = recent['dates'].iloc[0]

In [22]:
# final output
print("After we manipulated the data, we were able to identify the most recent and earliest dates in which reviews were left.")
print(f"From the data, we found the earliest date was: {earliest}.")
print(f"Additionally, we found the most recent review to be: {recent}.")

After we manipulated the data, we were able to identify the most recent and earliest dates in which reviews were left.
From the data, we found the earliest date was: 2019-01-01.
Additionally, we found the most recent review to be: 2019-07-09.
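Converting `last_review` to dates in the preparation step is what makes this task work. Comparing the raw strings would order them alphabetically by month name, so "July" sorts before "May" and the earliest/latest answers come out wrong. A minimal sketch using the five values from the preview in the preparation step:

```python
import pandas as pd

# First five last_review strings from the preview
raw = pd.Series(["May 21 2019", "July 05 2019", "June 22 2019",
                 "June 24 2019", "June 09 2019"])

# Text comparison: month names sort alphabetically, not chronologically
print(raw.min(), "|", raw.max())  # July 05 2019 | May 21 2019

# Chronological comparison after parsing
parsed = pd.to_datetime(raw, format="%B %d %Y")
print(parsed.min().date(), "|", parsed.max().date())  # 2019-05-21 | 2019-07-05
```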


### Task 2

In the previous task, we identified the earliest and most recent reviews in our dataset. Next, we turn to room types. Airbnb offers three types of rental space: shared rooms, private rooms, and entire homes/apartments, which differ in the level of privacy they offer. Here we focus on one of these types and count how many listings in the dataset are private rooms.

**How many of the listings are private rooms?**

In [23]:
# check the distinct values in the room_type column
room_type['room_type'].unique()

array(['Entire home/apt', 'private room', 'Private room',
       'entire home/apt', 'PRIVATE ROOM', 'shared room',
       'ENTIRE HOME/APT', 'Shared room', 'SHARED ROOM'], dtype=object)

In [24]:
# the same room type appears with different capitalization, so convert to lower case to make the values consistent
# then keep only private rooms and count the rows
private = room_type[room_type['room_type'].str.lower() =='private room']
private_no = private.shape[0]

In [25]:
# final output
print(f"The number of private rooms in the dataset is: {private_no}")

The number of private rooms in the dataset is: 11356
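Once the values are lowercased, every room type can be counted at once with `value_counts`, rather than filtering for one type at a time. A minimal sketch on a small hypothetical sample; the extra `str.strip()` guards against stray whitespace, which was not observed in the `unique()` output above:

```python
import pandas as pd

# Hypothetical sample with the inconsistent capitalization seen in room_type
rooms = pd.Series(["Private room", "private room", "PRIVATE ROOM",
                   "Entire home/apt", "ENTIRE HOME/APT", "shared room"])

# Normalize whitespace and case, then count each category
counts = rooms.str.strip().str.lower().value_counts()
print(counts["private room"])  # 3
```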


### Task 3

In the two previous tasks, we identified the earliest and most recent review dates and counted the private rooms. Now we need a monetary metric. Room types and review dates describe the listings, but rental price is one of the most important metrics to measure: it tells us how much guests pay, how much hosts earn from their rentals, and, ultimately, how much revenue Airbnb can generate.

**What is the average listing price?**

In [26]:
# let's check the price column again
price['price'].head()

0    225 dollars
1     89 dollars
2    200 dollars
3     79 dollars
4    150 dollars
Name: price, dtype: object

In [27]:
# price is not numeric yet: each cell also contains the string "dollars"
# split the price column at the first space
price[['dols', 'scrap']] = price['price'].str.split(' ', n=1, expand=True)
# convert the amount to an integer
price['dols'] = price['dols'].astype(int)

# take the average, rounded to the nearest cent
avg_price = price['dols'].mean().round(2)

In [32]:
# final output
print(f"The average listed price of a one night's stay is ${avg_price}.")

The average listed price of a one night's stay is $141.78.
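The split-and-cast approach above raises a `ValueError` if any row deviates from the `<number> dollars` pattern. A more defensive sketch, assuming the suffix is always the literal string ` dollars`, removes it and uses `pd.to_numeric` with `errors="coerce"`, so malformed entries become `NaN` and are skipped by `mean()`. The sample values are the first five prices from the preview, so the mean differs from the full-dataset figure:

```python
import pandas as pd

prices = pd.Series(["225 dollars", "89 dollars", "200 dollars",
                    "79 dollars", "150 dollars"])

# Strip the unit suffix, then convert; unparseable entries would become NaN
amounts = pd.to_numeric(prices.str.replace(" dollars", "", regex=False),
                        errors="coerce")

print(amounts.mean())        # 148.6
print(amounts.isna().sum())  # 0
```

Checking `amounts.isna().sum()` afterwards shows how many rows, if any, were dropped from the average.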


### Task 4

With the previous tasks complete, we have several key figures on Airbnb in New York City. To make them easier to read and compare, we want to collect them in one place.

**Combine the new variables into one DataFrame called review_dates with four columns in the following order: first_reviewed, last_reviewed, nb_private_rooms, and avg_price.**

In [34]:
# create a dictionary of the task results
# (named summary so the review DataFrame loaded earlier is not overwritten)
summary = {'first_reviewed': [earliest],
           'last_reviewed': [recent],
           'nb_private_rooms': [private_no],
           'avg_price': [avg_price]}

# convert the dictionary into a new DataFrame
review_dates = pd.DataFrame(summary)

In [38]:
# final output
print(f"The final collection of our key values is:\n{review_dates}")

The final collection of our key values is:
  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09             11356     141.78
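The task requires a specific column order. A dict passed to `pd.DataFrame` keeps its insertion order, so the cell above works because the keys were written in that order. A sketch of a more explicit alternative passes `columns=`, so the order holds no matter how the dict was built. The values are copied from the output above, with the dates kept as strings for brevity, and `results`/`summary_df` are hypothetical names:

```python
import pandas as pd

# Results in a deliberately different key order
results = {"avg_price": 141.78, "nb_private_rooms": 11356,
           "first_reviewed": "2019-01-01", "last_reviewed": "2019-07-09"}

# A list with one dict gives one row; columns= fixes the output order
summary_df = pd.DataFrame([results], columns=["first_reviewed", "last_reviewed",
                                              "nb_private_rooms", "avg_price"])
print(summary_df.columns.tolist())
```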


### Review of all our tasks

In [43]:
print("Task 1")
print(f"The most recent date for reviews on Airbnb is: {recent}")
print(f"The earliest date for reviews on Airbnb is: {earliest}")
print("\nTask 2")
print(f"The number of private rooms in the dataset is: {private_no}")
print("\nTask 3")
print(f"The average list price for a single-night stay is: ${avg_price}")

Task 1
The most recent date for reviews on Airbnb is: 2019-07-09
The earliest date for reviews on Airbnb is: 2019-01-01

Task 2
The number of private rooms in the dataset is: 11356

Task 3
The average list price for a single-night stay is: $141.78
