# Exploratory Data Analysis (EDA)

You will learn how to investigate an unfamiliar dataset systematically while keeping a creative, open mind in the search for insights.

## Context
Airbnb is an online marketplace for people to rent places to stay. 

Airbnb has rolled out a new service to help listers set prices. Airbnb earns a percentage commission on listings, so it has an incentive to help listers price optimally: that is, at the highest price at which a deal will still close. You are an Airbnb consultant helping with this new pricing service.

## Goal

We will focus on one question: which features help determine an appropriate listing price?

## Load Data

In [2]:
import pandas as pd

In [3]:
calendar = pd.read_csv('data/calendar.csv', delimiter=',')

In [4]:
calendar

| Unnamed: 0 | listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights |
|---|---|---|---|---|---|---|---|
| 0 | 34374201 | 2022-06-19 | f | $130.00 | $130.00 | 3.0 | 1125.0 |
| 1 | 1098863 | 2022-04-07 | t | $88.00 | $88.00 | 4.0 | 45.0 |
| 2 | 43680849 | 2022-01-20 | t | $34.00 | $34.00 | 150.0 | 1125.0 |
| 3 | 28833297 | 2022-01-03 | t | $190.00 | $190.00 | 3.0 | 1125.0 |
| 4 | 1905865 | 2022-05-06 | t | $207.00 | $207.00 | 3.0 | 1125.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 24348078 | 2022-10-05 | t | $65.00 | $65.00 | 1.0 | 1125.0 |
| 99996 | 27579844 | 2022-10-11 | f | $339.00 | $339.00 | 30.0 | 1125.0 |
| 99997 | 6570718 | 2022-01-09 | f | $85.00 | $85.00 | 7.0 | 9.0 |
| 99998 | 29879085 | 2022-03-27 | f | $190.00 | $190.00 | 30.0 | 1125.0 |


## Activities

**Q**: What is the type of each element of `calendar['date']`?

In [6]:
type(calendar['date'][0])

str
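Checking one element of one column generalizes naturally to every column. The following is a minimal sketch, assuming a small hand-made `sample` frame (hypothetical, standing in for `calendar`) with a few of the same column names:

```python
import pandas as pd

# Hypothetical three-row sample that mimics some columns of calendar.csv.
sample = pd.DataFrame({
    "listing_id": [34374201, 1098863, 43680849],
    "date": ["2022-06-19", "2022-04-07", "2022-01-20"],
    "available": ["f", "t", "t"],
    "price": ["$130.00", "$88.00", "$34.00"],
    "minimum_nights": [3.0, 4.0, 150.0],
})

# Record the Python type name of the first element in every column.
element_types = {
    col: type(sample[col].iloc[0]).__name__ for col in sample.columns
}
print(element_types)

# The column-level view of the same information.
print(sample.dtypes)
```

Columns whose elements are `str` even though they look like dates or numbers (here `date` and `price`) are the ones that need conversion before analysis.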

**Q**: Convert `date` to a datetime type, and replace the existing column with the converted values.

In [8]:
calendar['date'] = pd.to_datetime(calendar['date'])

In [9]:
type(calendar['date'][0])

pandas._libs.tslibs.timestamps.Timestamp
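When a date column may contain malformed entries, it can help to state the expected format and count the values that fail to parse. This is only a sketch under assumptions: `raw_dates` is a hypothetical Series, and the `%Y-%m-%d` format is inferred from the sample rows shown above, not confirmed for the whole file.

```python
import pandas as pd

# Hypothetical input: two valid dates and one malformed entry.
raw_dates = pd.Series(["2022-06-19", "2022-04-07", "not a date"])

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Count the entries that could not be parsed.
n_bad = int(parsed.isna().sum())
print(f"{n_bad} value(s) could not be parsed")
```

A nonzero count is a cue to inspect those rows before replacing the original column.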

**Q**: Create additional columns in the `calendar` dataframe for the `year`, `month`, and `year_month` of each row's date. `year_month` keeps the year and month but not the day; here it is stored as the first day of that month.

In [15]:
calendar['year'] = calendar['date'].dt.year

In [16]:
calendar['month'] = calendar['date'].dt.month

In [18]:
calendar['year_month'] = pd.to_datetime({'year': calendar['year'], 'month': calendar['month'], 'day': 1})



In [19]:
print(calendar.head())

   listing_id       date available    price adjusted_price  minimum_nights  \
0    34374201 2022-06-19         f  $130.00        $130.00             3.0   
1     1098863 2022-04-07         t   $88.00         $88.00             4.0   
2    43680849 2022-01-20         t   $34.00         $34.00           150.0   
3    28833297 2022-01-03         t  $190.00        $190.00             3.0   
4     1905865 2022-05-06         t  $207.00        $207.00             3.0   

   maximum_nights  year  month year_month  
0          1125.0  2022      6 2022-06-01  
1            45.0  2022      4 2022-04-01  
2          1125.0  2022      1 2022-01-01  
3          1125.0  2022      1 2022-01-01  
4          1125.0  2022      5 2022-05-01  
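An alternative way to truncate dates to the month is the `dt.to_period("M")` accessor, which avoids building the date from separate `year` and `month` columns. A minimal sketch, assuming a hypothetical `dates` Series in place of `calendar['date']`:

```python
import pandas as pd

# Hypothetical dates standing in for calendar['date'].
dates = pd.Series(pd.to_datetime(["2022-06-19", "2022-04-07", "2022-01-20"]))

# Monthly periods drop the day component entirely.
year_month_period = dates.dt.to_period("M")

# Converting back gives the first day of each month as a timestamp,
# matching the representation used in the year_month column above.
year_month_start = year_month_period.dt.to_timestamp()

print(year_month_period)
print(year_month_start)
```

Either representation works as a grouping key; the timestamp form tends to be more convenient for plotting.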


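**Q**: Create an object called `Monthly_Price` in which each row is the median price for one `year_month`. (Grouping by `year_month` and taking the median of `price` returns a pandas Series indexed by `year_month`; call `.reset_index()` on it if a dataframe is needed.)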

Hint: Notice that the `price` column holds strings because of the dollar sign. We need to replace the column with a numerical (either integer or float) one. Use this code:
`calendar['price'].replace('[$,]', '', regex=True).astype(float)`

In [21]:
calendar['price'] = calendar['price'].replace('[$,]', '', regex=True).astype(float)
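To see what the regular expression `[$,]` does, it can be tried on a few values first. A small sketch, assuming hypothetical price strings (including one with a thousands separator, which the sample rows above do not show):

```python
import pandas as pd

# Hypothetical price strings; the second includes a thousands separator.
raw_prices = pd.Series(["$130.00", "$1,250.00", "$88.00"])

# Inside a character class, '$' and ',' are literal characters,
# so every dollar sign and comma is removed before the float cast.
clean_prices = raw_prices.replace("[$,]", "", regex=True).astype(float)

# The string accessor gives the same result.
clean_prices_str = raw_prices.str.replace(r"[$,]", "", regex=True).astype(float)

print(clean_prices.tolist())
```

Without removing the comma, the cast to `float` would fail on values such as `1,250.00`.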

In [23]:
Monthly_Price = calendar.groupby('year_month')['price'].median()

In [25]:
Monthly_Price.head()

year_month
2021-11-01    110.0
2021-12-01    115.0
2022-01-01    112.0
2022-02-01    118.0
2022-03-01    115.0
Name: price, dtype: float64
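A median on its own hides how many rows each month contributes and how skewed the prices are. The sketch below, which uses hypothetical prices rather than the real data, computes several statistics per month in one pass and returns a dataframe instead of a Series:

```python
import pandas as pd

# Hypothetical rows: three listings in January, two in February.
sample = pd.DataFrame({
    "year_month": pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-01-01", "2022-02-01", "2022-02-01"]
    ),
    "price": [100.0, 120.0, 500.0, 90.0, 110.0],
})

# Named aggregation: each keyword becomes an output column.
monthly_summary = (
    sample.groupby("year_month")["price"]
    .agg(median_price="median", mean_price="mean", n_rows="count")
    .reset_index()
)

print(monthly_summary)
```

In this toy data the January mean is pulled well above the median by a single expensive listing, which illustrates why the median is often the more robust summary of price.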

## References

"New York", Inside Airbnb, http://insideairbnb.com/get-the-data.html

