# Airbnb Listing Analysis

## Data Prep & QA
* Read in the Airbnb listings data (use low_memory=False and encoding="ISO-8859-1" in read_csv).
* Cast any date columns to a datetime format.
* Filter the data down to just the listings in the city of Paris.
* QA the Paris listings data: check for missing values, and calculate the minimum, maximum, and average for each numeric field.
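
Because the real Listings.csv may not be available locally, the QA steps above can be tried on a small stand-in first. The following is a minimal sketch only: the `toy` DataFrame and its values are made up for illustration, and just the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for Listings.csv (all values are made up).
toy = pd.DataFrame({
    "host_since": ["2015-03-01", None, "2016-07-15", "2018-01-20"],
    "city": ["Paris", "Paris", "Rome", "Paris"],
    "accommodates": [2, 4, 3, 0],
    "price": [80, 120, 95, 0],
})

# Cast the date column to datetime; missing entries become NaT.
toy["host_since"] = pd.to_datetime(toy["host_since"])

# Keep only the Paris rows.
toy_paris = toy.query("city == 'Paris'")

# Count missing values per column.
missing = toy_paris.isna().sum()

# Minimum, maximum, and mean of each numeric column.
numeric_summary = toy_paris.select_dtypes("number").agg(["min", "max", "mean"])

print(missing)
print(numeric_summary)
```

A minimum of 0 in accommodates or price is the kind of result worth a follow-up check, which is what the query cells further down do on the real data.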

In [1]:
import pandas as pd

listings = pd.read_csv(
    "Listings.csv", 
    encoding="ISO-8859-1", 
    low_memory=False,
    parse_dates=["host_since"]
)

FileNotFoundError: [Errno 2] No such file or directory: 'Listings.csv'

In [None]:
listings.head()

In [None]:
listings.info()

In [None]:
paris_listings = (
    listings
    .query("city == 'Paris'")
    .loc[:, ["host_since", "neighbourhood", "city", "accommodates", "price"]]
)
         
paris_listings.info()

In [None]:
paris_listings.isna().sum()

In [None]:
paris_listings.describe()

In [None]:
paris_listings.query("accommodates == 0").count()

In [None]:
paris_listings.query("price == 0").count()

In [None]:
paris_listings.query("price == 0 and accommodates == 0").count()

# Prepare for Visualization

* Create a DataFrame called paris_listings_neighbourhood. Group the Paris listings by neighbourhood and calculate the average price for each. Sort by price in ascending order.
* Create a DataFrame called paris_listings_accommodates. Filter your data down to the most expensive neighbourhood in Paris. Group it by accommodates and calculate the average price for each. Sort by price in ascending order.
* Finally, create a DataFrame called paris_listings_over_time. Group the data by the year component of host_since. Calculate a count of rows to get the number of new hosts for each year, and the average price of listings for each year.
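
The last step can also be read literally as a groupby on the year extracted from host_since. The sketch below shows that reading on made-up data; the `toy` DataFrame and the output column names `new_hosts` and `avg_price` are assumptions for illustration, not part of the original exercise.

```python
import pandas as pd

# Hypothetical toy data (all values are made up).
toy = pd.DataFrame({
    "host_since": pd.to_datetime(
        ["2014-05-01", "2014-11-12", "2015-02-03", "2015-08-19", "2015-12-30"]
    ),
    "neighbourhood": ["A", "B", "A", "A", "B"],
    "price": [100, 200, 90, 110, 130],
})

# Group by the year component of host_since, then count rows and average price.
toy_over_time = (
    toy
    .groupby(toy["host_since"].dt.year)
    .agg(
        new_hosts=("neighbourhood", "count"),
        avg_price=("price", "mean"),
    )
)

print(toy_over_time)
```

One difference from a resample-based approach: grouping on `.dt.year` only returns years that actually occur in the data, while resampling on a datetime index also emits a row for each empty year in between.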

In [None]:
paris_listings_neighbourhood = (
    paris_listings
    .groupby("neighbourhood")
    .agg({"price": "mean"})
    .sort_values("price")
)

paris_listings_neighbourhood.head()

In [None]:
paris_listings_neighbourhood.tail()

In [None]:
paris_listings_accommodates = (
    paris_listings
    .query("neighbourhood == 'Elysee'")
    .groupby("accommodates")
    .agg({"price": "mean"})
    .sort_values("price")
)

paris_listings_accommodates.head()

In [None]:
paris_listings_accommodates.tail()

In [None]:
paris_listings_over_time = (
    paris_listings
    .set_index("host_since")
    .resample("Y")
    .agg({
        "neighbourhood": "count", 
        "price": "mean"
    })
)

paris_listings_over_time.head()

In [None]:
paris_listings_over_time.tail()

# Objective 3: Visualize the Data
* Build a horizontal bar chart of average rent price by neighbourhood. Which neighbourhoods stand out?
* Build a horizontal bar chart of average price by accommodates in the most expensive neighbourhood. Are the results intuitive?
* Finally, build line charts of new hosts per year and average price by year. What happened to new hosts in 2015? Was average price impacted?
* Plot both time series in a dual axis line chart!
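
For the dual axis chart, one detail that is easy to miss is the legend: each axis only knows about its own line, so a single legend needs the line handles from both. The following is a minimal sketch on made-up numbers (the `years`, `new_hosts`, and `avg_price` lists are hypothetical), using the non-interactive Agg backend so that it also runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Hypothetical yearly values (made up for illustration).
years = [2013, 2014, 2015, 2016]
new_hosts = [500, 900, 1200, 700]
avg_price = [110.0, 105.0, 100.0, 115.0]

fig, ax = plt.subplots()

# Left axis: new hosts per year.
(line_hosts,) = ax.plot(years, new_hosts, color="pink", label="New Hosts")
ax.set_ylabel("New Hosts")

# Right axis shares the x axis but has its own y scale.
ax2 = ax.twinx()
(line_price,) = ax2.plot(years, avg_price, label="Average Price")
ax2.set_ylabel("Average Price")
ax2.set_ylim(0)  # start the price axis at zero

# Build one legend from the handles of both axes.
lines = [line_hosts, line_price]
ax.legend(lines, [line.get_label() for line in lines], loc="lower center")
```

Collecting the handles by hand is one option among several; calling `fig.legend()` is another that picks up labelled lines from every axis in the figure.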

In [None]:
import seaborn as sns
(paris_listings_neighbourhood
    .plot
    .barh(
        title="Average Listing Price by Neighbourhood",
        xlabel="Price Per Night (Euros)",
        ylabel="Neighbourhood",
        legend=None
    )
         
)

sns.despine()

In [None]:
import seaborn as sns
(paris_listings_accommodates
    .plot
    .barh(
        title="Average Listing Price by Accommodation Number",
        xlabel="Price Per Night (Euros)",
        ylabel="Accommodation Capacity",
        legend=None
    )
         
)

sns.despine()

In [None]:
paris_listings_over_time["neighbourhood"].plot(
    ylabel="New Hosts",
    title="New Airbnb Hosts in Paris Over Time"
)

sns.despine()

In [None]:
paris_listings_over_time["price"].plot(
    ylabel="Average Price (Euros)",
    title="Average Airbnb Price in Paris Over Time"
)

sns.despine()

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.plot(
    paris_listings_over_time.index,
    paris_listings_over_time["neighbourhood"],
    label="New Hosts",
    c="pink"
)

ax.set_ylabel("New Hosts")

ax2 = ax.twinx()

ax2.plot(
    paris_listings_over_time.index,
    paris_listings_over_time["price"],
    label="Average Price"
)
ax2.set_ylim(0)
ax2.set_ylabel("Average Price")

ax.set_title("2015 Regulations Lead to Fewer New Hosts, Higher Prices")