# Airbnb in New York City - an exploratory data analysis

In today's session we want to analyze the Airbnb offers in New York City, applying and extending what we have already learned about data science!

<img src="./_img/NYC_2.jpg"> 

For this tutorial the following commands will be very helpful:

| command | meaning |
| :-: | :-: |
| `pd.read_csv(<path>)` | reads a .csv file from your file system |
| `len()` | returns the length of an input object, for example the row count of a pandas dataframe |
| `df.columns` | returns the column names of a dataframe (an attribute, so no parentheses) |
| `df.head(<n>)` | returns the first n rows of a dataframe |
| `df.tail(<n>)` | returns the last n rows of a dataframe |
| `df.loc[]` | label-based locator to filter a dataframe, for example with a boolean condition |
| `df["<column_name>"]` | returns only the specified column of a dataframe |
| `df.sample(<n>)` | randomly picks n rows from a dataframe |
| `df.shape` | returns the dimensions of a dataframe as (rows, columns) (an attribute, so no parentheses) |
| `df["<column_name>"].unique()` | returns the unique values of a column |
| `df.groupby("<column_name>")` | groups the rows of a dataframe by the specified column |
| `df.groupby("<column_name>").size()` | returns the number of observations in each group |
| `df.groupby("<column_name>").size().sort_values()` | sorts the group counts by value |
| `df.groupby(["<col_a>", "<col_b>"]).size().xs("<value>")` | extracts the entries for one index value from a grouped result that has a multi-level index |
| `series.mean()` | calculates the mean of the values in a series |
| `df.describe()` | returns summary statistics for the numeric columns of a dataframe |
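Several of these commands are attributes rather than methods, which is an easy source of errors. The following is a minimal sketch on a hypothetical toy dataframe (the values are invented for illustration; only the column names mirror those used later in this tutorial):

```python
import pandas as pd

# Hypothetical toy data, not taken from the Airbnb dataset.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Queens", "Bronx", "Brooklyn"],
    "Prop_Type": ["Apartment", "House", "House", "Apartment"],
    "Price": [60, 120, 45, 95],
})

n_rows = len(toy)                      # row count
dims = toy.shape                       # attribute: (rows, columns)
col_names = list(toy.columns)          # attribute: column labels
boroughs = toy["Boroughs"].unique()    # unique values of a single column

print(n_rows, dims, col_names, boroughs)
```

Calling `toy.shape()` or `toy.columns()` with parentheses raises a `TypeError`, because neither is callable.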

## Loading packages and dataset

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
airbnb_NY = pd.read_csv("../data/Airbnb_NYC.csv")

## Get to know the dataset

> **Question 1A)** How many observations does the dataset contain?

In [None]:
airbnb_NY.shape

> **Question 1B)** What columns are included in the dataset?

In [None]:
airbnb_NY.columns

Okay, now let's get a quick overview:

> **Question 2A)** Show the first 10 observations of the dataset!

In [None]:
airbnb_NY.head(10)

> **Question 2B)** Show the last 15 observations of the dataset!

In [None]:
airbnb_NY.tail(15)

> **Question 2C)** Randomly choose 15 observations from the dataset and show them!

In [None]:
airbnb_NY.sample(15)

> **Question 2D)** Like C), but show only the borough, the property type and the price.

In [None]:
airbnb_NY.sample(15)[["Boroughs", "Prop_Type", "Price"]]

## The spatial context matters

Let's take a closer look at the dataset.

> **Question 3A)** Which boroughs of NYC are covered by the dataset?

In [None]:
airbnb_NY["Boroughs"].unique()

> **Question 3B)** Which different kinds of accommodation exist?

In [None]:
airbnb_NY["Prop_Type"].unique()

> **Question 3C)** How many Airbnb offers exist per borough?

In [None]:
airbnb_NY.groupby(["Boroughs"]).size()

> **Question 3D)** Order the output from C) by the count of offers!

In [None]:
airbnb_NY.groupby(["Boroughs"]).size().sort_values()

> **Question 3E)** Refine your search: how does the property type vary across the boroughs? Provide a list view!

In [None]:
airbnb_NY.groupby(["Prop_Type", "Boroughs"]).size().unstack()
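Grouping by two columns produces a series with a two-level index, and `unstack()` pivots the inner level into columns. A minimal sketch on hypothetical toy data (invented values) shows the shape of the result; `fill_value=0` is an optional addition here that replaces missing combinations with 0 instead of `NaN`:

```python
import pandas as pd

# Hypothetical toy data, not taken from the Airbnb dataset.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Queens", "Bronx", "Brooklyn", "Bronx"],
    "Prop_Type": ["Apartment", "House", "House", "Apartment", "Apartment"],
})

# Series indexed by (Prop_Type, Boroughs) holding the group sizes
counts = toy.groupby(["Prop_Type", "Boroughs"]).size()

# Pivot the inner index level (Boroughs) into columns
table = counts.unstack(fill_value=0)
print(table)
```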

> **Question 3F)** Provide a plot to visualize E).
(Hint: first try to provide a bar plot only for the Bronx, and afterwards for all boroughs at the same time.)

In [None]:
airbnb_NY.groupby(["Prop_Type", "Boroughs"]).size().unstack(level=0).plot.bar()
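For the first step of the hint, one possible approach is to extract a single borough from the grouped counts with `xs()`. This is a sketch on hypothetical toy data (invented values), not necessarily the intended solution:

```python
import pandas as pd

# Hypothetical toy data, not taken from the Airbnb dataset.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Queens", "Bronx", "Brooklyn", "Bronx"],
    "Prop_Type": ["Apartment", "House", "House", "Apartment", "Apartment"],
})

# Group sizes with a two-level index (Boroughs, Prop_Type)
counts = toy.groupby(["Boroughs", "Prop_Type"]).size()

# Cross-section: keep only the Bronx entries, indexed by property type
bronx = counts.xs("Bronx", level="Boroughs")
print(bronx)

# bronx.plot.bar() would then draw the bar plot for this borough only.
```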

## What about the prices?

> **Question 4A)** Which is the most expensive Airbnb?

In [None]:
airbnb_NY.loc[airbnb_NY["Price"] == airbnb_NY["Price"].max()]
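The boolean filter above returns every row that shares the maximum price. An alternative is `idxmax()`, which returns the index label of the first maximum only. A minimal sketch on hypothetical toy data (invented values) illustrates the difference when there is a tie:

```python
import pandas as pd

# Hypothetical toy data with a tie for the highest price.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Queens", "Brooklyn", "Queens"],
    "Price": [60, 120, 45, 120],
})

# Boolean mask: all rows whose price equals the maximum
all_max = toy.loc[toy["Price"] == toy["Price"].max()]

# idxmax: index label of the first row with the maximum price
first_max = toy.loc[toy["Price"].idxmax()]

print(len(all_max), first_max["Boroughs"])
```

The mask is the safer choice when ties should not be silently dropped.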

> **Question 4B)** How many Airbnbs cost less than $50?

In [None]:
(airbnb_NY["Price"] < 50).sum()

> **Question 4C)** Refining B): in which borough are most of them located?

In [None]:
airbnb_NY.loc[airbnb_NY["Price"] < 50].groupby("Boroughs").size()

> **Question 4D)** Take a closer look at C): provide a ranking of the mean review count of these Airbnbs per borough!

In [None]:
airbnb_NY.loc[airbnb_NY["Price"] < 50].groupby("Boroughs")["Review_Cnt"].mean().sort_values(ascending=False)
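Questions 4C and 4D can also be answered in one table with named aggregation. This is a sketch on hypothetical toy data (invented values); the result column names `n_offers` and `mean_reviews` are chosen here for illustration only:

```python
import pandas as pd

# Hypothetical toy data, not taken from the Airbnb dataset.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Bronx", "Queens", "Queens", "Brooklyn"],
    "Price": [40, 30, 45, 80, 20],
    "Review_Cnt": [10, 20, 5, 100, 50],
})

cheap = toy.loc[toy["Price"] < 50]

# One row per borough: number of cheap offers and their mean review count,
# ranked from the highest to the lowest mean review count
ranking = (
    cheap.groupby("Boroughs")
    .agg(n_offers=("Price", "size"), mean_reviews=("Review_Cnt", "mean"))
    .sort_values("mean_reviews", ascending=False)
)
print(ranking)
```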

## Statistics count!

> **Question 5A)** What is the mean price of an Airbnb in NYC?

In [None]:
airbnb_NY["Price"].mean()

> **Question 5B)** How does the price vary?

In [None]:
airbnb_NY["Price"].std()
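The standard deviation is a single number; `describe()` from the command table gives a fuller picture of the spread (minimum, quartiles, maximum) in one call. A minimal sketch on hypothetical toy prices (invented values):

```python
import pandas as pd

# Hypothetical toy prices, not taken from the Airbnb dataset.
prices = pd.Series([60, 120, 45, 95], name="Price")

# count, mean, std, min, 25%, 50%, 75%, max
summary = prices.describe()
print(summary)
```

Comparing the mean with the median (the `50%` entry) gives a first hint of how skewed the prices are.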

> **Question 5C)** How does the price change with the property type? Which property type is the least expensive?

In [None]:
airbnb_NY.groupby("Prop_Type")["Price"].mean()

> **Question 5D)** How does the mean Airbnb price change across the boroughs?

In [None]:
airbnb_NY.groupby("Boroughs")["Price"].mean()

> **Question 5E)** Which are the top 10 rated Airbnbs per borough? Calculate their mean price!

In [None]:
for borough in airbnb_NY["Boroughs"].unique():
    # Rank the listings of this borough by the "Reviews30d" column and keep the 10 highest
    top_10 = airbnb_NY.loc[airbnb_NY["Boroughs"] == borough].sort_values("Reviews30d", ascending=False).head(10)
    print(borough, top_10["Price"].mean())
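The loop can also be written without explicit iteration: sort once, then take the first rows of each group with `groupby(...).head(n)`. This is a sketch on hypothetical toy data (invented values), using n = 2 instead of 10 to keep the example small:

```python
import pandas as pd

# Hypothetical toy data, not taken from the Airbnb dataset.
toy = pd.DataFrame({
    "Boroughs": ["Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"],
    "Reviews30d": [5, 1, 3, 2, 9, 4],
    "Price": [50, 70, 90, 100, 200, 300],
})

# Sort by the ranking column, then keep the top 2 rows of every borough
top_n = toy.sort_values("Reviews30d", ascending=False).groupby("Boroughs").head(2)

# Mean price of the selected rows per borough
mean_price = top_n.groupby("Boroughs")["Price"].mean()
print(mean_price)
```

The result is a single series indexed by borough, which is easier to sort or plot than printed output from a loop.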