## First, we need to import pandas. If you are using Anaconda, it comes preinstalled. Otherwise, you'll need to run `pip install pandas` on the command line. We are going to tell Jupyter to refer to pandas as pd from now on. This alias is a "best practice" and makes for easier typing.

In [None]:
import pandas as pd

## Next, let's import the dataset as a pandas dataframe. Pandas can handle many file types and has a different reader function for each one. Tonight, our data is in a CSV file.

In [None]:
df = pd.read_csv("NYCairbnb.csv")
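To see what `read_csv` does without needing the real file, here is a minimal sketch that reads a tiny CSV from an in-memory string. The three rows and the `sample` name are made up for illustration; only the column names `neighbourhood_group` and `price` come from this notebook. Other formats have their own readers, for example `pd.read_excel` and `pd.read_json`.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for NYCairbnb.csv
csv_text = """name,neighbourhood_group,price
Cozy loft,Manhattan,150
Sunny room,Brooklyn,80
Garden flat,Queens,95
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the text becomes the column names, and each remaining line becomes one row.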

## Let's look at the first 5 rows to get an idea of what the table looks like (.head() shows 5 rows by default). Note that the index starts at 0, not 1.

In [None]:
df.head()
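If you want a different number of rows, `head` takes an optional count, and `tail` does the same from the bottom. A small sketch on a made-up seven-row dataframe (the `toy` name and its values are illustrative only):

```python
import pandas as pd

# Made-up data: seven rows, so the default of 5 is visible
toy = pd.DataFrame({"price": [10, 20, 30, 40, 50, 60, 70]})

first_five = toy.head()    # default: first 5 rows, index 0 through 4
first_three = toy.head(3)  # first 3 rows
last_two = toy.tail(2)     # last 2 rows

print(first_five)
```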

## I'd also love to know how big the dataset is. We can use the shape attribute (no parentheses needed) to see how many rows and columns the dataframe has.

In [None]:
df.shape
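`shape` is a plain tuple of (rows, columns), so it can be unpacked into two variables. A quick sketch on a made-up dataframe (the values are illustrative, the column names are the ones used later in this notebook):

```python
import pandas as pd

# Made-up data: 3 rows, 2 columns
toy = pd.DataFrame({
    "price": [100, 0, 250],
    "availability_365": [365, 0, 12],
})

n_rows, n_cols = toy.shape  # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```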

## The .describe() function can be helpful for understanding our data too, but by default it only summarizes columns with numerical data.

In [None]:
df.describe()
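If you also want a summary of the text columns, `describe` accepts `include="all"`. The sketch below uses a made-up three-row dataframe to show the difference. For text columns you get `unique`, `top`, and `freq` instead of a mean and quartiles.

```python
import pandas as pd

# Made-up data: one text column, one numeric column
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn"],
    "price": [100, 200, 300],
})

numeric_only = toy.describe()           # default: numeric columns only
summary = toy.describe(include="all")   # text and numeric columns

print(summary)
```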

## Hmmm...some of this data doesn't really make sense. The above breakdown suggests that at least 25% of the airBNBs are available 0 nights a year. We're also suspicious of airBNBs that cost 0 dollars per night. We probably don't want to see those. While we're at it, let's remove any rows with NA values.

In [None]:
df = df.dropna()

# Drop rows with a 0 in the availability_365 column
df.drop(df.loc[df['availability_365'] == 0].index, inplace=True)

# Drop rows with a 0 in the price column
df.drop(df.loc[df['price'] == 0].index, inplace=True)
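The same cleanup can also be written as a filter that keeps the rows we want, rather than dropping the ones we don't. This is a sketch of that alternative on made-up data, not a replacement for the cell above. The `reviews_per_month` column is only an illustrative stand-in for a column containing a missing value.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 has price 0, row 2 has availability 0, row 3 has an NA
toy = pd.DataFrame({
    "price": [100, 0, 250, 80],
    "availability_365": [365, 10, 0, 200],
    "reviews_per_month": [1.2, 0.5, 2.0, np.nan],
})

# dropna() removes every row that contains at least one NA value
cleaned = toy.dropna()

# Keep only rows where both conditions hold; & means "and"
keep = (cleaned["availability_365"] != 0) & (cleaned["price"] != 0)
cleaned = cleaned[keep]

print(cleaned)
```

Only row 0 survives here, because each of the other rows fails one of the three checks.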

## Let's check out how that changes the size and stats.

In [None]:
df.shape


In [None]:
df.describe()

# Let's ask our data some questions!

## Where are most airBNBs located?

### Let's make a new dataframe where we count up the frequency of each neighborhood.

In [None]:
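# Renumber the rows 0..n-1 after the drops; drop=True discards the old index
# instead of keeping it as an extra column
df = df.reset_index(drop=True)

nbhoods = pd.DataFrame()

# Tally one count per listing, keyed by neighbourhood group
for i in range(df.shape[0]):
    nbhood = df.loc[i, "neighbourhood_group"]
    try:
        nbhoods.loc[nbhood, "frequency"] += 1
    except KeyError:  # first time we see this neighbourhood group
        nbhoods.loc[nbhood, "frequency"] = 1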
        
nbhoods
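The loop above spells out the counting step by step. Pandas also has a built-in shortcut, `value_counts`, that does the same tally in one line and sorts the result from most to least frequent. A sketch on made-up data (the `nbhoods_alt` name is only for illustration):

```python
import pandas as pd

# Made-up data: 3 Manhattan, 2 Brooklyn, 1 Queens
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan",
        "Queens", "Brooklyn", "Manhattan",
    ]
})

# Count how often each value appears, most frequent first
counts = toy["neighbourhood_group"].value_counts()

# Turn the counts into a one-column dataframe named "frequency"
nbhoods_alt = counts.rename("frequency").to_frame()

print(nbhoods_alt)
```

On a large dataset this is also much faster than looping over every row.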

## What are the stats for airBNBs just in Manhattan?

In [None]:
rooms_manhattan = df.loc[df['neighbourhood_group'] == 'Manhattan']

In [None]:
rooms_manhattan.describe()
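Filtering gives the stats for one neighbourhood group at a time. If you'd like to compare all of them side by side, one option is `groupby`, sketched here on made-up prices (the `toy` dataframe and its values are illustrative only):

```python
import pandas as pd

# Made-up data: two listings each in Manhattan and Brooklyn
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "price": [200, 100, 80, 120],
})

# Same filter as above: keep only the Manhattan rows
manhattan_only = toy.loc[toy["neighbourhood_group"] == "Manhattan"]

# One row of price stats per neighbourhood group
by_borough = toy.groupby("neighbourhood_group")["price"].describe()

print(by_borough)
```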

## What else? Let's think together!