# Exploratory Data Analysis

This notebook explores our sample dataset and applies some basic cleaning.

In [None]:
import wandb
import pandas as pd
import pandas_profiling

In [None]:
# Start a W&B run; save_code=True stores this notebook's code with the run
run = wandb.init(project="nyc_airbnb", group="eda", save_code=True)

## Loading dataset

In [None]:
# Record the artifact as an input of this run and download it to a local path
local_path = wandb.use_artifact("sample.csv:latest").file()
df = pd.read_csv(local_path)

## Profiling dataset

In [None]:
profile = pandas_profiling.ProfileReport(df)
profile.to_notebook_iframe()

## Applying transformations

After investigating the dataset, we see two things to clean up:
- remove rows whose price falls outside the [10, 350] range,
- convert the `last_review` column to datetime.

In [None]:
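# Drop price outliers: keep rows with min_price <= price <= max_price
# (Series.between is inclusive on both ends by default)
min_price = 10
max_price = 350
idx = df['price'].between(min_price, max_price)
df = df[idx].copy()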
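
To check how the price filter treats boundary values, here is a minimal sketch on a toy frame. The `toy` data is made up for illustration and is not taken from the sample; only the `price` column name and the bounds come from the cell above.

```python
import pandas as pd

# Hypothetical toy data: prices on and around the bounds, plus one missing value.
toy = pd.DataFrame({"price": [5, 10, 120, 350, 351, None]})

min_price = 10
max_price = 350

# Series.between is inclusive on both ends by default; NaN evaluates to False.
mask = toy["price"].between(min_price, max_price)
kept = toy[mask].copy()

print(kept["price"].tolist())  # [10.0, 120.0, 350.0]
print(f"dropped {len(toy) - len(kept)} of {len(toy)} rows")
```

Rows priced exactly at 10 or 350 are kept, and a row with a missing price would be dropped along with the outliers.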

In [None]:
# Convert last_review to datetime
df['last_review'] = pd.to_datetime(df['last_review'])
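
If `last_review` contains missing entries, the conversion should leave them as `NaT` rather than fail. The sketch below shows this on a toy frame; the dates are made-up values, assumed to be in ISO `YYYY-MM-DD` form.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical toy data: two ISO-formatted dates and one missing value.
toy = pd.DataFrame({"last_review": ["2019-05-21", None, "2018-11-19"]})

# Missing values become NaT; the column ends up with a datetime dtype.
toy["last_review"] = pd.to_datetime(toy["last_review"])

print(is_datetime64_any_dtype(toy["last_review"]))  # True
print(toy["last_review"].isna().sum())              # 1
print(toy["last_review"].max())
```

Once the column is a datetime, operations such as `max()` or sorting work chronologically instead of on strings.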

## Investigating the cleaned dataset

In [None]:
profile = pandas_profiling.ProfileReport(df)
profile.to_notebook_iframe()

In [None]:
run.finish()