# Exploratory Data Analysis (EDA): Data Visualization

---


In [None]:
# Load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the dataset
rentals = pd.read_csv("../data/streeteasy.csv")
rentals.head()

## Univariate Analysis

---


### Questions

- What is the typical price of a rental in New York City?
- What proportion of NYC rentals have a gym?


#### Quantitative Variables


In [None]:
# Create a boxplot of the rent variable
sns.boxplot(x="rent", data=rentals)
plt.show()

The boxplot shows that most rental prices fall between $2,500 and $5,000; however, there are many outliers, particularly on the high end.
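To make "outlier" concrete, here is a minimal sketch of the 1.5 × IQR rule, which is the default whisker setting (`whis=1.5`) in `sns.boxplot`. The helper name `iqr_outlier_bounds` and the toy rents are hypothetical and made up for illustration; they are not taken from `streeteasy.csv`.

```python
import pandas as pd


def iqr_outlier_bounds(values, whis=1.5):
    """Return the (lower, upper) fences; points outside them are drawn as outliers."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


# Toy rents, invented for illustration only
toy_rent = pd.Series([2500, 3000, 3500, 4000, 5000, 20000])

lower, upper = iqr_outlier_bounds(toy_rent)
outliers = toy_rent[(toy_rent < lower) | (toy_rent > upper)]
print(lower, upper, outliers.tolist())
```

On the real data, the same helper could be called as `iqr_outlier_bounds(rentals["rent"])`, assuming the `rent` column is numeric.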


In [None]:
# Create a histogram of the rent variable
sns.displot(rentals.rent, bins=10, kde=False)
plt.show()

In [None]:
# Increase the number of bins to show the shape of the rent distribution in finer detail
sns.displot(rentals.rent, bins=50, kde=False)
plt.show()

The histogram above shows that rental prices in New York City are right-skewed. Most rents fall between $2,500 and $5,000, with a peak around $3,000. The long right tail contains many outliers, including some rentals priced at over $10,000.
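Because the distribution is skewed, the choice of summary statistic matters when answering "what is the typical price?". The sketch below uses made-up toy rents (not values from the dataset) to show how a single expensive listing pulls the mean above the median, which is why the median is often the safer summary for skewed data.

```python
import pandas as pd

# Toy rents, invented for illustration only
toy_rent = pd.Series([2500, 2800, 3000, 3200, 3500, 4500, 12000])

summary = {
    "mean": toy_rent.mean(),      # pulled upward by the 12000 listing
    "median": toy_rent.median(),  # middle value, robust to the long tail
    "skew": toy_rent.skew(),      # positive for a right-skewed sample
}
print(summary)
```

The same three calls can be applied to `rentals["rent"]` to summarize the actual data.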


#### Categorical Variables


In [None]:
# Create a count plot (bar chart) of the number of listings in each borough
# The palette parameter sets the color scheme for the plot
sns.countplot(x="borough", data=rentals, hue="borough", legend=False, palette="winter")
plt.show()

From the bar chart above, we can see that Manhattan has the most available apartments, followed by Brooklyn. Queens has the fewest.
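The bar heights can also be read off as numbers. As a rough sketch, `value_counts()` returns the count per category sorted from most to least frequent, and `normalize=True` turns those counts into proportions. The toy frame below is made up for illustration and does not reflect the real borough counts.

```python
import pandas as pd

# Toy listings, invented for illustration only
toy = pd.DataFrame(
    {"borough": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"]}
)

# Counts per borough, sorted in descending order of frequency
counts = toy["borough"].value_counts()

# Same breakdown expressed as a share of all listings
shares = toy["borough"].value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

Running the same two calls on `rentals["borough"]` gives the exact figures behind the plot.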

In [None]:
# Preview the colors in the "PiYG" palette
sns.color_palette("PiYG")

In [None]:
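# Count the apartments in each borough; value_counts() sorts by frequency
borough_counts = rentals["borough"].value_counts()

# Take the labels from the counts' index so each label matches its wedge
# (unique() returns order of appearance, which can differ from value_counts())
pie_labels = borough_counts.index.tolist()

# Define palette for the pie chart
pie_colors = sns.color_palette("pastel")

# Create a pie chart of the available apartments in the borough variable
plt.pie(
    x=borough_counts,
    labels=pie_labels,
    autopct="%1.1f%%",
    colors=pie_colors,
    shadow=True,
)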

plt.axis("equal")
plt.show()
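The second question, what proportion of rentals have a gym, can be approached the same way. The sketch below assumes the dataset has a 0/1 indicator column named `has_gym`; that column name is an assumption and should be checked against `rentals.columns`. The toy values are made up for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; `has_gym` is an assumed 0/1 column name
toy = pd.DataFrame({"has_gym": [1, 0, 0, 1, 0]})

# The mean of a 0/1 indicator equals the proportion of rows equal to 1
gym_share = toy["has_gym"].mean()

# Equivalent breakdown showing both categories as proportions
gym_breakdown = toy["has_gym"].value_counts(normalize=True)

print(f"{gym_share:.1%} of toy listings have a gym")
```

If the column exists under that name, `rentals["has_gym"].mean()` would give the proportion for the real data.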