Importing libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Let's load and explore the dataset, which contains information on rental properties in Bengaluru.

In [None]:
df = pd.read_csv('Bengaluru_rent.csv')
df

In [None]:
df.info()

The `price` column is stored as strings with commas as thousands separators, so let's strip the commas and convert it to a float that can be used in our analysis.

In [None]:
df['price'] = df['price'].str.replace(',', '', regex=False).astype(float)  # drop thousands separators, then parse as float
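If any entry in `price` is still not a plain number once the commas are gone (for example a placeholder such as "N/A"; we have not checked whether this dataset contains any), `.astype(float)` raises a `ValueError`. A more forgiving sketch swaps in `pd.to_numeric` with `errors='coerce'`, so unparseable entries become `NaN` instead. It is shown on a small made-up Series rather than the real column:

```python
import pandas as pd

# Hypothetical sample values; the real column comes from Bengaluru_rent.csv
raw_prices = pd.Series(['25,000', '1,20,000', 'N/A'])

# Strip the thousands separators, then parse; anything unparseable becomes NaN
prices = pd.to_numeric(raw_prices.str.replace(',', '', regex=False), errors='coerce')
print(prices.tolist())  # [25000.0, 120000.0, nan]
```

Rows whose price turned into `NaN` can then be inspected with `df[df['price'].isna()]` before deciding whether to drop them.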

Let's make a box plot of monthly rent for each type of property.

In [None]:
plt.figure(figsize = (6, 4))
sns.boxplot(
    data = df,
    x = 'property_type',
    y = 'price',
    order = sorted(list(df['property_type'].unique())),  # sorting boxes alphabetically
)
plt.title('Box plot of rent for each property type')
plt.xlabel('Property type')
plt.xticks(rotation = 40)
plt.ylabel('Rent in rupees')
plt.show()

But what does this box plot convey?

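In each box plot, the box spans from the first quartile (Q1) to the third quartile (Q3), and the line inside the box marks the median. Q1 is the value below which 25% of our data lies; Q3 is the value below which 75% of our data lies. We can calculate these using the pandas `.quantile()` method.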
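As a quick check of what `.quantile()` returns, here is a sketch on a made-up Series of four rents. By default pandas interpolates linearly between neighbouring values, so a quartile need not be one of the observations:

```python
import pandas as pd

rents = pd.Series([10000, 20000, 30000, 40000])  # hypothetical rents in rupees

first_quartile = rents.quantile(0.25)  # roughly 25% of the values lie below this
median = rents.quantile(0.5)
third_quartile = rents.quantile(0.75)  # roughly 75% of the values lie below this
print(first_quartile, median, third_quartile)  # 17500.0 25000.0 32500.0
```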

We can identify potential outliers using the IQR method. The interquartile range, or IQR, is the range spanned by the middle 50% of the data, i.e. the difference between the third and first quartiles: $IQR = Q3 - Q1$.

The IQR method treats any data point outside the following bounds as a potential outlier, which we can then remove:

*   $Lower = Q1 - 1.5 \times IQR$
*   $Upper = Q3 + 1.5 \times IQR$
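These bounds can be sketched in pandas as follows. Because rent levels differ a lot between property types, this sketch computes the bounds separately for each `property_type`, mirroring the grouping in the box plot. The small DataFrame `toy_df`, its group labels and the multiplier name `k` are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for the Bengaluru rent dataset
toy_df = pd.DataFrame({
    'property_type': ['A'] * 6 + ['B'] * 4,
    'price': [10, 20, 30, 40, 50, 500, 100, 110, 120, 130],
})

k = 1.5  # the usual IQR multiplier
grouped = toy_df.groupby('property_type')['price']

# Broadcast each group's quartiles back onto that group's rows
q1_values = grouped.transform(lambda s: s.quantile(0.25))
q3_values = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3_values - q1_values

lower = q1_values - k * iqr
upper = q3_values + k * iqr

# A row is a potential outlier if it falls outside its own group's bounds
is_outlier = (toy_df['price'] < lower) | (toy_df['price'] > upper)

toy_df_without_outliers = toy_df[~is_outlier]
print(toy_df[is_outlier])  # shows only the row with price 500
```

Computing bounds per group rather than over the whole column avoids flagging every listing of an expensive property type simply because it is expensive.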

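Seaborn's box plot applies the 1.5 × IQR rule when drawing its whiskers: they extend to the smallest and largest values that lie within the bounds (the minimum and maximum once potential outliers are excluded), and any points beyond the whiskers are drawn individually as potential outliers.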
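To see where the whiskers end, here is a sketch that applies the same rule to a single made-up Series. The helper name `whisker_ends` is hypothetical, not part of Seaborn or pandas:

```python
import pandas as pd

def whisker_ends(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (low, high) whisker ends under the Q1 - k*IQR / Q3 + k*IQR rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    inside = s[(s >= q1 - k * iqr) & (s <= q3 + k * iqr)]
    # Whiskers reach the smallest and largest non-outlier observations
    return inside.min(), inside.max()

rents = pd.Series([10, 20, 30, 40, 50, 500])  # hypothetical rents
low, high = whisker_ends(rents)
print(low, high)  # 10 50
```

Here the value 500 lies beyond the upper whisker, so a box plot would draw it as a separate point. Seaborn exposes the multiplier as the `whis` parameter of `sns.boxplot` (default 1.5).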

In [None]:
# Get the median, first and third quartiles, minimum and maximum price for each property type
def q1(x): return x.quantile(0.25)  # 25% of our observations lie below this figure
def q3(x): return x.quantile(0.75)  # 75% of our observations lie below this figure

price_by_property_type = df.groupby('property_type')['price'].agg(['median', q1, q3, 'min', 'max'])
price_by_property_type
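The summary table already holds everything needed for the IQR bounds. As a sketch, here it is rebuilt on made-up data (`toy_df` is hypothetical) and extended with `iqr`, `lower` and `upper` columns. A group contains potential outliers exactly when its minimum is below `lower` or its maximum is above `upper`. Because `price_by_property_type` has the same `q1`, `q3`, `min` and `max` columns, the same column arithmetic should apply to it directly:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the notebook's DataFrame
toy_df = pd.DataFrame({
    'property_type': ['A'] * 6 + ['B'] * 4,
    'price': [10, 20, 30, 40, 50, 500, 100, 110, 120, 130],
})

# Same helpers as in the cell above, repeated so this sketch is self-contained
def q1(x): return x.quantile(0.25)
def q3(x): return x.quantile(0.75)

summary = toy_df.groupby('property_type')['price'].agg(['median', q1, q3, 'min', 'max'])

summary['iqr'] = summary['q3'] - summary['q1']
summary['lower'] = summary['q1'] - 1.5 * summary['iqr']
summary['upper'] = summary['q3'] + 1.5 * summary['iqr']

# True if at least one listing in the group lies outside the bounds
summary['has_outliers'] = (summary['min'] < summary['lower']) | (summary['max'] > summary['upper'])
print(summary)
```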

Let's look at these prices in more detail by splitting each property type by seller type.

In [None]:
plt.figure(figsize = (6, 4))
sns.boxplot(
    data = df,
    x = 'property_type',
    y = 'price',
    order = sorted(list(df['property_type'].unique())),
    hue = 'seller_type'
)
plt.title('Box plot of rent for each property type, by seller type')
plt.xlabel('Property type')
plt.xticks(rotation = 40)
plt.ylabel('Rent in rupees')
plt.legend(title = 'Seller type', bbox_to_anchor = (1, 1), loc = 'upper left')  # place the legend outside the plot area
plt.show()
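To read exact values behind the grouped box plot, the medians can also be tabulated. This sketch uses a made-up DataFrame; the seller labels `'Owner'` and `'Agent'` are hypothetical and not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy data; the seller labels are made up
toy_listings = pd.DataFrame({
    'property_type': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seller_type': ['Owner', 'Agent', 'Agent', 'Owner', 'Owner', 'Agent'],
    'price': [100, 200, 300, 50, 70, 90],
})

# Median rent for every property type / seller type combination,
# reshaped so that seller types become columns
median_table = (
    toy_listings.groupby(['property_type', 'seller_type'])['price']
    .median()
    .unstack('seller_type')
)
print(median_table)
```

Combinations that do not occur in the data show up as `NaN`. It is also worth checking group sizes with `.size()` in place of `.median()`, since a box drawn from only a handful of listings says little about typical rents.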