# Housing Price Anomalies

![](https://www.loscien.org/wp-content/uploads/2018/10/workforce-housing.png)

From the dataset description:

"The dataset may contain erroneous data due to input errors on services, as well as outliers, and so on."

Let's take a look at some of the data, and see where the errors / anomalies lie...

### Libraries and Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('../input/russia-real-estate-20182021/all_v2.csv')
df.head()

### Statistical Distributions

#### Price

In [None]:
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(14,10))

# Price
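# Horizontal box plot with outliers drawn as circles
box_parts = axs[0].boxplot(df.price, notch=False, sym='o', vert=False)
axs[0].set_title('Price of Units, in Billions of Rubles', fontsize=18)

# Same plot with the outlier points hidden
box_parts = axs[1].boxplot(df.price, notch=False, sym='o', vert=False, showfliers=False)
axs[1].set_title('Price of Units without Outliers, in Hundred Thousand Rubles', fontsize=18)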

for ax in axs:
    ax.yaxis.grid(False)
    ax.spines[['right', 'top']].set_visible(False)

print(df.price.describe())

Because there is no central reporting tool or official agency, the data is self-reported (as noted by the uploader of the data, Mr. Daniilak), and there seems to be some kind of discrepancy: numerous houses have negative prices, some below negative 2 billion rubles.

What could this mean?
* Could it be a sale that reversed?
* Could someone have entered a negative sign by accident?
* Are the houses in question so terrible that the owners have to pay someone to live in them?
* Is it a laundering scheme / corruption?
* Are individuals just trying to create noise and cause data anomalies in the system?
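
Before guessing at causes, it helps to know how many rows are affected. Here is a minimal sketch, assuming a `price` column like the one in `df`; the helper name `summarize_negative_prices` and the toy rows are made up for illustration and are not taken from `all_v2.csv`.

```python
import pandas as pd

def summarize_negative_prices(frame: pd.DataFrame) -> dict:
    """Count rows with a negative price and report the most extreme value."""
    negative = frame.loc[frame["price"] < 0, "price"]
    return {
        "n_negative": int(negative.size),
        "share_negative": float(negative.size / len(frame)) if len(frame) else 0.0,
        "most_negative": float(negative.min()) if not negative.empty else None,
    }

# Toy rows invented purely for illustration (not real listings)
toy = pd.DataFrame({"price": [3_500_000, -2_100_000_000, 0, 1_200_000, -450_000]})
summary = summarize_negative_prices(toy)
print(summary)
```

On the real data, the same call would be `summarize_negative_prices(df)`.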

![](https://ak.picdn.net/shutterstock/videos/1044310561/thumb/1.jpg)


Once the outliers are hidden from the plot, we no longer see any negative numbers, but the lower extreme does sit at or near 0. What could 0 rubles for a house mean? Maybe:
* An individual inherited a house? 
* It reflects some type of welfare program / government provided housing?
* Maybe it was sold via auction?
* Maybe the buyer and seller refused to list the real selling price for privacy reasons?
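
To see where that lower extreme comes from, the whisker ends can be computed by hand. This is a rough sketch, assuming matplotlib's default whisker rule (the furthest data points within 1.5 times the interquartile range of the quartiles); the function name `whisker_bounds` and the toy prices are hypothetical.

```python
import pandas as pd

def whisker_bounds(values: pd.Series, whis: float = 1.5) -> tuple:
    """Return the lowest and highest data points inside the box plot fences."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observations that are still inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return float(inside.min()), float(inside.max())

# Toy prices invented for illustration (not real listings)
toy_prices = pd.Series([
    -2_000_000_000, 0, 1_000_000, 2_000_000,
    3_000_000, 4_000_000, 5_000_000, 9_000_000_000,
])
low, high = whisker_bounds(toy_prices)
print(low, high)
```

In the toy series the extreme negative value falls outside the fences, so the lower whisker lands on the 0 entry, which is the same pattern the second box plot shows.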

#### Variables and Price

In [None]:
rhstat = df[['price', 'level', 'rooms', 'area', 'kitchen_area']]
rhstat.describe()

In [None]:
fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(14,18))

# Price
box_parts = axs[0,0].boxplot(rhstat.price, notch=False, sym='o', vert=False)
axs[0,0].set_title('Price of Units', fontsize=18)

box_parts = axs[0,1].boxplot(rhstat.price, notch=False, sym='o', vert=False, showfliers=False)
axs[0,1].set_title('Price of Units, without Outliers', fontsize=18)

# Areas
box_parts = axs[1,0].boxplot(rhstat[['area', 'kitchen_area']], notch=False, sym='o', vert=False)
axs[1,0].set_title('Area and Kitchen Area of Unit', fontsize=18)

box_parts = axs[1,1].boxplot(rhstat[['area', 'kitchen_area']], notch=False, sym='o', vert=False, showfliers=False)
axs[1,1].set_title('Area and Kitchen Area, without Outliers', fontsize=18)

# Rooms and Level
box_parts = axs[2,0].boxplot(rhstat[['rooms', 'level']], notch=False, sym='o', vert=False)
axs[2,0].set_title('Number of Rooms and Floor Level', fontsize=18)

box_parts = axs[2,1].boxplot(rhstat[['rooms', 'level']], notch=False, sym='o', vert=False, showfliers=False)
axs[2,1].set_title('Rooms and Floor Level, without Outliers', fontsize=18)

# Hide the top and right spines on every panel
for ax in axs.flat:
    ax.spines[['right', 'top']].set_visible(False)

plt.show()

The distribution of other variables is much less questionable than that of price, but a few questions still come up:
* The uploader of the data mentioned that a room count of "-1" denotes a studio apartment; somehow, there are units with values below that. Would this signify an error, or perhaps a single room rather than a whole apartment?
* The minimum unit area is 0.07 square meters; the minimum kitchen area is 0.01 square meters. Was someone sold a dollhouse? Or is it an apartment for non-humans?
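
One way to dig into these two questions is to flag the suspicious rows so they can be inspected directly. Here is a small sketch, assuming `rooms` and `area` columns like those in `rhstat`; the 10 square meter cutoff, the helper name `flag_suspect_units`, and the toy rows are all assumptions for illustration and do not come from the dataset documentation.

```python
import pandas as pd

# Assumed cutoff in square meters; pick whatever threshold seems plausible
MIN_PLAUSIBLE_AREA = 10.0

def flag_suspect_units(frame: pd.DataFrame, min_area: float = MIN_PLAUSIBLE_AREA) -> pd.DataFrame:
    """Return one boolean column per check, aligned with the input rows."""
    flags = pd.DataFrame(index=frame.index)
    # -1 is the studio code, so anything lower has no documented meaning
    flags["rooms_below_studio"] = frame["rooms"] < -1
    flags["tiny_area"] = frame["area"] < min_area
    return flags

# Toy rows invented for illustration (not real listings)
toy_units = pd.DataFrame({
    "rooms": [2, -1, -2, 1],
    "area": [54.0, 28.0, 31.0, 0.07],
    "kitchen_area": [9.0, 5.0, 6.0, 0.01],
})
flags = flag_suspect_units(toy_units)
print(flags.sum())
print(toy_units[flags.any(axis=1)])
```

Running `flag_suspect_units(rhstat)` on the real data would show how many units fall into each bucket, and whether the two problems tend to appear in the same rows.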

![](https://i.pinimg.com/originals/d8/36/59/d83659a03c8c13632cb5c512eeded4c7.jpg)