RentHop has a feature that flags listings whose rent is much lower than that of other listings in the area with a similar number of bedrooms and bathrooms. Can we use KNeighborsRegressor to predict what a listing's price should be, then take the ratio of the actual price to the predicted price as a new feature? Let's see.

First we'll load the training set and fit KNeighborsRegressor on each listing's number of bedrooms, number of bathrooms, latitude, and longitude, with price as the target.

In [None]:
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Load the training listings; reset_index() replaces the original index with 0..n-1.
raw_train = pd.read_json("../input/train.json").reset_index()

# Keep the four neighbor features plus the target (price).
price_df = raw_train[['bedrooms', 'bathrooms', 'latitude', 'longitude', 'price']].copy()

# Predict price as the mean price of the 300 nearest listings.
model = KNeighborsRegressor(n_neighbors=300)
model.fit(price_df.drop(['price'], axis=1), price_df['price'])

Now let's look at the 300 nearest neighbors of one example listing (row 2).

In [None]:
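# .loc[[2]] keeps row 2 as a one-row DataFrame (a pandas Series has no reshape()).
# kneighbors returns a (distances, indices) pair.
print(model.kneighbors(price_df.drop(['price'], axis=1).loc[[2]], n_neighbors=300))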

In [None]:
print(price_df.drop(['price'], axis=1).loc[2])
print(price_df.drop(['price'], axis=1).loc[311])

The two listings are only about two blocks apart, so the neighbors the model finds really are nearby. Their prices should tell us something useful.
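To put a number on "two blocks", the great-circle distance between two latitude/longitude pairs can be estimated with the haversine formula. This is only a minimal sketch: the function name `haversine_km` and the sample coordinates are illustrative assumptions, not values taken from the data set.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up coordinates for illustration only (not rows 2 and 311 of the data).
distance_km = haversine_km(40.7500, -73.9900, 40.7518, -73.9900)
print(f"{distance_km * 1000:.0f} m apart")
```

With the real data, the arguments would be the latitude and longitude values printed in the cell above.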

In [None]:
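# Predict a price for every training listing.
# Note: each listing is normally among its own 300 neighbors here,
# so its own price contributes slightly to its prediction.
pred_price = model.predict(price_df.drop(['price'], axis=1))
price_df['predicted_price'] = pred_price

# Ratio below 1 means the listing is cheaper than its neighbors suggest.
price_df['pred_price_ratio'] = price_df['price'] / price_df['predicted_price']

price_df['interest_level'] = raw_train['interest_level']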

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

# Keep only ratios below 4 so a few extreme listings don't dominate the plot.
new_price_df = price_df[price_df['pred_price_ratio'] < 4]

plt.figure(figsize=(10,20))
sns.boxplot(x='interest_level', y='pred_price_ratio', data=new_price_df)
plt.title("Interest Level and Price / Predicted Price Ratio", fontsize=32)
plt.show()

Generally, listings with a lower price-to-predicted-price ratio attract higher interest.

Let me know if this was helpful to you. It should be pretty simple to implement at the feature extraction step.
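One possible way to wire this into feature extraction is sketched below. Because the cells above predict on the same rows the model was fitted on, each listing's own price usually counts among its neighbors. The sketch instead uses out-of-fold predictions for the training rows via scikit-learn's `cross_val_predict`, which is a swapped-in technique and not what this notebook does. The helper name `add_price_ratio`, the synthetic demo data, and the assumption that the test set also has a `price` column are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

FEATURES = ["bedrooms", "bathrooms", "latitude", "longitude"]


def add_price_ratio(train_df, test_df, n_neighbors=300, n_splits=5, seed=0):
    """Hypothetical helper: return copies of both frames with the ratio feature added."""
    train_df, test_df = train_df.copy(), test_df.copy()
    model = KNeighborsRegressor(n_neighbors=n_neighbors)

    # Training rows: each prediction comes from a model that never saw that row.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_df["predicted_price"] = cross_val_predict(
        model, train_df[FEATURES], train_df["price"], cv=folds
    )

    # Test rows: fit once on all training rows, then predict.
    model.fit(train_df[FEATURES], train_df["price"])
    test_df["predicted_price"] = model.predict(test_df[FEATURES])

    for df in (train_df, test_df):
        df["pred_price_ratio"] = df["price"] / df["predicted_price"]
    return train_df, test_df


def make_fake_listings(n, rng):
    """Synthetic stand-in for the real listings, used only to exercise the helper."""
    bedrooms = rng.integers(0, 4, n)
    return pd.DataFrame({
        "bedrooms": bedrooms,
        "bathrooms": rng.integers(1, 3, n),
        "latitude": 40.75 + rng.normal(0, 0.02, n),
        "longitude": -73.97 + rng.normal(0, 0.02, n),
        "price": 1500 + 900 * bedrooms + rng.uniform(-300, 300, n),
    })


rng = np.random.default_rng(0)
demo_train, demo_test = add_price_ratio(
    make_fake_listings(200, rng), make_fake_listings(50, rng), n_neighbors=10
)
print(demo_train[["price", "predicted_price", "pred_price_ratio"]].head())
```

`n_neighbors` has to stay below the number of rows in each training fold, which is why the small demo uses 10 rather than 300.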

Thanks to all of the awesome notebooks people have posted on here, I've learned a lot!