# Content Based Filtering

Data source: https://www.kaggle.com/datasets/anthonypino/melbourne-housing-market

This notebook takes sales data for houses in Melbourne and builds a Nearest Neighbours model with sklearn that finds houses similar to a set of search parameters.

### Imports

In [1]:
import pandas as pd
from sklearn.neighbors import NearestNeighbors

#For user input at the bottom of the notebook
import ipywidgets as widgets
from IPython.display import display

### Load data

In [2]:
file = "Melbourne_housing_FULL.csv"
df = pd.read_csv(file)
df.head()

| | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | Postcode | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbotsford | 68 Studley St | 2 | h | NaN | SS | Jellis | 3/09/2016 | 2.5 | 3067.0 | ... | 1.0 | 1.0 | 126.0 | NaN | NaN | Yarra City Council | -37.8014 | 144.9958 | Northern Metropolitan | 4019.0 |
| 1 | Abbotsford | 85 Turner St | 2 | h | 1480000.0 | S | Biggin | 3/12/2016 | 2.5 | 3067.0 | ... | 1.0 | 1.0 | 202.0 | NaN | NaN | Yarra City Council | -37.7996 | 144.9984 | Northern Metropolitan | 4019.0 |
| 2 | Abbotsford | 25 Bloomburg St | 2 | h | 1035000.0 | S | Biggin | 4/02/2016 | 2.5 | 3067.0 | ... | 1.0 | 0.0 | 156.0 | 79.0 | 1900.0 | Yarra City Council | -37.8079 | 144.9934 | Northern Metropolitan | 4019.0 |
| 3 | Abbotsford | 18/659 Victoria St | 3 | u | NaN | VB | Rounds | 4/02/2016 | 2.5 | 3067.0 | ... | 2.0 | 1.0 | 0.0 | NaN | NaN | Yarra City Council | -37.8114 | 145.0116 | Northern Metropolitan | 4019.0 |
| 4 | Abbotsford | 5 Charles St | 3 | h | 1465000.0 | SP | Biggin | 4/03/2017 | 2.5 | 3067.0 | ... | 2.0 | 0.0 | 134.0 | 150.0 | 1900.0 | Yarra City Council | -37.8093 | 144.9944 | Northern Metropolitan | 4019.0 |


Count the missing values in each column, then drop every row that contains at least one.

In [3]:
df.isnull().sum()

Suburb               0
Address              0
Rooms                0
Type                 0
Price             7610
Method               0
SellerG              0
Date                 0
Distance             1
Postcode             1
Bedroom2          8217
Bathroom          8226
Car               8728
Landsize         11810
BuildingArea     21115
YearBuilt        19306
CouncilArea          3
Lattitude         7976
Longtitude        7976
Regionname           3
Propertycount        3
dtype: int64

In [4]:
df.dropna(inplace=True)
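Called with no arguments, `dropna` removes a row if any column is missing, including columns the model never uses. One alternative, which is not what this notebook does, is to restrict the check to the feature columns with the `subset` parameter. The sketch below uses a made-up `toy` frame and a hypothetical `feature_cols` list purely to show the difference in how many rows survive.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (all values are made up)
toy = pd.DataFrame({
    "Price": [1480000.0, np.nan, 1035000.0, 1465000.0],
    "Bathroom": [1.0, 1.0, 1.0, 2.0],
    "CouncilArea": ["Yarra", "Yarra", None, "Yarra"],
})

# Hypothetical list of the columns the model would actually use
feature_cols = ["Price", "Bathroom"]

# Drop a row if ANY column is missing (what df.dropna() does)
dropped_any = toy.dropna()

# Drop a row only if one of the feature columns is missing
dropped_features = toy.dropna(subset=feature_cols)

print(len(dropped_any), len(dropped_features))
```

Whether the extra rows are worth keeping depends on whether the non-feature columns need to be complete when a recommendation is displayed.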

Define the feature columns, then enter some values for a simulated search that the model will match against.

In [25]:
X = df.loc[:, ["Price", "Distance", "Bedroom2", "Bathroom", "Landsize", "BuildingArea", "YearBuilt"]].values
# Search values, in the same order as the feature columns above
search_params = [1350000, 2, 2, 3, 220, 200, 2005]

Fit the model to find the closest three neighbours.

In [26]:
model = NearestNeighbors(n_neighbors=3).fit(X)

In [27]:
results = model.kneighbors([search_params])

`results` holds the distances to the nearest neighbours and their row positions in the dataframe.

In [37]:
results

(array([[11.75797602, 17.0926885 , 46.65715379]]),
 array([[   6, 3002, 6669]], dtype=int64))
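The second array contains row positions in `X`, not index labels of `df`. Once rows have been dropped, the labels have gaps, so the positions need to be looked up with `iloc` rather than `loc`. A minimal sketch on made-up data (the `toy` frame, its labels, and the query value are all hypothetical):

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy frame whose index labels have gaps, as they do after dropna()
toy = pd.DataFrame({"a": [0.0, 10.0, 20.0, 30.0]}, index=[2, 5, 9, 14])

nn = NearestNeighbors(n_neighbors=2).fit(toy.values)

# kneighbors returns a (distances, positions) tuple
distances, positions = nn.kneighbors([[11.0]])

# positions are 0-based row positions in the fitted array, so use iloc
matches = toy.iloc[positions[0]]
print(positions[0], list(matches.index))
```

Here the positions and the index labels of the matched rows differ, which is why a lookup by label would return the wrong rows or fail.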

The code below prints the property most similar to the search parameters.

In [29]:
df.iloc[6]  # Also try 3002 and 6669

Suburb                      Abbotsford
Address                40 Nicholson St
Rooms                                3
Type                                 h
Price                        1350000.0
Method                              VB
SellerG                         Nelson
Date                        12/11/2016
Distance                           2.5
Postcode                        3067.0
Bedroom2                           3.0
Bathroom                           2.0
Car                                2.0
Landsize                         214.0
BuildingArea                     190.0
YearBuilt                       2005.0
CouncilArea         Yarra City Council
Lattitude                     -37.8085
Longtitude                    144.9964
Regionname       Northern Metropolitan
Propertycount                   4019.0
Name: 24, dtype: object
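The first distance in `results` can be checked by hand. `NearestNeighbors` defaults to the Minkowski metric with `p=2`, which is the Euclidean distance, so the value is the square root of the summed squared differences across the seven features. The sketch below copies the feature values from the row printed above; the variable name `nearest` is only for illustration.

```python
import math

# Search parameters from the notebook, in feature-column order:
# Price, Distance, Bedroom2, Bathroom, Landsize, BuildingArea, YearBuilt
search_params = [1350000, 2, 2, 3, 220, 200, 2005]

# The same seven features, copied from the df.iloc[6] output
nearest = [1350000.0, 2.5, 3.0, 2.0, 214.0, 190.0, 2005.0]

# Euclidean distance: sqrt(0 + 0.25 + 1 + 1 + 36 + 100 + 0) = sqrt(138.25)
distance = math.dist(search_params, nearest)
print(distance)
```

The result agrees with the first entry of the distance array, 11.75797602. Most of it comes from the differences in Landsize and BuildingArea, since the price and year match exactly.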

Here we can see the first recommendation the model makes. The property shown has similar characteristics to the search parameters. The sliders below make it easier to search with new parameters. They also expose one drawback of this system: the dataset is small.

### Time to make a function to allow for easier use

In [38]:
def recommend(price, distance, bedroom, bathroom, landsize, buildingarea, year):
    # Print the best three matches for the given search parameters
    search = [price, distance, bedroom, bathroom, landsize, buildingarea, year]
    # kneighbors returns (distances, positions); only the positions are needed
    _, positions = model.kneighbors([search])
    for rank, pos in enumerate(positions[0], start=1):
        print(f"\nRecommendation {rank}: \n{df.iloc[pos]}")

In [39]:
price = widgets.IntSlider(value=500000, min=df["Price"].min(), max=df["Price"].max(), step=5000, description="Price:")
distance = widgets.FloatSlider(value=2, min=df["Distance"].min(), max=df["Distance"].max(), step=0.1, description="Distance:")
bedroom2 = widgets.Dropdown(options=range(int(df["Bedroom2"].min()),int(df["Bedroom2"].max()), 1), description="Bedrooms:")
bathroom = widgets.Dropdown(options=range(int(df["Bathroom"].min()),int(df["Bathroom"].max()), 1), description="Bathrooms:")
landsize = widgets.FloatSlider(value=20000, min=df["Landsize"].min(), max=df["Landsize"].max(), step=0.1, description="Landsize:")
buildingarea = widgets.FloatSlider(value=2, min=df["BuildingArea"].min(), max=df["BuildingArea"].max(), step=0.1, description="Building Area:")
year = widgets.IntSlider(value=1600, min=df["YearBuilt"].min(), max=df["YearBuilt"].max(), step=1, description="YearBuilt:")

button = widgets.Button(description="Search", disabled=False)

def on_button_click(b):
    print("Searching...\n ")
    recommend(
        int(price.value), 
        float(distance.value), 
        int(bedroom2.value), 
        int(bathroom.value), 
        float(landsize.value), 
        float(buildingarea.value), 
        int(year.value)
    )

button.on_click(on_button_click)

display(price)
display(distance)
display(bedroom2)
display(bathroom)
display(landsize)
display(buildingarea)
display(year)
display(button)



IntSlider(value=500000, description='Price:', max=9000000, min=131000, step=5000)

FloatSlider(value=2.0, description='Distance:', max=47.4)

Dropdown(description='Bedrooms:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), value=0)

Dropdown(description='Bathrooms:', options=(1, 2, 3, 4, 5, 6, 7, 8), value=1)

FloatSlider(value=20000.0, description='Landsize:', max=42800.0)

FloatSlider(value=2.0, description='Building Area:', max=3112.0)

IntSlider(value=1600, description='YearBuilt:', max=2019, min=1196)

Button(description='Search', style=ButtonStyle())

Searching...
 

Recommendation 1: 
Suburb                           Donvale
Address                    1 Bernarra Ct
Rooms                                  7
Type                                   h
Price                          2705000.0
Method                                 S
SellerG                              Ray
Date                           1/07/2017
Distance                            16.1
Postcode                          3111.0
Bedroom2                             7.0
Bathroom                             2.0
Car                                  4.0
Landsize                          5022.0
BuildingArea                      409.54
YearBuilt                         2004.0
CouncilArea      Manningham City Council
Lattitude                      -37.76598
Longtitude                     145.19329
Regionname          Eastern Metropolitan
Propertycount                     4790.0
Name: 14342, dtype: object

Recommendation 2: 
Suburb                    Carlton North
Address          

Searching...
 

Recommendation 1: 
Suburb                    Carlton North
Address                 967 Drummond St
Rooms                                 4
Type                                  h
Price                         2718000.0
Method                                S
SellerG                             Ray
Date                         11/03/2017
Distance                            3.2
Postcode                         3054.0
Bedroom2                            4.0
Bathroom                            2.0
Car                                 0.0
Landsize                          538.0
BuildingArea                      142.0
YearBuilt                        2015.0
CouncilArea      Melbourne City Council
Lattitude                      -37.7816
Longtitude                     144.9714
Regionname        Northern Metropolitan
Propertycount                    3106.0
Name: 2891, dtype: object

Recommendation 2: 
Suburb                       Williamstown
Address                      16 Hanna

Even a short session with these sliders shows the limitation: the model always returns the three closest points available, even when none of them is actually similar to the search parameters. It would work best on a larger dataset with more densely packed data points.
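Dataset density is one factor; the scale of the features may be another worth checking. Euclidean distance adds raw differences, so a column measured in the hundreds of thousands (such as Price) can outweigh a column measured in single digits (such as Bedroom2). The sketch below is not part of the notebook's model: it uses made-up rows of `[price, bedrooms]` and sklearn's `StandardScaler` to show how the nearest neighbour can change once each column is standardised.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Made-up rows: [price, bedrooms]
X_toy = np.array([
    [1000000.0, 2.0],
    [1000500.0, 9.0],
    [1020000.0, 2.0],
    [1500000.0, 3.0],
])
query = np.array([[1001000.0, 2.0]])

# Raw features: the price gap dominates, so the 9-bedroom row wins
raw_nn = NearestNeighbors(n_neighbors=1).fit(X_toy)
_, raw_pos = raw_nn.kneighbors(query)

# Standardised features: each column has zero mean and unit variance
scaler = StandardScaler().fit(X_toy)
scaled_nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(X_toy))
_, scaled_pos = scaled_nn.kneighbors(scaler.transform(query))

print(raw_pos[0][0], scaled_pos[0][0])
```

On this toy data the unscaled search picks the row with nine bedrooms because its price is marginally closer, while the scaled search picks the two-bedroom row. Whether scaling improves the recommendations on the real housing data would need to be tested.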

### Conclusion

In this notebook, I loaded housing data for Melbourne suburbs and used a nearest neighbours model to build a content-based filtering algorithm that recommends the three properties closest to the search parameters. Lastly, I added some sliders and a function to make searching easier.