## **Recommender system for real estate data**

We plan to build three recommendation systems:
1. Based on `Facilities`
2. Based on `Price`
3. Based on `Location`

We will manually assign a weight to each of the three systems and combine them in a hybrid approach to produce the final recommendations.
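As a rough illustration of the hybrid step, several similarity matrices can be blended with a weighted sum. The sketch below is only an assumption about how the combination might look: the helper name `combine_similarities`, the toy 3×3 matrices, and the weights (0.5, 0.3, 0.2) are all hypothetical.

```python
import numpy as np

def combine_similarities(matrices, weights):
    """Weighted sum of equally shaped similarity matrices (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalise so the weights sum to 1
    stacked = np.stack(matrices)        # shape: (n_systems, n_items, n_items)
    return np.tensordot(weights, stacked, axes=1)

# Toy matrices standing in for the facilities, price and location similarities
facilities_sim = np.array([[1.0, 0.2, 0.4], [0.2, 1.0, 0.6], [0.4, 0.6, 1.0]])
price_sim = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]])
location_sim = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.0], [0.5, 0.0, 1.0]])

hybrid_sim = combine_similarities(
    [facilities_sim, price_sim, location_sim],
    weights=[0.5, 0.3, 0.2],  # illustrative weights, chosen by hand
)
```

Normalising the weights keeps the blended scores on the same scale as the inputs, so the diagonal stays at 1.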

In [997]:
# importing the required libraries
import numpy as np
import pandas as pd
import re
import json
import ast

## **Load the data**

In [998]:
input_path = '../data/scrap-data/appartments.csv'
df = pd.read_csv(input_path)
print("shape of  the dataframe:",df.shape)
df.head()

shape of  the dataframe: (247, 7)


Unnamed: 0,PropertyName,PropertySubName,NearbyLocations,LocationAdvantages,Link,PriceDetails,TopFacilities
0,Smartworld One DXP,"2, 3, 4 BHK Apartment in Sector 113, Gurgaon","['Bajghera Road', 'Palam Vihar Halt', 'DPSG Pa...","{'Bajghera Road': '800 Meter', 'Palam Vihar Ha...",https://www.99acres.com/smartworld-one-dxp-sec...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Salon', 'Restaurant', 'Spa'..."
1,M3M Crown,"3, 4 BHK Apartment in Sector 111, Gurgaon","['DPSG Palam Vihar Gurugram', 'The NorthCap Un...","{'DPSG Palam Vihar Gurugram': '1.4 Km', 'The N...",https://www.99acres.com/m3m-crown-sector-111-g...,"{'3 BHK': {'building_type': 'Apartment', 'area...","['Bowling Alley', 'Mini Theatre', 'Manicured G..."
2,Adani Brahma Samsara Vilasa,"Land, 3, 4 BHK Independent Floor in Sector 63,...","['AIPL Business Club Sector 62', 'Heritage Xpe...","{'AIPL Business Club Sector 62': '2.7 Km', 'He...",https://www.99acres.com/adani-brahma-samsara-v...,{'3 BHK': {'building_type': 'Independent Floor...,"['Terrace Garden', 'Gazebo', 'Fountain', 'Amph..."
3,Sobha City,"2, 3, 4 BHK Apartment in Sector 108, Gurgaon","['The Shikshiyan School', 'WTC Plaza', 'Luxus ...","{'The Shikshiyan School': '2.9 KM', 'WTC Plaza...",https://www.99acres.com/sobha-city-sector-108-...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Volley Ball Court', 'Aerobi..."
4,Signature Global City 93,"2, 3 BHK Independent Floor in Sector 93 Gurgaon","['Pranavananda Int. School', 'DLF Site central...","{'Pranavananda Int. School': '450 m', 'DLF Sit...",https://www.99acres.com/signature-global-city-...,{'2 BHK': {'building_type': 'Independent Floor...,"['Mini Theatre', 'Doctor on Call', 'Concierge ..."


## **Data Cleaning**

### 1. Removing garbage rows

In [999]:
# Row 22 is a garbage row: it repeats the column names instead of holding property data
df.iloc[22]

PropertyName                PropertyName
PropertySubName          PropertySubName
NearbyLocations          NearbyLocations
LocationAdvantages    LocationAdvantages
Link                                Link
PriceDetails                PriceDetails
TopFacilities              TopFacilities
Name: 22, dtype: object

In [1000]:
# removing the garbage row
df.drop(22,inplace=True)

print("shape of  the dataframe:",df.shape)
df.head()

shape of  the dataframe: (246, 7)


Unnamed: 0,PropertyName,PropertySubName,NearbyLocations,LocationAdvantages,Link,PriceDetails,TopFacilities
0,Smartworld One DXP,"2, 3, 4 BHK Apartment in Sector 113, Gurgaon","['Bajghera Road', 'Palam Vihar Halt', 'DPSG Pa...","{'Bajghera Road': '800 Meter', 'Palam Vihar Ha...",https://www.99acres.com/smartworld-one-dxp-sec...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Salon', 'Restaurant', 'Spa'..."
1,M3M Crown,"3, 4 BHK Apartment in Sector 111, Gurgaon","['DPSG Palam Vihar Gurugram', 'The NorthCap Un...","{'DPSG Palam Vihar Gurugram': '1.4 Km', 'The N...",https://www.99acres.com/m3m-crown-sector-111-g...,"{'3 BHK': {'building_type': 'Apartment', 'area...","['Bowling Alley', 'Mini Theatre', 'Manicured G..."
2,Adani Brahma Samsara Vilasa,"Land, 3, 4 BHK Independent Floor in Sector 63,...","['AIPL Business Club Sector 62', 'Heritage Xpe...","{'AIPL Business Club Sector 62': '2.7 Km', 'He...",https://www.99acres.com/adani-brahma-samsara-v...,{'3 BHK': {'building_type': 'Independent Floor...,"['Terrace Garden', 'Gazebo', 'Fountain', 'Amph..."
3,Sobha City,"2, 3, 4 BHK Apartment in Sector 108, Gurgaon","['The Shikshiyan School', 'WTC Plaza', 'Luxus ...","{'The Shikshiyan School': '2.9 KM', 'WTC Plaza...",https://www.99acres.com/sobha-city-sector-108-...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Volley Ball Court', 'Aerobi..."
4,Signature Global City 93,"2, 3 BHK Independent Floor in Sector 93 Gurgaon","['Pranavananda Int. School', 'DLF Site central...","{'Pranavananda Int. School': '450 m', 'DLF Sit...",https://www.99acres.com/signature-global-city-...,{'2 BHK': {'building_type': 'Independent Floor...,"['Mini Theatre', 'Doctor on Call', 'Concierge ..."
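Dropping by the hard-coded label `22` works for this file, but a value-based filter may be more robust if the scrape is re-run. A minimal sketch, assuming garbage rows are exactly those whose `PropertyName` cell repeats the column header; the toy frame and its `example.com` links are hypothetical.

```python
import pandas as pd

# Toy frame with one repeated-header row, mimicking the garbage row above
toy = pd.DataFrame({
    'PropertyName': ['Smartworld One DXP', 'PropertyName', 'M3M Crown'],
    'Link': ['https://example.com/a', 'Link', 'https://example.com/b'],
})

# Keep only rows whose PropertyName is not the literal header text
is_header_row = toy['PropertyName'] == 'PropertyName'
cleaned = toy[~is_header_row]
```

Note that the surviving rows keep their original index labels, so the index has a gap where the garbage row was removed.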


**Observation:**
- `NearbyLocations` is a subset of the `LocationAdvantages` column.
- `PropertySubName` is a subset of the `PriceDetails` column.
- The `Link` column can be used to get the details of the property.

So we drop the two subset columns.

### 2. Dropping the subset columns

In [1001]:
df.drop(columns=['NearbyLocations','PropertySubName'],inplace=True)
df.head()

Unnamed: 0,PropertyName,LocationAdvantages,Link,PriceDetails,TopFacilities
0,Smartworld One DXP,"{'Bajghera Road': '800 Meter', 'Palam Vihar Ha...",https://www.99acres.com/smartworld-one-dxp-sec...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Salon', 'Restaurant', 'Spa'..."
1,M3M Crown,"{'DPSG Palam Vihar Gurugram': '1.4 Km', 'The N...",https://www.99acres.com/m3m-crown-sector-111-g...,"{'3 BHK': {'building_type': 'Apartment', 'area...","['Bowling Alley', 'Mini Theatre', 'Manicured G..."
2,Adani Brahma Samsara Vilasa,"{'AIPL Business Club Sector 62': '2.7 Km', 'He...",https://www.99acres.com/adani-brahma-samsara-v...,{'3 BHK': {'building_type': 'Independent Floor...,"['Terrace Garden', 'Gazebo', 'Fountain', 'Amph..."
3,Sobha City,"{'The Shikshiyan School': '2.9 KM', 'WTC Plaza...",https://www.99acres.com/sobha-city-sector-108-...,"{'2 BHK': {'building_type': 'Apartment', 'area...","['Swimming Pool', 'Volley Ball Court', 'Aerobi..."
4,Signature Global City 93,"{'Pranavananda Int. School': '450 m', 'DLF Sit...",https://www.99acres.com/signature-global-city-...,{'2 BHK': {'building_type': 'Independent Floor...,"['Mini Theatre', 'Doctor on Call', 'Concierge ..."


---

## **Recommendation System 1 : Based on `Facilities`**

#### Step1 : Pre-processing the `TopFacilities` column

 - Convert the string representation of the list into an actual list
 - Join the list of facilities into a single string for each property

In [1002]:
df[['PropertyName','TopFacilities']].head()

Unnamed: 0,PropertyName,TopFacilities
0,Smartworld One DXP,"['Swimming Pool', 'Salon', 'Restaurant', 'Spa'..."
1,M3M Crown,"['Bowling Alley', 'Mini Theatre', 'Manicured G..."
2,Adani Brahma Samsara Vilasa,"['Terrace Garden', 'Gazebo', 'Fountain', 'Amph..."
3,Sobha City,"['Swimming Pool', 'Volley Ball Court', 'Aerobi..."
4,Signature Global City 93,"['Mini Theatre', 'Doctor on Call', 'Concierge ..."


In [1003]:
# This function extracts all the strings enclosed in single quotes from the input string
def extract_list(s):
    return re.findall(r"'(.*?)'", s)

# Applying the extract_list function to the 'TopFacilities' column to convert the string representation of the list into an actual list
df['Facilities'] = df['TopFacilities'].apply(extract_list)

# joins the list of facilities into a single string for each property
df['Facilities'] = df['Facilities'].apply(' '.join)


df[['PropertyName','Facilities']].head()

Unnamed: 0,PropertyName,Facilities
0,Smartworld One DXP,Swimming Pool Salon Restaurant Spa Cafeteria S...
1,M3M Crown,Bowling Alley Mini Theatre Manicured Garden Sw...
2,Adani Brahma Samsara Vilasa,Terrace Garden Gazebo Fountain Amphitheatre Pa...
3,Sobha City,Swimming Pool Volley Ball Court Aerobics Centr...
4,Signature Global City 93,Mini Theatre Doctor on Call Concierge Service ...
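The regular expression above assumes every facility is wrapped in single quotes. If a name contains an apostrophe, Python's list repr wraps that item in double quotes instead, and the pattern would split it incorrectly. A hedged alternative is `ast.literal_eval`, which parses the string as a Python literal; the helper name `parse_facilities` and the sample string are illustrative only.

```python
import ast

def parse_facilities(raw):
    """Parse a stringified list and join it; return '' if parsing fails."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return ''
    return ' '.join(items)

# Hypothetical sample containing an apostrophe inside one facility name
sample = "['Swimming Pool', \"Kids' Play Area\", 'Spa']"
joined = parse_facilities(sample)
```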


#### Step2 : Vectorizing the `Facilities` column

In [1004]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TfidfVectorizer with stop words set to 'english' and ngram range of (1, 2)
tfidf_vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))

tfidf_matrix = tfidf_vectorizer.fit_transform(df['Facilities'])

In [1005]:
# tfidf_matrix.toarray()[0]

#### Step3 : Computing the cosine similarity matrix

In [1006]:
from sklearn.metrics.pairwise import cosine_similarity

facilities_cosine_similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)

print("shape of the facilities cosine similarity matrix:",facilities_cosine_similarity.shape)
facilities_cosine_similarity

shape of the facilities cosine similarity matrix: (246, 246)


array([[1.        , 0.01095159, 0.        , ..., 0.01183329, 0.08656385,
        0.0110727 ],
       [0.01095159, 1.        , 0.01982121, ..., 0.11904241, 0.01555534,
        0.00963852],
       [0.        , 0.01982121, 1.        , ..., 0.07020502, 0.03820314,
        0.01962826],
       ...,
       [0.01183329, 0.11904241, 0.07020502, ..., 1.        , 0.09825738,
        0.03255851],
       [0.08656385, 0.01555534, 0.03820314, ..., 0.09825738, 1.        ,
        0.06257614],
       [0.0110727 , 0.00963852, 0.01962826, ..., 0.03255851, 0.06257614,
        1.        ]])

**Observation** : 
- The diagonal elements of the matrix are 1 as each property is most similar to itself.
- The off-diagonal elements are the cosine similarity between the properties based on the facilities.
- The higher the cosine similarity, the more similar the properties are.
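For reference, the cosine similarity of two vectors a and b is (a · b) / (‖a‖ ‖b‖). A small NumPy sketch of that formula, using made-up vectors rather than the notebook's TF-IDF rows:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two 1-D vectors: dot product over the product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a
c = np.array([0.0, 0.0, 3.0])   # shares no non-zero component with a

sim_ab = cosine(a, b)   # vectors pointing the same way
sim_ac = cosine(a, c)   # orthogonal vectors
sim_aa = cosine(a, a)   # a vector compared with itself
```

Because TF-IDF values are non-negative, the scores in the matrix above fall between 0 and 1.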


#### Step 4 : Get the recommendations for a property based on facilities

In [1007]:
def recommend_properties(property_name, cosine_similarity_matrix):
    # Get the index of the property that matches the name
    idx = df.index[df['PropertyName'] == property_name].tolist()[0]

    # Get the pairwise similarity scores with that property
    sim_scores = list(enumerate(cosine_similarity_matrix[idx]))

    # Sort the properties based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the 5 most similar properties, skipping the first entry (the property itself)
    sim_scores = sim_scores[1:6]

    # Get the property indices
    property_indices = [i[0] for i in sim_scores]
    
    recommendations_df = pd.DataFrame({
        'PropertyName': df['PropertyName'].iloc[property_indices],
        'SimilarityScore': sim_scores
    })

    recommendations_df.sort_values(by='SimilarityScore',ascending=False,inplace=True)
    
    # Return the 5 most similar properties
    return recommendations_df

In [1008]:
# get the recommendations for a property based on facilities
recommend_properties("DLF The Arbour",facilities_cosine_similarity)

Unnamed: 0,PropertyName,SimilarityScore
217,Yashika 104,"(216, 0.4199606322926784)"
154,India Rashtra,"(153, 0.398954234680194)"
93,JMS The Nation,"(92, 0.4166584649363288)"
64,Ace Palm Floors,"(63, 0.4529382062441955)"
0,Smartworld One DXP,"(0, 0.38885046199432893)"


- We will use this function to recommend the properties based on the `facilities`.
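Two details of `recommend_properties` are worth flagging. First, `idx` is an index label, while the similarity matrix is positional; because row 22 was dropped without resetting the index, labels and positions differ by one for later rows (visible in the output, where label 217 is paired with position 216). Second, `SimilarityScore` stores `(position, score)` tuples, so the final sort orders by position rather than by score. The variant below is a sketch that addresses both, under the assumption that property names are unique; `recommend_by_position`, the `top_n` parameter and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def recommend_by_position(property_name, names, sim_matrix, top_n=5):
    """Return the top_n most similar properties, using positional lookups only."""
    names = pd.Series(names).reset_index(drop=True)
    pos = int(np.flatnonzero(names.to_numpy() == property_name)[0])

    scores = np.asarray(sim_matrix[pos], dtype=float)
    order = np.argsort(-scores, kind='stable')   # highest score first
    order = order[order != pos][:top_n]          # skip the query property itself

    return pd.DataFrame({
        'PropertyName': names.iloc[order].to_numpy(),
        'SimilarityScore': scores[order],
    })

# Toy data: the index has a gap (label 1 is missing), like df after the drop
toy_names = pd.Series(['A', 'B', 'C'], index=[0, 2, 3])
toy_sim = np.array([[1.0, 0.1, 0.7],
                    [0.1, 1.0, 0.4],
                    [0.7, 0.4, 1.0]])

result = recommend_by_position('B', toy_names, toy_sim, top_n=2)
```

In this notebook the call would look like `recommend_by_position("DLF The Arbour", df['PropertyName'], facilities_cosine_similarity)`.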

----


## **Recommendation System 2 : Based on `Price`**

#### Step1 : Pre-processing the `PriceDetails` column

**Purpose:**
- This preprocessing step turns the raw `PriceDetails` strings into structured numeric columns that can be compared across properties.

- The extracted features are the building type, the area range, and the price range for each property configuration.

In [1009]:
df[['PropertyName','PriceDetails']]

Unnamed: 0,PropertyName,PriceDetails
0,Smartworld One DXP,"{'2 BHK': {'building_type': 'Apartment', 'area..."
1,M3M Crown,"{'3 BHK': {'building_type': 'Apartment', 'area..."
2,Adani Brahma Samsara Vilasa,{'3 BHK': {'building_type': 'Independent Floor...
3,Sobha City,"{'2 BHK': {'building_type': 'Apartment', 'area..."
4,Signature Global City 93,{'2 BHK': {'building_type': 'Independent Floor...
...,...,...
242,DLF Princeton Estate,"{'2 BHK': {'building_type': 'Apartment', 'area..."
243,Pyramid Urban Homes 2,"{'1 BHK': {'building_type': 'Apartment', 'area..."
244,Satya The Hermitage,"{'2 BHK': {'building_type': 'Apartment', 'area..."
245,BPTP Spacio,"{'2 BHK': {'building_type': 'Apartment', 'area..."


**Parsing and Extracting Features:**
- A function `refined_parse_modified()` is defined to parse the `PriceDetails` column, which contains property details in a string format.

- The function converts the string into a dictionary and extracts key features such as building type, area, and price details for various property configurations (e.g., '1 BHK', '2 BHK').

- Area and price ranges are split on `-` into low and high values. Prices given in lakhs (`L`) are divided by 100 so that all prices are expressed in crores (`Cr`).

**Generating a New DataFrame:**
- The code iterates over each row in the original DataFrame and applies the parsing function to the PriceDetails column.

- For each row, it constructs a new row with the extracted features and appends it to a list.

- A new DataFrame is created from this list, with PropertyName set as the index.

In [1010]:
# Function to parse and extract the required features from the PriceDetails column
def refined_parse_modified(detail_str):
    try:
        details = json.loads(detail_str.replace("'", "\""))
    except (ValueError, AttributeError):
        return {}

    extracted = {}
    for bhk, detail in details.items():
        # Extract building type
        extracted[f'building type_{bhk}'] = detail.get('building_type')

        # Parsing area details
        area = detail.get('area', '')
        area_parts = area.split('-')
        if len(area_parts) == 1:
            try:
                value = float(area_parts[0].replace(',', '').replace(' sq.ft.', '').strip())
                extracted[f'area low {bhk}'] = value
                extracted[f'area high {bhk}'] = value
            except ValueError:
                extracted[f'area low {bhk}'] = None
                extracted[f'area high {bhk}'] = None
        elif len(area_parts) == 2:
            try:
                extracted[f'area low {bhk}'] = float(area_parts[0].replace(',', '').replace(' sq.ft.', '').strip())
                extracted[f'area high {bhk}'] = float(area_parts[1].replace(',', '').replace(' sq.ft.', '').strip())
            except ValueError:
                extracted[f'area low {bhk}'] = None
                extracted[f'area high {bhk}'] = None

        # Parsing price details
        price_range = detail.get('price-range', '')
        price_parts = price_range.split('-')
        if len(price_parts) == 2:
            try:
                extracted[f'price low {bhk}'] = float(price_parts[0].replace('₹', '').replace(' Cr', '').replace(' L', '').strip())
                extracted[f'price high {bhk}'] = float(price_parts[1].replace('₹', '').replace(' Cr', '').replace(' L', '').strip())
                if 'L' in price_parts[0]:
                    extracted[f'price low {bhk}'] /= 100
                if 'L' in price_parts[1]:
                    extracted[f'price high {bhk}'] /= 100
            except ValueError:
                extracted[f'price low {bhk}'] = None
                extracted[f'price high {bhk}'] = None

    return extracted

In [1011]:
# Apply the refined parsing and generate the new DataFrame structure
data_refined = []

for _, row in df.iterrows():
    features = refined_parse_modified(row['PriceDetails'])
    
    # Construct a new row for the transformed dataframe
    new_row = {'PropertyName': row['PropertyName']}
    
    # Populate the new row with extracted features
    for config in ['1 BHK', '2 BHK', '3 BHK', '4 BHK', '5 BHK', '6 BHK', '1 RK', 'Land']:
        new_row[f'building type_{config}'] = features.get(f'building type_{config}')
        new_row[f'area low {config}'] = features.get(f'area low {config}')
        new_row[f'area high {config}'] = features.get(f'area high {config}')
        new_row[f'price low {config}'] = features.get(f'price low {config}')
        new_row[f'price high {config}'] = features.get(f'price high {config}')
    
    data_refined.append(new_row)

df_refined = pd.DataFrame(data_refined).set_index('PropertyName')

In [1012]:
df_refined.sample(5)

PropertyName,building type_1 BHK,area low 1 BHK,area high 1 BHK,price low 1 BHK,price high 1 BHK,building type_2 BHK,area low 2 BHK,area high 2 BHK,price low 2 BHK,price high 2 BHK,...,building type_1 RK,area low 1 RK,area high 1 RK,price low 1 RK,price high 1 RK,building type_Land,area low Land,area high Land,price low Land,price high Land
Antriksh Heights,,,,,,Apartment,1125.0,1450.0,,,...,,,,,,,,,,
Birla Navya,,,,,,Independent Floor,763.81,893.94,1.7,4.48,...,,,,,,,538.2,4335.49,2.69,21.68
M3M Soulitude,,,,,,Independent Floor,1105.0,1105.0,88.0,0.9393,...,,,,,,,,,,
DLF Park Place,,,,,,Apartment,1985.0,1985.0,,,...,,,,,,,,,,
BPTP Park Serene,,,,,,Apartment,1540.0,1610.0,,,...,,,,,,,,,,
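In the sample above, `M3M Soulitude` has a `price low` of 88.0 and a `price high` of 0.9393, which suggests the lower bound was left in lakhs while the upper bound was converted to crores. This would happen if the raw string carries the unit only on the upper bound; that format (for example `'₹ 88 - 93.93 L'`) is an assumption here, not something shown in the notebook. A hedged sketch of a parser that lets a bound without a unit inherit the unit of the other bound; `parse_price_range` is a hypothetical helper.

```python
import re

def parse_price_range(price_range):
    """Return (low, high) in crores, or (None, None) if the text cannot be parsed."""
    parts = price_range.replace('₹', '').split('-')
    if len(parts) != 2:
        return None, None

    values, units = [], []
    for part in parts:
        match = re.search(r'([\d.,]+)\s*(Cr|L)?', part)
        if match is None:
            return None, None
        try:
            values.append(float(match.group(1).replace(',', '')))
        except ValueError:
            return None, None
        units.append(match.group(2))

    # A bound without a unit inherits the unit of the other bound (default: Cr)
    low_unit = units[0] or units[1] or 'Cr'
    high_unit = units[1] or units[0] or 'Cr'
    per_crore = {'Cr': 1.0, 'L': 100.0}   # 100 lakhs = 1 crore
    return values[0] / per_crore[low_unit], values[1] / per_crore[high_unit]

low, high = parse_price_range('₹ 88 - 93.93 L')   # assumed raw format
```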


In [1013]:
df_refined[['building type_Land']].sample(5)

PropertyName,building type_Land
Shree Vardhman Victoria,
JMS Prime Land,
DLF Princeton Estate,
Zara Rossa,
Krrish Florence Estate,


In [1014]:
# Replace empty strings in 'building type_Land' column with 'Land'
df_refined['building type_Land'] = df_refined['building type_Land'].replace({'':'Land'})

In [1015]:
df_refined.head(2)

PropertyName,building type_1 BHK,area low 1 BHK,area high 1 BHK,price low 1 BHK,price high 1 BHK,building type_2 BHK,area low 2 BHK,area high 2 BHK,price low 2 BHK,price high 2 BHK,...,building type_1 RK,area low 1 RK,area high 1 RK,price low 1 RK,price high 1 RK,building type_Land,area low Land,area high Land,price low Land,price high Land
Smartworld One DXP,,,,,,Apartment,1370.0,1370.0,2.0,2.4,...,,,,,,,,,,
M3M Crown,,,,,,,,,,,...,,,,,,,,,,


**Observation:**

- The `building type_*` columns are categorical.
- They need to be converted into numerical variables (one-hot encoding).
- We can use the `pd.get_dummies()` function for this.
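To make the effect of `drop_first=True` concrete, here is a toy example with a single categorical column; the four rows are invented for illustration. The first category in sorted order (`Apartment`) gets no column of its own, so an `Apartment` row and a row with a missing building type both end up with zeros in the dummy column.

```python
import pandas as pd

toy = pd.DataFrame({
    'building type_2 BHK': ['Apartment', 'Independent Floor', None, 'Apartment'],
    'area low 2 BHK': [1370.0, 981.0, None, 1125.0],
})

# One-hot encode the categorical column, dropping the first category
encoded = pd.get_dummies(toy, columns=['building type_2 BHK'], drop_first=True)

# Cast to int so the flags read as 0/1 regardless of the pandas version
flags = encoded['building type_2 BHK_Independent Floor'].astype(int).tolist()
```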


In [1016]:
categorical_columns = df_refined.select_dtypes(include=['object']).columns.tolist()
print("Categorical Columns:",categorical_columns)

# One Hot Encoding
ohe_df = pd.get_dummies(df_refined, columns=categorical_columns, drop_first=True)

# Fill the NaN values with 0
ohe_df.fillna(0,inplace=True)

ohe_df.head()

Categorical Columns: ['building type_1 BHK', 'building type_2 BHK', 'building type_3 BHK', 'building type_4 BHK', 'building type_5 BHK', 'building type_6 BHK', 'building type_1 RK', 'building type_Land']


PropertyName,area low 1 BHK,area high 1 BHK,price low 1 BHK,price high 1 BHK,area low 2 BHK,area high 2 BHK,price low 2 BHK,price high 2 BHK,area low 3 BHK,area high 3 BHK,...,building type_2 BHK_Independent Floor,building type_2 BHK_Service Apartment,building type_3 BHK_Independent Floor,building type_3 BHK_Service Apartment,building type_3 BHK_Villa,building type_4 BHK_Independent Floor,building type_4 BHK_Villa,building type_5 BHK_Independent Floor,building type_5 BHK_Villa,building type_6 BHK_Villa
Smartworld One DXP,0.0,0.0,0.0,0.0,1370.0,1370.0,2.0,2.4,1850.0,2050.0,...,False,False,False,False,False,False,False,False,False,False
M3M Crown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1605.0,2170.0,...,False,False,False,False,False,False,False,False,False,False
Adani Brahma Samsara Vilasa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1800.0,3150.0,...,False,False,True,False,False,True,False,False,False,False
Sobha City,0.0,0.0,0.0,0.0,1381.0,1692.0,1.55,3.21,1711.0,2343.0,...,False,False,False,False,False,False,False,False,False,False
Signature Global City 93,0.0,0.0,0.0,0.0,981.0,1118.0,0.9301,1.06,1235.0,1530.0,...,True,False,True,False,False,False,False,False,False,False


#### Step2 : Scaling the data

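- The features are on very different scales (areas in the thousands of sq.ft., prices in single-digit crores, one-hot flags that are 0 or 1), so we standardize them before computing similarities.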
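Standardization replaces each value x with (x − mean) / std, computed per column; `StandardScaler` uses the population standard deviation (`ddof=0`). A NumPy sketch of the same arithmetic on a made-up matrix:

```python
import numpy as np

# Toy feature matrix: column 0 is an area in sq.ft., column 1 a price in crores
X = np.array([[1370.0, 2.00],
              [ 981.0, 0.93],
              [1381.0, 1.55]])

# Standardise each column: subtract its mean, divide by its population std
mean = X.mean(axis=0)
std = X.std(axis=0)          # ddof=0 by default, matching StandardScaler
X_scaled = (X - mean) / std
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the cosine similarity purely because of its units.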

In [1017]:
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

# Apply the scaler to the entire dataframe
ohe_df_normalized = pd.DataFrame(scaler.fit_transform(ohe_df), columns=ohe_df.columns, index=ohe_df.index)
ohe_df_normalized.head()


PropertyName,area low 1 BHK,area high 1 BHK,price low 1 BHK,price high 1 BHK,area low 2 BHK,area high 2 BHK,price low 2 BHK,price high 2 BHK,area low 3 BHK,area high 3 BHK,...,building type_2 BHK_Independent Floor,building type_2 BHK_Service Apartment,building type_3 BHK_Independent Floor,building type_3 BHK_Service Apartment,building type_3 BHK_Villa,building type_4 BHK_Independent Floor,building type_4 BHK_Villa,building type_5 BHK_Independent Floor,building type_5 BHK_Villa,building type_6 BHK_Villa
Smartworld One DXP,-0.252266,-0.169584,-0.105197,-0.082332,1.223499,1.020101,-0.173712,1.158423,0.553787,0.370864,...,-0.28931,-0.063888,-0.372678,-0.063888,-0.171139,-0.254824,-0.236208,-0.111111,-0.216353,-0.063888
M3M Crown,-0.252266,-0.169584,-0.105197,-0.082332,-0.893541,-0.89666,-0.283546,-0.387986,0.293086,0.472749,...,-0.28931,-0.063888,-0.372678,-0.063888,-0.171139,-0.254824,-0.236208,-0.111111,-0.216353,-0.063888
Adani Brahma Samsara Vilasa,-0.252266,-0.169584,-0.105197,-0.082332,-0.893541,-0.89666,-0.283546,-0.387986,0.500583,1.304803,...,-0.28931,-0.063888,2.683282,-0.063888,-0.171139,3.924283,-0.236208,-0.111111,-0.216353,-0.063888
Sobha City,-0.252266,-0.169584,-0.105197,-0.082332,1.240497,1.47061,-0.198425,1.680336,0.405879,0.619632,...,-0.28931,-0.063888,-0.372678,-0.063888,-0.171139,-0.254824,-0.236208,-0.111111,-0.216353,-0.063888
Signature Global City 93,-0.252266,-0.169584,-0.105197,-0.082332,0.622383,0.667529,-0.232468,0.295011,-0.100626,-0.070634,...,3.456497,-0.063888,2.683282,-0.063888,-0.171139,-0.254824,-0.236208,-0.111111,-0.216353,-0.063888


#### Step3 : Compute the cosine similarity matrix
- We can use the `cosine_similarity()` function from the `sklearn.metrics.pairwise` module to compute the cosine similarity matrix.

In [1018]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity matrix
price_cosine_similarity = cosine_similarity(ohe_df_normalized)

print("Price Cosine Similarity Matrix Shape:",price_cosine_similarity.shape)


Price Cosine Similarity Matrix Shape: (246, 246)


In [1019]:
# price_cosine_similarity

#### Step4 : Get the recommendations for a property based on `price`

In [1020]:
def recommend_properties_with_scores(property_name, cosine_sim_matrix=price_cosine_similarity):
    
    # Get the similarity scores for the property using its name as the index
    sim_scores = list(enumerate(cosine_sim_matrix[ohe_df_normalized.index.get_loc(property_name)]))
    
    # Sort properties based on the similarity scores
    sorted_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # Get the scores of the 10 most similar properties
    top_indices = [i[0] for i in sorted_scores[1:11]]
    top_scores = [i[1] for i in sorted_scores[1:11]]
    
    # Retrieve the names of the top properties using the indices
    top_properties = ohe_df_normalized.index[top_indices].tolist()
    
    # Create a dataframe with the results
    recommendations_df = pd.DataFrame({
        'PropertyName': top_properties,
        'SimilarityScore': top_scores
    })
    recommendations_df.sort_values(by='SimilarityScore',ascending=False,inplace=True)

    return recommendations_df

# Test the recommender function using a property name
recommend_properties_with_scores('M3M Golf Hills')

Unnamed: 0,PropertyName,SimilarityScore
0,AIPL The Peaceful Homes,0.955462
1,Smartworld One DXP,0.95467
2,Unitech Escape,0.953092
3,M3M Capital,0.951156
4,BPTP Terra,0.943128
5,Sobha City,0.928748
6,Unitech Harmony,0.925164
7,Corona Optus,0.919231
8,Puri Emerald Bay,0.917345
9,Ireo Skyon,0.915991


---

## **Recommendation System 3 : Based on `Location`**

#### Step1 : Pre-processing the `LocationAdvantages` column

- Extract the distances for each location

In [1021]:
df[['PropertyName','LocationAdvantages','Link']]

df1 = df[['PropertyName','Link']]
df1

Unnamed: 0,PropertyName,Link
0,Smartworld One DXP,https://www.99acres.com/smartworld-one-dxp-sec...
1,M3M Crown,https://www.99acres.com/m3m-crown-sector-111-g...
2,Adani Brahma Samsara Vilasa,https://www.99acres.com/adani-brahma-samsara-v...
3,Sobha City,https://www.99acres.com/sobha-city-sector-108-...
4,Signature Global City 93,https://www.99acres.com/signature-global-city-...
...,...,...
242,DLF Princeton Estate,https://www.99acres.com/dlf-princeton-estate-d...
243,Pyramid Urban Homes 2,https://www.99acres.com/pyramid-urban-homes-2-...
244,Satya The Hermitage,https://www.99acres.com/satya-the-hermitage-se...
245,BPTP Spacio,https://www.99acres.com/bptp-spacio-sector-37d...


In [1022]:
# Function to convert the distance to meters
def distance_to_meters(distance_str):
    try:
        # Check if the distance is in kilometers
        if 'Km' in distance_str or 'KM' in distance_str:
            return float(distance_str.split()[0]) * 1000
        # Check if the distance is in meters
        elif 'Meter' in distance_str or 'meter' in distance_str:
            return float(distance_str.split()[0])
        else:
            return None
    except (TypeError, ValueError, IndexError, AttributeError):
        return None
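`distance_to_meters` recognises `Km`, `KM`, `Meter` and `meter`, so a value such as `'450 m'` (which appears in the `LocationAdvantages` preview for Signature Global City 93) falls through to `None`. A more tolerant sketch using a case-insensitive regular expression; `parse_distance_m` is a hypothetical name and the set of accepted unit spellings is an assumption.

```python
import re

# Number followed by a unit; alternatives are tried in order, \b forces a full unit word
_DISTANCE_RE = re.compile(
    r'^\s*([\d.]+)\s*(km|kms|m|meter|meters|metre|metres)\b',
    re.IGNORECASE,
)

def parse_distance_m(text):
    """Convert strings like '1.4 Km' or '450 m' to meters; return None if unparseable."""
    if not isinstance(text, str):
        return None
    match = _DISTANCE_RE.match(text)
    if match is None:
        return None
    try:
        value = float(match.group(1))
    except ValueError:
        return None
    unit = match.group(2).lower()
    return value * 1000 if unit.startswith('k') else value
```

Swapping this in would change which cells of `location_df` are `NaN`, so the downstream outputs would no longer match the ones shown here.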

In [1023]:
# Extract distances for each location
location_matrix = {}
for index, row in df.iterrows():
    distances = {}
    for location, distance in ast.literal_eval(row['LocationAdvantages']).items():
        distances[location] = distance_to_meters(distance)
    location_matrix[index] = distances

# Convert the dictionary to a dataframe
location_df = pd.DataFrame.from_dict(location_matrix, orient='index')

# Display the first few rows
location_df.head()

Unnamed: 0,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",Pacific D21 Mall,...,MCC Cricket Ground Dhankot,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park
0,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,7400.0,...,,,,,,,,,,
25,550.0,,,,,6700.0,3800.0,,,7500.0,...,,,,,,,,,,
37,5300.0,,,,2500.0,8800.0,,,,,...,,,,,,,,,,
69,1500.0,,,,6500.0,6700.0,5100.0,,,8200.0,...,,,,,,,,,,
9,,,,5500.0,,,,,,,...,,,,,,,,,,


In [1024]:
len(location_df.columns)

1070

---

- Many of the 1070 location columns appear to be near-duplicate names, so we find columns whose names match above a similarity threshold and merge them, keeping the smaller distance for each property. Adjust the `threshold` parameter to control how strict the matching is.

In [1025]:
from fuzzywuzzy import process

# Function to find similar columns
def find_similar_columns(columns, threshold=95):
    similar_columns = {}
    for col in columns:
        matches = process.extract(col, columns, limit=len(columns))
        for match in matches:
            if match[1] >= threshold and match[0] != col:
                if col not in similar_columns:
                    similar_columns[col] = []
                similar_columns[col].append(match[0])
    return similar_columns

# Find similar columns
columns = location_df.columns.tolist()
similar_columns = find_similar_columns(columns)

# Print shape before merging
print("Shape before merging:", location_df.shape)

# Merge similar columns
for key, values in similar_columns.items():
    if key in location_df.columns:
        for value in values:
            if value in location_df.columns:
                location_df[key] = location_df[[key, value]].min(axis=1)
                location_df.drop(columns=[value], inplace=True)

# Print shape after merging
print("Shape after merging:", location_df.shape)

# Optionally, save the similar columns mapping
import pickle
with open('similar_columns.pkl', 'wb') as f:
    pickle.dump(similar_columns, f)

Shape before merging: (246, 1070)
Shape after merging: (246, 901)
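The matching idea can be illustrated without fuzzywuzzy. The sketch below swaps in `difflib.SequenceMatcher` from the standard library, so its scores are not identical to fuzzywuzzy's and the threshold would need re-tuning; the column names and the 0.9 threshold are made up for the example.

```python
from difflib import SequenceMatcher

def similar_pairs(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio meets the threshold."""
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs

# Hypothetical column names: the first two differ by one missing letter
toy_columns = ['Dwarka Expressway', 'Dwarka Expresway', 'Park Hospital']
pairs = similar_pairs(toy_columns)
```

Comparing each pair only once also avoids the symmetric duplicates (A matches B, B matches A) that a per-column search produces.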


---

In [1026]:
# Fill NaN values with the maximum distance found anywhere in the dataframe,
# so that a location not listed for a property is treated as far away

max_value = location_df.max().max()
print("max_value in location_df:",max_value)

location_df.fillna(max_value, inplace=True)
location_df.head()

max_value in location_df: 54500.0


Unnamed: 0,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",Pacific D21 Mall,...,MCC Cricket Ground Dhankot,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park
0,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,7400.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
25,550.0,54500.0,54500.0,54500.0,54500.0,6700.0,3800.0,54500.0,54500.0,7500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
37,5300.0,54500.0,54500.0,54500.0,2500.0,8800.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
69,1500.0,54500.0,54500.0,54500.0,6500.0,6700.0,5100.0,54500.0,8100.0,8200.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
9,54500.0,54500.0,54500.0,5500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0


---

In [1027]:
location_df.index = df.PropertyName
location_df.head()

PropertyName,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",Pacific D21 Mall,...,MCC Cricket Ground Dhankot,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park
Smartworld One DXP,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,7400.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
M3M Crown,550.0,54500.0,54500.0,54500.0,54500.0,6700.0,3800.0,54500.0,54500.0,7500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
Adani Brahma Samsara Vilasa,5300.0,54500.0,54500.0,54500.0,2500.0,8800.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
Sobha City,1500.0,54500.0,54500.0,54500.0,6500.0,6700.0,5100.0,54500.0,8100.0,8200.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
Signature Global City 93,54500.0,54500.0,54500.0,5500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0


In [1028]:
# Left-join df1 on PropertyName to attach each property's Link
location_df = location_df.merge(df1, on='PropertyName', how='left')
location_df

Unnamed: 0,PropertyName,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",...,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park,Link
0,Smartworld One DXP,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/smartworld-one-dxp-sec...
1,M3M Crown,550.0,54500.0,54500.0,54500.0,54500.0,6700.0,3800.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/m3m-crown-sector-111-g...
2,Adani Brahma Samsara Vilasa,5300.0,54500.0,54500.0,54500.0,2500.0,8800.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/adani-brahma-samsara-v...
3,Sobha City,1500.0,54500.0,54500.0,54500.0,6500.0,6700.0,5100.0,54500.0,8100.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/sobha-city-sector-108-...
4,Signature Global City 93,54500.0,54500.0,54500.0,5500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/signature-global-city-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,DLF Princeton Estate,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/dlf-princeton-estate-d...
242,Pyramid Urban Homes 2,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/pyramid-urban-homes-2-...
243,Satya The Hermitage,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/satya-the-hermitage-se...
244,BPTP Spacio,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/bptp-spacio-sector-37d...


In [1029]:
# Freeze location_df at this point; it is reused later to customise the recommendation system

location_df.to_pickle("location_df.pkl")
location_df.to_pickle("../backend/models/location_df.pkl")

#### Step2 : Scaling the data

In [1031]:
# load location_df
location_df = pd.read_pickle("location_df.pkl")

In [1032]:
location_df.reset_index(drop=True, inplace=True)
location_df.drop(columns=['Link'],inplace=True)



In [1035]:
location_df.drop(columns=['PropertyName'],inplace=True)

In [1036]:
location_df

Unnamed: 0,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",Pacific D21 Mall,...,MCC Cricket Ground Dhankot,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park
0,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,7400.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
1,550.0,54500.0,54500.0,54500.0,54500.0,6700.0,3800.0,54500.0,54500.0,7500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
2,5300.0,54500.0,54500.0,54500.0,2500.0,8800.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
3,1500.0,54500.0,54500.0,54500.0,6500.0,6700.0,5100.0,54500.0,8100.0,8200.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
4,54500.0,54500.0,54500.0,5500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
242,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
243,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0
244,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0


In [1037]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()


# Apply the scaler to the entire dataframe
location_df_normalized = pd.DataFrame(scaler.fit_transform(location_df), columns=location_df.columns, index=location_df.index)

print("location_df_normalized:",location_df_normalized.shape)
location_df_normalized.head()

location_df_normalized: (246, 901)


Unnamed: 0,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",Pacific D21 Mall,...,MCC Cricket Ground Dhankot,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park
0,-7.95929,-15.652476,-15.652476,-3.148555,-2.902773,-3.145418,-3.724986,-10.240056,-5.155383,-6.02129,...,0.0,0.063888,0.063888,0.063888,0.063888,0.063888,0.063888,0.0,0.063888,0.063888
1,-7.996942,0.063888,0.063888,0.328288,0.376078,-3.05325,-3.529577,0.090313,0.205673,-6.008143,...,0.0,0.063888,0.063888,0.063888,0.063888,0.063888,0.063888,0.0,0.063888,0.063888
2,-7.281544,0.063888,0.063888,0.328288,-3.061427,-2.904363,0.280892,0.090313,0.205673,0.171075,...,0.0,0.063888,0.063888,0.063888,0.063888,0.063888,0.063888,0.0,0.063888,0.063888
3,-7.853863,0.063888,0.063888,0.328288,-2.797004,-3.05325,-3.431873,0.090313,-5.053375,-5.916112,...,0.0,0.063888,0.063888,0.063888,0.063888,0.063888,0.063888,0.0,0.063888,0.063888
4,0.128478,0.063888,0.063888,-2.986212,0.376078,0.335702,0.280892,0.090313,0.205673,0.171075,...,0.0,0.063888,0.063888,0.063888,0.063888,0.063888,0.063888,0.0,0.063888,0.063888
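
A minimal sketch of what the scaling step does, on a toy frame rather than the real data. The landmark names and property labels are hypothetical, and treating `54500.0` as the fill value for landmarks that are not listed for a property is an assumption read off the tables above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy distances in metres; 54500.0 is assumed to be the "not listed" fill value
toy = pd.DataFrame(
    {
        "Landmark A": [800.0, 550.0, 54500.0],
        "Landmark B": [54500.0, 54500.0, 54500.0],  # constant column
        "Landmark C": [54500.0, 54500.0, 3100.0],   # listed for one property only
    },
    index=["P1", "P2", "P3"],
)

# Each column is centred on its mean and divided by its population standard deviation
toy_scaled = pd.DataFrame(
    StandardScaler().fit_transform(toy), columns=toy.columns, index=toy.index
)
print(toy_scaled.round(3))
```

A constant column has zero variance, so it scales to `0.0` everywhere, which would explain the all-zero columns in the output above. A column with a single real value among `n` rows maps the fill rows to `1/sqrt(n-1)` and the real value to `-sqrt(n-1)`; with `n = 246` that gives roughly `0.0639` and `-15.65`, matching the values seen in the normalized table.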


#### Step3 : Compute the cosine similarity matrix

In [1038]:
location_cosine_similarity = cosine_similarity(location_df_normalized)

print("location_cosine_similarity:",location_cosine_similarity.shape)

location_cosine_similarity: (246, 246)


In [None]:
location_cosine_similarity

array([[ 1.        ,  0.15764489,  0.11576865, ..., -0.01559374,
        -0.07070775, -0.07070775],
       [ 0.15764489,  1.        ,  0.18226733, ..., -0.01277955,
        -0.02025407, -0.02025407],
       [ 0.11576865,  0.18226733,  1.        , ..., -0.01647841,
        -0.02681601, -0.02681601],
       ...,
       [-0.01559374, -0.01277955, -0.01647841, ...,  1.        ,
         0.05181754,  0.05181754],
       [-0.07070775, -0.02025407, -0.02681601, ...,  0.05181754,
         1.        ,  1.        ],
       [-0.07070775, -0.02025407, -0.02681601, ...,  0.05181754,
         1.        ,  1.        ]])
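
To make the structure of this matrix easier to read, here is a small sketch on made-up feature rows (the numbers are hypothetical, not taken from `location_df_normalized`). It shows three properties visible in the output above: the matrix is symmetric, the diagonal is 1, and two identical rows score 1 against each other.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical standardized feature rows, one row per property
toy_features = np.array([
    [-2.0, 0.5, 0.5],    # property 0
    [-1.0, 0.25, 0.25],  # property 1: same direction as property 0
    [0.5, -2.0, 0.5],    # property 2
    [0.5, -2.0, 0.5],    # property 3: identical to property 2
])

# Entry [i, j] is the cosine of the angle between rows i and j
toy_sim = cosine_similarity(toy_features)
print(toy_sim.round(3))
```

Because cosine similarity compares direction only, property 1 scores 1.0 against property 0 even though its values are half as large. Negative entries mean two properties deviate from the column means in opposite directions.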

#### Step4 : Get the recommendations for a property based on `location`

In [1040]:
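# PropertyName was dropped from location_df before scaling, so names are looked up
# in the frozen pickle, which keeps the same row order as the similarity matrix.
# (The earlier lookup, location_df.index.get_loc(property_name), ran against the
# integer index and raised KeyError: 'Smartworld One DXP'.)
property_names = pd.read_pickle("location_df.pkl")["PropertyName"].reset_index(drop=True)

def get_location_based_recommendations(property_name, top_n=10):
    # Get the row position of the property that matches property_name
    matches = property_names[property_names == property_name].index
    if len(matches) == 0:
        raise KeyError(f"Unknown property: {property_name!r}")
    idx = matches[0]

    # Get the pairwise similarity scores of all properties with that property
    sim_scores = list(enumerate(location_cosine_similarity[idx]))

    # Sort by similarity, most similar first (higher cosine similarity = closer match)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the property itself, then keep the top_n most similar properties
    sim_scores = [s for s in sim_scores if s[0] != idx][:top_n]

    # Return the names and scores of the top_n most similar properties
    return pd.DataFrame({
        'PropertyName': [property_names[i] for i, _ in sim_scores],
        'SimilarityScore': [score for _, score in sim_scores],
    })

# Example usage
get_location_based_recommendations('Smartworld One DXP', top_n=5)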

----

## **Final Recommendation System**

- We combine the three similarity matrices into a single score using a weighted sum.
- The weights are set manually: 30 for location, 20 for price, and 8 for facilities.
- The weights can be adjusted to reflect how much each similarity matrix should influence the final ranking.
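
The combination described above can be sketched on toy matrices. The three 3x3 matrices below are hypothetical stand-ins for the real similarity matrices; only the weights (30, 20, 8) come from this notebook. The sketch also shows that dividing by the total weight, which turns the weighted sum into a true weighted average, leaves the ranking unchanged.

```python
import numpy as np

# Hypothetical similarity matrices for three properties
toy_location = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.6], [0.1, 0.6, 1.0]])
toy_price = np.array([[1.0, 0.9, 0.3], [0.9, 1.0, 0.4], [0.3, 0.4, 1.0]])
toy_facilities = np.array([[1.0, 0.5, 0.8], [0.5, 1.0, 0.7], [0.8, 0.7, 1.0]])

weights = {"location": 30, "price": 20, "facilities": 8}

# Weighted sum of the three matrices
weighted_sum = (
    weights["location"] * toy_location
    + weights["price"] * toy_price
    + weights["facilities"] * toy_facilities
)

# Dividing by the total weight rescales the scores so the diagonal is 1 again
weighted_avg = weighted_sum / sum(weights.values())

# Neighbours of property 0, most similar first, under both forms
rank_sum = np.argsort(-weighted_sum[0])
rank_avg = np.argsort(-weighted_avg[0])
print(rank_sum, rank_avg)
```

Only the ratio between the weights affects the ordering, so 30/20/8 and 15/10/4 would recommend the same properties.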

In [1044]:
def recommend_properties_with_scores(property_name, cosine_sim_matrix, top_n=5):

    # location_df_normalized has an integer index, so map names to row positions
    # using the PropertyName column stored in the frozen pickle (same row order)
    names = pd.read_pickle('location_df.pkl')['PropertyName'].reset_index(drop=True)
    matches = names[names == property_name].index
    if len(matches) == 0:
        raise KeyError(f"Unknown property: {property_name!r}")
    idx = matches[0]

    # Get the similarity scores for the property
    sim_scores = list(enumerate(cosine_sim_matrix[idx]))

    # Sort properties by similarity score, most similar first
    sorted_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the property itself, then keep the top_n most similar properties
    top = [s for s in sorted_scores if s[0] != idx][:top_n]

    # Create a dataframe with the names and scores of the top properties
    recommendations_df = pd.DataFrame({
        'PropertyName': [names[i] for i, _ in top],
        'SimilarityScore': [score for _, score in top]
    })

    return recommendations_df


In [None]:

cosine_sim_matrix = 30*location_cosine_similarity + 20*price_cosine_similarity + 8*facilities_cosine_similarity

# Test the recommender function using a property name
recommend_properties_with_scores('Ireo Victory Valley',cosine_sim_matrix, top_n =5)


In [1042]:
# Save the combined matrix
np.save('../backend/models/cosine_sim_matrix.npy', cosine_sim_matrix)

**End**

---

## Property API

In [1043]:
# load the location_df
location_df = pd.read_pickle('location_df.pkl')
print("Loaded location_df:", location_df.shape)
location_df.head()

Loaded location_df: (246, 903)


Unnamed: 0,PropertyName,Bajghera Road,Palam Vihar Halt,DPSG Palam Vihar,Park Hospital,Gurgaon Railway Station,The NorthCap University,Dwarka Expy,Hyatt Place Gurgaon Udyog Vihar,"Dwarka Sector 21, Metro Station",...,The Shri Ram School Aravali,Taj City Centre Gurugram,Minda Industries Corporate Office,"Rampura Flyover, Naurangpur Rd",Manesar toll plaza - Kherki Daula,"Imt Manesar, Gurugram",Holiday Inn,Sector 84 Road,Skyview Corporate Park,Link
0,Smartworld One DXP,800.0,2500.0,3100.0,3100.0,4900.0,5400.0,1200.0,7700.0,7200.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/smartworld-one-dxp-sec...
1,M3M Crown,550.0,54500.0,54500.0,54500.0,54500.0,6700.0,3800.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/m3m-crown-sector-111-g...
2,Adani Brahma Samsara Vilasa,5300.0,54500.0,54500.0,54500.0,2500.0,8800.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/adani-brahma-samsara-v...
3,Sobha City,1500.0,54500.0,54500.0,54500.0,6500.0,6700.0,5100.0,54500.0,8100.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/sobha-city-sector-108-...
4,Signature Global City 93,54500.0,54500.0,54500.0,5500.0,54500.0,54500.0,54500.0,54500.0,54500.0,...,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,54500.0,https://www.99acres.com/signature-global-city-...


In [None]:
import json

with open('location_df_columns.json', 'w') as f:
    json.dump(location_df.columns.to_list(), f)
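
A minimal sketch of how a consumer might read this file back, assuming it wants only the landmark names. The toy frame and the temporary directory are illustrative; the one detail taken from the notebook is that `location_df.columns` also contains `PropertyName` and `Link`, which are not landmarks.

```python
import json
import os
import tempfile

import pandas as pd

# Toy frame with the same column layout as location_df: name, landmarks, link
toy_df = pd.DataFrame({
    "PropertyName": ["P1"],
    "Landmark A": [800.0],
    "Landmark B": [54500.0],
    "Link": ["https://example.com/p1"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "location_df_columns.json")

    # Write the column names, as in the cell above
    with open(path, "w") as f:
        json.dump(toy_df.columns.to_list(), f)

    # Read them back as a plain Python list
    with open(path) as f:
        columns = json.load(f)

# Keep only the landmark columns
landmark_columns = [c for c in columns if c not in ("PropertyName", "Link")]
print(landmark_columns)
```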

---

## Recommendation API