# property_name: The name or title of the property being listed for sale or rent.

# link: A URL or hyperlink to the property's listing page, often leading to more detailed information or media (images, videos) about the property.

# society: The name of the residential complex or society where the property is located.

# price: The asking price of the property, typically expressed in the local currency (e.g., INR for Indian listings).

# rate: The price per unit area of the property (per square foot or per square meter).

# area: The total size of the property, usually in square feet or square meters.

# areaWithType: The area of the property together with the kind of area being measured, such as "Plot area 290(242.48 sq.m.)" or "Carpet area: 535 (49.7 sq.m.)". The unit of the leading number is not always stated.

# bedRoom: The number of bedrooms in the property.

# bathroom: The number of bathrooms in the property.

# balcony: Indicates if the property has a balcony or not, and might provide additional details like the number of balconies or their sizes.

# additionalRoom: Additional rooms in the property, such as a study, servant room, or utility room.

# address: The physical address of the property, providing details about its location (e.g., city, street name, zip code).

# noOfFloor: The total number of floors in the building or complex where the property is located.

# facing: The direction the property faces, such as North, South, East, or West, which can influence sunlight, ventilation, and view.

# agePossession: The age of the property (in years) and/or the date when the property will be available for possession (e.g., "Ready to move", "Under construction").

# nearbyLocations: Points of interest or amenities located near the property, such as schools, hospitals, shopping centers, or transportation hubs.

# description: A detailed description or narrative about the property, highlighting its features, benefits, or unique selling points.

# furnishDetails: Information about the furnishing status of the property, such as whether it is furnished, semi-furnished, or unfurnished.

# features: Specific features or amenities available in the property, like gym, swimming pool, parking, security, etc.

# rating: Ratings based on reviews or feedback from previous tenants or buyers. In this dataset the value is a list of per-category scores out of 5 (e.g., 'Environment5 out of 5', 'Lifestyle4 out of 5').

# property_id: A unique identifier assigned to each property listing for tracking and referencing.

In [None]:
## import libraries
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [None]:
## load data
df=pd.read_csv('/content/independent-house -excel-cleaned.csv')

In [None]:
# shape
df.shape

(1036, 21)

In [None]:
## top 5 rows
df.head()

Unnamed: 0,property_name,link,society,price,rate,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,noOfFloor,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating,property_id
0,5 Bedroom House for sale in Sector 70A Gurgaon,https://www.99acres.com/5-bhk-bedroom-independ...,Bptp Visionnaire,5.25 Crore,"₹ 20,115/sq.ft.",(242 sq.m.) Plot Area,Plot area 290(242.48 sq.m.),5 Bedrooms,4 Bathrooms,3+ Balconies,Servant Room,"29b, Sector 70A Gurgaon, Gurgaon, Haryana",3 Floors,North-East,0 to 1 Year Old,"['Good Earth City Center 2', 'Kunskapsskolan I...",Do you wish to buy an independent house in sec...,"['1 Wardrobe', '1 Fan', '1 Exhaust Fan', '1 Ge...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle4 out of 5...",K70037724
1,5 Bedroom House for sale in Sector 21A Faridabad,https://www.99acres.com/5-bhk-bedroom-independ...,,5.7 Crore,"₹ 105,751/sq.ft.",(50 sq.m.) Plot Area,Plot area 539(50.07 sq.m.),5 Bedrooms,4 Bathrooms,2 Balconies,"Store Room,Pooja Room,Servant Room","Sector 21A Faridabad, Gurgaon, Haryana",2 Floors,,5 to 10 Year Old,,"Hi, we have an independent house/villa availab...","['1 Water Purifier', '5 Fan', '1 Exhaust Fan',...","['Private Garden / Terrace', 'Park', 'Visitor ...",,E69288322
2,10 Bedroom House for sale in Sushant Lok Phase 1,https://www.99acres.com/10-bhk-bedroom-indepen...,,2.1 Crore,"₹ 38,251/sq.ft.",(51 sq.m.) Plot Area,Plot area 61(51 sq.m.),10 Bedrooms,10 Bathrooms,3+ Balconies,Servant Room,"Sushant Lok Phase 1, Gurgaon, Haryana",5 Floors,West,0 to 1 Year Old,"['Sector 42-43 metro station', 'Huda city cent...","Monthly rental income is rs1,40,000/- Best opt...","['10 Bed', '3 Fan', '10 Geyser', '2 Light', 'N...","['Maintenance Staff', 'Water Storage', 'Visito...","['Environment5 out of 5', 'Lifestyle5 out of 5...",F69536898
3,21 Bedroom House for sale in Sector 54 Gurgaon,https://www.99acres.com/21-bhk-bedroom-indepen...,,5 Crore,"₹ 43,066/sq.ft.",(108 sq.m.) Plot Area,Plot area 129(107.86 sq.m.),21 Bedrooms,21 Bathrooms,3+ Balconies,Servant Room,"Sector 54 Gurgaon, Gurgaon, Haryana",5 Floors,North,1 to 5 Year Old,"['Sector 53-54 metro station', 'Sector 54 chow...","129 sq yd plot size. 5 floors built up , fully...","['1 Water Purifier', '21 Fan', '1 Fridge', '1 ...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment4 out of 5', 'Lifestyle5 out of 5...",R69483164
4,12 Bedroom House for sale in Sushant Lok Phase 1,https://www.99acres.com/12-bhk-bedroom-indepen...,,3 Crore,"₹ 53,763/sq.ft.",(52 sq.m.) Plot Area,Plot area 62(51.84 sq.m.),12 Bedrooms,12 Bathrooms,3+ Balconies,Others,"1228, Sushant Lok Phase 1, Gurgaon, Haryana",5 Floors,West,Within 6 months,"['Sector 42-43 metro station', 'Huda city cent...",Best for investment purpose. Fully furnished b...,"['1 Water Purifier', '1 Fridge', '12 Fan', '1 ...","['Maintenance Staff', 'Water Storage', 'Visito...","['Environment5 out of 5', 'Lifestyle5 out of 5...",M69381272


In [None]:
## information about data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1036 entries, 0 to 1035
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   property_name    1036 non-null   object
 1   link             1036 non-null   object
 2   society          452 non-null    object
 3   price            967 non-null    object
 4   rate             998 non-null    object
 5   area             1036 non-null   object
 6   areaWithType     986 non-null    object
 7   bedRoom          986 non-null    object
 8   bathroom         986 non-null    object
 9   balcony          986 non-null    object
 10  additionalRoom   589 non-null    object
 11  address          1030 non-null   object
 12  noOfFloor        966 non-null    object
 13  facing           674 non-null    object
 14  agePossession    986 non-null    object
 15  nearbyLocations  912 non-null    object
 16  description      1035 non-null   object
 17  furnishDetails   742 non-null    

In [None]:
## drop irrelevant columns
df=df.drop(columns=['link','property_id'])
df.shape

(1036, 19)

In [None]:
## add a Property_type column
df.insert(loc=1,column='Property_type',value='house')

In [None]:
df.head()

Unnamed: 0,property_name,Property_type,society,price,rate,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,noOfFloor,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,5 Bedroom House for sale in Sector 70A Gurgaon,house,Bptp Visionnaire,5.25 Crore,"₹ 20,115/sq.ft.",(242 sq.m.) Plot Area,Plot area 290(242.48 sq.m.),5 Bedrooms,4 Bathrooms,3+ Balconies,Servant Room,"29b, Sector 70A Gurgaon, Gurgaon, Haryana",3 Floors,North-East,0 to 1 Year Old,"['Good Earth City Center 2', 'Kunskapsskolan I...",Do you wish to buy an independent house in sec...,"['1 Wardrobe', '1 Fan', '1 Exhaust Fan', '1 Ge...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle4 out of 5..."
1,5 Bedroom House for sale in Sector 21A Faridabad,house,,5.7 Crore,"₹ 105,751/sq.ft.",(50 sq.m.) Plot Area,Plot area 539(50.07 sq.m.),5 Bedrooms,4 Bathrooms,2 Balconies,"Store Room,Pooja Room,Servant Room","Sector 21A Faridabad, Gurgaon, Haryana",2 Floors,,5 to 10 Year Old,,"Hi, we have an independent house/villa availab...","['1 Water Purifier', '5 Fan', '1 Exhaust Fan',...","['Private Garden / Terrace', 'Park', 'Visitor ...",
2,10 Bedroom House for sale in Sushant Lok Phase 1,house,,2.1 Crore,"₹ 38,251/sq.ft.",(51 sq.m.) Plot Area,Plot area 61(51 sq.m.),10 Bedrooms,10 Bathrooms,3+ Balconies,Servant Room,"Sushant Lok Phase 1, Gurgaon, Haryana",5 Floors,West,0 to 1 Year Old,"['Sector 42-43 metro station', 'Huda city cent...","Monthly rental income is rs1,40,000/- Best opt...","['10 Bed', '3 Fan', '10 Geyser', '2 Light', 'N...","['Maintenance Staff', 'Water Storage', 'Visito...","['Environment5 out of 5', 'Lifestyle5 out of 5..."
3,21 Bedroom House for sale in Sector 54 Gurgaon,house,,5 Crore,"₹ 43,066/sq.ft.",(108 sq.m.) Plot Area,Plot area 129(107.86 sq.m.),21 Bedrooms,21 Bathrooms,3+ Balconies,Servant Room,"Sector 54 Gurgaon, Gurgaon, Haryana",5 Floors,North,1 to 5 Year Old,"['Sector 53-54 metro station', 'Sector 54 chow...","129 sq yd plot size. 5 floors built up , fully...","['1 Water Purifier', '21 Fan', '1 Fridge', '1 ...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment4 out of 5', 'Lifestyle5 out of 5..."
4,12 Bedroom House for sale in Sushant Lok Phase 1,house,,3 Crore,"₹ 53,763/sq.ft.",(52 sq.m.) Plot Area,Plot area 62(51.84 sq.m.),12 Bedrooms,12 Bathrooms,3+ Balconies,Others,"1228, Sushant Lok Phase 1, Gurgaon, Haryana",5 Floors,West,Within 6 months,"['Sector 42-43 metro station', 'Huda city cent...",Best for investment purpose. Fully furnished b...,"['1 Water Purifier', '1 Fridge', '12 Fan', '1 ...","['Maintenance Staff', 'Water Storage', 'Visito...","['Environment5 out of 5', 'Lifestyle5 out of 5..."


## Clean the columns one by one

In [None]:
## society
df['society'].value_counts(dropna=False)

society,count
,584
Emaar MGF Marbella,26
Vipul Tatvam Villa,26
International City by SOBHA Phase 2,26
International City by Sobha Phase 1,23
Unitech Uniworld Resorts,13
DLF City Plots,11
DLF City Plots Phase 2,11
Unitech Espace,11
Bptp Visionnaire,9


In [None]:
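# keep only the text before the first number that follows a non-digit and is followed by a non-word character
df['society']=df['society'].str.split(r'(?<=\D)(\d+\.\d+|\d+)(?=\W)').str.get(0)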
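
To see what this pattern does, the same split can be tried on single strings with Python's `re` module. This is only a sketch: `strip_trailing_number` is a hypothetical helper name, and the two sample strings other than the SOBHA one are made up for illustration. It shows that a number at the very end of a name is left alone, that a number glued to the name and followed by more text is cut off, and that a number in the middle of a name also triggers the split.

```python
import re

# Same pattern as in the pandas cell above.
PATTERN = r'(?<=\D)(\d+\.\d+|\d+)(?=\W)'

def strip_trailing_number(name):
    """Keep only the text before the first matching number (hypothetical helper)."""
    return re.split(PATTERN, name)[0]

# The final "2" is not followed by a non-word character, so nothing is split.
print(strip_trailing_number('International City by SOBHA Phase 2'))

# Made-up input: a score glued to the name and followed by more text is removed.
print(strip_trailing_number('Example Society4.1 stars'))

# Made-up input: a number inside the name truncates it (note the trailing space).
print(repr(strip_trailing_number('Phase 2 Villas')))
```

Because of the last case, it may be worth comparing `value_counts` before and after the split and stripping whitespace from the result.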

In [None]:
##price
# drop rows whose price is the literal header text 'price', and listings without a numeric price
df = df[df['price'] != 'price']
df = df[df['price'] != 'Price on Request']

In [None]:
def treat_price(x):
    # Ensure the input is a string before processing
    if isinstance(x, str):
        # Normalize the string (remove spaces and handle uppercase/lowercase)
        x = x.strip().lower()

        # Split the string into number and unit part
        parts = x.split()

        if len(parts) == 2:
            num = float(parts[0])
            unit = parts[1]

            # Convert to crore (1 crore = 100 lac)
            if unit == 'lac':
                return np.round(num / 100, 2)
            elif unit in ['crore', 'cr']:
                return np.round(num, 2)  # already in crore
            return np.nan  # unrecognised unit
        else:
            # Not a "<number> <unit>" pair: treat the first token as a value in crore
            return np.round(float(parts[0]), 2)
    else:
        return x  # If it's not a string, return the value as is
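
The unit handling can also be written as a lookup table, which makes the conversion factors explicit and returns NaN for anything that cannot be parsed. The sketch below is an alternative, not the function used in this notebook: `price_in_crore` and `UNIT_TO_CRORE` are hypothetical names, and `'95 Lac'` is a made-up input (the rows shown above are all in crore).

```python
import math

# Multipliers that convert each unit to crore (1 crore = 100 lac).
UNIT_TO_CRORE = {'lac': 0.01, 'crore': 1.0, 'cr': 1.0}

def price_in_crore(text):
    """Parse strings such as '5.25 Crore' into a float in crore (hypothetical helper)."""
    parts = text.strip().lower().split()
    try:
        number = float(parts[0])
    except (IndexError, ValueError):
        return math.nan                      # empty or non-numeric, e.g. 'Price on Request'
    if len(parts) == 1:
        return round(number, 2)              # bare number, assumed to be in crore
    if len(parts) == 2 and parts[1] in UNIT_TO_CRORE:
        return round(number * UNIT_TO_CRORE[parts[1]], 2)
    return math.nan                          # unrecognised unit or extra tokens

print(price_in_crore('5.25 Crore'))
print(price_in_crore('95 Lac'))
print(price_in_crore('Price on Request'))
```

With a helper like this, the two filtering steps on `price` would not be strictly required, since unparseable values simply become NaN.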

In [None]:
df['price']=df['price'].apply(treat_price)
df['price']=df['price'].astype(float)

In [None]:
df['price']

Unnamed: 0,price
0,5.25
1,5.7
2,2.1
3,5.0
4,3.0
5,4.5
6,12.0
7,20.0
8,10.85
9,1.95


In [None]:
##rate
## rename column ('sqrt' in the new name is shorthand for square foot, not square root)
df.rename(columns={'rate':'Price_per_sqrt'},inplace=True)

In [None]:
df['Price_per_sqrt']=df['Price_per_sqrt'].str.replace('/sq.ft.','',regex=False).str.replace(',','',regex=False).str.replace('₹','',regex=False).str.strip()
df['Price_per_sqrt']=df['Price_per_sqrt'].astype(float)

In [None]:
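##area
# area (sq.ft.) = price in rupees / price per sq.ft.; price is in crore and 1 crore = 10,000,000 rupees
df=df.drop(columns=['area'])
df.insert(loc=5,column='Area',value=(np.round((df['price']*10000000)/df['Price_per_sqrt'],2)))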
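
As a quick check of this formula, the calculation can be repeated by hand for a single listing. The sketch below uses the values of the sampled row with index 834 shown further down (price 3.3 crore, 19820.0 per sq.ft.), for which the notebook reports an `Area` of 1664.98. `area_sqft` and `RUPEES_PER_CRORE` are hypothetical names introduced only for this example.

```python
RUPEES_PER_CRORE = 10_000_000  # 1 crore = 10,000,000 rupees

def area_sqft(price_crore, rate_per_sqft):
    """Back out the area in sq.ft. from the total price and the price per sq.ft."""
    return round(price_crore * RUPEES_PER_CRORE / rate_per_sqft, 2)

# Values taken from the sampled row with index 834.
print(area_sqft(3.3, 19820.0))
```

That row's `areaWithType` reads "Plot area 185(154.68 sq.m.)"; 185 sq.yd. is 1665 sq.ft., so the derived figure is consistent with the listed plot size.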

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1026 entries, 0 to 1035
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   property_name    1026 non-null   object 
 1   Property_type    1026 non-null   object 
 2   society          452 non-null    object 
 3   price            957 non-null    float64
 4   Price_per_sqrt   998 non-null    float64
 5   Area             957 non-null    float64
 6   areaWithType     976 non-null    object 
 7   bedRoom          976 non-null    object 
 8   bathroom         976 non-null    object 
 9   balcony          976 non-null    object 
 10  additionalRoom   589 non-null    object 
 11  address          1020 non-null   object 
 12  noOfFloor        956 non-null    object 
 13  facing           673 non-null    object 
 14  agePossession    976 non-null    object 
 15  nearbyLocations  909 non-null    object 
 16  description      1025 non-null   object 
 17  furnishDetails   73

In [None]:
##bedroom
df['bedRoom']=df['bedRoom'].str.replace('Bedrooms','').str.replace('Bedroom','').str.strip()
df['bedRoom']=df['bedRoom'].astype(float)

In [None]:
##bathroom
df['bathroom']=df['bathroom'].str.replace('Bathrooms','').str.replace('Bathroom','').str.strip()
df['bathroom']= df['bathroom'].astype(float)

In [None]:
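##balcony
# "No Balcony" -> 0, "3+ Balconies" -> 3 (the "+" is dropped, so 3 means "3 or more")
balcony = df['balcony'].str.replace('No Balcony', '0', regex=False).str.replace('+', '', regex=False)
df['balcony'] = balcony.str.replace('Balconies', '', regex=False).str.replace('Balcony', '', regex=False).str.strip()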

df['balcony']=df['balcony'].astype(float)
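
The balcony strings can also be parsed with a single regular-expression search instead of chained replacements. This is an alternative sketch rather than what the cell above does; `balcony_count` is a hypothetical helper name. Like the cell above, it collapses "3+" to 3, so the "or more" information is lost.

```python
import math
import re

def balcony_count(text):
    """Map strings like '3+ Balconies' or 'No Balcony' to a float (hypothetical helper)."""
    if text.strip().lower().startswith('no'):
        return 0.0
    match = re.search(r'\d+', text)          # first run of digits, if any
    return float(match.group()) if match else math.nan

for raw in ['3+ Balconies', '2 Balconies', 'No Balcony']:
    print(raw, '->', balcony_count(raw))
```

It could be applied to the column with `df['balcony'].apply(...)` after guarding against missing values, which are not strings.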

In [None]:
##additionalRoom
df['additionalRoom'].value_counts(dropna=False)

additionalRoom,count
,437
Servant Room,82
"Pooja Room,Study Room,Servant Room,Store Room",63
Others,59
Pooja Room,38
"Pooja Room,Study Room,Servant Room",34
"Pooja Room,Study Room,Servant Room,Others",32
Store Room,31
"Pooja Room,Servant Room",24
"Study Room,Servant Room",19


In [None]:
# df['additionalRoom']=df['additionalRoom'].fillna('Not available')
# def additionalRoom_processing(x):
#   if x =='Not available':
#     return 0
#   else:
#     return 1
# df['additionalRoom']=df['additionalRoom'].apply(additionalRoom_processing)
# df['additionalRoom']=df['additionalRoom'].astype(int)

In [None]:
df.sample()

Unnamed: 0,property_name,Property_type,society,price,Price_per_sqrt,Area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,noOfFloor,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
834,4 Bedroom House for sale in Sushant Lok Phase 3,house,Ansals Florence Villa,3.3,19820.0,1664.98,Plot area 185(154.68 sq.m.),4.0,3.0,3.0,"Pooja Room,Study Room,Servant Room,Store Room","112, Sushant Lok Phase 3, Gurgaon, Haryana",2 Floors,East,5 to 10 Year Old,"['Sector metro station', 'Sector metro station...",This prime locution in sector 57 gated block c...,,"['Security / Fire Alarm', 'Feng Shui / Vaastu ...","['Environment4 out of 5', 'Lifestyle4 out of 5..."


In [None]:
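##noOfFloor
# first token only ("3 Floors" -> "3"); named levels become numbers; leftover letters are dropped
floor = df['noOfFloor'].str.split(' ').str.get(0)
floor = floor.str.replace('Lower', '0', regex=False).str.replace('Ground', '0', regex=False)
floor = floor.str.replace('Basement', '-1', regex=False).str.replace(r'[A-Za-z]', '', regex=True)
df['noOfFloor'] = floor.str.strip()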

In [None]:
df['noOfFloor']=df['noOfFloor'].astype(float)

In [None]:
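##facing
# note: pd.read_csv parses the literal string 'NA' as missing by default,
# so reload the saved file with keep_default_na=False to keep this placeholder
df['facing']=df['facing'].fillna('NA')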

In [None]:
df.sample(10)

Unnamed: 0,property_name,Property_type,society,price,Price_per_sqrt,Area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,noOfFloor,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
762,3 Bedroom House for sale in DLF Phase 1,house,,6.75,27778.0,2429.98,Plot area 270(225.75 sq.m.),3.0,3.0,3.0,,"DLF Phase 1, Gurgaon, Haryana",2.0,North-East,5 to 10 Year Old,"['Guru dronacharya metro station', 'Dlf phase ...",Excellent liveable 3bhk property ne facing in ...,,"['Feng Shui / Vaastu Compliant', 'High Ceiling...","['Environment5 out of 5', 'Lifestyle5 out of 5..."
606,3 Bedroom House for sale in Sector 33 Gurgaon,house,Nitin Vihar,0.85,9444.0,900.04,Plot area 900(83.61 sq.m.),3.0,2.0,0.0,,"Sector 33 Gurgaon, Gurgaon, Haryana",1.0,West,1 to 5 Year Old,"['Rajiv Chowk Mosque', 'Airforce Hospital', 'S...",Independent house available for sale in nitin ...,,"['High Ceiling Height', 'Water Storage', 'No o...","['Environment3 out of 5', 'Lifestyle4 out of 5..."
232,3 Bedroom House for sale in Surya Vihar,house,,0.45,8411.0,535.01,Carpet area: 535 (49.7 sq.m.),3.0,2.0,1.0,,"Surya Vihar, Surya Vihar, Gurgaon, Haryana",1.0,,undefined,,Best in class property available at surya viha...,,,"['Environment4 out of 5', 'Lifestyle4 out of 5..."
1003,5 Bedroom House for sale in Sector 33 Gurgaon,house,Unitech Uniworld Resorts,11.75,23240.0,5055.94,Plot area 502(419.74 sq.m.),5.0,7.0,3.0,Servant Room,"A-6, Sector 33 Gurgaon, Gurgaon, Haryana",3.0,East,1 to 5 Year Old,"['Rajiv Chowk Mosque', 'Icici bank ATM', 'Stan...",520 sqyd 5bhk villa available for sale at unit...,,"['Private Garden / Terrace', 'High Ceiling Hei...","['Environment3 out of 5', 'Safety4 out of 5', ..."
284,3 Bedroom House for sale in Sector 66 Gurgaon,house,Emaar MGF The Palm Drive,,,,,,,,,"Sector 66 Gurgaon, Gurgaon, Haryana",,,,"['Sector 55-56 Rapid Metro Station', 'HUB 66',...","Emaar mgf the palm drive in sector-66, gurgaon...",,,"['Environment3 out of 5', 'Lifestyle4 out of 5..."
88,5 Bedroom House for sale in Sector 63A Gurgaon,house,Anant Raj Estates,,,,,,,,,"Sector 63A Gurgaon, Gurgaon, Haryana",,,,"['Sector 55-56 Metro Station', 'Paras Trinity ...","Anant raj estates in sector-63a gurgaon, gurga...",,,"['Environment4 out of 5', 'Lifestyle4 out of 5..."
154,16 Bedroom House for sale in Shivaji Nagar,house,,4.5,25000.0,1800.0,Plot area 200(167.23 sq.m.),16.0,16.0,3.0,,"1111, Shivaji Nagar, Gurgaon, Haryana",4.0,South,1 to 5 Year Old,"['Hanuman Mandir', 'Rajiv Chowk Mosque', 'Rama...",Shivaji nagar is one of gurgaon's most sought ...,"['24 Wardrobe', '20 Fan', '1 Exhaust Fan', '14...","['Feng Shui / Vaastu Compliant', 'Park', 'Visi...","['Environment4 out of 5', 'Lifestyle4 out of 5..."
731,5 Bedroom House for sale in Sector 33 Gurgaon,house,Unitech Uniworld Resorts,,,,Plot area 520(434.79 sq.m.),5.0,8.0,3.0,"Servant Room,Store Room","A-898, Sector 33 Gurgaon, Gurgaon, Haryana",4.0,West,1 to 5 Year Old,"['Rajiv Chowk Mosque', 'Icici bank ATM', 'Stan...",5 bhk 520 sqyd villas available at unitech uni...,"['5 Wardrobe', '15 Fan', '1 Exhaust Fan', '5 G...","['Private Garden / Terrace', 'High Ceiling Hei...","['Environment3 out of 5', 'Lifestyle4 out of 5..."
945,3 Bedroom House for sale in Sector 82 Gurgaon,house,Vatika Independent Floors,1.4,8235.0,1700.06,Plot area 1700(157.94 sq.m.),3.0,3.0,2.0,Pooja Room,"N-24, Sector 82 Gurgaon, Gurgaon, Haryana",3.0,North-West,0 to 1 Year Old,"['Sapphire 83 Mall', 'Delhi Jaipur Expressway'...","Vatika independent floors in sector-84, gurgao...","['5 Fan', '1 Exhaust Fan', '12 Light', '1 Curt...","['Security / Fire Alarm', 'High Ceiling Height...","['Environment4 out of 5', 'Safety4 out of 5', ..."
148,2 Bedroom House for sale in Laxman Vihar Phase 2,house,,0.61,100000.0,61.0,Plot area 61(5.67 sq.m.),2.0,2.0,2.0,Others,"Laxman Vihar Phase 2, Gurgaon, Haryana",2.0,South-East,5 to 10 Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...",Near by road market near by railways station n...,"['3 Fan', '1 Light', '1 Modular Kitchen', 'No ...","['Water Storage', 'Rain Water Harvesting']",


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1026 entries, 0 to 1035
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   property_name    1026 non-null   object 
 1   Property_type    1026 non-null   object 
 2   society          452 non-null    object 
 3   price            957 non-null    float64
 4   Price_per_sqrt   998 non-null    float64
 5   Area             957 non-null    float64
 6   areaWithType     976 non-null    object 
 7   bedRoom          976 non-null    float64
 8   bathroom         976 non-null    float64
 9   balcony          976 non-null    float64
 10  additionalRoom   589 non-null    object 
 11  address          1020 non-null   object 
 12  noOfFloor        956 non-null    float64
 13  facing           1026 non-null   object 
 14  agePossession    976 non-null    object 
 15  nearbyLocations  909 non-null    object 
 16  description      1025 non-null   object 
 17  furnishDetails   73

In [None]:
df.to_csv('cleaned_house.csv',index=False)