<h1 style='color:purple' align='center'>Data Science Regression Project: Predicting Land Prices in Colombo</h1>

The dataset was downloaded from Kaggle: https://www.kaggle.com/datasets/ruchiraayeshmantha/land-prices-of-colombo-district?resource=download

**Land-type codes used for data generation**

In [1]:
switch_dict = {
    'Agricultural': 1,
    'Commercial': 2,
    'Residential': 3,
    'Other': 4
}
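The notebook does not show how `switch_dict` is used. One plausible use, sketched here purely as an assumption, is to collapse the four one-hot land-type columns (`Agricultural`, `Commercial`, `Residential`, `Other`) into a single integer code. The toy frame and the `land_type_code` column name are hypothetical and not part of the dataset.

```python
import pandas as pd

switch_dict = {'Agricultural': 1, 'Commercial': 2, 'Residential': 3, 'Other': 4}

# Toy rows mimicking the one-hot land-type columns (illustrative data only)
toy = pd.DataFrame({
    'Agricultural': [0.0, 0.0, 1.0],
    'Commercial':   [0.0, 1.0, 0.0],
    'Residential':  [1.0, 0.0, 0.0],
    'Other':        [0.0, 0.0, 0.0],
})

# idxmax along the columns returns, for each row, the name of the column
# holding the largest value, i.e. the column that contains the 1.0.
# Caveat: a row of all zeros would silently map to the first column.
land_type = toy[list(switch_dict)].idxmax(axis='columns')

# Hypothetical output column holding the integer code from switch_dict
toy['land_type_code'] = land_type.map(switch_dict)
```

This only works cleanly when exactly one of the four columns is set per row, which is worth verifying before relying on the codes.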

In [2]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import requests
%matplotlib inline

matplotlib.rcParams["figure.figsize"] = (20, 10)
pd.reset_option('display.float_format')


<h2 style='color:blue'>Data Load: Load Colombo land prices into a dataframe</h2>

In [3]:
df1 = pd.read_csv("datasets/P4_Data.csv")
df1.head()

,longitude,latitude,Price per Perch,Agricultural,Commercial,Residential,Other,Address,Price_Scale,Mentioned Price(Rs),...,police_mdist,post_office_count,post_office_mdist,pharmacy_count,pharmacy_mdist,movie_theater_count,movie_theater_mdist,library_count,library_mdist,Air
0,79.980418,6.917638,2372737.53,0.0,0.0,0.0,1.0,malabe,per perch,440000.0,...,773.0,2.0,602.0,9.0,1059.0,0.0,583.0,1.0,567.0,127.0
1,80.02284,6.91034,514669.88,0.0,0.0,1.0,0.0,ranala,total price,4000000.0,...,0.0,0.0,0.0,0.0,0.0,1.0,514.0,0.0,0.0,125.0
2,79.943102,6.882818,2562167.25,0.0,1.0,0.0,0.0,thalawathugoda,per perch,2100000.0,...,0.0,2.0,1150.0,10.0,695.0,1.0,1076.0,0.0,0.0,123.0
3,79.962091,6.890742,2048051.51,0.0,0.0,0.0,1.0,malabe,per perch,850000.0,...,773.0,2.0,602.0,9.0,1059.0,0.0,583.0,1.0,567.0,127.0
4,79.913879,6.835342,2230222.57,0.0,0.0,1.0,0.0,maharagama,per perch,1550000.0,...,0.0,2.0,1261.0,14.0,521.0,2.0,1165.0,0.0,0.0,121.0


In [4]:
df1.shape

(23124, 81)

In [5]:
df1.columns

Index(['longitude', 'latitude', 'Price per Perch', 'Agricultural',
       'Commercial', 'Residential', 'Other', 'Address', 'Price_Scale',
       'Mentioned Price(Rs)', 'Address_ID', 'Land_size(Perches)',
       'Posted_Date_new', 'Distance from fort', 'count_govtschools_A',
       'min_dist_govtschools_a', 'count_govtschools_B',
       'min_dist_govtschools_b', 'count_semigovtschools',
       'min_dist_semigovtschools', 'count_intlschools', 'min_dist_intlschools',
       'count_uni', 'min_dist_uni', 'min_dist_nearest_express',
       'min_dist_nearest_railway', 'min_dist_nearest_bank',
       'count_banks_within_2km', 'min_dist_nearest_FinanceCompany',
       'count_FinanceCompanies_within_2km', 'min_dist_nearest_Govt_Hospital',
       'count_Govt_Hospitals', 'min_dist_nearest_Pvt_Hospital',
       'count_Pvt_Hospital', 'min_dist_nearest_Pvt_Med_center',
       'count_Pvt_Med_Centers', 'min_dist_nearest_Supermarket',
       'count_Supermarkets_within2km', 'min_dist_nearest_Fuel_station

In [6]:
df1['Address'].unique()

array(['malabe', 'ranala', 'thalawathugoda', 'maharagama', 'madapatha',
       'mount lavinia', 'kirulapone', 'kesbewa', 'nugegoda', 'makandana',
       'bope', 'nawala', 'kaduwela', 'bomiriya', 'athurugiriya',
       'piliyandala', 'dehiwala', 'battaramulla', 'boralesgamuwa',
       'hokandara', 'homagama', 'kahathuduwa', 'pitakotte', 'diyagama'],
      dtype=object)

In [7]:
df1['Address'].value_counts()

Address
ranala            2596
makandana         1995
kaduwela          1835
kesbewa           1523
piliyandala       1476
battaramulla      1430
malabe            1178
nugegoda          1023
thalawathugoda     864
bomiriya           848
madapatha          844
bope               824
nawala             817
dehiwala           706
homagama           681
athurugiriya       675
kahathuduwa        654
mount lavinia      576
diyagama           450
pitakotte          450
hokandara          428
maharagama         425
boralesgamuwa      424
kirulapone         402
Name: count, dtype: int64

<h2 style='color:blue'>Data Filter: Keep only cities with at least 50 listings</h2>

In [8]:
# Keep only addresses that appear at least 50 times in df1
address_counts = df1['Address'].value_counts()
filtered_addresses = address_counts[address_counts >= 50].index
filtered_df = df1[df1['Address'].isin(filtered_addresses)]
print("Number of rows:", filtered_df.shape[0])
filtered_df['Address'].value_counts()



Number of rows: 23124


Address
ranala            2596
makandana         1995
kaduwela          1835
kesbewa           1523
piliyandala       1476
battaramulla      1430
malabe            1178
nugegoda          1023
thalawathugoda     864
bomiriya           848
madapatha          844
bope               824
nawala             817
dehiwala           706
homagama           681
athurugiriya       675
kahathuduwa        654
mount lavinia      576
diyagama           450
pitakotte          450
hokandara          428
maharagama         425
boralesgamuwa      424
kirulapone         402
Name: count, dtype: int64
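The smallest group (kirulapone) has 402 rows, so the threshold of 50 removes nothing: 23124 rows go in and 23124 come out. To see the filter take effect, here is a minimal sketch on toy data. The helper name `filter_by_min_count` and the toy rows are hypothetical. It maps each row to the frequency of its value and compares that against the threshold.

```python
import pandas as pd

def filter_by_min_count(df, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times."""
    # value_counts gives value -> frequency; map attaches that frequency to each row
    per_row_count = df[column].map(df[column].value_counts())
    return df[per_row_count >= min_count]

# Toy data: 'malabe' appears 3 times, 'ranala' and 'bope' once each
toy = pd.DataFrame({'Address': ['malabe', 'malabe', 'ranala', 'malabe', 'bope']})
kept = filter_by_min_count(toy, 'Address', min_count=2)
```

On the real data the equivalent call would be `filter_by_min_count(df1, 'Address', 50)`, which should return the same rows as the cell above.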

**Drop features that are not required to build our model**

In [9]:
# Start from filtered_df so the address filter above is actually applied
df2 = filtered_df.drop(['Land_size(Perches)', 'Price_Scale', 'Distance from fort', 'Mentioned Price(Rs)'], axis='columns')
df2.shape

(23124, 77)
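The shape change is consistent with the drop: 81 columns minus the 4 dropped leaves 77, with the row count unchanged. A small sketch of the same check in code follows, using a one-row toy frame. The values for `Land_size(Perches)` and `Distance from fort` are made up for illustration.

```python
import pandas as pd

cols_to_drop = ['Land_size(Perches)', 'Price_Scale',
                'Distance from fort', 'Mentioned Price(Rs)']

# One-row toy frame; values are illustrative only
toy = pd.DataFrame({
    'Price per Perch': [514669.88],
    'Land_size(Perches)': [10.0],
    'Price_Scale': ['total price'],
    'Distance from fort': [20.0],
    'Mentioned Price(Rs)': [4000000.0],
})

# Fail early with a readable message if a column name is misspelled
missing = [c for c in cols_to_drop if c not in toy.columns]
assert not missing, f"columns not found: {missing}"

reduced = toy.drop(columns=cols_to_drop)

# Column count must shrink by exactly the number of dropped columns
assert reduced.shape[1] == toy.shape[1] - len(cols_to_drop)
```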

<h2 style='color:blue'>Data Cleaning</h2>

**Handle NA values**

In [10]:
df2.isnull().sum()

longitude              0
latitude               0
Price per Perch        0
Agricultural           0
Commercial             0
                      ..
movie_theater_count    0
movie_theater_mdist    0
library_count          0
library_mdist          0
Air                    0
Length: 77, dtype: int64

In [11]:
df2.shape

(23124, 77)

In [12]:
df3 = df2.dropna()
df3.isnull().sum()

longitude              0
latitude               0
Price per Perch        0
Agricultural           0
Commercial             0
                      ..
movie_theater_count    0
movie_theater_mdist    0
library_count          0
library_mdist          0
Air                    0
Length: 77, dtype: int64

In [13]:
df3.shape

(23124, 77)
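The row count stays at 23124 before and after `dropna()`, which indicates that no row contained a missing value. Because the printed `isnull().sum()` output elides the middle columns, a more direct check is to list only the columns with a non-zero NA count. A minimal sketch on toy data follows; the toy values and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with a few deliberately missing values (illustrative only)
toy = pd.DataFrame({
    'longitude': [79.98, 80.02, np.nan],
    'latitude':  [6.91, 6.91, 6.88],
    'Air':       [127.0, np.nan, np.nan],
})

na_counts = toy.isnull().sum()

# Show only the columns that actually contain missing values
cols_with_na = na_counts[na_counts > 0].sort_values(ascending=False)

# Total number of missing cells, and how many rows dropna() would remove
total_na = int(na_counts.sum())
rows_dropped = len(toy) - len(toy.dropna())
```

Applied to `df2`, an empty `cols_with_na` and `rows_dropped == 0` would confirm what the unchanged shape already suggests.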

In [14]:
df3.head()

,longitude,latitude,Price per Perch,Agricultural,Commercial,Residential,Other,Address,Address_ID,Posted_Date_new,...,police_mdist,post_office_count,post_office_mdist,pharmacy_count,pharmacy_mdist,movie_theater_count,movie_theater_mdist,library_count,library_mdist,Air
0,79.980418,6.917638,2372737.53,0.0,0.0,0.0,1.0,malabe,12.0,07/15/2021,...,773.0,2.0,602.0,9.0,1059.0,0.0,583.0,1.0,567.0,127.0
1,80.02284,6.91034,514669.88,0.0,0.0,1.0,0.0,ranala,1.0,11/13/2021,...,0.0,0.0,0.0,0.0,0.0,1.0,514.0,0.0,0.0,125.0
2,79.943102,6.882818,2562167.25,0.0,1.0,0.0,0.0,thalawathugoda,10.0,10/01/2023,...,0.0,2.0,1150.0,10.0,695.0,1.0,1076.0,0.0,0.0,123.0
3,79.962091,6.890742,2048051.51,0.0,0.0,0.0,1.0,malabe,12.0,10/04/2021,...,773.0,2.0,602.0,9.0,1059.0,0.0,583.0,1.0,567.0,127.0
4,79.913879,6.835342,2230222.57,0.0,0.0,1.0,0.0,maharagama,21.0,06/19/2022,...,0.0,2.0,1261.0,14.0,521.0,2.0,1165.0,0.0,0.0,121.0
