# Introduction to Pandas

In [536]:
import pandas as pd

## Reading and Importing Data

We are going to use the House Sales in King County, USA dataset from Kaggle: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction?select=kc_house_data.csv

In [537]:
df = pd.read_csv('kc_house_data.csv')

## Inspecting the Data

In [538]:
df.head(7)

,id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,...,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
0,7129300520,20141013T000000,221900.0,3,1.0,1180,5650,1.0,0,0,...,7,1180,0,1955,0,98178,47.5112,-122.257,1340,5650
1,6414100192,20141209T000000,538000.0,3,2.25,2570,7242,2.0,0,0,...,7,2170,400,1951,1991,98125,47.721,-122.319,1690,7639
2,5631500400,20150225T000000,180000.0,2,1.0,770,10000,1.0,0,0,...,6,770,0,1933,0,98028,47.7379,-122.233,2720,8062
3,2487200875,20141209T000000,604000.0,4,3.0,1960,5000,1.0,0,0,...,7,1050,910,1965,0,98136,47.5208,-122.393,1360,5000
4,1954400510,20150218T000000,510000.0,3,2.0,1680,8080,1.0,0,0,...,8,1680,0,1987,0,98074,47.6168,-122.045,1800,7503
5,7237550310,20140512T000000,1230000.0,4,4.5,5420,101930,1.0,0,0,...,11,3890,1530,2001,0,98053,47.6561,-122.005,4760,101930
6,1321400060,20140627T000000,257500.0,3,2.25,1715,6819,2.0,0,0,...,7,1715,0,1995,0,98003,47.3097,-122.327,2238,6819


We can use `pd.option_context` to get a better view of our data and to control what is displayed, for example how many rows and columns are shown:

In [539]:
with pd.option_context('display.max_rows', 8, 'display.max_columns', None):
    display(df)

,id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
0,7129300520,20141013T000000,221900.0,3,1.00,1180,5650,1.0,0,0,3,7,1180,0,1955,0,98178,47.5112,-122.257,1340,5650
1,6414100192,20141209T000000,538000.0,3,2.25,2570,7242,2.0,0,0,3,7,2170,400,1951,1991,98125,47.7210,-122.319,1690,7639
2,5631500400,20150225T000000,180000.0,2,1.00,770,10000,1.0,0,0,3,6,770,0,1933,0,98028,47.7379,-122.233,2720,8062
3,2487200875,20141209T000000,604000.0,4,3.00,1960,5000,1.0,0,0,5,7,1050,910,1965,0,98136,47.5208,-122.393,1360,5000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21609,6600060120,20150223T000000,400000.0,4,2.50,2310,5813,2.0,0,0,3,8,2310,0,2014,0,98146,47.5107,-122.362,1830,7200
21610,1523300141,20140623T000000,402101.0,2,0.75,1020,1350,2.0,0,0,3,7,1020,0,2009,0,98144,47.5944,-122.299,1020,2007
21611,291310100,20150116T000000,400000.0,3,2.50,1600,2388,2.0,0,0,3,8,1600,0,2004,0,98027,47.5345,-122.069,1410,1287
21612,1523300157,20141015T000000,325000.0,2,0.75,1020,1076,2.0,0,0,3,7,1020,0,2008,0,98144,47.5941,-122.299,1020,1357
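`pd.option_context` changes display options only inside the `with` block and restores the previous values when the block exits, so the global settings stay untouched. A small sketch that checks this (the value 8 mirrors the cell above):

```python
import pandas as pd

# Value of the option before entering the block
before = pd.get_option('display.max_rows')

with pd.option_context('display.max_rows', 8, 'display.max_columns', None):
    # Inside the block, the temporary settings are active
    inside = pd.get_option('display.max_rows')

# On exit, the previous value is restored automatically
after = pd.get_option('display.max_rows')
print(before, inside, after)
```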


### Dataset Information

We can also easily check the number of rows and columns:

In [540]:
df.shape
# Output: (number of rows, number of columns)

(21613, 21)

For more information, such as each column's data type and non-null count, we can use `info()`:

In [541]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21613 non-null  int64  
 1   date           21613 non-null  object 
 2   price          21613 non-null  float64
 3   bedrooms       21613 non-null  int64  
 4   bathrooms      21613 non-null  float64
 5   sqft_living    21613 non-null  int64  
 6   sqft_lot       21613 non-null  int64  
 7   floors         21613 non-null  float64
 8   waterfront     21613 non-null  int64  
 9   view           21613 non-null  int64  
 10  condition      21613 non-null  int64  
 11  grade          21613 non-null  int64  
 12  sqft_above     21613 non-null  int64  
 13  sqft_basement  21613 non-null  int64  
 14  yr_built       21613 non-null  int64  
 15  yr_renovated   21613 non-null  int64  
 16  zipcode        21613 non-null  int64  
 17  lat            21613 non-null  float64
 18  long  

We don't have null values in any of our columns. If we did, we could use `dropna()` to drop all rows containing null values, or `fillna()` to replace them with the mean, median, or mode, depending on the variable type (numeric or categorical).
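A quick way to confirm the absence (or presence) of nulls is to count them per column with `isna().sum()`. A minimal sketch on a small hypothetical frame, not the King County data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps
toy = pd.DataFrame({
    'price': [221900.0, np.nan, 180000.0, 604000.0],
    'bedrooms': [3, 3, np.nan, np.nan],
})

null_counts = toy.isna().sum()      # number of missing values per column
has_nulls = toy.isna().any().any()  # True if any value anywhere is missing
print(null_counts)
print(has_nulls)
```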

## Null Values: How to Deal with Them?

Let's introduce some null values into our dataset, just to see how to deal with them when a dataset actually has missing data.

We'll use NumPy to set a random 15% of the values to NaN. To keep the original dataset intact, we work on a copy of three of its columns (price, bedrooms, and bathrooms):

In [542]:
df2 = df.iloc[:,[2,3,4]].copy()

In [543]:
import numpy as np
missing_percentage = 0.15
mask = np.random.rand(*df2.shape) < missing_percentage
df2[mask] = np.nan
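Because `np.random.rand` draws new numbers on every run, the exact counts below will differ each time the notebook is executed. A seeded sketch on a hypothetical stand-in for `df2` (made-up values), using NumPy's `default_rng` and `DataFrame.mask`, which replaces values with NaN wherever the condition is True:

```python
import numpy as np
import pandas as pd

# Fixed seed -> the same "random" mask on every run
rng = np.random.default_rng(seed=42)

# Hypothetical stand-in for df2: 1,000 rows of made-up values
toy = pd.DataFrame({
    'price': rng.uniform(100_000, 1_000_000, size=1000),
    'bedrooms': rng.integers(1, 6, size=1000),
    'bathrooms': rng.choice([1.0, 1.5, 2.0, 2.5], size=1000),
})

missing_percentage = 0.15
is_missing = rng.random(toy.shape) < missing_percentage

# mask() returns a new frame with NaN where is_missing is True
# (integer columns are upcast to float64 to hold NaN)
toy_missing = toy.mask(is_missing)

# Share of missing values per column, roughly 0.15 each
print(toy_missing.isna().mean().round(3))
```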

Let's see if that worked.

In [544]:
df2.head()

,price,bedrooms,bathrooms
0,221900.0,3.0,1.0
1,538000.0,3.0,2.25
2,180000.0,,1.0
3,604000.0,4.0,
4,510000.0,,2.0


In [545]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   price      18366 non-null  float64
 1   bedrooms   18417 non-null  float64
 2   bathrooms  18312 non-null  float64
dtypes: float64(3)
memory usage: 506.7 KB


As we can see, about 15% of the values in each column are now null (for example, only 18,366 of the 21,613 prices remain). Let's deal with them.
As a first approach, let's delete the rows containing missing values:

In [546]:
df2_ = df2.dropna()
df2_.info()

<class 'pandas.core.frame.DataFrame'>
Index: 13257 entries, 0 to 21612
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   price      13257 non-null  float64
 1   bedrooms   13257 non-null  float64
 2   bathrooms  13257 non-null  float64
dtypes: float64(3)
memory usage: 414.3 KB
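Dropping every incomplete row is costly here: a row survives only if all three columns are present, so with 15% missing per column roughly 0.85³ ≈ 61% of the rows remain (13,257 of 21,613 above). `dropna()` has parameters for being more selective; a sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'price':     [221900.0, np.nan, 180000.0, np.nan],
    'bedrooms':  [3.0,      3.0,    np.nan,   np.nan],
    'bathrooms': [1.0,      2.25,   1.0,      2.0],
})

all_complete = toy.dropna()                  # keep rows with no NaN at all
price_known = toy.dropna(subset=['price'])   # only require 'price' to be present
mostly_complete = toy.dropna(thresh=2)       # keep rows with at least 2 non-null values
only_all_missing = toy.dropna(how='all')     # drop a row only if every value is NaN

print(all_complete.index.tolist(), price_known.index.tolist(),
      mostly_complete.index.tolist(), only_all_missing.index.tolist())
```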


Another approach is to replace the missing values with sensible estimates.
It is reasonable to use the mean for `price`, a continuous variable, and the mode or median for `bedrooms` and `bathrooms`, treating them as discrete (categorical-like) variables.

In [547]:
# Fill each column with a statistic suited to its type.
# Assigning the result back avoids calling inplace=True on a selected column (chained assignment).
df2['price'] = df2['price'].fillna(df2['price'].mean())
df2['bedrooms'] = df2['bedrooms'].fillna(df2['bedrooms'].mode().iloc[0])
df2['bathrooms'] = df2['bathrooms'].fillna(df2['bathrooms'].median())
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   price      21613 non-null  float64
 1   bedrooms   21613 non-null  float64
 2   bathrooms  21613 non-null  float64
dtypes: float64(3)
memory usage: 506.7 KB
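Rather than filling column by column, `DataFrame.fillna` also accepts a dict that maps each column to its own fill value, so all three columns can be filled in one call. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'price':     [200000.0, np.nan, 400000.0],
    'bedrooms':  [3.0, 3.0, np.nan],
    'bathrooms': [1.0, np.nan, 2.0],
})

fill_values = {
    'price': toy['price'].mean(),                # continuous -> mean
    'bedrooms': toy['bedrooms'].mode().iloc[0],  # discrete -> most frequent value
    'bathrooms': toy['bathrooms'].median(),      # discrete -> median
}

# Returns a new frame; toy itself is left unchanged
toy_filled = toy.fillna(fill_values)
print(toy_filled)
```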


```All missing values have been replaced. :)```

## Data Manipulation

By looking at our dataset, we can logically guess which columns are of interest. The target variable is the price, and the goal is to look at the factors impacting the price.

In [548]:
# Checking our columns
df.columns

Index(['id', 'date', 'price', 'bedrooms', 'bathrooms', 'sqft_living',
       'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade',
       'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode',
       'lat', 'long', 'sqft_living15', 'sqft_lot15'],
      dtype='object')

It makes sense to take the following variables into consideration:
1. **bedrooms**: more bedrooms normally mean a higher price
2. **sqft_living** & **sqft_lot**: area affects price in most cases
3. **floors**: the number of floors could also matter
4. **view** & **waterfront**: the better the view, the higher the price usually is; having a waterfront may affect prices as well
5. **yr_built**, **yr_renovated** & **condition**: all three are related and could affect the price
6. **grade**: a house's grade naturally affects its price

We'll take a subset of our dataset:


In [549]:
var_of_interest = ['id', 'price', 'bedrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'yr_renovated']
df_ = df[var_of_interest].copy()

### Bedrooms and floors

In [550]:
df_.bedrooms.value_counts()

bedrooms
3     9824
4     6882
2     2760
5     1601
6      272
1      199
7       38
0       13
8       13
9        6
10       3
11       1
33       1
Name: count, dtype: int64

In [551]:
df_.loc[df_.bedrooms == 33, :]

,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated
15870,2402100895,640000.0,33,1620,6000,1.0,0,0,5,7,1947,0


This house reportedly has 33 bedrooms, yet it has only 1,620 sq ft of living area and a price (640,000) in line with ordinary houses rather than a very large one. We can assume it's a typo and drop that row.

In [552]:
df_ = df_[df_.id != 2402100895]
df_.iloc[15869:15871,:]

,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated
15869,1402630190,362000.0,3,2310,7485,2.0,0,0,3,8,1986,0
15871,3750604417,172500.0,3,1140,8800,1.0,0,0,3,7,1972,0
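Filtering on `id` works here, but `id` identifies a house rather than a sale, so it may not be unique if the same house was sold more than once. Filtering on the suspicious condition itself, or dropping by index label, removes exactly the rows inspected. A sketch on hypothetical data whose index labels mimic the rows above:

```python
import pandas as pd

# Hypothetical toy data; index labels mimic the notebook's row labels
toy = pd.DataFrame(
    {'id': [1402630190, 2402100895, 3750604417],
     'bedrooms': [3, 33, 3],
     'sqft_living': [2310, 1620, 1140]},
    index=[15869, 15870, 15871],
)

# Option 1: keep only the rows that do not match the condition
cleaned = toy[toy['bedrooms'] != 33]

# Option 2: drop by the index labels of the matching rows
outlier_labels = toy.index[toy['bedrooms'] == 33]
cleaned_by_label = toy.drop(index=outlier_labels)

# iloc is positional, loc is label-based: after the drop,
# position 1 holds the row labelled 15871
print(cleaned.iloc[1].name, cleaned.loc[15871, 'id'])
```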


We can create a flag column for bedrooms that takes the value '<4' when a house has fewer than 4 bedrooms and '>= 4' otherwise.

In [553]:
df_['bedroom_nb'] = df_.apply(lambda x: '>= 4' if x['bedrooms'] >= 4 else '<4', axis=1)
df_.head()

,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated,bedroom_nb
0,7129300520,221900.0,3,1180,5650,1.0,0,0,3,7,1955,0,<4
1,6414100192,538000.0,3,2570,7242,2.0,0,0,3,7,1951,1991,<4
2,5631500400,180000.0,2,770,10000,1.0,0,0,3,6,1933,0,<4
3,2487200875,604000.0,4,1960,5000,1.0,0,0,5,7,1965,0,>= 4
4,1954400510,510000.0,3,1680,8080,1.0,0,0,3,8,1987,0,<4
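Row-wise `apply` with `axis=1` calls a Python function once per row, which gets slow on large frames. A vectorized sketch with `np.where`, which evaluates the condition on the whole column at once (hypothetical data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'bedrooms': [3, 3, 2, 4, 6]})

# Choose '>= 4' where the condition holds, '<4' elsewhere
toy['bedroom_nb'] = np.where(toy['bedrooms'] >= 4, '>= 4', '<4')
print(toy['bedroom_nb'].tolist())
```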


Another way to do this is with `.loc[]`:

In [554]:
df_['bedroom_nb'] = '<4'  # initialize every row with '<4'
df_.loc[df_['bedrooms'] >= 4, 'bedroom_nb'] = '>= 4'  # overwrite rows with 4 or more bedrooms
df_.head()


,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated,bedroom_nb
0,7129300520,221900.0,3,1180,5650,1.0,0,0,3,7,1955,0,<4
1,6414100192,538000.0,3,2570,7242,2.0,0,0,3,7,1951,1991,<4
2,5631500400,180000.0,2,770,10000,1.0,0,0,3,6,1933,0,<4
3,2487200875,604000.0,4,1960,5000,1.0,0,0,5,7,1965,0,>= 4
4,1954400510,510000.0,3,1680,8080,1.0,0,0,3,8,1987,0,<4


Now let's use this column to draw insights:

In [555]:
df_.groupby('bedroom_nb').agg({'id' : ['count'],
                                'price' : ['mean'],
                                'sqft_living' : ['mean', 'max', 'min'],
                                'sqft_lot' : ['mean', 'max', 'min'],
                                'floors': ['median', 'max', 'min'],
                                'grade' : ['min', 'max', 'median']}).round()

bedroom_nb,id_count,price_mean,sqft_living_mean,sqft_living_max,sqft_living_min,sqft_lot_mean,sqft_lot_max,sqft_lot_min,floors_median,floors_max,floors_min,grade_min,grade_max,grade_median
<4,12796,449912.0,1669.0,6840,290,13954.0,1164794,572,1.0,4.0,1.0,1,13,7.0
>= 4,8816,671193.0,2676.0,13540,800,16782.0,1651359,520,2.0,4.0,1.0,5,13,8.0


- There are more houses with fewer than 4 bedrooms (12,796) than with 4 or more (8,816).
- Houses with 4 or more bedrooms are more expensive on average (about 671k vs 450k) and have larger living areas (about 2,676 vs 1,669 sq ft).
- The average lot size differs proportionally less (about 16,782 vs 13,954 sq ft), which makes sense: bedrooms add to the living area but not necessarily to the lot.
- Houses with 4 or more bedrooms tend to have more floors (rounded medians of 2 vs 1).
- As for the grade, every house with 4 or more bedrooms has a grade of at least 5 (median 8 vs 7), so the number of bedrooms may be related to the grade as well.
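Passing lists of functions to `agg` produces two-level column headers, which are awkward to read and export. Named aggregation (keyword arguments of the form `new_name=(column, function)`) yields flat, self-describing columns instead. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'bedroom_nb': ['<4', '<4', '>= 4', '>= 4'],
    'id': [1, 2, 3, 4],
    'price': [200000.0, 300000.0, 600000.0, 700000.0],
    'sqft_living': [1000, 1500, 2500, 3000],
})

# Each keyword becomes one flat output column
summary = toy.groupby('bedroom_nb').agg(
    count=('id', 'count'),
    price_mean=('price', 'mean'),
    sqft_living_mean=('sqft_living', 'mean'),
    sqft_living_max=('sqft_living', 'max'),
).round()
print(summary)
```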

### Grade

In [556]:
df_grade = df_.groupby('grade')['price'].mean().round()
df_grade

grade
1      142000.0
3      205667.0
4      214381.0
5      248524.0
6      301917.0
7      402567.0
8      542895.0
9      773738.0
10    1072347.0
11    1497792.0
12    2192500.0
13    3710769.0
Name: price, dtype: float64

Let's break down the code to understand what is happening:
**df_.groupby('grade')** groups the rows by grade, and **['price']** selects the price column within each group. On its own, this only defines the groups; to get a result we need an aggregation. Since we're interested in the average price of each grade, we apply **.mean()**, and finally **.round()** rounds each average to the nearest integer.
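The same chain on a tiny hypothetical frame makes each step visible. Note that `round()` follows NumPy's rounding, so values exactly halfway between two integers go to the nearest even integer:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'grade': [7, 7, 8, 8, 8],
                    'price': [400000.0, 405001.0, 540000.0, 545000.0, 543000.0]})

grouped = toy.groupby('grade')        # a GroupBy object, not yet a result
mean_price = grouped['price'].mean()  # one average per grade: 402500.5 and 542666.67
print(mean_price.round())             # 402500.5 rounds to the even 402500.0

# Exact halves go to the nearest even integer
print(pd.Series([0.5, 1.5, 2.5]).round().tolist())
```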
#### Insights:
```
The 'grade' variable is the overall rating of a house and ranges from 1 to 13 (no house in this data has grade 2). The average price rises with every step up in grade, from about 142k at grade 1 to about 3.7M at grade 13. So a house with a high grade is more expensive than one with a low grade, which makes sense.
```

### View & Waterfront

Grouping the average price by view:

In [557]:
df_view = df_.groupby('view')['price'].mean().round()
df_view

view
0     496616.0
1     812519.0
2     792746.0
3     972468.0
4    1464363.0
Name: price, dtype: float64

If we sort the prices in descending order:

In [558]:
df_view.sort_values(ascending = False)

view
4    1464363.0
3     972468.0
1     812519.0
2     792746.0
0     496616.0
Name: price, dtype: float64

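```Overall, the better the view (4 being the highest rating), the pricier the house, although houses with a view rating of 1 average slightly more than those rated 2 (about 813k vs 793k).```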
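Whether the average price rises strictly with the view rating can be checked directly with `Series.is_monotonic_increasing`. The sketch rebuilds the averages from the output above rather than recomputing them:

```python
import pandas as pd

# Average price by view rating, copied from the output above
df_view = pd.Series([496616.0, 812519.0, 792746.0, 972468.0, 1464363.0],
                    index=pd.Index([0, 1, 2, 3, 4], name='view'), name='price')

print(df_view.is_monotonic_increasing)  # False: view 2 averages less than view 1
print(df_view.diff())                   # change from one rating to the next
```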

It is natural to check whether the view rating is related to having a waterfront; we would expect it to be:

In [559]:
df_.groupby('waterfront')['view'].mean().reset_index().rename(columns = {'view':'avg_view_rating'})

,waterfront,avg_view_rating
0,0,0.207469
1,1,3.766871


```As expected, the average view rating is much higher for houses with a waterfront (about 3.77) than for houses without one (about 0.21).```

Now let's see how that affects the average prices and grade:

In [560]:
df_.groupby(['view','waterfront'])[['price','grade']].mean().round()

view,waterfront,price,grade
0,0,496616.0,8.0
1,0,813055.0,8.0
1,1,635000.0,7.0
2,0,783955.0,8.0
2,1,1842188.0,8.0
3,0,964685.0,9.0
3,1,1173605.0,8.0
4,0,1270713.0,9.0
4,1,1728300.0,9.0


```
The average grade varies little with the waterfront or the view (between 7 and 9 in every group). Although the view could affect the grade, other factors likely play a more important role.
As for prices, they are generally higher when the house has a waterfront, with one exception: among houses with a view rating of 1, waterfront houses average less (635k vs 813k). And, as seen above, houses with a better view are generally more expensive.
```
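With two grouping keys, the result has a two-level index. `unstack` pivots one level into columns, which puts waterfront and non-waterfront averages side by side for each view rating. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'view':       [0, 0, 4, 4],
    'waterfront': [0, 0, 0, 1],
    'price':      [400000.0, 500000.0, 1200000.0, 1800000.0],
})

by_view = toy.groupby(['view', 'waterfront'])['price'].mean()

# Move 'waterfront' from the row index to the columns;
# combinations that never occur become NaN
table = by_view.unstack('waterfront')
print(table)
```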

### Square Footage of Living and Total Square Footage

Do houses with bigger areas tend to be graded higher?

In [561]:
df_[['grade','sqft_living','sqft_lot']].groupby('grade').mean().round()

grade,sqft_living,sqft_lot
1,290.0,20875.0
3,597.0,26953.0
4,660.0,22101.0
5,983.0,24020.0
6,1192.0,12647.0
7,1689.0,11767.0
8,2185.0,13510.0
9,2868.0,20639.0
10,3520.0,28191.0
11,4395.0,38373.0


```
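If we look at the first four rows (grades 1 to 5; there is no grade 2), low-graded houses have small living areas (290 to 983 sq ft) but large lots (about 21k to 27k sq ft) on average. One possible reading is that a large yard or driveway does not make up for a small house.
Mid-graded houses (6 to 8) have moderate living areas and the smallest lots, while the highest-graded houses (9 and up) have both large living areas and large lots.
The average living area increases with every grade, so bigger houses tend to be graded higher; and since price also rises with grade, larger living areas go hand in hand with higher prices.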
```

### Condition

Let's see the minimum and maximum grades given to houses in each condition, in particular houses in bad condition.

First, let's look at our variable 'condition':

In [562]:
df_['condition'].describe().round()

count    21612.0
mean         3.0
std          1.0
min          1.0
25%          3.0
50%          3.0
75%          4.0
max          5.0
Name: condition, dtype: float64

```The house in the best condition is rated 5 and the worst 1. The 25% row is the first quartile: at least 25% of the houses have a condition of 3 or lower. The 50% and 75% rows are the second and third quartiles; the second quartile is also called the median. Here, a median of 3 means that at least half the houses are in condition 3 or lower and at least half are in condition 3 or higher. The third quartile of 4 means that 75% of the houses have a condition of 4 or lower.```
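These quartiles can be computed directly with `Series.quantile`, which interpolates linearly between sorted values by default. A sketch with hypothetical condition ratings:

```python
import pandas as pd

# Hypothetical condition ratings (1 = worst, 5 = best)
condition = pd.Series([1, 2, 3, 3, 3, 3, 4, 4, 5])

quartiles = condition.quantile([0.25, 0.5, 0.75])
print(quartiles)

# Share of houses in condition 3 or lower (at least half, since the median is 3)
share_at_most_3 = (condition <= 3).mean()
print(round(share_at_most_3, 3))
```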

In [563]:
df_[['condition','grade']].groupby('condition').agg(['min','max','mean']).round()

condition,grade_min,grade_max,grade_mean
1,1,8,6.0
2,3,10,7.0
3,3,13,8.0
4,4,13,7.0
5,3,12,7.0


```
The average grades are close across all conditions (between 6 and 8), so the condition does not appear to affect the grade much. Even houses in the worst condition (1) reach a grade of up to 8.
```

In [564]:
df_[['condition','price']].groupby('condition').agg(['mean','min','max']).round(2)

condition,price_mean,price_min,price_max
1,334431.67,78000.0,1500000.0
2,327316.22,80000.0,2560000.0
3,542097.09,75000.0,7060000.0
4,521300.71,89000.0,7700000.0
5,612561.61,110000.0,3650000.0


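Given these observations, condition appears to be associated with price: houses in condition 1 or 2 average about 330k, while those in conditions 3 to 5 average between about 521k and 613k. The relationship is not strictly monotonic, though (condition 4 averages slightly less than condition 3), and this comparison alone does not show that condition causes the price difference.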

### Year built & year renovated

We will create a flag column indicating whether each house was renovated (a `yr_renovated` of 0 is treated as no renovation).

In [565]:
df_['renovated'] = df_.apply(lambda x: 'yes' if x['yr_renovated'] > 0 else 'no', axis=1)
df_.head()

,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated,bedroom_nb,renovated
0,7129300520,221900.0,3,1180,5650,1.0,0,0,3,7,1955,0,<4,no
1,6414100192,538000.0,3,2570,7242,2.0,0,0,3,7,1951,1991,<4,yes
2,5631500400,180000.0,2,770,10000,1.0,0,0,3,6,1933,0,<4,no
3,2487200875,604000.0,4,1960,5000,1.0,0,0,5,7,1965,0,>= 4,no
4,1954400510,510000.0,3,1680,8080,1.0,0,0,3,8,1987,0,<4,no
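Here too, `apply` with `axis=1` is not needed. A vectorized sketch compares the whole column at once and maps the resulting booleans to labels with `Series.map` (hypothetical data; like the code above, it assumes a `yr_renovated` of 0 means no renovation):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'yr_renovated': [0, 1991, 0, 0, 2005]})

# True where a renovation year is recorded, then translated to 'yes'/'no'
toy['renovated'] = (toy['yr_renovated'] > 0).map({True: 'yes', False: 'no'})
print(toy['renovated'].tolist())

# Share of each label
print(toy['renovated'].value_counts(normalize=True))
```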


In [566]:
df_renovated = df_[['price','yr_built','renovated']]

In [567]:
df_renovated.groupby('renovated').agg(['mean','min','max','count']).round()

renovated,price_mean,price_min,price_max,price_count,yr_built_mean,yr_built_min,yr_built_max,yr_built_count
no,530443.0,75000.0,6890000.0,20698,1972.0,1900,2015,20698
yes,760629.0,110000.0,7700000.0,914,1940.0,1900,2003,914


Renovated houses are relatively rare (914 of 21,612) and were all built in 2003 or earlier. They are more expensive on average (about 761k vs 530k) and older (built in 1940 on average, vs 1972 for houses that were not renovated), which is logical since older houses are more likely to need renovation.

### Binning prices

We can bin our prices to draw useful insights:

In [568]:
intervals = [0, 100000, 500000, 1000000, 3000000, 6000000, 8000000]  
labels = ['0-100k', '100k-500k', '500k-1M', '1M-3M', '3M-6M','6M+']
# bin_price returns the label of the first interval whose upper bound exceeds the price
# (each bin includes its lower bound); prices of 8M or more also fall through to '6M+'.
def bin_price(price):
    for i in range(len(intervals)-1):
        if price < intervals[i+1]:
            return labels[i]
    return labels[-1]
df_['price_bins'] = df_['price'].apply(bin_price)
df_.head()

,id,price,bedrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,yr_built,yr_renovated,bedroom_nb,renovated,price_bins
0,7129300520,221900.0,3,1180,5650,1.0,0,0,3,7,1955,0,<4,no,100k-500k
1,6414100192,538000.0,3,2570,7242,2.0,0,0,3,7,1951,1991,<4,yes,500k-1M
2,5631500400,180000.0,2,770,10000,1.0,0,0,3,6,1933,0,<4,no,100k-500k
3,2487200875,604000.0,4,1960,5000,1.0,0,0,5,7,1965,0,>= 4,no,500k-1M
4,1954400510,510000.0,3,1680,8080,1.0,0,0,3,8,1987,0,<4,no,500k-1M
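pandas also has a built-in for this: `pd.cut`. With `right=False`, each bin includes its lower edge and excludes its upper edge, which matches `bin_price` for prices below 8M (prices at or above the last edge become NaN, whereas `bin_price` labels them '6M+'). The result is an ordered categorical, so a later `groupby` lists the bins in price order rather than alphabetically. A sketch with hypothetical prices:

```python
import pandas as pd

intervals = [0, 100000, 500000, 1000000, 3000000, 6000000, 8000000]
labels = ['0-100k', '100k-500k', '500k-1M', '1M-3M', '3M-6M', '6M+']

# Hypothetical prices
prices = pd.Series([75000.0, 221900.0, 538000.0, 1230000.0, 7700000.0, 100000.0])

# right=False -> bins are [0, 100k), [100k, 500k), ...
price_bins = pd.cut(prices, bins=intervals, labels=labels, right=False)
print(price_bins.tolist())

# Grouping on the categorical keeps the bins in their defined order;
# observed=False also lists bins that contain no rows (count 0)
bin_counts = prices.groupby(price_bins, observed=False).count()
print(bin_counts)
```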


In [569]:
cols = ['id','bedrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'grade','price_bins']
df_[cols].groupby('price_bins').agg({'bedrooms': ['median', 'min', 'max'],
                                     'sqft_living': 'mean',
                                     'sqft_lot': 'mean',
                                     'floors': 'median',
                                     'view': ['min','max','median'],
                                     'grade': ['min','max','mean'],
                                     'id': 'count'
                                    }).round(2)

price_bins,bedrooms_median,bedrooms_min,bedrooms_max,sqft_living_mean,sqft_lot_mean,floors_median,view_min,view_max,view_median,grade_min,grade_max,grade_mean,id_count
0-100k,2.0,1,3,779.2,10464.08,1.0,0,0,0.0,3,7,5.32,25
100k-500k,3.0,0,9,1667.07,11964.52,1.0,0,4,0.0,1,11,7.11,12383
1M-3M,4.0,0,10,3697.27,23142.27,2.0,0,4,0.0,6,13,9.68,1441
3M-6M,5.0,2,8,5843.96,21892.85,2.0,0,4,4.0,8,13,11.33,48
500k-1M,4.0,1,11,2418.08,18618.71,2.0,0,4,0.0,5,12,8.14,7712
6M+,6.0,5,6,10660.0,32099.67,2.0,2,4,3.0,11,13,12.33,3


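This makes the patterns in our dataset easier to see (note that the bins are listed alphabetically rather than by price):
- The median number of bedrooms, the average living area, and the grade all increase with price; the average lot size mostly does too, although the 3M-6M bin is slightly below the 1M-3M bin.
- More than half of the houses (12,383 of 21,612, about 57%) are priced between 100k and 500k, and about a third (7,712, about 36%) between 500k and 1M.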
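The shares above follow from the `count` column; on the full data, `df_['price_bins'].value_counts(normalize=True)` would give the proportions directly. A sketch that rebuilds the counts from the output above:

```python
import pandas as pd

# House counts per price bin, copied from the id_count column above
counts = pd.Series({'0-100k': 25, '100k-500k': 12383, '500k-1M': 7712,
                    '1M-3M': 1441, '3M-6M': 48, '6M+': 3})

# Proportion of houses in each bin
shares = (counts / counts.sum()).round(3)
print(shares)
```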

# Summary

The houses in this dataset were built between 1900 and 2015 and sold in 2014 and 2015. Their prices are associated with several factors, including the view, waterfront, area, condition, grade, number of bedrooms, and renovation, and potentially other factors not examined here.