# Analysis of Tehran House Prices using T-Tests
**Objective:** This notebook uses statistical t-tests to analyze the Tehran house price dataset and determine which features (like parking, elevator, area, etc.) have a significant impact on property prices.

**Author:** @hoangvd  
**Date:** 23/06/2025

## 1. Data Preparation

### 1.1. Import Libraries

In [2]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

### 1.2. Load and Clean Data
Load the dataset and inspect its columns and dtypes, then perform initial cleaning: drop the original 'Price' column (in Toman) and rename 'Price(USD)' to 'Price', which is used for the rest of the analysis.

In [3]:
houses = pd.read_csv("./data/housePrice.csv")
houses.head()
houses.info()
houses = houses.drop('Price', axis=1)
houses = houses.rename(columns={'Price(USD)': 'Price'})
houses.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3479 entries, 0 to 3478
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Area        3479 non-null   object 
 1   Room        3479 non-null   int64  
 2   Parking     3479 non-null   bool   
 3   Warehouse   3479 non-null   bool   
 4   Elevator    3479 non-null   bool   
 5   Address     3456 non-null   object 
 6   Price       3479 non-null   float64
 7   Price(USD)  3479 non-null   float64
dtypes: bool(3), float64(2), int64(1), object(2)
memory usage: 146.2+ KB


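|   | Area | Room | Parking | Warehouse | Elevator | Address | Price |
|---|------|------|---------|-----------|----------|---------|-------|
| 0 | 63 | 1 | True | True | True | Shahran | 61666.67 |
| 1 | 60 | 1 | True | True | True | Shahran | 61666.67 |
| 2 | 79 | 2 | True | True | True | Pardis | 18333.33 |
| 3 | 95 | 2 | True | True | True | Shahrake Qods | 30083.33 |
| 4 | 123 | 2 | True | True | True | Shahrake Gharb | 233333.33 |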


### 1.3. Outlier Removal
To prevent extreme values from skewing the analysis, we keep only properties whose price lies between the 1st and 99th percentiles (inclusive), which trims roughly the cheapest and the most expensive 1% of listings.

In [4]:
# Calculate 1st and 99th percentiles
lower_percentile = houses['Price'].quantile(0.01)
upper_percentile = houses['Price'].quantile(0.99)

print(f"1st percentile (1% lowest): ${lower_percentile:.2f}")
print(f"99th percentile (1% highest): ${upper_percentile:.2f}")

# Remove outliers
houses_before = len(houses)
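# .copy() makes the filtered frame independent, so later column assignments do not act on a slice
houses = houses[(houses['Price'] >= lower_percentile) & (houses['Price'] <= upper_percentile)].copy()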
houses_after = len(houses)

print(f"\nDataset size before removing outliers: {houses_before}")
print(f"Dataset size after removing outliers: {houses_after}")
print(f"Removed {houses_before - houses_after} outliers ({((houses_before - houses_after) / houses_before * 100):.1f}%)")

1st percentile (1% lowest): $9833.33
99th percentile (1% highest): $1333333.33

Dataset size before removing outliers: 3479
Dataset size after removing outliers: 3418
Removed 61 outliers (1.8%)


## 2. Hypothesis Testing with T-Tests
We now run a series of independent two-sample t-tests (`scipy.stats.ttest_ind`) to compare mean prices between groups of houses. Every test is one-tailed, with the direction given by its alternative hypothesis. The significance level (alpha) is 0.05: a p-value below 0.05 leads us to reject the null hypothesis.
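The tests in this notebook call `stats.ttest_ind` with its default `equal_var=True`, which is Student's t-test and assumes both groups share the same variance. When group sizes and spreads differ a lot, Welch's variant (`equal_var=False`) is a common robustness check. The sketch below compares the two on synthetic data; the group sizes and lognormal parameters are assumptions chosen for illustration, not values from the Tehran dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for two price groups (NOT the Tehran data):
# a large, widely spread group and a small, tighter group.
with_feature = rng.lognormal(mean=11.8, sigma=0.8, size=2000)
without_feature = rng.lognormal(mean=11.0, sigma=0.5, size=300)

# Student's t-test (pooled variance), as used throughout this notebook
student = stats.ttest_ind(with_feature, without_feature, alternative='greater')

# Welch's t-test (no equal-variance assumption)
welch = stats.ttest_ind(with_feature, without_feature,
                        equal_var=False, alternative='greater')

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3e}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3e}")
```

If both variants lead to the same decision, the equal-variance assumption is unlikely to be driving the result.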

### Hypothesis 1: Parking
**Hypothesis:** Houses with parking have a significantly higher average price than houses without parking.
*   **H0 (Null):** Mean price with parking = Mean price without parking.
*   **H1 (Alternative):** Mean price with parking > Mean price without parking.

In [5]:
from scipy import stats

# Separate houses with and without parking
houses_with_parking = houses[houses['Parking'] == True]['Price']
houses_without_parking = houses[houses['Parking'] == False]['Price']

print(f"Houses with parking: {len(houses_with_parking)} properties")
print(f"Houses without parking: {len(houses_without_parking)} properties")

# Calculate descriptive statistics
print(f"\nMean price with parking: ${houses_with_parking.mean():.2f}")
print(f"Mean price without parking: ${houses_without_parking.mean():.2f}")

# Perform independent t-test
# H0: Mean price with parking = Mean price without parking
# H1: Mean price with parking > Mean price without parking (one-tailed test)
t_stat, p_value = stats.ttest_ind(houses_with_parking, houses_without_parking, alternative='greater')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.6f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with parking have significantly higher average price than houses without parking")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with and without parking")

Houses with parking: 2909 properties
Houses without parking: 509 properties

Mean price with parking: $180957.87
Mean price without parking: $54476.89

T-test results:
T-statistic: 13.9051
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: Houses with parking have significantly higher average price than houses without parking
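
The printed `P-value: 0.000000` is a consequence of the `:.6f` format: the p-value is not exactly zero, only smaller than the six decimals shown. A minimal sketch of two alternative ways to report it follows; the value `3.2e-43` and the helper `format_p` are hypothetical and only illustrate the formatting.

```python
p_value = 3.2e-43  # hypothetical value, for illustration only

fixed = f"{p_value:.6f}"        # rounds to '0.000000'
scientific = f"{p_value:.3e}"   # keeps the order of magnitude: '3.200e-43'

def format_p(p, floor=1e-6):
    """Report very small p-values as an upper bound instead of 0.000000."""
    return f"< {floor:g}" if p < floor else f"{p:.6f}"

print(f"Fixed-point: {fixed}")
print(f"Scientific:  {scientific}")
print(f"Bounded:     {format_p(p_value)}")
```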


### Hypothesis 2: Warehouse
**Hypothesis:** Houses with a warehouse have a significantly higher average price than those without.
*   **H0 (Null):** Mean price with warehouse = Mean price without warehouse.
*   **H1 (Alternative):** Mean price with warehouse > Mean price without warehouse.

In [6]:
# Separate houses with and without warehouse
houses_with_warehouse = houses[houses['Warehouse'] == True]['Price']
houses_without_warehouse = houses[houses['Warehouse'] == False]['Price']

print(f"Houses with warehouse: {len(houses_with_warehouse)} properties")
print(f"Houses without warehouse: {len(houses_without_warehouse)} properties")

# Calculate descriptive statistics
print(f"\nMean price with warehouse: ${houses_with_warehouse.mean():.2f}")
print(f"Mean price without warehouse: ${houses_without_warehouse.mean():.2f}")

# Perform independent t-test
# H0: Mean price with warehouse = Mean price without warehouse
# H1: Mean price with warehouse > Mean price without warehouse (one-tailed test)
t_stat_warehouse, p_value_warehouse = stats.ttest_ind(houses_with_warehouse, houses_without_warehouse, alternative='greater')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_warehouse:.4f}")
print(f"P-value: {p_value_warehouse:.6f}")

# Interpret results
if p_value_warehouse < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with warehouse have significantly higher average price than houses without warehouse")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with and without warehouse")

Houses with warehouse: 3134 properties
Houses without warehouse: 284 properties

Mean price with warehouse: $169875.23
Mean price without warehouse: $76571.19

T-test results:
T-statistic: 7.8057
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: Houses with warehouse have significantly higher average price than houses without warehouse


### Hypothesis 3: Elevator
**Hypothesis:** Houses with an elevator have a significantly higher average price than those without.
*   **H0 (Null):** Mean price with elevator = Mean price without elevator.
*   **H1 (Alternative):** Mean price with elevator > Mean price without elevator.

In [7]:
# Separate houses with and without elevator
houses_with_elevator = houses[houses['Elevator'] == True]['Price']
houses_without_elevator = houses[houses['Elevator'] == False]['Price']

print(f"Houses with elevator: {len(houses_with_elevator)} properties")
print(f"Houses without elevator: {len(houses_without_elevator)} properties")

# Calculate descriptive statistics
print(f"\nMean price with elevator: ${houses_with_elevator.mean():.2f}")
print(f"Mean price without elevator: ${houses_without_elevator.mean():.2f}")

# Perform independent t-test
# H0: Mean price with elevator = Mean price without elevator
# H1: Mean price with elevator > Mean price without elevator (one-tailed test)
t_stat_elevator, p_value_elevator = stats.ttest_ind(houses_with_elevator, houses_without_elevator, alternative='greater')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_elevator:.4f}")
print(f"P-value: {p_value_elevator:.6f}")

# Interpret results
if p_value_elevator < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with elevator have significantly higher average price than houses without elevator")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with and without elevator")

Houses with elevator: 2706 properties
Houses without elevator: 712 properties

Mean price with elevator: $181124.46
Mean price without elevator: $89905.05

T-test results:
T-statistic: 11.3364
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: Houses with elevator have significantly higher average price than houses without elevator
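
With groups this large, very small p-values are expected even for modest differences, so an effect size can help judge how big the gap is. The sketch below computes Cohen's d with a pooled standard deviation. The helper `cohens_d` is not part of the original notebook, and the toy numbers are made up.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy example: means differ by 2, both sample variances are 1
d_toy = cohens_d([3, 4, 5], [1, 2, 3])
print(d_toy)
```

In this notebook it could be called as `cohens_d(houses_with_elevator, houses_without_elevator)`.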


### Hypothesis 4: Elevator in Small Houses (≤50m²)
**Hypothesis:** For small houses, those with an elevator have a significantly higher average price.
*   **H0 (Null):** For houses ≤50m², the mean price of those with an elevator is equal to those without.
*   **H1 (Alternative):** For houses ≤50m², the mean price of those with an elevator is greater.

In [8]:
# T-test Elevator: Size-Dependent Impact
# Convert Area to numeric (unparseable values become NaN) and keep houses with area ≤ 50m²
houses['Area'] = pd.to_numeric(houses['Area'], errors='coerce')
houses_small = houses[houses['Area'] <= 50]

print(f"Total small houses (≤50m²): {len(houses_small)} properties")

# Separate small houses with and without elevator
small_houses_with_elevator = houses_small[houses_small['Elevator'] == True]['Price']
small_houses_without_elevator = houses_small[houses_small['Elevator'] == False]['Price']

print(f"Small houses with elevator: {len(small_houses_with_elevator)} properties")
print(f"Small houses without elevator: {len(small_houses_without_elevator)} properties")

# Calculate descriptive statistics
print(f"\nMean price of small houses with elevator: ${small_houses_with_elevator.mean():.2f}")
print(f"Mean price of small houses without elevator: ${small_houses_without_elevator.mean():.2f}")

# Perform independent t-test
# H0: Mean price of small houses with elevator = Mean price of small houses without elevator
# H1: Mean price of small houses with elevator > Mean price of small houses without elevator (one-tailed test)
t_stat_small_elevator, p_value_small_elevator = stats.ttest_ind(small_houses_with_elevator, small_houses_without_elevator, alternative='greater')

print(f"\nT-test results for small houses (≤50m²):")
print(f"T-statistic: {t_stat_small_elevator:.4f}")
print(f"P-value: {p_value_small_elevator:.6f}")

# Interpret results
if p_value_small_elevator < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: For small houses (≤50m²), houses with elevator have significantly higher average price than houses without elevator")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: For small houses (≤50m²), no significant difference in average price between houses with and without elevator")

Total small houses (≤50m²): 213 properties
Small houses with elevator: 83 properties
Small houses without elevator: 130 properties

Mean price of small houses with elevator: $49834.14
Mean price of small houses without elevator: $28904.87

T-test results for small houses (≤50m²):
T-statistic: 6.3352
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: For small houses (≤50m²), houses with elevator have significantly higher average price than houses without elevator
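
The `Area` column is loaded as `object`, and `pd.to_numeric(..., errors='coerce')` turns anything it cannot parse into `NaN`. Such rows then fail every area comparison and silently drop out of the size-based tests. The sketch below shows this behavior on hypothetical raw values; the actual unparseable formats in the dataset are not shown in this notebook, so the comma-separated example is an assumption.

```python
import pandas as pd

# Hypothetical raw values; the real column may contain other formats.
area_raw = pd.Series(["63", "120", "1,000", "abc"])

# Values that cannot be parsed become NaN
area_coerced = pd.to_numeric(area_raw, errors="coerce")
n_unparsed = int(area_coerced.isna().sum())
print(f"Unparseable values: {n_unparsed}")

# NaN compares as False, so these rows are excluded from any area filter
print((area_coerced <= 50).sum(), (area_coerced > 200).sum())

# One possible alternative, assuming commas are thousands separators
area_cleaned = pd.to_numeric(area_raw.str.replace(",", "", regex=False), errors="coerce")
print(area_cleaned.tolist())
```

Counting the `NaN` values right after the conversion makes it clear how many listings are left out of the area-based hypotheses.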


### Hypothesis 5: Parking in Medium Houses (81-120m²)
**Hypothesis:** For medium-sized houses, those with parking have a significantly higher average price.
*   **H0 (Null):** For houses 81-120m², the mean price of those with parking is equal to those without.
*   **H1 (Alternative):** For houses 81-120m², the mean price of those with parking is greater.


In [9]:
# T-test Parking: The Universal Value Driver
# Filter houses with area 81-120m² (medium-sized houses)
houses_medium = houses[(houses['Area'] >= 81) & (houses['Area'] <= 120)]

print(f"Total medium houses (81-120m²): {len(houses_medium)} properties")

# Separate medium houses with and without parking
medium_houses_with_parking = houses_medium[houses_medium['Parking'] == True]['Price']
medium_houses_without_parking = houses_medium[houses_medium['Parking'] == False]['Price']

print(f"Medium houses with parking: {len(medium_houses_with_parking)} properties")
print(f"Medium houses without parking: {len(medium_houses_without_parking)} properties")

# Calculate descriptive statistics
print(f"\nMean price of medium houses with parking: ${medium_houses_with_parking.mean():.2f}")
print(f"Mean price of medium houses without parking: ${medium_houses_without_parking.mean():.2f}")

# Perform independent t-test
# H0: Mean price of medium houses with parking = Mean price of medium houses without parking
# H1: Mean price of medium houses with parking > Mean price of medium houses without parking (one-tailed test)
t_stat_medium_parking, p_value_medium_parking = stats.ttest_ind(medium_houses_with_parking, medium_houses_without_parking, alternative='greater')

print(f"\nT-test results for medium houses (81-120m²):")
print(f"T-statistic: {t_stat_medium_parking:.4f}")
print(f"P-value: {p_value_medium_parking:.6f}")

# Interpret results
if p_value_medium_parking < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: For medium houses (81-120m²), houses with parking have significantly higher average price than houses without parking")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: For medium houses (81-120m²), no significant difference in average price between houses with and without parking")

Total medium houses (81-120m²): 1254 properties
Medium houses with parking: 1166 properties
Medium houses without parking: 88 properties

Mean price of medium houses with parking: $126553.47
Mean price of medium houses without parking: $61671.02

T-test results for medium houses (81-120m²):
T-statistic: 7.4008
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: For medium houses (81-120m²), houses with parking have significantly higher average price than houses without parking


### Hypothesis 6: Area Comparison
**Hypothesis:** Houses with an area less than 70m² have a significantly lower average price than houses in the 70-100m² range.
*   **H0 (Null):** Mean price of houses < 70m² = Mean price of houses 70-100m².
*   **H1 (Alternative):** Mean price of houses < 70m² < Mean price of houses 70-100m².

In [10]:
# T-test Area: Price comparison between different area groups
# Filter houses with area < 70m² and 70-100m²
houses_small_area = houses[houses['Area'] < 70]['Price']
houses_medium_area = houses[(houses['Area'] >= 70) & (houses['Area'] <= 100)]['Price']

print(f"Houses with area < 70m²: {len(houses_small_area)} properties")
print(f"Houses with area 70-100m²: {len(houses_medium_area)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses < 70m²: ${houses_small_area.mean():.2f}")
print(f"Mean price of houses 70-100m²: ${houses_medium_area.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses < 70m² = Mean price of houses 70-100m²
# H1: Mean price of houses < 70m² < Mean price of houses 70-100m² (one-tailed test)
t_stat_area, p_value_area = stats.ttest_ind(houses_small_area, houses_medium_area, alternative='less')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_area:.4f}")
print(f"P-value: {p_value_area:.6f}")

# Interpret results
if p_value_area < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with area < 70m² have significantly lower average price than houses with area 70-100m²")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between the two area groups")

Houses with area < 70m²: 862 properties
Houses with area 70-100m²: 1236 properties

Mean price of houses < 70m²: $55982.66
Mean price of houses 70-100m²: $91051.87

T-test results:
T-statistic: -14.5178
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: Houses with area < 70m² have significantly lower average price than houses with area 70-100m²


### Hypothesis 7: Bedroom Count
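**Hypothesis:** Houses with fewer bedrooms have a significantly lower average price. We test four pairwise comparisons (0 vs. 1, 0 vs. 5, 4 vs. 5, and 3 vs. 5 bedrooms). In each, H0 states that the two mean prices are equal and H1 states that the group with fewer bedrooms has the lower mean price.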

In [12]:
# T-test: Houses with 0 bedrooms vs houses with 1 bedroom
# Separate houses with 0 and 1 bedrooms
houses_0_rooms = houses[houses['Room'] == 0]['Price']
houses_1_room = houses[houses['Room'] == 1]['Price']

print(f"Houses with 0 bedrooms: {len(houses_0_rooms)} properties")
print(f"Houses with 1 bedroom: {len(houses_1_room)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses with 0 bedrooms: ${houses_0_rooms.mean():.2f}")
print(f"Mean price of houses with 1 bedroom: ${houses_1_room.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses with 0 bedrooms = Mean price of houses with 1 bedroom
# H1: Mean price of houses with 0 bedrooms < Mean price of houses with 1 bedroom (one-tailed test)
t_stat_0vs1_rooms, p_value_0vs1_rooms = stats.ttest_ind(houses_0_rooms, houses_1_room, alternative='less')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_0vs1_rooms:.4f}")
print(f"P-value: {p_value_0vs1_rooms:.6f}")

# Interpret results
if p_value_0vs1_rooms < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with 0 bedrooms have significantly lower average price than houses with 1 bedroom")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with 0 and 1 bedroom")

Houses with 0 bedrooms: 5 properties
Houses with 1 bedroom: 659 properties

Mean price of houses with 0 bedrooms: $15500.00
Mean price of houses with 1 bedroom: $57842.80

T-test results:
T-statistic: -1.7474
P-value: 0.040517

Result: Reject H0 (p < 0.05)
Conclusion: Houses with 0 bedrooms have significantly lower average price than houses with 1 bedroom
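
With only 5 properties in the 0-bedroom group, the t-test leans heavily on its normality assumption. A rank-based test such as Mann-Whitney U is one possible cross-check; it is not what the notebook uses. The sketch below runs it on synthetic data whose sizes and distribution parameters are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins (NOT the Tehran data): a tiny group and a large group
small_group = rng.lognormal(mean=9.6, sigma=0.4, size=5)
large_group = rng.lognormal(mean=10.8, sigma=0.7, size=600)

# One-sided test: are values in small_group stochastically smaller?
u_stat, p_mwu = stats.mannwhitneyu(small_group, large_group, alternative='less')

print(f"U statistic: {u_stat:.1f}")
print(f"P-value: {p_mwu:.6f}")
```

In this notebook the equivalent call would be `stats.mannwhitneyu(houses_0_rooms, houses_1_room, alternative='less')`.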


In [13]:
# T-test: Houses with 0 bedrooms vs houses with 5 bedrooms
# Separate houses with 0 and 5 bedrooms
houses_5_rooms = houses[houses['Room'] == 5]['Price']

print(f"Houses with 0 bedrooms: {len(houses_0_rooms)} properties")
print(f"Houses with 5 bedrooms: {len(houses_5_rooms)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses with 0 bedrooms: ${houses_0_rooms.mean():.2f}")
print(f"Mean price of houses with 5 bedrooms: ${houses_5_rooms.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses with 0 bedrooms = Mean price of houses with 5 bedrooms
# H1: Mean price of houses with 0 bedrooms < Mean price of houses with 5 bedrooms (one-tailed test)
t_stat_0vs5_rooms, p_value_0vs5_rooms = stats.ttest_ind(houses_0_rooms, houses_5_rooms, alternative='less')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_0vs5_rooms:.4f}")
print(f"P-value: {p_value_0vs5_rooms:.6f}")

# Interpret results
if p_value_0vs5_rooms < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with 0 bedrooms have significantly lower average price than houses with 5 bedrooms")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with 0 and 5 bedrooms")

Houses with 0 bedrooms: 5 properties
Houses with 5 bedrooms: 22 properties

Mean price of houses with 0 bedrooms: $15500.00
Mean price of houses with 5 bedrooms: $527553.03

T-test results:
T-statistic: -3.3461
P-value: 0.001296

Result: Reject H0 (p < 0.05)
Conclusion: Houses with 0 bedrooms have significantly lower average price than houses with 5 bedrooms


In [14]:
# T-test: Houses with 4 bedrooms vs houses with 5 bedrooms
# Separate houses with 4 and 5 bedrooms
houses_4_rooms = houses[houses['Room'] == 4]['Price']

print(f"Houses with 4 bedrooms: {len(houses_4_rooms)} properties")
print(f"Houses with 5 bedrooms: {len(houses_5_rooms)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses with 4 bedrooms: ${houses_4_rooms.mean():.2f}")
print(f"Mean price of houses with 5 bedrooms: ${houses_5_rooms.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses with 4 bedrooms = Mean price of houses with 5 bedrooms
# H1: Mean price of houses with 4 bedrooms < Mean price of houses with 5 bedrooms (one-tailed test)
t_stat_4vs5_rooms, p_value_4vs5_rooms = stats.ttest_ind(houses_4_rooms, houses_5_rooms, alternative='less')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_4vs5_rooms:.4f}")
print(f"P-value: {p_value_4vs5_rooms:.6f}")

# Interpret results
if p_value_4vs5_rooms < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with 4 bedrooms have significantly lower average price than houses with 5 bedrooms")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with 4 and 5 bedrooms")

Houses with 4 bedrooms: 60 properties
Houses with 5 bedrooms: 22 properties

Mean price of houses with 4 bedrooms: $651207.22
Mean price of houses with 5 bedrooms: $527553.03

T-test results:
T-statistic: 1.2586
P-value: 0.894075

Result: Fail to reject H0 (p >= 0.05)
Conclusion: No significant difference in average price between houses with 4 and 5 bedrooms


In [15]:
# T-test: Houses with 3 bedrooms vs houses with 5 bedrooms
# Separate houses with 3 and 5 bedrooms
houses_3_rooms = houses[houses['Room'] == 3]['Price']

print(f"Houses with 3 bedrooms: {len(houses_3_rooms)} properties")
print(f"Houses with 5 bedrooms: {len(houses_5_rooms)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses with 3 bedrooms: ${houses_3_rooms.mean():.2f}")
print(f"Mean price of houses with 5 bedrooms: ${houses_5_rooms.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses with 3 bedrooms = Mean price of houses with 5 bedrooms
# H1: Mean price of houses with 3 bedrooms < Mean price of houses with 5 bedrooms (one-tailed test)
t_stat_3vs5_rooms, p_value_3vs5_rooms = stats.ttest_ind(houses_3_rooms, houses_5_rooms, alternative='less')

print(f"\nT-test results:")
print(f"T-statistic: {t_stat_3vs5_rooms:.4f}")
print(f"P-value: {p_value_3vs5_rooms:.6f}")

# Interpret results
if p_value_3vs5_rooms < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: Houses with 3 bedrooms have significantly lower average price than houses with 5 bedrooms")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: No significant difference in average price between houses with 3 and 5 bedrooms")

Houses with 3 bedrooms: 720 properties
Houses with 5 bedrooms: 22 properties

Mean price of houses with 3 bedrooms: $344623.73
Mean price of houses with 5 bedrooms: $527553.03

T-test results:
T-statistic: -3.3548
P-value: 0.000417

Result: Reject H0 (p < 0.05)
Conclusion: Houses with 3 bedrooms have significantly lower average price than houses with 5 bedrooms
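
Because Hypothesis 7 runs four comparisons at alpha = 0.05, the chance of at least one false positive is higher than 5%. A Bonferroni correction is one simple, conservative way to account for this; it is an addition for illustration, not part of the original analysis. The sketch below applies it to the p-values printed above.

```python
# One-tailed p-values as printed in the four bedroom tests above
p_values = {
    "0 vs 1": 0.040517,
    "0 vs 5": 0.001296,
    "4 vs 5": 0.894075,
    "3 vs 5": 0.000417,
}

alpha = 0.05
adjusted_alpha = alpha / len(p_values)  # Bonferroni: divide alpha by the number of tests

decisions = {name: p < adjusted_alpha for name, p in p_values.items()}

print(f"Adjusted alpha: {adjusted_alpha}")
for name, reject in decisions.items():
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"{name} bedrooms: p = {p_values[name]:.6f} -> {verdict}")
```

Under this stricter threshold the 0 vs. 1 bedroom comparison (p = 0.040517) no longer rejects H0, while 0 vs. 5 and 3 vs. 5 still do.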


### Hypothesis 8: Amenity Impact in Large Houses (>200m²)
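**Hypothesis:** For large houses, amenities like Parking, Warehouse, and Elevator continue to add significant value. Each amenity is tested separately.
*   **H0 (Null):** For houses >200m², mean price with the amenity = mean price without it.
*   **H1 (Alternative):** For houses >200m², mean price with the amenity > mean price without it.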

In [16]:
# T-test: Houses with area > 200m² - Parking impact analysis
# Filter houses with area > 200m²
houses_large_area = houses[houses['Area'] > 200]

print(f"Total houses with area > 200m²: {len(houses_large_area)} properties")

# Separate large area houses with and without parking
large_area_with_parking = houses_large_area[houses_large_area['Parking'] == True]['Price']
large_area_without_parking = houses_large_area[houses_large_area['Parking'] == False]['Price']

print(f"Houses > 200m² with parking: {len(large_area_with_parking)} properties")
print(f"Houses > 200m² without parking: {len(large_area_without_parking)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses > 200m² with parking: ${large_area_with_parking.mean():.2f}")
print(f"Mean price of houses > 200m² without parking: ${large_area_without_parking.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses > 200m² with parking = Mean price of houses > 200m² without parking
# H1: Mean price of houses > 200m² with parking > Mean price of houses > 200m² without parking (one-tailed test)
t_stat_large_area_parking, p_value_large_area_parking = stats.ttest_ind(large_area_with_parking, large_area_without_parking, alternative='greater')

print(f"\nT-test results for houses > 200m²:")
print(f"T-statistic: {t_stat_large_area_parking:.4f}")
print(f"P-value: {p_value_large_area_parking:.6f}")

# Interpret results
if p_value_large_area_parking < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: For houses > 200m², houses with parking have significantly higher average price than houses without parking")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: For houses > 200m², no significant difference in average price between houses with and without parking")

Total houses with area > 200m²: 158 properties
Houses > 200m² with parking: 151 properties
Houses > 200m² without parking: 7 properties

Mean price of houses > 200m² with parking: $688519.43
Mean price of houses > 200m² without parking: $467142.86

T-test results for houses > 200m²:
T-statistic: 1.5759
P-value: 0.058533

Result: Fail to reject H0 (p >= 0.05)
Conclusion: For houses > 200m², no significant difference in average price between houses with and without parking


In [17]:
# T-test: Houses with area > 200m² - Warehouse impact analysis
# Filter houses with area > 200m²
houses_large_area_warehouse = houses[houses['Area'] > 200]

print(f"Total houses with area > 200m²: {len(houses_large_area_warehouse)} properties")

# Separate large area houses with and without warehouse
large_area_with_warehouse = houses_large_area_warehouse[houses_large_area_warehouse['Warehouse'] == True]['Price']
large_area_without_warehouse = houses_large_area_warehouse[houses_large_area_warehouse['Warehouse'] == False]['Price']

print(f"Houses > 200m² with warehouse: {len(large_area_with_warehouse)} properties")
print(f"Houses > 200m² without warehouse: {len(large_area_without_warehouse)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses > 200m² with warehouse: ${large_area_with_warehouse.mean():.2f}")
print(f"Mean price of houses > 200m² without warehouse: ${large_area_without_warehouse.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses > 200m² with warehouse = Mean price of houses > 200m² without warehouse
# H1: Mean price of houses > 200m² with warehouse > Mean price of houses > 200m² without warehouse (one-tailed test)
t_stat_large_area_warehouse, p_value_large_area_warehouse = stats.ttest_ind(large_area_with_warehouse, large_area_without_warehouse, alternative='greater')

print(f"\nT-test results for houses > 200m²:")
print(f"T-statistic: {t_stat_large_area_warehouse:.4f}")
print(f"P-value: {p_value_large_area_warehouse:.6f}")

# Interpret results
if p_value_large_area_warehouse < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: For houses > 200m², houses with warehouse have significantly higher average price than houses without warehouse")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: For houses > 200m², no significant difference in average price between houses with and without warehouse")

Total houses with area > 200m²: 158 properties
Houses > 200m² with warehouse: 146 properties
Houses > 200m² without warehouse: 12 properties

Mean price of houses > 200m² with warehouse: $705329.45
Mean price of houses > 200m² without warehouse: $354861.11

T-test results for houses > 200m²:
T-statistic: 3.2959
P-value: 0.000607

Result: Reject H0 (p < 0.05)
Conclusion: For houses > 200m², houses with warehouse have significantly higher average price than houses without warehouse


In [18]:
# T-test: Houses with area > 200m² - Elevator impact analysis
# Separate large area houses with and without elevator
large_area_with_elevator = houses_large_area[houses_large_area['Elevator'] == True]['Price']
large_area_without_elevator = houses_large_area[houses_large_area['Elevator'] == False]['Price']

print(f"Houses > 200m² with elevator: {len(large_area_with_elevator)} properties")
print(f"Houses > 200m² without elevator: {len(large_area_without_elevator)} properties")

# Calculate descriptive statistics
print(f"\nMean price of houses > 200m² with elevator: ${large_area_with_elevator.mean():.2f}")
print(f"Mean price of houses > 200m² without elevator: ${large_area_without_elevator.mean():.2f}")

# Perform independent t-test
# H0: Mean price of houses > 200m² with elevator = Mean price of houses > 200m² without elevator
# H1: Mean price of houses > 200m² with elevator > Mean price of houses > 200m² without elevator (one-tailed test)
t_stat_large_area_elevator, p_value_large_area_elevator = stats.ttest_ind(large_area_with_elevator, large_area_without_elevator, alternative='greater')

print(f"\nT-test results for houses > 200m²:")
print(f"T-statistic: {t_stat_large_area_elevator:.4f}")
print(f"P-value: {p_value_large_area_elevator:.6f}")

# Interpret results
if p_value_large_area_elevator < alpha:
    print(f"\nResult: Reject H0 (p < {alpha})")
    print("Conclusion: For houses > 200m², houses with elevator have significantly higher average price than houses without elevator")
else:
    print(f"\nResult: Fail to reject H0 (p >= {alpha})")
    print("Conclusion: For houses > 200m², no significant difference in average price between houses with and without elevator")

Houses > 200m² with elevator: 109 properties
Houses > 200m² without elevator: 49 properties

Mean price of houses > 200m² with elevator: $799783.79
Mean price of houses > 200m² without elevator: $409387.75

T-test results for houses > 200m²:
T-statistic: 7.1391
P-value: 0.000000

Result: Reject H0 (p < 0.05)
Conclusion: For houses > 200m², houses with elevator have significantly higher average price than houses without elevator
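
The cells above repeat the same split, describe, and test steps for each amenity. One way to reduce the repetition is a small helper that returns the results as a dictionary. The function `compare_feature` and the demo frame below are hypothetical and only sketch the idea, assuming the feature column holds booleans.

```python
import pandas as pd
from scipy import stats

def compare_feature(df, feature, value_col="Price", alternative="greater", alpha=0.05):
    """One-tailed independent t-test of value_col for rows where feature is True vs. False."""
    with_feature = df.loc[df[feature], value_col]
    without_feature = df.loc[~df[feature], value_col]
    t_stat, p_value = stats.ttest_ind(with_feature, without_feature, alternative=alternative)
    return {
        "feature": feature,
        "n_with": len(with_feature),
        "n_without": len(without_feature),
        "mean_with": with_feature.mean(),
        "mean_without": without_feature.mean(),
        "t_stat": t_stat,
        "p_value": p_value,
        "reject_h0": bool(p_value < alpha),
    }

# Tiny made-up frame, only to show the call
demo = pd.DataFrame({
    "Price": [100.0, 120.0, 130.0, 60.0, 70.0, 65.0],
    "Parking": [True, True, True, False, False, False],
})
result = compare_feature(demo, "Parking")
print(result)
```

Applied to this notebook, `pd.DataFrame([compare_feature(houses_large_area, f) for f in ["Parking", "Warehouse", "Elevator"]])` would collect the three large-house tests in one table.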


## 3. Conclusion

Based on the series of t-tests conducted, we can draw several key conclusions about the factors influencing house prices in Tehran:

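1.  **Core Amenities Matter:** Across the full dataset, houses with **Parking**, a **Warehouse**, or an **Elevator** have significantly higher average prices than houses without them (p < 0.05 in all three tests). This suggests these features are highly valued in the Tehran real estate market. Note that a t-test shows an association and does not control for other factors such as area or location.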

2.  **Size Matters:**
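    *   **Area:** Houses of 70-100m² are significantly more expensive on average than houses under 70m², consistent with floor area being a fundamental driver of price.
    *   **Bedrooms:** More bedrooms generally go with a higher price, but not uniformly. Houses with 0 bedrooms are significantly cheaper than houses with 1 or 5 bedrooms, and 3-bedroom houses are significantly cheaper than 5-bedroom houses. The 4 vs. 5 bedroom comparison is not significant (p = 0.894); in this sample the 4-bedroom mean (about 651,207 USD) is higher than the 5-bedroom mean (about 527,553 USD). The 0-bedroom group contains only 5 properties and the 5-bedroom group only 22, so these results should be read with caution.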

3.  **Contextual Importance:** The value of an amenity depends on context. An **elevator** carries a significant price premium even in **small houses (≤50m²)**, where one might not expect it to be a standard feature, and both **elevator** and **warehouse** remain significant in **large houses (>200m²)**. **Parking** is significant for **medium houses (81-120m²)** but not for large houses (p = 0.0585), where only 7 of 158 properties lack parking.

In summary, the highest average prices in this dataset are found among larger houses with more bedrooms that are equipped with amenities such as parking, a warehouse, and an elevator.