In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('/content/sample_data/property.csv')

In [3]:
df.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.8079,144.9934,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.8093,144.9944,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.7969,144.9969,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.8072,144.9941,Northern Metropolitan,4019.0


In [4]:
altona_properties = df[df['Suburb'] == 'Altona']
altona_prices = altona_properties['Price'].dropna()

print(f"Number of properties in Altona: {len(altona_prices)}")
print(f"Mean price in Altona: ${altona_prices.mean():,.2f}")
print(f"Standard deviation of prices in Altona: ${altona_prices.std():,.2f}")

Number of properties in Altona: 74
Mean price in Altona: $834,830.41
Standard deviation of prices in Altona: $291,546.05


Now, let's perform a one-sample t-test to check whether the mean property price in Altona is significantly greater than $800,000.

**Null hypothesis**: the mean price is less than or equal to \$800,000.

**Alternative hypothesis**: the mean price is greater than \$800,000 (one-tailed test).

**Significance level**: 5% (alpha = 0.05).

## Null Hypothesis

$$ H_0: \text{The mean price is less than or equal to \$800,000} $$

## Alternative Hypothesis
$$ H_1: \text{The mean price is greater than \$800,000} $$


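## Formula

The population standard deviation is unknown and is estimated by the sample standard deviation $s$, so the test statistic is a t-statistic with $n - 1$ degrees of freedom:

$$ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} $$

where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized mean, $s$ is the sample standard deviation, and $n$ is the sample size:

$$ \bar{x} = 834830.41 $$

$$ \mu = 800000 $$

$$ s = 291546.05 $$

$$ n = 74 $$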
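
Substituting the sample values reported earlier gives a quick hand check of the statistic. Here $s$ denotes the sample standard deviation (291546.05) and $n$ the sample size (74); the intermediate standard error is rounded, so the result should be read as approximate:

$$ t = \frac{834830.41 - 800000}{291546.05 / \sqrt{74}} \approx \frac{34830.41}{33891.52} \approx 1.0277 $$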

## Calculate T-statistic Manually

### Subtask:
Calculate the t-statistic for the one-sample t-test manually using the formula: (sample_mean - hypothesized_mean) / (sample_standard_deviation / sqrt(sample_size)).


**Reasoning**:
To calculate the t-statistic manually, first filter the `df` DataFrame for properties in 'Altona' and drop any missing price values to obtain `altona_prices`. Then compute the sample mean, sample standard deviation, and sample size, define the hypothesized mean, calculate the standard error of the mean (SEM), and finally compute the t-statistic using the given formula.



In [8]:
altona_properties = df[df['Suburb'] == 'Altona']
altona_prices = altona_properties['Price'].dropna()

sample_mean = altona_prices.mean()
sample_std = altona_prices.std()
sample_size = len(altona_prices)
hypothesized_mean = 800000

SEM = sample_std / np.sqrt(sample_size)
t_statistic_manual = (sample_mean - hypothesized_mean) / SEM

print(f"Manually calculated T-statistic: {t_statistic_manual:.4f}")

Manually calculated T-statistic: 1.0277
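
As a cross-check on the manual formula, the same statistic can be obtained from `scipy.stats.ttest_1samp`. The sketch below is only illustrative: `demo_prices` is synthetic data generated for this example (an assumption, not the Altona sample), and the `alternative="greater"` argument assumes SciPy 1.6 or newer. In the notebook itself, `altona_prices` would take the place of `demo_prices`.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data: synthetic prices, NOT the Altona sample.
rng = np.random.default_rng(0)
demo_prices = rng.normal(loc=835_000, scale=290_000, size=74)
hypothesized_mean = 800_000

# Manual t-statistic; ddof=1 matches the default of pandas Series.std().
sem = demo_prices.std(ddof=1) / np.sqrt(demo_prices.size)
t_manual = (demo_prices.mean() - hypothesized_mean) / sem

# Library calculation; alternative="greater" requests the one-tailed test.
result = stats.ttest_1samp(
    demo_prices, popmean=hypothesized_mean, alternative="greater"
)

print(f"Manual t-statistic: {t_manual:.4f}")
print(f"SciPy t-statistic:  {result.statistic:.4f}")
print(f"One-tailed p-value: {result.pvalue:.4f}")
```

The two statistics should agree up to floating-point error. Note that NumPy's `std` defaults to `ddof=0`, so `ddof=1` must be passed explicitly to reproduce the sample standard deviation that pandas reports.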


Now that the t-statistic has been manually calculated, the next step is to determine the critical t-value for a one-tailed test at a 5% significance level. This involves calculating the degrees of freedom and then using `scipy.stats.t.ppf` to find the critical value.

## Compare T-statistic with Critical T-value and Draw Conclusion

Compare the `t_statistic_manual` with the `critical_t_value` and determine whether to reject or fail to reject the null hypothesis. Summarize the conclusion.

#### Instructions:
1. Compare the calculated `t_statistic_manual` with the `critical_t_value`.
2. Print a statement indicating whether the null hypothesis is rejected or not, based on the comparison.
3. State the conclusion about the property prices in Altona.

In [9]:
from scipy.stats import t

degrees_freedom = sample_size - 1
alpha = 0.05

critical_t_value = t.ppf(1 - alpha, df=degrees_freedom)

print(f"Degrees of Freedom: {degrees_freedom}")
print(f"Critical T-value (one-tailed, alpha=0.05): {critical_t_value:.4f}")

print(f"\nComparing T-statistic ({t_statistic_manual:.4f}) with Critical T-value ({critical_t_value:.4f})")

if t_statistic_manual > critical_t_value:
    print("Conclusion: Reject the null hypothesis. There is sufficient evidence to suggest that the mean property price in Altona is greater than $800,000.")
else:
    print("Conclusion: Fail to reject the null hypothesis. There is not sufficient evidence to suggest that the mean property price in Altona is greater than $800,000.")


Degrees of Freedom: 73
Critical T-value (one-tailed, alpha=0.05): 1.6660

Comparing T-statistic (1.0277) with Critical T-value (1.6660)
Conclusion: Fail to reject the null hypothesis. There is not sufficient evidence to suggest that the mean property price in Altona is greater than $800,000.
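
An equivalent way to reach the same decision is to compare a one-tailed p-value with alpha instead of comparing the t-statistic with the critical value. The sketch below recomputes the statistic from the rounded summary figures printed earlier (mean 834,830.41, standard deviation 291,546.05, n = 74), so its result is approximate. The short names `xbar`, `s`, `n`, and `mu0` are introduced here only for illustration and are not variables from the notebook.

```python
import math
from scipy.stats import t

# Rounded summary statistics copied from the earlier cell output.
xbar = 834830.41   # sample mean
s = 291546.05      # sample standard deviation
n = 74             # sample size
mu0 = 800000       # hypothesized mean
alpha = 0.05

# Standard error and t-statistic from the summary figures.
sem = s / math.sqrt(n)
t_stat = (xbar - mu0) / sem

# Upper-tail probability under the t distribution with n - 1 degrees of freedom.
p_value = t.sf(t_stat, df=n - 1)
reject = p_value < alpha

print(f"t-statistic: {t_stat:.4f}")
print(f"One-tailed p-value: {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Because the p-value rule and the critical-value rule are two views of the same tail area, both approaches lead to the same decision for a given alpha.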


## Summary:

The findings indicate that the calculated T-statistic is 1.0277, which is less than the critical T-value of 1.6660 for a one-tailed test at a 5% significance level with 73 degrees of freedom. Therefore, we fail to reject the null hypothesis, concluding that there is insufficient statistical evidence to suggest that the mean property price in Altona is significantly greater than \$800,000.

### Data Analysis Key Findings
*   The manually calculated T-statistic for Altona property prices was determined to be 1.0277.
*   With a sample size of 74, the degrees of freedom for the t-test were 73.
*   For a one-tailed test at a 5% significance level ($\alpha=0.05$), the critical T-value was found to be 1.6660.
*   The calculated T-statistic (1.0277) is less than the critical T-value (1.6660).
*   Based on this comparison, the null hypothesis (mean property price in Altona $\le$ \$800,000) was not rejected, indicating insufficient evidence to conclude that the mean price is significantly greater than \$800,000.
