# *Define your purpose (Hypothesis)*
- Housing construction in Australia is not keeping pace with population growth.

# *Requirements Outline*
## <span style="color:red;">Functional requirements</span>
- Supports loading from common formats: .csv, .xlsx, .json

- Validates file format and schema before import

- Provides descriptive error messages for: Missing files, Unsupported or incorrect formats, Header mismatches

- Identifies and handles missing values (e.g. replace, drop, impute)

- Allows: Filtering based on conditions (e.g. date, category), Sorting by column values, Grouping data for aggregation or segmentation

- Provides statistical functions: Descriptive stats (mean, median, mode, std deviation), Count and frequency distributions, Correlation analysis (e.g. Pearson, Spearman)

- Displays data interactively using Pandas DataFrames for quick inspection and Matplotlib for line, bar, pie, scatter, and box plots

- Supports configurable axes, titles, legends, and color palettes for charts

- Generates: Summary reports with key metrics, Export of cleaned/processed dataset in .csv or .txt

- Enables saving data to local or cloud storage
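
The loading and validation requirements above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `load_dataset`, the `READERS` mapping, and the exact wording of the error messages are assumptions.

```python
from pathlib import Path

import pandas as pd

# Map each supported extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # relies on the optional openpyxl package
    ".json": pd.read_json,
}


def load_dataset(path, expected_columns):
    """Load a dataset, failing early with a descriptive message."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"File not found: {file_path}. Check the path and try again."
        )

    reader = READERS.get(file_path.suffix.lower())
    if reader is None:
        supported = ", ".join(sorted(READERS))
        raise ValueError(
            f"Unsupported format '{file_path.suffix}'. Supported formats: {supported}"
        )

    df = reader(file_path)

    # Schema check: every expected header must be present.
    missing = [col for col in expected_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Header mismatch: missing column(s) {missing}")
    return df
```

Raising distinct exceptions for each failure lets the menu layer decide how to word the message shown to the user.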

## <span style="color:blue;">Non-Functional requirements</span>
- User Interface: Clean, intuitive dashboard, Toggle switches and dropdowns for filtering and chart selection

- README documentation: Clear setup instructions, Examples of typical workflows, Explanation of key features and limitations

- Error Handling: Displays informative messages with suggestions for resolution, Logs errors to a file or console for debugging

- Data Integrity: Validates input data structure before processing, Keeps backup of original dataset, Tracks version history for modifications
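
One possible way to cover the backup and error-logging points above, using only the standard library. The names `backup_original` and `configure_error_log`, the `backups` folder, the `errors.log` file, and the logger name are all hypothetical. Timestamped copies give only a very basic form of version history.

```python
import logging
import shutil
from datetime import datetime
from pathlib import Path


def backup_original(path, backup_dir="backups"):
    """Copy the untouched dataset into backup_dir under a timestamped name."""
    source = Path(path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


def configure_error_log(log_path="errors.log"):
    """Send ERROR-level messages to both a log file and the console."""
    logger = logging.getLogger("hypothesis_system")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.FileHandler(log_path), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```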

# *Use Case*
## Preconditions:
- The dataset has been preloaded into the system by an administrator or via automated ingestion.
- The user has access to the system via a terminal or command-line interface.
- The system has successfully validated and cleaned the dataset.

## Main Flow:
Welcome to the Hypothesis Testing System
Please choose an option:
1. View summary statistics
2. Visualize data
3. Filter or search data
4. Update a data entry
5. Export report
6. Exit
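
A menu like the one above could be driven by a small dispatch loop. The sketch below is an assumption about how that might look: `run_menu` and the `handlers` mapping are hypothetical names, and the `read` and `write` parameters exist only so the loop can be exercised without a real terminal.

```python
MENU = """Welcome to the Hypothesis Testing System
Please choose an option:
1. View summary statistics
2. Visualize data
3. Filter or search data
4. Update a data entry
5. Export report
6. Exit"""


def run_menu(handlers, read=input, write=print):
    """Show the menu repeatedly until the user picks option 6.

    handlers maps the strings "1" to "5" to zero-argument callables.
    """
    while True:
        write(MENU)
        choice = read("> ").strip()
        if choice == "6":
            write("Goodbye.")
            return
        handler = handlers.get(choice)
        if handler is None:
            # Invalid input: explain the problem and show the menu again.
            write(f"'{choice}' is not a valid option. Enter a number from 1 to 6.")
            continue
        handler()
```

Keeping the handlers in a dictionary means a new option can be added without touching the loop itself.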

Option 1: View summary statistics
- System displays mean, median, mode, and standard deviation for selected columns.
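
As a rough sketch of Option 1, the four statistics can be computed with Pandas as shown below. The function name `summary_statistics` is hypothetical. Note that `Series.std()` returns the sample standard deviation, and that when several values tie for the mode this sketch simply reports the smallest.

```python
import pandas as pd


def summary_statistics(df, columns):
    """Return mean, median, mode and sample standard deviation per column."""
    rows = {}
    for col in columns:
        series = df[col].dropna()  # ignore missing values
        modes = series.mode()  # sorted; may hold several tied values
        rows[col] = {
            "mean": series.mean(),
            "median": series.median(),
            "mode": modes.iloc[0] if not modes.empty else None,
            "std": series.std(),  # sample standard deviation (ddof=1)
        }
    # One row per selected column, one column per statistic.
    return pd.DataFrame(rows).T
```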

Option 2: Visualize data
- User chooses chart type (e.g. bar, scatter, line) and variables.
- System renders the chart using Matplotlib and displays it.
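
Option 2 might be implemented along these lines. This is only a sketch: `render_chart` is a hypothetical name, and the `Agg` backend is chosen here as an assumption so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt


def render_chart(df, kind, x, y, title=""):
    """Draw a bar, scatter or line chart of column y against column x."""
    fig, ax = plt.subplots()
    if kind == "bar":
        ax.bar(df[x], df[y])
    elif kind == "scatter":
        ax.scatter(df[x], df[y])
    elif kind == "line":
        ax.plot(df[x], df[y])
    else:
        plt.close(fig)  # do not leave an empty figure behind
        raise ValueError(
            f"Unsupported chart type '{kind}'. Choose bar, scatter or line."
        )
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(title)
    return fig
```

In an interactive session, the `matplotlib.use("Agg")` line would be dropped and `plt.show()` called after `render_chart` to display the figure; with `Agg`, the figure can instead be written to disk with `fig.savefig(...)`.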

Option 3: Filter or search data
- User inputs filter criteria (e.g. age > 30, region = 'NSW')
- System displays matching records in tabular format.
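
The filter step could be sketched as below, assuming criteria have already been parsed into `(column, operator, value)` tuples. The function name `filter_records`, the `OPERATORS` table, and the tuple format are assumptions made for illustration; parsing the user's typed text is left out.

```python
import operator

import pandas as pd

# Comparison symbols the user may type, mapped to Python operators.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def filter_records(df, conditions):
    """Keep rows that satisfy every (column, operator, value) condition."""
    mask = pd.Series(True, index=df.index)
    for column, op, value in conditions:
        if column not in df.columns:
            raise KeyError(f"Unknown column '{column}'")
        if op not in OPERATORS:
            raise ValueError(f"Unsupported operator '{op}'")
        # Combine conditions with logical AND.
        mask &= OPERATORS[op](df[column], value)
    return df[mask]
```

For the example criteria above, the call would be `filter_records(df, [("age", ">", 30), ("region", "=", "NSW")])`.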

Option 4: Update a data entry
- User specifies row ID and column to update.
- System confirms the change and saves it.
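
A minimal sketch of the update step, assuming the row ID is the DataFrame index. `update_entry` and the logger name are hypothetical, and writing the changed data back to disk is deliberately left out.

```python
import logging

logger = logging.getLogger("updates")  # hypothetical logger name


def update_entry(df, row_id, column, new_value):
    """Change one cell in place, log the change, and return the old value."""
    if row_id not in df.index:
        raise KeyError(f"Row ID {row_id!r} not found")
    if column not in df.columns:
        raise KeyError(f"Column '{column}' not found")
    old_value = df.at[row_id, column]
    df.at[row_id, column] = new_value
    # Record both values so the change can be reviewed or reversed later.
    logger.info("Row %r, column %r: %r -> %r", row_id, column, old_value, new_value)
    return old_value
```

Returning the old value lets the caller show a confirmation such as "changed X to Y" before saving.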

Option 5: Export report
- System generates a .csv file with current data and saves it locally.

## Postconditions:
- User has successfully viewed, analyzed, or modified the dataset.
- Any valid updates are saved to the system and logged.
- The dataset remains accessible for future queries, visualizations, or exports.

# *Research your chosen issue*
- https://australianpropertyupdate.com.au/apu/grim-housing-shortfall-numbers-revealed
- https://www.realestate.com.au/insights/new-housing-supply-falls-short-by-62000-homes-in-2024/
- https://www.ahuri.edu.au/sites/default/files/documents/2021-10/AHURI-Final-Report-365-Population-growth-and-mobility-in-Australia.pdf
- https://population.org.au/wp-content/uploads/2023/04/housing-crisis-and-population-briefing-note-final1.pdf
- https://ipa.org.au/publications-ipa/media-releases/as-migration-surges-to-record-levels-housing-construction-collapses

# *Discuss the findings*
Housing and construction data in Australia is comprehensively tracked through sources like the ABS Building Activity report, CSIRO’s Australian Housing Data portal, and the AIHW Housing Data Dashboard. These platforms collectively monitor trends in dwelling commencements, sustainability performance, and housing affordability. For instance, the ABS shows fluctuations in construction activity, while CSIRO tracks energy ratings of new homes, and AIHW presents data on rental stress and ownership. Imagine these tools as different lenses on the same landscape: one shows how fast homes are being built, another how green they are, and the third how accessible they are to Australians. Together, they offer a multidimensional view of the housing sector.

# *Acquire your data*
- https://www.abs.gov.au/statistics/industry/building-and-construction/building-activity-australia/latest-release
- https://ahd.csiro.au/
- https://www.housingdata.gov.au/

# *Planning*
## Data Dictionary – Total Dwellings Commenced

| Field Name             | Description                                                                                               | Data Type | Format / Example | Notes |
|------------------------|-----------------------------------------------------------------------------------------------------------|-----------|------------------|-------|
| **Period**             | Calendar quarter in which dwelling construction commenced, expressed as a year–quarter label             | String    | `Mar-17`         | Months are `Mar`, `Jun`, `Sep`, `Dec` followed by two‑digit year. Data runs from Mar‑2017 to Mar‑2025. |
| **Trend**              | Estimated number of dwellings commenced in that quarter, adjusted for long‑term trend patterns           | Integer   | `54,463`         | Figures are counts of dwellings. Useful for spotting structural shifts without seasonal variation. |
| **Seasonally adjusted**| Estimated number of dwellings commenced in that quarter, adjusted to remove regular seasonal fluctuations | Integer   | `52,066`         | Makes it easier to compare across different quarters within a year. |
| **Source** *(metadata)*| Origin of the dataset                                                                                     | String    | `Australian Bureau of Statistics, Building Activity, Australia March 2025` | Dataset‑level metadata, not a per‑row field. |
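
The formats in the data dictionary above need a little care when loading: `Period` is text like `Mar-17`, and the counts contain thousands separators. A possible way to convert them with Pandas is sketched below. It assumes the data is stored as a CSV whose headers match the field names in the table; `parse_commencements` is a hypothetical name.

```python
import pandas as pd


def parse_commencements(csv_source):
    """Read the commencements table and convert text fields to typed columns."""
    # thousands="," turns values such as "54,463" into the integer 54463.
    df = pd.read_csv(csv_source, thousands=",")
    # "%b-%y" matches a month abbreviation and two-digit year, e.g. "Mar-17".
    dates = pd.to_datetime(df["Period"], format="%b-%y")
    # Store each label as a calendar quarter (Mar -> Q1, Jun -> Q2, ...).
    df["Period"] = dates.dt.to_period("Q")
    return df.set_index("Period")
```

Indexing by quarter makes it straightforward to sort the series chronologically, which plain `Mar-17` strings would not do correctly.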


# *Test your analysis*
After testing the algorithms, I can confirm that they produce accurate results.

# *Analyse and conclude*
Over the last few years, the number of new dwellings commenced in Australia has gone up and down a lot, but when you look at the bigger picture, it’s not keeping up with how fast the population is growing. The seasonally adjusted numbers show the short‑term changes, while the trend series shows the longer‑term pattern, and both make it clear there are big peaks and drops in building activity. Even when construction hits a high point, it doesn’t last long enough to meet the constant demand from more people needing homes. This suggests that housing supply isn’t matching population growth, which could lead to problems like higher house prices and more pressure on the rental market. It’s worth looking into things like government policies, interest rates, and population data to see what’s making it hard for construction to keep up. These findings support my hypothesis.

# *Peer Evaluation*
* Oscar's repo

| Plus ✅  | Minus ❌  | Implication 💡  |
|--------------------------|------------------------------|--------------------------------------------------|
| Data is clearly organised with proper headings. | Some data entries are missing or have inconsistent formats. | Missing/inconsistent data could make analysis less accurate, so cleaning the dataset should be a first step before more work. |
| Calculations match expected results when checked manually. | Only 1 chart is presented. | More charts with different values could be shown to help better understand the data. |
| Outputs include both summary tables and visualisations, giving a clear picture. | Could be a bit more advanced with more data to show. | More info could be shown for a better understanding. |

# *Evaluate your project*
The system works well because it can load the housing construction data, clean it up, do the maths to find patterns, and make easy‑to‑read graphs. It matches what I planned in my requirements outline, and the numbers are right because they come from a reliable source, the ABS. Feedback says the data is well organised and the graphs are useful, but the labels could be clearer and the results could be explained more so people understand them better. The project was done in a logical order and runs smoothly, but small improvements could still make it even easier to use.

The data itself is accurate and comes from a trusted source, so there’s not much bias, but it will become outdated if it isn’t updated. It’s also safe to use because there’s no private information in it, but the original file should be kept safe so it’s not changed by mistake. The menu system is simple and works in any terminal, but the overall user experience could be improved by adding colour, clearer charts, and more instructions for people who aren’t used to data analysis. To conclude, it’s a solid project and does what I wanted, but there’s room to make it easier to read and to keep it up to date.