
# Design Brief and Hypothesis

## Design Brief 
### 'How does population density affect public transport usage and car ownership in Australian suburbs?'

To conduct this research, I will use both online sources and real-world examples to understand how density impacts urban planning decisions. This will involve interviewing friends and family who live in specific suburbs in Sydney and on the Central Coast, as well as researching densities, car ownership rates and public transport patronage in suburbs of other Australian cities such as Melbourne, Brisbane and Perth.

## Hypothesis
Higher population density will increase public transport usage and reduce car ownership in Australian suburbs by making more transport options viable for a larger number of people. Transit-oriented development can be used to minimise gridlock by centring urban development around public transport hubs for accessibility.

## Requirements Outline

### Functional Requirements 
- Data loading: The program should be able to open different files and handle errors when a file cannot be opened, so that the system keeps running smoothly instead of crashing.

- Data cleaning: The system will filter the information so that it is easier to understand.
For example, when displaying train patronage charts for NSW train stations, the list will be organised so that stations with limited data, very low patronage or clear outliers can be filtered out.

- Data analysis: To summarise the data clearly, the mode, median and mean (average) will be used throughout the program.
For example, in a document recording everyone's sentiment towards public transport, the mean score could show the general sentiment in certain areas, the median could show the middle response, and the mode could show the most common response. The user should be able to interact with the function to go into further detail on the topic.

- Data visualisation: To visualise data, form graphs or create interactive environments for the user, I will use matplotlib chart types because of their ease of use and general aesthetic qualities, which make them preferable to pandas and GUIs.

- Data reporting: The program will read information from a range of file types such as .txt, .md and .csv files. Most output will be shown within the interface using matplotlib charts. The program can also be linked to other files, such as the CSV and text files.
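
As a minimal sketch of the data analysis requirement, the example below computes the mean, median and mode of a small set of survey scores with Python's built-in `statistics` module. The scores and the names `sentiment_scores` and `summarise` are made up for illustration and are not taken from the project's data.

```python
import statistics

# Hypothetical sentiment scores (1 = very negative, 5 = very positive).
sentiment_scores = [4, 5, 3, 4, 2, 4, 5]


def summarise(scores):
    """Return the mean, median and mode of a list of numeric scores."""
    return {
        "mean": statistics.mean(scores),      # average score
        "median": statistics.median(scores),  # middle score once sorted
        "mode": statistics.mode(scores),      # most common score
    }


summary = summarise(sentiment_scores)
print(summary)
```

Keeping the three measures in one dictionary would let a menu option print whichever one the user asks for.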

### Non-Functional Requirements
- Usability: The program should have a user-friendly interface that is easy to use while still being interactive. It should also be able to handle input errors from the user.

- Reliability: The program should be as reliable as possible, fetching the right information from the right files, organising the information correctly for the user to see
and even filtering out possible outliers within the research side of the project.
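
One possible way to meet the error handling and reliability requirements is to wrap file loading in a function that reports a problem instead of crashing. This is only a sketch: the function name `load_csv_rows` and the file name used below are hypothetical, and it uses the standard `csv` module rather than the project's final loading code.

```python
import csv


def load_csv_rows(path):
    """Return the rows of a CSV file as a list of dicts, or None if it cannot be read."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"No permission to open: {path}")
    except UnicodeDecodeError:
        print(f"File is not valid UTF-8 text: {path}")
    return None


# Hypothetical file name that does not exist, to show the error path.
rows = load_csv_rows("file_that_does_not_exist.csv")
if rows is None:
    print("Returning to the menu instead of crashing.")
```

Returning `None` lets the calling menu decide what to do next, which keeps the program flow intact when a file is missing.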

### Use case:
- Actor: User

- Goal: A user looking for information about the topic should be able to interact with the UI and manipulate the datasets to their liking.

#### Preconditions:
- The dataset has already been loaded by the administrator and has been fully rendered. Furthermore, the user is able to run the program with all files fully active.
- The user has access to the system interface and is able to run the program with no issues regarding the programming.

#### Main flow:
1. User opens interface
2. User is given information about thesis question
3. User is able to continue, and view data
   - Each dataset should be able to be manipulated to the user's liking
4. After data is shown, a conclusion is made
5. Thesis question is answered

#### Post conditions:
- User has interacted with the program adequately and has viewed the datasets to their full extent
- User understands all information and research that has been displayed
- Data remains available in case of extra queries or use case scenarios


# Phase 2: Research and planning

## Research:
- https://data.nsw.gov.au/data/dataset/?tags=train (Train patronage information NSW)
- https://www.vic.gov.au/transport-patronage (Train patronage information VIC)
- https://profile.id.com.au/sydney/car-ownership (Car ownership rates in Sydney)
- https://www.sciencedirect.com/science/article/pii/S0739885920301554 (Connection of public transport to car ownership)

## SEE-I paragraph:

- Train patronage directly impacts car use in Australian suburbs, depending on how efficient and reliable the public transport is.
This means that Australian suburbs that are well served by buses, trains, light rail and other forms of transit generally have lower car ownership rates.
This can be seen, for example, in the two suburbs Chatswood and Edmondson Park. These two Sydney suburbs are polar opposites in terms of price and public transport options.
Despite Chatswood being a far more premium option, its car ownership rate is much lower at 74.8. This is due to its train station's high patronage and well-served lines, with the station recording approximately 15 million entries in 2024-2025. In contrast, despite Edmondson Park being a cheaper option, its car ownership rate is 91.1, mostly due to the lack of public transport options apart from some infrequent bus services. To illustrate, people who would usually drive can switch to public transport with little inconvenience when given a viable alternative, as long as it is efficient, cheap and frequent enough to appeal to the masses. This shows the importance of good public transit and its ability to reduce car ownership rates in any Australian suburb.



## Data dictionaries:
### 'NSW_Train_patronage_per_station.csv'
|Field|Datatype|Format for Display|Description|Example|Validation|
|---|---|---|---|---|---|
|_id|Integer|NNNN|Identification number|27|Must be a number|
|station|String|X|Station name|Barangaroo Station|Must be a string ending with "Station"|
|Entry|Integer|NNNNNNN|Number of entries to the station|Town Hall Station: 1768424|Must be a number|
|Exit|Integer|NNNNNNN|Number of exits from the station|Chatswood Station: 1529846|Must be a number|
|Total|Integer|NNNNNNN|Total station patronage|Circular Quay: 1327691|Must equal Entry + Exit|
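
The validation column of the data dictionary could be checked in code along the following lines. This is a sketch under assumptions: the function name `validate_row` and both sample records are invented for illustration, and the real dataset may need looser rules (for example, for station names that do not end in "Station").

```python
def validate_row(row):
    """Return a list of problems found in one station record (empty if valid)."""
    problems = []
    if not isinstance(row.get("_id"), int):
        problems.append("_id must be a number")

    station = row.get("station")
    if not (isinstance(station, str) and station.endswith("Station")):
        problems.append("station must be a string ending with 'Station'")

    numbers_ok = True
    for field in ("Entry", "Exit", "Total"):
        if not isinstance(row.get(field), int):
            problems.append(f"{field} must be a number")
            numbers_ok = False

    # Only compare the totals when all three values are numbers.
    if numbers_ok and row["Total"] != row["Entry"] + row["Exit"]:
        problems.append("Total must equal Entry + Exit")
    return problems


# Made-up records for illustration only.
good = {"_id": 27, "station": "Barangaroo Station", "Entry": 1200, "Exit": 1100, "Total": 2300}
bad = {"_id": 28, "station": "Barangaroo", "Entry": 1200, "Exit": 1100, "Total": 2000}

print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems, rather than stopping at the first one, means the user can be shown everything that is wrong with a row at once.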

In [None]:


def thesis_question():
    print('\n === Main Thesis question: ===')
    print('"How does population density affect public transport usage and car ownership in Australian suburbs?"')
    print('\n This thesis statement shows...')
    while True:
        choice = input('press 1 to move onto dataset list and press 2 to exit: ')
        # input() always returns a string, so compare against '1' and '2', not 1 and 2
        if choice == '1':
            print('going to dataset list...')
            break
        elif choice == '2':
            # Leaving the loop returns control to whichever menu called this function
            break
        else:
            print('error, press either 1 or 2')


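def Title_Screen():
    while True:
        print('\n === Main Menu === ')
        print('1. View thesis question')
        print('2. View dataset list')
        print('3. Exit')

        choice = input('Choose 1, 2 or 3 to pick the next destination: ')
        if choice == '1':
            thesis_question()
        elif choice == '2':
            # Placeholder until the dataset list is connected to this menu
            print('going to dataset list...')
        elif choice == '3':
            # Leaving the loop ends the menu
            print('exiting...')
            break
        else:
            print('error, press 1, 2 or 3')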

    









This was my initial bit of code. It controls the UI and displays the main menu. At first I had minor issues with the interface, as I could not find a way to implement the exit option. However, I was eventually able to make it work.

My main issue afterwards was manipulating dataframes using pandas. Due to my lack of experience with pandas, I had many issues with this, although I eventually got it to work after researching how to strip columns, remove rows with missing information and allow users to add information to datasets.

In [4]:
import pandas as pd
NSW_train_patronage = pd.read_csv('NSW_Train_patronage_per_station.csv')
VIC_train_patronage = pd.read_csv('Victoria_Train_patronage_per_station.csv')

Modified_NSW_train_patronage = NSW_train_patronage.str.strip('_id')

def NSW_patronage():
    options = input('choose 1 if you would like to look at NSW train patronage. Press anything else to exit')
    if options == '1':
        print(Modified_NSW_train_patronage)
    else:
        print('Going back...')

NSW_patronage()


AttributeError: 'DataFrame' object has no attribute 'str'

Here is my first attempt at manipulating the NSW train patronage data. Unfortunately, as of documenting, I have not finished the code that manipulates the information.
To improve this code, I need to remove the ID number of each station, get rid of repeated stations and pick just one day of each month for each station, which will simplify the dataset for a better viewing experience.
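
The `AttributeError` arises because the `.str` accessor in pandas only exists on a single text column (a Series), not on a whole DataFrame, and `str.strip` trims characters from text rather than removing a column. A minimal sketch of the planned clean-up is shown below. It uses a tiny made-up table, and the column names (`_id`, `Station`, `MonthYear`, `Entry_Exit`, `Trip`) are assumptions that may not match the real CSV exactly.

```python
import pandas as pd

# Tiny made-up sample; the real CSV's column names and values may differ.
raw = pd.DataFrame({
    "_id": [1, 2, 3, 4, 5],
    "Station": ["Chatswood Station", "Chatswood Station", "Chatswood Station",
                "Town Hall Station", "Town Hall Station"],
    "MonthYear": ["Dec-24", "Dec-24", "Dec-24", "Dec-24", "Jan-25"],
    "Entry_Exit": ["Entry", "Entry", "Entry", "Entry", "Entry"],
    "Trip": [100, 100, 120, 250, 260],
})

cleaned = (
    raw.drop(columns=["_id"])   # remove the ID column entirely
       .drop_duplicates()       # remove rows that repeat exactly
       .reset_index(drop=True)
)

# Keep only the first record for each station, month and direction.
one_per_month = cleaned.drop_duplicates(
    subset=["Station", "MonthYear", "Entry_Exit"], keep="first"
)

print(one_per_month)
```

`drop(columns=...)` returns a new DataFrame, so the original `raw` table stays unchanged and can be reused for other views.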

In [None]:
def dataset_home():
    print('\n === This is the dataset homepage: ===')
    print('\n Here, you can use and manipulate the listed datasets')
    print('\n1. train patronage in NSW suburbs and their densities')
    print('2. train patronage in VIC suburbs and their densities')
    print('3. Car ownership in the city of sydney')
    while True:
        data_choice = input('Choose between 1, 2 and 3 to choose dataset: ')
        if data_choice == '1':
            print('You are viewing NSW train patronage')
            break
        elif data_choice == '2':
            print('You are viewing VIC patronage')
            break
        elif data_choice == '3':
            print('You are now looking at city of sydney car ownership')
            break
        else:
            print('error, press either one, two or three')

This was my dataset home, which is still a work in progress. However, I now understand how to use a text-based UI for greater viewer comfort. Furthermore, I know how to create loops and workarounds that handle input errors from the viewer.
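
The repeated "ask until the answer is valid" loop could be pulled out into one helper so that every menu handles mistakes the same way. The sketch below is one possible approach, not the project's final code. The name `ask_choice` and its `input_fn` parameter are hypothetical, and `input_fn` exists only so the example can run with scripted answers instead of a real user.

```python
def ask_choice(prompt, valid_choices, input_fn=input):
    """Keep asking until the user types one of valid_choices, then return it."""
    while True:
        choice = input_fn(prompt).strip()
        if choice in valid_choices:
            return choice
        print(f"Error: please enter one of {', '.join(valid_choices)}")


# Scripted answers standing in for a real user: one invalid, then one valid.
scripted = iter(["7", " 2 "])
picked = ask_choice(
    "Choose a dataset (1-3): ",
    ["1", "2", "3"],
    input_fn=lambda _prompt: next(scripted),
)
print("You picked", picked)
```

In the real program the third argument would be left out, so the helper falls back to the built-in `input`.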

In [None]:
#!/usr/bin/env python3
"""
NSW Train Patronage CLI
-----------------------
Interactive text-based UI to process and explore NSW Train patronage data.

Features:
- Clean and process data (Trip → numeric, pivot Entry/Exit, sum totals).
- Filter by month (default: Dec-24).
- Drop stations under a chosen patronage threshold (default: 200).
- Sort stations (busiest → quietest by default, toggleable).
- Options to view top/bottom stations, outliers, busiest station, etc.
- Add new rows of data via prompts (column by column).
- Save the processed dataset to CSV.
"""

import pandas as pd
import os
import sys

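# File paths (adjust if needed). __file__ is not defined inside a notebook cell,
# so fall back to the current working directory in that case.
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()
CSV_PATH = os.path.join(BASE_DIR, "NSW_Train_patronage_per_station.csv")
PROCESSED_OUT = os.path.join(BASE_DIR, "processed_patronage_dec24.csv")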


def clean_trip_value(x):
    """Convert Trip string values to numeric. Treat 'Less than 50' as 25."""
    s = str(x).strip().replace(",", "")
    if s.lower().startswith("less than"):
        # e.g. 'Less than 50' → midpoint = 25
        for p in s.split()[::-1]:
            if p.isdigit():
                return int(p) // 2
        return 25
    try:
        return int(float(s))
    except (ValueError, OverflowError):
        # Non-numeric or missing values (e.g. 'nan') are counted as 0
        return 0


def process_patronage(df, month="Dec-24", min_total=200, ascending=False):
    """
    Process the raw dataframe:
      - keep only rows for the requested month
      - convert Trip to numeric
      - pivot Entry/Exit into columns then sum totals per Station
      - drop stations with Total < min_total
      - return dataframe sorted by Total
    """
    d = df.copy()
    month_col = "MonthYear" if "MonthYear" in d.columns else "Month"
    d = d[d[month_col] == month].copy()

    if d.empty:
        raise ValueError(f"No rows found for month '{month}'.")

    d["Trip_num"] = d["Trip"].apply(clean_trip_value).astype("Int64")

    if "Entry_Exit" in d.columns:
        pivot = d.pivot_table(
            index="Station",
            columns="Entry_Exit",
            values="Trip_num",
            aggfunc="sum",
            fill_value=0,
        )
        for c in ["Entry", "Exit"]:
            if c not in pivot.columns:
                pivot[c] = 0
        pivot = pivot.reset_index().rename_axis(None, axis=1)
        pivot["Total"] = pivot["Entry"] + pivot["Exit"]
    else:
        pivot = d.groupby("Station", as_index=False).agg(Total=("Trip_num", "sum"))
        pivot["Entry"] = pd.NA
        pivot["Exit"] = pd.NA

    pivot = pivot[pivot["Total"] >= min_total].copy()
    pivot.sort_values("Total", ascending=ascending, inplace=True)
    pivot.reset_index(drop=True, inplace=True)
    return pivot


def list_outliers(df):
    """Find outliers using mean ± 2*std."""
    mean = df["Total"].mean()
    std = df["Total"].std()
    high = df[df["Total"] > mean + 2 * std]
    low = df[df["Total"] < max(0, mean - 2 * std)]
    return low, high


def main():
    if not os.path.exists(CSV_PATH):
        print("CSV file not found at", CSV_PATH)
        sys.exit(1)

    df = pd.read_csv(CSV_PATH)

    sort_desc = True
    month = "Dec-24"
    min_total = 200
    processed = process_patronage(df, month=month, min_total=min_total, ascending=not sort_desc)

    while True:
        print("\n=== NSW Patronage CLI ===")
        print("1) Show top 10 stations")
        print("2) Show bottom 10 stations")
        print(f"3) Toggle sort order (currently {'descending' if sort_desc else 'ascending'})")
        print("4) Add a row of data (Entry or Exit record)")
        print(f"5) Save processed CSV to {PROCESSED_OUT}")
        print("0) Exit")

        choice = input("Choose an option: ").strip()

        if choice == "1":
            print(processed.head(10).to_string(index=False))
        elif choice == "2":
            print(processed.tail(10).to_string(index=False))
        elif choice == "3":
            sort_desc = not sort_desc
            processed = process_patronage(df, month=month, min_total=min_total, ascending=not sort_desc)
            print("Sort order now", "descending" if sort_desc else "ascending")
       
        elif choice == "4":
            # Add one new row interactively
            station = input("Station name: ").strip()
            month_in = input("MonthYear (e.g., Dec-24): ").strip() or month
            ee = input("Entry or Exit (Entry/Exit): ").strip().title()
            trip = input("Trip value (number or 'Less than 50'): ").strip()

            new = {
                "MonthYear": month_in,
                "Station": station,
                "Entry_Exit": ee,
                "Trip": trip,
            }
            # Fill missing original columns if needed
            for c in df.columns:
                if c not in new:
                    new[c] = None
            df = pd.concat([df, pd.DataFrame([new])], ignore_index=True)
            processed = process_patronage(df, month=month, min_total=min_total, ascending=not sort_desc)
            print("Row added. Station totals updated.")
       
        elif choice == "5":
            processed.to_csv(PROCESSED_OUT, index=False)
            print("Saved processed CSV to", PROCESSED_OUT)
        elif choice == "0":
            print("going back to home...")
            break
        else:
            print("Invalid option. Try again.")


if __name__ == "__main__":
    main()


With the help of my parents and AI, I was able to create a text-based UI that could manipulate the data to a useful point. This is not the final product; however, it really improved my capabilities as a programmer, as I was able to use AI to understand how the programming worked. For example, I learned that the spreadsheet could be sorted in ascending or descending order, which shows a use case for such an interface.
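
The script defines `list_outliers`, but none of the menu options calls it yet. As a rough sketch of the rule it applies (mean ± 2 standard deviations), the example below runs the same idea on made-up station totals using only the standard library. The function name `find_outliers` and the figures are hypothetical and are not results from the real dataset.

```python
import statistics


def find_outliers(totals):
    """Split station totals into low and high outliers using mean ± 2 standard deviations."""
    values = list(totals.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation, like pandas' default
    low_cut = max(0, mean - 2 * std)
    high_cut = mean + 2 * std
    low = {name: t for name, t in totals.items() if t < low_cut}
    high = {name: t for name, t in totals.items() if t > high_cut}
    return low, high


# Made-up monthly totals for illustration only.
totals = {"A": 100, "B": 110, "C": 90, "D": 105, "E": 95, "F": 100, "G": 1000}
low, high = find_outliers(totals)
print("Low outliers:", low)
print("High outliers:", high)
```

With these numbers only station "G" sits above the upper cut-off, and no station falls below the lower one because that cut-off is clamped at zero.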