## Thesis Question: "Suburbs with higher density yield better public transport usage"

This is my thesis question, and it outlines my hypothesis: Australian suburbs such as Hornsby, Woy Woy, Sydney CBD and Epping achieve better public transport patronage at their train and bus stations when the suburb has a higher population density.

## Requirements Outline:
### Functional requirements: 
- Data Loading: My program will choose how to load a file based on its extension (e.g. txt, csv, json) and will handle file errors, such as a missing file or an unsupported extension. This ensures that the program only loads data in the formats it is programmed for.

- Data Cleaning: I will clean my datasets using pandas. To count the missing values in each column, I can use 'df.isnull().sum()'. 'df.dropna()' removes rows with missing values by default, and 'df.dropna(axis=1)' removes columns instead, which will allow me to cut unnecessary columns with too many missing values. I can also identify duplicate rows with 'df.duplicated()' and remove them with 'df.drop_duplicates()'.

- Data Analysis: My analysis will use the mean patronage of suburbs with train stations in Australia. This can be done by totalling all monthly reports for train patronage and dividing by the number of months to find the average, i.e. '(Column1 + Column2 + ... + ColumnX) / X'. The averages can then be linked to average suburb densities. I will not be using median or mode in my data analysis, but may use max and min functions.

- Data Visualisation: I plan on using matplotlib to plot the comparison between density and train patronage in Australian suburbs. This will allow me to create a visual representation of the patterns and come to an eventual conclusion on my thesis statement. Functions I might use in matplotlib include 'matplotlib.pyplot.plot()', which will allow me to draw the comparison lines. Furthermore, by plotting the data as x and y coordinates, I can draw a general curve that outlines how patronage relates to density.

- Data Reporting: The system's output will be shown through the matplotlib window for graphs and through the text-based terminal in VS Code. My datasets will use the txt and csv extensions, mostly for the data behind my density and train patronage charts.
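
A rough sketch of how the loading, cleaning and analysis requirements above could fit together in pandas. The function names, the `max_missing` cut-off and the treatment of `.txt` files as comma-separated are assumptions made for illustration, not decisions already fixed by this project.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a dataset, choosing the pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in (".csv", ".txt"):
        # Assumption: .txt files are comma-separated, like the .csv files.
        return pd.read_csv(path)
    elif suffix == ".json":
        return pd.read_json(path)
    else:
        raise ValueError(f"Unsupported file type: {suffix!r}")


def clean_dataset(df, max_missing=0.5):
    """Drop mostly-empty columns, then incomplete rows, then duplicate rows."""
    # Share of missing values in each column (0.0 to 1.0).
    missing_share = df.isnull().sum() / len(df)
    too_empty = missing_share[missing_share > max_missing].index
    df = df.drop(columns=too_empty)
    df = df.dropna()  # rows that still contain a missing value
    return df.drop_duplicates().reset_index(drop=True)


def mean_patronage(df, month_columns):
    """Average the monthly columns per row: (col1 + ... + colX) / X."""
    return df[month_columns].mean(axis=1)
```

A call such as `clean_dataset(load_dataset("patronage.csv"))` would then return a table ready for `mean_patronage`; the file name here is a placeholder.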

### Non Functional Requirements:
- Usability: The README document requires the project title, description, table of contents, technologies used, and more. It gives the user a basic understanding of the purpose of the repository, what their role in the system is, and how to interpret the repository. Furthermore, on the programming side, it is vital to create an easy-to-use UI that is clear and consistent.

- Reliability: To handle errors, the program will contain 'else:' branches so that if the user enters something invalid, they are notified of the error or redirected to start again. By using if-else statements, a simple system can be created to account for all possible inputs, minimising errors and inconsistency in the UI.
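
As a small illustration of the if-else approach described under Reliability, the sketch below checks one piece of user input and returns either the chosen view or an error message. The function name `check_choice` and the exact messages are hypothetical.

```python
VALID_CHOICES = {"a": "table", "b": "graph", "c": "text"}


def check_choice(raw):
    """Return (view, error); exactly one of the two is None."""
    choice = raw.strip().lower()
    if choice in VALID_CHOICES:
        return VALID_CHOICES[choice], None
    elif choice == "":
        return None, "Nothing was entered. Please type a, b or c."
    else:
        return None, f"'{raw.strip()}' is not an option. Please type a, b or c."
```

Returning the error instead of printing it keeps the check separate from the UI, so the same function can be reused by a text interface or a graphical one.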


## Use Case:

Actor: User

Goal: To access, interact with and manipulate the given data, as well as to gain an understanding of the thesis statement and form an opinion about it.

Preconditions:
- Datasets have been loaded into the program, and any required installations of extensions have been completed
- The user has access to the system interface

Main Flow:
1. User opens the program
2. UI opens with a home screen in which the thesis statement and hypothesis are outlined
3. A screen is given with options for which dataset to view and which medium to view it in:
    a) a table
    b) a graph
    c) text based
4. System displays chosen form
5. System gives option to exit or to continue
6. If continuing, system provides option menu again
    a) a table
    b) a graph
    c) text based
7. If exiting: end program
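
The main flow above could be sketched as a simple loop. This is only an outline under assumptions: `run_menu` is a hypothetical name, the display step is a placeholder message rather than a real table or graph, and the input and output functions are passed in so the loop can be tried without a keyboard.

```python
def run_menu(read_input=input, write=print):
    """Loop through the main flow until the user chooses to exit."""
    views = {"a": "table", "b": "graph", "c": "text based"}
    shown = []  # record of the views displayed, in order
    write("Thesis: suburbs with higher density yield better public transport usage")
    while True:
        choice = read_input("View as a) table, b) graph or c) text based: ")
        choice = choice.strip().lower()
        if choice not in views:
            write("Please enter a, b or c.")
            continue  # back to the option menu
        # Placeholder for the real display code (step 4).
        write(f"Displaying the data as: {views[choice]}")
        shown.append(views[choice])
        again = read_input("Continue? (y/n): ").strip().lower()
        if again != "y":
            write("Exiting.")
            return shown
```

Calling `run_menu()` with no arguments uses the normal terminal prompt.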

Postconditions:
- User has interacted with the system
- User has formed an opinion on the accuracy of the thesis statement
- User has gained information from the datasets
- Valid updates have been saved to the system
- Data remains available and updated for additional analysis (if required)


## Phase 2: Research

### Research:
- https://data.nsw.gov.au/data/dataset/?tags=train (Train patronage information NSW): This CSV spreadsheet shows the train patronage in NSW suburbs, including how some stations are busier than others. It lists the monthly train patronage (entries and exits) as well as a train_ID column, which may have to be dropped altogether.

- https://www.vic.gov.au/transport-patronage (Train patronage information VIC): This is similar to the NSW train patronage data, although more work will be needed to drop columns, remove missing values and load the file in the UI as a '.csv'.

- https://en.wikipedia.org/wiki/Urbanization_in_Australia (Wikipedia urbanisation in Australia): This article covers the rapid urbanisation of Australia and lists the densest suburbs in Australia using the metric 'people/sq km'. By creating a csv myself from this information, I can plot the values in a matplotlib graph or chart and compare them with my train patronage data. (I will have to use functions to add only suburbs with train stations to the list, as many of the suburbs have no clear transit stops.)
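
One possible way to keep only suburbs that have a train station when building the density csv is sketched below. It assumes a station is named '<Suburb> Station', as in the NSW patronage data; the helper names are hypothetical, and 'Exampleville' with its density is a made-up entry used only to show a suburb being filtered out.

```python
import csv
import io


def suburbs_with_stations(density_rows, station_names):
    """Keep only density rows whose suburb has a matching '<Suburb> Station'."""
    known = {name.strip().lower() for name in station_names}
    kept = []
    for row in density_rows:
        station = f"{row['Suburb'].strip()} Station"
        if station.lower() in known:
            kept.append({**row, "station": station})
    return kept


def write_density_csv(rows, handle):
    """Write the filtered rows as CSV with Suburb, station and Density columns."""
    writer = csv.DictWriter(
        handle, fieldnames=["Suburb", "station", "Density"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage with one real suburb from this document and one made-up suburb.
rows = [
    {"Suburb": "Chatswood", "Density": "14800"},
    {"Suburb": "Exampleville", "Density": "1200"},  # hypothetical, no station
]
kept = suburbs_with_stations(rows, ["Chatswood Station", "Town Hall Station"])
buffer = io.StringIO()
write_density_csv(kept, buffer)
print(buffer.getvalue())
```

Suburbs whose station name does not follow the '<Suburb> Station' pattern would need a manual lookup instead.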

### Chosen Issue:
- Suburbs in Australia are sprawling and are not very dense on average. In fact, in a recent ABS population survey, Sydney was shown as having a population density of under 400 people/sq km. The recommended population density for a sustainable urban area, around 10 000 people/sq km, is at least 25 times higher than Sydney's. Furthermore, according to the ABS, Sydney covers over 12 000 sq km, making it one of the most sprawling cities in the world.
- The large majority of Sydneysiders are now forced to live out west, several kilometres from the CBD where the vast majority of jobs are located. As a result, public transport and general transport infrastructure is getting harder and harder to upgrade as more of the city moves away from the urban core.
- My program seeks to show, using graphs and charts, how this shift in population is resulting in lower population densities and therefore in lower transit patronage and ridership. I have researched external factors and have tried to reach as accurate a conclusion as possible by controlling for factors such as businesses, TOD (transit-oriented development) and interchanges.


#### Secondary Sources:
- https://www.abs.gov.au/statistics/people/population/regional-population/latest-release
- https://thepropertytribune.com.au/market-insights/how-population-density-is-reshaping-australian-cities/
- https://www.thenewdaily.com.au/finance/property/2018/10/04/apartment-boom-high-density-suburbs


## SEE - I paragraph:
- Density directly impacts transit use in Australian suburbs, assuming similar general population, car ownership rates and interchanges.
This means that denser Australian suburbs generally achieve higher transit ridership than their more sprawling counterparts, across trains, metro, light rail and buses.
This can be seen, for example, in the two suburbs 'Chatswood' and 'Edmondson Park'. These two Sydney suburbs are polar opposites in terms of price and public transport options. With a density of approximately 2900 people/sq km, Edmondson Park is denser than Sydney's average, but still far below the recommended 10 000 people/sq km for an urban area. With only 1.9 million station entries in 2023, Edmondson Park was only the 65th busiest station on the Sydney network. In comparison, Chatswood has a population density of 14 800 people/sq km, above the recommended figure, and its transit ridership is significantly higher, with almost 15 million entries in the same time period. This suggests that density matters when a suburb is near good public transit, and that it can help reduce car ownership in Australian suburbs and improve the general urban planning of the area.

- Issues for/against
While the comparison between Chatswood and Edmondson Park shows the role of density in predicting transit ridership, it fails to account for transit connectivity. Chatswood is far closer to the urban core and has an extra metro interchange with the M1 line, so it clearly has an advantage over Edmondson Park, which is located 40 kilometres from the CBD in a developing region on Sydney's urban fringe that has not yet reached its full capacity.
- On the other hand, while Chatswood does have an interchange with the metro, it is reasonable to expect that even if Edmondson Park received an interchange, assuming the current urban planning of both areas holds, Chatswood's ridership would still far exceed Edmondson Park's, mostly as a result of its population. In addition, the data used records entries and exits only and does not really account for interchanges, as those are hard to monitor and calculate. This reduces the influence of extra interchanges on the figures and keeps the focus of the data on population density.
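
To make the comparison above concrete, the figures quoted for the two stations can be turned into ratios. This is only a quick arithmetic check using the approximate numbers from the paragraph (Chatswood's "almost 15 million" entries is rounded to 15 million).

```python
# Approximate figures quoted in the paragraph above.
chatswood = {"density": 14_800, "entries": 15_000_000}
edmondson_park = {"density": 2_900, "entries": 1_900_000}

density_ratio = chatswood["density"] / edmondson_park["density"]
entries_ratio = chatswood["entries"] / edmondson_park["entries"]

print(f"Density ratio: {density_ratio:.1f}x")  # Density ratio: 5.1x
print(f"Entries ratio: {entries_ratio:.1f}x")  # Entries ratio: 7.9x
```

On these numbers Chatswood is about five times denser but has about eight times the entries, which is consistent with the point that connectivity contributes on top of density.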



# Data dictionaries:
## 'NSW_Train_patronage_per_station.csv'
|Field|Datatype|Format for Display|Description|Example|Validation|
|---|---|---|---|---|---|
|_id|Integer|NNNN|Row identifier|27|Must be a number|
|station|String|X|Station name|Barangaroo Station|Must be a string ending with "Station"|
|Entry|Integer|NNNNNNN|Number of entries to station|Town Hall Station: 1768424|Must be a number|
|Exit|Integer|NNNNNNN|Number of station exits|Chatswood Station: 1529846|Must be a number|
|Total|Integer|NNNNNNN|Total station patronage (entries plus exits)|Circular Quay: 1327691|Must equal Entry + Exit|
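
The validation column above could be checked in code along these lines. This is a sketch under assumptions: the function name is hypothetical, each row is assumed to arrive as a dictionary of strings (as `csv.DictReader` would give), and the numbers in the usage lines are made up for the example.

```python
def validate_patronage_row(row):
    """Return a list of problems found in one row (empty list means valid)."""
    problems = []
    for field in ("_id", "Entry", "Exit", "Total"):
        if not str(row.get(field, "")).strip().isdecimal():
            problems.append(f"{field} must be a whole number")
    if not str(row.get("station", "")).strip().lower().endswith("station"):
        problems.append("station must end with 'Station'")
    # Only compare the totals once every numeric field is known to be valid.
    if not problems and int(row["Entry"]) + int(row["Exit"]) != int(row["Total"]):
        problems.append("Total must equal Entry + Exit")
    return problems


# Made-up values, used only to show a passing and a failing row.
good = {"_id": "27", "station": "Barangaroo Station",
        "Entry": "10", "Exit": "5", "Total": "15"}
bad = {"_id": "28", "station": "Circular Quay",
       "Entry": "abc", "Exit": "5", "Total": "15"}
print(validate_patronage_row(good))
print(validate_patronage_row(bad))
```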

## Suburban population densities
|Field|Datatype|Format for Display|Description|Example|Validation|
|---|---|---|---|---|---|
|Suburb|String|X|Suburb name|Hornsby|Must be a string and an Australian suburb|
|station|String|X|Station name|Hornsby Station|Must be a string ending with "Station"|
|Density|Integer|NNNNNNN|Population density (people per square kilometre)|14 800|Must be a number|
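
Because both tables share the 'station' field, they could be joined and plotted roughly as follows. This is a sketch, not the final design: the function names are hypothetical, a scatter plot is used here instead of 'matplotlib.pyplot.plot()' since each suburb is a single point, and the figure is saved to a file rather than shown in a window.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render to a file rather than a window
import matplotlib.pyplot as plt
import pandas as pd


def join_on_station(patronage, density):
    """Inner-join the two tables on the shared 'station' column."""
    return patronage.merge(density, on="station", how="inner")


def plot_density_vs_patronage(joined, out_path):
    """Scatter density (x) against total patronage (y) and save the figure."""
    fig, ax = plt.subplots()
    ax.scatter(joined["Density"], joined["Total"])
    for _, row in joined.iterrows():
        # Label each point with its suburb name.
        ax.annotate(row["Suburb"], (row["Density"], row["Total"]))
    ax.set_xlabel("Population density (people per sq km)")
    ax.set_ylabel("Total station patronage")
    ax.set_title("Density vs train patronage")
    fig.savefig(out_path)
    plt.close(fig)
```

An inner join drops any station that appears in only one of the two tables, so suburbs without patronage data are left off the graph.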
