# **Year 9 CT Assessment Task 3** 

---

## **Introduction:**
In this project, you'll navigate the complete data analysis process, from defining the problem to presenting your
findings. This project offers invaluable practical experience with real-world data and various data analysis tools. By
the end, you'll have developed a robust skill set in data analysis and advanced Python. 

**Design Brief:** Identify a complex societal issue (e.g. housing, natural disasters) or a need within the school
community. Form a hypothesis related to the issue. Design a system that processes and visualises data to test
your hypothesis.

---


## **Phase 1: Identifying and Defining:**

*In the starting phase, we need to define our purpose (the wicked problem) so that research can begin. We also need to list the Functional and Non-Functional Requirements and include a three-tiered mind map of the data that can be collected.*

**Question:** Do suburbs with lower household incomes have higher crime rates?

**Hypothesis:** I believe that suburbs with higher crime rates generally have lower household incomes. The program aims to identify these areas through data analysis and provide insights that can improve safety and reduce crime. By combining income and crime data, we can identify which areas need more resources and help to make each suburb more liveable. 


#### Functional and Non-Functional Requirements:

**Functional Requirements:**

- Import and process two datasets, one for household income and one for crime rates by suburb.
- Generate graphs that visualise the connection between income and crime.
- Include a working user interface for proper user interaction.
- Filter and list the suburbs with the most crime in New South Wales. 

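As a rough illustration of the last requirement, the sketch below lists the suburbs with the most recorded crime. It is only a minimal example and not necessarily how `main.py` does it. The column names `Suburb` and `Aug-21-Crime` come from the data dictionary in Phase 2, while the helper name `top_crime_suburbs` and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample rows standing in for the crime dataset (not real figures).
SAMPLE_CRIME_CSV = """Suburb,Aug-21-Crime
Suburb A,120
Suburb B,45
Suburb C,300
Suburb D,80
"""


def top_crime_suburbs(csv_text, n=3):
    """Return the n suburbs with the most recorded crimes, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort by the crime count (converted from text to a number), largest first.
    rows.sort(key=lambda row: int(row["Aug-21-Crime"]), reverse=True)
    return [(row["Suburb"], int(row["Aug-21-Crime"])) for row in rows[:n]]


if __name__ == "__main__":
    for suburb, crimes in top_crime_suburbs(SAMPLE_CRIME_CSV):
        print(f"{suburb}: {crimes}")
```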
**Non-Functional Requirements:**
- Interface should be easy to use.
- The code should be well commented.
- The data should have minimal errors.
- Results should be calculated and displayed within five seconds.

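One possible way to check the five-second requirement is to time a step with Python's built-in `time.perf_counter`. The helper name `timed` and its `limit_seconds` parameter are assumptions made for this sketch and are not taken from `main.py`.

```python
import time


def timed(func, *args, limit_seconds=5.0):
    """Run func(*args) and report how long it took.

    Returns (result, elapsed_seconds, within_limit).
    """
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= limit_seconds


if __name__ == "__main__":
    # Example: time a simple calculation in place of loading and graphing data.
    total, seconds, ok = timed(sum, range(1_000_000))
    print(f"Finished in {seconds:.3f} s (within limit: {ok})")
```

Wrapping the data loading and graphing steps in a check like this would give a measured time instead of relying on how fast the program feels.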

#### Mind Map:

![Three-tiered mind map of the data that can be collected](mind-map.png)



---

## **Phase 2: Researching and Planning:**



*In Phase 2, we will analyse our question further by researching what is available on the topic from news articles and trusted sites. We will also summarise our findings in a SEE-I paragraph, start collecting data and form a data dictionary.*

#### SEE-I Paragraph:

S -
Lower household income is a key factor associated with higher crime rates in many suburbs across New South Wales.

E -
This means suburbs or areas where people face greater financial hardship, poor education, and limited access to essential services tend to experience more criminal activity. These conditions can create stressful situations, reduce opportunities, and contribute to environments where crime becomes more common and eventually a part of daily life.

E -
For example, the data from the NSW Bureau of Crime Statistics and Research and the ABS SEIFA Index show that areas like Mount Druitt and Blacktown, which rank lower on the average household income scale, report higher rates of certain crimes such as theft and assault compared to wealthier suburbs like Mosman or Hunters Hill.

I -
Think about it as two suburbs: one with limited job opportunities and poor access to education, and another with high-quality schools and well-maintained infrastructure. In the first, individuals, especially teenagers, may feel more disconnected from society and be more likely to engage in criminal behaviour because of the peers around them or a lack of alternatives, whereas in the second, there are more protective factors in place to prevent crime and direct young adults towards a better path.

*Links Used:*

https://research.monash.edu/en/publications/crime-and-disorder-in-the-suburbs-a-special-case-of-master-planne

https://www.abs.gov.au/statistics/people/people-and-communities/socio-economic-indexes-areas-seifa-australia/latest-release


#### Data Dictionary:

| Field Name              | Data Type | Description                                                             |
| --------------------------- | ------------- | --------------------------------------------------------------------------- |
| SAL_CODE_2021             | String        | Statistical Area Level (SAL) code representing a specific suburb            |
| Median_tot_hhd_inc_weekly | Integer       | Median total household weekly income in the ABS 2021 Census                         |
| ASGS_Structure            | String        | ABS Australian Statistical Geography Standard (ASGS) structure |
| Census_Code_2021          | String        | Code for the suburb from the Census Geography description                   |
| Census_Name_2021          | String        | Suburb name from the 2021 Census Geography description                      |
| Suburb                    | String        | Name of the suburb from the crime dataset                                       |
| Aug-21-Crime              | Integer       | Total recorded crimes in August 2021 for each suburb                        |

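The fields above could be combined roughly as follows. This is a minimal sketch and not necessarily how `main.py` works: it assumes the two datasets are joined by matching `Census_Name_2021` against `Suburb` after trimming spaces and ignoring case. The helper names `normalise` and `merge_income_and_crime` and all sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample rows (not real ABS or crime figures), using the field names
# from the data dictionary.
INCOME_CSV = """SAL_CODE_2021,Census_Name_2021,Median_tot_hhd_inc_weekly
SAL00001,Suburb A,1500
SAL00002,Suburb B,2800
SAL00003,Suburb C,1100
"""

CRIME_CSV = """Suburb,Aug-21-Crime
suburb a,120
Suburb B ,45
Suburb Z,10
"""


def normalise(name):
    """Make suburb names comparable by trimming spaces and ignoring case."""
    return name.strip().lower()


def merge_income_and_crime(income_text, crime_text):
    """Join the two datasets on suburb name, keeping suburbs found in both."""
    crime_by_suburb = {
        normalise(row["Suburb"]): int(row["Aug-21-Crime"])
        for row in csv.DictReader(io.StringIO(crime_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(income_text)):
        key = normalise(row["Census_Name_2021"])
        if key in crime_by_suburb:
            merged.append({
                "suburb": row["Census_Name_2021"].strip(),
                "income": int(row["Median_tot_hhd_inc_weekly"]),
                "crimes": crime_by_suburb[key],
            })
    return merged


if __name__ == "__main__":
    for record in merge_income_and_crime(INCOME_CSV, CRIME_CSV):
        print(record)
```

Suburbs that appear in only one dataset (such as "Suburb C" and "Suburb Z" in the sample) are left out, which is one reason mismatched suburb names can quietly shrink the merged data.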

---

## **Phase 3: Producing and Implementing:**

The code for this project is in the `main.py` file.

---

## **Phase 4: Testing and Evaluating:**

#### Testing the Program:
*Based on the information I have gathered, more crime was recorded in August 2021 in lower socio-economic areas, with the most crime occurring in suburbs with a weekly household income of $1000 to $3000. Additionally, on the bar graph, the suburbs with the highest crime rates (excluding Sydney and Haymarket, as they are part of the CBD and do not count as suburbs) include Liverpool, Parramatta and Blacktown. This is consistent with my hypothesis, and after numerous test runs of the program the code has proved to be accurate.*

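One way the pattern in the scatter plot could be backed up with a number is a Pearson correlation between income and crime. The sketch below is an illustration only and is not part of `main.py`. The function name `pearson` and the sample values are made up, so the printed result says nothing about the real data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all values are the same")
    return covariance / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Made-up values: weekly household income and crime count for four suburbs.
    incomes = [1100, 1500, 2200, 2800]
    crimes = [300, 120, 80, 45]
    print(f"Correlation: {pearson(incomes, crimes):.2f}")
```

A value close to -1 would support the hypothesis (lower income, more crime), a value near 0 would suggest no clear link, and a value close to +1 would go against it.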
#### Peer Analysis: (PMI)

**Classmate 1 - Noa:**

| Category     | Feedback                                                                 |
|--------------|--------------------------------------------------------------------------|
| Plus     | - Liked how there was more than one option to display the data by using both scatter and bar graph. The user interface is simple and easy to use.                               |
| Minus    | - The names of the suburbs on the graph could be a bit clearer. There is also no error message shown if any of the file names are incorrect or missing.              |
| Implication | - Improving the labels and adding error messages would make it more user friendly. An overall well built and structured program.               |

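Noa's point about missing error messages could be addressed along these lines. This is a minimal sketch, assuming the datasets are CSV files opened by name. The helper name `load_rows` and the file name in the example are hypothetical and not taken from `main.py`.

```python
import csv
from pathlib import Path


def load_rows(filename):
    """Read a CSV file into a list of dictionaries.

    Prints a clear message and returns None if the file cannot be found.
    """
    path = Path(filename)
    try:
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    except FileNotFoundError:
        print(f"Error: could not find '{filename}'. Check the file name and folder.")
        return None


if __name__ == "__main__":
    rows = load_rows("crime_data.csv")  # hypothetical file name
    if rows is not None:
        print(f"Loaded {len(rows)} rows.")
```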
**Classmate 2 - Liran:**

| Category     | Feedback                                                                 |
|--------------|--------------------------------------------------------------------------|
| Plus    | - Clean text-based user interface. Liked the use of merging datasets (crime and income). Liked the scatter plot graph as it could accurately show proof to develop a conclusion.                             |
| Minus    | - The program could benefit from a help menu and adding better error messages. Also only shows data visually using Matplotlib and not in the console.                          |
| Implication | - Adding more features such as pie charts displaying different crimes could make the tool even more interactive. A good base design that can support more features in future.              |

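Liran's comment that the data is only shown visually could be handled by also printing a small table in the console. The sketch below is one possible approach, not the program's actual code. The function name `format_table` and the sample rows are made up for illustration.

```python
def format_table(rows):
    """Build a plain-text table from (suburb, crimes) pairs."""
    # Make the first column wide enough for the longest suburb name.
    width = max((len(name) for name, _ in rows), default=0)
    width = max(width, len("Suburb"))
    lines = [f"{'Suburb':<{width}}  Crimes"]
    for name, crimes in rows:
        lines.append(f"{name:<{width}}  {crimes:>6}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up values, not real crime figures.
    print(format_table([("Suburb A", 120), ("Suburb C", 300)]))
```

Printing the table alongside the Matplotlib graphs would let the user read exact numbers without estimating them from the axes.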

#### Overall Evaluations:

**Requirement Outline:**
My code successfully met all the Functional and Non-Functional Requirements. In every test both datasets loaded fully without any issues, and once loaded the graphs were easily accessible. The user interface works as required; however, it lacks complexity, and if I had the chance to redo this project I would aim to produce a more advanced design.

**Project Management:**
My analysis project was managed well. The code was difficult at first, with many errors being displayed, but I could find solutions easily and fix them before the due date. My code was well made but simple, with only 90 lines. Everything was well organised; however, if there was one thing I could improve on, it would most likely be time management. I originally had a different question and finding the data for it was confusing, so I changed the question only recently, which put me at a slight disadvantage. Overall, however, the project was managed fine.

**Peer Feedback:**
My design received thoughtful peer feedback. Noa liked how there was more than one visualisation option, and both Liran and Noa commended my easy-to-use user interface. Noa also mentioned that the names of the suburbs on the graphs were unclear, which I misunderstood at first, but after testing and investigating closely I understand why he brought it up. He also suggested that improving the error messages would make the program more user friendly, which I agree with, and said that my program was well structured overall. Liran admired how the datasets were combined to solve a unique problem, although he felt a help menu would be useful, which I agree with. Additionally, he recommended adding features like pie charts that display different crimes.

**Data And Security:**
The data I used in the program is mostly valid and accurate, as it is sourced from the ABS (Australian Bureau of Statistics), a reliable government website. There may be some bias due to possible underreporting of crime or inconsistencies in how suburb areas are mapped. The security of my system could be improved by ensuring data is stored securely and, if sensitive information is eventually handled, by applying user authentication. The user experience (UX) could be made more accessible with a more advanced format rather than a simple text-based format.

---

## **Conclusion:**
*Overall, I think my project was successful, despite encountering a couple of problems during the process. In the beginning my question seemed broad, but I was able to gradually narrow it down and focus on something more specific. One of the biggest challenges was finding the right data, particularly when the suburb definitions and underreported crime data created inconsistencies. I'm confident that my code turned out well and does what it is supposed to do. The visualisations and outputs are clear, the data is valid and mostly unbiased, and the user experience is extremely straightforward.*

---