# **Identifying and Defining**
---
---
---
## **Learning Intentions:**
---
---
* Specify the functional requirements of a data analysis, including stating the purpose of a solution, describing use cases, and developing test cases of inputs and expected outputs.
* Specify the non-functional requirements of a data analysis.
---
---
## **Success Criteria:**
---
---
* I can clearly state the purpose of a data analysis solution.
* I can describe use cases that outline how the data analysis solution will be used.
* I can develop test cases that include inputs and expected outputs for the data analysis solution.
* I can specify non-functional requirements such as performance, scalability, security, and usability for a data analysis solution.
* I can explain the importance of non-functional requirements in ensuring the overall quality and effectiveness of a data analysis solution.
---
---

## **Choose your Data Scenario and Define your Purpose**
---
---
### **Data**
The data to be analysed is a set of Sydney house prices from 2009-2019. It records how many bedrooms, bathrooms and car spaces each property has, as well as the type of property and its location (postcode and suburb).

---
### **Goal**
The plan is to analyse the data to get an approximate idea of what drives the price of a house. I plan to ignore the effect of the suburb on price and instead focus specifically on the number of bedrooms, bathrooms and car spaces.

---
### **Source**
[Source](https://www.kaggle.com/datasets/mihirhalai/sydney-house-prices/data)

---
### **Access**
As you may have seen from the source, the data is publicly available to anyone who would like to use it. All you have to do is register on the platform, which is free.

---
### **Access Method**
The data will be accessed as a [.csv](https://support.google.com/google-ads/answer/9004364?hl=en-AU#:~:text=A%20CSV%20comma%2Dseparated%20values,in%20a%20table%20structured%20format.) file.

---
---



## **Functional Requirements**
---
---
### **Data Loading**

The system must be capable of loading a .csv file, as that is the format the data uses. If there is an error in loading the file, it will likely use a try/except block to give the user a message telling them that their file could not be loaded.
As there is only one file, the system will load that single file when prompted.
It will then either move on to the next step or give an error message stating that the file could not be loaded.

---

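```python
# Example of such a try/except block:
try:
    var = int(input("Enter a whole number: "))
    print(f"The value of the square of that number is: {var * var}")
except ValueError:
    # int() raises ValueError when the text typed in is not a whole number
    print("Non-integer input received.")
```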
---
### **Data Cleaning**

The system must be capable of handling empty values in the file, as the data often uses an empty value in place of 0.
For filtering, the user will likely be prompted with a message asking what data they want (e.g. between certain dates, certain suburbs, etc.).
If successful, it will send the user to the next step (asking which statistical measurements are to be included).
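
A minimal sketch of filling blanks and filtering, assuming pandas. The sample rows are made up, and the column names (`suburb`, `bed`, `car`, `sellPrice`) are assumptions that may not match the real file.

```python
import pandas as pd

# Small made-up sample standing in for the real dataset.
# The column names are assumptions and may not match the real file.
data = pd.DataFrame({
    "suburb": ["Avalon", "Avalon", "Bondi", "Bondi"],
    "bed": [3, None, 2, 4],
    "car": [None, 1, None, 2],
    "sellPrice": [1200000, 950000, 1500000, 2100000],
})

# Treat blanks in the count columns as 0.
for column in ["bed", "car"]:
    data[column] = data[column].fillna(0).astype(int)

# Keep only the rows the user asked for (here, a single suburb).
wanted_suburb = "Bondi"
filtered = data[data["suburb"] == wanted_suburb]

print(filtered)
```

In the real program `wanted_suburb` would come from the user's answer to the prompt rather than being fixed in the code.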
 
---
### **Data Analysis**

The system will likely require the ability to find all common statistical measurements other than the mode (range, interquartile range, median, mean, etc.).
It will prompt the user for which measurements to include.
The desired measurements will be added before the data is visualised.
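
One way the measurements could be calculated is sketched below, assuming pandas. The prices are made up, and the `measurements` dictionary and `requested` list are hypothetical names used only for this example.

```python
import pandas as pd

# Made-up sale prices for illustration only.
prices = pd.Series([650000, 720000, 800000, 950000, 1200000])

# Each measurement name maps to a function that computes it.
measurements = {
    "mean": lambda s: s.mean(),
    "median": lambda s: s.median(),
    "range": lambda s: s.max() - s.min(),
    "interquartile range": lambda s: s.quantile(0.75) - s.quantile(0.25),
}

# Stand-in for the user's answer to the prompt.
requested = ["mean", "median", "range", "interquartile range"]

results = {name: measurements[name](prices) for name in requested}
for name, value in results.items():
    print(f"{name}: {value}")
```

Keeping the measurements in a dictionary means only the ones the user asks for are calculated, and a new measurement can be added with one extra entry.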

---
### **Data Visualisation**

The system must be capable of creating a suitable graph for its purpose, either line or bar/column depending on what is to be graphed.
The user will be prompted on which graph type should be created for their purpose.
Matplotlib will visualise the data in the desired graph.
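
A minimal sketch of choosing between the two graph types with Matplotlib. The function name `draw_graph`, the axis labels and the values are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt


def draw_graph(graph_type, labels, values):
    """Draw a line or bar graph of `values` against `labels` and return the figure."""
    if graph_type not in ("line", "bar"):
        raise ValueError(f"Unsupported graph type: {graph_type}")
    fig, ax = plt.subplots()
    if graph_type == "line":
        ax.plot(labels, values, marker="o")
    else:
        ax.bar(labels, values)
    ax.set_xlabel("Number of bedrooms")
    ax.set_ylabel("Median sale price ($)")
    ax.set_title("Median sale price by number of bedrooms")
    return fig


# Made-up values for illustration only.
bedrooms = ["1", "2", "3", "4"]
median_prices = [600000, 800000, 1000000, 1300000]

fig = draw_graph("bar", bedrooms, median_prices)
# Calling plt.show() afterwards would display the graph in a window or notebook.
```

Checking the graph type before creating the figure means an unsupported choice produces an error message without leaving an empty graph behind.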

---
### **Data Reporting**
The system will output the filtered data as well as the final graph.
The user will be asked whether they wish to save the final data set, and will be asked to screenshot the graph if they would like to save it.
Depending on the user's decision, the data will either be saved as a new .csv file or discarded.
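
The save-or-discard decision could be sketched as follows, assuming pandas. The function name `maybe_export`, the default file name `filtered_data.csv` and the sample data are hypothetical.

```python
import pandas as pd


def maybe_export(data, answer, path="filtered_data.csv"):
    """Save `data` to `path` if the user's answer is yes. Return True if saved."""
    if answer.strip().lower() in ("y", "yes"):
        data.to_csv(path, index=False)
        print(f"Data saved to {path}.")
        return True
    print("Data not saved.")
    return False


# Made-up filtered data for illustration only.
filtered = pd.DataFrame({"bed": [2, 4], "sellPrice": [1500000, 2100000]})

# In the real program the answer would come from input().
maybe_export(filtered, answer="no")
```

Passing `index=False` keeps the pandas row numbers out of the saved file, so it has the same columns as the data shown to the user.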

---
---

## **Use Cases**
---
---
**Actor**: The User  
**Goal**: Load the Dataset  
**Preconditions**: The User has the dataset  
**Main Flow**:
1. User places the dataset in the correct folder.
2. The System checks whether the dataset's file format is compatible and whether the dataset is in the right folder. If either is incorrect, an error message appears specifying the issue.
3. The data is loaded and may be displayed using pandas.

**Post conditions**: Dataset is loaded and is available for analysis.
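
The checks in step 2 could be sketched with the standard library's `pathlib`. The function name `check_dataset_file` and the folder and file names are hypothetical.

```python
from pathlib import Path


def check_dataset_file(file_name, folder):
    """Return a list of problems with the dataset file; an empty list means none were found."""
    problems = []
    if Path(file_name).suffix.lower() != ".csv":
        problems.append(f"'{file_name}' is not a .csv file.")
    if not (Path(folder) / file_name).is_file():
        problems.append(f"'{file_name}' was not found in the folder '{folder}'.")
    return problems


# Hypothetical names; the real folder and file name may differ.
for problem in check_dataset_file("house_prices.xlsx", "data"):
    print("Error:", problem)
```

Collecting every problem in a list, rather than stopping at the first one, lets the error message specify all the issues at once.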

---
**Actor**: The User  
**Goal**: Fill in the blanks and filter out data that the user does not require.  
**Preconditions**: The dataset has been loaded.  
**Main Flow**:
1. The blanks are filled in with 0s.
2. The user is prompted on what data they would like to analyse.
3. The data is filtered based on this.

**Post conditions**: Dataset has been cleansed to remove unnecessary information.

---
**Actor**: The User  
**Goal**: Add the desired statistical measurements into the data for visualisation.  
**Preconditions**: The data has been cleansed to include only the data required for analysis.  
**Main Flow**:
1. User is prompted about what statistical measurements should be included.
2. The system adds the required measurements to the dataset, with error handling.

**Post conditions**: Dataset now contains extra statistical information for ease of analysis.

---
**Actor**: The User  
**Goal**: Visualise the data in the desired format.  
**Preconditions**: Data has been cleansed and whatever extra statistical information desired has been added.  
**Main Flow**:
1. User is prompted about graph type. A recommendation will be included, depending on the extra statistical information.
2. Matplotlib will create the graph.  

**Post conditions**: Data has been visualised.

---
**Actor**: The User  
**Goal**: Export the .csv file.  
**Preconditions**: Data analysed.  
**Main Flow**:
1. User is prompted on whether they want the data to be exported.
2. If so, the data is exported.

**Post conditions**: Data exported if the user decided so.

---
---

## **Non-Functional Requirements**
---
---
### **Usability**
---
The user interface should be easy to understand and forgiving of errors. It should handle errors gracefully and let the user backtrack after an incorrect decision. If the user interface is complex, the README file should explain how to use it. The README file should also explain the file requirements and where the file should be located.
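
A forgiving prompt with backtracking might look like the sketch below. The function name `choose`, the `back` keyword and the scripted answers are assumptions for illustration. The `ask` parameter defaults to `input`, and is replaced here so the example runs without a person typing.

```python
def choose(question, options, ask=input):
    """Ask until the user picks one of `options` or types 'back'.

    Returns the chosen option, or None if the user typed 'back'.
    """
    while True:
        answer = ask(f"{question} {options} (or 'back'): ").strip().lower()
        if answer == "back":
            return None
        if answer in options:
            return answer
        print(f"'{answer}' is not one of the options, please try again.")


# Scripted answers stand in for a real user so the example runs unattended.
scripted = iter(["pie", "bar"])
choice = choose("Which graph type?", ["line", "bar"], ask=lambda prompt: next(scripted))
print(choice)
```

A return value of `None` tells the calling code to go back to the previous step instead of continuing.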

---
### **Reliability**
---
The system should clearly explain what the error was and how it might be fixed. The system should also report the small changes made during the cleansing process, to help explain any possible inaccuracies in the final analysis.
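
Reporting the changes made during cleansing could be sketched as follows, assuming pandas. The function name `fill_blanks_with_report`, the column names and the sample values are hypothetical.

```python
import pandas as pd


def fill_blanks_with_report(data, columns):
    """Fill blanks in `columns` with 0 and return the cleaned copy plus a report."""
    cleaned = data.copy()
    report = {}
    for column in columns:
        # Count the blanks before they are replaced so the change can be reported.
        report[column] = int(cleaned[column].isna().sum())
        cleaned[column] = cleaned[column].fillna(0)
    return cleaned, report


# Made-up sample for illustration only.
sample = pd.DataFrame({"bed": [3, None, 2], "car": [None, None, 1]})
cleaned, report = fill_blanks_with_report(sample, ["bed", "car"])
for column, count in report.items():
    print(f"{column}: {count} blank value(s) replaced with 0")
```

Working on a copy leaves the original data untouched, so the user can compare the two if a result looks inaccurate.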

---
---
---