<div style="background:#E9FFF6; color:#440404; padding:8px; border-radius: 4px; text-align: center; font-weight: 500;">IFN619 - Data Analytics for Strategic Decision Makers</div>

# IFN619 :: A2-DataAnalyticsCycle - tutorial exercises

## QUESTION

**Concern:** A key concern of many Queensland organisations and businesses is the impact of climate change on the coral and other marine life at the Great Barrier Reef. 
One indication of climate change is the rise in sea level. The [State of the Environment](https://www.stateoftheenvironment.des.qld.gov.au/climate/coasts-oceans/sea-level) report provides information on this indicator.

> **Question:** To what extent have changes in sea level been observed in the area of the Great Barrier Reef?

In this exercise, you will investigate historical sea level data from two sites within the Great Barrier Reef.

## DATA

Historical sea level data from within the Great Barrier Reef is available from the [Queensland Government Open Data Portal](https://www.data.qld.gov.au/dataset/soe2020-sea-level/resource/2020-indicator-4-2-0-4).
1. Download the data as a CSV file, and save it as `indicator-4-2-0-4.csv` in your `data` folder.
2. Look at the contents of the file to understand the data. Note: the data instructions on the webpage provide some useful description of the data. 
3. Import the data as a `pandas` dataframe.
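
As a rough illustration of step 3, the sketch below reads CSV text into a dataframe and inspects it. The CSV content and its column names (`Year`, `Month`, `SiteA`, `SiteB`) are invented for the example and are not the real dataset; `io.StringIO` stands in for a file on disk, and passing a file path to `pd.read_csv()` works the same way.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file.
# Column names and values are invented for illustration only.
csv_text = """Year,Month,SiteA,SiteB
2018,1,10.0,20.0
2018,2,12.0,22.0
2019,1,11.0,21.0
"""

# read_csv accepts a file path or any file-like object
example_df = pd.read_csv(io.StringIO(csv_text))

print(example_df.shape)   # (number of rows, number of columns)
print(example_df.dtypes)  # data type inferred for each column
print(example_df.head())  # first few rows
```

Checking the shape, the inferred types and the first rows is a quick way to confirm that the file loaded as expected before any analysis.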

In [None]:
# Import libraries
import pandas as ???
import ??? as px

In [None]:
# Read a CSV into a dataframe
file_path = "???/"
file_name = "???"
df = pd.read_csv(f"{???}{???}")
df

## ANALYSIS

If we were to view all of the data, including months, we would be able to see the seasonality (as mentioned in the report). However, as we are only interested in the overall trend, we can calculate the mean change for each year. We can do this with the `groupby()` and `mean()` functions.

However, first we need to clean the data. Because 2020 is incomplete (doesn't have 12 months), it could skew any averages by reflecting only the seasonal variation for January and February. Therefore, we'll drop 2020 from the data BEFORE calculating the means. We can do this by *filtering* the dataframe to only include years before 2020.
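
The filter-then-average idea can be sketched on a small, made-up dataframe. The column names (`Year`, `Month`, `SiteA`) and all values here are assumptions for illustration; the real file will use its own column names.

```python
import pandas as pd

# Invented monthly readings; 2020 has only two months, like an incomplete year.
toy_df = pd.DataFrame({
    "Year":  [2018, 2018, 2019, 2019, 2020, 2020],
    "Month": [1, 7, 1, 7, 1, 2],
    "SiteA": [10.0, 20.0, 12.0, 22.0, 11.0, 13.0],
})

# The comparison builds a boolean mask (True for rows to keep);
# indexing the dataframe with that mask keeps only the matching rows.
complete_years = toy_df[toy_df["Year"] < 2020]

# Group the remaining rows by year and average the selected column(s)
yearly_means = complete_years.groupby("Year")[["SiteA"]].mean()
print(yearly_means)  # 2018 -> 15.0, 2019 -> 17.0
```

Filtering before grouping matters: if the 2020 rows were kept, their mean would be based on two early-year readings only and would not be comparable with the full-year means.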

In [None]:
# Remove incomplete 2020 data by selecting only years before 2020
clean_df = df[df[???]<???]
clean_df

In [None]:
# Calculate the means for the year with groupby() and mean()
mean_cols = [???,???, ???]
mean_df = clean_df.groupby(???)[mean_cols].mean()
mean_df

## VISUALISATION

In [None]:
# Visualise the data as a line plot
fig = px.???(???, width=800, height=600)
fig.show()

## INSIGHTS

- What patterns did you find? 
- What is the recommendation for the concern?
- What other information would be helpful?
- What doesn't the data tell us?
- Can we make inferences?

???

---
# Additional Exercises
---

## Research Data Example - Questions & Data

The data for this exercise come from this [published research article](https://www.nature.com/articles/s41598-020-59810-w). Research questions based on these data might be useful for clinicians who need to diagnose and treat people with Parkinson's disease.

> Roeder, L., Boonstra, T.W. & Kerr, G.K. Corticomuscular control of walking in older people and people with Parkinson’s disease. Sci Rep 10, 2980 (2020).

1. Download the gait parameters data from [FigShare](https://figshare.com/articles/dataset/Outcome_measures_analyses_scripts_for_Corticomuscular_control_of_walking_in_older_people_and_people_with_Parkinson_s_disease_/7991276). 
2. Upload the CSV to your `data` directory.
3. Load the data from the CSV into a dataframe.
4. Identify questions that you might be able to answer from the data.
    
TIP: You may have to read parts of the article cited above. First, describe the "Group" and "condition" columns, then look at some of the other columns and come up with questions of interest.
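
One possible way to start describing categorical columns is sketched below. Only the column names `Group` and `condition` come from the tip above; the labels (`A`, `x`, ...) and the `speed` column are placeholders invented for the example, and the meaning of the real labels has to be looked up in the article.

```python
import pandas as pd

# Invented rows; the labels and the "speed" column are placeholders,
# not the real coding used in the published dataset.
gait_df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "C", "C"],
    "condition": ["x", "y", "x", "y", "x", "y"],
    "speed": [1.0, 1.1, 1.2, 1.3, 0.9, 1.0],
})

# Distinct labels that appear in a column
print(gait_df["Group"].unique())

# How many rows carry each label
print(gait_df["condition"].value_counts())

# Cross-tabulation: number of rows for each Group/condition combination
group_by_condition = pd.crosstab(gait_df["Group"], gait_df["condition"])
print(group_by_condition)
```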
    

In [None]:
# Import the required libraries
???

In [None]:
# Read the CSV into a dataframe
file_path = "???/"
file_name = "???"
df = pd.read_csv(f"{file_path}{file_name}")
df

#### Example research questions: 

- What is the mean walking speed of the Parkinson's group, the healthy older group and the healthy younger group? 
- What is the mean duration of the swing phase (in seconds) of the Parkinson's group, the healthy older group and the healthy younger group? 
- Do people with Parkinson's disease walk faster on the treadmill compared to natural overground walking? 
- Do people with Parkinson's disease walk slower than healthy older people? Do they walk slower than healthy young people?
- Is the stride time variability different between healthy young people, healthy older people and people with Parkinson's?
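
Questions of this kind usually reduce to comparing a group-wise summary. The sketch below shows the pattern on invented data; the labels and the `speed` column are assumptions, so substitute the actual column names from the CSV.

```python
import pandas as pd

# Invented measurements; labels and values are placeholders only.
walk_df = pd.DataFrame({
    "Group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "condition": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "speed":     [1.0, 1.5, 2.0, 2.5, 0.5, 1.0, 1.5, 2.0],
})

# Mean of one measure for each group
mean_by_group = walk_df.groupby("Group")["speed"].mean()
print(mean_by_group)  # A -> 1.75, B -> 1.25

# Mean for each group within each condition
mean_by_group_condition = walk_df.groupby(["Group", "condition"])["speed"].mean()
print(mean_by_group_condition)
```

A difference in means on its own does not show that groups truly differ; the spread within each group and the number of participants matter as well.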





--- 
## Data Search Example

1. Look around on the web for some publicly available data that you can download in .csv format (e.g. https://catalog.data.gov/dataset/?res_format=CSV)
2. Once you have a file, write some code in a Jupyter notebook that will load the data into a pandas dataframe
3. For a numerical column, see if you can calculate some basic statistics, such as the mean and standard deviation.
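
A minimal sketch of step 3, using a made-up column named `value` (an assumption; use whichever numerical column your own file contains):

```python
import pandas as pd

# Invented numbers standing in for a numerical column from your own data
values_df = pd.DataFrame({"value": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]})
col = values_df["value"]

print(col.mean())       # arithmetic mean
print(col.std())        # sample standard deviation (pandas uses ddof=1 by default)
print(col.std(ddof=0))  # population standard deviation
print(col.describe())   # count, mean, std, min, quartiles and max in one call
```

Note that `std()` in pandas returns the sample standard deviation by default, which differs slightly from the population value on small datasets.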