<div style="background:#E9FFF6; color:#440404; padding:8px; border-radius: 4px; text-align: center; font-weight: 500;">IFN619 - Data Analytics for Strategic Decision Makers (2024_sem1)</div>

# IFN619 :: B2-StructuredAnalytics Tutorial Exercises

---
## Task 1

Using the structured analytics techniques demonstrated in this week's studio, work through the QDAVI cycle for the following business concern:


> **CONCERN:** A business is looking to launch an agricultural product in either Australia or New Zealand. However, management is unsure which country to start with.

### Question

What questions might the business be interested in answering, and how might we use data analytics to address these questions?

### Data

What data might help us gauge the importance of agriculture to each country?

One option is data that shows the contribution of agriculture to each economy:

1. Take a look at [GapMinder](https://www.gapminder.org/data/) - (based on [uw-madison resource](https://uw-madison-aci.github.io/python-novice-gapminder/39-plotting/))
2. Under "Choose individual indicators", navigate to "Agriculture, percent of GDP" (economy>sectors>agriculture) and download the CSV file.
3. Upload the CSV file to a 'data' directory in your Jupyter files section using the 'upload' button.

In [None]:
# import the required library to load the data
???


#### Load the data

Now that we have the data file in our Jupyter environment, we can load the data out of the file into our notebook so that we can work with it.

In [None]:
file_path = "data/"
file_name = ???
index_column = ???

ag_gdp_df = pd.read_csv(f"{file_path}{file_name}", index_col= ???)

ag_gdp_df

#### Select relevant data

Select the most recent 5 years. We can do this by getting a list of the columns, and selecting the last 5. However, we need to ensure that both countries have complete data for those years.

Select the relevant rows (Australia and NZ)

In [None]:
# get columns as a list and select recent 5 that have complete data for Aus and NZ
all_years = list(ag_gdp_df.???)
recent5_years = all_years[-7:-2]
recent5_years

In [None]:
ag_gdp5_df = ag_gdp_df[???]
ag_gdp5_df

We are only interested in Australia and New Zealand, so we don't need all 189 rows. We can use the dataframe's .loc indexer to select just those two rows.

In [None]:
countries = [???,???]
ag_gdp5_df.loc[countries]

In [None]:
# Transpose the dataframe (with .T) to better suit analysis
au_nz_df = ag_gdp5_df.loc[countries].T
au_nz_df.index.name = "???" # give the index an appropriate name
au_nz_df
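
The selection and transposition steps above can be sketched on a small made-up table. The country names, years, and values below are invented for illustration only; they are not GapMinder figures, and the layout of the real file may differ.

```python
import pandas as pd

# Hypothetical toy data: rows are countries, columns are years (as strings,
# which is how year headers typically arrive from a CSV file).
toy_df = pd.DataFrame(
    {
        "2015": [2.0, 6.0, 10.0],
        "2016": [2.5, 6.5, 11.0],
        "2017": [3.0, 7.0, None],  # missing value for Country C
    },
    index=["Country A", "Country B", "Country C"],
)
toy_df.index.name = "country"

# Column labels as a plain list, then a slice of the last two
toy_years = list(toy_df.columns)
last_two = toy_years[-2:]

# Select columns with a list of labels, then rows with .loc and a list of labels
subset = toy_df[last_two].loc[["Country A", "Country B"]]

# Transpose so that years become the index and countries become the columns
by_year = subset.T
by_year.index.name = "year"
print(by_year)
```

With years as rows and one column per country, each country can be summarised or plotted as a single series.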

### Analysis

Get a picture of the data by using descriptive statistics.

In [None]:
au_nz_df.???

In [None]:
years = list(au_nz_df.???)
yrs_str = ", ".join(???)
print(f"Agriculture as a percent of GDP for the years: {???}")
for country in countries:
    ds = au_nz_df[???].describe()
    mean_pc = ???
    min_pc = ???
    max_pc = ???
    print(f"{???} mean: {???:???}%, min: {???}% max: {???}%")
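
As a rough illustration of how individual statistics can be pulled out of a `describe()` result and formatted, here is a minimal sketch on invented numbers. The series name and values are hypothetical and not taken from the exercise data.

```python
import pandas as pd

# Hypothetical values, for illustration only
toy_series = pd.Series([2.0, 4.0, 6.0], name="Country A")

# describe() returns a Series indexed by statistic name
# (count, mean, std, min, 25%, 50%, 75%, max)
stats = toy_series.describe()

mean_value = stats["mean"]
min_value = stats["min"]
max_value = stats["max"]

# ":.2f" inside an f-string formats a float with two decimal places
print(f"{toy_series.name} mean: {mean_value:.2f}%, min: {min_value}% max: {max_value}%")
```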


### Visualisation

Use plotly.express to create a line chart for the new dataframe

In [None]:
# Import the library required to use plotly express for visualisation
import ??? as ???

In [None]:
px.line(???)

### Insights

What insights can you identify from the analysis above?
Can you address the business concern? To what extent?
What limiting factors might there be in using this analysis to address the concern? (Consider the population of each country, the recency of the data, etc.)

---
## Task 2

### Question 

> What sort of daily ups and downs does the stock market have?

### Data

The file NASDAQ100.csv in the data folder contains the last year of daily opening and closing prices for a US stock index called the NASDAQ 100, downloaded from [Yahoo Finance](https://au.finance.yahoo.com/quote/%5ENDX).

Load the data from the file into a pandas dataframe. Notice that the column names from the file become the column names of the dataframe.

In [None]:
# import libraries (the same two as imported above)
???

In [None]:
file_path = ???
file_name = ???
# Load the CSV file into a DataFrame
df = pd.read_csv(f"{file_path}{file_name}")

#### "Eyeballing" the data
Print the first 10 rows of data on your screen. Take a quick look to check it seems like good data.


In [None]:
df.head(10)

#### Checking for duplicates

This is "time series" data, so there should not be multiple rows for one day.

Run a command to check for duplicate rows (all values the same as another row). If you find any, delete the duplicate rows, keeping only one. (Hint: the dataframe object has a "duplicated" method.)

In [None]:
# Check duplicates

df[df.???]

In [None]:
# Which row is duplicated? 

In [None]:
# Complete this line of code to remove the duplicate from the dataset.

df = df[~df.???]

In [None]:
# Check duplicates again to verify it has been removed

df[df.???]
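
The duplicate check can be sketched on a small invented table. The column names and prices below are assumptions made for illustration and may not match the real file.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration
toy_prices = pd.DataFrame(
    {
        "Date": ["2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04"],
        "Open": [100.0, 101.0, 101.0, 102.0],
        "Close": [101.0, 100.5, 100.5, 103.0],
    }
)

# duplicated() marks a row True when it repeats an earlier row in full
is_dup = toy_prices.duplicated()
print(toy_prices[is_dup])

# "~" inverts the boolean mask, keeping only the first copy of each row
deduped = toy_prices[~is_dup]

# A stricter check for time series: no date should appear twice,
# even if the other values differ
repeated_dates = deduped.duplicated(subset=["Date"])
print(repeated_dates.any())
```

Checking on the date column alone is a design choice: it also catches rows that share a day but disagree on prices, which a full-row comparison would miss.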

### Analysis

Create a new column equal to the closing price minus the opening price. Call the new column "Daily Return".

In [None]:
# Create the new column
df['Daily Return'] = df[???] - df[???]

# Look at the new column
df.head(5)

#### Calculate some basic statistics

Calculate the mean and standard deviation of the "Daily Return" column.

Calculate how many days had a positive Daily Return and how many were negative.

In [None]:
# Calculate the mean and standard deviation of the 'Daily Return' column
mean = df['Daily Return'].???
std = df['Daily Return'].???

print("Mean is:", mean, "Standard dev. is:", std)

In [None]:
# Count the number of positive and negative values in the 'Daily Return' column
print((df['Daily Return'] > ???).value_counts())
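
A minimal sketch of the same kind of summary on invented numbers is shown below. The values are hypothetical and only illustrate how a comparison produces a boolean series that `value_counts()` can tally.

```python
import pandas as pd

# Hypothetical daily changes, for illustration only
toy_changes = pd.Series([1.5, -0.5, 2.0, -1.0, 0.5])

toy_mean = toy_changes.mean()
toy_std = toy_changes.std()  # sample standard deviation (ddof=1) by default

# Comparing a Series with a number gives a boolean Series;
# value_counts() then tallies the True and False entries
direction_counts = (toy_changes > 0).value_counts()

print(toy_mean, toy_std)
print(direction_counts)
```

Note that a value of exactly zero would fall in the `False` group with a strict `>` comparison, so "not positive" is not quite the same as "negative".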


### Visualisation

Create a histogram of the "Daily Return" column. Experiment with using different "bin" sizes from 5 to 20.

In [None]:
fig = px.histogram(df['Daily Return'], nbins=???)
fig.show()


### Interpreting results

Is the mean very far from zero?

What sort of distribution does "Daily Return" seem to be?

Does there appear to be any asymmetry in the distribution?

What sort of investment strategy might have made money in this year? What sort of investment strategy might have lost money?

---
## Task 3

### Examples of working with lists

One of the most powerful features of Python is the list. A list is an ordered data structure whose values are accessed by position (index).

The next cell creates a list of numbers, but list elements can be any datatype.

In [None]:
# Here's a list

list1 = [0,1,2,3,4,5,6,7,8,9]


## Do some things with the list

If you are new to Python, you might have to look through the course materials or the internet to find the right code for the following tasks with list1, but it shouldn't be difficult.

### Add up the numbers in list1

In [None]:
sum(list1)

### What is the length of list1?

In [None]:
len(list1)

### So what is the mean of list1?

In [None]:
sum(list1)/len(list1)

### Print the 5th number in list1 (remember that indexing starts at 0, so the first item in the list is list1[0])

In [None]:
list1[4]

You can "slice" a list to get different values easily

### What are the last 3 numbers in list1?

In [None]:
list1[7:10]
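
Slices can also count from the end of a list using negative indices. The sketch below uses a separate, hypothetical list named `numbers` so that it does not interfere with list1.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A slice is written list[start:stop]; start is included, stop is excluded
first_three = numbers[0:3]   # [0, 1, 2]
last_three = numbers[-3:]    # a negative start counts from the end: [7, 8, 9]
middle = numbers[-7:-2]      # [3, 4, 5, 6, 7]

print(first_three, last_three, middle)
```

The negative form `numbers[-3:]` keeps working if the list grows or shrinks, whereas `numbers[7:10]` depends on the list having exactly ten elements.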

### Loop through all the elements of list1 using a for loop

In [None]:
for element in list1:
    print("This is:", element)

There is a shortcut way to do things with lists called "list comprehensions". Find out what they are.

### Use a list comprehension to print all the elements of list1 squared

In [None]:
[x**2 for x in list1]
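
To show how a list comprehension relates to an ordinary loop, here is a small sketch using a hypothetical list named `values`.

```python
values = [0, 1, 2, 3]

# Loop version: build the list of squares step by step
squares_loop = []
for x in values:
    squares_loop.append(x ** 2)

# List comprehension: the same result in one expression
squares = [x ** 2 for x in values]

# An optional "if" clause filters the elements before they are squared
even_squares = [x ** 2 for x in values if x % 2 == 0]

print(squares, even_squares)
```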

A list can contain any combination of data types.

### Add a word to list1

In [None]:
list1.append('this list item contains some words')
list1

Now, however, if you try to sum the elements in the list, you will get an error because you can't add strings and numbers.
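
The error can be seen safely by catching it, as in the sketch below. It uses a hypothetical list named `mixed`, and the workaround at the end is only one possible approach.

```python
mixed = [1, 2, "three"]

try:
    total = sum(mixed)
except TypeError as error:
    # sum() starts at 0 and adds each element; int + str is not defined
    total = None
    print("Could not sum the list:", error)

# One possible workaround: add up only the numeric elements
numeric_total = sum(item for item in mixed if isinstance(item, (int, float)))
print(numeric_total)
```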

An empty list looks like this: []

To empty list1 we can simply do: list1=[]



In [None]:
list1=[]
list1