<div class="alert alert-block alert-info"><b>IAB303</b> - Data Analytics for Business Insight</div>

## Homework :: The Data Analytics Cycle


---

## Tutorial :: QDAVI

> **CONCERN:** The business is looking to launch an agricultural product in either Australia or New Zealand. However, management is unsure which country to start with.

1. **Q**uestion
2. **D**ata
3. **A**nalysis
4. **V**isualisation
5. **I**nsight

<img src="graphics/QDAVI_cycle_sm.png" width="50%" />

### 1. Question

Which country should the business launch its agricultural product in first: Australia or New Zealand?

### 2. Data

[GapMinder](https://www.gapminder.org/data/) - (based on [uw-madison resource](https://uw-madison-aci.github.io/python-novice-gapminder/39-plotting/))
* Agriculture, percent of GDP (`agric_gdp.csv`)

**Tip:** Relative file paths are resolved from the folder where the notebook is located.
* To access a subfolder, start the path with the folder name (e.g. `data/`)
* To access the parent folder, start the path with `../`
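To check how a relative path resolves before reading a file, here is a minimal sketch using the standard `pathlib` module. It assumes the working directory is the notebook's folder, which is the usual Jupyter default but not guaranteed.

```python
from pathlib import Path

# Relative paths are resolved against the current working directory,
# which in Jupyter is normally the folder containing the notebook
notebook_dir = Path.cwd()

# "../data/week-2/agric_gdp.csv": go up one folder, then into data/week-2
relative_path = Path("..") / "data" / "week-2" / "agric_gdp.csv"
absolute_path = (notebook_dir / relative_path).resolve()

print("Notebook folder:", notebook_dir)
print("Resolved path:  ", absolute_path)
print("File exists?    ", absolute_path.exists())
```

If `File exists?` prints `False`, the folder layout differs from what the path assumes.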

In [None]:
# Import pandas for dataframes and matplotlib for plotting
import matplotlib.pyplot as plt
import pandas

# Set variables for file and index column
file = "../data/week-2/agric_gdp.csv"  # relative path: up one folder, then into data/week-2
colname = "country"  # column to use as the row index (open the CSV to check its header)

# Read in the percent of gdp data
ag_gdp = pandas.read_csv(file, index_col=colname)
print(ag_gdp.shape)


In [None]:
# Take a look at the data
ag_gdp
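
Here is a minimal sketch of what `index_col` changes. It uses a tiny made-up CSV held in memory with `io.StringIO`; the numbers are invented and are not Gapminder data. It also shows that year headers are read as strings, which is why the year list in the next step uses `"2015"` rather than `2015`.

```python
import io

import pandas

# Tiny made-up CSV with the same layout as agric_gdp.csv (values are NOT real data)
csv_text = """country,2018,2019
Australia,1.5,1.7
New Zealand,5.0,5.2
"""

# Without index_col, rows are numbered 0, 1, ... and "country" is an ordinary column
df_default = pandas.read_csv(io.StringIO(csv_text))
print(df_default.index.tolist())    # [0, 1]

# With index_col="country", the country names become the row labels
df_indexed = pandas.read_csv(io.StringIO(csv_text), index_col="country")
print(df_indexed.index.tolist())    # ['Australia', 'New Zealand']
print(df_indexed.columns.tolist())  # ['2018', '2019'] (header names are strings)
```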

#### Clean/preprocess data

In [None]:
# Keep only the five most recent years (column names are strings)
most_recent_five_years = ["2015", "2016", "2017", "2018", "2019"]
ag_gdp_cln = ag_gdp.filter(most_recent_five_years, axis=1)
print(ag_gdp_cln.shape)

# Select the two countries of interest by their index label (country name)
ag_gdp_au = ag_gdp_cln.loc["Australia"]
ag_gdp_nz = ag_gdp_cln.loc["New Zealand"]
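
A sketch of the two selection steps on a made-up frame with the same layout as `ag_gdp` (invented values). `filter(..., axis=1)` keeps the listed column labels and quietly skips any that don't exist, and `.loc[...]` pulls out one row as a Series indexed by year.

```python
import pandas

# Made-up frame shaped like ag_gdp: countries as the index, years (strings) as columns
toy = pandas.DataFrame(
    {"2014": [2.0, 6.0], "2015": [2.1, 6.1], "2016": [2.2, 6.2]},
    index=pandas.Index(["Australia", "New Zealand"], name="country"),
)

# Keep only the listed columns; "2099" does not exist and is silently ignored
recent = toy.filter(["2015", "2016", "2099"], axis=1)
print(recent.columns.tolist())  # ['2015', '2016']

# Select one row by its index label; the result is a Series indexed by year
au = recent.loc["Australia"]
print(type(au).__name__)  # Series
print(au.to_dict())       # {'2015': 2.1, '2016': 2.2}
```

Because missing years are skipped rather than raising an error, it is worth checking `.shape` after filtering, as the cell above does.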

In [None]:
# Take a look at the data for AU
ag_gdp_au

In [None]:
# Take a look at the data for NZ
ag_gdp_nz

### 3. Analysis

- What is the problem with the NZ data?
- What do we need to do?
- For now, we won't do any further analysis; the focus is on the process
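One common problem with yearly data is missing values, which pandas stores as `NaN`; check whether that is what is happening in the NZ series. As a sketch on made-up numbers (not the real NZ figures), this shows how to find gaps and two possible ways of handling them. Which one is appropriate is an analysis decision worth documenting.

```python
import numpy as np
import pandas

# Made-up yearly series with one gap (NaN); not the real NZ figures
s = pandas.Series([5.0, np.nan, 5.4, 5.6], index=["2016", "2017", "2018", "2019"])

# Spot the gaps
print(s.isna().sum())              # 1
print(s[s.isna()].index.tolist())  # ['2017']

# Option 1: drop the missing years (fewer points, but no invented values)
dropped = s.dropna()

# Option 2: fill the gap by linear interpolation between neighbouring years
# (this assumes the value changed smoothly, which should be stated if used)
filled = s.interpolate()
print(round(filled["2017"], 2))    # 5.2
```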

### 4. Visualisation

In [None]:
# Plot the 2 countries
plt.plot(ag_gdp_au)
plt.plot(ag_gdp_nz)

In [None]:
# Add labels and set colours
plt.plot(ag_gdp_au,'g-',label="Australia")
plt.plot(ag_gdp_nz,'m-',label="New Zealand")

# Add a legend and axis labels
plt.legend(loc='upper right')
plt.xlabel("Years")
plt.ylabel("% of GDP")
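
The same chart can be written with matplotlib's object-oriented interface, which makes it easier to add a title or reuse the axes later. A sketch with made-up series standing in for `ag_gdp_au` and `ag_gdp_nz`:

```python
import matplotlib.pyplot as plt
import pandas

# Made-up series indexed by year strings, standing in for ag_gdp_au / ag_gdp_nz
au = pandas.Series([2.1, 2.2, 2.3], index=["2015", "2016", "2017"])
nz = pandas.Series([5.1, 5.0, 5.2], index=["2015", "2016", "2017"])

# One Figure with one Axes that we configure explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(au, "g-", label="Australia")
ax.plot(nz, "m-", label="New Zealand")

ax.set_title("Agriculture as % of GDP")
ax.set_xlabel("Years")
ax.set_ylabel("% of GDP")
ax.legend(loc="upper right")
```

In a notebook the figure is displayed at the end of the cell; in a plain script you would call `plt.show()`.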

### Document analytics

1. What is the concern?
2. What data did we use?
3. How did we analyse it, what decisions and why?
4. What do the visualisations tell us?
5. What is the recommendation for the concern? What other information would be helpful? What *doesn't* the data tell us? Can we make inferences?

## Second QDAVI cycle

### 1. Question

How can we give more context to the agricultural % of GDP? For example, how large is the agriculture sector in dollar terms?

### 2. Data

Let's look at each country's total GDP so we can express the agriculture share in dollars. Use the file `gdp.csv` in the `data/week-2` folder.

In [None]:
# Load the required file (same pattern as agric_gdp.csv)
gdp = pandas.read_csv(???, index_col=???)
gdp.shape

In [None]:
# Take a look at the dataframe
gdp

#### Clean/preprocess data

In [None]:
# Use the same filter as before to keep the same years
gdp_cln = gdp.filter(???, axis=1)
print(gdp_cln.shape)

# Select the two countries of interest by their index label (country name)
gdp_au = gdp_cln.loc[???]
gdp_nz = gdp_cln.loc[???]

In [None]:
# Take a look at the data for AU
gdp_au

In [None]:
# Take a look at the data for NZ
gdp_nz

In [None]:
# Convert AU values to float: strip the "TR" (trillions) suffix and multiply by 1000 to express them in billions
fl_gdp_au = gdp_au.apply(lambda x: x.replace("TR", "")).astype("float") * 1000
fl_gdp_au


In [None]:
# Convert NZ values to float: strip the "B" (billions) suffix
fl_gdp_nz = gdp_nz.apply(lambda x: x.replace("B", "")).astype("float")
fl_gdp_nz
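
The two cells above handle each country's suffix separately. Here is a sketch of one hypothetical helper, `to_billions`, that handles both. It assumes values look like `"1.3TR"` or `"205B"`, as the cells above suggest; check the actual file before relying on it.

```python
import pandas

# Assumed meaning of each suffix, as a multiplier that converts to billions
SUFFIX_TO_BILLIONS = {"TR": 1000.0, "B": 1.0}


def to_billions(value) -> float:
    """Hypothetical helper: convert '1.3TR' or '205B' to a float in billions."""
    text = str(value).strip()
    for suffix, factor in SUFFIX_TO_BILLIONS.items():
        if text.endswith(suffix):
            # Drop the suffix, parse the number, then scale it to billions
            return float(text[: -len(suffix)]) * factor
    # No recognised suffix: assume the value is already in billions
    return float(text)


# .apply runs the helper on every element, whatever mix of suffixes the Series has
example = pandas.Series(["1.3TR", "205B", "0.5TR"], index=["2017", "2018", "2019"])
print(example.apply(to_billions).tolist())  # [1300.0, 205.0, 500.0]
```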

### 3. Analysis

Calculate the dollar value of the agriculture sector for each country: agric GDP = total GDP × agric % / 100.
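A sketch of this calculation on made-up numbers (not real GDP figures). It produces the same three value columns as the exercise with a `year` index. Building the frame from Series that share a year index lets pandas line the values up by year. If you instead fill an empty DataFrame column by column, pandas aligns each assigned Series on its index, so mixing plain lists (indexed 0, 1, ...) with year-indexed Series can leave `NaN`; assigning `.values` avoids that.

```python
import pandas

years = ["2018", "2019"]

# Made-up inputs (NOT real figures): agriculture share in % and total GDP in billions
agric_pct = pandas.Series([2.0, 2.5], index=years)
total_gdp_b = pandas.Series([1400.0, 1500.0], index=years)

# agric GDP (B) = total GDP (B) * agric % / 100
agric_gdp_b = total_gdp_b * agric_pct / 100

# Build the frame from Series sharing the same year index, so rows line up by year
df = pandas.DataFrame(
    {"agric %": agric_pct, "total gdp (B)": total_gdp_b, "agric gdp (B)": agric_gdp_b}
)
df.index.name = "year"
print(df)
print(df["agric gdp (B)"].tolist())  # [28.0, 37.5]
```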

In [None]:
# Create a new dataframe with the year, agric %, total gdp and agric gdp for AU
df_au = pandas.DataFrame(columns=[???, ???, ???, ???])  # a list keeps the column order; a set {...} does not
df_au["year"] = ???
df_au["agric %"] = ???
df_au["total gdp (B)"] = ???
df_au["agric gdp (B)"] = ???
df_au.set_index("year", inplace=True)
df_au

In [None]:
# Create a new dataframe with the year, agric %, total gdp and agric gdp for NZ
df_nz = pandas.DataFrame(columns=[???, ???, ???, ???])  # a list keeps the column order; a set {...} does not
df_nz["year"] = ???
df_nz["agric %"] = ???
df_nz["total gdp (B)"] = ???
df_nz["agric gdp (B)"] = ???
df_nz.set_index("year", inplace=True)
df_nz

### 4. Visualisation

In [None]:
# Add labels and set colours
plt.plot(df_au[???],'g-',label="Australia")
plt.plot(df_nz[???],'m-',label="New Zealand")

# Add a legend and axis labels
plt.legend(loc='upper right')
plt.xlabel("Years")
plt.ylabel("GDP (in billions)")

### 5. Insights

1. What is the concern?
2. What data did we use?
3. How did we analyse it, what decisions and why?
4. What do the visualisations tell us?
5. What is the recommendation for the concern? What other information would be helpful? What *doesn't* the data tell us? Can we make inferences?