<div class="alert alert-block alert-success"><b>IFN619</b> - Data Analytics for Strategic Decision Makers</div>

## TUTORIAL Week 1 :: Introducing Jupyter notebook

You've made it to this point, so you know how to:
1. Sync files to your Jupyter server
2. Launch a notebook from the Jupyter `files` section. 

Now it's time to take a look at how these notebooks are created. 

Notebooks have 2 main types of cells - `Markdown` cells for displaying content like a web page, and `Code` cells for running code (in our case Python code) and displaying the results. In this practical session we will explore both of these types of cells and how they work together for a complete analytics process.

This tutorial includes 3 groups of tasks:
1. Basic Markdown
2. Basic Python
3. Overview of analytics process

Before moving on to the tasks, familiarise yourself with the menus above, and in particular the keyboard shortcuts (click the keyboard icon) for `run selected cells and insert below`, `run selected cells`, changing between `command` and `edit` modes, and changing between `Code` and `Markdown` cell types.

---

## [1] Basic Markdown

Open `Help > Markdown Reference` and read what is possible to do in a markdown cell. Try out the 10 minute tutorial to get a feel for what is possible.

Try creating the following markdown elements. Create a new markdown cell for each element:
1. Headings - 3 different levels. Then give each subsequent element a heading
2. Plain text, italics and bold text, and a horizontal line. Then put a line between each element.
3. A numbered list of your 3 favourite bands/musicians, with sublists of a couple of their albums 
4. A bulleted list of your 3 favourite movies, with sublists of characters in each.
5. A table with your movies above, your favourite character and the actor who plays the character.
6. A quote from a web page, and a link to the original webpage
7. A code block that demonstrates printing your name
8. An inline maths equation for a quadratic equation - f of x equals ax squared plus bx plus c (between some text)

---

Add your Markdown here.

---
## [2] Basic Python

Open `Help > Python Reference` and browse the resources available to help you recall (or learn) basic Python. You can learn as you go, so if you don't have a good knowledge of Python, don't worry - you will be able to pick it up week by week.

Try running Python code for the following tasks. Create a new `Code` cell for each task. 

1. Code comments. Include one at the start of the cell and another beside `answer = 42` that tells the reader that it is the answer to life, the universe and everything.
2. Calculate the number of hours in a week. Assign this to a variable `hoursInWeek`, and then calculate the number of hours in a 13-week semester. Comment what you are doing.
3. Create a 2 line string using newline and print it out.
4. Create a 3 line string using triple quotes and print it out.
5. Print out `Data Analytics for Strategic Decision Makers` concatenated from 4 parts.
6. Assign `IFN619 Data Analytics for Strategic Decision Makers` to a variable `unit`, then extract the unit code and the unit name separately and print them out as `Data Analytics for Strategic Decision Makers (IFN619)`.
7. Check that the unit code is 6 characters long.
8. Add your movies from the markdown task to a list, then use a while loop to print each element of the list out on a separate line.
---
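
As a rough illustration of the building blocks these tasks rely on (slicing, `len()`, newlines, triple quotes and `while` loops), here is a minimal sketch. All of the values (`ABC123 Introduction to Widgets`, the colour list) are made up for illustration and are not the task answers.

```python
# Hypothetical values chosen for illustration only
course = "ABC123 Introduction to Widgets"

# Slicing: the first 6 characters are the code, the text after the space is the name
code = course[:6]
name = course[7:]
formatted = name + " (" + code + ")"
print(formatted)

# len() counts the characters in a string
print(len(code))

# "\n" starts a new line inside an ordinary string
two_lines = "first line\nsecond line"
print(two_lines)

# Triple quotes let a string span several lines exactly as written
three_lines = """line one
line two
line three"""
print(three_lines)

# A while loop needs a counter that is updated on every pass
colours = ["red", "green", "blue"]
i = 0
while i < len(colours):
    print(colours[i])
    i += 1
```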

In [None]:
#Write your code here

---
## [3] Overview of analytics process

1. Question
2. Data
3. Analysis
4. Visualisation
5. Insight

### 1. Question

> **CONCERN:** A business is looking to launch an agricultural product in either Australia or New Zealand. However, management is unsure which country to start with.

What questions might the business be interested in answering, and how might we use data analytics to address these questions?

### 2. Data

What data may be helpful in finding out the importance of agriculture to each country?

Perhaps, data that shows the contribution of agriculture to the economy:
1. Take a look at [GapMinder](https://www.gapminder.org/data/) - (based on [uw-madison resource](https://uw-madison-aci.github.io/python-novice-gapminder/39-plotting/))
2. Find the "Agriculture, percent of GDP" and download the CSV
3. Upload the CSV to your Jupyter files section with the 'upload' button

#### Required libraries

For any data analysis, we need to use existing software that has been loaded into the Jupyter environment in the form of 'libraries', 'packages', or 'modules'. To make these libraries available to your notebook, you need to `import` them.

In [None]:
# Import pandas for dataframes and matplotlib for plotting
import matplotlib.pyplot as plt
import pandas

#### Load the data

Now that we have the data file in our Jupyter environment, we can load the data from the file into our notebook so that we can work with it.

In [None]:
# Set variables for file and index column
filename = ??? #the name of your uploaded file - ensure that you use quotes "filename.csv"
colname = ??? #open the csv and have a look at what the index column is called

# Read in the percent of gdp data
ag_gdp = pandas.read_csv(filename, index_col=colname)

# Show the shape of the data
print(ag_gdp.shape)


What does the shape tell us? Take a look at the data.

**TIP:** You can view any variable by typing its name in a cell and running the cell.
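
To see how `index_col` and `shape` relate, the following sketch reads a tiny CSV from a string instead of an uploaded file. The column name `country`, the country names and all values are invented for illustration; your downloaded file will differ.

```python
import io
import pandas

# Hypothetical CSV text standing in for an uploaded file; the values are made up
csv_text = """country,2001,2002,2003
Aland,10.0,11.0,12.0
Borduria,20.0,21.0,22.0
"""

# index_col names the column whose values become the row labels
demo = pandas.read_csv(io.StringIO(csv_text), index_col="country")

# shape is (number of rows, number of columns); the index column is not counted
print(demo.shape)  # (2, 3)
```

Note that the year column names are read as strings, which is why column names need quotes when you refer to them.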

In [None]:
# Display loaded data
???

#### Clean the data

Data is rarely in a form where it is ready to analyse immediately. One of the most common tasks in data analytics is cleaning. See [The Ultimate Guide to Data Cleaning](https://towardsdatascience.com/the-ultimate-guide-to-data-cleaning-3969843991d4) for an overview.

For this task, we're going to work with a subset of the data, and we will select data that needs minimal cleaning. For other tasks, you may need to do a lot of cleaning. How much always depends on both the question being addressed and the data that you have selected.

In [None]:
# Take the last 5 years of the GDP data
most_recent_five_years = [???,???,???,???,???] # TIP: Ensure you put names of columns in quotes "colname"
ag_gdp_clean = ag_gdp.filter(most_recent_five_years, axis=1)
print(???.shape)
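
The sketch below shows what `filter` with `axis=1` does on a small, made-up dataframe (the names and numbers are hypothetical). It also suggests why printing the shape afterwards is a useful check: column labels that do not exist are skipped without an error.

```python
import pandas

# Hypothetical data; the country names and values are made up
demo = pandas.DataFrame(
    {"2001": [10.0, 20.0], "2002": [11.0, 21.0], "2003": [12.0, 22.0]},
    index=["Aland", "Borduria"],
)

# Keep only the named columns; axis=1 means the labels refer to columns
recent = demo.filter(["2002", "2003"], axis=1)
print(recent.shape)  # (2, 2)

# A label that is not a column ("1999" here) is silently ignored
partial = demo.filter(["2003", "1999"], axis=1)
print(partial.shape)  # (2, 1)
```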

We are only interested in Australia and New Zealand, so we don't need all 189 rows. We can use the dataframe's `.loc` indexer to select a single row by its index label.

In [None]:
ag_gdp_clean.loc["Zimbabwe"]
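
As a small illustration of what `.loc` gives back, the sketch below selects one row from a made-up dataframe (hypothetical names and values). With a single label, the result is a pandas Series whose index is the column names.

```python
import pandas

# Hypothetical data; the country names and values are made up
demo = pandas.DataFrame(
    {"2002": [11.0, 21.0], "2003": [12.0, 22.0]},
    index=["Aland", "Borduria"],
)

# A single label returns that row as a Series, indexed by the column names
row = demo.loc["Aland"]
print(type(row).__name__)  # Series
print(row["2003"])  # 12.0
```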

We can select the rows we need in the same way and assign each one to its own variable.

In [None]:
# Just select the countries we are interested in by referencing the index
ag_gdp_au = ag_gdp_clean.loc[???]
ag_gdp_nz = ag_gdp_clean.loc[???]

In [None]:
# Take a look at the data for AU
???

In [None]:
# Take a look at the data for NZ
???

### 3. Analysis

* What is the problem with the NZ data?
* What can be done about this?
* What are the implications for the question?

For this exercise, we are not going to do any computational analysis on this data, but we still need to 'analyse' the data by **critiquing** it in terms of the question. We could work with the raw numbers, but a visualisation of those numbers may be more helpful.

### 4. Visualisation

At the beginning of the notebook we imported the plotting library and called it `plt`. Here we use this software to visualise our data.

In [None]:
# Plot the data for the 2 countries
plt.plot(???)
plt.plot(???)

This visualisation could easily be misinterpreted. Let's add some additional features to the visualisation to improve it.

In [None]:
# Add labels and set colours
plt.plot(ag_gdp_au,'g-',label=???)
plt.plot(ag_gdp_nz,'m-',label=???)

# Create the legend and label the axes
plt.legend(loc='upper right')
plt.xlabel(???)
plt.ylabel(???)
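
For reference, here is a self-contained sketch of the same labelling pattern using two made-up series (the names, years and values are all hypothetical). When a pandas Series is passed to `plt.plot`, its index is used for the x-axis.

```python
import matplotlib.pyplot as plt
import pandas

# Hypothetical series; the names and values are made up
years = ["2001", "2002", "2003"]
series_a = pandas.Series([10.0, 11.0, 12.0], index=years)
series_b = pandas.Series([20.0, 19.0, 18.0], index=years)

# 'g-' is a green solid line, 'm-' is a magenta solid line
plt.plot(series_a, "g-", label="Aland")
plt.plot(series_b, "m-", label="Borduria")

# The legend picks up the label given to each line
plt.legend(loc="upper right")
plt.xlabel("Year")
plt.ylabel("Value (made-up units)")
```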

### 5. Insight

Our data analytics is not complete at this point. We still need to identify insights from the analytics process that can help address our original questions or address the main business concern.
* What did we find, and how does it relate to the original question?
* What is the recommendation for the concern? 
* What other information would be helpful? 
* What *doesn't* the data tell us? 
* Can we make inferences? What inferences should we avoid?