<div class="alert alert-block alert-info"><b>IAB303</b> - Data Analytics for Business Insight</div>

# Studio :: Internal data (Strengths and weaknesses)

In this session we focus on the internal concerns of businesses, look at how structured data helps address them, and see how open data can also be valuable for this purpose.

---

### QDAVI

When addressing business concerns, our interest is much more than just the data analytics. We are interested in what is *appropriate, efficacious, ethical ...*: what is the ***right*** kind of analytics to help provide the ***right*** kind of insights for business. To provide some structure to our approach, we follow a cycle, **QDAVI**, to address a business concern:

1. **Q**uestion
2. **D**ata
3. **A**nalysis
4. **V**isualisation
5. **I**nsight

<img src="graphics/QDAVI_cycle_sm.png" width="50%" />

### Internal business concerns

Review business concerns

What are internal business concerns?

### Structured data

With structured data, the structure is defined in advance and the data that populates the structure is consistent.

Typically, structured data is in a tabular format with rows and columns:

* Columns normally represent fields or properties that are consistent across the whole of the data e.g. postcode or phone number
* Each row is a separate record in the data e.g. each customer has their own row in a customers table 

Structured data is often saved in `CSV` file format on the file system, and we read it into a `pandas` `DataFrame`. Review the video and notebook for **Structured Data** for an introduction to loading and saving.
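
As a quick reminder of that round trip, here is a minimal sketch. The customer records and column names are made up for illustration, and an in-memory buffer stands in for a file on disk so the cell runs without any data files.

```python
import io
import pandas as pd

# Hypothetical customer records: each column is a field, each row is one record
customers_df = pd.DataFrame({
    "name": ["Avery", "Blake", "Casey"],
    "postcode": ["4000", "4101", "4059"],  # postcodes are labels, so keep them as text
    "orders": [3, 1, 7],
})

# Save to CSV (the buffer plays the role of a file such as 'data/customers.csv')
buffer = io.StringIO()
customers_df.to_csv(buffer, index=False)

# Read it back, telling pandas to treat postcode as text rather than a number
buffer.seek(0)
loaded_df = pd.read_csv(buffer, dtype={"postcode": str})

print(loaded_df)
print(loaded_df.dtypes)
```

Passing `index=False` avoids writing the row index as an extra column, and the `dtype` argument stops identifier-like fields from being silently converted to integers.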


---
### Example - Market size



**BUSINESS CONCERN:**

Expanding a business is risky and expensive, particularly if the expansion involves exporting to other countries. It is essential to have a good understanding of potential markets and the degree of competition in those markets.

https://export.business.gov.au/find-export-markets/tips-for-choosing-export-markets

For the purposes of this exercise, our business fits into a trade category of
`Made-up textile articles`

#### Question:

 If we want to start exporting, what is the market size for our Australian competitors, and which are the biggest destinations?

#### Data:

In [None]:
import pandas as pd

In [None]:
# Sample data from https://data.gov.au/data/dataset/australia-s-merchandise-trade-by-state-territory-by-country-sitc-to-fy2017

# We can then open a CSV file into a new dataframe
trade_df = pd.read_csv('data/trade_data.csv')

# view the dataframe
trade_df

#### Analysis:

In [None]:
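# What is the trade value at the top geographic level, split by trade type?
# numeric_only=True sums just the numeric columns (newer pandas versions
# would otherwise try to sum the text columns as well)
l1_df = trade_df.groupby(['Trade type','Geographic level 1']).sum(numeric_only=True)
l1_df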

In [None]:
# Let's make this easier to read
l1_df.round(0)

#### Visualisation:

Sometimes our data is difficult to explore without visualising. In these instances, it is good to use the visualisation as part of the exploration process. This is the equivalent of doing multiple cycles of QDAVI (with sub questions) within the overall question.
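
Plotting grouped data as side-by-side bars depends on reshaping a grouped result into a table with one row per category and one column per series. The sketch below shows that reshaping on a tiny made-up table. The column names mirror the trade data, but the values and the `Total Imports` label are assumptions for illustration only.

```python
import pandas as pd

# Made-up numbers, in A$'000, purely to show the shape of the result
toy_df = pd.DataFrame({
    "Geographic level 1": ["Asia", "Asia", "Europe", "Europe", "Asia"],
    "Trade type": ["Total Exports", "Total Imports", "Total Exports",
                   "Total Imports", "Total Imports"],
    "A$'000": [10.0, 200.0, 5.0, 20.0, 100.0],
})

# Grouping by two keys gives a Series with a two-level index
grouped = toy_df.groupby(["Geographic level 1", "Trade type"])["A$'000"].sum()

# unstack() moves the last index level (Trade type) into the columns
wide = grouped.unstack()
print(wide)
```

Calling `wide.plot(kind='bar')` on a table shaped like this draws one cluster of bars per region with one bar per trade type.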

In [None]:
# Swap the grouping order to make it easier to see imports and exports for each region
l1_df = trade_df.groupby(['Geographic level 1','Trade type']).sum(numeric_only=True)
l1_df.round(0)

In [None]:
# Visualise this - unstack allows levels of grouping to feed into the plot function
l1_df.unstack().plot(kind='bar')

Doing the visualisation as part of the exploration helps us see an **insight**: imports from Asia are so big (> \$11B) that it's difficult to see what is going on with exports.
So let's zoom in on just the exports.
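
Zooming in on one trade type is a row filter with a boolean mask. As a minimal sketch on made-up data (the values and the `Total Imports` label are assumptions), it can help to look at the mask on its own before applying it:

```python
import pandas as pd

# Made-up rows, in A$'000, for illustration only
toy_df = pd.DataFrame({
    "Trade type": ["Total Exports", "Total Imports", "Total Exports", "Total Imports"],
    "Geographic level 1": ["Asia", "Asia", "Europe", "Europe"],
    "A$'000": [10.0, 200.0, 5.0, 20.0],
})

# The comparison produces one True/False value per row
is_export = toy_df["Trade type"] == "Total Exports"
print(is_export)

# Indexing with the mask keeps only the rows marked True
exports_only = toy_df[is_export]
print(exports_only)
```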

In [None]:
# Just exports
exports_df = trade_df[trade_df['Trade type']=='Total Exports']
# Sum only the numeric columns, then round for readability
l1_exports_df = exports_df.groupby(['Geographic level 1']).sum(numeric_only=True).round(0)
l1_exports_df

In [None]:
l1_exports_df.plot(kind='bar')

That's better, and it gives us an overview, so perhaps we can look at it by `State` as well.
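
When there are two grouping keys, the argument to `unstack` decides which key becomes the columns (the bar colours) and which stays as rows (the x-axis). A minimal sketch on made-up export figures, using a single-column Series to keep the output simple:

```python
import pandas as pd

# Made-up export values, in A$'000, for illustration only
toy_exports_df = pd.DataFrame({
    "State": ["QLD", "QLD", "NSW", "NSW"],
    "Geographic level 1": ["Asia", "Europe", "Asia", "Europe"],
    "A$'000": [10.0, 5.0, 30.0, 15.0],
})

by_state = toy_exports_df.groupby(["State", "Geographic level 1"])["A$'000"].sum()

# unstack(0) moves the first index level (State) into the columns,
# leaving one row per region
regions_as_rows = by_state.unstack(0)
print(regions_as_rows)

# unstack() with no argument moves the last level (the region) instead,
# leaving one row per state
states_as_rows = by_state.unstack()
print(states_as_rows)
```

With `unstack(0)` a bar plot groups the bars by region and colours them by state, which suits a comparison of states within each region.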

In [None]:
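# Group by state as well as region; unstack(0) turns the states into columns
state_l1_exports_df = exports_df.groupby(['State','Geographic level 1']).sum(numeric_only=True).round(0)
state_l1_exports_df.unstack(0).plot(kind='bar')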

Given that `Oceania & Antarctica` dominates and that Antarctica is *not* likely to be significant, what is the dominant Oceania region? We can try looking at different levels.

In [None]:
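# Let's define a helper function to make it easy to switch levels

def getDataForLevel(level):
    """Total exports by State and the given geographic level (e.g. 2 or 3)."""
    return exports_df.groupby(['State', f'Geographic level {level}']).sum(numeric_only=True).round(0)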

In [None]:
getDataForLevel(2).unstack(0).plot(kind='bar')

In [None]:
getDataForLevel(3).unstack(0).plot(kind='bar')

I suspect this is *New Zealand*. Let's confirm by taking a look at all countries with exports of over \$10M.
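
Because the value column is in thousands of dollars, a \$10M cut-off corresponds to 10,000 in that column. The sketch below makes the unit conversion explicit and sorts the result so the largest destination comes first. The country names and values are made up, and `THRESHOLD_AUD` is a hypothetical name introduced for this example.

```python
import pandas as pd

THRESHOLD_AUD = 10_000_000  # $10M, expressed in dollars

# Made-up export values, in A$'000, for illustration only
toy_country_df = pd.DataFrame({
    "Partner country": ["Country A", "Country B", "Country C", "Country A"],
    "A$'000": [9000.0, 2500.0, 12000.0, 4000.0],
})

# Total per country, still in thousands of dollars
totals = toy_country_df.groupby("Partner country")["A$'000"].sum()

# Convert the threshold to thousands before comparing, then sort descending
top = totals[totals > THRESHOLD_AUD / 1000].sort_values(ascending=False)
print(top)
```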

In [None]:
# Which countries receive more than $10M in exports?
# Values are in A$'000, so $10M corresponds to 10000
country_df = exports_df.groupby(['Partner country']).sum(numeric_only=True).round(0)
top_country_df = country_df[country_df["A$'000"] > 10000]
top_country_df

In [None]:
top_country_df.plot(kind='bar')

#### Insights:

* ???