# Insight: According to the Australian Institute of Health and Welfare (AIHW), 53.4% of working-age (aged 15-64) people with disability participated in the Australian labour force in 2018. 


## Open source code for extracting a data insight

This Jupyter notebook demonstrates how to download and analyse the underlying data and extract information about disability in Australia.

<img src="https://images.unsplash.com/photo-1593707206058-2ba3ecd07396?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1050&q=80" width ="1200" height=600 >
<span style="font-style:italic;">Photo by <a href="https://unsplash.com/@mylovefromjesus">Jung Ho Park</a> on <a href="https://unsplash.com">Unsplash</a></span>

## The insight generation process is divided into sub-tasks as shown below:
 1. Download data
 2. Explore data
 3. Extract/analyse data
 4. Visualize data
 
## Step 1: Download data

First, we download "aihw-dis-72-labour-force-participation.xlsx" from GitHub (https://github.com/soda-lab/data-registry/). The file is a copy of the *"Data tables: Employment supplementary data tables"* from the *"People with disability in Australia"* collection [1] published by the **Australian Institute of Health and Welfare (AIHW)**.

- **Australian Institute of Health and Welfare (AIHW)**: AIHW is an Australian health and welfare statistics agency that provides a wide range of health and welfare data.

- **People with disability in Australia**: This collection draws on a range of national data sources to build a better understanding of disability in Australia.

### Download data by using python code

The Python code below shows how to download the data from the GitHub repository.

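```python
import pandas as pd

# Build the URL of the Excel workbook in the soda-lab data registry on GitHub
repository = 'https://github.com/soda-lab/data-registry/'
path = 'blob/main/original_data/'
dataset = 'aihw-dis-72-labour-force-participation.xlsx'
raw = '?raw=true'  # ask GitHub to serve the raw file rather than its HTML page

url = repository + path + dataset + raw
# Reading .xlsx files requires the openpyxl package to be installed
xls = pd.ExcelFile(url)
```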
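The `?raw=true` suffix asks GitHub to return the file itself instead of the HTML page that displays it. The same file can also be requested from GitHub's `raw.githubusercontent.com` host. The sketch below builds both forms; `github_file_urls` is a hypothetical helper, and it assumes the file stays on the `main` branch under `original_data/`, as in the URL above.

```python
def github_file_urls(owner, repo, branch, file_path):
    """Return two URLs that serve the raw bytes of a file stored on GitHub.

    Hypothetical helper for illustration: the first is the blob URL with
    ?raw=true (GitHub redirects it to the raw file), the second uses the
    raw.githubusercontent.com host directly.
    """
    blob_url = f"https://github.com/{owner}/{repo}/blob/{branch}/{file_path}?raw=true"
    raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"
    return blob_url, raw_url


blob_url, raw_url = github_file_urls(
    "soda-lab", "data-registry", "main",
    "original_data/aihw-dis-72-labour-force-participation.xlsx",
)
print(blob_url)
print(raw_url)
```

Either URL can be passed to `pd.ExcelFile`.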

```python
print(xls.sheet_names)
```

```
['Contents', 'Table LABF1', 'Table LABF1a', 'Table LABF1b', 'Table LABF2', 'Table LABF3', 'Table LABF4', 'Table LABF5', 'Table LABF6', 'Table LABF7', 'Table LABF8', 'Table LABF9']
```


The data was successfully loaded into the Jupyter notebook. The workbook contains 12 sheets (listed above): a "Contents" sheet and 11 data tables.
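Only the sheets whose names start with "Table" hold data; the "Contents" sheet presumably lists the tables. A minimal sketch that separates the two, using the sheet list printed above (the prefix test is an assumption based on the names shown; in the notebook, `xls.sheet_names` can be used directly):

```python
sheet_names = ['Contents', 'Table LABF1', 'Table LABF1a', 'Table LABF1b', 'Table LABF2',
               'Table LABF3', 'Table LABF4', 'Table LABF5', 'Table LABF6', 'Table LABF7',
               'Table LABF8', 'Table LABF9']

# Keep only the data tables, dropping the "Contents" sheet
data_tables = [name for name in sheet_names if name.startswith('Table ')]
print(len(sheet_names), 'sheets,', len(data_tables), 'data tables')
```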

## Step 2: Explore data

Before extracting insights, we explore the data to see what it can tell us. The 11 data tables in the workbook are:
- Table LABF1: People aged 15 and over living in households, by age group, disability status, and labour force status, 2018
- Table LABF1a: Males aged 15 and over living in households, by age group, disability status, and labour force status, 2018
- Table LABF1b: Females aged 15 and over living in households, by age group, disability status, and labour force status, 2018
- Table LABF2: People aged 15–64 living in households, by labour force and employment status, disability status and sex, 2018
- Table LABF3: People aged 15–64 living in households, by labour force and employment status, disability status and age group, 2018
- Table LABF4: People aged 15–64 living in households who are not in the labour force, by whether permanently unable to work, actively looked for work in the last 4 weeks and intention to work or look for work in the future, disability status and sex, 2018
- Table LABF5: People aged 15–64 living in households who are not in the labour force, by whether permanently unable to work, actively looked for work in the last 4 weeks and intention to work or look for work in the future, disability status and age group, 2018
- Table LABF6: People aged 15–64 with disability living in households who are permanently unable to work, by reasons permanently unable to work and sex, 2018
- Table LABF7: People aged 15–64 with disability living in households who are permanently unable to work, by requirements to enable workforce participation, 2018
- Table LABF8: People aged 15–64 living in households, not in the labour force, not permanently unable to work, who have not actively looked for work in last 4 weeks and do not intend to work or look for work, by reasons for not intending to work or look for work, disability status and sex, 2018
- Table LABF9: People aged 15–64 living in households, not in the labour force, not permanently unable to work, who have not looked for work in last 4 weeks but intend to work or look for work or are unsure, by reasons for not looking for work, disability status and sex, 2018
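With this many tables, a quick keyword search over the titles can narrow down which table answers a given question. The sketch below is illustrative only: `find_tables` is a hypothetical helper, and the dictionary holds a subset of the titles listed above.

```python
# A subset of the table titles listed above
titles = {
    'Table LABF1': 'People aged 15 and over living in households, by age group, '
                   'disability status, and labour force status, 2018',
    'Table LABF2': 'People aged 15–64 living in households, by labour force and '
                   'employment status, disability status and sex, 2018',
    'Table LABF3': 'People aged 15–64 living in households, by labour force and '
                   'employment status, disability status and age group, 2018',
    'Table LABF6': 'People aged 15–64 with disability living in households who are '
                   'permanently unable to work, by reasons permanently unable to work and sex, 2018',
}


def find_tables(titles, *keywords):
    """Return the IDs of tables whose title contains every keyword (case-insensitive)."""
    return [table for table, title in titles.items()
            if all(keyword.lower() in title.lower() for keyword in keywords)]


print(find_tables(titles, 'employment status', 'sex'))
```

Searching for "employment status" together with "sex" picks out Table LABF2, the table analysed in the next step.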

## Step 3: Extract/analyse data

We are interested in people with disability and their engagement in the labour force. Table "LABF2" covers people aged 15–64 living in households, by labour force and employment status, disability status and sex, in 2018. We therefore extract the relevant rows and columns from "Table LABF2" to analyse the employment status of people with disability.

```python
# Read "Table LABF2", skipping the first row of the sheet
df = pd.read_excel(xls, sheet_name='Table LABF2', skiprows=1, header=0)
df
```

```
,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19
0,,With disability—\nsevere or profound(b),,,,With disability—\nother disability status(c),,,,All with disability,,,,Without disability,,,,Total,,
1,Labour force and employment status,Estimate\n('000),%,95% CI,,Estimate\n('000),%,95% CI,,Estimate\n('000),%,95% CI,,Estimate\n('000),%,95% CI,,Estimate\n('000),%,95% CI
2,,Males,,,,,,,,,,,,,,,,,,
3,In the labour force(d),76.4,31,(25.8–36.2),,486.3,64.3,(61.1–67.5),,562.6,56.1,(53.5–58.7),,6164.5,88.6,(88.0–89.1),,6726.6,84.5,(83.8–85.2)
4,Employed(e),65.2,26.5,(21.7–31.3),,437.1,57.8,(54.4–61.1),,500.4,49.9,(47.2–52.6),,5899.7,84.8,(84.1–85.4),,6399.6,80.4,(79.7–81.1)
5,Employed working full-time,35.9,14.6,(10.5–18.6),,323.5,42.8,(39.9–45.6),,358.9,35.8,(33.3–38.3),,4851.5,69.7,(68.8–70.6),,5211.3,65.5,(64.7–66.2)
6,Employed working part-time,29.4,11.9,(8.9–15.0),,111.2,14.7,(12.5–16.9),,141,14.1,(12.3–15.8),,1050.2,15.1,(14.3–15.9),,1188.8,14.9,(14.2–15.7)
7,Unemployed,12.6,5.1,(2.8–7.4),,50.1,6.6,(5.2–8.0),,63.2,6.3,(5.1–7.5),,265.7,3.8,(3.4–4.2),,327,4.1,(3.7–4.5)
8,Not in the labour force(f),170.3,69.2,(65.1–73.2),,268.7,35.5,(33.3–37.8),,440.5,44,(41.6–46.3),,792.9,11.4,(10.8–12.0),,1231.4,15.5,(14.8–16.1)
9,Total,246.2,100,. .,,756.4,100,. .,,1002.2,100,. .,,6957.8,100,. .,,7960,100,. .
```
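Because the header row is read as blank, pandas names the columns "Unnamed: 0", "Unnamed: 1", and so on, and the group names end up in row 0. Hard-coding "Unnamed: 9" works for this file, but looking the columns up by group name is less brittle if the layout shifts. In the sketch below, `find_group_columns` is a hypothetical helper; it assumes, as in the output above, that each group name sits in row 0 above its estimate column and that the "%" column is immediately to its right. The small DataFrame is a stand-in for `df`.

```python
import pandas as pd


def find_group_columns(raw, group):
    """Return the (estimate, percent) column names for one disability group.

    Hypothetical helper. Assumes row 0 holds each group name above its
    estimate column, and that the "%" column comes right after it.
    """
    header_row = raw.iloc[0]
    matches = [col for col, value in header_row.items() if value == group]
    if not matches:
        raise KeyError(f"group {group!r} not found in row 0")
    estimate_col = matches[0]
    percent_col = raw.columns[raw.columns.get_loc(estimate_col) + 1]
    return estimate_col, percent_col


# Small stand-in for the first rows of df (only the columns needed here)
raw = pd.DataFrame({
    'Unnamed: 0': [None, 'Labour force and employment status', 'In the labour force(d)'],
    'Unnamed: 9': ['All with disability', "Estimate\n('000)", 562.6],
    'Unnamed: 10': [None, '%', 56.1],
    'Unnamed: 11': [None, '95% CI', '(53.5–58.7)'],
})

print(find_group_columns(raw, 'All with disability'))
```

On the notebook's `df`, `find_group_columns(df, 'All with disability')` should return the pair used in the next cell, `('Unnamed: 9', 'Unnamed: 10')`.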


```python
# Keep the label column plus the "All with disability" estimate and % columns
extract_columns = ['Unnamed: 0', 'Unnamed: 9', 'Unnamed: 10']

# rename() returns a new DataFrame instead of editing a slice of df in place,
# which avoids pandas' SettingWithCopyWarning
extracted_df = df[extract_columns].rename(columns={"Unnamed: 0": "employment status",
                                                  "Unnamed: 9": "estimated number of people ('000)",
                                                  "Unnamed: 10": "proportion(%)"})
# Rows 19-25 hold the figures for all persons (males and females combined)
extracted_df = extracted_df.iloc[19:26]
extracted_df
```


| | employment status | estimated number of people ('000) | proportion(%) |
|---:|---|---:|---:|
| 19 | In the labour force(d) | 1098.6 | 53.4 |
| 20 | Employed(e) | 984.2 | 47.8 |
| 21 | Employed working full-time | 581.8 | 28.3 |
| 22 | Employed working part-time | 402.8 | 19.6 |
| 23 | Unemployed | 112.7 | 5.5 |
| 24 | Not in the labour force(f) | 958.5 | 46.6 |
| 25 | Total | 2057.5 | 100.0 |


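As the table above shows, we extracted the "All with disability" estimate and proportion columns, together with the rows for all persons, from "Table LABF2" into a new table. It shows the estimated number and proportion of people aged 15-64 with disability in each labour force and employment status.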
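The slice `iloc[19:26]` depends on exact row positions. The sketch below locates the block by its label instead, and also strips footnote markers such as "(d)" from the status labels. `extract_block` is a hypothetical helper, and it assumes the persons block is introduced by a row reading "Persons" in column "Unnamed: 1", in the same way "Males" introduces the first block (that row is not shown in the output above).

```python
import pandas as pd


def extract_block(raw, block_label, label_col='Unnamed: 0', marker_col='Unnamed: 1'):
    """Return the rows of one block (e.g. "Persons"), from the row after its
    marker down to and including its "Total" row, with footnote markers removed.

    Hypothetical helper; assumes each block starts with a marker row whose
    label sits in marker_col, as "Males" does in the raw table.
    """
    # Position of the marker row, e.g. the row reading "Persons"
    start = int((raw[marker_col] == block_label).to_numpy().nonzero()[0][0]) + 1
    # The block ends at the first "Total" row after the marker
    after = raw[label_col].iloc[start:]
    end = start + int((after == 'Total').to_numpy().nonzero()[0][0]) + 1
    block = raw.iloc[start:end].copy()
    # Drop trailing footnote markers such as "(d)" from the status labels
    block[label_col] = block[label_col].str.replace(r'\([a-z]\)$', '', regex=True)
    return block


# Stand-in with a "Males" block and a "Persons" block; numbers are placeholders
raw = pd.DataFrame({
    'Unnamed: 0': [None, 'In the labour force(d)', 'Total',
                   None, 'In the labour force(d)', 'Not in the labour force(f)', 'Total'],
    'Unnamed: 1': ['Males', 1.0, 2.0, 'Persons', 3.0, 4.0, 5.0],
})

print(extract_block(raw, 'Persons'))
```

If the label assumption holds, `extract_block(df, 'Persons')` on the real data should return the same rows 19-25 as the slice above.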

## Step 4: Visualize data

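Next, we visualize the extracted table as a tree. The cells below show how to do this with the **treelib** library (as text) and with **pydot** (as an image).

- **treelib**: treelib is a Python library for building and displaying tree data structures.
- **pydot**: pydot is a Python interface to Graphviz's DOT language.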

```python
from treelib import Tree

tree = Tree()

# A node without a parent becomes the root; each node has a display tag and a unique identifier
tree.create_node("2,057,500 people aged 15-64 with disability", "root")
tree.create_node("In the labour force (53.4%)", "In the labour force", parent="root")
tree.create_node("Not in the labour force (46.6%)", "Not in the labour force", parent="root")
tree.create_node("Employed (47.8%)", "employed", parent="In the labour force")
tree.create_node("Unemployed (5.5%)", "unemployed", parent="In the labour force")
tree.create_node("Working full-time (28.3%)", "full-time", parent="employed")
tree.create_node("Working part-time (19.6%)", "part-time", parent="employed")

tree.show()
```

```
2,057,500 people aged 15-64 with disability
├── In the labour force (53.4%)
│   ├── Employed (47.8%)
│   │   ├── Working full-time (28.3%)
│   │   └── Working part-time (19.6%)
│   └── Unemployed (5.5%)
└── Not in the labour force (46.6%)
```
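treelib is convenient, but the labels above are typed by hand. As a dependency-free alternative (a swapped-in technique, not treelib's own code), the sketch below builds the same labels from the extracted proportions and prints the tree with box-drawing characters. The nested-dictionary layout and the `make_label` and `render_tree` helpers are assumptions for illustration.

```python
# Proportions (%) from the extracted table; the total is in thousands of people
total_thousands = 2057.5
pct = {
    'In the labour force': 53.4,
    'Employed': 47.8,
    'Working full-time': 28.3,
    'Working part-time': 19.6,
    'Unemployed': 5.5,
    'Not in the labour force': 46.6,
}


def make_label(name):
    return f"{name} ({pct[name]}%)"


# Each key is a node label; each value holds that node's children (empty dict = leaf)
hierarchy = {
    make_label('In the labour force'): {
        make_label('Employed'): {
            make_label('Working full-time'): {},
            make_label('Working part-time'): {},
        },
        make_label('Unemployed'): {},
    },
    make_label('Not in the labour force'): {},
}


def render_tree(root_label, children):
    """Return the tree as text, one node per line, in the style of treelib's show()."""
    lines = [root_label]

    def walk(nodes, prefix):
        items = list(nodes.items())
        for i, (name, grandchildren) in enumerate(items):
            last = i == len(items) - 1
            lines.append(prefix + ('└── ' if last else '├── ') + name)
            # Children of the last node get blank padding; others continue the vertical bar
            walk(grandchildren, prefix + ('    ' if last else '│   '))

    walk(children, '')
    return '\n'.join(lines)


root = f"{round(total_thousands * 1000):,} people aged 15-64 with disability"
print(render_tree(root, hierarchy))
```

The printed text matches the treelib output above, but the numbers now come from a single dictionary that could be filled from `extracted_df`.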



```python
import pydot

# Employment-status hierarchy of people aged 15-64 with disability, taken from the
# extracted table. Keys are node labels; values hold child nodes (an empty dict marks a leaf).
tree = {
    "2,057,500 people aged 15-64 with disability": {
        "In the labour force (53.4%)": {
            "Employed (47.8%)": {
                "Working full-time (28.3%)": {},
                "Working part-time (19.6%)": {},
            },
            "Unemployed (5.5%)": {},
        },
        "Not in the labour force (46.6%)": {},
    }
}


def walk_dictionaryv2(graph, dictionary, parent_node=None):
    '''
    Recursively add a node for every key of the nested dictionary, and an edge
    from each node to its parent.
    '''
    for k, children in dictionary.items():
        # Prefix the parent's name so that node names stay unique in the graph
        if parent_node is None:
            node_name = str(k)
        else:
            node_name = parent_node.get_name().replace("\"", "") + '_' + str(k)

        # Leaves are drawn as boxes, interim nodes as ellipses
        shape = 'ellipse' if children else 'box'
        node = pydot.Node(node_name, label=str(k), shape=shape)
        # Add the node explicitly so that its label and shape are kept in the output
        graph.add_node(node)

        if parent_node is not None:
            graph.add_edge(pydot.Edge(parent_node, node))

        walk_dictionaryv2(graph, children, node)


def plot_tree(tree, name):
    # Create an undirected graph, fill it from the dictionary and save it as <name>.png
    # (rendering needs the Graphviz "dot" program to be installed)
    graph = pydot.Dot(graph_type='graph')
    walk_dictionaryv2(graph, tree)
    graph.write_png(name + '.png')


plot_tree(tree, 'disability_labour_force_tree')
```

The tree above shows the hierarchy of employment status among people aged 15-64 with disability. Of the estimated 2,057,500 people aged 15-64 with disability, 53.4% were in the labour force and the remaining 46.6% were not. The 53.4% in the labour force splits into 47.8% employed (28.3% full-time and 19.6% part-time) and 5.5% unemployed. All of these percentages are proportions of the 2,057,500 total, not of the labour force.
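Because every percentage in the tree uses the same base, the employed share of the labour force itself is a different number. A short calculation from the extracted estimates (in thousands) makes the two bases explicit. The published components do not sum exactly to their totals (for example, employed plus unemployed is 1,096.9 against a labour force of 1,098.6), so the sketch prints that gap rather than assuming additivity.

```python
# Estimates in thousands, from the extracted "All with disability" rows
estimates = {
    'In the labour force': 1098.6,
    'Employed': 984.2,
    'Unemployed': 112.7,
    'Not in the labour force': 958.5,
    'Total': 2057.5,
}

total = estimates['Total']
labour_force = estimates['In the labour force']

# Share of ALL people aged 15-64 with disability (the base used in the tree)
share_of_total = {k: round(100 * v / total, 1) for k, v in estimates.items()}

# Share of the LABOUR FORCE only (a different base)
employed_of_lf = round(100 * estimates['Employed'] / labour_force, 1)
unemployed_of_lf = round(100 * estimates['Unemployed'] / labour_force, 1)

print(share_of_total)
print(f"Employed as % of labour force: {employed_of_lf}")
print(f"Unemployed as % of labour force: {unemployed_of_lf}")

# Components are not exactly additive; show the gap in thousands
gap = labour_force - (estimates['Employed'] + estimates['Unemployed'])
print(f"Labour force minus (employed + unemployed): {gap:.1f}")
```

The first line reproduces the proportions in the tree; the next two show that, computed from these rounded estimates, about 89.6% of people with disability in the labour force were employed and about 10.3% were unemployed.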

## Conclusion

In this Jupyter notebook, we demonstrated how to download a dataset from GitHub and explore it with Python code. The *"Data tables: Employment supplementary data tables"* from the *"People with disability in Australia"* collection [1] contain 11 data tables on people with disability and their engagement in the labour force. We used Python to extract one table from the collection and visualize the extracted data. From this activity, we arrived at the data insight "**According to the Australian Institute of Health and Welfare (AIHW), 53.4% of working-age (aged 15-64) people with disability participated in the Australian labour force in 2018.**"

## References

[1] AIHW. People with disability in Australia, Data tables: Employment supplementary data tables, 2020, Australian Institute of Health and Welfare, [Dataset] Available: https://www.aihw.gov.au/reports/disability/people-with-disability-in-australia/data. [Accessed: January 4, 2021].