# Household Debt Inequalities



## Source

For this example we're extracting the tables 11 and 12 from an xls dataset dealing with household debt inequalities.

The example highlights using iteration to join multiple tables into a coherent whole.

In [1]:
from typing import List
from tidychef import acquire, preview
from tidychef.selection import XlsSelectable

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/householddebtdataset.xls", tables="Table 11|Table 12")
for table in tables:
    preview(table)

0,1,2,3,4,5,6,7,8
,A,B,C,D,E,F,G,H
1.0,Table 11,,,,,,Back to contents,
2.0,"Individuals with financial liabilities by economic activity: Great Britain, July 2010 to June 2014",,,,,,,
3.0,Great Britain,,,,,,,
4.0,Economic Activity,Percentage with financial liabilities (%),Median value of financial liabilities (£),Median value of individual annual net income (£),Median Individual Debt to Income Ratio,Median value of individual gross financial wealth (£),Unweighted Frequency,Weighted Frequency
5.0,July 2012 to June 2014,,,,,,,
6.0,In Employment,44.0,2900.0,18000.0,0.15,1000.0,8175.0,12748000.0
7.0,Unemployed,41.0,1200.0,3700.0,0.35,,481.0,813000.0
8.0,Economically Inactive,21.0,1000.0,9400.0,0.14,300.0,2909.0,3728000.0
9.0,All Individuals with financial liabilities,35.0,2200.0,15700.0,0.15,800.0,11565.0,17290000.0


0,1,2,3,4,5,6,7,8
,A,B,C,D,E,F,G,H
1.0,Table 12,,,,,,Back to contents,
2.0,"Individuals with financial liabilities by education level: Great Britain, July 2010 to June 2014",,,,,,,
3.0,Great Britain,,,,,,,
4.0,Education Level,Percentage with financial liabilities (%),Median value of financial liabilities (£),Median value of individual annual net income (£),Median Individual Debt to Income Ratio,Median value of individual gross financial wealth (£),Unweighted Frequency,Weighted Frequency
5.0,July 2012 to June 2014,,,,,,,
6.0,Degree level or above,41.0,4600.0,20500.0,0.22,2600.0,3188.0,4844000.0
7.0,Other qualifications,37.0,1900.0,15100.0,0.14,600.0,6638.0,9906000.0
8.0,No qualifications,22.0,900.0,11700.0,0.09,100.0,1332.0,1860000.0
9.0,All Individuals with financial liabilities,35.0,2200.0,15700.0,0.15,800.0,11158.0,16610000.0


From an xls source which can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/householddebtdataset.xls).

## Requirements

- We're going to extract "Period" from the obvious dates in column A.
- We're just going to call the principle field indicated by column A "Category".
- We're going to take "Great Britain" as a constant for a column named "Area".
- We're going to take the headers on row 4 as "Financial Liability"
- As an additional exercise we're going to use a horizontal condition to create a "Unit Of Measure" column to be one of "Pounds Sterling", "Percent", "Ratio" or "Number" depending on the category.
- We're going to prefix "Category" as extracted from table 12 with "Education: " to make the data a little easier to understand.
- We're going to join both tables into a single tidy data output.
- We're going to de-duplicate with a printout of what we've removed - it should be the contents of row 14 as its duplicated on both tables.
- We'll strip trailing ".0"s from the observations (which we'll call "Value" this time).

In [2]:
from typing import Dict, List
from tidychef import acquire, preview
from tidychef.direction import down, right, left
from tidychef.output import Column, TidyData
from tidychef.selection import XlsSelectable

def unit_of_measure(line: Dict[str, str]) -> str:
    """
    Function to define unit of measure based on Financial Liability
    """
    cat = line["Category"]
    if "(%)" in cat:
        return "Percent"
    elif "(£)" in cat:
        return "Pounds Sterling"
    elif "Frequency" in cat:
        return "Number"
    elif "Ratio" in cat:
        return "Ratio"
    else:
        raise Exception(f"Cannot identify unit of measure from: {cat}")

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/householddebtdataset.xls", tables="Table 11|Table 12")

all_tidy_data = []
for table in tables:
    area = table.excel_ref("A").re("Great Britain").assert_one().label_as("Area")
    period = table.excel_ref("A3").fill(down).re(".*[0-9]{4}").assert_len(2).label_as("Period")
    category = area.shift(down).fill(right).label_as("Category")
    observations = category.fill(down).is_not_blank().label_as("Value")
    financial_liability = (observations.fill(left) - observations).label_as("Financial Liability")
    preview(observations, area, period, category, financial_liability)

    tidy_data = TidyData(
        observations,
        Column.constant("Area", area.lone_value()),
        Column(period.finds_observations_closest(down)),
        Column(category.finds_observations_directly(down), apply=lambda x: "Eduction: "+x if table.name == "Table 12" else x),
        Column(financial_liability.finds_observations_directly(right)),
        Column.horizontal_condition("Unit Of Measure", unit_of_measure),
        obs_apply=lambda x: x.replace(".0", "")
    )

    all_tidy_data.append(tidy_data)

final_tidy_data = TidyData.from_tidy_list(all_tidy_data)
final_tidy_data.drop_duplicates(print_duplicates=True)
final_tidy_data.to_csv("household-debt.csv")

0
Value
Area
Period
Category
Financial Liability

0,1,2,3,4,5,6,7,8
,A,B,C,D,E,F,G,H
1.0,Table 11,,,,,,Back to contents,
2.0,"Individuals with financial liabilities by economic activity: Great Britain, July 2010 to June 2014",,,,,,,
3.0,Great Britain,,,,,,,
4.0,Economic Activity,Percentage with financial liabilities (%),Median value of financial liabilities (£),Median value of individual annual net income (£),Median Individual Debt to Income Ratio,Median value of individual gross financial wealth (£),Unweighted Frequency,Weighted Frequency
5.0,July 2012 to June 2014,,,,,,,
6.0,In Employment,44.0,2900.0,18000.0,0.15,1000.0,8175.0,12748000.0
7.0,Unemployed,41.0,1200.0,3700.0,0.35,,481.0,813000.0
8.0,Economically Inactive,21.0,1000.0,9400.0,0.14,300.0,2909.0,3728000.0
9.0,All Individuals with financial liabilities,35.0,2200.0,15700.0,0.15,800.0,11565.0,17290000.0


0
Value
Area
Period
Category
Financial Liability

0,1,2,3,4,5,6,7,8
,A,B,C,D,E,F,G,H
1.0,Table 12,,,,,,Back to contents,
2.0,"Individuals with financial liabilities by education level: Great Britain, July 2010 to June 2014",,,,,,,
3.0,Great Britain,,,,,,,
4.0,Education Level,Percentage with financial liabilities (%),Median value of financial liabilities (£),Median value of individual annual net income (£),Median Individual Debt to Income Ratio,Median value of individual gross financial wealth (£),Unweighted Frequency,Weighted Frequency
5.0,July 2012 to June 2014,,,,,,,
6.0,Degree level or above,41.0,4600.0,20500.0,0.22,2600.0,3188.0,4844000.0
7.0,Other qualifications,37.0,1900.0,15100.0,0.14,600.0,6638.0,9906000.0
8.0,No qualifications,22.0,900.0,11700.0,0.09,100.0,1332.0,1860000.0
9.0,All Individuals with financial liabilities,35.0,2200.0,15700.0,0.15,800.0,11158.0,16610000.0


Removed duplicate instances of the following row(s):
-----------------------------------------------------
35,Great Britain,July 2012 to June 2014,Eduction: Percentage with financial liabilities (%),All Individuals with financial liabilities,Percent
38,Great Britain,July 2010 to June 2012,Eduction: Percentage with financial liabilities (%),All Individuals with financial liabilities,Percent
2200,Great Britain,July 2012 to June 2014,Eduction: Median value of financial liabilities (£),All Individuals with financial liabilities,Pounds Sterling
2300,Great Britain,July 2010 to June 2012,Eduction: Median value of financial liabilities (£),All Individuals with financial liabilities,Pounds Sterling
15700,Great Britain,July 2012 to June 2014,Eduction: Median value of individual annual net income (£),All Individuals with financial liabilities,Pounds Sterling
15200,Great Britain,July 2010 to June 2012,Eduction: Median value of individual annual net income (£),All Individuals with financial liabili

# Outputs

The tidy data can be [downloaded here](./household-debt.csv) and a full inline preview of the tidydata generated is shown below for those people who'd prefer to scroll.

In [3]:
print(final_tidy_data)

0,1,2,3,4,5
Value,Area,Period,Category,Financial Liability,Unit Of Measure
44,Great Britain,July 2012 to June 2014,Eduction: Percentage with financial liabilities (%),In Employment,Percent
41,Great Britain,July 2012 to June 2014,Eduction: Percentage with financial liabilities (%),Unemployed,Percent
21,Great Britain,July 2012 to June 2014,Eduction: Percentage with financial liabilities (%),Economically Inactive,Percent
35,Great Britain,July 2012 to June 2014,Eduction: Percentage with financial liabilities (%),All Individuals with financial liabilities,Percent
46,Great Britain,July 2010 to June 2012,Eduction: Percentage with financial liabilities (%),In Employment,Percent
44,Great Britain,July 2010 to June 2012,Eduction: Percentage with financial liabilities (%),Unemployed,Percent
23,Great Britain,July 2010 to June 2012,Eduction: Percentage with financial liabilities (%),Economically Inactive,Percent
38,Great Britain,July 2010 to June 2012,Eduction: Percentage with financial liabilities (%),All Individuals with financial liabilities,Percent
2900,Great Britain,July 2012 to June 2014,Eduction: Median value of financial liabilities (£),In Employment,Pounds Sterling



