# Monthly GDP Tables: GVA

## Source

For this example we're extracting the table "GVA", shown below (note: the preview is cropped at row 16 to keep it practical):

In [None]:
from typing import List
from datachef import acquire, preview
from datachef.selection import XlsSelectable

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls")
preview(tables[3], bounded="A1:Z16")

The source is an xls file which can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls).

One interesting thing to note here is that the producer has intermingled CDID identifiers with the primary observation values (see rows 7 and 13 in the preview above), so we need to be sure to account for this.

## Requirements

- We'll take cell C5 and cells directly right as the column "Identifier"
- We'll take cell B8 and cells directly downwards (excluding any "CDID" cells) as the column "Category"
- We'll take all cells to the right of a column B value of "CDID" as the column "CDID"
- We'll take cell A8 and cells directly downwards as "Time Period"
- We're going to ignore the "Weight" values for this example.
- We're also going to ignore the bracketed text and just remove it for our purposes here.
- We'll take the observations as the principal table values minus the CDID headings that are intermingled with them. We'll use a column name of "Value" for them.
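
The last requirement is essentially a set difference: take every non-blank cell in the body of the table, then subtract the cells that belong to a CDID row. The sketch below illustrates that idea in plain Python on a made-up miniature grid. The cell values, the row numbers and the `grid` structure are all assumptions for illustration; none of it uses or depends on the datachef API.

```python
# Hypothetical miniature sheet: keys are (row number, column letter).
# All values here are invented purely to illustrate the selection logic.
grid = {
    (7, "B"): "CDID", (7, "C"): "AAA1", (7, "D"): "AAA2",
    (8, "B"): "Category one [note]", (8, "C"): "10.5", (8, "D"): "11.0",
    (9, "B"): "CDID", (9, "C"): "BBB1", (9, "D"): "BBB2",
    (10, "B"): "Category two", (10, "C"): "20.1", (10, "D"): "",
}

# Rows whose column B cell holds the literal "CDID".
cdid_rows = {row for (row, col), value in grid.items() if col == "B" and value == "CDID"}

# Every cell to the right of column B on those rows is a CDID heading.
# (Comparing single column letters as strings is fine for this tiny example.)
cdid_cells = {(row, col) for (row, col) in grid if row in cdid_rows and col > "B"}

# Candidate observations: non-blank cells from C8 rightwards and downwards.
candidates = {
    (row, col)
    for (row, col), value in grid.items()
    if row >= 8 and col >= "C" and value.strip() != ""
}

# Observations are the candidates minus the intermingled CDID headings.
observations = candidates - cdid_cells
print(sorted(observations))
```

The blank cell at D10 and the CDID headings on row 9 are both excluded, leaving only the genuine values.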

In [None]:
from typing import List

from datachef import acquire, preview
from datachef.direction import up, down, left, right
from datachef.output import TidyData, Column
from datachef.selection import XlsSelectable


tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls")
table = tables[3]

# Sanity checks before we start selecting
assert table.name == "GVA", "GVA table has moved position"
anchor = table.excel_ref("B7").label_as("Anchor Cell")
assert anchor.lone_value() == "CDID", "Anchor has moved position"

cdid = anchor.expand(down).filter(lambda x: x.value == "CDID").fill(right).label_as("CDID")
identifier = anchor.shift(up(2)).fill(right).label_as("Identifier")
category = anchor.fill(down).filter(lambda x: x.value != "CDID").label_as("Category")
time_period = category.shift(left).label_as("Time Period")

observations = (anchor.shift(right).shift(down).expand(right).expand(down).is_not_blank() - cdid).label_as("Value")

# Create a bounded preview inline but also write the full preview to path
preview(anchor, identifier, category, cdid, time_period, observations, bounded="A4:Z16")
preview(anchor, identifier, category, cdid, time_period, observations, path="monthly-gdp-gva-table.html")

tidy_data = TidyData(
    observations,
    Column(cdid.finds_observations_directly(down)),
    Column(identifier.finds_observations_directly(down)),
    Column(category.finds_observations_directly(right), apply=lambda x: x.split("[")[0].strip()),
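    # Drop a trailing ".0" suffix only; str.rstrip(".0") would strip any run of "." and "0" characters (e.g. "2020.0" -> "202")
    Column(time_period.finds_observations_directly(right), apply=lambda x: x[:-2] if x.endswith(".0") else x)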
)
tidy_data.to_csv("monthly-gdp-gva-table.csv")
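
The cleaning functions passed via `apply` are ordinary string functions, so they can be tried out on their own before being wired into a `Column`. The sketch below does that with two hypothetical helpers (`strip_bracketed` and `drop_point_zero` are names invented here, and the sample strings are made up). It also shows why care is needed with `str.rstrip`, which removes a set of characters rather than a suffix.

```python
def strip_bracketed(text: str) -> str:
    """Drop everything from the first '[' onwards, then trim whitespace."""
    return text.split("[")[0].strip()


def drop_point_zero(text: str) -> str:
    """Remove a single trailing '.0' suffix, leaving other text untouched."""
    return text[:-2] if text.endswith(".0") else text


# Hypothetical sample values, not taken from the real table.
print(strip_bracketed("Total GVA [note 1]"))  # Total GVA
print(drop_point_zero("2020.0"))              # 2020
print(drop_point_zero("2021"))                # 2021

# rstrip treats ".0" as the character set {'.', '0'}, so it over-trims:
print("2020.0".rstrip(".0"))                  # 202
```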

## Outputs

The full preview can be [downloaded here](./monthly-gdp-gva-table.html).

The tidy data can be [downloaded here](./monthly-gdp-gva-table.csv), and a full inline preview of the tidy data generated is shown below for those who'd prefer to scroll.

In [None]:
print(tidy_data)