# Service Industry

## Source

For this example we're extracting the table "TOPSI9", shown below (note: the preview is cropped for practicality):

```python
from typing import List
from datachef import acquire, preview
from datachef.selection import XlsSelectable

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/service-industry.xls")
preview(tables[10], bounded="A1:Q22")
```


The source is an xls file, which can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/service-industry.xls).

## Requirements

- We'll take the headers in rows 4 and 5 as "Product".
- We'll take "Year" from column A and clean it up (keeping only the first four characters).
- We'll take "Quarter" from column B.
- We'll take row 9 as "CDID" (as I happen to know that's the name of this particular type of identifier).
- We'll call the observations column "Value".
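The column rules above boil down to a couple of small string transforms and a pattern check. The sketch below restates them in plain Python so the logic can be tried in isolation. `clean_year`, `clean_quarter` and `looks_like_cdid` are hypothetical helpers (not datachef functions), the sample values are invented rather than read from TOPSI9, and `str.isdigit` is only a rough stand-in for datachef's `against.is_numeric`.

```python
import re

# Hypothetical helpers mirroring the per-column rules; none of these are
# part of datachef.
CDID_PATTERN = re.compile(r"^[A-Z]{3}\d$")  # three capitals, then one digit


def clean_year(value: str) -> str:
    """Keep only the first four characters, as the Year apply=lambda x: x[:4] does."""
    return value[:4]


def clean_quarter(value: str) -> str:
    """Label blank quarter cells as "All", otherwise keep the value."""
    return "All" if value == "" else value


def looks_like_cdid(value: str) -> bool:
    """True if the value matches the CDID pattern used in the recipe."""
    return CDID_PATTERN.match(value) is not None


# Invented sample values, not read from the TOPSI9 sheet.
year = clean_year("2019 (p)")
print(year, year.isdigit())    # 2019 True  (isdigit: rough stand-in for against.is_numeric)
print(clean_quarter(""))       # All
print(clean_quarter("Q2"))     # Q2
print(looks_like_cdid("ABC1"), looks_like_cdid("AB12"))  # True False
```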

This is a pretty standard recipe with a few things to note:

- using `extrude` to handle badly merged cells (rows 4, 5 and 6).
- dealing with an extensive footer by defining it early and subtracting it from the selections.
- using a `shift` on the already selected year to add, in a targeted manner, the cells that structurally should be "All" for quarters.
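To make the footer and `shift` tricks concrete, here is a minimal plain-Python model of the selection arithmetic, assuming a selection can be thought of as a set of `(column, row)` cells. This is not how datachef represents selections internally, and the sheet layout (years in column 1, quarters in column 2, a footer starting at an "Average" row) is hypothetical. Subtracting the footer is just a set difference, and shifting the year selection one column right picks up the blank quarter cells that sit beside each year.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (column, row), 1-based like a spreadsheet

# Hypothetical sheet with blank cells omitted: years in column 1, quarters
# in column 2, and a footer that begins with "Average" on row 18.
sheet: Dict[Cell, str] = {
    (1, 10): "2019", (2, 11): "Q1", (2, 12): "Q2", (2, 13): "Q3",
    (1, 14): "2020", (2, 15): "Q1", (2, 16): "Q2", (2, 17): "Q3",
    (1, 18): "Average", (1, 19): "Footnote", (2, 19): "More footnote",
}
MAX_COL, LAST_ROW = 2, 19


def expand_down(cells: Set[Cell]) -> Set[Cell]:
    """Each cell plus every cell below it in the same column."""
    return {(col, r) for col, row in cells for r in range(row, LAST_ROW + 1)}


def expand_right(cells: Set[Cell]) -> Set[Cell]:
    """Each cell plus every cell to its right on the same row."""
    return {(c, row) for col, row in cells for c in range(col, MAX_COL + 1)}


def shift_right(cells: Set[Cell]) -> Set[Cell]:
    """Move every cell one column to the right."""
    return {(col + 1, row) for col, row in cells}


def is_not_blank(cells: Set[Cell]) -> Set[Cell]:
    """Keep only cells that hold a non-empty value."""
    return {cell for cell in cells if sheet.get(cell, "") != ""}


# Define the footer once: find "Average" in column 1, then expand right and down.
average = {cell for cell, v in sheet.items() if cell[0] == 1 and "Average" in v}
footer = expand_down(expand_right(average))

# Subtracting the footer keeps "Average" and the footnotes out of the years.
year = is_not_blank(expand_down({(1, 10)})) - footer

# Quarters plus the blank cells beside each year (later labelled "All").
quarter = (is_not_blank(expand_down({(2, 10)})) | shift_right(year)) - footer

print(sorted(year))     # [(1, 10), (1, 14)]
print(sorted(quarter))  # column 2, rows 10 to 17
```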

```python
from typing import List
from datachef import acquire, against, preview
from datachef.direction import up, down, left, right
from datachef.output import Column, TidyData
from datachef.selection import XlsSelectable

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xls/service-industry.xls")

# Sanity check that we have the table we expect
table = tables[10]
wanted_table = "TOPSI9"
assert table.name == wanted_table, f'Got table {table.name}, expected {wanted_table}'

# Define the footer once ("Average" in column A, expanded right then down)
# so it can be subtracted from the other selections
footer = table.excel_ref('A').re("Average").expand(right).expand(down)

anchor = table.re(".*ships and boats.*").assert_one().shift(left).label_as("Anchor Cell")
year = (anchor.shift(left(3)).expand(down).is_not_blank() - footer).label_as("Year")

# Quarters, plus the blank cells beside each year (these become "All")
quarter = (
    (anchor.shift(left(2)).expand(down).is_not_blank() | year.shift(right))
    - footer
    ).label_as("Quarter")

# Raw string so "\d" reaches the regex engine without an invalid-escape warning
cdid = table.re(r"^[A-Z]{3}\d$").assert_single_row().label_as("CDID")

# extrude handles the badly merged header cells (rows 4 to 6)
product = anchor.extrude(up).extrude(down).expand(right).is_not_blank().label_as("Product")
observations = (cdid.waffle(down, quarter) - footer).label_as("Value")

preview(anchor, observations, product, year, quarter, cdid)

tidy_data = TidyData(
    observations,
    Column(product.finds_observations_directly(down)),
    # Keep the first four characters of each year and check they are numeric
    Column(year.finds_observations_closest(down), apply=lambda x: x[:4], validate=against.is_numeric),
    # Blank quarter cells (added via year.shift(right)) are labelled "All"
    Column(quarter.finds_observations_directly(right), apply=lambda x: "All" if x == "" else x),
    Column(cdid.finds_observations_directly(down))
)

tidy_data.to_csv("service-industry.csv")
```
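The two lookups doing the heavy lifting above are `waffle` and the `closest`/`directly` label finders. As a rough mental model (our reading of the recipe, not datachef's implementation), `waffle` yields the cells where each CDID's column meets each quarter's row, a *direct* lookup takes the label in the same column or row, and a *closest* lookup takes the nearest label above the observation. The sketch below models that in plain Python on invented cell positions and values; `cdid_for` and `year_for` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (column, row), 1-based like a spreadsheet

# Invented positions and values, not read from the TOPSI9 sheet.
cdid_cells: Dict[Cell, str] = {(3, 9): "ABC1", (4, 9): "ABC2"}
year_cells: Dict[Cell, str] = {(1, 10): "2019", (1, 14): "2020"}
quarter_rows: List[int] = list(range(10, 18))

# waffle (as modelled here): every CDID column crossed with every quarter row.
observations: List[Cell] = [
    (col, row) for (col, _) in cdid_cells for row in quarter_rows
]


def cdid_for(obs: Cell) -> str:
    """Direct lookup: the CDID label sitting in the observation's column."""
    return next(v for (col, _), v in cdid_cells.items() if col == obs[0])


def year_for(obs: Cell) -> str:
    """Closest lookup: the year label on the nearest row at or above obs."""
    candidates = [(row, v) for (_, row), v in year_cells.items() if row <= obs[1]]
    return max(candidates)[1]


print(len(observations))                     # 16
print(year_for((3, 12)), cdid_for((4, 12)))  # 2019 ABC2
print(year_for((3, 15)))                     # 2020
```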

## Outputs

The tidy data can be [downloaded here](./service-industry.csv), and a full inline preview of the generated tidy data is shown below for those who'd prefer to scroll.

```python
print(tidy_data)
```