# Service Industry

## Source

For this example we're extracting the table "TOPSI9" as shown below (note: the preview is cropped for practicality):

In [1]:
from tidychef import acquire, preview
from tidychef.selection import XlsSelectable

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/service-industry.xls", tables="TOPSI9")
preview(table, bounded="A1:Q22")


,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q
1,TOPSI9,,,UK Production Turnover,,,,,,,,,,,,,
2,,,,Turnover in Production and Services Industries,,,,,,,,,,,,,
3,,,,"Current price, not seasonally adjusted",,,,,,,,,,,,,£ million
4,Back to Contents,,,,,,,Manufacture of air and spacecraft and related machinery,,,,,,,,,
5,,,,,Building of ships and boats,,,,,,Manufacture of other transport equipment,,,,,,
6,,,,,,,,,,,,,,Manufacture of furniture,,,Other manufacturing
7,,,,,,,,,,,,,,,,,
8,,,,,30.1,,,30.3,,,30.2/4/9 (30OTHER),,,31,,,32
9,,,,,JQR4,,,JQS8,,,JQU4,,,JQV2,,,JQV5


The source is an xls file which can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/service-industry.xls).

## Requirements

- We'll take the headers in rows 4, 5 and 6 as "Product".
- We'll take "Year" from column A and clean it up.
- We'll take "Quarter" from column B.
- We'll take row 9 as "CDID" (as I happen to know that's the name of this particular type of identifier).
- We'll call the observations column "Value".
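
The "clean it up" step for Year, and the treatment of blank Quarter cells, both come down to small per-cell functions. The sketch below is plain Python rather than tidychef: the two function bodies mirror the `apply` lambdas used in the recipe further down, while the function names and the raw cell values are hypothetical, since the raw contents of columns A and B are not shown in the cropped preview.

```python
def clean_year(cell_value: str) -> str:
    """Keep only the first four characters of the cell, i.e. the year."""
    return cell_value[:4]


def fill_quarter(cell_value: str) -> str:
    """Treat a blank quarter cell as covering the whole year."""
    return "All" if cell_value == "" else cell_value


# Hypothetical raw cell values, for illustration only.
raw_years = ["2012", "2013 provisional"]
raw_quarters = ["", "Q1"]

cleaned_years = [clean_year(value) for value in raw_years]
filled_quarters = [fill_quarter(value) for value in raw_quarters]

print(cleaned_years)
print(filled_quarters)
```

Slicing to four characters only works if every Year cell starts with the year, which is why the recipe also validates the cleaned value as numeric.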

This is a pretty standard recipe with a few things to note:

- using `extrude` to handle badly merged cells (rows 4, 5 and 6).
- dealing with an extensive footer by defining it early and just removing it from the selections.
- using a `shift` on the already selected year to add, in a targeted manner, the cells that structurally should be "All" for quarters.
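
The footer and `shift` points are essentially set arithmetic on cell positions. Here is a minimal plain-Python sketch of that idea. It does not use tidychef and is not how tidychef is implemented: the `(column, row)` coordinates are invented for illustration, and `shift_right` is a hypothetical helper standing in for `shift(right)`.

```python
from typing import Set, Tuple

# A cell is modelled as (column_index, row_number); column 0 is A, column 1 is B.
Cell = Tuple[int, int]


def shift_right(cells: Set[Cell]) -> Set[Cell]:
    """Return the cell one column to the right of each input cell."""
    return {(column + 1, row) for column, row in cells}


# Hypothetical layout: rows 10 and 11 are annual figures (year label, no quarter
# label), rows 12 to 15 are quarterly figures, row 16 is the start of the footer.
year_cells = {(0, 10), (0, 11), (0, 12)}
quarter_cells = {(1, 12), (1, 13), (1, 14), (1, 15), (1, 16)}
footer_cells = {(0, 16), (1, 16)}

# Non-blank quarter cells, plus the cell beside every year, minus the footer.
quarter_selection = (quarter_cells | shift_right(year_cells)) - footer_cells

# The cells that were only picked up because of the shift: the blank "All" cells.
added_by_shift = shift_right(year_cells) - quarter_cells
print(sorted(added_by_shift))
```

Because the extra cells are derived from the year selection, only rows that actually carry a year gain a quarter cell, rather than every blank cell in column B.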

In [4]:
from typing import List
from tidychef import acquire, against, preview
from tidychef.direction import up, down, left, right
from tidychef.output import Column, TidyData
from tidychef.selection import XlsSelectable

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/service-industry.xls", tables="TOPSI9")
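# Define the footer early: from the column A cell(s) matching "Average",
# everything to the right and below. It is subtracted from later selections.
footer = table.excel_ref('A').re("Average").expand(right).expand(down)

# A single cell just left of the "ships and boats" header, used as a reference point.
anchor = table.re(".*ships and boats.*").assert_one().shift(left).label_as("Anchor Cell")
# Column A below the anchor row: the non-blank cells that are not footer.
year = (anchor.shift(left(3)).expand(down).is_not_blank() - footer).label_as("Year")
# Column B non-blank cells, plus the cell to the right of each year cell,
# so rows with a blank quarter still get a quarter cell.
quarter = (
    (anchor.shift(left(2)).expand(down).is_not_blank() | year.shift(right))
    - footer
    ).label_as("Quarter")
# Three capital letters then one digit (e.g. JQR4), all expected on one row.
cdid = table.re(r"^[A-Z]{3}\d$").assert_single_row().label_as("CDID")
# Extend the anchor up and down over the split header rows, then sweep right.
product = anchor.extrude(up).extrude(down).expand(right).is_not_blank().label_as("Product")
# Cells below a CDID that line up with a quarter cell, minus the footer.
observations = (cdid.waffle(down, quarter) - footer).label_as("Value")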

preview(anchor, observations, product, year, quarter, cdid)

tidy_data = TidyData(
    observations,
    Column(product.finds_observations_directly(down)),
    Column(year.finds_observations_closest(down), apply=lambda x: x[:4], validate=against.is_numeric),
    Column(quarter.finds_observations_directly(right), apply=lambda x: "All" if x == "" else x),
    Column(cdid.finds_observations_directly(down))
)

tidy_data.to_csv("service-industry.csv")

Anchor Cell
Value
Product
Year
Quarter
CDID

,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U
1,TOPSI9,,,UK Production Turnover,,,,,,,,,,,,,,,,,
2,,,,Turnover in Production and Services Industries,,,,,,,,,,,,,,,,,
3,,,,"Current price, not seasonally adjusted",,,,,,,,,,,,,£ million,,,,
4,Back to Contents,,,,,,,Manufacture of air and spacecraft and related machinery,,,,,,,,,,,,,
5,,,,,Building of ships and boats,,,,,,Manufacture of other transport equipment,,,,,,,,,,
6,,,,,,,,,,,,,,Manufacture of furniture,,,Other manufacturing,,,,
7,,,,,,,,,,,,,,,,,,,,,
8,,,,,30.1,,,30.3,,,30.2/4/9 (30OTHER),,,31,,,32,,,,
9,,,,,JQR4,,,JQS8,,,JQU4,,,JQV2,,,JQV5,,,,
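
The CDID selection depends on the pattern `^[A-Z]{3}\d$` picking out row 9 and nothing else. As a rough sanity check, the sketch below runs that pattern through Python's standard `re` module against the row 9 values and a few other cells visible in the preview. It assumes tidychef applies the pattern with ordinary Python regex semantics. The `^` and `$` anchors mean the whole cell value has to match, so the result should not depend on whether a match or a search is used.

```python
import re

# The same pattern that is passed to table.re() in the recipe.
CDID_PATTERN = re.compile(r"^[A-Z]{3}\d$")

# Row 9 values from the preview, plus some other cells from the same sheet.
row_9 = ["JQR4", "JQS8", "JQU4", "JQV2", "JQV5"]
other_cells = ["TOPSI9", "30.1", "30.2/4/9 (30OTHER)", "£ million", "Back to Contents"]

matches = [value for value in row_9 + other_cells if CDID_PATTERN.match(value)]
print(matches)
```

Note that "TOPSI9" is rejected because its fourth character is a letter, which is what keeps cell A1 out of the CDID selection.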


## Outputs

The tidy data can be [downloaded here](./service-industry.csv) and a full inline preview of the tidy data generated is shown below for those people who'd prefer to scroll.

In [3]:
print(tidy_data)

Value,Product,Year,Quarter,CDID
4787.6,Building of ships and boats,2012,All,JQR4
21632.8,Manufacture of air and spacecraft and related machinery,2012,All,JQS8
2162.2,Manufacture of other transport equipment,2012,All,JQU4
6722.5,Manufacture of furniture,2012,All,JQV2
8784.8,Other manufacturing,2012,All,JQV5
4484.8,Building of ships and boats,2013,All,JQR4
24556.8,Manufacture of air and spacecraft and related machinery,2013,All,JQS8
2487.0,Manufacture of other transport equipment,2013,All,JQU4
6821.2,Manufacture of furniture,2013,All,JQV2
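
A quick way to sanity check output like this is to read it back with the standard library. The sketch below is one possible check and not part of the recipe: it parses the rows shown above (pasted in as text so the example is self-contained) and confirms that every Value is numeric and every Year is four digits.

```python
import csv
import io
from collections import Counter

# The rows shown above, pasted in as CSV text.
sample = """\
Value,Product,Year,Quarter,CDID
4787.6,Building of ships and boats,2012,All,JQR4
21632.8,Manufacture of air and spacecraft and related machinery,2012,All,JQS8
2162.2,Manufacture of other transport equipment,2012,All,JQU4
6722.5,Manufacture of furniture,2012,All,JQV2
8784.8,Other manufacturing,2012,All,JQV5
4484.8,Building of ships and boats,2013,All,JQR4
24556.8,Manufacture of air and spacecraft and related machinery,2013,All,JQS8
2487.0,Manufacture of other transport equipment,2013,All,JQU4
6821.2,Manufacture of furniture,2013,All,JQV2
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# float() raises ValueError if any Value cell is not numeric.
values = [float(row["Value"]) for row in rows]

# Every Year should be exactly four digits after cleaning.
years_ok = all(len(row["Year"]) == 4 and row["Year"].isdigit() for row in rows)

# How many of the pasted rows fall in each year.
rows_per_year = Counter(row["Year"] for row in rows)
print(years_ok, dict(rows_per_year))
```

The same checks could be run against the downloaded `service-industry.csv` by replacing the `io.StringIO(sample)` argument with an open file handle.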



