# Output In Construction: Table 1a

## Source

For this example we're extracting table "1a", shown below (the preview is cropped to row 13 for practicality):

In [4]:
from typing import List
from datachef import acquire, preview
from datachef.selection import XlsxSelectable

tables: List[XlsxSelectable] = acquire.xlsx.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xlsx/ons-oic.xlsx")
preview(tables[3], bounded="A1:O13")

| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | | | | | | | | | | | | | | |
| 2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | | | | | | | | | | | | | | |
| 3 | Source: Construction Output and Employment from the Office for National Statistics | | | | | | | | | | | | | | |
| 4 | 2019=100 | | | | | | | | | | | | | | |
| 5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work | All new work | Public housing R&M | Private housing R&M | Total housing R&M | Non housing R&M | All R&M | All work |
| 6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A | MV3B | MV3C | MV3D | MV3E | MV3F | MV3G | MV3H | MV3I |
| 7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 | 84.3 | 63.5 | 124.1 | 93 | 101.8 | 79.2 | 89.3 | 72 |
| 8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 | 91.4 | 65.2 | 116 | 94.9 | 100.3 | 80.1 | 89.1 | 73.1 |
| 9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 | 102.3 | 67.2 | 111.2 | 93.7 | 97.9 | 79.6 | 87.8 | 74 |


The source xlsx file can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xlsx/ons-oic.xlsx).

## Requirements

- We'll take "Time Period" from the left-hand column
- We'll call the row 5 headers "Housing"
- We'll call the row 6 codes "Identifier"
- We'll call the observations column "Value"
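
To make the target shape concrete, here is a minimal plain-Python sketch of the reshaping these requirements describe. It uses a small excerpt (rows 5 to 8, columns A to D) copied from the preview above. It does not use datachef and is not how datachef works internally; the `grid` list and the `tidy_rows` helper are hypothetical names made up for illustration.

```python
from typing import Dict, List

# Excerpt of rows 5-8, columns A-D, copied from the preview of the source table.
grid: List[List[str]] = [
    ["Time period", "Public new housing", "Private new housing", "Total new housing"],
    ["Dataset identifier code", "MV36", "MV37", "MVL7"],
    ["1997", "30.8", "44.8", "42.6"],
    ["1998", "24.9", "45.3", "42"],
]


def tidy_rows(cells: List[List[str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: emit one output row per observation cell."""
    housing_row, identifier_row, *data_rows = cells
    rows: List[Dict[str, str]] = []
    for data_row in data_rows:
        time_period = data_row[0]  # the left-hand column
        for col in range(1, len(data_row)):
            rows.append(
                {
                    "Value": data_row[col],
                    "Identifier": identifier_row[col],  # row 6, same column
                    "Housing": housing_row[col],  # row 5, same column
                    "Time Period": time_period,
                }
            )
    return rows


for row in tidy_rows(grid):
    print(row)
```

Each observation ends up on its own row alongside the three labels that describe it, which is the shape the datachef recipe below is aiming for.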

In [5]:
from typing import List

from datachef import acquire, preview
from datachef.direction import down, right
from datachef.output import TidyData, Column
from datachef.selection import XlsxSelectable

tables: List[XlsxSelectable] = acquire.xlsx.http("https://raw.githubusercontent.com/mikeAdamss/datachef/main/tests/fixtures/xlsx/ons-oic.xlsx")
table = tables[3]

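# Confirm we have the right sheet, then anchor on the "Time period" header cell (A5)
assert table.name == "Table 1a"
anchor = table.excel_ref('A').re("Time period").assert_one().label_as("Anchor Cell")

# Observations: the non-blank cells from B7 rightwards and downwards
observations = anchor.shift(right).shift(down(2)).expand(right).expand(down).is_not_blank().label_as("Value")
# Identifier: the dataset codes on row 6, to the right of column A
identifier = anchor.shift(down).fill(right).label_as("Identifier")
# Housing: the row 5 headers to the right of the anchor
housing = anchor.fill(right).label_as("Housing")
# Time period: column A, from row 7 downwards
time_period = anchor.shift(down).fill(down).label_as("Time Period")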

# Create a bounded preview inline but also write the full preview to path
preview(anchor, observations, identifier, housing, time_period, bounded="A3:O13")
preview(anchor, observations, identifier, housing, time_period, path="oic-1a-table.html")

tidy_data = TidyData(
    observations,
    Column(identifier.finds_observations_directly(down)),
    Column(housing.finds_observations_directly(down)),
    Column(time_period.finds_observations_directly(right)),
)

tidy_data.to_csv("oic-1a-table.csv")

Selection key: Anchor Cell, Value, Identifier, Housing, Time Period

| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Source: Construction Output and Employment from the Office for National Statistics | | | | | | | | | | | | | | |
| 4 | 2019=100 | | | | | | | | | | | | | | |
| 5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work | All new work | Public housing R&M | Private housing R&M | Total housing R&M | Non housing R&M | All R&M | All work |
| 6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A | MV3B | MV3C | MV3D | MV3E | MV3F | MV3G | MV3H | MV3I |
| 7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 | 84.3 | 63.5 | 124.1 | 93 | 101.8 | 79.2 | 89.3 | 72 |
| 8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 | 91.4 | 65.2 | 116 | 94.9 | 100.3 | 80.1 | 89.1 | 73.1 |
| 9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 | 102.3 | 67.2 | 111.2 | 93.7 | 97.9 | 79.6 | 87.8 | 74 |
| 10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 | 103.1 | 67.3 | 107.6 | 94.1 | 96.9 | 83.8 | 89.6 | 74.7 |
| 11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 | 102.3 | 67.2 | 101.8 | 98.3 | 97.6 | 91.5 | 94 | 76 |


## Outputs

The full preview can be [downloaded here](./oic-1a-table.html).

The tidy data can be [downloaded here](./oic-1a-table.csv). A full inline preview of the generated tidy data is shown below for those who'd prefer to scroll.

In [6]:
print(tidy_data)

| Value | Identifier | Housing | Time Period |
|---|---|---|---|
| 30.8 | MV36 | Public new housing | 1997 |
| 44.8 | MV37 | Private new housing | 1997 |
| 42.6 | MVL7 | Total new housing | 1997 |
| 61.2 | MV38 | Infrastructure new work | 1997 |
| 57.6 | MV39 | Public other new work | 1997 |
| 152.1 | MV3A | Private industrial new work | 1997 |
| 84.3 | MV3B | Private commercial new work | 1997 |
| 63.5 | MV3C | All new work | 1997 |
| 124.1 | MV3D | Public housing R&M | 1997 |
| 93.0 | MV3E | Private housing R&M | 1997 |
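
If you'd rather check the output than scroll through it, a rough sanity check along the following lines may help. This is a sketch using only the standard library, run here against the first three rows of the output above held in an in-memory string. The `sample_csv` name is made up for illustration, and the check assumes the written CSV carries the same four column headers as the preview.

```python
import csv
import io
from collections import defaultdict

# First three rows of the tidy data shown above, as CSV text (illustrative sample).
sample_csv = """Value,Identifier,Housing,Time Period
30.8,MV36,Public new housing,1997
44.8,MV37,Private new housing,1997
42.6,MVL7,Total new housing,1997
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Each identifier code should pair with exactly one housing label.
labels = defaultdict(set)
for row in rows:
    labels[row["Identifier"]].add(row["Housing"])
    float(row["Value"])  # raises ValueError if an observation is not numeric

inconsistent = {code: names for code, names in labels.items() if len(names) != 1}
print(len(rows), "rows;", len(inconsistent), "inconsistent identifiers")
```

To run the same check against the real file, swap `io.StringIO(sample_csv)` for `open("oic-1a-table.csv", newline="")`.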



