# Monthly GDP Tables: GVA

## Source

For this example we're extracting the table "GVA", shown below (the preview is cropped to row 16 for practicality):

In [1]:
from tidychef import acquire, preview
from tidychef.selection import XlsSelectable

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls", tables="GVA")
preview(table, bounded="A1:Z16")

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
1.0,"GVA - Gross Value Added [note 1] & Sections A-T [note 3], [note 4], [note 5]",,,,,,,,,,,,,,,,,,,,,,,,,
2.0,This worksheet contains one table. Some cells refer to notes which can be found on the notes worksheet.,,,,,,,,,,,,,,,,,,,,,,,,,
3.0,Link to return to cover page,,,,,,,,,,,,,,,,,,,,,,,,,
4.0,"Source: GDP monthly estimate, Office for National Statistics",,,,,,,,,,,,,,,,,,,,,,,,,
5.0,Time Period,Category,Total GVA at basic prices (A - T),"Agriculture, forestry and fishing (A)",Total production industries (B - E),Mining and Quarrying (B),Manufacturing (C),"Electricity, gas, steam and air (D)","Water supply, sewerage etc (E)",Construction (F) [note 6],Total service industries (G-T),Wholesale and retail: repair of motor vehicles and motorcycles (G),Transport and storage (H),Accommodation and food service activites (I),Information and communication (J),Financial and insurance activities (K),Real estate activites (L),"Professional, scientific and technical activities (M)",Administrative and support service activities (N),Public administration and defence (O),Education (P),Human health and social work activities (Q),"Arts, entertainment and recreation (R)",Other service activities (S),"Activities of households as employers, undifferentiated goods and services (T)",
6.0,2019.0,Weight,1000.0,7.0,135.0,11.0,97.0,15.0,12.0,62.0,796.0,104.0,40.0,30.0,63.0,82.0,132.0,73.0,51.0,49.0,60.0,77.0,15.0,17.0,2.0,
7.0,[Not applicable],CDID,YBFR,L2KL,L2KQ,L2KR,L2KX,L2MW,L2N2,L2N8,L2NC,L2NE,L2NI,L2NQ,L2NT,L2O6,L2OC,L2OI,L2OX,L2P8,L2PA,L2PC,L2PJ,L2PP,L2PT,
8.0,2018.0,Annual Chained Volume Index [note 2],98.3,85.6,97.7,97.9,98.9,87.7,100.3,99.0,98.5,100.4,98.2,96.6,91.6,102.6,98.4,100.9,99.6,95.0,94.0,101.6,98.3,101.8,94.1,
9.0,2019.0,Annual Chained Volume Index [note 2],100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,


The data comes from an xls source, which can be [downloaded here](https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls).

One interesting thing to note here is that the producer has intermingled CDID identifiers with the primary observation values (see rows 7 and 13 above), so we need to be sure to account for this.
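To make the problem concrete, here is a minimal plain-Python sketch (deliberately not using tidychef) of what "intermingled" means. The `rows` dictionary is a toy copy of columns B to E of rows 7 to 9 from the preview, and `is_cdid_row` is a hypothetical helper named just for this illustration.

```python
# Toy copy of rows 7-9, columns B-E, taken from the preview above.
rows = {
    7: ["CDID", "YBFR", "L2KL", "L2KQ"],
    8: ["Annual Chained Volume Index [note 2]", "98.3", "85.6", "97.7"],
    9: ["Annual Chained Volume Index [note 2]", "100.0", "100.0", "100.0"],
}


def is_cdid_row(cells):
    """A row holds CDID codes when its column B value is exactly 'CDID'."""
    return cells[0] == "CDID"


# Split the rows into CDID code rows and genuine observation rows,
# dropping the column B label from each.
cdid_rows = {n: cells[1:] for n, cells in rows.items() if is_cdid_row(cells)}
observation_rows = {n: cells[1:] for n, cells in rows.items() if not is_cdid_row(cells)}

print(cdid_rows)
print(sorted(observation_rows))
```

If the CDID rows were not separated out like this, codes such as "YBFR" would end up in the same column as the numeric observations.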

## Requirements

- We'll take cell C5 and cells directly right as the column "Identifier"
- We'll take cell B8 and cells directly downwards as the column "Category"
- We'll take all cells to the right of a column B value of "CDID" as the column "CDID"
- We'll take cell A8 and cells directly downwards as "Time Period"
- We're going to ignore the "Weight" values for this example.
- We're also going to ignore the bracketed text and just remove it for our purposes here.
- We'll take the observations as the principal table values minus the CDID headings that are intermingled. We'll use a column name of "Value" for them.
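The two text clean-ups in the requirements above (removing bracketed notes and tidying the time period) can be sketched as small standalone functions. This is an illustrative alternative, not part of tidychef: `strip_notes` and `clean_period` are hypothetical names, and the regular expression approach is an assumption about what "remove the bracketed text" should mean when a note appears mid-string.

```python
import re


def strip_notes(text: str) -> str:
    """Remove bracketed notes such as '[note 2]' and collapse leftover whitespace."""
    without_notes = re.sub(r"\[[^\]]*\]", "", text)
    return " ".join(without_notes.split())


def clean_period(text: str) -> str:
    """Drop a trailing '.0' (e.g. '2018.0' -> '2018'); leave other text untouched."""
    return text[:-2] if text.endswith(".0") else text


print(strip_notes("Annual Chained Volume Index [note 2]"))
print(clean_period("2018.0"))
```

Unlike a plain `replace(".0", "")`, `clean_period` only touches a trailing ".0", so a value containing ".0" elsewhere would be left alone.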

In [2]:
from tidychef import acquire, preview
from tidychef.direction import up, down, left, right
from tidychef.output import TidyData, Column
from tidychef.selection import XlsSelectable

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/monthlygdptablesapril2023.xls", tables="GVA")

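# Anchor on the first "CDID" cell and fail fast if the layout has changed
anchor = table.excel_ref("B7").label_as("Anchor Cell")
assert anchor.lone_value() == "CDID", "Anchor has moved position"

# CDID codes: every cell to the right of a column B cell holding "CDID"
cdid = anchor.expand(down).filter(lambda x: x.value == "CDID").fill(right).label_as("CDID")
# Identifier: the header row two rows above the anchor (C5 and rightwards)
identifier = anchor.shift(up(2)).fill(right).label_as("Identifier")
# Category: column B cells below the anchor, skipping the "CDID" markers
category = anchor.fill(down).filter(lambda x: x.value != "CDID").label_as("Category")
# Time Period: the cell immediately left of each category cell
time_period = category.shift(left).label_as("Time Period")

# Observations: non-blank cells from C8 rightwards and downwards, minus the CDID codes
observations = (anchor.shift(right).shift(down).expand(right).expand(down).is_not_blank() - cdid).label_as("Value")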

# Create a bounded preview inline but also write the full preview to path
preview(anchor, identifier, category, cdid, time_period, observations, bounded="A4:Z16")
preview(anchor, identifier, category, cdid, time_period, observations, path="monthly-gdp-gva-table.html")

tidy_data = TidyData(
    observations,
    Column(cdid.finds_observations_directly(down)),
    Column(identifier.finds_observations_directly(down)),
    Column(category.finds_observations_directly(right), apply=lambda x: x.split("[")[0].strip()),
    Column(time_period.finds_observations_directly(right), apply=lambda x: x.replace(".0", ""))
)
tidy_data.to_csv("monthly-gdp-gva-table.csv")

0
Anchor Cell
Identifier
Category
CDID
Time Period
Value

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
4.0,"Source: GDP monthly estimate, Office for National Statistics",,,,,,,,,,,,,,,,,,,,,,,,,
5.0,Time Period,Category,Total GVA at basic prices (A - T),"Agriculture, forestry and fishing (A)",Total production industries (B - E),Mining and Quarrying (B),Manufacturing (C),"Electricity, gas, steam and air (D)","Water supply, sewerage etc (E)",Construction (F) [note 6],Total service industries (G-T),Wholesale and retail: repair of motor vehicles and motorcycles (G),Transport and storage (H),Accommodation and food service activites (I),Information and communication (J),Financial and insurance activities (K),Real estate activites (L),"Professional, scientific and technical activities (M)",Administrative and support service activities (N),Public administration and defence (O),Education (P),Human health and social work activities (Q),"Arts, entertainment and recreation (R)",Other service activities (S),"Activities of households as employers, undifferentiated goods and services (T)",
6.0,2019.0,Weight,1000.0,7.0,135.0,11.0,97.0,15.0,12.0,62.0,796.0,104.0,40.0,30.0,63.0,82.0,132.0,73.0,51.0,49.0,60.0,77.0,15.0,17.0,2.0,
7.0,[Not applicable],CDID,YBFR,L2KL,L2KQ,L2KR,L2KX,L2MW,L2N2,L2N8,L2NC,L2NE,L2NI,L2NQ,L2NT,L2O6,L2OC,L2OI,L2OX,L2P8,L2PA,L2PC,L2PJ,L2PP,L2PT,
8.0,2018.0,Annual Chained Volume Index [note 2],98.3,85.6,97.7,97.9,98.9,87.7,100.3,99.0,98.5,100.4,98.2,96.6,91.6,102.6,98.4,100.9,99.6,95.0,94.0,101.6,98.3,101.8,94.1,
9.0,2019.0,Annual Chained Volume Index [note 2],100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,
10.0,2020.0,Annual Chained Volume Index [note 2],89.4,96.5,101.3,96.8,100.1,109.1,104.8,86.0,87.6,81.6,70.3,59.9,98.6,100.3,100.0,92.5,89.0,95.2,82.5,69.1,71.9,86.3,78.0,
11.0,2021.0,Annual Chained Volume Index [note 2],96.1,102.1,108.6,85.5,109.8,114.6,112.9,98.0,93.7,84.6,77.9,78.4,104.4,105.5,99.9,101.7,99.1,97.4,92.3,82.2,83.7,78.7,73.9,
12.0,2022.0,Annual Chained Volume Index [note 2],100.2,105.7,105.6,87.2,105.7,110.5,115.2,104.0,98.9,82.4,87.3,106.2,114.1,105.1,99.6,107.8,112.6,101.2,98.6,89.7,103.5,81.1,65.6,
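As a rough mental model of what `finds_observations_directly(down)` does for the CDID column, the sketch below pairs each observation with a CDID code in the same column. It is a conceptual illustration in plain Python, not tidychef's implementation. The "closest CDID row above" rule for a sheet with several CDID rows is an assumption, and `nearest_above` is a hypothetical helper.

```python
# Toy cells keyed by (column, row); values copied from the preview above.
cdid_cells = {("C", 7): "YBFR", ("D", 7): "L2KL"}
observation_cells = {
    ("C", 8): "98.3", ("D", 8): "85.6",
    ("C", 9): "100.0", ("D", 9): "100.0",
}


def nearest_above(headers, column, row):
    """Return the header value in `column` on the closest row above `row`."""
    candidates = [(r, value) for (c, r), value in headers.items() if c == column and r < row]
    if not candidates:
        raise LookupError(f"no header above {column}{row}")
    # The highest row number below `row` is the closest header above it.
    return max(candidates)[1]


# Pair every observation with the CDID code that sits above it.
resolved = [
    (value, nearest_above(cdid_cells, column, row))
    for (column, row), value in observation_cells.items()
]
print(resolved)
```

The Category and Time Period columns follow the same idea with the axes swapped: the header sits to the left of the observation on the same row.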


## Outputs

The full preview can be [downloaded here](./monthly-gdp-gva-table.html).

The tidy data can be [downloaded here](./monthly-gdp-gva-table.csv), and a full inline preview of the tidy data generated is shown below for those who'd prefer to scroll.

In [3]:
print(tidy_data)

0,1,2,3,4
Value,CDID,Identifier,Category,Time Period
98.3,YBFR,Total GVA at basic prices (A - T),Annual Chained Volume Index,2018
85.6,L2KL,"Agriculture, forestry and fishing (A)",Annual Chained Volume Index,2018
97.7,L2KQ,Total production industries (B - E),Annual Chained Volume Index,2018
97.9,L2KR,Mining and Quarrying (B),Annual Chained Volume Index,2018
98.9,L2KX,Manufacturing (C),Annual Chained Volume Index,2018
87.7,L2MW,"Electricity, gas, steam and air (D)",Annual Chained Volume Index,2018
100.3,L2N2,"Water supply, sewerage etc (E)",Annual Chained Volume Index,2018
99.0,L2N8,Construction (F) [note 6],Annual Chained Volume Index,2018
98.5,L2NC,Total service industries (G-T),Annual Chained Volume Index,2018
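Since the source mixes CDID codes in with the observations, a quick sanity check on the output is to confirm that every "Value" parses as a number. The sketch below does this with the standard library against the first few rows printed above. It assumes the written CSV has the same header row as the printed output, and `non_numeric_values` is a hypothetical helper.

```python
import csv
import io

# First rows of the tidy output shown above, inlined so the sketch is self-contained.
sample = """Value,CDID,Identifier,Category,Time Period
98.3,YBFR,Total GVA at basic prices (A - T),Annual Chained Volume Index,2018
85.6,L2KL,"Agriculture, forestry and fishing (A)",Annual Chained Volume Index,2018
97.7,L2KQ,Total production industries (B - E),Annual Chained Volume Index,2018
"""

records = list(csv.DictReader(io.StringIO(sample)))


def non_numeric_values(rows):
    """Return every 'Value' entry that cannot be parsed as a float."""
    bad = []
    for row in rows:
        try:
            float(row["Value"])
        except ValueError:
            bad.append(row["Value"])
    return bad


# An empty list means no CDID codes leaked into the observations.
print(non_numeric_values(records))
```

To check the real file instead, the same function could be applied to `csv.DictReader(open("monthly-gdp-gva-table.csv", newline=""))`.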



