# ILCH (Index Of Labour Costs Per Hour) - Growth

-----

### Requirements

We need to build a single dataset from two input Excel files: the SA (seasonally adjusted) and the NSA (non seasonally adjusted) sources.

Use and combine tabs 3, 4, 5 & 6 from **both** sources.

#### Observations & Dimensions

The `observations` should be apparent: they are the numeric values in the body of each table.

The required dimensions are:

* **Geography** - it's all UK level data (the code for UK is "K02000001")
* **Time** - either a year, or year & quarter in the format YYYY QQ
* **Seasonal Adjustment** - either "Seasonally Adjusted" or "Non Seasonaly Adjusted"
* **Growth Type** - either "quarter on quarter" or "year on year" 
* **Labour** - one of "Labour Costs per Hour", "Wage Costs per Hour", "Other Costs per Hour", "Labour Costs per Hour Excluding Bonuses and Arrears"
* **Indicator** - the big merged cells, e.g. "ILCH_A Agriculture, Forestry and Fishing" etc.
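
The dimension list above can also be written down as a small check to run against the finished dataframe. This is only a sketch: `check_dimensions` and `ALLOWED_VALUES` are hypothetical names, not part of the original notebook, and the allowed values are copied from the list above exactly as spelled there.

```python
import pandas as pd

# Allowed values copied from the requirements list (spelling kept as written there).
ALLOWED_VALUES = {
    "Geography": {"K02000001"},
    "Seasonal Adjustment": {"Seasonally Adjusted", "Non Seasonaly Adjusted"},
    "Growth Type": {"quarter on quarter", "year on year"},
    "Labour": {
        "Labour Costs per Hour",
        "Wage Costs per Hour",
        "Other Costs per Hour",
        "Labour Costs per Hour Excluding Bonuses and Arrears",
    },
}
# Time and Indicator are free text, so only their presence is checked.
REQUIRED_COLUMNS = list(ALLOWED_VALUES) + ["Time", "Indicator"]


def check_dimensions(df):
    """Hypothetical helper: return a list of problems, empty if none are found."""
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS
                if col not in df.columns]
    for col, allowed in ALLOWED_VALUES.items():
        if col in df.columns:
            unexpected = set(df[col].unique()) - allowed
            if unexpected:
                problems.append(f"unexpected values in {col}: {sorted(unexpected)}")
    return problems


# One illustrative row shaped like the target dataset.
sample = pd.DataFrame([{
    "OBS": 16.8,
    "Geography": "K02000001",
    "Seasonal Adjustment": "Seasonally Adjusted",
    "Growth Type": "year on year",
    "Indicator": "ILCH_A Agriculture, Forestry and Fishing",
    "Labour": "Labour Costs per Hour",
    "Time": "2001Q1",
}])
print(check_dimensions(sample))
```

An empty list means every required column is present and the constant-like dimensions only hold values from the list.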


-----
    
Notes:

It's always worth opening the files in /sources and looking them over first.

In [1]:
from databaker.framework import *
import pandas as pd
import numpy as np

# the Seasonally Adjusted Tabs
sa_tabs = loadxlstabs("./sources/ilchtablestemplatesa.xls")

Loading ./sources/ilchtablestemplatesa.xls which has size 437248 bytes
Table names: ['INTRODUCTION', 'DEFINITIONS', '1. Industry index SA', '2. Sector index SA', '3. Industry annual growth SA', '4. Sector annual growth SA', '5. Industry quarterly growth SA', '6. Sector quarterly growth SA', '7. Industry costs SA', '8. Sector costs SA']


In [2]:
tabs1 = ['3. Industry annual growth SA', '4. Sector annual growth SA',
         '5. Industry quarterly growth SA',
         '6. Sector quarterly growth SA']

In [3]:
sa_tabs = loadxlstabs("./sources/ilchtablestemplatesa.xls", tabs1)

Loading ./sources/ilchtablestemplatesa.xls which has size 437248 bytes
Table names: ['3. Industry annual growth SA', '4. Sector annual growth SA', '5. Industry quarterly growth SA', '6. Sector quarterly growth SA']


In [4]:
# the Non Seasonally Adjusted Tabs
nsa_tabs1 = loadxlstabs("./sources/ilchtablestemplatensa.xls")

Loading ./sources/ilchtablestemplatensa.xls which has size 297984 bytes
Table names: ['INTRODUCTION', 'DEFINITIONS', '1. Industry index', '2. Sector index', '3. Industry growth rates', '4. Sector growth rates', '5. Industry costs', '6. Sector costs']


In [5]:
tabs2 = ['3. Industry growth rates', '4. Sector growth rates',
         '5. Industry costs', '6. Sector costs']

In [6]:
nsa_tabs = loadxlstabs("./sources/ilchtablestemplatensa.xls", tabs2)

Loading ./sources/ilchtablestemplatensa.xls which has size 297984 bytes
Table names: ['3. Industry growth rates', '4. Sector growth rates', '5. Industry costs', '6. Sector costs']


In [7]:
tabs = sa_tabs + nsa_tabs
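
The processing loop later in this notebook derives both "Seasonal Adjustment" and "Growth Type" purely from the tab name. The sketch below replays those two rules on the tab names printed above, so the labels can be checked before running the conversion. `labels_for` is a hypothetical helper, not part of the notebook.

```python
# Tab names copied from the loadxlstabs output above.
SA_TABS = ['3. Industry annual growth SA', '4. Sector annual growth SA',
           '5. Industry quarterly growth SA', '6. Sector quarterly growth SA']
NSA_TABS = ['3. Industry growth rates', '4. Sector growth rates',
            '5. Industry costs', '6. Sector costs']


def labels_for(tab_name):
    """Hypothetical helper applying the same name-based rules as the loop."""
    if "SA" in tab_name:
        seasonal_adjustment = "Seasonally Adjusted"
    else:
        seasonal_adjustment = "Non Seasonaly Adjusted"
    if "annual growth" in tab_name:
        growth_type = "year on year"
    else:
        growth_type = "quarter on quarter"
    return seasonal_adjustment, growth_type


for name in SA_TABS + NSA_TABS:
    print(name, "->", labels_for(name))
```

Under these rules every tab from the NSA file, including the two costs tabs, is labelled "quarter on quarter", so it is worth comparing that against what the spreadsheet actually contains.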

In [8]:
tidied_sheets = []

for tab in tabs:
    # Cells from the "p=Provisional" note downwards (selected here but not used below)
    bottomblock = tab.filter("p=Provisional").expand(DOWN)\
        .expand(RIGHT).is_not_blank()
    # Observations: every non-blank cell from B7 downwards and to the right
    observations = tab.excel_ref("B7").expand(DOWN).expand(RIGHT)\
        .is_not_blank()
    # All the data is UK level
    geography = "K02000001"
    # Time periods run down column A
    time = tab.excel_ref("A7").expand(DOWN).is_not_blank()
    # Only the seasonally adjusted tab names contain "SA"
    if "SA" in tab.name:
        seasonal_adjustment = "Seasonally Adjusted"
    else:
        seasonal_adjustment = "Non Seasonaly Adjusted"

    # Any tab without "annual growth" in its name is labelled quarter on quarter
    if "annual growth" in tab.name:
        growth_type = "year on year"
    else:
        growth_type = "quarter on quarter"

    # Row 6 holds the cost type, row 5 the merged indicator headers
    labour = tab.excel_ref("B6").expand(RIGHT).is_not_blank()
    indicator = tab.excel_ref("B5").expand(RIGHT).is_not_blank()

    dimensions = [
        HDimConst("Geography", geography),
        HDimConst("Seasonal Adjustment", seasonal_adjustment),
        HDimConst("Growth Type", growth_type),
        HDim(indicator, "Indicator", CLOSEST, LEFT),
        HDim(labour, "Labour", DIRECTLY, ABOVE),
        HDim(time, "Time", DIRECTLY, LEFT)
    ]
    cs = ConversionSegment(tab, dimensions, observations)  # process this tab
    tidy_sheet = cs.topandas()  # tidy dataframe for this tab
    tidied_sheets.append(tidy_sheet)  # add this tab's result to our list
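
The Indicator dimension uses `CLOSEST, LEFT` because a merged header only holds its text in the leftmost cell, so each observation takes the nearest indicator header at or to the left of its column. The sketch below mimics that idea in plain pandas with a forward fill. It is a toy illustration on made-up rows shaped like rows 5 and 6 of the sheets, not a description of how databaker implements the lookup.

```python
import pandas as pd

# Toy row 5: each merged indicator header is only present in its leftmost cell.
indicator_row = pd.Series([
    "ILCH_A\nAgriculture, Forestry and Fishing", None, None, None,
    "ILCH_B\nMining and Quarrying", None, None, None,
])

# Toy row 6: the four cost types repeat under every indicator.
labour_row = pd.Series([
    "Labour Costs per Hour",
    "Wage Costs per Hour",
    "Other Costs per Hour",
    "Labour Costs per Hour Excluding Bonuses and Arrears",
] * 2)

# Forward-filling gives every column the closest header to its left.
indicator_per_column = indicator_row.ffill()

# Each column now has an (Indicator, Labour) pair, as in the tidy output.
header_pairs = list(zip(indicator_per_column, labour_row))
for pair in header_pairs:
    print(pair)
```

The Labour and Time dimensions need no filling because every column and row has its own header cell, which is why the loop uses `DIRECTLY` for them.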
    
In [9]:
datacube = pd.concat(tidied_sheets)

In [10]:
datacube

Unnamed: 0,OBS,Geography,Seasonal Adjustment,Growth Type,Indicator,Labour,Time
0,16.8,K02000001,Seasonally Adjusted,year on year,"ILCH_A\nAgriculture, Forestry and Fishing",Labour Costs per Hour,2001Q1
1,17.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A\nAgriculture, Forestry and Fishing",Wage Costs per Hour,2001Q1
2,9.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A\nAgriculture, Forestry and Fishing",Other Costs per Hour,2001Q1
3,15.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A\nAgriculture, Forestry and Fishing",Labour Costs per Hour Excluding Bonuses and Ar...,2001Q1
4,19.7,K02000001,Seasonally Adjusted,year on year,ILCH_B\nMining and Quarrying,Labour Costs per Hour,2001Q1
...,...,...,...,...,...,...,...
1219,19.2,K02000001,Non Seasonaly Adjusted,quarter on quarter,Construction,Labour Costs per Hour Excluding Bonuses and Ar...,2019Q2 (p)
1220,14.3,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Labour Costs per Hour,2019Q2 (p)
1221,12.6,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Wage Costs per Hour,2019Q2 (p)
1222,1.7,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Other Costs per Hour,2019Q2 (p)


In [11]:
# Replace the line break inside the merged header text with a space
datacube["Indicator"] = datacube["Indicator"].replace(r'\n', " ", regex=True)

In [12]:
# Keep the first six characters (e.g. "2019Q2") to drop the trailing " (p)" marker
datacube.loc[datacube["Time"].str.len() >= 5, "Time"] = datacube["Time"].str[:6]
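
Slicing to six characters works for the values shown here, but a pattern-based helper makes the intent more explicit and can also emit the "YYYY QQ" layout mentioned in the requirements. This is a minimal sketch, assuming the raw values look like "2001Q1", "2019Q2 (p)" or a bare year. `normalise_time` is a hypothetical name, not part of the notebook.

```python
import re
from typing import Optional

# A four-digit year, optionally followed by a quarter such as Q1..Q4.
_PERIOD = re.compile(r"^\s*(\d{4})\s*(Q[1-4])?")


def normalise_time(raw, separator: str = "") -> Optional[str]:
    """Hypothetical helper: return 'YYYY' or 'YYYY<separator>Qn'.

    Anything after the period, such as the ' (p)' provisional marker,
    is dropped. Returns None when no year can be found.
    """
    match = _PERIOD.match(str(raw))
    if match is None:
        return None
    year, quarter = match.groups()
    return year if quarter is None else f"{year}{separator}{quarter}"


examples = ["2001Q1", "2019Q2 (p)", "2018"]
print([normalise_time(value) for value in examples])
print([normalise_time(value, separator=" ") for value in examples])
```

It could be applied to the dataframe with `datacube["Time"].map(normalise_time)`; the slice used in the cell above gives the same result for the values in this output.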

In [13]:
datacube

Unnamed: 0,OBS,Geography,Seasonal Adjustment,Growth Type,Indicator,Labour,Time
0,16.8,K02000001,Seasonally Adjusted,year on year,"ILCH_A Agriculture, Forestry and Fishing",Labour Costs per Hour,2001Q1
1,17.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A Agriculture, Forestry and Fishing",Wage Costs per Hour,2001Q1
2,9.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A Agriculture, Forestry and Fishing",Other Costs per Hour,2001Q1
3,15.7,K02000001,Seasonally Adjusted,year on year,"ILCH_A Agriculture, Forestry and Fishing",Labour Costs per Hour Excluding Bonuses and Ar...,2001Q1
4,19.7,K02000001,Seasonally Adjusted,year on year,ILCH_B Mining and Quarrying,Labour Costs per Hour,2001Q1
...,...,...,...,...,...,...,...
1219,19.2,K02000001,Non Seasonaly Adjusted,quarter on quarter,Construction,Labour Costs per Hour Excluding Bonuses and Ar...,2019Q2
1220,14.3,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Labour Costs per Hour,2019Q2
1221,12.6,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Wage Costs per Hour,2019Q2
1222,1.7,K02000001,Non Seasonaly Adjusted,quarter on quarter,"Wholesaling, Retailing, Hotels& Restaurants",Other Costs per Hour,2019Q2
