# ASHE Table 7

-----

### Requirements

ASHE stands for "Annual Survey of Hours and Earnings".


#### Observations & Dimensions

The `observations` are the numbers in the percentile columns.

The required dimensions are:

* **Geography** - in the `Code` column, one letter followed by 8 digits (e.g. `K02000001`)
* **Percentiles** - 10, 20, 25, 30, etc.
* **Time** - year, 4 digits
* **Gender** - Male, Female, All
* **Working Pattern** - Full time, Part time, All
* **Statistics** - the "topic" of the dataset (e.g. "monthly pay net"), taken from the filename
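
As a rough illustration of the formats listed above, the sketch below checks a single tidied row against them. The names `GEOGRAPHY_CODE`, `YEAR`, `GENDERS`, `WORKING_PATTERNS` and `invalid_fields` are hypothetical helpers, not part of databaker, and the sketch assumes the leading letter of a geography code is upper case and that `Time` is held as a string.

```python
import re

# Assumption: geography codes use an upper-case letter, as in K02000001.
GEOGRAPHY_CODE = re.compile(r"[A-Z]\d{8}")
YEAR = re.compile(r"\d{4}")
GENDERS = {"Male", "Female", "All"}
WORKING_PATTERNS = {"Full time", "Part time", "All"}


def invalid_fields(row):
    """Return the names of the dimensions in `row` that break the listed formats."""
    problems = []
    if not GEOGRAPHY_CODE.fullmatch(row["Geography"]):
        problems.append("Geography")
    if not YEAR.fullmatch(row["Time"]):
        problems.append("Time")
    if row["Gender"] not in GENDERS:
        problems.append("Gender")
    if row["Working Pattern"] not in WORKING_PATTERNS:
        problems.append("Working Pattern")
    return problems


good_row = {"Geography": "K02000001", "Time": "2018",
            "Gender": "All", "Working Pattern": "All"}
bad_row = {"Geography": "K0200001", "Time": "18",
           "Gender": "All", "Working Pattern": "Full-Time"}

print(invalid_fields(good_row))
print(invalid_fields(bad_row))
```

A check like this could be run over the finished data to catch a mislabelled tab or a truncated code before the data goes any further.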

-----
    
Notes:

The "Statistics" dimension seems pointless here because we're only looking at one file, but in production there are 24 files per year per ASHE table.

It's always worth getting the file out of `/sources` and having a look over it.
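
Since the statistic and the year both appear in the filename, one option for the production case is to read them from there instead of hard-coding them. This is only a sketch: `FILENAME_PATTERN` and `parse_ashe_filename` are hypothetical names, and the pattern assumes every file is named like the one used in this notebook, which has not been checked against the other ASHE files.

```python
import os
import re

# Assumption: filenames look like "... Table 7.1a   Weekly pay - Gross 2018.xls".
FILENAME_PATTERN = re.compile(
    r"Table\s+(?P<table>\d+\.\d+[a-z]?)\s+(?P<statistics>.+?)\s+(?P<year>\d{4})\.xlsx?$"
)


def parse_ashe_filename(path):
    """Split an ASHE filename into table number, statistics label and year."""
    name = os.path.basename(path)
    match = FILENAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"Unrecognised ASHE filename: {name!r}")
    return match.groupdict()


info = parse_ashe_filename(
    "./sources/PROV - Work Geography Table 7.1a   Weekly pay - Gross 2018.xls"
)
print(info["table"], "|", info["statistics"], "|", info["year"])
```

Raising on an unrecognised name is a deliberate choice here: a silently wrong `Statistics` value would be harder to spot later than a failed run.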

In [2]:
from databaker.framework import *
import pandas as pd
import numpy as np

tabs = loadxlstabs("./sources/PROV - Work Geography Table 7.1a   Weekly pay - Gross 2018.xls")

Loading ./sources/PROV - Work Geography Table 7.1a   Weekly pay - Gross 2018.xls which has size 846336 bytes
Table names: ['Notes', 'All', 'Male', 'Female', 'Full-Time', 'Part-Time', 'Male Full-Time', 'Male Part-Time', 'Female Full-Time', 'Female Part-Time']


In [4]:
tabs1 = ['All', 'Male', 'Female', 'Full-Time', 'Part-Time',
         'Male Full-Time', 'Male Part-Time',
         'Female Full-Time', 'Female Part-Time']

In [5]:
tabs = loadxlstabs("./sources/PROV - Work Geography Table 7.1a   Weekly pay - Gross 2018.xls", tabs1)

Loading ./sources/PROV - Work Geography Table 7.1a   Weekly pay - Gross 2018.xls which has size 846336 bytes
Table names: ['All', 'Male', 'Female', 'Full-Time', 'Part-Time', 'Male Full-Time', 'Male Part-Time', 'Female Full-Time', 'Female Part-Time']
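
The tab names listed above carry two of the required dimensions, Gender and Working Pattern. One way to derive both from a name is sketched below with a hypothetical helper, `dimensions_from_tab_name`. It compares whole words, so it does not depend on letter case to tell "Male" apart from "Female".

```python
def dimensions_from_tab_name(tab_name):
    """Return (gender, working_pattern) for a tab name such as 'Male Full-Time'."""
    gender = "All"
    work_pattern = "All"
    for word in tab_name.split():
        if word in ("Male", "Female"):
            gender = word
        elif word == "Full-Time":
            work_pattern = "Full time"
        elif word == "Part-Time":
            work_pattern = "Part time"
    return gender, work_pattern


for name in ["All", "Male", "Part-Time", "Female Full-Time"]:
    print(name, "->", dimensions_from_tab_name(name))
```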


In [6]:
tidied_sheets = []
for tab in tabs:
    
    bottom_block = tab.excel_ref("A438").expand(DOWN).expand(RIGHT).is_not_blank()
    right_block = tab.excel_ref("S6").expand(DOWN).expand(RIGHT).is_not_blank()
    observations = tab.excel_ref("H6").expand(DOWN).expand(RIGHT).is_not_blank() \
    - bottom_block - right_block
    
    geography = tab.excel_ref("B6").expand(DOWN).is_not_blank() - bottom_block
    percentile = tab.excel_ref("H5").expand(RIGHT).is_not_blank()
    time = "2018"  # alternative: tab.excel_ref("A1")
    
    if "Male" in tab.name:
        gender = "Male"
    elif "Female" in tab.name:
        gender = "Female"
    else:
        gender = "All"
    
    if "Full-Time" in tab.name:
        work_pattern = "Full time"
    elif "Part-Time" in tab.name:
        work_pattern = "Part time"
    else:
        work_pattern = "All"
    
    statistics = "weekly pay"  # alternative: tab.excel_ref("A1")
    
    dimensions = [ 
        HDimConst("Time", time),
        HDimConst("Statistics", statistics),
        HDimConst("Gender", gender),
        HDimConst("Working Pattern", work_pattern),
        HDim(geography, 'Geography', CLOSEST, UP),
        HDim(percentile, "Percentile", DIRECTLY, ABOVE)
    ]
    
    cs = ConversionSegment(tab, dimensions, observations)  # process this tab
    tidy_sheet = cs.topandas()  # convert the result to a pandas DataFrame
    tidied_sheets.append(tidy_sheet)  # add this tab's result to our list

In [7]:
datacube = pd.concat(tidied_sheets)
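
By default `pd.concat` keeps each frame's own index, so row labels repeat across the tabs. The sketch below shows the difference `ignore_index=True` makes, and a simple check that no combination of dimensions appears twice. It uses two tiny stand-in frames built from a few rows of this dataset, not the real `tidied_sheets`; the names `sheet_all`, `sheet_female_pt` and `key_columns` are illustrative only.

```python
import pandas as pd

# Two small stand-ins for per-tab results.
sheet_all = pd.DataFrame({
    "OBS": [145.2, 241.1],
    "Gender": ["All", "All"],
    "Working Pattern": ["All", "All"],
    "Geography": ["K02000001", "K02000001"],
    "Percentile": [10.0, 20.0],
})
sheet_female_pt = pd.DataFrame({
    "OBS": [220.5, 249.9],
    "Gender": ["Female", "Female"],
    "Working Pattern": ["Part time", "Part time"],
    "Geography": ["N92000002", "N92000002"],
    "Percentile": [60.0, 70.0],
})

kept = pd.concat([sheet_all, sheet_female_pt])  # per-frame index is kept
renumbered = pd.concat([sheet_all, sheet_female_pt], ignore_index=True)

# Each combination of dimensions should identify exactly one observation.
key_columns = ["Gender", "Working Pattern", "Geography", "Percentile"]
duplicate_keys = renumbered.duplicated(subset=key_columns).sum()

print(list(kept.index))
print(list(renumbered.index))
print(duplicate_keys)
```

Whether to renumber is a judgment call: the repeated index is harmless if the data is only ever written out without it, but it can surprise anyone selecting rows by label.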

In [8]:
datacube

| | OBS | DATAMARKER | Time | Statistics | Gender | Working Pattern | Geography | Percentile |
|---|---|---|---|---|---|---|---|---|
| 0 | 145.2 | | 2018 | weekly pay | All | All | K02000001 | 10.0 |
| 1 | 241.1 | | 2018 | weekly pay | All | All | K02000001 | 20.0 |
| 2 | 290.2 | | 2018 | weekly pay | All | All | K02000001 | 25.0 |
| 3 | 325.3 | | 2018 | weekly pay | All | All | K02000001 | 30.0 |
| 4 | 389.5 | | 2018 | weekly pay | All | All | K02000001 | 40.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4295 | 220.5 | | 2018 | weekly pay | Female | Part time | N92000002 | 60.0 |
| 4296 | 249.9 | | 2018 | weekly pay | Female | Part time | N92000002 | 70.0 |
| 4297 | 270 | | 2018 | weekly pay | Female | Part time | N92000002 | 75.0 |
| 4298 | 294.7 | | 2018 | weekly pay | Female | Part time | N92000002 | 80.0 |


In [9]:
datacube.drop(columns="DATAMARKER", inplace=True)

In [10]:
datacube

| | OBS | Time | Statistics | Gender | Working Pattern | Geography | Percentile |
|---|---|---|---|---|---|---|---|
| 0 | 145.2 | 2018 | weekly pay | All | All | K02000001 | 10.0 |
| 1 | 241.1 | 2018 | weekly pay | All | All | K02000001 | 20.0 |
| 2 | 290.2 | 2018 | weekly pay | All | All | K02000001 | 25.0 |
| 3 | 325.3 | 2018 | weekly pay | All | All | K02000001 | 30.0 |
| 4 | 389.5 | 2018 | weekly pay | All | All | K02000001 | 40.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4295 | 220.5 | 2018 | weekly pay | Female | Part time | N92000002 | 60.0 |
| 4296 | 249.9 | 2018 | weekly pay | Female | Part time | N92000002 | 70.0 |
| 4297 | 270 | 2018 | weekly pay | Female | Part time | N92000002 | 75.0 |
| 4298 | 294.7 | 2018 | weekly pay | Female | Part time | N92000002 | 80.0 |
