# Tourism

-----

### Requirements

Extract the data from tab 2 and tab 4.

#### Observations & Dimensions

For `observations` we want the `purpose` data; we **don't** want the `all visits` data.


The required dimensions are:

* **Geography** - it's all UK level data (the code for UK is "K02000001")
* **Time** - in the format MMM YYYY
* **Purpose** - one of Holiday, Business, Visiting friends or relatives, Miscellaneous
* **Direction of Travel** - either "overseas visits to the uk", or "uk visits abroad"
* **Units** - constant value of '1000'
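
As a rough sketch, the dimension rules above can be written as a per-row check. The helper name `check_row` and the dict-based row shape are assumptions made for illustration; neither comes from databaker or the notebook. It checks Geography, Time, Purpose and Direction only, and compares Direction case-insensitively (also an assumption).

```python
import re

UK_CODE = "K02000001"
PURPOSES = {"Holiday", "Business", "Visiting friends or relatives", "Miscellaneous"}
DIRECTIONS = {"overseas visits to the uk", "uk visits abroad"}
# "MMM YYYY": a three-letter English month, one space, a four-digit year.
TIME_PATTERN = re.compile(r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}$")


def check_row(row):
    """Return the names of the dimensions that break the rules (hypothetical helper)."""
    problems = []
    if row.get("Geography") != UK_CODE:
        problems.append("Geography")
    if not TIME_PATTERN.match(str(row.get("Time", ""))):
        problems.append("Time")
    if row.get("Purpose") not in PURPOSES:
        problems.append("Purpose")
    # Assumption: label casing is not significant, so compare in lower case.
    if str(row.get("Direction", "")).lower() not in DIRECTIONS:
        problems.append("Direction")
    return problems


good = {
    "Geography": "K02000001",
    "Time": "Jan 2015",
    "Purpose": "Holiday",
    "Direction": "Overseas visits to the UK",
}
bad = dict(good, Time="June¹ 2019")  # month label still carries a data marking

print(check_row(good))
print(check_row(bad))
```

A check like this could be run over the finished datacube rows to confirm that no data markings or summary rows slipped through.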

-----
Notes:

* We don't want the data markings against 2019 dates
* We don't want the ad hoc summary data at the bottom, "latest three months.." etc
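
One way to handle the first note: the 2019 month labels in the source carry a footnote marker (for example `June¹`), so keeping only the first three characters yields the `MMM` part and drops the marking. The helper name `to_time` is hypothetical and the sketch uses plain Python rather than pandas so it runs without the spreadsheet.

```python
def to_time(month_label, year):
    """Build an 'MMM YYYY' string from a raw month label and a year (hypothetical helper)."""
    month = month_label.strip()[:3]          # "June¹" -> "Jun", "May¹" -> "May"
    return f"{month} {int(float(year))}"     # 2019.0 -> "2019"


print(to_time("June¹", 2019.0))
print(to_time("Jan", 2015.0))
```

Slicing to three characters relies on every month label starting with its standard English abbreviation, which holds for the labels shown in this workbook.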

In [1]:
from databaker.framework import *
import pandas as pd

tabs2and4 = ['Table 2', 'Table 4']
tabs = loadxlstabs("./sources/tourism.xls", tabs2and4) # load tabs

Loading ./sources/tourism.xls which has size 180736 bytes
Table names: ['Table 2', 'Table 4']


In [2]:
len(tabs)

2

In [3]:
tidied_sheets = []  # one tidy DataFrame per tab, concatenated into the datacube afterwards

for tab in tabs:
    bottomblock = tab.filter("Latest three months").expand(DOWN).expand(RIGHT).is_not_blank()
    
    observations = tab.excel_ref("G7").expand(DOWN).expand(RIGHT).is_not_blank() - bottomblock
    
    geography = "K02000001"
    month = tab.excel_ref("B7").expand(DOWN).is_not_blank() - bottomblock
    year = tab.excel_ref("A7").expand(DOWN).is_not_blank() - bottomblock
    purpose = tab.excel_ref("G4").expand(RIGHT).is_not_blank()
    
    direction = tab.excel_ref("B1")
    units = "thousands"  # observations are in units of 1000
    #observations = month.waffle(purpose)
    
    dimensions = [
        HDimConst("Geography", geography),
        HDimConst("Units", units),
        HDimConst("Direction", direction.value),
        HDim(month, "Month", DIRECTLY, LEFT),
        HDim(year, "Year", CLOSEST, UP),
        HDim(purpose, 'Purpose', DIRECTLY, ABOVE)
    ]
    cs = ConversionSegment(tab, dimensions, observations)  # pair each observation with its dimension values
    tidy_sheet = cs.topandas()  # convert the segment to a tidy DataFrame
    tidied_sheets.append(tidy_sheet)  # collect this tab's result





In [4]:
datacube = pd.concat(tidied_sheets)
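
By default `pd.concat` keeps each input's own row index, so index labels repeat across the two tabs. If a single running index is wanted, `ignore_index=True` is one option. The frames below are made-up stand-ins for the tidied sheets, not real figures.

```python
import pandas as pd

# Toy stand-ins for the two tidied sheets (values are illustrative only).
sheet_a = pd.DataFrame({"OBS": [1.0, 2.0], "Purpose": ["Holiday", "Business"]})
sheet_b = pd.DataFrame({"OBS": [3.0, 4.0], "Purpose": ["Holiday", "Business"]})

kept = pd.concat([sheet_a, sheet_b])                             # each frame keeps its own labels
renumbered = pd.concat([sheet_a, sheet_b], ignore_index=True)    # one fresh running index

print(list(kept.index))
print(list(renumbered.index))
```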

In [5]:
datacube

Unnamed: 0,OBS,Geography,Units,Direction,Month,Year,Purpose
0,662.512925,K02000001,thousands,Purpose of overseas residents' visits to the U...,Jan,2015.0,Holiday
1,688.776441,K02000001,thousands,Purpose of overseas residents' visits to the U...,Jan,2015.0,Business
2,899.146350,K02000001,thousands,Purpose of overseas residents' visits to the U...,Jan,2015.0,Visiting friends or relatives
3,168.449570,K02000001,thousands,Purpose of overseas residents' visits to the U...,Jan,2015.0,Miscellaneous
4,639.824439,K02000001,thousands,Purpose of overseas residents' visits to the U...,Feb,2015.0,Holiday
...,...,...,...,...,...,...,...
211,120.000000,K02000001,thousands,Purpose of UK residents' visits abroad by month,May¹,2019.0,Miscellaneous
212,4850.000000,K02000001,thousands,Purpose of UK residents' visits abroad by month,June¹,2019.0,Holiday
213,610.000000,K02000001,thousands,Purpose of UK residents' visits abroad by month,June¹,2019.0,Business
214,1240.000000,K02000001,thousands,Purpose of UK residents' visits abroad by month,June¹,2019.0,Visiting friends or relatives


In [6]:
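# Map the sheet titles to the short direction labels.
# Assigning the result back avoids relying on an in-place update of a column selection.
datacube["Direction"] = datacube["Direction"].replace({
    "Purpose of overseas residents' visits to the UK by month": "Overseas visits to the UK",
    "Purpose of UK residents' visits abroad by month": "UK visits abroad",
})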

In [7]:
datacube["Month"] = datacube["Month"].str[:3]  # keep "MMM"; this also drops data markings such as "June¹"
datacube["Year"] = datacube["Year"].astype(float).astype(int).astype(str)  # 2015.0 -> "2015"
datacube["Time"] = datacube["Month"] + " " + datacube["Year"]  # "MMM YYYY"
datacube.drop(columns=["Month", "Year"], inplace=True)

In [8]:
datacube

Unnamed: 0,OBS,Geography,Units,Direction,Purpose,Time
0,662.512925,K02000001,thousands,Overseas visits to the UK,Holiday,Jan 2015
1,688.776441,K02000001,thousands,Overseas visits to the UK,Business,Jan 2015
2,899.146350,K02000001,thousands,Overseas visits to the UK,Visiting friends or relatives,Jan 2015
3,168.449570,K02000001,thousands,Overseas visits to the UK,Miscellaneous,Jan 2015
4,639.824439,K02000001,thousands,Overseas visits to the UK,Holiday,Feb 2015
...,...,...,...,...,...,...
211,120.000000,K02000001,thousands,UK visits abroad,Miscellaneous,May 2019
212,4850.000000,K02000001,thousands,UK visits abroad,Holiday,Jun 2019
213,610.000000,K02000001,thousands,UK visits abroad,Business,Jun 2019
214,1240.000000,K02000001,thousands,UK visits abroad,Visiting friends or relatives,Jun 2019
