# Migration - Local Area

-----

### Requirements



#### Observations & Dimensions

We're taking 5 tabs (see the table below). The `observations` and the `indicator` vary slightly for each tab:

| Tab name | Get obs from columns named | Set 'Indicator' dimension to |
|---|---|---|
| Short-Term Migration Inflows | Short-Term International Migration Inflow Estimates | Inflow Estimates |
| Non-UK Born Population | Resident Population | Not UK Born Population |
| Non-British Population | Resident Population | Not British Population |
| NINo Registrations | Migrant NINo Registrations | NINo Registrations |
| GP Registrations | New Migrant GP Registrations | New GP Registrations |
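
The tab-to-indicator column of the table can also be held as a plain lookup, so the per-tab differences live in one place. This is only a sketch: `TAB_INDICATOR` and `indicator_for` are hypothetical names (not part of databaker), and the keys assume the tab names as the workbook itself reports them.

```python
# Hypothetical lookup: workbook tab name -> required Indicator label.
TAB_INDICATOR = {
    "Short-Term Migration Inflows": "Inflow Estimates",
    "Non-UK Born Population": "Not UK Born Population",
    "Non-British Population": "Not British Population",
    "NINo Registrations": "NINo Registrations",
    "GP Registrations": "New GP Registrations",
}


def indicator_for(tab_name):
    """Return the Indicator label for a tab, failing loudly on an unexpected tab."""
    try:
        return TAB_INDICATOR[tab_name]
    except KeyError:
        raise ValueError(f"No indicator configured for tab {tab_name!r}") from None
```

Failing on an unknown tab name is a deliberate choice here: a renamed tab in a future release of the spreadsheet then stops the run instead of silently producing an unlabelled indicator.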


The required dimensions are:

* **Geography** - use the area code column
* **Time** - taken from the period header at the top of each tab, keeping just the year, so "Jan 2009 to Dec 2009" becomes 2009
* **Indicator** - as per the table above

-----
    
Notes:

It's always worth opening the file in /sources and looking it over before writing any code.

In [1]:
from databaker.framework import *
import pandas as pd

# load every tab first, just to list what the workbook contains
all_tabs = loadxlstabs("./sources/lamisspreadsheet.xls")

Loading ./sources/lamisspreadsheet.xls which has size 1138688 bytes
Table names: ['Notes', 'Migration Flows', 'Short-Term Migration Inflows', 'Non-UK Born Population', 'Non-British Population', 'NINo Registrations', 'GP Registrations', 'Births to Non-UK Born Mothers']


In [2]:
tabs_list = ['Short-Term Migration Inflows', 'Non-UK Born Population',
             'Non-British Population', 'NINo Registrations', 'GP Registrations']

In [3]:
tabs = loadxlstabs("./sources/lamisspreadsheet.xls", tabs_list)

Loading ./sources/lamisspreadsheet.xls which has size 1138688 bytes
Table names: ['Short-Term Migration Inflows', 'Non-UK Born Population', 'Non-British Population', 'NINo Registrations', 'GP Registrations']


In [4]:
# tidy each selected tab into a dataframe and collect the results
tidied_sheets = []
for tab in tabs:
    bottom_block = tab.excel_ref("A435").expand(DOWN).expand(RIGHT).is_not_blank()
    if tab.name == "Short-Term Migration Inflows":
        # headers from column S onwards carry a trailing "p", so they need their own filter
        indicator = (
            tab.excel_ref("C2").expand(RIGHT).filter('Short-Term International Migration Inflow Estimates').is_not_blank()
            | tab.excel_ref("S2").expand(RIGHT).filter('Short-Term International Migration Inflow Estimatesp').is_not_blank()
        )
        
    elif tab.name in ['NINo Registrations', 'GP Registrations']:
        if tab.name == 'NINo Registrations':
            indicator = tab.excel_ref("C2").expand(RIGHT).filter('Migrant NINo Registrations').is_not_blank()
        else:
            indicator = tab.excel_ref("C2").expand(RIGHT).filter('New Migrant GP Registrations').is_not_blank()
        
    elif tab.name in ["Non-UK Born Population", 'Non-British Population']:
        if tab.name == "Non-UK Born Population":
            indicator = tab.excel_ref("C3").expand(RIGHT).filter("Non-UK Born Estimate").is_not_blank()  
        else:
            indicator = tab.excel_ref("C3").expand(RIGHT).filter("Non-British Estimate").is_not_blank()
    
    time = tab.excel_ref("C1").expand(RIGHT).is_not_blank()
    geography = tab.excel_ref("A4").expand(DOWN).is_not_blank() 
    observations = geography.waffle(indicator)
    
    dimensions = [ 
        HDim(indicator, 'Indicator', DIRECTLY, ABOVE),
        HDim(geography, 'Geography', DIRECTLY, LEFT),
        HDim(time, "Year", CLOSEST, LEFT)
    ]
    
    cs = ConversionSegment(tab, dimensions, observations) # < --- processing
    tidy_sheet = cs.topandas() #dataframe
    tidied_sheets.append(tidy_sheet) # <-- adding result of processing this tab to our list
    








In [5]:
# sort=True keeps the column sorting pandas applies here by default and silences
# the FutureWarning raised when the tabs' columns are not aligned
datacube = pd.concat(tidied_sheets, sort=True)


In [8]:
datacube

| | DATAMARKER | Geography | Indicator | OBS | Year |
|---|---|---|---|---|---|
| 0 | | E92000001 | Short-Term International Migration Inflow Esti... | 102000 | Mid-2008 to Mid-2009 |
| 1 | | E92000001 | Short-Term International Migration Inflow Esti... | 112973 | Mid-2009 to Mid-2010 |
| 2 | | E92000001 | Short-Term International Migration Inflow Esti... | 93437.7 | Mid-2010 to Mid-2011 |
| 3 | | E92000001 | Short-Term International Migration Inflow Esti... | 107195 | Mid-2011 to Mid-2012 |
| 4 | | E92000001 | Short-Term International Migration Inflow Esti... | 117172 | Mid-2012 to Mid-2013 |
| ... | ... | ... | ... | ... | ... |
| 4265 | | E06000014 | New Migrant GP Registrations | 3252 | Mid-2013 to Mid-2014 |
| 4266 | | E06000014 | New Migrant GP Registrations | 3018 | Mid-2014 to Mid-2015 |
| 4267 | | E06000014 | New Migrant GP Registrations | 2918 | Mid-2015 to Mid-2016 |
| 4268 | | E06000014 | New Migrant GP Registrations | 3061 | Mid-2016 to Mid-2017 |
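
By default `pd.concat` keeps each input frame's own index, so row labels can repeat once several tabs are stacked. If unique row labels matter downstream, `ignore_index=True` renumbers them. The sketch below uses two tiny stand-in frames (`inflows` and `gp` are illustrative names, with a couple of values borrowed from the output above) rather than the real tabs.

```python
import pandas as pd

# Two stand-in frames playing the role of two tidied tabs.
inflows = pd.DataFrame({"OBS": [102000, 112973],
                        "Geography": ["E92000001", "E92000001"]})
gp = pd.DataFrame({"OBS": [3252, 3018],
                   "Geography": ["E06000014", "E06000014"]})

# Default behaviour: each frame keeps its own 0..n-1 labels, so labels repeat.
kept = pd.concat([inflows, gp])

# ignore_index=True discards the old labels and numbers the rows 0..N-1.
fresh = pd.concat([inflows, gp], ignore_index=True)

print(list(kept.index))   # [0, 1, 0, 1]
print(list(fresh.index))  # [0, 1, 2, 3]
```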


In [9]:
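# the data marker column is not needed in the output
datacube.drop(columns="DATAMARKER", inplace=True)
# keep the first year of each period, e.g. "Mid-2008 to Mid-2009" -> 2008
datacube["Year"] = datacube["Year"].str[4:8].astype(int)
# put the columns in the required order
datacube = datacube.reindex(columns=["OBS", "Geography", "Year", "Indicator"])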
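
The fixed slice `str[4:8]` relies on every period header starting with a four-character prefix such as `Mid-` or `Jan `. A slightly more defensive sketch pulls out the first four-digit year with a regular expression instead. `first_year` is a hypothetical helper, and taking the first year of the range (rather than the last) is an assumption chosen to match what the slice does.

```python
import re


def first_year(period):
    """Return the first four-digit year found in a period header as an int."""
    match = re.search(r"\d{4}", period)
    if match is None:
        raise ValueError(f"No four-digit year found in {period!r}")
    return int(match.group())


print(first_year("Mid-2008 to Mid-2009"))  # 2008
print(first_year("Jan 2009 to Dec 2009"))  # 2009
```

Applied to the raw period strings, this could stand in for the slice as `datacube["Year"].map(first_year)`.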

In [10]:
# map a distinctive fragment of each raw header to the required Indicator label
indicator_dict = {
    "Inflow Estimates": "Inflow Estimates",
    "Non-UK Born Estimate": "Not UK Born Population",
    "Non-British Estimate": "Not British Population",
    "NINo Registrations": "NINo Registrations",
    "GP Registrations": "New GP Registrations",
}
for key, value in indicator_dict.items():
    # regex=False: treat the key as a plain substring, not a regular expression
    datacube.loc[datacube["Indicator"].str.contains(key, regex=False), "Indicator"] = value
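
The relabelling loop above matches on a substring of each raw header. The same rule can be sketched as a plain function, which is easy to test without loading the spreadsheet. `LABELS` and `relabel` are hypothetical names, and the target labels are taken from the requirements table at the top.

```python
# Hypothetical mapping: distinctive fragment of the raw header -> required label.
LABELS = {
    "Inflow Estimates": "Inflow Estimates",
    "Non-UK Born Estimate": "Not UK Born Population",
    "Non-British Estimate": "Not British Population",
    "NINo Registrations": "NINo Registrations",
    "GP Registrations": "New GP Registrations",
}


def relabel(raw):
    """Return the required label for a raw header, or the header unchanged if none matches."""
    for fragment, label in LABELS.items():
        if fragment in raw:  # plain substring test, no regular expressions involved
            return label
    return raw


print(relabel("Short-Term International Migration Inflow Estimatesp"))  # Inflow Estimates
print(relabel("New Migrant GP Registrations"))                          # New GP Registrations
```

On the dataframe this would be used as `datacube["Indicator"].map(relabel)`. Unmatched headers pass through unchanged, so any unexpected value remains visible in the output.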

In [11]:
datacube

| | OBS | Geography | Year | Indicator |
|---|---|---|---|---|
| 0 | 102000 | E92000001 | 2008 | Inflow Estimates |
| 1 | 112973 | E92000001 | 2009 | Inflow Estimates |
| 2 | 93437.7 | E92000001 | 2010 | Inflow Estimates |
| 3 | 107195 | E92000001 | 2011 | Inflow Estimates |
| 4 | 117172 | E92000001 | 2012 | Inflow Estimates |
| ... | ... | ... | ... | ... |
| 4265 | 3252 | E06000014 | 2013 | New GP Registrations |
| 4266 | 3018 | E06000014 | 2014 | New GP Registrations |
| 4267 | 2918 | E06000014 | 2015 | New GP Registrations |
| 4268 | 3061 | E06000014 | 2016 | New GP Registrations |
