# Migration data

In this notebook we download the ONS local area migration indicators from [here](https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/datasets/localareamigrationindicatorsunitedkingdom).

We will use this data to create indicators about labour flows in and out of a local authority district (LAD).

The data is only available at the NUTS3 level, so as part of this we will have to create a tool to map NUTS3 to NUTS2. This might be useful in other analyses of the data.

## Preamble

In [None]:
%run ../notebook_preamble.ipy

In [None]:
# Imports

In [None]:
# Functions

In [None]:
# Directories

# Create the raw and processed migration folders if they do not exist yet
os.makedirs('../../data/raw/migration', exist_ok=True)
os.makedirs('../../data/processed/migration', exist_ok=True)

## 1. Collect data

The migration file has several sheets. We will focus on sheet 2, which the file describes as: "Long-Term International and Internal migration 'component of population change' data. Rates can be calculated using population estimates data".

Long-term migration feels more strongly linked with the idea of absorptive capacity and long-term comparative advantage.

In [None]:
#Collect the data
file = requests.get(
    'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fpopulationandmigration%2fmigrationwithintheuk%2fdatasets%2flocalareamigrationindicatorsunitedkingdom%2fcurrent/lamisspreadsheet.xlsx')

#Stop here if the download failed, rather than saving an error page
file.raise_for_status()

#Save it
with open(f'../../data/raw/migration/{today_str}_migration.xls','wb') as outfile:
    outfile.write(file.content)

In [None]:
#Read the file (the path is hard-coded to the copy downloaded on 7_11_2019, not to today's download)
migration = pd.read_excel('../../data/raw/migration/7_11_2019_migration.xls',sheet_name=1,header=None)

## 2. Tidy data

Unfortunately the data is only available in a wide, gnarly format.

We will try to turn it into a long dataset that can be analysed more easily.

Forward filling (`ffill`) will be our friend.

In [None]:
migration.head()

In [None]:
# Fill missing values

In [None]:
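# Forward-fill along each row so merged header cells carry their label across the columns they span
# (fillna(method='ffill') is deprecated in recent pandas versions; ffill() does the same thing)
migration = migration.ffill(axis=1)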

In [None]:
migration.head()

Note that the fill above also labels the empty cell below the population estimate as if it were an outflow. We will remove that later.
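To see why the population estimate picks up an outflow label, here is a minimal sketch on a toy header. The labels and layout are made up for illustration and are not taken from the spreadsheet.

```python
import numpy as np
import pandas as pd

# Toy header rows (hypothetical labels) mimicking merged cells in a spreadsheet
header = pd.DataFrame([
    ["Area Code", "Area Name", "Mid-2017 to Mid-2018", np.nan, np.nan],
    [np.nan, np.nan, "Internal Migration", np.nan, "Population Estimate"],
    [np.nan, np.nan, "Inflow", "Outflow", np.nan],
])

# Forward-fill along each row (axis=1): every blank takes the value to its left
filled = header.ffill(axis=1)
print(filled)
```

In the last row the blank under "Population Estimate" inherits "Outflow" from the column to its left, which is the side effect described above. Leading blanks stay empty because there is nothing to their left to copy.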

In [None]:
#Join the first three values in each column to create a variable with information about year, variable and direction
col_name = migration.loc[:2].fillna('').apply(lambda x: '__'.join(list(x)))

print(col_name[:4][3])
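The joining step can be illustrated on a toy header that has already been forward-filled. The labels are hypothetical; the point is only to show how the three header cells of each column collapse into one name, and why the identifier columns end with four underscores.

```python
import pandas as pd

# Toy, already forward-filled header rows (hypothetical labels)
header = pd.DataFrame([
    ["Area Code", "Mid-2017 to Mid-2018", "Mid-2017 to Mid-2018"],
    [None, "Internal Migration", "Internal Migration"],
    [None, "Inflow", "Outflow"],
])

# apply() works column by column by default, so each column's three cells are joined into one name
col_name = header.fillna("").apply(lambda col: "__".join(col))
print(col_name.tolist())
```

The first column has two empty header cells, so its name becomes `Area Code____`: the label followed by two `__` separators with nothing between them.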

In [None]:
#Remove the first four rows, which are now redundant (copy so that renaming the columns does not touch the original)
migration_2 = migration.loc[4:].copy()

migration_2.columns = col_name

In [None]:
migration_2.head()

In [None]:
#Melt the df

In [None]:
migration_long = pd.melt(migration_2,id_vars=migration_2.columns[:2])

In [None]:
migration_long.head(n=10)

In [None]:
# Drop the missing rows (they are a gap between geographies)

In [None]:
migration_long.dropna(axis=0,subset=['Area Code____'],inplace=True)

In [None]:
migration_long.head(n=10)

In [None]:
#Now we want to split the variable into something more meaningful

In [None]:
#period contains the year, variable_2 the variable name, and direction whether it is an inflow or an outflow
migration_long['period'],migration_long['variable_2'],migration_long['direction'] = [
    [var.split('__')[n] for var in migration_long['variable']] for n in [0,1,2]]
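As a sketch of the same melt-and-split idea on toy data, the compound names can also be split with the vectorised `str.split(..., expand=True)`. This is an alternative to the list comprehension above, not the notebook's own method, and the area codes and values are made up.

```python
import pandas as pd

# Toy wide table with compound column names (hypothetical codes and values)
wide = pd.DataFrame({
    "Area Code____": ["A1", "A2"],
    "Area Name____": ["Area one", "Area two"],
    "2018__Internal__Inflow": [10, 20],
    "2018__Internal__Outflow": [7, 25],
})

# One row per area and original column
long = wide.melt(id_vars=["Area Code____", "Area Name____"])

# Split 'variable' on the double underscore into three new columns
parts = long["variable"].str.split("__", expand=True)
parts.columns = ["period", "variable_2", "direction"]
long = pd.concat([long, parts], axis=1)
print(long)
```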

In [None]:
migration_long.tail()

In [None]:
#Rename variables

In [None]:
#For period, take the year after the last hyphen and subtract one to get the year in which the period starts
migration_long['period'] = [int(x.strip().split('-')[-1])-1 for x in migration_long['period']]
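The parsing rule can be checked in isolation. The sketch below assumes period labels shaped like `Mid-2017 to Mid-2018`; that format is a guess for illustration and should be checked against the actual spreadsheet. `start_year` is a hypothetical helper name.

```python
def start_year(label: str) -> int:
    """Return the year in which a period label starts (hypothetical helper)."""
    # Whatever follows the last hyphen is read as the end year...
    end_year = int(label.strip().split("-")[-1])
    # ...and the period is assumed to start one year earlier
    return end_year - 1


print(start_year("Mid-2017 to Mid-2018"))
```

The rule relies on the label ending in a four-digit year; any trailing text after the year would make `int()` raise a `ValueError`.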

In [None]:
migration_long['variable_2'] = ['population_estimate' if 'Population' in x else 'internal_migration' if 'Internal' in x 
                               else 'international_migration' for x in migration_long['variable_2']]

In [None]:
migration_long['direction'] = [np.nan if len(x)==0 else x.lower() for x in migration_long['direction']]

In [None]:
# Copy the selection so the in-place rename below does not trigger a SettingWithCopyWarning
migration_clean = migration_long[['Area Code____','Area Name____','period','variable_2','direction','value']].copy()

In [None]:
migration_clean.rename(columns={'Area Code____':'area_code','Area Name____':'area_name'},inplace=True)

In [None]:
# Rearrange so we can normalise by population if we want

In [None]:
# Population rows on the left, flow rows on the right; overlapping columns get _x and _y suffixes
migration_rearranged = pd.merge(migration_clean.loc[migration_clean['variable_2']=='population_estimate'],
        migration_clean.loc[migration_clean['variable_2']!='population_estimate'],
        on=['area_code','area_name','period'])

migration_rearranged.head()

In [None]:
migration_rearranged.drop(axis=1,labels=['variable_2_x','direction_x'],inplace=True)

In [None]:
migration_rearranged.head()

In [None]:
migration_rearranged.rename(columns={'value_x':'population_estimate','variable_2_y':'variable','direction_y':'direction',
                                    'value_y':'value'},inplace=True)

In [None]:
migration_rearranged.head()
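A compact variant of this rearrangement, shown on toy data: instead of merging and then dropping and renaming the suffixed columns, it selects and renames the population column first. The codes and values are hypothetical, and this is a sketch of an alternative rather than the steps used above.

```python
import pandas as pd

# Toy long table (hypothetical values); the population row carries a spurious direction
clean = pd.DataFrame({
    "area_code": ["A1"] * 3,
    "period": [2017] * 3,
    "variable_2": ["population_estimate", "internal_migration", "internal_migration"],
    "direction": ["outflow", "inflow", "outflow"],
    "value": [1000, 50, 30],
})

is_pop = clean["variable_2"] == "population_estimate"

# Keep only the keys and the value for population rows, which discards the spurious direction
pop = clean.loc[is_pop, ["area_code", "period", "value"]].rename(
    columns={"value": "population_estimate"})
flows = clean.loc[~is_pop].rename(columns={"variable_2": "variable"})

# Attach the population estimate to every flow row for the same area and period
rearranged = flows.merge(pop, on=["area_code", "period"], how="left")
rearranged["rate"] = rearranged["value"] / rearranged["population_estimate"]
print(rearranged)
```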

In [None]:
#Remove a few observations with no codes (anything two characters or shorter)
migration_rearranged = migration_rearranged.loc[[len(x)>2 for x in migration_rearranged['area_code']]]

In [None]:
migration_rearranged.tail()

## 3. Convert to NUTS2

The data in this table is available at the NUTS1 and NUTS3 levels. We want to convert it to NUTS2.

This is not going to be as easy as we hoped, because several codes in the available lookups are out of date. To address this we will use the National Statistics Postcode Lookup (NSPL), which contains the most up-to-date lookups.

We use the August 2019 NSPL file from [here](https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-august-2019).

The plan is:

```
NSPL -> get current LAD to NUTS4 -> match NUTS4 with NUTS2 using a lookup in the NSPL documentation -> match this with the migration file
```
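The chain above can be sketched with two toy tables. The column names `laua`, `nuts`, `LAU219CD` and `NUTS218CD` follow the ones used in this notebook; every code and the `pcd` postcode column are made up for illustration.

```python
import pandas as pd

# Hypothetical postcode-level rows: each postcode carries a LAD code and a small-area code
nspl = pd.DataFrame({
    "pcd": ["P1", "P2", "P3", "P4"],
    "laua": ["LAD_A", "LAD_A", "LAD_A", "LAD_B"],
    "nuts": ["LAU_1", "LAU_1", "LAU_2", "LAU_3"],
})

# Hypothetical documentation lookup from small areas to NUTS2
lau_to_nuts2 = pd.DataFrame({
    "LAU219CD": ["LAU_1", "LAU_2", "LAU_3"],
    "NUTS218CD": ["N2_X", "N2_X", "N2_Y"],
})

# Step 1: unique LAD / small-area pairs
laua_nuts = nspl[["laua", "nuts"]].drop_duplicates()

# Step 2: attach NUTS2 and collapse to one row per LAD / NUTS2 pair
lad_to_nuts2 = (
    laua_nuts.merge(lau_to_nuts2, left_on="nuts", right_on="LAU219CD")
    [["laua", "NUTS218CD"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Sanity check: no LAD should straddle two NUTS2 regions
assert lad_to_nuts2["laua"].is_unique
print(lad_to_nuts2)
```

The uniqueness check matters because dropping duplicates on the LAD code alone would silently keep an arbitrary NUTS2 region for any LAD that spans more than one.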




##### LADs to NUTS4

In [None]:
nspl = pd.read_csv('../../data/external/nspl/Data/NSPL_AUG_2019_UK.csv')

In [None]:
#Get the LAD to NUTS matches
laua_nuts = nspl.drop_duplicates('nuts').reset_index(drop=True)[['laua','nuts']]

In [None]:
laua_nuts.head()

It is a one-to-many concordance: each LAD maps to several NUTS4 areas.

##### NUTS4 to NUTS2

In [None]:
lad_nuts_lookup = pd.read_csv('../../data/external/nspl/Documents/LAU219_LAU119_NUTS18_MAY_2019_UK_LU.csv')

In [None]:
new_lookup = pd.merge(laua_nuts,lad_nuts_lookup,left_on='nuts',right_on='LAU219CD')[['laua','LAU119NM','NUTS218CD','NUTS218NM']]

In [None]:
new_lookup = new_lookup.drop_duplicates('laua').reset_index(drop=True)[['laua','NUTS218CD','NUTS218NM']]

new_lookup.head()

##### LADs to NUTS2 (in the migration file)

In [None]:
migration_w_nuts = migration_rearranged.merge(new_lookup,left_on='area_code',right_on='laua')

In [None]:
migration_w_nuts.head()

In [None]:
set(new_lookup['laua'])-set(migration_rearranged['area_code'])

All matched! Every LAD in the lookup has a counterpart in the migration file.
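The set difference above checks one direction only. Because the merge is an inner join, migration rows whose code is missing from the lookup are dropped silently, which is presumably what we want for regional totals but worth confirming. A minimal sketch of a two-way check with `indicator=True`, on made-up codes:

```python
import pandas as pd

# Toy tables (hypothetical codes); REGION_1 stands in for a non-LAD row
migration = pd.DataFrame({"area_code": ["LAD_A", "LAD_B", "REGION_1"], "value": [1, 2, 3]})
lookup = pd.DataFrame({"laua": ["LAD_A", "LAD_B"], "NUTS218CD": ["N2_X", "N2_Y"]})

# An outer merge keeps unmatched rows from both sides; '_merge' records where each row came from
checked = migration.merge(lookup, left_on="area_code", right_on="laua",
                          how="outer", indicator=True)

only_in_migration = checked.loc[checked["_merge"] == "left_only", "area_code"].tolist()
only_in_lookup = checked.loc[checked["_merge"] == "right_only", "laua"].tolist()

print("Dropped by an inner merge:", only_in_migration)
print("Lookup codes without data:", only_in_lookup)
```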

##### Regroup LADs into NUTS2

In [None]:
migration_regrouped = migration_w_nuts.groupby(
    ['NUTS218NM','NUTS218CD','direction','variable','period'])[['population_estimate','value']].sum().reset_index(drop=False)

In [None]:
migration_regrouped.rename(columns={'NUTS218NM':'nuts_name','NUTS218CD':'nuts_code'},inplace=True)

In [None]:
migration_wide = migration_regrouped.pivot_table(index=['nuts_name','nuts_code','variable','period','population_estimate'],
                               columns='direction',values='value').reset_index(drop=False)

In [None]:
# Net migration (inflow minus outflow) as a share of the population estimate
migration_wide['net'] = (migration_wide['inflow']-migration_wide['outflow'])/migration_wide['population_estimate']

In [None]:
migration_wide.head()
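As a worked example of the pivot and the net calculation, here is a sketch on a single made-up NUTS2 region. The code and figures are hypothetical.

```python
import pandas as pd

# Toy regrouped table (hypothetical values): one region, one variable, one period
regrouped = pd.DataFrame({
    "nuts_code": ["N2_X", "N2_X"],
    "variable": ["internal_migration", "internal_migration"],
    "period": [2017, 2017],
    "population_estimate": [1000, 1000],
    "direction": ["inflow", "outflow"],
    "value": [50, 30],
})

# Spread the two directions into separate columns
wide = regrouped.pivot_table(
    index=["nuts_code", "variable", "period", "population_estimate"],
    columns="direction", values="value").reset_index()

# Net migration as a share of population: (50 - 30) / 1000
wide["net"] = (wide["inflow"] - wide["outflow"]) / wide["population_estimate"]
print(wide)
```

Keeping `population_estimate` in the pivot index only works because the inflow and outflow rows of a region share the same population figure; if they differed, the two directions would land on separate rows with missing values.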

In [None]:
migration_wide.to_csv(f'../../data/processed/migration/{today_str}_migration_nuts.csv')