# Vulnerability Ireland 

The 2016 Census is used to derive the vulnerability index for Ireland. To use this data, the relevant variables must first be identified and then normalised. The method used to do this is described below.

## Census Data

The Central Statistics Office (CSO) has produced a dataset of [small area statistics](https://www.cso.ie/en/census/census2016reports/census2016smallareapopulationstatistics/) for the 2016 Census. This is the main data source for the Irish Vulnerability Assessment.


### R Libraries

The relevant R libraries are loaded into the kernel:

In [1]:
# load the libraries
library(tidyverse)

Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.1.1       v purrr   0.3.2  
v tibble  2.1.1       v dplyr   0.8.0.1
v tidyr   0.8.3       v stringr 1.4.0  
v readr   1.3.1       v forcats 0.4.0  
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()


### Import the csv data

The data can be imported directly from the CSO website (this is the default) or using a local version.

In [3]:
#get the data from the CSO website
smallAreaCSOData <- read.csv("https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS2016_SA2017.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)

#get the data locally
# smallAreaCSOData <- read.csv("../../CSOData/SAPS2016_SA2017.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)
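
The choice between the remote and local copy could also be made automatically rather than by commenting lines in and out. The following is a minimal sketch only: `chooseCsvSource` is a hypothetical helper that is not part of this notebook, the URL is a placeholder, and a temporary file with made-up values stands in for the local copy.

```r
# Hypothetical helper: return the local path if the file exists,
# otherwise fall back to the remote URL
chooseCsvSource <- function(localPath, remoteUrl) {
  if (file.exists(localPath)) localPath else remoteUrl
}

# A temporary file with made-up values stands in for a local copy
tmpFile <- tempfile(fileext = ".csv")
write.csv(data.frame(GUID = "a", T1_1AGETT = 10), tmpFile, row.names = FALSE)

# Placeholder URL, used here only to show the fallback
placeholderUrl <- "https://example.org/data.csv"

source1 <- chooseCsvSource(tmpFile, placeholderUrl)            # local copy found
source2 <- chooseCsvSource("no/such/file.csv", placeholderUrl) # falls back to URL

# read.csv accepts either a file path or a URL
toyCsv <- read.csv(source1, header = TRUE, sep = ",", stringsAsFactors = FALSE)
print(toyCsv)
```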


## Select only the relevant data

In total there are 799 different variables in the small area dataset. However, only a small subset is useful for our purposes. We therefore need to extract the relevant variables, then combine them to create our vulnerability indicators.

The dataset also includes data that is at the persons level (number of people in a small area) and the household level (number of households in a small area). As the preprocessing is slightly different for each, they are treated differently below.
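
Each indicator follows the same pattern: select a set of census columns, then sum them row by row to give one value per small area. As an illustration only, the pattern could be wrapped in a helper. `sumIndicator` is a hypothetical name that does not appear in this notebook, and the toy values are made up; the column names are the two health variables used later.

```r
# Hypothetical helper (not part of the original notebook): sum a set of
# census columns row-wise and return a one-column data frame
sumIndicator <- function(data, variables, indicatorName) {
  # drop = FALSE keeps a data frame even when only one column is selected
  selected <- data[, variables, drop = FALSE]
  result <- data.frame(rowSums(selected))
  names(result) <- indicatorName
  result
}

# Toy data standing in for smallAreaCSOData (values are made up)
toyData <- data.frame(
  GUID = c("a", "b", "c"),
  T12_3_BT = c(4, 2, 5),
  T12_3_VBT = c(2, 1, 0),
  stringsAsFactors = FALSE
)

toyHealth <- sumIndicator(toyData, c("T12_3_BT", "T12_3_VBT"), "poorHealth")
print(toyHealth)
```

`rowSums` gives the same result as `apply(x, 1, sum)` for numeric columns and is usually faster on large tables.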

### Small Area ID

First, we need to get the unique ID data for each of the small areas:

In [4]:
smallAreaID <- smallAreaCSOData[, c('GUID'), drop = FALSE]

### Persons Level Data

We then get the persons level data and combine the variables together to create indicators:

In [5]:
#PERSONS DATA

#POPULATION TOTAL
populationTotalData <- smallAreaCSOData[, 'T1_1AGETT', drop = FALSE]
names(populationTotalData)[1] <- 'populationTotal'

#AGE - YOUNG 
ageYoungVariables <- c(
    'T1_1AGE0T', #Age 0 - Total
    'T1_1AGE1T', #Age 1 - Total
    'T1_1AGE2T', #Age 2 - Total
    'T1_1AGE3T', #Age 3 - Total
    'T1_1AGE4T', #Age 4 - Total
    'T1_1AGE5T'  #Age 5 - Total
)

ageYoungData <- smallAreaCSOData[,ageYoungVariables, drop = FALSE]
ageYoungData$young <- apply(ageYoungData,1,sum)
ageYoungData <- select(ageYoungData, 'young')

#AGE - OLD 
ageOldVariables <- c(
    #'T1_1AGE65_69T', #Age 65 - 69 - Total - Scottish Flood Disadvantage uses over 75
    #'T1_1AGE70_74T', #Age 70 - 74 - Total - Scottish Flood Disadvantage uses over 75
    'T1_1AGE75_79T', #Age 75 - 79 - Total
    'T1_1AGE80_84T', #Age 80 - 84 - Total
    'T1_1AGEGE_85T'  #Age 85 and over - Total
)
ageOldData <- smallAreaCSOData[, ageOldVariables, drop = FALSE]
ageOldData$old <- apply(ageOldData,1,sum)
ageOldData <- select(ageOldData, 'old')

#PRIMARY SCHOOL AGE
primarySchoolAgeVariables <- c(
    'T1_1AGE4T',  #Age 4 - Total
    'T1_1AGE5T',  #Age 5 - Total
    'T1_1AGE6T',  #Age 6 - Total
    'T1_1AGE7T',  #Age 7 - Total
    'T1_1AGE8T',  #Age 8 - Total
    'T1_1AGE9T',  #Age 9 - Total
    'T1_1AGE10T', #Age 10 - Total
    'T1_1AGE11T', #Age 11 - Total
    'T1_1AGE12T'  #Age 12 - Total
)

primarySchoolAgeData <- smallAreaCSOData[, primarySchoolAgeVariables, drop = FALSE]
primarySchoolAgeData$priSch <- apply(primarySchoolAgeData,1,sum)
primarySchoolAgeData <- select(primarySchoolAgeData, 'priSch')


#HEALTH -  BAD HEALTH (Choice of: Very good, Good, Fair, Bad, Very bad, and Not stated) 
healthVariables <- c(
    'T12_3_BT', #Bad - Total
    'T12_3_VBT' #Very bad - Total
)
healthData <- smallAreaCSOData[, healthVariables, drop = FALSE]
healthData$poorHealth <- apply(healthData,1,sum)
healthData <- select(healthData, 'poorHealth')

#DISABILITY 
disabilitiesData <- smallAreaCSOData[, 'T12_1_T', drop = FALSE] 
disabilitiesData$disability <- apply(disabilitiesData,1,sum)
disabilitiesData <- select(disabilitiesData, 'disability')

#UNEMPLOYMENT 
unemploymentVariables <- c(
    'T8_1_LFFJT',   #Looking for first regular job - Total
    'T8_1_ULGUPJT', #Unemployed having lost or given up previous job - Total
    'T8_1_UTWSDT',  #Unable to work due to permanent sickness or disability - Total - MAY CORRELATE WITH HEALTH TOO MUCH
    'T8_1_LAHFT'    #Looking after home/family - Total - NOT SURE ABOUT THIS ONE
)

unemploymentData <- smallAreaCSOData[, unemploymentVariables, drop = FALSE]
unemploymentData$unemploy <- apply(unemploymentData,1,sum)
unemploymentData <- select(unemploymentData, 'unemploy')

#LOW SKILLED EMPLOYMENT
lowSkilledEmploymentVariables <- c(
    'T9_2_PE', #E Manual skilled (No. of persons)
    'T9_2_PF', #F Semi-skilled (No. of persons)
    'T9_2_PG'  #G Unskilled (No. of persons)
)

lowSkilledEmploymentData <- smallAreaCSOData[, lowSkilledEmploymentVariables, drop = FALSE]
lowSkilledEmploymentData$lowSkill <- apply(lowSkilledEmploymentData,1,sum)
lowSkilledEmploymentData <- select(lowSkilledEmploymentData, 'lowSkill')

#FARMERS
farmingEmploymentVariables <- c(
    'T9_2_PI', #I Farmers (No. of persons)
    'T9_2_PJ'  #J Agricultural workers (No. of persons)
)

farmingEmploymentData <- smallAreaCSOData[, farmingEmploymentVariables, drop = FALSE]
farmingEmploymentData$farming<- apply(farmingEmploymentData,1,sum)
farmingEmploymentData <- select(farmingEmploymentData, 'farming')


#TENURE - Permanent private households by type of occupancy 
rentVariables <- c(
    'T6_3_RPLP',  #Rented from private landlord (No. of persons) 
    'T6_3_RLAP',  #Rented from Local Authority (No. of persons)
    'T6_3_RVCHBP' #Rented from voluntary/co-operative housing body (No. of persons)
)

rentData <- smallAreaCSOData[, rentVariables, drop = FALSE]
rentData$rent <- apply(rentData,1,sum)
rentData <- select(rentData, 'rent')

#EDUCATION 
educationVariables <- c(
    'T10_4_NFT' #No formal education - Total
#     'T10_4_PT'   #Primary education - Total
)

educationData <- smallAreaCSOData[, educationVariables, drop = FALSE]
educationData$education <- apply(educationData,1,sum)
educationData <- select(educationData, 'education')

#ENGLISH ABILITY - Speakers of foreign languages by ability to speak English
englishVariables <- c(
    'T2_6NW', #Not well
    'T2_6NAA' #Not at all
)

englishData <- smallAreaCSOData[, englishVariables, drop = FALSE] 
englishData$engLang <- apply(englishData,1,sum)
englishData <- select(englishData, 'engLang')

#NEW RESIDENTS - Usually resident population aged 1 year and over by usual residence 1 year before Census Day
newResidentsVariables <- c(
    'T2_3EI', #Elsewhere in Ireland
    'T2_3OI'  #Outside Ireland
)

newResidentsData <- smallAreaCSOData[, newResidentsVariables, drop = FALSE] 
newResidentsData$newRes <- apply(newResidentsData,1,sum)
newResidentsData <- select(newResidentsData, 'newRes')

#TRAVEL TIME - Population aged 5 years and over by journey time to work, school or college 
travelTimeVariables <- c(
    'T11_3_D5', #1 hour - under 1 1/2 hours
    'T11_3_D6'  #1 1/2 hours and over
)

travelTimeData <- smallAreaCSOData[, travelTimeVariables, drop = FALSE] 
travelTimeData$travelTime <- apply(travelTimeData,1,sum)
travelTimeData <- select(travelTimeData, 'travelTime')

#combine all the data into one table
personsData <- cbind(smallAreaID,
                     populationTotalData,
                     ageYoungData,
                     ageOldData,
                     primarySchoolAgeData,
                     healthData,
                     disabilitiesData,
                     unemploymentData,
                     lowSkilledEmploymentData,
                     farmingEmploymentData,
                     rentData,
                     educationData,
                     englishData,
                     newResidentsData,
                     travelTimeData
                    )

#get the number of columns in the data
personsDataColLength = ncol(personsData)

head(personsData)

#output the data as a csv
write.csv(personsData, "../1_InputData/1a_CensusData/persons/personsSmallAreaRawData2016.csv", row.names = FALSE)

GUID,populationTotal,young,old,priSch,poorHealth,disability,unemploy,lowSkill,farming,rent,education,engLang,newRes,travelTime
4c07d11e-11d3-851d-e053-ca3ca8c0ca7f,395,37,14,70,6,51,53,87,59,14,1,1,4,33
4c07d11e-123a-851d-e053-ca3ca8c0ca7f,344,32,22,37,3,49,47,61,38,25,1,1,7,17
4c07d11e-14b1-851d-e053-ca3ca8c0ca7f,405,45,8,74,5,52,79,62,7,89,3,8,5,25
4c07d11e-14b2-851d-e053-ca3ca8c0ca7f,276,13,26,22,3,44,50,55,21,31,0,1,6,18
4c07d11d-f709-851d-e053-ca3ca8c0ca7f,243,20,11,33,3,27,27,72,60,16,1,3,2,17
4c07d11e-1237-851d-e053-ca3ca8c0ca7f,319,22,26,47,6,46,50,79,60,30,6,1,6,20


### Household Level Data

We then get the household level data and combine the variables together to create indicators:

In [6]:
#HOUSEHOLD DATA

#HOUSEHOLDS TOTAL
householdsTotalData <- smallAreaCSOData[, 'T5_1T_H', drop = FALSE] #Total households (No. of households)
names(householdsTotalData)[1] <- 'householdsTotal'


#NO HEATING - Permanent private households by central heating - Households
noHeatingData <- smallAreaCSOData[, 'T6_5_NCH', drop = FALSE]  #No central heating
noHeatingData$noHeating <- apply(noHeatingData,1,sum)
noHeatingData <- select(noHeatingData, 'noHeating')

#YEAR PROPERTY BUILT - Permanent private households by year built (Pre 1919, 1919-1945, 1946-1960, 1961-1970, 
#1971-1980, 1981-1990, 1991-2000, 2001-2010, 2011 or Later, Not stated)

yearBuiltVariables <- c(
    'T6_2_PRE19H', #Pre 1919 (No. of households)
    'T6_2_19_45H'  #1919 - 1945 (No. of households)
)

yearBuiltData <- smallAreaCSOData[, yearBuiltVariables, drop = FALSE]
yearBuiltData$yearBuilt <- apply(yearBuiltData,1,sum)
yearBuiltData <- select(yearBuiltData, 'yearBuilt')


#CARAVAN/MOBILE HOME (House/Bungalow, Flat/Apartment Bed-Sit, Caravan/Mobile home, Not stated)
mobileHomeData <- smallAreaCSOData[, 'T6_1_CM_H', drop = FALSE] #Caravan/Mobile home (No. of households)
mobileHomeData$mobHome <- apply(mobileHomeData,1,sum)
mobileHomeData <- select(mobileHomeData, 'mobHome')


#ONE PARENT HOUSEHOLDS
oneParentVariables <- c(
    'T5_1OPFC_H', #One parent family (father) with children households (No. of households)
    'T5_1OPMC_H', #One parent family (mother) and children households (No. of households)
    'T5_1OPFCO_H',#One parent family (father) with children and others households (No. of households)
    'T5_1OPMCO_H' #One parent family (mother) with children and others households (No. of households)
)

oneParentData <- smallAreaCSOData[, oneParentVariables, drop = FALSE]
oneParentData$oneParent <- apply(oneParentData,1,sum)
oneParentData <- select(oneParentData, 'oneParent')

#ONE PERSON HOUSEHOLDS
onePersonData <- smallAreaCSOData[, 'T5_1OP_H', drop = FALSE] #One person households (No. of households)
onePersonData$onePerson <- apply(onePersonData,1,sum)
onePersonData <- select(onePersonData, 'onePerson')

#CAR OWNERSHIP
noCarData <- smallAreaCSOData[, 'T15_1_NC', drop = FALSE] #No motor car (No. of households)
noCarData$noCar <- apply(noCarData,1,sum)
noCarData <- select(noCarData, 'noCar')


#NO INTERNET
noInternetData <- smallAreaCSOData[, 'T15_3_N', drop = FALSE] #No internet (No. of households)
noInternetData$noInternet <- apply(noInternetData,1,sum)
noInternetData <- select(noInternetData, 'noInternet')

#WATER SUPPLY - private water supplies at risk of disease due to reduced quality control - *BIG ASSUMPTION*
waterSupplyVariables <- c(
    'T6_6_GSP', #Group scheme with private source
    'T6_6_OP'   #Other private source
)

waterSupplyData <- smallAreaCSOData[, waterSupplyVariables, drop = FALSE]
waterSupplyData$priWater <- apply(waterSupplyData,1,sum)
waterSupplyData <- select(waterSupplyData, 'priWater')


#combine all the data into one table
householdData <- cbind(smallAreaID,
                       householdsTotalData,
                       noHeatingData,
                       yearBuiltData,
                       mobileHomeData,
                       oneParentData,
                       onePersonData,
                       noCarData,
                       noInternetData,
                       waterSupplyData
                    )
#inspect the table
head(householdData)

#get the number of columns in the data
householdDataColLength = ncol(householdData)

#output the data as a csv
write.csv(householdData, "../1_InputData/1a_CensusData/household/householdSmallAreaRawData2016.csv", row.names = FALSE)

GUID,householdsTotal,noHeating,yearBuilt,mobHome,oneParent,onePerson,noCar,noInternet,priWater
4c07d11e-11d3-851d-e053-ca3ca8c0ca7f,129,1,27,1,14,23,5,23,113
4c07d11e-123a-851d-e053-ca3ca8c0ca7f,114,2,22,0,6,19,7,23,85
4c07d11e-14b1-851d-e053-ca3ca8c0ca7f,139,0,10,1,22,20,7,16,10
4c07d11e-14b2-851d-e053-ca3ca8c0ca7f,103,0,19,0,6,20,1,15,42
4c07d11d-f709-851d-e053-ca3ca8c0ca7f,83,2,21,0,9,20,1,21,61
4c07d11e-1237-851d-e053-ca3ca8c0ca7f,114,1,32,0,8,23,3,27,29


## Percentages

The raw data is not suitable for use within the vulnerability assessment. It needs to be normalised based on the number of people/households within each small area. Therefore, the data is converted to percentages based on the total persons/households within each small area.
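
As a small worked example of this normalisation, the sketch below uses made-up counts for two small areas, laid out like the persons table (an ID column, a total column, then indicator counts). The `toy` names are illustrative and not part of the notebook.

```r
# Toy counts (made-up values): an ID column, a total column,
# then indicator counts
toyPersons <- data.frame(
  GUID = c("a", "b"),
  populationTotal = c(200, 400),
  young = c(20, 30),
  old = c(10, 60),
  stringsAsFactors = FALSE
)

# Every column except the ID and the total is an indicator count
indicatorCols <- setdiff(names(toyPersons), c("GUID", "populationTotal"))

# Divide each indicator count by the area total and multiply by 100
toyPct <- toyPersons["GUID"]
for (col in indicatorCols) {
  toyPct[[paste0(col, "_pct")]] <- toyPersons[[col]] / toyPersons$populationTotal * 100
}
print(toyPct)
```

Area "a" has 20 young people out of 200, giving 10%; area "b" has 30 out of 400, giving 7.5%. Selecting the indicator columns by name rather than by position is a design choice of this sketch, not of the notebook.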

### Persons Percentages

In [7]:
#PERSONS DATA

#Copy the data
personsDataPct <- personsData

#Calculate the percentages for each of the relevant columns - starting at the 3rd column
for(col in names(personsDataPct)[3:personsDataColLength]) {
  personsDataPct[paste0(col, "_pct")] = (personsDataPct[col] / personsDataPct$populationTotal)*100
}

#remove the original data to leave only the percentages
personsDataPct <- personsDataPct[-c(2:personsDataColLength)]
# head(personsDataPct)

#output the data as a csv
write.csv(personsDataPct, "../1_InputData/1a_CensusData/persons/personsSmallAreaPctData2016.csv", row.names = FALSE)

### Household Percentages

In [8]:
#HOUSEHOLD DATA

#Copy the data
householdDataPct <- householdData

#Calculate the percentages for each of the relevant columns - starting at the 3rd column
for(col in names(householdDataPct)[3:ncol(householdDataPct)]) {
  householdDataPct[paste0(col, "_pct")] = (householdDataPct[col] / householdDataPct$householdsTotal)*100
}

#remove the original data to leave only the percentages
householdDataPct <- householdDataPct[-c(2:householdDataColLength)]
# head(householdDataPct)

#output the data as a csv
write.csv(householdDataPct, "../1_InputData/1a_CensusData/household/householdSmallAreaNormalisedData2016.csv", row.names = FALSE)

## Z-Scores

The percentages are still not directly comparable across indicators, because each indicator has a different mean and spread. Therefore, the data is standardised by converting it to z-scores. Z-scores are:

>"A statistical measurement of a score's relationship to the mean (average value) in a group of scores. A Z-score of 0 means the score is the same as the mean (average value). A Z-score can be positive or negative, indicating whether it is above or below the mean and by how many standard deviations. Z-score standardisation represents the deviation of a raw score from its mean in standard deviation units." (Kazmierczak et al., 2015)
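
To make the definition concrete, the sketch below computes z-scores by hand for five made-up percentages and checks them against `scale()`, the base R function used in the cells that follow. The variable names are illustrative only.

```r
# Made-up percentages for five small areas
pct <- c(2, 4, 6, 8, 10)

# Manual z-score: subtract the mean, divide by the sample standard deviation
zManual <- (pct - mean(pct)) / sd(pct)

# scale() centres and scales by default; it returns a one-column matrix,
# so as.numeric() flattens it to a plain vector
zScale <- as.numeric(scale(pct))

print(round(zManual, 3))
print(all.equal(zManual, zScale))
```

After standardisation each indicator has a mean of 0 and a standard deviation of 1, so indicators measured on different scales can be compared or combined.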

### Persons Z-scores

In [9]:
#PERSONS DATA

#Copy the data
personsDataZ <- personsDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(personsDataZ)[2:ncol(personsDataZ)]) {
  personsDataZ[paste0(col, "_Z")] = scale(personsDataZ[col])
}


#remove the original data to leave only the z scores
personsDataZ <- personsDataZ[-c(2:ncol(personsDataPct))]
# summary(personsDataZ)
# head(personsDataZ)

#output the data as a csv
write.csv(personsDataZ, "../1_InputData/1a_CensusData/persons/personsSmallAreaZData2016.csv", row.names = FALSE)

### Household Z-scores

In [10]:
#HOUSEHOLD DATA

#Copy the data
householdDataZ <- householdDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(householdDataZ)[2:ncol(householdDataZ)]) {
  householdDataZ[paste0(col, "_Z")] = scale(householdDataZ[col])
}

#remove the original data to leave only the z scores
householdDataZ <- householdDataZ[-c(2:ncol(householdDataPct))]
# summary(householdDataZ)
# head(householdDataZ)

#output the data as a csv
write.csv(householdDataZ, "../1_InputData/1a_CensusData/household/householdSmallAreaZData2016.csv", row.names = FALSE)

## Combine Data

The persons level and household level data are then combined, and the raw, percentage and z-score versions are each written to a CSV file:

In [11]:
#Combine the RAW persons and household data
personsHouseholdDataCombined <- cbind(personsData,
                                       householdData[2:ncol(householdData)])

#output the data as a csv
write.csv(personsHouseholdDataCombined, "../1_InputData/1a_CensusData/censusData.csv", row.names = FALSE)

#Combine the % persons and household data
personsHouseholdPctDataCombined <- cbind(personsDataPct,
                                       householdDataPct[2:ncol(householdDataPct)])

#output the data as a csv
write.csv(personsHouseholdPctDataCombined, "../1_InputData/1a_CensusData/censusDataPercent.csv", row.names = FALSE)


#Combine the Z-score persons and household data
personsHouseholdZDataCombined <- cbind(personsDataZ,
                                       householdDataZ[2:ncol(householdDataZ)])

names(personsHouseholdZDataCombined) <- gsub("_pct_Z","",names(personsHouseholdZDataCombined))

#output the data as a csv
write.csv(personsHouseholdZDataCombined, "../1_InputData/1a_CensusData/censusDataZ.csv", row.names = FALSE)
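
`cbind()` pairs rows by position. That is safe here because both tables are derived from the same `smallAreaCSOData` in the same row order. If the two tables were ever loaded or sorted separately, joining on `GUID` would be more robust. The sketch below uses made-up values and illustrative `toy` names to show the difference.

```r
# Toy tables (made-up values) sharing a GUID key, deliberately in a
# different row order
toyPersonsZ <- data.frame(GUID = c("a", "b", "c"), young = c(-1, 0, 1),
                          stringsAsFactors = FALSE)
toyHouseholdZ <- data.frame(GUID = c("c", "a", "b"), noCar = c(0.5, -0.5, 0),
                            stringsAsFactors = FALSE)

# cbind() would pair rows by position, so check that the keys line up first
keysAligned <- identical(toyPersonsZ$GUID, toyHouseholdZ$GUID)

# merge() pairs rows by the GUID value regardless of row order
toyCombined <- merge(toyPersonsZ, toyHouseholdZ, by = "GUID")

print(keysAligned)
print(toyCombined)
```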


**END**