# PDA data science - SIMD 2020
<div class="alert alert-block alert-info"> 
    Notebook 7: by michael.ferrie@edinburghcollege.ac.uk <br> Edinburgh College, April 2022
</div>

#### Scottish Index of Multiple Deprivation 2020

[The Scottish Index of Multiple Deprivation](https://www.gov.scot/collections/scottish-index-of-multiple-deprivation-2020/?utm_source=redirect&utm_medium=shorturl&utm_campaign=simd) is a relative measure of deprivation across 6,976 small areas (called data zones). If an area is identified as ‘deprived’, this can relate to people having a low income but it can also mean fewer resources or opportunities. SIMD looks at the extent to which an area is deprived across seven domains: income, employment, education, health, access to services, crime and housing.

SIMD is the Scottish Government's standard approach to identify areas of multiple deprivation in Scotland. It can help improve understanding about the outcomes and circumstances of people living in the most deprived areas in Scotland. It can also allow effective targeting of policies and funding where the aim is to wholly or partly tackle or take account of area concentrations of multiple deprivation. Have a look at this [comic](https://www.gov.scot/publications/simd-illustrated-story-a-place-in-time/) about the Levenmouth area in Fife.

SIMD ranks data zones from most deprived (ranked 1) to least deprived (ranked 6,976). People using SIMD will often focus on the data zones below a certain rank, for example, the 5%, 10%, 15% or 20% most deprived data zones in Scotland. There are some good [visualisations and insights](https://www.gov.scot/publications/scottish-index-multiple-deprivation-2020/pages/5/) available to us already.
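The percentage bands mentioned above can be turned into rank cut-offs with a little arithmetic. The following is a minimal sketch only: the helper name `rank_cutoff` is hypothetical, and it assumes a cut-off is rounded down to a whole rank, which may differ slightly from how the official bands are assigned.

```python
import math

N_DATA_ZONES = 6976  # number of data zones ranked by SIMD 2020


def rank_cutoff(percent, n_zones=N_DATA_ZONES):
    """Return the highest rank inside the `percent` most deprived data zones.

    Assumption: the cut-off is rounded down to a whole rank.
    """
    return math.floor(n_zones * percent / 100)


# Cut-offs for the bands people commonly focus on.
cutoffs = {p: rank_cutoff(p) for p in (5, 10, 15, 20)}
for percent, rank in cutoffs.items():
    print(f"{percent}% most deprived: ranks 1 to {rank}")
```

Under that assumption, a data zone with an overall rank of 1,395 or lower would fall in the 20% most deprived.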

In this notebook we will use the open data provided by the Scottish Government for analysis. This is a skills integration task and a useful exercise in working with a much larger data set. As in previous notebooks, we will use data science libraries such as `pandas` to aid our analysis.

Before beginning, have a look at this [interactive map](https://simd.scot/#/simd2020/BTTTFTT/9/-4.0000/55.9000/). It is a good way to represent the data and make it accessible to the public.

#### Data file for analysis

Download the CSV file from Moodle and import it into pandas to create a new dataframe. Before beginning, read the summary of each column in the data file. The following notes are provided with the data.

##### Notes

* SIMD 2020 was published on 28 January 2020.
* A revision was published in April 2020 due to an error in the income ranks provided by DWP. 
* There was some impact on the overall SIMD ranks, however these changes were not substantive on the whole. 
* A summary of the impact can be found on the SIMD website.
* For more information, guidance and technical notes please go to [www.gov.scot/SIMD](https://www.gov.scot/SIMD).

Contact: Elizabeth Fraser, 0131 244 7714, simd@gov.scot

#### Metadata

The following is a summary of the data in the file. DZ stands for Data Zone.

|Column|Description|
|-----|-------| 
|DZ|2011 data zone code|
|DZname|2011 data zone name|
|SIMD2020v2_Rank|Overall SIMD 2020v2 rank - 1 is most deprived, 6,976 is least deprived|
|SIMD2020v2_Vigintile|SIMD 2020v2 5% band - 1 is most deprived, 20 is least deprived|
|SIMD2020v2_Decile|SIMD 2020v2 10% band - 1 is most deprived, 10 is least deprived|
|SIMD2020v2_Quintile|SIMD 2020v2 20% band - 1 is most deprived, 5 is least deprived|
|SIMD2020v2_Income_Domain_Rank|SIMD 2020v2 income domain rank|
|SIMD2020_Employment_Domain_Rank|SIMD 2020 employment domain rank|
|SIMD2020_Education_Domain_Rank|SIMD 2020 education domain rank|
|SIMD2020_Health_Domain_Rank|SIMD 2020 health domain rank|
|SIMD2020_Access_Domain_Rank|SIMD 2020 access domain rank|
|SIMD2020_Crime_Domain_Rank|SIMD 2020 crime domain rank|
|SIMD2020_Housing_Domain_Rank|SIMD 2020 housing domain rank|
|Population|Small area population estimates 2017|
|Working_Age_Population|Working age population based on 2017 small area population estimates|
|URclass|2016 Scottish Government 6-fold urban rural classification|
|URname|2016 Scottish Government 6-fold urban rural classification name|
|IZcode|2011 intermediate zone code|
|IZname|2011 intermediate zone name|
|LAcode|Local authority code|
|LAname|Local authority name|
|HBcode|Health board code|
|HBname|Health board name|
|MMWcode|Multi-member ward code|
|MMWname|Multi-member ward name|
|SPCcode|Scottish parliamentary constituency code|
|SPCname|Scottish parliamentary constituency name|
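The band columns in the table make it straightforward to pick out the most deprived data zones without working out rank cut-offs by hand. The sketch below uses a tiny made-up dataframe (the zone codes and rank values are invented for illustration and are not real SIMD figures) with column names taken from the table above.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real SIMD values.
toy = pd.DataFrame(
    {
        "DZ": ["zone_a", "zone_b", "zone_c", "zone_d"],
        "SIMD2020v2_Rank": [120, 3500, 6900, 900],
        "SIMD2020v2_Quintile": [1, 3, 5, 1],
    }
)

# Quintile 1 is the 20% most deprived band.
most_deprived = toy[toy["SIMD2020v2_Quintile"] == 1]

# Sort so the most deprived zone (lowest rank) comes first.
most_deprived = most_deprived.sort_values("SIMD2020v2_Rank")
print(most_deprived[["DZ", "SIMD2020v2_Rank"]])
```

The same pattern should work with `SIMD2020v2_Decile` or `SIMD2020v2_Vigintile` when a narrower band is needed.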

##### Import the datafile for analysis

Add your code to import the data file and create a pandas dataframe from it, then use `describe()` to provide a summary of the data.
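As a rough illustration of the pattern, the sketch below reads a two-row stand-in for the CSV from memory and summarises it. The rows, the zone codes and the council name are made up; with the real download you would pass the path of the file to `pd.read_csv` instead of the in-memory text.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: two made-up rows using column names
# from the metadata table. With the real file, pass its path to read_csv.
sample_csv = io.StringIO(
    "DZ,LAname,Population,Working_Age_Population\n"
    "zone_a,Example Council,800,500\n"
    "zone_b,Example Council,1000,650\n"
)

df = pd.read_csv(sample_csv)

# describe() summarises the numeric columns
# (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```

Note that `describe()` skips text columns such as `DZ` and `LAname` by default, so only the numeric columns appear in the summary.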

In [None]:
# import pandas here


# import datafile as df here (use tab to autocomplete)


# use describe on file here


# Questions

##### According to the data, what is the total population in the Aberdeen City Local Authority area?

In [None]:
# answer in this cell


##### According to the data, what is the total working age population in the Aberdeen City Local Authority area, and what percentage is this of the total population?

In [None]:
# answer in this cell


##### According to the data, what are the total population and the total working age population in the City of Edinburgh Local Authority area, and what percentage of the total population is of working age?

In [None]:
# answer in this cell


##### Create a bar chart with 4 bars: Aberdeen City LA (local authority) total population, Aberdeen City LA working age population, City of Edinburgh LA total population and City of Edinburgh LA working age population. Make the Aberdeen bars two different shades of blue and the Edinburgh bars two different shades of red, and add axis labels and a main title to the chart.

In [None]:
# answer in this cell


##### According to the data, which data zone (DZ) is the most populated and which is the least populated, and what is the mean population across all data zones?

In [None]:
# answer in this cell


##### There are 6 types of [urban rural classification](https://www.gov.scot/publications/scottish-government-urban-rural-classification-2016/pages/2/). Create a pie chart showing the split between these for all entries in the dataset, and add labels and a main title to the chart.

In [None]:
# answer in this cell


Employment, housing, health, crime and education are composite metrics; the indicators behind each are listed in [this spreadsheet](https://www.gov.scot/publications/scottish-index-of-multiple-deprivation-2020v2-indicator-data/). To determine each one, several indicators are considered and a score is calculated. In the case of education, the following variables are used to calculate the domain rank: Attendance, Attainment, no_qualifications, not_participating and University attendance. The result is called a domain rank. For all ranks, [1 is most deprived and 6,976 is least deprived](https://www.gov.scot/publications/scottish-index-of-multiple-deprivation-2020v2-ranks/).
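Because rank 1 is the most deprived, a lower average domain rank points to more deprivation in that domain. The sketch below shows one way to compare groups on that basis, using a made-up dataframe; the area names and rank values are hypothetical and are not real SIMD figures.

```python
import pandas as pd

# Made-up ranks for two hypothetical areas; not real SIMD values.
toy = pd.DataFrame(
    {
        "LAname": ["Area X", "Area X", "Area Y", "Area Y"],
        "SIMD2020_Education_Domain_Rank": [1000, 3000, 5000, 6000],
    }
)

# Mean education domain rank for each area.
mean_rank = toy.groupby("LAname")["SIMD2020_Education_Domain_Rank"].mean()
print(mean_rank)

# Rank 1 is most deprived, so the lower mean indicates
# more education deprivation on average.
more_deprived = mean_rank.idxmin()
print(more_deprived)
```

A mean of ranks is only a rough summary, since ranks are ordinal, but it is a convenient first comparison between areas.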

##### For this question we will look at the education domain ranking in the four major cities of Scotland. Using the local authorities Dundee City, Aberdeen City, City of Edinburgh and Glasgow City, calculate the mean SIMD2020_Education_Domain_Rank for each. Then plot the mean values for each city side by side on a bar chart with labels, giving each bar a different colour.

In [None]:
# answer in this cell


##### Adapt the bar chart from the previous question into a multiple bar chart: for each city, add a second bar showing the mean value of the SIMD2020_Health_Domain_Rank.

In [None]:
# answer in this cell


##### For each city, calculate the correlation between the health domain rank and the education domain rank across its data zones.

In [None]:
# answer in this cell


##### There is another ranking called SIMD2020v2_Income_Domain_Rank. Write the code to find the 10 least deprived data zones (DZs) in Scotland in terms of income.

In [None]:
# answer in this cell
