# Matrimonial Matters County & UA Level Annual

## Contents
#### Setup
1. [import_packages](#import_packages) 
2. [define_key_variables](#define_key_variables) 

#### Stage 1 - [Divorce County and UA Lookup](#Divorce_County_and_UA_lookup)
3. [ons_postcode_table](#ons_postcode_table) - imports the ons postcode directory file saved in the S3 bucket
4. [la_districts_table](#la_districts_table) - imports the local authority districts file saved in the S3 bucket
5. [la_districts_counties_table](#la_districts_counties_table) - imports the local authority districts to counties file saved in the S3 bucket
6. [lookup_working](#lookup_working) - creates a code, la, county, and country lookup by joining both the la_districts_table to la_districts_counties_table
7. [divorce_county_ua_lookup](#divorce_county_ua_lookup) - creates an additional column called 'county_ua' and adds additional information to the county or la strings


#### Stage 2 - [Petitioner Postcode](#Petitioner_Postcode)
8. [petitioner_address_details_table](#petitioner_address_details_table) - imports the petitioner address details saved in the S3 bucket

#### Stage 3 - [Creating Final Output](#Creating_Final_Output)
9. [petitioner_address](#petitioner_address) - selects the columns required and renames the column 'pettnr_contact_details_confdntl_cind' to 'confdntl'
10. [new_divorce_postcode](#new_divorce_postcode) - converts the values in the address, postcode, and confidentiality columns to upper case, renames those columns (the year, quarter, and month columns are left unchanged), and removes any spaces from the 'pettnr_postal_code' column
11. [new_divorce_with_postcode_temp1](#new_divorce_with_postcode_temp1) - searches the columns 'line1' to 'line6' and 'postcode' for postcode patterns (if no pattern is found, the respective column is null)
12. [new_divorce_with_postcode_temp2](#new_divorce_with_postcode_temp2) - reformats the columns 'newpostcode1' to 'newpostcode7', joining each array into a single string
13. [new_divorce_with_postcode](#new_divorce_with_postcode) - creates a column called 'newpostcode' holding the first non-null postcode from the columns 'newpostcode1' to 'newpostcode7' (if there is no postcode available the column is null)
14. [ons_postcode_data](#ons_postcode_data) - removes spaces from postcode data column and filters by country code in ('E92000001','W92000004') to extract England and Wales postcodes
15. [divorce_postcode_ons_match](#divorce_postcode_ons_match) - adds the columns 'PCD' and 'oslaua' from the ons postcode data to the new_divorce_with_postcode table
16. [divorce_postcode_la](#divorce_postcode_la) - adds the columns 'county_ua' and 'country' from the divorce_county_ua_lookup table to the divorce_postcode_ons_match table
17. [divorce_la_c8](#divorce_la_c8) - creates a new column called 'county_ua2' to mark the applicant's postcode if they request their postcode to be confidential, postcodes can also be marked as invalid/foreign/not given, or unknown

    **Confidentiality requested** - has been marked with C8 or Confidential in the address lines and/or KEEP or Y in the confidential column. Note: These addresses must have valid postcodes to be marked as confidential.
    
    **Postcode invalid/not given or foreign** - has a wide range of postcodes with incorrect formatting that can't be picked up by the postcode search and/or cannot be recognised by ONS postcode directory. Note: This can contain addresses marked as confidential, but the postcodes are invalid. This can also contain instances where there are addresses but a postcode was not provided.
    
    **Unknown** - has little to no address information (e.g. the first line of the address) and no available postcode. Note: This group may contain a few invalid postcodes, as no code looks for postcodes in other formats (e.g. with punctuation or with spaces in different positions of the string).

18. [divorce_county](#divorce_county) - renames the 'county_ua2' column to 'county_ua', combines Cornwall UA and Isles of Scilly UA into 'Cornwall & Isles of Scilly', and copies the confidential, invalid/not given or foreign, and unknown markers to the country column
19. [petitioner_summary_la](#petitioner_summary_la) - counts the records for each year, county_ua, and country, keeping only years where 2010 < year < current year


## 1. Import packages and set options 
<a name="import_packages"></a>

In [None]:
import pandas as pd  # a module which provides the data structures and functions to store and manipulate tables in dataframes
import pydbtools as pydb  # A module which allows SQL queries to be run on the Analytical Platform from Python, see https://github.com/moj-analytical-services/pydbtools
import boto3  # allows you to directly create, update, and delete AWS resources from Python scripts
import numpy as np
import re

# sets display options so that dataframes are easier to view
pd.set_option("display.max_columns", 100)
pd.set_option("display.width", 900)
pd.set_option("display.max_colwidth", 200)

## 2. Define key variables to be used throughout the notebook 
<a name="define_key_variables"></a>

In [None]:
#this is the database we will be extracting from
database = "familyman_dev_v3" 

#this is the athena database we will be storing our tables in
fcsq_database = "fcsq"

#this is the s3 bucket we will be saving data to
s3 = boto3.resource("s3")
bucket = s3.Bucket("alpha-family-data")

#setting current year
current_year = 2023

# Stage 1 - Divorce County and UA lookup
<a name="Divorce_County_and_UA_lookup"></a>

## Import ONS Postcode Directory 
<a name="ons_postcode_table"></a>

### Create the ons_postcode table

In [None]:
#imports ONS Postcode Directory data from S3 bucket into a temporary table
ons_postcode_table = pd.read_csv("s3://alpha-family-data/CSVs/Divorce/Petitioner LA/Lookup/ONSPD_NOV_2022_UK.csv", low_memory=False)

In [None]:
pydb.dataframe_to_temp_table(ons_postcode_table, "ons_postcode")

#### ons_postcode validation

In [None]:
#ons_postcode_count = pydb.read_sql_query("SELECT * from __temp__.ons_postcode limit 10")
#ons_postcode_count

## Import Local Authority Districts
<a name="la_districts_table"></a>

### Create the la_districts table

In [None]:
#imports Local Authority Districts data from S3 bucket into a temporary table
la_districts_table = pd.read_csv("s3://alpha-family-data/CSVs/Divorce/Petitioner LA/Lookup/Local_Authority_Districts_(December_2022)_Names_and_Codes_in_the_United_Kingdom.csv")

In [None]:
pydb.dataframe_to_temp_table(la_districts_table, "la_districts")

#### la_districts validation

In [None]:
#la_districts_count = pydb.read_sql_query("SELECT * from __temp__.la_districts LIMIT 10")
#la_districts_count

## Import Local Authority Districts to Counties
<a name="la_districts_counties_table"></a>

### Create the la_districts_counties table

In [None]:
#imports Local Authority Districts to Counties data from S3 bucket into a temporary table
la_districts_counties_table = pd.read_csv("s3://alpha-family-data/CSVs/Divorce/Petitioner LA/Lookup/Local_Authority_District_to_County_(December_2022)_Lookup_in_England.csv")

In [None]:
pydb.dataframe_to_temp_table(la_districts_counties_table, "la_districts_counties")

#### la_districts_counties validation

In [None]:
#la_districts_counties_count = pydb.read_sql_query("SELECT * from __temp__.la_districts_counties LIMIT 10")
#la_districts_counties_count

## Creating Lookup - *Manual change required*

### Create the lookup_working table
<a name="lookup_working"></a>

The column names in the query below must be changed manually, because the column names in both la_districts and la_districts_counties change each year.

In [None]:
#Creates a code, la, county, and country lookup
#The column names in both la_districts and la_districts_counties carry a two-digit year (e.g. LAD22CD, CTY22NM) that changes with each release

#Remember to check that the column names below match both the la_districts and la_districts_counties tables

create_lookup_working =f"""
SELECT 
a.LAD22CD AS code,
a.LAD22NM AS la,
b.CTY22NM AS county,
CASE WHEN a.LAD22CD LIKE 'E%' THEN 'England'
WHEN a.LAD22CD LIKE 'W%' THEN 'Wales'
END AS country
FROM __temp__.la_districts a
LEFT JOIN __temp__.la_districts_counties b
ON a.LAD22CD = b.LAD22CD 
WHERE a.LAD22CD LIKE 'E%' OR a.LAD22CD LIKE 'W%';
"""
pydb.create_temp_table(create_lookup_working,'lookup_working')

In [None]:
#lookup_working = pydb.read_sql_query("SELECT * from __temp__.lookup_working LIMIT 10")
#lookup_working
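
The manual column-name check described above could be partly automated. The following is a minimal sketch and not part of the original notebook: the helper `find_year_suffixed_column` and the example header are hypothetical, and it assumes the lookup files keep the `LAD<yy>CD`, `LAD<yy>NM`, and `CTY<yy>NM` naming seen in the query above.

```python
import re


def find_year_suffixed_column(columns, prefix, suffix):
    """Return the single column named <prefix><two digits><suffix>, e.g. LAD22CD.

    Hypothetical helper: raises if zero or several columns fit the pattern,
    so an unexpected file layout fails loudly rather than silently.
    """
    pattern = re.compile(
        rf"^{re.escape(prefix)}\d{{2}}{re.escape(suffix)}$", re.IGNORECASE
    )
    matches = [c for c in columns if pattern.match(c.strip())]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one {prefix}YY{suffix} column, found {matches}"
        )
    return matches[0]


# Hypothetical header from a later release of the districts file
example_columns = ["LAD23CD", "LAD23NM", "ObjectId"]
lad_code_col = find_year_suffixed_column(example_columns, "LAD", "CD")
lad_name_col = find_year_suffixed_column(example_columns, "LAD", "NM")
```

The detected names could then be substituted into the f-string query in place of the hard-coded LAD22CD, LAD22NM, and CTY22NM, although a manual check of the files is still advisable.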

### Create the divorce_county_ua_lookup table
<a name="divorce_county_ua_lookup"></a>

In [None]:
#creates an additional column called 'county_ua', by adding additional information to the county or la.
create_divorce_county_ua_lookup =f"""
SELECT
code,
CASE WHEN county IN ('Greater Manchester', 'Merseyside', 'South Yorkshire', 'Tyne and Wear', 'West Midlands', 'West Yorkshire')
THEN CONCAT(county,' ','(Met County)')
WHEN code LIKE 'W%'
THEN la
WHEN county IS NULL
THEN CONCAT(la,' ','UA')
ELSE county
END AS county_ua,
country
FROM __temp__.lookup_working;
"""
pydb.create_temp_table(create_divorce_county_ua_lookup,'divorce_county_ua_lookup')

In [None]:
#divorce_county_ua_lookup = pydb.read_sql_query("SELECT * from __temp__.divorce_county_ua_lookup Limit 10")
#divorce_county_ua_lookup
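
To make the labelling rules in the CASE expression above easier to check by hand, here is a minimal Python sketch of the same branch order. The function name `county_ua_label` and the sample inputs are illustrative assumptions; the SQL remains the actual implementation.

```python
# Metropolitan counties listed in the SQL above
MET_COUNTIES = {
    "Greater Manchester",
    "Merseyside",
    "South Yorkshire",
    "Tyne and Wear",
    "West Midlands",
    "West Yorkshire",
}


def county_ua_label(code, la, county):
    """Mirror the CASE expression: the first matching rule wins."""
    if county in MET_COUNTIES:
        return f"{county} (Met County)"
    if code.startswith("W"):
        # Welsh areas keep the local authority name unchanged
        return la
    if county is None:
        # No county found in the lookup: treat as a unitary authority
        return f"{la} UA"
    return county


# Illustrative inputs only
labels = [
    county_ua_label("E08000003", "Manchester", "Greater Manchester"),
    county_ua_label("W06000015", "Cardiff", None),
    county_ua_label("E06000052", "Cornwall", None),
    county_ua_label("E07000008", "Cambridge", "Cambridgeshire"),
]
```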

# Stage 2 - Petitioner Postcode
<a name="Petitioner_Postcode"></a>

## Import Petitioner Address Details 
<a name="petitioner_address_details_table"></a>

### Create the petitioner_address_details table

In [None]:
#imports Petitioner Address Details data from S3 bucket into a temporary table
petitioner_address_details_table = pd.read_csv("s3://alpha-family-data/CSVs/Divorce/Petitioner LA/Petitioner_Address_Details.csv", low_memory=False)

In [None]:
pydb.dataframe_to_temp_table(petitioner_address_details_table, "petitioner_address_details")

#### petitioner_address_details validation

In [None]:
#petitioner_address_details_count = pydb.read_sql_query("SELECT * from __temp__.petitioner_address_details limit 10")
#petitioner_address_details_count

# Stage 3 - Creating Final Output
<a name="Creating_Final_Output"></a>

### Create the petitioner_address table
<a name="petitioner_address"></a>

In [None]:
#Selects the required columns and renames the column 'pettnr_contact_details_confdntl_cind' to 'confdntl'
create_petitioner_address =f"""
SELECT t1.Year, 
          t1.Month, 
          t1.Quarter,
          t1.PETTNR_LINE_1_ADDRESS,
          t1.PETTNR_LINE_2_ADDRESS,
          t1.PETTNR_LINE_3_ADDRESS,
          t1.PETTNR_LINE_4_ADDRESS,
          t1.PETTNR_LINE_5_ADDRESS, 
          t1.PETTNR_LINE_6_ADDRESS,
          t1.PETTNR_POSTAL_CODE,
          t1.PETTNR_CONTACT_DETAILS_CONFDNTL_CIND as CONFDNTL
FROM __temp__.petitioner_address_details t1;
"""
pydb.create_temp_table(create_petitioner_address,'petitioner_address')

In [None]:
#petitioner_address = pydb.read_sql_query("SELECT * from __temp__.petitioner_address LIMIT 10")
#petitioner_address

### Create the new_divorce_postcode table
<a name="new_divorce_postcode"></a>

In [None]:
#Converts the values in the address, postcode, and confidentiality columns to upper case and renames those columns (the year, quarter, and month columns are unchanged)
#Also removes any spaces from the column 'pettnr_postal_code'
create_new_divorce_postcode =f"""
SELECT t1.Year, 
    t1.Month, 
    t1.Quarter,
    UPPER(t1.PETTNR_LINE_1_ADDRESS) as Line1,
    UPPER(t1.PETTNR_LINE_2_ADDRESS) as Line2,
    UPPER(t1.PETTNR_LINE_3_ADDRESS) as Line3,
    UPPER(t1.PETTNR_LINE_4_ADDRESS) as Line4,
    UPPER(t1.PETTNR_LINE_5_ADDRESS) as Line5, 
    UPPER(t1.PETTNR_LINE_6_ADDRESS) as Line6,
    REPLACE(UPPER(t1.PETTNR_POSTAL_CODE), ' ', '') as postcode,
    UPPER(t1.PETTNR_CONTACT_DETAILS_CONFDNTL_CIND) as CONFDNTL
    
      FROM __temp__.petitioner_address_details t1;
"""
pydb.create_temp_table(create_new_divorce_postcode,'new_divorce_postcode')

In [None]:
#new_divorce_postcode = pydb.read_sql_query("SELECT * from __temp__.new_divorce_postcode LIMIT 10")
#new_divorce_postcode

### Create the new_divorce_with_postcode_temp1 table
<a name="new_divorce_with_postcode_temp1"></a>

In [None]:
#Searches the columns 'line1' to 'line6' and 'postcode' for postcode patterns
#If no pattern matches, the respective column is null
#The functions:
    #regexp_like() - tests for the pattern and returns true or false.
    #regexp_extract_all() - returns an array of every substring that matches the pattern.
#The CASE statement for the 'postcode' column looks for postcode patterns with no spaces
create_new_divorce_with_postcode_temp1 =f"""
SELECT *,

CASE WHEN regexp_like(line1, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line1, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line1, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line1, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line1, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line1, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line1, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode1,


CASE WHEN regexp_like(line2, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line2, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line2, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line2, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line2, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line2, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line2, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode2,


CASE WHEN regexp_like(line3, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line3, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line3, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line3, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line3, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line3, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line3, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode3,


CASE WHEN regexp_like(line4, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line4, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line4, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line4, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line4, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line4, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line4, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode4,


CASE WHEN regexp_like(line5, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line5, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line5, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line5, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line5, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line5, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line5, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode5,

CASE WHEN regexp_like(line6, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line6, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line6, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')

WHEN regexp_like(line6, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line6, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][0-9][0-9][ \t][0-9][A-Z][A-Z]')
WHEN regexp_like(line6, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]$') THEN regexp_extract_all(line6, '[A-Z][0-9][A-Z][ \t][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode6,

CASE WHEN regexp_like(postcode, '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]')
WHEN regexp_like(postcode, '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]')
WHEN regexp_like(postcode, '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]')

WHEN regexp_like(postcode, '[A-Z][0-9][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][0-9][0-9][A-Z][A-Z]')
WHEN regexp_like(postcode, '[A-Z][0-9][0-9][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][0-9][0-9][0-9][A-Z][A-Z]')
WHEN regexp_like(postcode, '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]$') THEN regexp_extract_all(postcode, '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]')
ELSE NULL
END newpostcode7

FROM __temp__.new_divorce_postcode;
"""
pydb.create_temp_table(create_new_divorce_with_postcode_temp1,'new_divorce_with_postcode_temp1')


In [None]:
#code = pydb.read_sql_query("SELECT * from __temp__.new_divorce_with_postcode_temp1 LIMIT 10")
#code
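
The six patterns tested for each address line share one shape: an outward code of the form A9, A99, A9A, AA9, AA99, or AA9A, then a space or tab, then an inward code of the form 9AA. The sketch below expresses that shape as a single regular expression using Python's `re` module. It is a simplified illustration rather than a replacement for the SQL: the names `SPACED_POSTCODE`, `UNSPACED_POSTCODE`, and `trailing_postcode` are hypothetical, it returns only the match at the end of the string, and it assumes the text has already been converted to upper case.

```python
import re

# Outward code (A9, A99, A9A, AA9, AA99, AA9A), space or tab, inward code (9AA),
# anchored to the end of the string like the regexp_like() tests above.
SPACED_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?[ \t][0-9][A-Z]{2}$")

# Variant for the 'postcode' column, where spaces have already been removed.
UNSPACED_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?[0-9][A-Z]{2}$")


def trailing_postcode(text, pattern=SPACED_POSTCODE):
    """Return the postcode at the end of `text`, or None if there is none."""
    if text is None:
        return None
    match = pattern.search(text)
    return match.group(0) if match else None


from_line = trailing_postcode("10 DOWNING STREET LONDON SW1A 2AA")
from_column = trailing_postcode("SW1A2AA", UNSPACED_POSTCODE)
```

A helper like this can be handy for spot-checking individual address lines locally before rerunning the full Athena query.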

### Create the new_divorce_with_postcode_temp2 table
<a name="new_divorce_with_postcode_temp2"></a>

In [None]:
#Reformats the temporary table 'new_divorce_with_postcode_temp1', joining each array in the columns 'newpostcode1' to 'newpostcode7' into a single string
create_new_divorce_with_postcode_temp2 =f"""
SELECT year, 
month, 
quarter,
line1,
line2,
line3,
line4,
line5, 
line6,
postcode,
confdntl,

array_join(newpostcode1,  '') as newpostcode1,
array_join(newpostcode2,  '') as newpostcode2,
array_join(newpostcode3,  '') as newpostcode3,
array_join(newpostcode4,  '') as newpostcode4,
array_join(newpostcode5,  '') as newpostcode5,
array_join(newpostcode6,  '') as newpostcode6,
array_join(newpostcode7,  '') as newpostcode7

FROM __temp__.new_divorce_with_postcode_temp1;
"""
pydb.create_temp_table(create_new_divorce_with_postcode_temp2,'new_divorce_with_postcode_temp2')

In [None]:
#code = pydb.read_sql_query("SELECT * FROM __temp__.new_divorce_with_postcode_temp2")
#code

### Create the new_divorce_with_postcode table
<a name="new_divorce_with_postcode"></a>

In [None]:
#Creates a column called 'newpostcode' holding the first non-null postcode from the columns 'newpostcode1' to 'newpostcode7'
#If there is no postcode available the column is null
create_new_divorce_with_postcode =f"""
SELECT year, 
month, 
quarter,
line1,
line2,
line3,
line4,
line5, 
line6,
postcode,
confdntl,

CASE WHEN newpostcode1 IS NOT NULL THEN newpostcode1
WHEN newpostcode2 IS NOT NULL THEN newpostcode2
WHEN newpostcode3 IS NOT NULL THEN newpostcode3
WHEN newpostcode4 IS NOT NULL THEN newpostcode4
WHEN newpostcode5 IS NOT NULL THEN newpostcode5
WHEN newpostcode6 IS NOT NULL THEN newpostcode6
WHEN newpostcode7 IS NOT NULL THEN newpostcode7

ELSE NULL
END newpostcode

FROM __temp__.new_divorce_with_postcode_temp2;
"""
pydb.create_temp_table(create_new_divorce_with_postcode,'new_divorce_with_postcode')

In [None]:
#code = pydb.read_sql_query("SELECT * FROM __temp__.new_divorce_with_postcode LIMIT 10")
#code
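
The CASE expression above picks the first non-null value in column order, which is the same result as `COALESCE(newpostcode1, ..., newpostcode7)` in SQL. A minimal Python sketch of that selection rule follows; the function name `first_postcode` and the sample row are illustrative only.

```python
def first_postcode(candidates):
    """Return the first candidate that is not None, or None if all are None."""
    return next((p for p in candidates if p is not None), None)


# Illustrative row: a postcode found in line3 and another in the postcode column
row = [None, None, "SW1A 2AA", None, None, None, "SW1A2AA"]
chosen = first_postcode(row)
```

Because address lines are checked before the 'postcode' column, a postcode written in an address line takes priority over the one in the dedicated field.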

### Create the ons_postcode_data table
<a name="ons_postcode_data"></a>

In [None]:
#Removes spaces from postcode data column and filters by country code IN ('E92000001','W92000004')
create_ons_postcode_data =f"""
SELECT REPLACE(t1.pcd , ' ', '') AS PCD, 
t1.oslaua,
t1.ctry

FROM __temp__.ons_postcode t1

WHERE t1.ctry IN ('E92000001','W92000004');
"""
pydb.create_temp_table(create_ons_postcode_data,'ons_postcode_data')

In [None]:
#ons_postcode_data = pydb.read_sql_query("SELECT * from __temp__.ons_postcode_data LIMIT 10")
#ons_postcode_data

### Create the divorce_postcode_ons_match table
<a name="divorce_postcode_ons_match"></a>

In [None]:
#Adds the columns 'PCD' and 'oslaua' from the ons postcode data to the new_divorce_with_postcode table
create_divorce_postcode_ons_match =f"""
SELECT  t1.Year, 
    t1.Quarter, 
    t1.LINE1, 
    t1.LINE2, 
    t1.LINE3, 
    t1.LINE4, 
    t1.LINE5,
    t1.LINE6,
    t1.CONFDNTL,
    t1.postcode,
    t1.newpostcode, 
    t2.PCD, 
    t2.oslaua 
FROM ( SELECT Year, 
          Quarter, 
          LINE1, 
          LINE2, 
          LINE3, 
          LINE4, 
          LINE5,
          LINE6,
          CONFDNTL,
          postcode,
          REPLACE(newpostcode, ' ', '') AS newpostcode
FROM __temp__.new_divorce_with_postcode) t1

LEFT JOIN __temp__.ons_postcode_data t2 
    ON (t1.newpostcode = t2.PCD);
"""
pydb.create_temp_table(create_divorce_postcode_ons_match,'divorce_postcode_ons_match')

In [None]:
#divorce_postcode_ons_match = pydb.read_sql_query("SELECT * from __temp__.divorce_postcode_ons_match LIMIT 10")
#divorce_postcode_ons_match
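
The join above only works because spaces are stripped from both sides before comparison. The sketch below shows the same idea with a plain dictionary lookup. It is an illustration under stated assumptions: the helper names and the sample rows are hypothetical, and the second sample assumes the directory may pad short postcodes with extra spaces.

```python
def strip_spaces(postcode):
    """Remove spaces so differently spaced postcodes compare equal."""
    return None if postcode is None else postcode.replace(" ", "")


def build_lookup(ons_rows):
    """Map space-free postcode -> local authority code (oslaua)."""
    return {strip_spaces(pcd): oslaua for pcd, oslaua in ons_rows}


def match_postcode(newpostcode, lookup):
    """Left-join behaviour: return (PCD, oslaua), or (None, None) if unmatched."""
    key = strip_spaces(newpostcode)
    if key is None or key not in lookup:
        return (None, None)
    return (key, lookup[key])


# Hypothetical directory rows
lookup = build_lookup([("SW1A2AA", "E09000033"), ("M1  1AE", "E08000003")])
matched = match_postcode("M1 1AE", lookup)
unmatched = match_postcode("ZZ9 9ZZ", lookup)
```

Unmatched rows are kept with empty PCD and oslaua values, which is what later allows them to be labelled as invalid, not given, or foreign.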

### Create the divorce_postcode_la table
<a name="divorce_postcode_la"></a>

In [None]:
#Adds the columns 'county_ua' and 'country' from the divorce_county_ua_lookup table to the divorce_postcode_ons_match table
create_divorce_postcode_la =f"""
SELECT t1.Year, 
          t1.Quarter, 
          t1.LINE1, 
          t1.LINE2, 
          t1.LINE3, 
          t1.LINE4, 
          t1.LINE5,
          t1.LINE6,
          t1.CONFDNTL,
          t1.postcode, 
          t1.newpostcode, 
          t1.PCD, 
          t1.oslaua, 
          t2.county_ua,
          t2.country
FROM __temp__.divorce_postcode_ons_match t1
LEFT JOIN __temp__.divorce_county_ua_lookup t2 
ON (t1.oslaua = t2.code);

"""
pydb.create_temp_table(create_divorce_postcode_la,'divorce_postcode_la')

In [None]:
#divorce_postcode_la = pydb.read_sql_query("SELECT * from __temp__.divorce_postcode_la LIMIT 10")
#divorce_postcode_la

### Create the divorce_la_c8 table
<a name="divorce_la_c8"></a>

In [None]:
#Creates a new column called 'county_ua2' to suppress the applicant's postcode if they request their postcode to be confidential
#Uses a range of different conditions that need to be met in order to suppress the applicant's postcode
#Postcodes can also be marked as invalid/foreign/not given e.g if the newpostcode column is not empty and the PCD column is empty
create_divorce_la_c8 =f"""
SELECT *,
CASE 
WHEN line1 is null AND line2 is null AND line3 is null AND line4 is null AND line5 is null AND line6 is null AND postcode is null and oslaua is null THEN 'Unknown'
WHEN Line1 = '-' AND Line2 = '-' AND Postcode IS NULL and oslaua is null THEN 'Unknown'
WHEN Line1 = '.' AND Line2 = '.' AND Line3 IS NULL AND Postcode IS NULL and oslaua is null THEN 'Unknown'
WHEN (Line1 = 'X' OR Line1 = 'XX' OR Line1 = 'XXX' OR Line1 = 'XXXX' OR Line1 = 'XXXXX' or Line1 = 'XXXXXX' OR Line1 = 'XXXXXXX' or Line1 = 'XXXXXXXX') and Postcode is null and oslaua is null then 'Unknown'
WHEN (Line1 = 'UNDISCLOSED' or Line1 = 'NA' OR Line1 = 'UNKNOWN' OR Line1 = 'UNKNOWN TO THE COURT' OR Line1 = 'UNKNONW' OR Line1 = 'DECEASED' OR Line1 = 'EMAIL' OR Line1 = 'N/A' OR Line1 = 'N/K' OR Line1 = 'NFA' OR Line1 = 'Y' OR Line1 = 'WITHELD' OR Line1 = 'NO' OR Line1 = 'NOT' OR Line1 = 'NOT GIVEN' OR Line1 = 'NOT AT THIS' OR Line1 = 'NOT DISCLOSED' OR Line1 = 'NOT KNOWN' OR Line1 = 'NOT PROVIDED' OR Line1 = 'NOT SUPPLIED' OR Line1 = 'NOT TO BE DISCLOSED' OR Line1 = 'NOT TO BE' OR Line1 = 'A' OR Line1 = 'ADDRES NOT TO BE DISCLOSED' OR Line1 = 'ADDRESS' OR Line1 = 'ADDRESS CONFIENTIAL' OR Line1 = 'ADDRESS DISCLOSED BY DWP' OR Line1 = 'ADDRESS DISLOSED' OR Line1 = 'ADDRESS HAS BEEN' OR Line1 = 'ADDRESS KNOWN' OR Line1 = 'ADDRESS KNOWN TO' OR Line1 = 'ADDRESS KNOWN TO COURT' OR Line1 = 'ADDRESS NEEDED (RETURN TO SENDER)' OR Line1 = 'ADDRESS NO TO BE DISCLOSED' OR Line1 = 'ADDRESS UNKNOWN' OR Line1 = 'B' OR Line1 = 'C' OR Line1 = 'CC' OR Line1 = 'C 8' OR Line1 = 'DISCLOSED' OR Line1 = 'DISCLOSED TO THE COURT' OR Line1 = 'XXXXXXXXXX' OR Line1 = 'WITHELD ADDRESS' OR Line1 = 'WITHHEALD' OR Line1 = 'TBC' OR Line1 = 'TEST' OR Line1 = 'SD' OR Line1 = 'RETURN BY THE GPO' OR Line1 = 'RETURNED' OR Line1 = 'NO FIXED' OR Line1 = 'NO ADDRESS' OR Line1 = 'NO ADDRESS.' 
OR Line1 = 'NO FIXED ABODE' OR Line1 = 'NO LONG AT ADDRESS' OR Line1 = '________________________' OR Line1 = 'DE' OR Line1 = 'DIED ON' OR Line1 = 'G' OR Line1 = 'USE' OR Line1 = '..ERROR' OR Line1 = 'DO NOT SEND' OR Line1 = 'DO NOT SEND OUT' OR Line1 = 'SEND' OR Line1 = 'SEND ALL PROCESS' OR Line1 = 'SEE' OR Line1 = 'SEE D80B' OR Line1 = 'EMAIL TO' OR Line1 = 'CONFIDENITAL' OR Line1 = 'A HOUSE' OR Line1 = 'A TREEHOUSE' OR Line1 = 'ADDRESS NOT' OR Line1 = 'ADDRESS NOT TO BE' OR Line1 = 'ADDRESS TO' OR Line1 = 'ADDRESS TO BE' OR Line1 = 'ADDRESS TO BE DISCLOSED' OR Line1 = 'ADDRESS TO REMAIN' OR Line1 = 'ADDRESSEE GONE AWAY' OR Line1 = '(UNKNOWN)' OR Line1 = '** LET FROM BARWELLS DATED 3.6.20**' OR Line1 = '***********NOT TO BE*****' OR Line1 = '****ADDRESS NOT DISCLOSED***') and Postcode is null and oslaua is null then 'Unknown'
WHEN ((Line1 = '0' OR Line1 = '1' OR Line1 = '3' OR Line1 = '"' OR Line1 = '''' OR Line1 = '''''' OR Line1 = '=' OR Line1 = '.' OR Line1 = '..' OR Line1 = '...' OR Line1 = '. . .' OR Line1 = '....' OR Line1 = '.,..' OR Line1 = '.....' OR Line1 = '..,..' OR Line1 = '......' OR Line1 = '........' OR Line1 = '.........' OR Line1 = '..........' OR Line1 = '...........' OR Line1 = '............' OR Line1 = '.............' OR Line1 = '..............' OR Line1 = '...............' OR Line1 = '................' OR Line1 = '.................' OR Line1 = '...................' OR Line1 = '....................' OR Line1 = '......................' OR Line1 = '-' OR Line1 = '--' OR Line1 = '---' OR Line1 = '----' OR Line1 = '------' OR Line1 = '--------' OR Line1 = '----------' OR Line1 = '--------------' OR Line1 = '-----------------------------' OR Line1 = '*' OR Line1 = '**' OR Line1 = '***' OR Line1 = '****' OR Line1 = '*****' OR Line1 = '*******' OR Line1 = '**********' OR Line1 = '******************' OR Line1 = '*********************' OR Line1 = '******************************' OR Line1 = '/' OR Line1 = ',' OR Line1 = ',,' OR Line1 = ':' OR Line1 = '::' OR Line1 = ';' OR Line1 = '_' OR Line1 = '>') AND 
(Line2 = '0' OR Line2 = '1' OR Line2 = '3' OR Line2 = '"' OR Line2 = '''' OR Line2 = '''''' OR Line2 = '=' OR Line2 = '.' OR Line2 = '..' OR Line2 = '...' OR Line2 = '. . .' OR Line2 = '....' OR Line2 = '.,..' OR Line2 = '.....' OR Line2 = '..,..' OR Line2 = '......' OR Line2 = '........' OR Line2 = '.........' OR Line2 = '..........' OR Line2 = '...........' OR Line2 = '............' OR Line2 = '.............' OR Line2 = '..............' OR Line2 = '...............' OR Line2 = '................' OR Line2 = '.................' OR Line2 = '...................' OR Line2 = '....................' OR Line2 = '......................' OR Line2 = '-' OR Line2 = '--' OR Line2 = '---' OR Line2 = '----' OR Line2 = '------' OR Line2 = '--------' OR Line2 = '----------' OR Line2 = '--------------' OR Line2 = '-----------------------------' OR Line2 = '*' OR Line2 = '**' OR Line2 = '***' OR Line2 = '****' OR Line2 = '*****' OR Line2 = '*******' OR Line2 = '**********' OR Line2 = '******************' OR Line2 = '*********************' OR Line2 = '******************************' OR Line2 = '/' OR Line2 = ',' OR Line2 = ',,' OR Line2 = ':' OR Line2 = '::' OR Line2 = ';' OR Line2 = '_' OR Line2 = '>')) AND Postcode is null and oslaua is null then 'Unknown'
WHEN ((Line1 = '"' OR Line1 = '''' OR Line1 = '''''' OR Line1 = '=' OR Line1 = '.' OR Line1 = '..' OR Line1 = '...' OR Line1 = '. . .' OR Line1 = '....' OR Line1 = '.,..' OR Line1 = '.....' OR Line1 = '..,..' OR Line1 = '......' OR Line1 = '........' OR Line1 = '.........' OR Line1 = '..........' OR Line1 = '...........' OR Line1 = '............' OR Line1 = '.............' OR Line1 = '..............' OR Line1 = '...............' OR Line1 = '................' OR Line1 = '.................' OR Line1 = '...................' OR Line1 = '....................' OR Line1 = '......................' OR Line1 = '-' OR Line1 = '--' OR Line1 = '---' OR Line1 = '----' OR Line1 = '------' OR Line1 = '--------' OR Line1 = '----------' OR Line1 = '--------------' OR Line1 = '-----------------------------' OR Line1 = '*' OR Line1 = '**' OR Line1 = '***' OR Line1 = '****' OR Line1 = '*****' OR Line1 = '*******' OR Line1 = '**********' OR Line1 = '******************' OR Line1 = '*********************' OR Line1 = '******************************' OR Line1 = '/' OR Line1 = ',' OR Line1 = ',,' OR Line1 = ':' OR Line1 = '::' OR Line1 = ';' OR Line1 = '_' OR Line1 = '>') OR 
(Line2 = '"' OR Line2 = '''' OR Line2 = '''''' OR Line2 = '=' OR Line2 = '.' OR Line2 = '..' OR Line2 = '...' OR Line2 = '. . .' OR Line2 = '....' OR Line2 = '.,..' OR Line2 = '.....' OR Line2 = '..,..' OR Line2 = '......' OR Line2 = '........' OR Line2 = '.........' OR Line2 = '..........' OR Line2 = '...........' OR Line2 = '............' OR Line2 = '.............' OR Line2 = '..............' OR Line2 = '...............' OR Line2 = '................' OR Line2 = '.................' OR Line2 = '...................' OR Line2 = '....................' OR Line2 = '......................' OR Line2 = '-' OR Line2 = '--' OR Line2 = '---' OR Line2 = '----' OR Line2 = '------' OR Line2 = '--------' OR Line2 = '----------' OR Line2 = '--------------' OR Line2 = '-----------------------------' OR Line2 = '*' OR Line2 = '**' OR Line2 = '***' OR Line2 = '****' OR Line2 = '*****' OR Line2 = '*******' OR Line2 = '**********' OR Line2 = '******************' OR Line2 = '*********************' OR Line2 = '******************************' OR Line2 = '/' OR Line2 = ',' OR Line2 = ',,' OR Line2 = ':' OR Line2 = '::' OR Line2 = ';' OR Line2 = '_' OR Line2 = '>')) AND Postcode is null and oslaua is null then 'Unknown'


WHEN CONFDNTL = 'KEEP' AND (newpostcode is not null AND PCD is not null) THEN 'Confidentiality requested' 
WHEN CONFDNTL = 'Y' AND (newpostcode is not null AND PCD is not null) THEN 'Confidentiality requested'
WHEN (Line1 = 'X' OR Line1 = 'XX' OR Line1 = 'XXX' OR Line1 = 'XXXX' OR Line1 = 'XXXXX' or Line1 = 'XXXXXX' OR Line1 = 'XXXXXXX' or Line1 = 'XXXXXXXX') AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'
WHEN strpos(Line1,'WITHHELD') <> 0 AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'
WHEN strpos(Line1,'CONFIDENT') <> 0 AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'
WHEN strpos(Line2,'CONFIDENT') <> 0 AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'
WHEN strpos(Line1,'C8') <> 0 AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'
WHEN strpos(Line2,'C8') <> 0 AND (newpostcode is not null AND PCD is not null) then 'Confidentiality requested'

WHEN Newpostcode IS NOT NULL AND PCD IS NULL then 'Postcode invalid/not given or foreign'
WHEN Newpostcode IS NOT NULL AND PCD IS NOT NULL AND county_ua IS NULL then 'Postcode invalid/not given or foreign'
WHEN county_ua IS NULL THEN 'Postcode invalid/not given or foreign'

ELSE county_ua 
END county_ua2
FROM __temp__.divorce_postcode_la;

"""
pydb.create_temp_table(create_divorce_la_c8,'divorce_la_c8')

In [None]:
#divorce_la_c8 = pydb.read_sql_query("SELECT * from __temp__.divorce_la_c8 LIMIT 10")
#divorce_la_c8
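
The order of the WHEN branches above matters: a record is tested for 'Unknown' first, then for 'Confidentiality requested', then for 'Postcode invalid/not given or foreign', and only otherwise keeps its county_ua. The sketch below is a deliberately simplified Python rendering of that order. The function `classify`, the abbreviated `PLACEHOLDERS` set, and the sample values are assumptions made for illustration; the SQL contains many more placeholder strings and combinations than are reproduced here.

```python
X_RUNS = {"X" * n for n in range(1, 9)}  # 'X' up to 'XXXXXXXX'

# Abbreviated stand-in for the long placeholder lists in the SQL above
PLACEHOLDERS = X_RUNS | {"-", ".", "UNKNOWN", "NOT GIVEN", "N/A"}


def classify(lines, postcode, confdntl, newpostcode, pcd, oslaua, county_ua):
    """Simplified version of the county_ua2 CASE expression."""
    line1, line2 = lines[0], lines[1]

    # 1. Unknown: no usable address and no postcode information at all
    if postcode is None and oslaua is None:
        if all(line is None for line in lines) or line1 in PLACEHOLDERS:
            return "Unknown"

    # 2. Confidentiality requested: only when a valid postcode was matched
    if newpostcode is not None and pcd is not None:
        flagged = confdntl in {"KEEP", "Y"}
        marked = any(
            line is not None and ("CONFIDENT" in line or "C8" in line)
            for line in (line1, line2)
        )
        withheld = line1 is not None and ("WITHHELD" in line1 or line1 in X_RUNS)
        if flagged or marked or withheld:
            return "Confidentiality requested"

    # 3. No county found: the postcode is invalid, missing, or foreign
    if county_ua is None:
        return "Postcode invalid/not given or foreign"

    return county_ua


# Hypothetical record with the confidential flag set and a matched postcode
address = ["1 HIGH STREET", "ASHFORD", None, None, None, None]
result = classify(address, "TN231AA", "Y", "TN231AA", "TN231AA", "E07000105", "Kent")
```

The third step relies on county_ua being empty whenever the postcode failed to match the directory, since county_ua is only filled through the matched oslaua code.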

#### Check Confidentiality Filter

In [None]:
#Check that makes sure the confidentiality filters are working properly
check = pydb.read_sql_query("SELECT DISTINCT * FROM __temp__.DIVORCE_LA_C8 WHERE CONFDNTL = 'Y' or CONFDNTL = 'KEEP';")
check

### Create the divorce_county table
<a name="divorce_county"></a>

In [None]:
#Renames the 'county_ua2' column to 'county_ua' and combines Cornwall UA and Isles of Scilly UA into 'Cornwall & Isles of Scilly'
#Copies the confidential, invalid/not given or foreign, and unknown markers to the country column.
create_divorce_county =f"""
SELECT year,
quarter,
line1,
line2,
line3,
line4,
line5,
line6,
confdntl,
postcode, 
newpostcode,
pcd,
oslaua,

CASE 
WHEN county_ua2 = 'Isles of Scilly UA' then 'Cornwall & Isles of Scilly'
WHEN county_ua2 = 'Cornwall UA' then 'Cornwall & Isles of Scilly'

ELSE county_ua2 
END county_ua,

CASE 
WHEN county_ua2 = 'Confidentiality requested' then 'Confidentiality requested'
WHEN county_ua2 = 'Postcode invalid/not given or foreign' then 'Postcode invalid/not given or foreign'
WHEN county_ua2 = 'Unknown' then 'Unknown'
ELSE country
END country

FROM __temp__.divorce_la_c8;

"""
pydb.create_temp_table(create_divorce_county,'divorce_county')

In [None]:
#divorce_county = pydb.read_sql_query("SELECT * from __temp__.divorce_county LIMIT 10")
#divorce_county

### Create the petitioner_summary_la table
<a name="petitioner_summary_la"></a>

In [None]:
#Counts the records for each year, county_ua, and country
#Keeps only years where 2010 < year < current year
create_petitioner_summary_la =f"""
SELECT DISTINCT 'Petitioner' as Type,
t1.year,
t1.country,
t1.county_ua,
(COUNT(t1.county_ua)) AS COUNT_of_County

FROM __temp__.divorce_county t1

WHERE year > 2010
AND year < {current_year}

GROUP BY t1.year,
t1.county_ua,
t1.country;

"""
pydb.create_temp_table(create_petitioner_summary_la,'petitioner_summary_la')

In [None]:
petitioner_summary_la = pydb.read_sql_query("SELECT * from __temp__.petitioner_summary_la")
petitioner_summary_la

In [None]:
#Checks the total of the county counts
petitioner_summary_la[['count_of_county']].sum()
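
For a quick local cross-check of the grouped counts, the same aggregation can be sketched in plain Python. This is a minimal illustration with made-up rows; the function `summarise` is hypothetical and simply mirrors the year filter and grouping of the query above.

```python
from collections import Counter


def summarise(rows, current_year):
    """Count rows per (year, country, county_ua) for 2010 < year < current_year."""
    counts = Counter(
        (year, country, county_ua)
        for year, country, county_ua in rows
        if 2010 < year < current_year
    )
    return [
        {
            "type": "Petitioner",
            "year": year,
            "country": country,
            "county_ua": county_ua,
            "count_of_county": n,
        }
        for (year, country, county_ua), n in sorted(counts.items())
    ]


# Made-up rows: (year, country, county_ua)
sample = [
    (2010, "England", "Kent"),
    (2011, "England", "Kent"),
    (2011, "England", "Kent"),
    (2022, "Wales", "Cardiff"),
    (2023, "Wales", "Cardiff"),
]
summary = summarise(sample, current_year=2023)
total = sum(item["count_of_county"] for item in summary)
```

The total should equal the number of input rows that fall inside the year range, which is the same sanity check that the sum of count_of_county provides.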

In [None]:
#Orders the data by year, country, and county_ua
final_output = pydb.read_sql_query("""
SELECT *
from __temp__.petitioner_summary_la
ORDER BY year,
country,
county_ua
""")

In [None]:
#Export the final csv
final_output.to_csv("s3://alpha-family-data/CSVs/Divorce/CSV Matrimonial Matters County & UA Annual.csv", index = False)