# Domestic Violence Timeliness 

## Contents
#### Setup
1. [import_packages](#import_packages) 
2. [define_key_variables](#define_key_variables) 

#### Stage 1 - Legal representation
3. [dv_case_data_temp](#DV_case_data_temp) - filters by year > 2010 and by first application date
4. [dv_applicant_info](#DV_applicant_info) - joins the roles, parties and address tables for applicants to get information on the applicants
5. [dv_respondent_info](#DV_respondent_info) - joins the roles, parties and address tables for respondents to get information on the respondents
6. [dv_applicants_temp](#DV_applicants_temp) - takes the dv_applicant_info table and reformats it, decoding gender and representative role values into strings
7. [dv_respondents_temp](#DV_respondents_temp) - takes the dv_respondent_info table and reformats it, decoding gender and representative role values into strings
8. [dv_app_rep](#DV_app_rep) - joins the case data with the applicant representation data
9. [dv_resp_rep](#DV_resp_rep) - joins the case data with the respondent representation data
10. [dv_hearing_events](#DV_Hearing_Events) - takes the domestic violence hearing events from the hearings table and joins it with the events table data 
11. [dv_hearing_cases](#DV_Hearing_Cases) - takes the hearing events, filters by domestic violence event codes and adds a flag showing whether the hearing is the first in the case
12. [hearing_dv_applicants](#Hearing_DV_Applicants) - joins the cases from the applicant representation table to the first hearing for the case in the DV_Hearing_Cases table
13. [hearing_dv_respondents](#Hearing_DV_Respondents) - joins the cases from the respondent representation table to the first hearing for the case in the DV_Hearing_Cases table
14. [dv_app](#DV_App) - groups the hearing_dv_applicants table and produces a count per group
15. [dv_resp](#DV_Resp) - groups the hearing_dv_respondents table and produces a count per group
16. [dv_case](#DV_Case) - groups and formats dv_case_data_v3 table and gives a count for each group
17. [dv_case_hearings](#DV_Case_Hearings) - creates a count of all the cases with a hearing per quarter
18. [domestic violence](#Domestic_Violence) - joins the applicant/respondent representation count tables, case count table, and case hearing count tables

#### Stage 2 - Timeliness
19. [dv_applications_data_sorted](#DV_applications_data_sorted) - orders dv_application_5 by case_number and application date
20. [dv_applications_temp](#DV_applications_temp) - takes the dv_applications_data_sorted table and filters it so it only has the first application record per case number
21. [dv_orders_data_sorted](#DV_orders_data_sorted) - sorts the disposals table by case_number and receipt date, removing contact orders, placement revoke or vary orders and other type orders
22. [dv_orders_temp](#DV_orders_temp) - takes the dv_orders_data_sorted table and filters it so it only has the first order record per case number
23. [dv_apps_and_orders_match](#DV_apps_and_orders_match) - calculates the timeliness based on the difference between app_date and disp_date
24. [Applicant_representation](#Applicant_representation) - creates a table showing whether all, some, or none of the applicants for a case have representation
25. [Respondent_representation](#Respondent_representation) - creates a table showing whether all, some, or none of the respondents for a case have representation
26. [DV_disposals_final](#DV_disposals_final) - joins the dv_apps_and_orders_match table and the representation tables, joining timeliness and legal representation data for a case together
27. [DV_Quarterly](#DV_Quarterly) - groups the disposals_final table by quarter, providing a total and mean wait_weeks per quarter
28. [DV_Annual](#DV_Annual) - groups the disposals_final table annually, providing a total and mean wait_weeks annually
29. [DV_timeliness](#DV_timeliness) - groups the disposals_final table regionally and nationally, providing a total, mean, and median wait_weeks
30. [DV_timeliness_combined](#DV_timeliness_combined) - combines and orders the annual and quarterly timeliness data 

## 1. Import packages and set options 
<a name="import_packages"></a>

In [1]:
import pandas as pd  # a module which provides the data structures and functions to store and manipulate tables in dataframes
import pydbtools as pydb  # A module which allows SQL queries to be run on the Analytical Platform from Python, see https://github.com/moj-analytical-services/pydbtools
import boto3  # allows you to directly create, update, and delete AWS resources from Python scripts
import numpy as np

# set display options so that wide dataframes are easier to view
pd.set_option("display.max_columns", 100)
pd.set_option("display.width", 900)
pd.set_option("display.max_colwidth", 200)

## 2. Define key variables to be used throughout the notebook 
<a name="define_key_variables"></a>

In [2]:
#this is the database we will be extracting from
database = "familyman_dev_v3" 

#this extracts the latest snapshot from athena
#get_snapshot_date = f"SELECT mojap_snapshot_date from {database}.events order by mojap_snapshot_date desc limit 1"
#snapshot_date = str(pydb.read_sql_query(get_snapshot_date)['mojap_snapshot_date'].values[0])

#this extracts the August snapshot from athena
snapshot_date = '2022-08-04'

#this is the athena database we will be storing our tables in
fcsq_database = "fcsq"

#this is the s3 bucket we will be saving data to
s3 = boto3.resource("s3")
bucket = s3.Bucket("alpha-family-data")

#change these to the current quarter and year, not the quarter being published
latest_quarter = 3
latest_year = 2022
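
The upper date bound hard-coded in section 3 (`'2022-06-30'`) is the last day of the quarter before `latest_quarter`/`latest_year`. Assuming that relationship is intended, the date could be derived rather than edited by hand each quarter. The sketch below is an illustration only; `previous_quarter_end` is a hypothetical helper, not part of the notebook.

```python
import datetime as dt


def previous_quarter_end(latest_year: int, latest_quarter: int) -> dt.date:
    """Return the last day of the quarter before (latest_year, latest_quarter)."""
    if latest_quarter not in (1, 2, 3, 4):
        raise ValueError("latest_quarter must be 1, 2, 3 or 4")
    # First month of the current quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
    first_month = 3 * (latest_quarter - 1) + 1
    # The day before the current quarter starts is the end of the previous one
    return dt.date(latest_year, first_month, 1) - dt.timedelta(days=1)


period_end = previous_quarter_end(latest_year=2022, latest_quarter=3)
print(period_end.isoformat())  # 2022-06-30
```

The resulting string could then be interpolated into the f-string queries in the same way as `snapshot_date`.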

## 3. DV_case_data temporary tables - filters by receipt_date > 2010 and by first application date
<a name="DV_case_data_temp"></a>

In [3]:
create_dv_case_data_v1 = f"""
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY CASE_NUMBER
ORDER BY CASE_NUMBER, RECEIPT_DATE, EVENT_COURT
) CASE_NUMBER_ID
FROM fcsq.DV_APPS_FINAL ;
"""

pydb.create_temp_table(create_dv_case_data_v1,'dv_case_data_v1')

create_dv_case_data_v2 = f"""
SELECT *
FROM __temp__.dv_case_data_v1
WHERE CASE_NUMBER_ID = 1;
"""

pydb.create_temp_table(create_dv_case_data_v2,'dv_case_data_v2')

create_dv_case_data_v3 = f"""
SELECT *
FROM __temp__.dv_case_data_v2

WHERE RECEIPT_DATE >= cast('2011-01-01' as timestamp)
AND RECEIPT_DATE <= cast('2022-06-30' as timestamp)

ORDER BY Year,
Quarter;
"""

pydb.create_temp_table(create_dv_case_data_v3,'dv_case_data_v3')

In [4]:
#DV_CASE_DATA_count = pydb.read_sql_query("SELECT count(*) as count from fcsq.DV_APPS_FINAL;")
#DV_CASE_DATA_count
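
The `ROW_NUMBER() ... WHERE CASE_NUMBER_ID = 1` pattern above keeps one row per case: the earliest by `RECEIPT_DATE`, with `EVENT_COURT` as a tie-breaker. As a rough illustration of the same logic outside Athena, here is a minimal pure-Python sketch. The sample rows and case numbers are invented for the example.

```python
from operator import itemgetter


def first_row_per_case(rows):
    """Keep the first row per case_number after sorting by receipt date, then court."""
    ordered = sorted(rows, key=itemgetter("case_number", "receipt_date", "event_court"))
    first = {}
    for row in ordered:
        # setdefault only stores the first row seen for each case (row number 1)
        first.setdefault(row["case_number"], row)
    return list(first.values())


# Invented sample data: two applications on one case, one on another
sample = [
    {"case_number": "AB11F00001", "receipt_date": "2011-03-02", "event_court": 200},
    {"case_number": "AB11F00001", "receipt_date": "2011-01-15", "event_court": 300},
    {"case_number": "CD12F00002", "receipt_date": "2012-06-01", "event_court": 100},
]
kept = first_row_per_case(sample)
print(kept)
```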

## 4. Applicant_Info table - joins the roles, parties and address tables for applicants to get information on the applicants
<a name="DV_applicant_info"></a>

### Drop the Applicant_Info table if it already exists and remove its data from the S3 bucket

In [5]:
drop_Applicants_Table = "DROP TABLE IF EXISTS fcsq.Applicants"
pydb.start_query_execution_and_wait(drop_Applicants_Table)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/Applicants").delete();

### Create the Applicant_Info table in Athena

In [6]:
create_Applicants_Table = f"""
CREATE TABLE IF NOT EXISTS fcsq.Applicants
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/Applicants') AS
 SELECT DISTINCT
   {database}.roles.ROLE, 
   {database}.roles.REPRESENTATIVE_ROLE, 
   {database}.roles.ROLE_MODEL, 
   {database}.roles.PARTY, 
   {database}.roles.CASE_NUMBER, 
   {database}.parties.PERSON_GIVEN_FIRST_NAME, 
   {database}.parties.PERSON_FAMILY_NAME, 
   {database}.parties.COMPANY, 
   {database}.addresses.POSTCODE, 
   {database}.parties.GENDER, 
   {database}.roles.DELETE_FLAG
FROM 
  ({database}.roles INNER JOIN {database}.parties ON {database}.roles.PARTY = {database}.parties.PARTY) 
  INNER JOIN {database}.addresses ON {database}.roles.ADDRESS = {database}.addresses.ADDRESS
WHERE (((({database}.roles.ROLE_MODEL)= 'APLC') AND (({database}.roles.DELETE_FLAG)= 'N'))
    OR ((({database}.roles.ROLE_MODEL)= 'APLZ') AND (({database}.roles.DELETE_FLAG)= 'N')) 
    OR ((({database}.roles.ROLE_MODEL)= 'APLA') AND (({database}.roles.DELETE_FLAG)= 'N')))
    AND {database}.roles.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.parties.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.addresses.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_Applicants_Table);
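
The drop, delete and create steps above are repeated for most tables in this notebook. One possible way to reduce the repetition is a small helper. The sketch below is an assumption about how that might look, not the notebook's method: `rebuild_table` is a hypothetical name, and the query runner and S3 delete function are passed in so the example can be dry-run without touching Athena or S3.

```python
def rebuild_table(table, select_sql, run_query, delete_prefix,
                  database="fcsq",
                  bucket_name="alpha-family-data",
                  folder="fcsq_processing/Domestic_Violence"):
    """Drop the table, clear its S3 files, then recreate it with a CTAS query."""
    prefix = f"{folder}/{table}"
    run_query(f"DROP TABLE IF EXISTS {database}.{table}")
    delete_prefix(prefix)
    ctas = (
        f"CREATE TABLE IF NOT EXISTS {database}.{table}\n"
        f"WITH (format = 'PARQUET', external_location = 's3://{bucket_name}/{prefix}') AS\n"
        f"{select_sql}"
    )
    run_query(ctas)
    return ctas


# Dry run: record the calls in a list instead of executing them
calls = []
rebuild_table("Applicants", "SELECT 1 AS x", run_query=calls.append, delete_prefix=calls.append)
for call in calls:
    print(call)
```

In the notebook itself, `run_query` would be `pydb.start_query_execution_and_wait` and `delete_prefix` would be something like `lambda p: bucket.objects.filter(Prefix=p).delete()`. Note that an S3 prefix matches every key that starts with the given string, so a prefix without a trailing slash also removes data for any table whose name begins with the same characters.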


In [7]:
#pydb.read_sql_query("SELECT * from fcsq.Applicants where case_number='NE19Z02909'")

#### Applicants validation

In [8]:
#Applicants_count = pydb.read_sql_query("SELECT count(*) as count from fcsq.Applicants")
#Applicants_count

## 5. Respondent_Info table - joins the roles, parties and address tables for respondents to get information on the respondents
<a name="DV_respondent_info"></a>

### Drop the Respondent_Info table if it already exists and remove its data from the S3 bucket

In [9]:
drop_Respondents_Table = "DROP TABLE IF EXISTS fcsq.Respondents"
pydb.start_query_execution_and_wait(drop_Respondents_Table)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/Respondents").delete();

### Create the Respondent_Info table in Athena

In [10]:
create_Respondents_Table = f"""
CREATE TABLE IF NOT EXISTS fcsq.Respondents
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/Respondents') AS
SELECT DISTINCT
  {database}.roles.ROLE, 
  {database}.roles.REPRESENTATIVE_ROLE, 
  {database}.roles.ROLE_MODEL, 
  {database}.roles.PARTY, 
  {database}.roles.CASE_NUMBER, 
  {database}.parties.GENDER, 
  {database}.addresses.POSTCODE, 
  {database}.roles.DELETE_FLAG
FROM 
  ({database}.roles INNER JOIN {database}.parties ON {database}.roles.PARTY = {database}.parties.PARTY) 
  INNER JOIN {database}.addresses ON {database}.roles.ADDRESS = {database}.addresses.ADDRESS
WHERE 
    (((({database}.roles.ROLE_MODEL)='RSPA') AND (({database}.roles.DELETE_FLAG)='N')) 
    OR ((({database}.roles.ROLE_MODEL)='RSPZ') AND (({database}.roles.DELETE_FLAG)='N'))
    OR ((({database}.roles.ROLE_MODEL)='RSPC') AND (({database}.roles.DELETE_FLAG)='N')))
    AND {database}.roles.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.parties.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.addresses.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_Respondents_Table);

#### Respondents validation

In [11]:
#Respondents_count = pydb.read_sql_query("SELECT count(*) as count from fcsq.Respondents")
#Respondents_count

In [12]:
#pydb.read_sql_query("select * from fcsq.Respondents where case_number='HB12Z00228'")

## 6. Applicants temporary tables - takes the Applicants_Info table and reformats it, decoding gender and representative role values into strings
<a name="DV_applicants_temp"></a>

In [13]:
create_applicants_v1 = f"""
SELECT T1.role,
    T1.representative_role,
    T1.role_model,
    T1.party,
    T1.case_number,
    T1.gender,
    case when gender = 1 then 'Male'
    when gender = 2 then 'Female'
    else 'Unknown' end as Gender_Decode

FROM fcsq.Applicants AS t1
ORDER BY t1.Case_Number;
"""

pydb.create_temp_table(create_applicants_v1,'applicants_v1')

create_applicants_v2 = f"""
SELECT DISTINCT T1.case_number,
    T1.party,
    max(T1.representative_role) as Rep_Role,
    max(T1.gender_decode) as Gender_Max

FROM __temp__.applicants_v1 AS t1
GROUP BY Case_number, party;
"""

pydb.create_temp_table(create_applicants_v2,'applicants_v2')

create_applicants_v3 = f"""
SELECT t1.case_number,
    t1.party as App_Party_ID,
    t1.Rep_Role,
    t1.Gender_Max,
    case when t1.Rep_Role IS NULL then 'N'
    when t1.Rep_Role IS NOT NULL then 'Y'
    End as REPRESENTATION,
    case when Rep_Role IS NULL AND Gender_Max = 'Female' then 'Unrep_Female'
    when Rep_Role IS NULL AND Gender_Max = 'Male' then 'Unrep_Male'
    when Rep_Role IS NULL AND Gender_Max = 'Unknown' then 'Unrep_Unknown'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Female' then 'Rep_Female'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Male' then 'Rep_Male'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Unknown' then 'Rep_Unknown'
    else '' end as App_Rep_Cat
    
    from __temp__.applicants_v2 AS t1;
"""

pydb.create_temp_table(create_applicants_v3,'applicants_v3')
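
To make the three steps above easier to follow, here is a minimal Python sketch of the same per-party logic: decode gender, take the maximum over a party's rows, then build the category. The function names and the role code `"SOL"` are invented for illustration. One detail worth noting is that `max()` on the decoded strings is alphabetical, so `'Unknown'` beats `'Male'`, which beats `'Female'`, when a party has rows with different gender values. Because the decode always yields one of those three strings, the final `else ''` branch in the SQL appears to be unreachable.

```python
def decode_gender(code):
    """Mirror the CASE expression: 1 -> Male, 2 -> Female, anything else -> Unknown."""
    return {1: "Male", 2: "Female"}.get(code, "Unknown")


def rep_category(rep_role, gender):
    """Combine representation and gender, e.g. 'Rep_Female' or 'Unrep_Unknown'."""
    prefix = "Unrep" if rep_role is None else "Rep"
    return f"{prefix}_{gender}"


def summarise_party(rows):
    """rows: list of (representative_role, gender_code) for one case/party pair."""
    roles = [role for role, _ in rows if role is not None]
    rep_role = max(roles) if roles else None          # SQL max() ignores NULLs
    gender = max(decode_gender(code) for _, code in rows)  # alphabetical max
    representation = "N" if rep_role is None else "Y"
    return rep_role, gender, representation, rep_category(rep_role, gender)


print(summarise_party([(None, 2), ("SOL", 2)]))
print(summarise_party([(None, 1), (None, None)]))
```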

## 7. Respondents temporary tables - takes the Respondents_Info table and reformats it, decoding gender and representative role values into strings
<a name="DV_respondents_temp"></a>

In [14]:
create_respondents_v1 = f"""
SELECT T1.role,
    T1.representative_role,
    T1.role_model,
    T1.party,
    T1.case_number,
    T1.gender,
    case when gender = 1 then 'Male'
    when gender = 2 then 'Female'
    else 'Unknown' end as Gender_Decode

FROM fcsq.Respondents AS t1
ORDER BY t1.Case_Number;
"""

pydb.create_temp_table(create_respondents_v1,'respondents_v1')

create_respondents_v2 = f"""
SELECT DISTINCT T1.case_number,
    T1.party,
    max(T1.representative_role) as Rep_Role,
    max(T1.gender_decode) as Gender_Max

FROM __temp__.respondents_v1 AS t1
GROUP BY Case_number, party;
"""

pydb.create_temp_table(create_respondents_v2,'respondents_v2')

create_respondents_v3 = f"""
SELECT t1.case_number,
    t1.party as Resp_Party_ID,
    t1.Rep_Role,
    t1.Gender_Max,
    case when t1.Rep_Role IS NULL then 'N'
    when t1.Rep_Role IS NOT NULL then 'Y'
    End as REPRESENTATION,
    case when Rep_Role IS NULL AND Gender_Max = 'Female' then 'Unrep_Female'
    when Rep_Role IS NULL AND Gender_Max = 'Male' then 'Unrep_Male'
    when Rep_Role IS NULL AND Gender_Max = 'Unknown' then 'Unrep_Unknown'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Female' then 'Rep_Female'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Male' then 'Rep_Male'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Unknown' then 'Rep_Unknown'
    else '' end as Resp_Rep_Cat
    
    from __temp__.respondents_v2 AS t1;
"""

pydb.create_temp_table(create_respondents_v3,'respondents_v3')

## 8. Create dv_app_rep table - joins the case data with the applicant representation data 
<a name="DV_app_rep"></a>

In [15]:
dv_app_rep_final = f"""
SELECT t1.YEAR, 
    t1.QUARTER,
    t1.CASE_NUMBER, 
    t1.EVENT_COURT AS Court,
    t2.App_Party_ID,
    t2.Representation,
    t2.Gender_Max as App_Gender,
    t2.App_Rep_Cat          
FROM __temp__.dv_case_data_v3 t1
    LEFT JOIN __temp__.applicants_v3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER);

"""

pydb.create_temp_table(dv_app_rep_final,'dv_app_rep_final')

In [16]:
#dv_app_rep_final_check = "SELECT COUNT(*) as Count from __temp__.dv_app_rep_final"
#pydb.read_sql_query(dv_app_rep_final_check)

## 9. Create dv_resp_rep table - joins the case data with the respondent representation data
<a name="DV_resp_rep"></a>

In [17]:
dv_resp_rep_final = f"""
SELECT t1.YEAR,
    t1.QUARTER,
    t1.CASE_NUMBER,
    t1.EVENT_COURT AS Court,
    t2.Resp_Party_ID,
    t2.Representation,
    t2.Gender_Max as Resp_Gender,
    t2.Resp_Rep_Cat
FROM __temp__.dv_case_data_v3 t1
    LEFT JOIN __temp__.respondents_v3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER);
"""

pydb.create_temp_table(dv_resp_rep_final,'dv_resp_rep_final')


In [18]:
#dv_resp_rep_final_check = "SELECT COUNT(*) as Count from __temp__.dv_resp_rep_final"
#pydb.read_sql_query(dv_resp_rep_final_check)

In [19]:
#pydb.read_sql_query("select * from __temp__.dv_resp_rep_final")

## 10. Create DV_Hearing_Events table - takes the domestic violence hearing events from the hearings table and joins it with the events table data 
<a name="DV_Hearing_Events"></a>

### Drop the DV_Hearing_Events table if it already exists and remove its data from the S3 bucket

In [20]:
drop_DV_Hearing_Events = "DROP TABLE IF EXISTS fcsq.DV_Hearing_Events"
pydb.start_query_execution_and_wait(drop_DV_Hearing_Events)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_Hearing_Events").delete();

### Create the DV_Hearing_Events table in Athena

In [21]:
create_DV_Hearing_Events = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_Hearing_Events
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_Hearing_Events') AS
SELECT {database}.hearings.EVENT,
  {database}.hearings.VACATED_FLAG,
  {database}.hearings.HEARING_TYPE,
  {database}.hearings.HEARING_DATE,
  {database}.events.RECEIPT_DATE,
  {database}.events.ERROR,
  {database}.events.CASE_NUMBER,
  {database}.events.EVENT_MODEL
FROM {database}.hearings
INNER JOIN {database}.events
ON {database}.hearings.EVENT            = {database}.events.EVENT
WHERE {database}.hearings.VACATED_FLAG IS NULL
AND {database}.events.ERROR             = 'N'
AND HEARING_DATE > date_parse('31-12-2009 00:00:00', '%d-%m-%Y %H:%i:%s')
AND substring(case_number,5,1) = 'F'
AND {database}.hearings.mojap_snapshot_date = date '{snapshot_date}' and {database}.events.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_DV_Hearing_Events);

#### DV_Hearing_Events validation

In [22]:
#DV_Hearing_Events_count = pydb.read_sql_query("select count(*) as count from fcsq.DV_Hearing_Events")
#DV_Hearing_Events_count

In [23]:
#pydb.read_sql_query("select * from fcsq.DV_Hearing_Events")

## 11. DV_Hearing_Cases table - takes the hearing events, filters by domestic violence event codes and adds a flag whether the hearing is the first in the case
<a name="DV_Hearing_Cases"></a> 

### Drop the DV_Hearing_Cases table if it already exists and remove its data from the S3 bucket

In [24]:
drop_DV_Hearings_Cases = "DROP TABLE IF EXISTS fcsq.DV_Hearings_Cases"
pydb.start_query_execution_and_wait(drop_DV_Hearings_Cases)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_Hearings_Cases").delete();

### Create the DV_Hearing_Cases table in Athena

In [25]:
create_DV_Hearings_Cases = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_Hearings_Cases
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_Hearings_Cases') AS
select t1.case_number,
    t1.error,
    t1.event,
    t1.event_model,
    t1.hearing_date,
    t1.hearing_type,
    t1.receipt_date,
    t1.vacated_flag,
    substring(Case_Number,5,1) AS Case_Type
    from fcsq.DV_Hearing_Events AS t1
    where t1.event_model in ('C6', 'G61', 'FL402', 'FL405')
    order by t1.case_number, t1.receipt_date;
"""

pydb.start_query_execution_and_wait(create_DV_Hearings_Cases)

create_dv_hearings_cases_v2 = f"""
SELECT *,
(case when row_number() over (partition by Case_Number order by receipt_date) = 1 then 1 else 0 end) as Case_Number_ID
FROM fcsq.DV_Hearings_Cases;
"""

pydb.create_temp_table(create_dv_hearings_cases_v2,'dv_hearings_cases_v2')

#### DV_Hearing_Cases validation

In [26]:
#DV_Hearings_Cases_count = pydb.read_sql_query("select count(*) as count from __temp__.dv_hearings_cases_v2")
#DV_Hearings_Cases_count
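
The filtering and flagging in sections 10 and 11 can be summarised in a short Python sketch. It is an illustration under assumptions: the sample rows are invented, and the two filters are applied together here although the notebook applies them in separate tables. SQL `substring(case_number,5,1)` is 1-indexed, so the equivalent Python index is 4.

```python
DV_EVENT_MODELS = {"C6", "G61", "FL402", "FL405"}


def is_f_case(case_number):
    """True when the fifth character of the case number is 'F'."""
    return len(case_number) >= 5 and case_number[4] == "F"


def flag_first_hearing(rows):
    """Filter to DV hearing events and flag the earliest one per case with 1."""
    kept = [r for r in rows
            if is_f_case(r["case_number"]) and r["event_model"] in DV_EVENT_MODELS]
    kept.sort(key=lambda r: (r["case_number"], r["receipt_date"]))
    seen = set()
    flagged = []
    for r in kept:
        is_first = r["case_number"] not in seen
        seen.add(r["case_number"])
        flagged.append({**r, "case_number_id": 1 if is_first else 0})
    return flagged


# Invented sample data
hearings = [
    {"case_number": "AB11F00001", "event_model": "FL402", "receipt_date": "2011-02-10"},
    {"case_number": "AB11F00001", "event_model": "FL405", "receipt_date": "2011-01-05"},
    {"case_number": "AB11P00009", "event_model": "FL402", "receipt_date": "2011-01-01"},
    {"case_number": "CD12F00002", "event_model": "XX1", "receipt_date": "2012-01-01"},
]
flagged = flag_first_hearing(hearings)
print(flagged)
```

When two hearings on a case share the same `receipt_date`, the SQL `row_number()` may pick either as the first; the Python sort is stable, so the sketch keeps input order for ties.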

## 12. Hearing_DV_Applicants table - joins the cases from applicant representation table to the first hearing for the case in the DV_Hearing_Cases table 
<a name="Hearing_DV_Applicants"></a>

### Drop the Hearing_DV_Applicant table if it already exists and remove its data from the S3 bucket

In [27]:
drop_Hearing_DV_Applicant = "DROP TABLE IF EXISTS fcsq.Hearing_DV_Applicant"
pydb.start_query_execution_and_wait(drop_Hearing_DV_Applicant)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/Hearing_DV_Applicant").delete();

### Create the Hearing_DV_Applicant table in Athena

In [28]:
create_Hearing_DV_Applicant = f"""
CREATE TABLE IF NOT EXISTS fcsq.Hearing_DV_Applicant
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/Hearing_DV_Applicants') AS
SELECT t1.*,
t2.Case_Number_ID

FROM __temp__.dv_app_rep_final t1
LEFT JOIN __temp__.dv_hearings_cases_v2 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
where Case_Number_ID > 0;
"""

pydb.start_query_execution_and_wait(create_Hearing_DV_Applicant);

#### Hearing_DV_Applicant validation

In [29]:
#Hearing_DV_Applicant_count = pydb.read_sql_query("select count(*) as count from fcsq.Hearing_DV_Applicant")
#Hearing_DV_Applicant_count
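
A point that is easy to miss in the query above: because the `WHERE Case_Number_ID > 0` condition tests a column from the right-hand table, the `LEFT JOIN` behaves like an inner join. Party rows for cases without a hearing get a NULL flag and are filtered out, as are the rows matched to later hearings (flag 0). The sketch below illustrates this with invented data; `join_first_hearing` is a hypothetical name.

```python
def join_first_hearing(party_rows, hearing_rows):
    """Left join parties to hearings on case_number, then keep rows with flag > 0."""
    flags_by_case = {}
    for h in hearing_rows:
        flags_by_case.setdefault(h["case_number"], []).append(h["case_number_id"])
    joined = []
    for p in party_rows:
        # LEFT JOIN: a party on a case with no hearings gets one row with a None flag
        for flag in flags_by_case.get(p["case_number"], [None]):
            # WHERE Case_Number_ID > 0: both NULL (None) and 0 fail the test
            if flag is not None and flag > 0:
                joined.append({**p, "case_number_id": flag})
    return joined


# Invented sample data: case A has two applicants and two hearings, case B has no hearing
parties = [
    {"case_number": "A", "app_party_id": 1},
    {"case_number": "A", "app_party_id": 2},
    {"case_number": "B", "app_party_id": 3},
]
hearings = [
    {"case_number": "A", "case_number_id": 1},
    {"case_number": "A", "case_number_id": 0},
]
result = join_first_hearing(parties, hearings)
print(result)
```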

## 13. Hearing_DV_Respondent table - joins the cases from respondent representation table to the first hearing for the case in the DV_Hearing_Cases table 
<a name="Hearing_DV_Respondents"></a>

### Drop the Hearing_DV_Respondent table if it already exists and remove its data from the S3 bucket

In [30]:
drop_Hearing_DV_Respondent = "DROP TABLE IF EXISTS fcsq.Hearing_DV_Respondent"
pydb.start_query_execution_and_wait(drop_Hearing_DV_Respondent)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/Hearing_DV_Respondent").delete();

### Create the Hearing_DV_Respondent table in Athena

In [31]:
create_Hearing_DV_Respondent = f"""
CREATE TABLE IF NOT EXISTS fcsq.Hearing_DV_Respondent
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/Hearing_DV_Respondents') AS
SELECT t1.*,
t2.Case_Number_ID

FROM __temp__.dv_resp_rep_final t1
LEFT JOIN __temp__.dv_hearings_cases_v2 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
where Case_Number_ID > 0;
"""

pydb.start_query_execution_and_wait(create_Hearing_DV_Respondent);

#### Hearing_DV_Respondent validation

In [32]:
#Hearing_DV_Respondent_count = pydb.read_sql_query("select count(*) as count from fcsq.Hearing_DV_Respondent")
#Hearing_DV_Respondent_count

In [33]:
#pydb.read_sql_query("select * from fcsq.Hearing_DV_Respondent")

## 14. DV_App table - Groups the hearing_dv_applicants table and produces a count per group
<a name="DV_App"></a>

### Drop the DV_App table if it already exists and remove its data from the S3 bucket

In [34]:
drop_DV_App = "DROP TABLE IF EXISTS fcsq.DV_App"
pydb.start_query_execution_and_wait(drop_DV_App)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_App").delete();

### Create the DV_App table in Athena

In [35]:
create_DV_App = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_App
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_App') AS
SELECT
  'Domestic Violence' AS CASE_TYPE,
  Year,
  Quarter,
  'Party' AS Category,
  'Applicant' AS PARTY,
   App_Gender AS Gender,
  Representation,
  Count (*) AS Count
FROM
  fcsq.Hearing_DV_Applicant
WHERE 
  Representation <> '' /* A very small number of cases from 2011/12 have a blank Representation (gender is also blank). To do: look into whether these should be recoded as N */
GROUP BY
  'Domestic Violence',
  Year,
  Quarter,
  'Party',
  'Applicant',
  App_Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_DV_App);

#### DV_App validation

In [36]:
#DV_App_count = pydb.read_sql_query("select count(*) as count from fcsq.DV_App")
#DV_App_count
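
As a rough cross-check of what the grouped count produces, here is a small Python sketch using `collections.Counter`. The sample rows are invented, and `count_parties` is a hypothetical name. The truthiness test drops both `None` and empty strings, which matches the effect of `Representation <> ''` in SQL, where a NULL comparison is never true.

```python
from collections import Counter


def count_parties(rows, party_label):
    """Count rows per (year, quarter, gender, representation), skipping blank representation."""
    counts = Counter(
        (r["year"], r["quarter"], r["gender"], r["representation"])
        for r in rows
        if r["representation"]  # drops None and ''
    )
    return [
        {"case_type": "Domestic Violence", "year": year, "quarter": quarter,
         "category": "Party", "party": party_label, "gender": gender,
         "representation": representation, "count": n}
        for (year, quarter, gender, representation), n in counts.items()
    ]


# Invented sample data
rows = [
    {"year": 2021, "quarter": 1, "gender": "Female", "representation": "Y"},
    {"year": 2021, "quarter": 1, "gender": "Female", "representation": "Y"},
    {"year": 2021, "quarter": 1, "gender": "Male", "representation": "N"},
    {"year": 2021, "quarter": 1, "gender": None, "representation": None},
]
grouped = count_parties(rows, "Applicant")
print(grouped)
```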

## 15. DV_Resp table - Groups the hearing_dv_respondents table and produces a count per group
<a name="DV_Resp"></a>

### Drop the DV_Resp table if it already exists and remove its data from the S3 bucket

In [37]:
drop_DV_Resp = "DROP TABLE IF EXISTS fcsq.DV_Resp"
pydb.start_query_execution_and_wait(drop_DV_Resp)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_Resp").delete();

### Create the DV_Resp table in Athena

In [38]:
create_DV_Resp = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_resp
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_Resp') AS
SELECT
  'Domestic Violence' AS CASE_TYPE,
  Year,
  Quarter,
  'Party' AS Category,
  'Respondent' AS PARTY,
   Resp_Gender AS Gender,
  Representation,
  Count (*) AS Count
FROM
  fcsq.Hearing_DV_Respondent
WHERE 
  Representation <> '' /* A very small number of cases from 2011/12 have a blank Representation (gender is also blank). To do: look into whether these should be recoded as N */
GROUP BY
  'Domestic Violence',
  Year,
  Quarter,
  'Party',
  'Respondent',
  Resp_Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_DV_Resp);

#### DV_Resp validation

In [39]:
#DV_Resp_count = pydb.read_sql_query("select count(*) as count from fcsq.DV_Resp")
#DV_Resp_count

## 16. dv_case table - groups and formats dv_case_data_v3 table and gives a count for each group
<a name="DV_Case"></a>

### Drop the dv_case table if it already exists and remove its data from the S3 bucket

In [40]:
drop_DV_case = "DROP TABLE IF EXISTS fcsq.DV_case"
pydb.start_query_execution_and_wait(drop_DV_case)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_case").delete();

### Create the dv_case table in Athena

In [41]:
create_DV_case = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_case
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_case') AS
SELECT *, Count(*) as Count FROM
    (SELECT
      'Domestic Violence' AS CASE_TYPE,
      Year,
      Quarter,
      'Cases' AS Category,
      ' ' AS PARTY,
      ' ' AS Gender,
      ' ' AS Representation
    FROM
      __temp__.dv_case_data_v3)

GROUP BY
  CASE_TYPE,
  Year,
  Quarter,
  Category,
  PARTY,
  Gender,
  Representation

ORDER BY 
    Year,
    Quarter;
"""

pydb.start_query_execution_and_wait(create_DV_case);

#### DV_case validation

In [42]:
#DV_case_count = pydb.read_sql_query("select count(*) as count from fcsq.DV_case")
#DV_case_count

## 17. DV_Case_Hearings table - creates a count of all the cases with a hearing per quarter
<a name="DV_Case_Hearings"></a>

### Drop the DV_Case_Hearings table if it already exists and remove its data from the S3 bucket

In [43]:
drop_DV_Case_Hearings = "DROP TABLE IF EXISTS fcsq.DV_Case_Hearings"
pydb.start_query_execution_and_wait(drop_DV_Case_Hearings)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/DV_Case_Hearings").delete();

### Create the DV_Case_Hearings table in Athena

In [44]:
create_hearing_dv_case = f"""
SELECT DISTINCT Year, Quarter, Case_Number
FROM fcsq.Hearing_DV_Applicant;
"""

pydb.create_temp_table(create_hearing_dv_case,'hearing_dv_case')



create_DV_Case_Hearings = f"""
CREATE TABLE IF NOT EXISTS fcsq.DV_Case_Hearings
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/DV_Case_Hearings') AS
SELECT *, Count(*) as Count FROM
    (SELECT
      'Domestic Violence' AS CASE_TYPE,
      Year,
      Quarter,
      'Cases with a hearing' AS Category,
      ' ' AS PARTY,
      ' ' AS Gender,
      ' ' AS Representation
    FROM
      __temp__.hearing_dv_case)
GROUP BY
  CASE_TYPE,
  Year,
  Quarter,
  Category,
  PARTY,
  Gender,
  Representation
ORDER BY 
  Year,
  Quarter;
"""

pydb.start_query_execution_and_wait(create_DV_Case_Hearings);

In [45]:
#DV_Case_Hearings = pydb.read_sql_query("select count(*) as count from fcsq.DV_Case_Hearings")
#DV_Case_Hearings

## 18. Domestic Violence table - Joins the applicant/respondent representation count tables, case count table, and case hearing count tables 
<a name="Domestic_Violence"></a>

### Drop the Domestic Violence table if it already exists and remove its data from the S3 bucket

In [46]:
drop_Domestic_Violence = "DROP TABLE IF EXISTS fcsq.Domestic_Violence"
pydb.start_query_execution_and_wait(drop_Domestic_Violence)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/Domestic_Violence").delete();

### Create the Domestic Violence table in Athena

In [47]:
create_Domestic_Violence = f"""
CREATE TABLE IF NOT EXISTS fcsq.Domestic_Violence
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/Domestic_Violence') AS
SELECT
  *
FROM
 fcsq.DV_APP
UNION ALL
SELECT
  *
FROM
  fcsq.DV_RESP
UNION ALL
SELECT
  *
FROM
  fcsq.DV_CASE
UNION ALL
SELECT
  *
FROM
  fcsq.DV_CASE_HEARINGS;
"""

pydb.start_query_execution_and_wait(create_Domestic_Violence);

#### Domestic Violence validation

In [48]:
#Domestic_Violence_count = pydb.read_sql_query("select count(*) as count from fcsq.Domestic_Violence")
#Domestic_Violence_count

In [49]:
legrep_lookup = f"""
SELECT
case_type || '|' || cast(year as varchar(10)) || '|' || cast(quarter as varchar(1)) || '|' as lookup,
case_type,
year,
cast(quarter as varchar(3)) as quarter,
category,
party,
gender,
representation,
count
from fcsq.Domestic_Violence
order by lookup

"""
pydb.create_temp_table(legrep_lookup,'legrep_lookup')

#pydb.read_sql_query("SELECT * from __temp__.legrep_lookup")

In [50]:
legrep_with_annual = f"""
SELECT
* from __temp__.legrep_lookup
UNION ALL
SELECT 
lookup,
case_type,
year,
quarter,
category,
party,
gender,
representation,
sum(count) from
(
SELECT
case_type || '|' || cast(year as varchar(10)) || '|' as lookup,
case_type,
year,
'N/A' as quarter,
category,
party,
gender,
representation,
count
from __temp__.legrep_lookup where year<>{latest_year}
)
group by 
lookup,
case_type,
year,
quarter,
category,
party,
gender,
representation
order by lookup
"""

pydb.create_temp_table(legrep_with_annual,'legrep_with_annual')
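
The two queries above build a quarterly lookup key of the form `case_type|year|quarter|` and then append annual rows keyed `case_type|year|`, summing the quarterly counts and leaving out `latest_year` (presumably because that year is incomplete). A minimal Python sketch of the same roll-up, using invented counts and hypothetical function names:

```python
from collections import defaultdict


def quarterly_lookup(case_type, year, quarter):
    return f"{case_type}|{year}|{quarter}|"


def annual_lookup(case_type, year):
    return f"{case_type}|{year}|"


def add_annual_rows(rows, latest_year):
    """Append one annual total per group, skipping the latest year."""
    totals = defaultdict(int)
    for r in rows:
        if r["year"] != latest_year:
            key = (r["case_type"], r["year"], r["category"],
                   r["party"], r["gender"], r["representation"])
            totals[key] += r["count"]
    annual = [
        {"lookup": annual_lookup(case_type, year), "case_type": case_type, "year": year,
         "quarter": "N/A", "category": category, "party": party, "gender": gender,
         "representation": representation, "count": n}
        for (case_type, year, category, party, gender, representation), n in totals.items()
    ]
    return rows + annual


def case_row(year, quarter, count):
    """Build an invented 'Cases' row for the example."""
    return {"lookup": quarterly_lookup("Domestic Violence", year, quarter),
            "case_type": "Domestic Violence", "year": year, "quarter": str(quarter),
            "category": "Cases", "party": " ", "gender": " ", "representation": " ",
            "count": count}


quarterly = [case_row(2021, 1, 10), case_row(2021, 2, 5), case_row(2022, 1, 7)]
combined = add_annual_rows(quarterly, latest_year=2022)
print(combined[-1])
```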

In [51]:
legrep_cases = f"""
SELECT lookup,
sum(count) as cases
from __temp__.legrep_with_annual
where category='Cases'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_cases,'legrep_cases')

In [52]:
legrep_hearing_cases = f"""
SELECT lookup,
sum(count) as cases_hearing
from __temp__.legrep_with_annual
where category='Cases with a hearing'
group by lookup
order by lookup
"""
pydb.create_temp_table(legrep_hearing_cases,'legrep_hearing_cases')

In [53]:
legrep_app_rep = f"""
SELECT lookup,
sum(count) as app_rep
from __temp__.legrep_with_annual
where category='Party' and party = 'Applicant' and representation = 'Y'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_app_rep,'legrep_app_rep')

In [54]:
legrep_app_unrep = f"""
SELECT lookup,
sum(count) as app_unrep
from __temp__.legrep_with_annual
where category='Party' and party = 'Applicant' and representation = 'N'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_app_unrep,'legrep_app_unrep')

In [55]:
legrep_res_rep = f"""
SELECT lookup,
sum(count) as res_rep
from __temp__.legrep_with_annual
where category='Party' and party = 'Respondent' and representation = 'Y'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_res_rep,'legrep_res_rep')

In [56]:
legrep_res_unrep = f"""
SELECT lookup,
sum(count) as res_unrep
from __temp__.legrep_with_annual
where category='Party' and party = 'Respondent' and representation = 'N'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_res_unrep,'legrep_res_unrep')

In [57]:
legrep_total_parties = f"""
SELECT lookup,
sum(count) as total_parties
from __temp__.legrep_with_annual
where category='Party'
group by lookup
order by lookup
"""

pydb.create_temp_table(legrep_total_parties,'legrep_total_parties')

In [58]:
legrep_final_output = f"""
SELECT
t1.lookup,
t1.cases,
t2.cases_hearing,
t3.app_rep,
t4.app_unrep,
t5.res_rep,
t6.res_unrep,
t7.total_parties

from
((((((__temp__.legrep_cases t1 
INNER JOIN
__temp__.legrep_hearing_cases t2
ON 
t1.lookup = t2.lookup)

INNER JOIN 
__temp__.legrep_app_rep t3
ON 
t1.lookup = t3.lookup)

INNER JOIN 
__temp__.legrep_app_unrep t4
ON 
t1.lookup = t4.lookup)

INNER JOIN 
__temp__.legrep_res_rep t5
ON 
t1.lookup = t5.lookup)

INNER JOIN 
__temp__.legrep_res_unrep t6
ON 
t1.lookup = t6.lookup)

INNER JOIN 
__temp__.legrep_total_parties t7
ON 
t1.lookup = t7.lookup)
"""

df = pydb.read_sql_query(legrep_final_output)
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Domestic_Violence/domestic_violence_legrep.csv',index=False)
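The seven filter-and-sum tables joined above amount to a long-to-wide pivot keyed on `lookup`. The sketch below shows the same idea as a single pass of conditional sums. `pivot_legrep`, `COLUMNS` and the sample rows are hypothetical. One deliberate difference: the chained `INNER JOIN`s drop any `lookup` that is missing from one of the seven tables, whereas this sketch keeps it with a zero.

```python
from collections import defaultdict


def _party(party, representation):
    """Build a predicate for one party/representation combination."""
    return lambda r: (r["category"] == "Party"
                      and r.get("party") == party
                      and r.get("representation") == representation)


# Output column -> predicate, matching the WHERE clauses of the temp tables
COLUMNS = {
    "cases": lambda r: r["category"] == "Cases",
    "cases_hearing": lambda r: r["category"] == "Cases with a hearing",
    "app_rep": _party("Applicant", "Y"),
    "app_unrep": _party("Applicant", "N"),
    "res_rep": _party("Respondent", "Y"),
    "res_unrep": _party("Respondent", "N"),
    "total_parties": lambda r: r["category"] == "Party",
}


def pivot_legrep(rows):
    """Sum count into one wide record per lookup key."""
    wide = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for row in rows:
        for name, matches in COLUMNS.items():
            if matches(row):
                wide[row["lookup"]][name] += row["count"]
    return dict(wide)


# Hypothetical sample rows (not real data)
key = "Domestic Violence|2020|1|"
legrep_rows = [
    {"lookup": key, "category": "Cases", "count": 10},
    {"lookup": key, "category": "Cases with a hearing", "count": 6},
    {"lookup": key, "category": "Party", "party": "Applicant", "representation": "Y", "count": 7},
    {"lookup": key, "category": "Party", "party": "Applicant", "representation": "N", "count": 3},
    {"lookup": key, "category": "Party", "party": "Respondent", "representation": "Y", "count": 2},
    {"lookup": key, "category": "Party", "party": "Respondent", "representation": "N", "count": 8},
]
legrep_wide = pivot_legrep(legrep_rows)
```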

## 19. DV_applications_data_sorted table - orders DV_APPS_FINAL by year, quarter, receipt date, case number and description
<a name="DV_applications_data_sorted"></a>

### Drop the dv_applications_data_sorted table if it already exists and remove its data from the S3 bucket

In [59]:
drop_dv_applications_data_sorted = "DROP TABLE IF EXISTS fcsq.dv_applications_data_sorted"
pydb.start_query_execution_and_wait(drop_dv_applications_data_sorted)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_applications_data_sorted").delete();

### Create the dv_applications_data_sorted table in Athena

In [60]:
create_dv_applications_data_sorted = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_applications_data_sorted
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_applications_data_sorted') AS
SELECT t1.*
      FROM fcsq.DV_APPS_FINAL t1
      ORDER BY t1.YEAR,
               t1.QUARTER,
               t1.RECEIPT_DATE,
               t1.CASE_NUMBER,
               t1.Description;
"""

pydb.start_query_execution_and_wait(create_dv_applications_data_sorted);

#### DV_applications_data_sorted validation

In [61]:
#dv_applications_data_sorted_count = pydb.read_sql_query("select count(*) as count from fcsq.dv_applications_data_sorted")
#dv_applications_data_sorted_count

## 20. Create dv_applications temporary tables - takes the dv_applications_data_sorted table and keeps the first application record for each case number, description and event
<a name="DV_applications_temp"></a>

In [62]:
create_dv_applications_1 = f"""
SELECT *, row_number() over (order by CASE_NUMBER, RECEIPT_DATE) as SEQ_NUM
      FROM fcsq.dv_applications_data_sorted;
"""
pydb.create_temp_table(create_dv_applications_1,'dv_applications_1')

create_dv_applications_2 = f"""
SELECT DISTINCT CASE_NUMBER, Description, EVENT, (MIN(Seq_Num)) AS MIN_of_Seq_Num
FROM __temp__.dv_applications_1
GROUP BY CASE_NUMBER, Description, EVENT;
"""

pydb.create_temp_table(create_dv_applications_2,'dv_applications_2')

create_dv_applications_3 = f"""
SELECT t1.CASE_NUMBER, 
    t2.RECEIPT_DATE, 
    t2.YEAR, 
    t2.QUARTER, 
    t2.EVENT, 
    t2.EVENT_COURT, 
    t2.Description
FROM __temp__.dv_applications_2 t1
LEFT JOIN __temp__.dv_applications_1 t2 ON (t1.MIN_of_Seq_NUM = t2.Seq_NUM) AND (t1.Description = 
t2.Description);
"""

pydb.create_temp_table(create_dv_applications_3,'dv_applications_3')
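The three queries above follow a common "number the rows, take the minimum per group, join back" pattern. A minimal Python sketch of that pattern is below. `first_per_group` is a hypothetical helper, and the rows are the `BC11F01034` records shown in the validation output for `dv_applications_1`.

```python
def first_per_group(rows, key_fields, order_fields):
    """Return the earliest row of each group, as ordered by order_fields."""
    first = {}
    for row in sorted(rows, key=lambda r: tuple(r[f] for f in order_fields)):
        key = tuple(row[f] for f in key_fields)
        first.setdefault(key, row)  # keep only the first row seen per key
    return list(first.values())


applications = [
    {"case_number": "BC11F01034", "receipt_date": "2011-11-10",
     "event": "13100525429", "description": "On Notice Occupation"},
    {"case_number": "BC11F01034", "receipt_date": "2011-11-10",
     "event": "13100525429", "description": "On Notice Non-Molestation"},
    {"case_number": "BC11F01034", "receipt_date": "2011-11-17",
     "event": "13100526297", "description": "On Notice Occupation"},
    {"case_number": "BC11F01034", "receipt_date": "2011-11-17",
     "event": "13100526297", "description": "On Notice Non-Molestation"},
]
order = ("case_number", "receipt_date")

# Grouping as the query does: one row per case, description and event
per_event = first_per_group(applications, ("case_number", "description", "event"), order)

# Grouping without the event: one row per case and description
per_description = first_per_group(applications, ("case_number", "description"), order)
```

With `event` in the grouping key all four rows survive, because each event forms its own group. Dropping `event` from the key leaves only the two rows dated 2011-11-10.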

In [63]:
c = pydb.read_sql_query("SELECT * from __temp__.dv_applications_1 where case_number = 'BC11F01034'")
c

Unnamed: 0,year,quarter,receipt_date,case_number,event,event_court,field_model,adjusted_value,case_type,description,seq_num
0,2011,4,2011-11-10,BC11F01034,13100525429,131,U22_AT,", ONM, ONO,",Domestic Violence,On Notice Occupation,6803
1,2011,4,2011-11-10,BC11F01034,13100525429,131,U22_AT,", ONM, ONO,",Domestic Violence,On Notice Non-Molestation,6804
2,2011,4,2011-11-17,BC11F01034,13100526297,131,G50_AT,", ONM, ONO,",Domestic Violence,On Notice Occupation,6805
3,2011,4,2011-11-17,BC11F01034,13100526297,131,G50_AT,", ONM, ONO,",Domestic Violence,On Notice Non-Molestation,6806


In [64]:
c = pydb.read_sql_query("SELECT * from __temp__.dv_applications_2 where case_number = 'BC11F01034'")
c

Unnamed: 0,case_number,description,event,min_of_seq_num
0,BC11F01034,On Notice Non-Molestation,13100526297,6806
1,BC11F01034,On Notice Occupation,13100526297,6805
2,BC11F01034,On Notice Non-Molestation,13100525429,6804
3,BC11F01034,On Notice Occupation,13100525429,6803


In [65]:
c = pydb.read_sql_query("SELECT * from __temp__.dv_applications_3 where case_number = 'BC11F01034'")
c

Unnamed: 0,case_number,receipt_date,year,quarter,event,event_court,description
0,BC11F01034,2011-11-17,2011,4,13100526297,131,On Notice Non-Molestation
1,BC11F01034,2011-11-17,2011,4,13100526297,131,On Notice Occupation
2,BC11F01034,2011-11-10,2011,4,13100525429,131,On Notice Occupation
3,BC11F01034,2011-11-10,2011,4,13100525429,131,On Notice Non-Molestation


## 21. DV_orders_data_sorted table - Sorts the disposals table by case_number and receipt date, removing contact orders, placement revoke or vary orders and other type orders
<a name="DV_orders_data_sorted"></a>

### Drop the dv_orders_data_sorted table if it already exists and remove its data from the S3 bucket

In [66]:
drop_dv_orders_data_sorted = "DROP TABLE IF EXISTS fcsq.dv_orders_data_sorted"
pydb.start_query_execution_and_wait(drop_dv_orders_data_sorted)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_orders_data_sorted").delete();

### Create the dv_orders_data_sorted table in Athena

In [67]:
create_dv_orders_data_sorted = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_orders_data_sorted
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_orders_data_sorted') AS
SELECT t1.RECEIPT_DATE, 
    t1.CASE_NUMBER, 
    t1.EVENT, 
    t1.CREATING_COURT,
    CAST(EVENT AS VARCHAR(3)) AS EVENT_CODE,
    t1.FIELD_MODEL, 
    t1.VALUE
FROM fcsq.DV_ORDS1 t1
ORDER BY t1.CASE_NUMBER,
    t1.RECEIPT_DATE;
"""

pydb.start_query_execution_and_wait(create_dv_orders_data_sorted);

#### DV_orders_data_sorted validation

In [68]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from fcsq.dv_ords1 where case_number = 'BC11F01034'")
dv_orders_data_sorted_count

Unnamed: 0,receipt_date,case_number,event,creating_court,field_model,value,error
0,2011-11-16,BC11F01034,13100526140,BC,FL404B_7,GEN,N
1,2011-11-28,BC11F01034,13100527586,BC,FL404B_7,GEN,N
2,2011-11-22,BC11F01034,13100526816,BC,FL404B_7,GEN,N
3,2012-02-06,BC11F01034,13100534573,BC,FL404B_7,GEN,N
4,2011-12-23,BC11F01034,13100530580,BC,FL404B_7,OCC,N


In [69]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from fcsq.dv_orders_data_sorted where case_number = 'BC11F01034' order by receipt_date")
dv_orders_data_sorted_count

Unnamed: 0,receipt_date,case_number,event,creating_court,event_code,field_model,value
0,2011-11-16,BC11F01034,13100526140,BC,131,FL404B_7,GEN
1,2011-11-22,BC11F01034,13100526816,BC,131,FL404B_7,GEN
2,2011-11-28,BC11F01034,13100527586,BC,131,FL404B_7,GEN
3,2011-12-23,BC11F01034,13100530580,BC,131,FL404B_7,OCC
4,2012-02-06,BC11F01034,13100534573,BC,131,FL404B_7,GEN


In [70]:
c = pydb.read_sql_query("SELECT * from __temp__.dv_applications_3 where case_number = 'BC11F01034'")
c

Unnamed: 0,case_number,receipt_date,year,quarter,event,event_court,description
0,BC11F01034,2011-11-17,2011,4,13100526297,131,On Notice Non-Molestation
1,BC11F01034,2011-11-17,2011,4,13100526297,131,On Notice Occupation
2,BC11F01034,2011-11-10,2011,4,13100525429,131,On Notice Non-Molestation
3,BC11F01034,2011-11-10,2011,4,13100525429,131,On Notice Occupation


## 22. Create dv_orders temporary tables - takes the dv_orders_data_sorted table and matches each application (case number, order type and application date) to the first order record made on or after the application date
<a name="DV_orders_temp"></a>

In [72]:
create_dv_orders_0 = f"""
SELECT t1.CASE_NUMBER, 
          t1.EVENT, 
          t1.CREATING_COURT, 
          t2.RECEIPT_DATE AS APP_DATE, 
          t1.RECEIPT_DATE AS DISP_DATE, 
          t2.Description AS APP_Order_Type, 
          t1.FIELD_MODEL, 
          t1.EVENT_CODE, 
          t1.VALUE, 
          DAY(t1.RECEIPT_DATE - t2.RECEIPT_DATE) AS DATE_DIFF
      FROM fcsq.dv_orders_data_sorted t1
           INNER JOIN __temp__.dv_applications_3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
      WHERE DAY(t1.RECEIPT_DATE - t2.RECEIPT_DATE) >= 0;
"""
pydb.create_temp_table(create_dv_orders_0,'dv_orders_0')


create_dv_orders_1 = f"""
SELECT DISTINCT t1.CASE_NUMBER, 
          t1.EVENT, 
          t1.CREATING_COURT, 
          t1.EVENT_CODE, 
          t1.APP_Order_Type, 
          t1.APP_DATE, 
          t1.DISP_DATE, 
          t1.FIELD_MODEL, 
          t1.VALUE,
          row_number() over (ORDER BY CASE_NUMBER, APP_Order_Type, APP_DATE, DISP_DATE, EVENT) as SEQ_NUM
      FROM __temp__.dv_orders_0 t1
      ORDER BY t1.CASE_NUMBER,
               t1.APP_Order_Type,
               t1.APP_DATE,
               t1.DISP_DATE,
               t1.EVENT;
"""
pydb.create_temp_table(create_dv_orders_1,'dv_orders_1')

create_dv_orders_2 = f"""
SELECT DISTINCT t1.CASE_NUMBER, 
          t1.APP_Order_Type, 
          t1.APP_DATE, 
          (MIN(t1.SEQ_NUM)) AS MIN_SEQ_NUM
      FROM __temp__.dv_orders_1 t1
      GROUP BY t1.CASE_NUMBER,
               t1.APP_Order_Type,
               t1.APP_DATE;
"""
pydb.create_temp_table(create_dv_orders_2,'dv_orders_2')


create_dv_orders_3 = f"""
SELECT DISTINCT t1.CASE_NUMBER, 
          t1.APP_Order_Type, 
          t1.APP_DATE, 
          t2.DISP_DATE, 
          t2.EVENT, 
          t2.CREATING_COURT, 
          t2.EVENT_CODE, 
          t2.FIELD_MODEL, 
          t2.VALUE
      FROM __temp__.dv_orders_2 t1
           LEFT JOIN __temp__.dv_orders_1 t2 ON (t1.MIN_SEQ_NUM = t2.SEQ_NUM);
"""
pydb.create_temp_table(create_dv_orders_3,'dv_orders_3')
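The matching rule in `create_dv_orders_0` to `create_dv_orders_3` can be sketched as follows: keep orders dated on or after the application, then take the earliest. `first_order_on_or_after` is a hypothetical helper. It ignores the order type and event columns that the SQL carries along. The dates are those listed for `BC11F01034` in `dv_orders_data_sorted`.

```python
from datetime import date


def first_order_on_or_after(app_date, order_dates):
    """Return the earliest order date that is not before app_date, or None."""
    eligible = [d for d in order_dates if d >= app_date]
    return min(eligible) if eligible else None


order_dates = [
    date(2011, 11, 16),
    date(2011, 11, 28),
    date(2011, 11, 22),
    date(2012, 2, 6),
    date(2011, 12, 23),
]

match_first_app = first_order_on_or_after(date(2011, 11, 10), order_dates)
match_second_app = first_order_on_or_after(date(2011, 11, 17), order_dates)
no_match = first_order_on_or_after(date(2013, 1, 1), order_dates)
```

An application with no later order returns `None` here. In the SQL such applications drop out at the `INNER JOIN` in `create_dv_orders_0`.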

In [73]:
#dv_apps_and_orders_match_count = pydb.read_sql_query("select count(*) as count from __temp__.dv_orders_3")
#dv_apps_and_orders_match_count

In [74]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from __temp__.dv_orders_0 where case_number = 'BC11F01034'")
dv_orders_data_sorted_count

Unnamed: 0,case_number,event,creating_court,app_date,disp_date,app_order_type,field_model,event_code,value,date_diff
0,BC11F01034,13100526140,BC,2011-11-10,2011-11-16,On Notice Non-Molestation,FL404B_7,131,GEN,6
1,BC11F01034,13100526140,BC,2011-11-10,2011-11-16,On Notice Occupation,FL404B_7,131,GEN,6
2,BC11F01034,13100526816,BC,2011-11-10,2011-11-22,On Notice Non-Molestation,FL404B_7,131,GEN,12
3,BC11F01034,13100526816,BC,2011-11-10,2011-11-22,On Notice Occupation,FL404B_7,131,GEN,12
4,BC11F01034,13100526816,BC,2011-11-17,2011-11-22,On Notice Non-Molestation,FL404B_7,131,GEN,5
5,BC11F01034,13100526816,BC,2011-11-17,2011-11-22,On Notice Occupation,FL404B_7,131,GEN,5
6,BC11F01034,13100527586,BC,2011-11-10,2011-11-28,On Notice Non-Molestation,FL404B_7,131,GEN,18
7,BC11F01034,13100527586,BC,2011-11-10,2011-11-28,On Notice Occupation,FL404B_7,131,GEN,18
8,BC11F01034,13100527586,BC,2011-11-17,2011-11-28,On Notice Non-Molestation,FL404B_7,131,GEN,11
9,BC11F01034,13100527586,BC,2011-11-17,2011-11-28,On Notice Occupation,FL404B_7,131,GEN,11


In [75]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from __temp__.dv_orders_1 where case_number = 'BC11F01034'")
dv_orders_data_sorted_count

Unnamed: 0,case_number,event,creating_court,event_code,app_order_type,app_date,disp_date,field_model,value,seq_num
0,BC11F01034,13100526140,BC,131,On Notice Non-Molestation,2011-11-10,2011-11-16,FL404B_7,GEN,19632
1,BC11F01034,13100526816,BC,131,On Notice Non-Molestation,2011-11-10,2011-11-22,FL404B_7,GEN,19633
2,BC11F01034,13100527586,BC,131,On Notice Non-Molestation,2011-11-10,2011-11-28,FL404B_7,GEN,19634
3,BC11F01034,13100530580,BC,131,On Notice Non-Molestation,2011-11-10,2011-12-23,FL404B_7,OCC,19635
4,BC11F01034,13100534573,BC,131,On Notice Non-Molestation,2011-11-10,2012-02-06,FL404B_7,GEN,19636
5,BC11F01034,13100526816,BC,131,On Notice Non-Molestation,2011-11-17,2011-11-22,FL404B_7,GEN,19637
6,BC11F01034,13100527586,BC,131,On Notice Non-Molestation,2011-11-17,2011-11-28,FL404B_7,GEN,19638
7,BC11F01034,13100530580,BC,131,On Notice Non-Molestation,2011-11-17,2011-12-23,FL404B_7,OCC,19639
8,BC11F01034,13100534573,BC,131,On Notice Non-Molestation,2011-11-17,2012-02-06,FL404B_7,GEN,19640
9,BC11F01034,13100526140,BC,131,On Notice Occupation,2011-11-10,2011-11-16,FL404B_7,GEN,19641


In [76]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from __temp__.dv_orders_2 where case_number = 'BC11F01034'")
dv_orders_data_sorted_count

Unnamed: 0,case_number,app_order_type,app_date,max_seq_num
0,BC11F01034,On Notice Occupation,2011-11-10,19645
1,BC11F01034,On Notice Occupation,2011-11-17,19649
2,BC11F01034,On Notice Non-Molestation,2011-11-17,19640
3,BC11F01034,On Notice Non-Molestation,2011-11-10,19636


In [77]:
dv_orders_data_sorted_count = pydb.read_sql_query("select * from __temp__.dv_orders_3 where case_number = 'BC11F01034'")
dv_orders_data_sorted_count

Unnamed: 0,case_number,app_order_type,app_date,disp_date,event,creating_court,event_code,field_model,value
0,BC11F01034,On Notice Occupation,2011-11-17,2012-02-06,13100534573,BC,131,FL404B_7,GEN
1,BC11F01034,On Notice Occupation,2011-11-10,2012-02-06,13100534573,BC,131,FL404B_7,GEN
2,BC11F01034,On Notice Non-Molestation,2011-11-10,2012-02-06,13100534573,BC,131,FL404B_7,GEN
3,BC11F01034,On Notice Non-Molestation,2011-11-17,2012-02-06,13100534573,BC,131,FL404B_7,GEN


## 23. dv_apps_and_orders_match table - calculates the timeliness based on the difference between app_date and disp_date
<a name="DV_apps_and_orders_match"></a>

### Drop the dv_apps_and_orders_match table if it already exists and remove its data from the S3 bucket

In [None]:
drop_dv_apps_and_orders_match = "DROP TABLE IF EXISTS fcsq.dv_apps_and_orders_match"
pydb.start_query_execution_and_wait(drop_dv_apps_and_orders_match)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_apps_and_orders_match").delete();

### Create the dv_apps_and_orders_match table in Athena

In [None]:
create_dv_apps_and_orders_match = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_apps_and_orders_match
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_apps_and_orders_match') AS
SELECT t2.CASE_NUMBER, 
    t2.APP_Order_Type, 
    t2.APP_DATE, 
    t2.DISP_DATE, 
    t2.FIELD_MODEL,
    t2.EVENT_CODE AS DSP_COURT,
    /* Wait_weeks */
    (CAST (DAY(t2.Disp_Date-t2.App_date) as double)/7) AS Wait_weeks,
    YEAR(t2.DISP_DATE) AS Year,
    Month(t2.DISP_DATE) AS Month,
    t2.VALUE AS Orders
FROM __temp__.dv_orders_3 t2;
"""

pydb.start_query_execution_and_wait(create_dv_apps_and_orders_match);
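As a quick check of the `Wait_weeks` arithmetic, the calculation is the whole-day difference between the two dates divided by 7. The helper name `wait_weeks` below is hypothetical.

```python
from datetime import date


def wait_weeks(app_date, disp_date):
    """Days between application and disposal, expressed in weeks."""
    return (disp_date - app_date).days / 7


six_days = wait_weeks(date(2011, 11, 10), date(2011, 11, 16))
two_weeks = wait_weeks(date(2011, 11, 10), date(2011, 11, 24))
```

Note that `Year` and `Month` in this table are taken from the disposal date, not the application date.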


In [None]:
#df = pydb.read_sql_query("select * from fcsq.dv_apps_and_orders_match")
#df

#### DV_apps_and_orders_match validation

In [None]:
#dv_apps_and_orders_match_count = pydb.read_sql_query("select count(*) as count from fcsq.dv_apps_and_orders_match")
#dv_apps_and_orders_match_count

## 24. Applicant_representation table - creates a table showing whether all, some, or none of the applicants for a case have representation
<a name="Applicant_representation"></a>

### Drop the applicant_representation table if it already exists and remove its data from the S3 bucket

In [None]:
drop_applicant_representation = "DROP TABLE IF EXISTS fcsq.applicant_representation"
pydb.start_query_execution_and_wait(drop_applicant_representation)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/applicant_representation").delete();

### Create the applicant_representation table in Athena

In [None]:
create_applicant_1 = f"""
SELECT Distinct t1.Case_Number, 
t1.Party,
MAX(t1.Representative_Role) as Max_Rep_Role

FROM fcsq.Applicants t1
Group by Case_Number, Party;
"""
pydb.create_temp_table(create_applicant_1,'applicant_1')


create_applicant_2 = f"""
SELECT  t1.*,
case when Max_Rep_Role IS NULL then 0
else 1
end as Rep_IND
FROM __temp__.applicant_1 t1;
"""
pydb.create_temp_table(create_applicant_2,'applicant_2')


create_applicant_3 = f"""
SELECT Distinct t1.Case_Number,
Count(t1.Party) as CountOfParty,
SUM(t1.Rep_Ind) as SumOfRep_IND
            
FROM __temp__.applicant_2 t1
Group by Case_Number;
"""
pydb.create_temp_table(create_applicant_3,'applicant_3')


create_applicant_representation = f"""
CREATE TABLE IF NOT EXISTS fcsq.applicant_representation
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/applicant_representation') AS
SELECT t1.Case_Number,
t1.CountOfParty,
t1.SumOfRep_IND,
case
when t1.SumOfRep_Ind > t1.CountOfParty then 'ERROR'
when t1.SumOfRep_Ind = t1.CountOfParty then 'ALL'
when t1.SumOfRep_Ind = 0 then 'NONE'
else 'SOME'
end as App_Rep_Cat

FROM __temp__.applicant_3 t1;
"""

pydb.start_query_execution_and_wait(create_applicant_representation);
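The classification built up across `applicant_1` to `applicant_representation` can be sketched in Python as below. All function names are hypothetical, and `"Solicitor"` is only a placeholder for a non-null representative role.

```python
def rep_indicator(max_rep_role):
    """1 if the party has any representative role recorded, else 0."""
    return 0 if max_rep_role is None else 1


def representation_category(count_of_party, sum_of_rep_ind):
    """Mirror the CASE expression, including the order of its branches."""
    if sum_of_rep_ind > count_of_party:
        return "ERROR"
    if sum_of_rep_ind == count_of_party:
        return "ALL"
    if sum_of_rep_ind == 0:
        return "NONE"
    return "SOME"


def categorise_case(max_rep_roles):
    """max_rep_roles holds one entry per party on the case (None = unrepresented)."""
    indicators = [rep_indicator(role) for role in max_rep_roles]
    return representation_category(len(indicators), sum(indicators))


some = categorise_case(["Solicitor", None])
none = categorise_case([None, None])
everyone = categorise_case(["Solicitor"])
error = representation_category(1, 2)
```

The `'ERROR'` branch should be unreachable as long as each party contributes at most one indicator, so it acts as a sanity check on the upstream grouping.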

#### Applicant_representation validation

In [None]:
#applicant_count = pydb.read_sql_query("select count(*) as count from fcsq.applicant_representation")
#applicant_count

## 25. Respondent_representation table - creates a table showing whether all, some, or none of the respondents for a case have representation
<a name="Respondent_representation"></a>

### Drop the respondent_representation table if it already exists and remove its data from the S3 bucket

In [None]:
drop_respondent_representation = "DROP TABLE IF EXISTS fcsq.respondent_representation"
pydb.start_query_execution_and_wait(drop_respondent_representation)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/respondent_representation").delete();

### Create the respondent_representation table in Athena

In [None]:
create_respondent_1 = f"""
SELECT Distinct t1.Case_Number, 
t1.Party,
MAX(t1.Representative_Role) as Max_Rep_Role

FROM fcsq.Respondents t1
Group by Case_Number, Party;
"""
pydb.create_temp_table(create_respondent_1,'respondent_1')


create_respondent_2 = f"""
SELECT  t1.*,
case when Max_Rep_Role IS NULL then 0
else 1
end as Rep_IND

FROM __temp__.respondent_1 t1;
"""
pydb.create_temp_table(create_respondent_2,'respondent_2')


create_respondent_3 = f"""
SELECT Distinct t1.Case_Number,
Count(t1.Party) as CountOfParty,
SUM(t1.Rep_Ind) as SumOfRep_IND

FROM __temp__.respondent_2 t1
Group by Case_Number;
"""
pydb.create_temp_table(create_respondent_3,'respondent_3')


create_respondent_representation = f"""
CREATE TABLE IF NOT EXISTS fcsq.respondent_representation
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/respondent_representation') AS
SELECT t1.Case_Number,
t1.CountOfParty,
t1.SumOfRep_IND,
case when t1.SumOfRep_Ind > t1.CountOfParty then 'ERROR'
when t1.SumOfRep_Ind = t1.CountOfParty then 'ALL'
when t1.SumOfRep_Ind = 0 then 'NONE'
else 'SOME'
end as Res_Rep_Cat

FROM __temp__.respondent_3 t1;
"""

pydb.start_query_execution_and_wait(create_respondent_representation);

#### Respondent_representation validation

In [None]:
#respondent_count = pydb.read_sql_query("select count(*) as count from fcsq.respondent_representation")
#respondent_count

## 26. DV_disposals_final table - joins the dv_apps_and_orders_match table to the representation tables, bringing together the timeliness and legal representation data for each case
<a name="DV_disposals_final"></a>

In [None]:
'''
Problem in fcsq.dv_apps_and_orders_match
'''

create_dom_viol_app_and_orders_with_rep = f"""
SELECT t1.*, 
Case when t1.Month< 4 then 1
When t1.Month< 7 then 2
when t1.Month< 10 then 3
else 4
End AS Quarter,
t2.APP_REP_CAT, 
t3.RES_REP_CAT
FROM fcsq.dv_apps_and_orders_match AS t1 
LEFT JOIN fcsq.applicant_representation AS t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER) 
LEFT JOIN fcsq.respondent_representation as t3 ON t1.CASE_NUMBER = t3.CASE_NUMBER;
"""
pydb.create_temp_table(create_dom_viol_app_and_orders_with_rep,'dom_viol_app_and_orders_with_rep')

df = pydb.read_sql_query(create_dom_viol_app_and_orders_with_rep)
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Domestic_Violence/dv_disposals_final.csv',index=False)
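The `CASE` expression that derives `Quarter` from `Month` is equivalent to integer arithmetic on the month number. The sketch below checks this for all twelve months. Both function names are hypothetical.

```python
def quarter_case_expression(month):
    """Direct translation of the CASE expression in the query above."""
    if month < 4:
        return 1
    if month < 7:
        return 2
    if month < 10:
        return 3
    return 4


def quarter_from_month(month):
    """Arithmetic equivalent: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4."""
    return (month - 1) // 3 + 1


quarters_agree = all(
    quarter_case_expression(m) == quarter_from_month(m) for m in range(1, 13)
)
```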

In [None]:
df = pydb.read_sql_query("select * from __temp__.dom_viol_app_and_orders_with_rep")
df

In [None]:
create_dv_disposals_final = f"""
SELECT t1.*,
t2.Region_Pre2014, 
t2.Region,
cast(t1.Year as varchar(4)) || '-Q' || cast(t1.quarter as varchar(3)) AS Quarter2, 
case when t1.APP_REP_CAT Is Null Or t1.RES_REP_CAT Is Null  then '5 Unknown'
    when t1.APP_REP_CAT = 'NONE' and t1.RES_REP_CAT = 'NONE' then '4 Neither'
    when t1.APP_REP_CAT = 'NONE' and t1.RES_REP_CAT != 'NONE' then '3 Respondent Only'
    when t1.APP_REP_CAT != 'NONE' and t1.RES_REP_CAT = 'NONE' then '2 Applicant Only'
Else '1 Both'
End AS REP_CAT,
Case when t1.YEAR < 2014 then t2.Region_Pre2014
Else t2.Region
End As Final_Region

FROM __temp__.dom_viol_app_and_orders_with_rep AS t1 LEFT JOIN fcsq.COURT_MV_FEB21_DFJ AS t2 
ON t1.DSP_COURT = cast(t2.Code as varchar(3));
"""
pydb.create_temp_table(create_dv_disposals_final,'dv_disposals_final')
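The derived columns in `create_dv_disposals_final` can be sketched in Python as below. The function names and the region strings are hypothetical. Any category other than `'NONE'` (that is, `'ALL'`, `'SOME'` or `'ERROR'`) counts as represented, as it does in the SQL.

```python
def rep_cat(app_rep_cat, res_rep_cat):
    """Combine applicant and respondent categories into one label."""
    if app_rep_cat is None or res_rep_cat is None:
        return "5 Unknown"
    applicant_represented = app_rep_cat != "NONE"
    respondent_represented = res_rep_cat != "NONE"
    if applicant_represented and respondent_represented:
        return "1 Both"
    if applicant_represented:
        return "2 Applicant Only"
    if respondent_represented:
        return "3 Respondent Only"
    return "4 Neither"


def final_region(year, region_pre2014, region):
    """Use the pre-2014 region for disposals before 2014."""
    return region_pre2014 if year < 2014 else region


def quarter_label(year, quarter):
    """Build the Quarter2 label, for example '2014-Q1'."""
    return f"{year}-Q{quarter}"


both = rep_cat("ALL", "SOME")
applicant_only = rep_cat("SOME", "NONE")
respondent_only = rep_cat("NONE", "ALL")
neither = rep_cat("NONE", "NONE")
unknown = rep_cat(None, "ALL")
label = quarter_label(2014, 1)
old_region = final_region(2013, "Old region", "New region")
new_region = final_region(2014, "Old region", "New region")
```

The leading digit in each label appears to exist so that the categories sort in a fixed order.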

In [None]:
df = pydb.read_sql_query("select * from __temp__.dv_disposals_final")
df

In [None]:
df = pydb.read_sql_query("select * from __temp__.dv_disposals_final where (year = 2014 and quarter = 1)")
df

In [None]:
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Domestic_Violence/dv timeliness check (2014 Q1).csv',index=False)

## 27. DV_Quarterly table - groups the disposals_final table by quarter, providing a total and mean wait_weeks per quarter
<a name="DV_Quarterly"></a>

### Drop the dv_quarterly table if it already exists and remove its data from the S3 bucket

In [None]:
drop_dv_quarterly = "DROP TABLE IF EXISTS fcsq.dv_quarterly"
pydb.start_query_execution_and_wait(drop_dv_quarterly)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_quarterly").delete();

### Create the dv_quarterly table in Athena

In [None]:
create_dv_quarterly = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_quarterly
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_quarterly') AS
SELECT DISTINCT
    'Domestic Violence' AS type,
    year,
    'Q' || cast(quarter as varchar(3)) AS quarter,
    rep_cat,
    COUNT(*) AS n,
    avg(wait_weeks) AS mean
FROM 
    __temp__.dv_disposals_final
WHERE year > 2010 
    AND quarter2 <> '2022-Q3'
GROUP BY
    year,
    quarter,
    rep_cat


UNION ALL
SELECT DISTINCT
    'Domestic Violence' AS type,
    year,
    'Q' || cast(quarter as varchar(3)) AS quarter,
    'All' AS rep_cat,
    COUNT(*) AS n,
    avg(wait_weeks) AS mean
FROM 
    __temp__.dv_disposals_final
WHERE year > 2010 
    AND quarter2 <> '2022-Q3'
GROUP BY
    year,
    quarter;
"""

pydb.start_query_execution_and_wait(create_dv_quarterly);
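The two halves of the `UNION ALL` above produce one row per representation category and one `'All'` row per quarter. A minimal Python sketch of the same grouping is below. `summarise_quarters` and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarise_quarters(rows):
    """Return {(year, quarter, rep_cat): {'n': ..., 'mean': ...}}.

    Every row is counted twice: once under its own rep_cat and once under 'All'.
    """
    groups = defaultdict(list)
    for row in rows:
        period = (row["year"], row["quarter"])
        groups[period + (row["rep_cat"],)].append(row["wait_weeks"])
        groups[period + ("All",)].append(row["wait_weeks"])
    return {
        key: {"n": len(values), "mean": mean(values)}
        for key, values in groups.items()
    }


# Hypothetical sample rows (not real data)
disposals = [
    {"year": 2020, "quarter": 1, "rep_cat": "1 Both", "wait_weeks": 2.0},
    {"year": 2020, "quarter": 1, "rep_cat": "1 Both", "wait_weeks": 6.0},
    {"year": 2020, "quarter": 1, "rep_cat": "4 Neither", "wait_weeks": 1.0},
]
quarterly_summary = summarise_quarters(disposals)
```

The `'All'` mean is computed from the underlying rows (here 9.0 / 3 = 3.0), not by averaging the category means (which would give 2.5).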

#### DV_quarterly validation

In [None]:
#df = pydb.read_sql_query("select count(*) as count from fcsq.dv_quarterly")
#df

## 28. DV_Annual table - groups the disposals_final table annually, providing a total and mean wait_weeks annually
<a name="DV_Annual"></a>

### Drop the dv_annually table if it already exists and remove its data from the S3 bucket

In [None]:
drop_dv_annually = "DROP TABLE IF EXISTS fcsq.dv_annually"
pydb.start_query_execution_and_wait(drop_dv_annually)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_annually").delete();

### Create the dv_annually table in Athena

In [None]:
create_dv_annually = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_annually
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_annually') AS
SELECT DISTINCT
    'Domestic Violence' AS type,
    year,
    'N/A' AS quarter,
    rep_cat,
    COUNT(*) AS n,
    avg(wait_weeks) AS mean
FROM 
    __temp__.dv_disposals_final
WHERE year > 2010
    AND year < {latest_year}
GROUP BY
    year,
    rep_cat

UNION ALL
SELECT DISTINCT
    'Domestic Violence' AS type,
    year,
    'N/A' AS quarter,
    'All' AS rep_cat,
    COUNT(*) AS n,
    avg(wait_weeks) AS mean
FROM 
    __temp__.dv_disposals_final
WHERE year > 2010
    AND year < {latest_year}
GROUP BY
    year;
"""

pydb.start_query_execution_and_wait(create_dv_annually);

#### DV_annually validation

In [None]:
#dv_annually_count = pydb.read_sql_query("select * from fcsq.dv_annually")
#dv_annually_count

## 29. DV_timeliness table - groups the disposals_final table regionally and nationally, providing a total, mean, and median wait_weeks
<a name="DV_timeliness"></a>

### Drop the dv_timeliness table if it already exists and remove its data from the S3 bucket

In [None]:
drop_dv_timeliness = "DROP TABLE IF EXISTS fcsq.dv_timeliness"
pydb.start_query_execution_and_wait(drop_dv_timeliness)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_timeliness").delete();

### Create the dv_timeliness table in Athena

In [None]:
create_regional_median1 = f"""
SELECT *, 
    /* Finding the middle */
    NTILE(2) OVER (PARTITION BY Final_Region, Rep_Cat, Quarter2 ORDER BY wait_weeks) as tile2            /* Distributes rows into 2 roughly equal groups, in the column called 'tile2' the first group is given the value 1 and the other is given the value 2 */
FROM __temp__.dv_disposals_final
WHERE year > 2010
"""
pydb.create_temp_table(create_regional_median1,'regional_median1')

create_dv_time_regional = f"""
SELECT  Final_Region,
        Rep_Cat,
        Quarter2,
        count(*) as Number_of_Cases,
        /* Mean */
        ROUND(avg(wait_weeks), 1) as Mean_duration, 
        /* Median */
        (CASE WHEN COUNT(tile2) % 2 = 0                                                                                                  
            THEN (MAX(CASE WHEN tile2 = 1 THEN wait_weeks END) + MIN(CASE WHEN tile2 = 2 THEN wait_weeks END)) / 2.0                     /* If the count is even the maximum value from group 1 is added to the minimum value from group 2 */
        ELSE MAX(CASE WHEN tile2 = 1 THEN wait_weeks END)                                                                            /* If the count is odd the maximum value from group 1 is used to represent the median */
        END) as median
FROM __temp__.regional_median1

GROUP BY Final_Region,
Quarter2, Rep_Cat

ORDER BY Final_Region,
Quarter2, Rep_Cat
"""
pydb.create_temp_table(create_dv_time_regional,'dv_time_regional')

#dv_time_regional = pydb.read_sql_query("select * from __temp__.dv_time_regional")
#dv_time_regional
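The `NTILE(2)` median used above can be checked against a reference implementation. The sketch below reproduces the split in Python and compares it with `statistics.median`. It assumes a non-empty group, and `ntile2_median` is a hypothetical name.

```python
from statistics import median


def ntile2_median(values):
    """Median via a two-way split, mirroring the NTILE(2) approach.

    NTILE(2) gives the extra row to tile 1 when the count is odd, so the
    largest value in tile 1 is then the middle value.
    """
    ordered = sorted(values)
    n = len(ordered)
    first_size = (n + 1) // 2
    tile1, tile2 = ordered[:first_size], ordered[first_size:]
    if n % 2 == 0:
        # Even count: average the two values either side of the split
        return (max(tile1) + min(tile2)) / 2.0
    # Odd count: the top of tile 1 is the middle value
    return max(tile1)


odd_sample = [3.0, 1.0, 2.0]
even_sample = [4.0, 1.0, 3.0, 2.0]
odd_result = ntile2_median(odd_sample)
even_result = ntile2_median(even_sample)
```

Both results agree with `median(odd_sample)` and `median(even_sample)` from the standard library.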

In [None]:
create_national_median1 = f"""
SELECT *, 
    /* Finding the middle */
    NTILE(2) OVER (PARTITION BY Rep_Cat, Quarter2 ORDER BY wait_weeks) as tile2            /* Distributes rows into 2 roughly equal groups, in the column called 'tile2' the first group is given the value 1 and the other is given the value 2 */
FROM __temp__.dv_disposals_final
WHERE year > 2010
"""
pydb.create_temp_table(create_national_median1,'national_median1')

create_dv_time_national = f"""
SELECT  Rep_Cat,
        Quarter2,
        count(*) as Number_of_Cases,
        /* Mean */
        ROUND(avg(wait_weeks), 1) as Mean_duration, 
        /* Median */
        (CASE WHEN COUNT(tile2) % 2 = 0                                                                                                  
            THEN (MAX(CASE WHEN tile2 = 1 THEN wait_weeks END) + MIN(CASE WHEN tile2 = 2 THEN wait_weeks END)) / 2.0                     /* If the count is even the maximum value from group 1 is added to the minimum value from group 2 */
        ELSE MAX(CASE WHEN tile2 = 1 THEN wait_weeks END)                                                                            /* If the count is odd the maximum value from group 1 is used to represent the median */
        END) as median
FROM __temp__.national_median1

GROUP BY Quarter2, Rep_Cat

ORDER BY Quarter2, Rep_Cat
"""
pydb.create_temp_table(create_dv_time_national,'dv_time_national')

#dv_time_national = pydb.read_sql_query("select * from __temp__.dv_time_national")
#dv_time_national

In [None]:
create_dv_timeliness = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_timeliness
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_timeliness') AS
select 'Domestic Violence' AS Case_Type,
Final_Region as Region, Rep_Cat as Representation, Quarter2 as Quarter,
Number_of_Cases, 
Mean_duration,
ROUND(Median, 1) AS Median_duration
from __temp__.dv_time_regional

UNION ALL
select 'Domestic Violence' AS Case_Type,
'England & Wales' as Region, Rep_Cat as Representation, Quarter2 as Quarter,
Number_of_Cases,
Mean_duration,
ROUND(Median, 1) AS Median_duration
from __temp__.dv_time_national
"""


pydb.start_query_execution_and_wait(create_dv_timeliness);

In [None]:
timeliness_output =f"""
SELECT *
FROM fcsq.dv_timeliness
WHERE NOT (quarter = '{latest_year}-Q{latest_quarter}')
ORDER BY region, representation, quarter;
"""

df = pydb.read_sql_query(timeliness_output)
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Domestic_Violence/dv_csv_timeliness.csv',index=False)

## 30. DV_timeliness_combined table - combines and orders the annual and quarterly timeliness data 
<a name="DV_timeliness_combined"></a>

### Drop the dv_timeliness_combined table if it already exists and remove its data from the S3 bucket

In [None]:
drop_dv_timeliness_combined = "DROP TABLE IF EXISTS fcsq.dv_timeliness_combined"
pydb.start_query_execution_and_wait(drop_dv_timeliness_combined)
bucket.objects.filter(Prefix="fcsq_processing/Domestic_Violence/dv_timeliness_combined").delete();

### Create the dv_timeliness_combined table in Athena

In [None]:
create_dv_timeliness_combined = f"""
CREATE TABLE IF NOT EXISTS fcsq.dv_timeliness_combined
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Domestic_Violence/dv_timeliness_combined') AS
SELECT * FROM fcsq.dv_annually 
UNION ALL 
SELECT * FROM fcsq.dv_quarterly ORDER BY type,year,quarter,rep_cat
"""

pydb.start_query_execution_and_wait(create_dv_timeliness_combined);

#### dv_timeliness_combined validation

In [None]:
#dv_timeliness_combined_count = pydb.read_sql_query("select count(*) as count from fcsq.dv_timeliness_combined")
#dv_timeliness_combined_count

In [None]:
#pydb.read_sql_query("select * from fcsq.dv_timeliness_combined")

In [None]:
lookup_table = f"""
SELECT 
CASE when quarter = 'N/A' THEN 
type || '|' ||cast(year as varchar(20))||'|'
ELSE
type || '|' ||cast(year as varchar(20))||'|' || quarter end as lookup,
rep_cat,
n,
mean
from fcsq.dv_timeliness_combined;
"""

pydb.create_temp_table(lookup_table, 'lookup_table')

In [None]:
#df = pydb.read_sql_query("SELECT CASE when quarter = 'N/A' THEN type || '|' ||cast(year as varchar(20))||'|' ELSE type || '|' ||cast(year as varchar(20))||'|' || quarter || '|' end as lookup, rep_cat, n, mean from fcsq.dv_timeliness_combined")
#df.head()

In [None]:
n_count_table = f"""
SELECT lookup, rep_cat, n
FROM __temp__.lookup_table 
ORDER BY lookup;
"""

pydb.create_temp_table(n_count_table, 'n_count_table')

In [None]:
#df = pydb.read_sql_query("SELECT * FROM __temp__.n_count_table order by lookup;")
#df.head()

In [None]:
n_count_table_final = f"""
SELECT
lookup,
map_n['1 Both'] as "1_Bothn",
map_n['2 Applicant Only'] as "2_Applicant_Onlyn",
map_n['3 Respondent Only'] as "3_Respondent_Onlyn",
map_n['4 Neither'] as "4_Neithern",
map_n['5 Unknown'] as "5_Unknown",
map_n['All'] as "Alln"
from (
SELECT 
lookup,
map_agg(
rep_cat,
n
) map_n
FROM __temp__.n_count_table
group by lookup) 
order by lookup;
"""

pydb.create_temp_table(n_count_table_final,'n_count_table_final')

In [None]:
#df = pydb.read_sql_query("SELECT * FROM __temp__.n_count_table_final order by lookup;")
#df.head()

In [None]:
mean_table = f"""
SELECT lookup, rep_cat, mean
FROM __temp__.lookup_table 
ORDER BY lookup;
"""

pydb.create_temp_table(mean_table, 'mean_table')

In [None]:
#df = pydb.read_sql_query("SELECT * FROM __temp__.mean_table order by lookup;")
#df.head()

In [None]:
mean_table_final = f"""
SELECT
lookup,
map_mean['1 Both'] as "1_Both_mean",
map_mean['2 Applicant Only'] as "2_Applicant_Only_mean",
map_mean['3 Respondent Only'] as "3_Respondent_Only_mean",
map_mean['4 Neither'] as "4_Neither_mean",
map_mean['5 Unknown'] as "5_Unknown_mean",
map_mean['All'] as "All_mean"
from (
SELECT 
lookup,
map_agg(
rep_cat,
mean
) map_mean
FROM __temp__.mean_table
group by lookup) 
order by lookup;
"""

pydb.create_temp_table(mean_table_final,'mean_table_final')
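The `map_agg` queries above turn one row per `rep_cat` into one column per `rep_cat`. A minimal Python sketch of that long-to-wide step is below. `pivot_by_rep_cat` and the sample rows are hypothetical. The sketch returns `None` for a category with no row, which may not match how the `[]` subscript behaves in Athena when a key is absent from the map.

```python
REP_CATS = [
    "1 Both",
    "2 Applicant Only",
    "3 Respondent Only",
    "4 Neither",
    "5 Unknown",
    "All",
]


def pivot_by_rep_cat(rows, value_field):
    """Return {lookup: {rep_cat: value}} with one entry per known category."""
    maps = {}
    for row in rows:
        # Equivalent of map_agg(rep_cat, value) grouped by lookup
        maps.setdefault(row["lookup"], {})[row["rep_cat"]] = row[value_field]
    return {
        lookup: {cat: by_cat.get(cat) for cat in REP_CATS}
        for lookup, by_cat in sorted(maps.items())
    }


# Hypothetical sample rows (not real data)
lookup_key = "Domestic Violence|2020|Q1"
long_rows = [
    {"lookup": lookup_key, "rep_cat": "1 Both", "n": 2, "mean": 4.0},
    {"lookup": lookup_key, "rep_cat": "All", "n": 3, "mean": 3.0},
]
wide_n = pivot_by_rep_cat(long_rows, "n")
wide_mean = pivot_by_rep_cat(long_rows, "mean")
```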

In [None]:
csv_output_table ="""
SELECT 
t1.lookup,
t1."1_Bothn",
t2."1_Both_mean",
t1."2_Applicant_Onlyn",
t2."2_Applicant_Only_mean",
t1."3_Respondent_Onlyn",
t2."3_Respondent_Only_mean",
t1."4_Neithern",
t2."4_Neither_mean",
t1."5_Unknown",
t2."5_Unknown_mean",
t1."Alln",
t2."All_mean"
from 
__temp__.n_count_table_final t1
INNER JOIN
__temp__.mean_table_final t2
on (t1.lookup = t2.lookup);
"""


df = pydb.read_sql_query(csv_output_table)
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Domestic_Violence/dv_t10_timeliness.csv',index=False)

In [None]:
#view_table = pydb.read_sql_query("SELECT * FROM fcsq.dv_timeliness_combined")
#view_table.pivot_table(index=['type','year','quarter'],columns=['rep_cat'],values = ['n','mean'],aggfunc=sum, fill_value=0).swaplevel(axis=1).sort_index(axis=1)