# Adopt Timeliness

## Contents
#### Setup
1. [Import packages and options](#import_packages)
2. [Define key variables](#define_key_variables)

3. [Adopt_applications_data_sorted](#Adopt_applications_data_sorted) - orders adopt_application_5 by case_number and application date
4. [Adopt_applications_temp](#Adopt_applications_temp) - takes the adopt_applications_data_sorted table and filters it so it only has the first application record per case number
5. [adopt_orders_data_sorted](#adopt_orders_data_sorted) - sorts the disposals table by case_number and receipt date, removing contact orders, placement revoke or vary orders and other type orders
6. [adopt_orders_temp](#adopt_orders_temp) - takes the adopt_orders_data_sorted table and filters it so it only has the first disposal record per case number
7. [adopt_apps_and_orders_match](#adopt_apps_and_orders_match) - calculates the wait_weeks from the disposal date and the application date
8. [Adopt_case_data_temp](#Adopt_case_data_temp) - filters by year>2010 and by first application date
9. [Applicant_Info](#Applicant_Info) - joins the roles, parties and address tables for applicants to get information on the applicants
10. [adopt_respondent_info](#adopt_respondent_info) - joins the roles, parties and address tables for respondents to get information on the respondents
11. [applicants_3](#applicants_3) - takes the adopt_applicant_info table and reformats it, decoding gender and representative role values into strings
12. [Adopt_respondents_temp](#Adopt_respondents_temp) - takes the adopt_respondent_info table and reformats it, decoding gender and representative role values into strings
13. [Adopt_app_rep](#Adopt_app_rep) - joins the case data with the applicant representation data
14. [Adopt_resp_rep](#Adopt_resp_rep) - joins the case data with the respondent representation data
15. [Adopt_Hearing_Events](#Adopt_Hearing_Events) - takes the adoption hearing events from the hearings table and joins them with the events table data
16. [Adopt_Hearings_Cases](#Adopt_Hearings_Cases) - takes the hearing events, filters by adoption event codes and adds a flag showing whether the hearing is the first in the case
17. [Hearing_Adopt_Applicants](#Hearing_Adopt_Applicants) - joins the cases from the applicant representation table to the first hearing for the case in the adopt hearings cases table
18. [Hearing_Adopt_Respondents](#Hearing_Adopt_Respondents) - joins the cases from the respondent representation table to the first hearing for the case in the adopt hearings cases table
19. [Adopt_App](#Adopt_App) - groups the hearing_adopt_applicants table and produces a count per group
20. [Adopt_Resp](#Adopt_Resp) - groups the hearing_adopt_respondents table and produces a count per group
21. [adopt_case](#adopt_case) - groups and formats the adopt_case_data_v3 table and gives a count for each group
22. [Adopt_Case_Hearings](#Adopt_Case_Hearings) - creates a count of all the cases with a hearing per quarter
23. [Adoption](#Adoption) - joins the applicant/respondent representation count tables, case count table, and case hearing count tables
24. [Applicant_representation](#Applicant_representation) - creates a table showing whether all, some, or none of the applicants for a case have representation
25. [Respondent_representation](#Respondent_representation) - creates a table showing whether all, some, or none of the respondents for a case have representation
26. [Adopt_Disposals_Final](#Adopt_Disposals_Final) - joins the adopt_apps_and_orders_match table and the representation tables, bringing timeliness and legal representation data for a case together
27. [Adopt_Quarterly](#Adopt_Quarterly) - groups the disposals_final table by quarter, providing a total and mean wait_weeks per quarter
28. [Adopt_Annual](#Adopt_Annual) - groups the disposals_final table by year, providing a total and mean wait_weeks per year
29. [Adopt_timeliness_combined](#Adopt_timeliness_combined) - combines and orders the annual and quarterly timeliness data


## 1. Import packages and set options 
<a name="import_packages"></a>

In [1]:
import pandas as pd  # a module which provides the data structures and functions to store and manipulate tables in dataframes
import pydbtools as pydb  # A module which allows SQL queries to be run on the Analytical Platform from Python, see https://github.com/moj-analytical-services/pydbtools
import boto3  # allows you to directly create, update, and delete AWS resources from Python scripts

# display options that make wide dataframes easier to read
pd.set_option("display.max_columns", 100)
pd.set_option("display.width", 900)
pd.set_option("display.max_colwidth", 200)


## 2. Define key variables to be used throughout the notebook 
<a name="define_key_variables"></a>

In [2]:
#this is the database we will be extracting from
database = "familyman_dev_v2"

#this is the snapshot date of familyman we will be extracting from
snapshot_date = "2022-05-23"
#snapshot_date = "2021-08-19"
#this is the athena database we will be storing our tables in
fcsq_database = "fcsq"

#this is the s3 bucket we will be saving data to
s3 = boto3.resource("s3")
bucket = s3.Bucket("alpha-family-data")

In [3]:
# override the values set above: use the new database and its matching snapshot
database = "familyman_dev_v3"
snapshot_date = "2022-06-24"  # snapshot for new database

In [4]:
"""
CREATE EXTERNAL TABLE IF NOT EXISTS `fcsq`.`COURT_MV_FEB21_DFJ` (
  `CODE` int,
  `COURT` string,
  `NAME` string,
  `Court_Type` string,
  `DFJ_Area` string,
  `Region` string,
  `Court_pre_2014` string,
  `Region_Pre2014` string,
  `DFJ_New` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://alpha-family-data/fcsq_processing/lookups/COURT_MV_FEB21_DFJ/'
TBLPROPERTIES ('has_encrypted_data'='false');
"""


## 3. Adopt_applications_data_sorted table - Orders adopt_application_5 by case_number and application date
<a name="Adopt_applications_data_sorted"></a>

### Drop the adopt_applications_data_sorted table if it already exists and remove its data from the S3 bucket

In [5]:
drop_adopt_applications_data_sorted = f"""
DROP TABLE IF EXISTS fcsq.adopt_applications_data_sorted;
"""
pydb.start_query_execution_and_wait(drop_adopt_applications_data_sorted)

# clean up previous adopt_applications_data_sorted files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/adopt_applications_data_sorted/").delete();

### Create the adopt_applications_data_sorted table in Athena

In [6]:
create_adopt_applications_data_sorted = f"""
CREATE TABLE IF NOT EXISTS fcsq.adopt_applications_data_sorted
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/adopt_applications_data_sorted') AS
SELECT * FROM
fcsq.adopt_application_5
ORDER BY 
CASE_NUMBER, APP_DATE
"""

pydb.start_query_execution_and_wait(create_adopt_applications_data_sorted);
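The drop, clean-up and create steps above are repeated for every persisted table in this notebook. A minimal sketch of how the three pieces could be assembled from a table name is given here. `build_rebuild_plan` and `RebuildPlan` are hypothetical names, not part of pydbtools, and the function only builds strings, so it can be checked without touching Athena or S3.

```python
from typing import NamedTuple


class RebuildPlan(NamedTuple):
    drop_sql: str
    s3_prefix: str
    create_sql: str


def build_rebuild_plan(table, select_sql, database="fcsq",
                       bucket="alpha-family-data",
                       folder="fcsq_processing/Adoption"):
    """Assemble the DROP statement, S3 prefix and CTAS statement for one table.

    Hypothetical helper: the defaults mirror the names used in this notebook.
    """
    s3_prefix = f"{folder}/{table}/"
    drop_sql = f"DROP TABLE IF EXISTS {database}.{table};"
    create_sql = (
        f"CREATE TABLE IF NOT EXISTS {database}.{table}\n"
        f"WITH (format = 'PARQUET', "
        f"external_location = 's3://{bucket}/{folder}/{table}') AS\n"
        f"{select_sql.strip()}"
    )
    return RebuildPlan(drop_sql, s3_prefix, create_sql)


plan = build_rebuild_plan(
    "adopt_applications_data_sorted",
    "SELECT * FROM fcsq.adopt_application_5 ORDER BY CASE_NUMBER, APP_DATE",
)
print(plan.drop_sql)
print(plan.s3_prefix)
print(plan.create_sql)
```

The pieces would then be used exactly as in the cells above: `plan.drop_sql` and `plan.create_sql` passed to `pydb.start_query_execution_and_wait`, and `plan.s3_prefix` passed to `bucket.objects.filter(Prefix=...).delete()`.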



#### adopt_applications_data_sorted validation

In [7]:
adopt_applications_data_sorted_count = pydb.read_sql_query("select count(*) as count from fcsq.adopt_applications_data_sorted")
adopt_applications_data_sorted_count


Unnamed: 0,count
0,182095


## 4. Create adopt_applications temporary tables - takes the adopt_applications_data_sorted table and filters it so it only has the first application record per case number
<a name="Adopt_applications_temp"></a>

In [8]:
create_adopt_applications_1 = f"""
SELECT *, row_number() over (order by CASE_NUMBER, APP_DATE) as SEQ_NUM
FROM fcsq.adopt_applications_data_sorted
"""
pydb.create_temp_table(create_adopt_applications_1,'adopt_applications_1')

create_adopt_applications_2 = f"""
SELECT DISTINCT case_number, app_type, min(seq_num) as min_of_seq_num
FROM __temp__.adopt_applications_1 GROUP BY case_number, app_type
"""
pydb.create_temp_table(create_adopt_applications_2,'adopt_applications_2')

create_adopt_applications_3 = f"""
SELECT
    t1.case_number,
    t2.App_date,
    t2.year,
    t2.quarter,
    t2.court,
    t2.app_type,
    t2.Case_app_type, 
    t2.Adoption, 
    t2.Contested, 
    t2.Standard, 
    t2.Convention, 
    t2.Foreign, 
    t2.Placement, 
    t2.Placement_revoke_or_vary, 
    t2.Contact_s26, 
    t2.Contact_s26_revoke_or_vary, 
    t2.Change_surname, 
    t2.Remove_child_from_UK, 
    t2.Other_order_type, 
    t2.Adoption_Cases, 
    t2.Non_Adoption_Cases
FROM __temp__.ADOPT_APPLICATIONS_2 t1 LEFT JOIN
__temp__.ADOPT_APPLICATIONS_1 t2 ON (t1.MIN_of_seq_num = t2.Seq_Num)
"""
pydb.create_temp_table(create_adopt_applications_3,'adopt_applications_3')
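The three temporary tables above follow a number, minimise, join-back pattern. The sketch below walks through the same three steps in plain Python on invented rows (the case numbers and dates are made up for illustration), to show which rows survive.

```python
from datetime import date

# Toy rows standing in for fcsq.adopt_applications_data_sorted (invented values).
applications = [
    {"case_number": "AA11Z00002", "app_type": "AO", "app_date": date(2011, 6, 1)},
    {"case_number": "AA11Z00001", "app_type": "AO", "app_date": date(2011, 3, 9)},
    {"case_number": "AA11Z00001", "app_type": "AO", "app_date": date(2011, 1, 5)},
    {"case_number": "AA11Z00001", "app_type": "PO", "app_date": date(2011, 2, 1)},
]

# Step 1: number the rows in (case_number, app_date) order, like row_number().
ordered = sorted(applications, key=lambda r: (r["case_number"], r["app_date"]))
numbered = [dict(row, seq_num=i) for i, row in enumerate(ordered, start=1)]

# Step 2: keep the smallest seq_num for each (case_number, app_type) group.
min_seq = {}
for row in numbered:
    key = (row["case_number"], row["app_type"])
    min_seq[key] = min(min_seq.get(key, row["seq_num"]), row["seq_num"])

# Step 3: join back on seq_num to recover the full first-application row.
by_seq = {row["seq_num"]: row for row in numbered}
first_applications = [by_seq[seq] for seq in min_seq.values()]

for row in first_applications:
    print(row["case_number"], row["app_type"], row["app_date"], row["seq_num"])
```

Because the grouping key includes `app_type` as well as `case_number`, a case with two application types keeps one row per type, as the toy case `AA11Z00001` shows.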

## 5. adopt_orders_data_sorted table - Sorts the disposals table by case_number and receipt date, removing contact orders, placement revoke or vary orders and other type orders
<a name="adopt_orders_data_sorted"></a>

### Drop the adopt_orders_data_sorted table if it already exists and remove its data from the S3 bucket

In [9]:
drop_adopt_orders_data_sorted = f"""
DROP TABLE IF EXISTS fcsq.adopt_orders_data_sorted;
"""
pydb.start_query_execution_and_wait(drop_adopt_orders_data_sorted)

# clean up previous adopt_orders_data_sorted files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/adopt_orders_data_sorted/").delete();

### Create the adopt_orders_data_sorted table in Athena

In [10]:
create_adopt_orders_data_sorted = f"""
CREATE TABLE IF NOT EXISTS fcsq.adopt_orders_data_sorted
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/adopt_orders_data_sorted') AS
SELECT * 
FROM fcsq.adopt_disposals5 t1
    WHERE t1.Type != 'Contact_s26' AND t1.Type != 'Contact_s26_revoke_or_vary' AND t1.Type != 
        'Placement_revoke_or_vary' AND t1.Type != 'Other_order_type'
    ORDER BY t1.CASE_NUMBER, t1.Receipt_date;
"""
pydb.start_query_execution_and_wait(create_adopt_orders_data_sorted);

In [11]:
pydb.read_sql_query("SELECT distinct type from fcsq.adopt_orders_data_sorted")

Unnamed: 0,type
0,Standard
1,Change_surname
2,Placement
3,Remove_child_from_UK
4,Convention
5,?Foreign?
6,


In [15]:
pydb.read_sql_query("SELECT count(*) as count from __temp__.non_main")

Unnamed: 0,count
0,64821


In [16]:
pydb.read_sql_query("SELECT count(distinct __temp__.main.case_number) as count from __temp__.main inner join __temp__.non_main on __temp__.main.case_number=__temp__.non_main.case_number")

Unnamed: 0,count
0,24


In [17]:
pydb.read_sql_query("SELECT count(*) as count from fcsq.adopt_orders_data_sorted")

Unnamed: 0,count
0,156733


In [18]:
non_adoption =f"""
SELECT * FROM fcsq.adopt_orders_data_sorted where adoption='Non-adoption'
"""

pydb.create_temp_table(non_adoption,'non_adopt_orders')

adoption =f"""
SELECT * FROM fcsq.adopt_orders_data_sorted where adoption='Adoption'
"""

pydb.create_temp_table(adoption,'adopt_orders')

test = f"""
SELECT t1.year,count(*) as count
FROM __temp__.adopt_orders t1
inner join 
__temp__.non_adopt_orders t2
ON t1.case_number = t2.case_number AND t1.receipt_date=t2.receipt_date
group by t1.year
having t1.year>2010
order by t1.year;
"""
pydb.read_sql_query(test)

#pydb.read_sql_query("SELECT * FROM __temp__.non_adopt_orders")

Unnamed: 0,year,count
0,2011,2
1,2012,3
2,2013,2
3,2016,2
4,2018,1
5,2020,3
6,2021,1


#### adopt_orders_data_sorted validation

In [19]:
adopt_orders_data_sorted_count = pydb.read_sql_query("select count(*) as count from fcsq.adopt_orders_data_sorted")
adopt_orders_data_sorted_count

Unnamed: 0,count
0,156733


In [20]:
#pydb.read_sql_query("SELECT * FROM fcsq.adopt_orders_data_sorted")
pydb.read_sql_query("SELECT * FROM fcsq.adopt_orders_data_sorted where case_number='LV20Z04103'")

Unnamed: 0,case_number,court,year,quarter,receipt_date,event_model,field_model,order_type,country_of_birth,number_applicants,adopter_type,adopter,max_rtc,max_sex,min_rtc,min_sex,child_sex,age_band,child_age,adoption,type,adopter_2
0,LV20Z04103,251,2021,3,2021-08-18,A76,,,,2,same-sex couple,same-sex couple,Adopter,1,Adopter,1,2,1-4 years,2.324435,Adoption,Standard,same-sex couple


## 6. Create adopt_orders temporary tables - takes the adopt_orders_data_sorted table and filters it so it only has the first disposal record per case number
<a name="adopt_orders_temp"></a>

In [21]:
create_adopt_orders_0 = f"""
SELECT t1.CASE_NUMBER, 
          t1.Court, 
          t1.Year, 
          t1.Quarter, 
          t2.App_type, 
          t2.App_date, 
          t1.Receipt_date AS Disp_Date, 
          t1.EVENT_MODEL, 
          t1.FIELD_MODEL, 
          t1.Order_type, 
          t1.Country_of_birth, 
          t1.Number_applicants, 
          t1.Adopter_type, 
          t1.Adopter, 
          t1.Child_sex, /*Changed this from child_sex2 as the process has changed, check to see if an error comes up*/ 
          t1.Age_band, 
          t1.Child_age, 
          t1.Adoption, 
          t1.Type, 
          t1.Adopter_2, 
          DAY(t1.Receipt_date -t2.App_date) as DATE_DIFF
      FROM fcsq.ADOPT_ORDERS_DATA_SORTED t1
           INNER JOIN __temp__.ADOPT_APPLICATIONS_3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
                 WHERE (DAY(t1.Receipt_date -t2.App_date)) >= 0;
"""
pydb.create_temp_table(create_adopt_orders_0,'adopt_orders_0')


create_adopt_orders_1 = f"""
   SELECT t1.CASE_NUMBER, 
          t1.Court, 
          t1.Year, 
          t1.Quarter, 
          t1.App_type, 
          t1.EVENT_MODEL AS Disp_Type, 
          t1.App_date, 
          t1.Disp_Date, 
          t1.FIELD_MODEL, 
          t1.Order_type, 
          t1.Number_applicants, 
          t1.Child_sex, /*Changed this from child_sex2 as the process has changed, check to see if an error comes up*/ 
          t1.Adoption, 
          t1.Type, 
          row_number() over (order by CASE_NUMBER,App_type,App_date,disp_date) as Seq_no
      FROM __temp__.ADOPT_ORDERS_0 t1
      ORDER BY t1.CASE_NUMBER,
               t1.App_type,
               t1.App_date;
"""

pydb.create_temp_table(create_adopt_orders_1,'adopt_orders_1')

create_adopt_orders_2 = f"""
SELECT DISTINCT CASE_NUMBER, 
          App_type, 
          App_date, 
        (MIN(Seq_No)) AS MIN_of_Seq_No
      FROM __temp__.ADOPT_ORDERS_1 t1
      GROUP BY CASE_NUMBER,
               App_type,
               App_date;
"""

pydb.create_temp_table(create_adopt_orders_2,'adopt_orders_2')

create_adopt_orders_3 = f"""
   SELECT DISTINCT t1.CASE_NUMBER, 
          t1.App_type, 
          t2.Court, 
          t1.App_date, 
          t2.Disp_Date, 
          t2.Disp_Type, 
          t2.Year, 
          t2.Quarter, 
          t2.Order_type, 
          t2.Adoption, 
          t2.Type
      FROM __temp__.ADOPT_ORDERS_2 t1
           LEFT JOIN __temp__.ADOPT_ORDERS_1 t2 ON (t1.MIN_of_Seq_No = t2.Seq_No);
"""
pydb.create_temp_table(create_adopt_orders_3,'adopt_orders_3')


In [22]:
pydb.read_sql_query("SELECT * FROM __temp__.adopt_orders_1 where case_number='MB20Z01235'")

Unnamed: 0,case_number,court,year,quarter,app_type,disp_type,app_date,disp_date,field_model,order_type,number_applicants,child_sex,adoption,type,seq_no
0,MB20Z01235,270,2020,4,AO,A76,2020-07-25,2020-10-06,,,2,1,Adoption,Standard,81517
1,MB20Z01235,270,2020,4,AO,A73,2020-07-25,2020-10-06,A73_1,OFC,2,1,Non-adoption,,81518


In [24]:
pydb.read_sql_query("SELECT * FROM __temp__.adopt_orders_1 where case_number='LV20Z04103'")

Unnamed: 0,case_number,court,year,quarter,app_type,disp_type,app_date,disp_date,field_model,order_type,number_applicants,child_sex,adoption,type,seq_no
0,LV20Z04103,251,2021,3,AO,A76,2020-11-25,2021-08-18,,,2,2,Adoption,Standard,69121


In [25]:
test = f"""
   SELECT DISTINCT t1.CASE_NUMBER,
          t1.MIN_of_Seq_No,
          t1.App_type, 
          t2.Court, 
          t1.App_date, 
          t2.Disp_Date, 
          t2.Disp_Type, 
          t2.Year, 
          t2.Quarter, 
          t2.Order_type, 
          t2.Adoption, 
          t2.Type
      FROM __temp__.ADOPT_ORDERS_2 t1
           LEFT JOIN __temp__.ADOPT_ORDERS_1 t2 ON (t1.MIN_of_Seq_No = t2.Seq_No)
    where t1.case_number='NN07Z03357';
"""
pydb.read_sql_query(test)

Unnamed: 0,case_number,min_of_seq_no,app_type,court,app_date,disp_date,disp_type,year,quarter,order_type,adoption,type
0,NN07Z03357,97119,AO,282,2007-05-30,2007-12-21,A76,2007,4,,Adoption,Standard


## 7. adopt_apps_and_orders_match table - calculates the wait_weeks from the disposal date and the application date
<a name="adopt_apps_and_orders_match"></a>

### Drop the adopt_apps_and_orders_match table if it already exists and remove its data from the S3 bucket

In [26]:
drop_adopt_apps_and_orders_match = f"""
DROP TABLE IF EXISTS fcsq.adopt_apps_and_orders_match;
"""
pydb.start_query_execution_and_wait(drop_adopt_apps_and_orders_match)

# clean up previous adopt_apps_and_orders_match files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/ADOPT_APPS_AND_ORDERS_MATCH/").delete();

### Create the adopt_apps_and_orders_match table in Athena

In [27]:
create_adopt_apps_and_orders_match =f"""
CREATE TABLE IF NOT EXISTS fcsq.ADOPT_APPS_AND_ORDERS_MATCH
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/ADOPT_APPS_AND_ORDERS_MATCH') AS
   SELECT t2.CASE_NUMBER, 
          t2.App_type, 
          t2.Court, 
          t2.App_date, 
          t2.Disp_Date, 
          /* Wait_weeks */
            (CAST (DAY(t2.Disp_Date-t2.App_date) as double)/7) AS Wait_weeks, 
          t2.Disp_Type, 
          t2.Year, 
          t2.Quarter, 
          t2.Order_type, 
          t2.Adoption, 
          t2.Type, 
          /* DSP_COURT */
          t2.Court AS DSP_COURT
      FROM __temp__.ADOPT_ORDERS_3 t2;


"""
pydb.start_query_execution_and_wait(create_adopt_apps_and_orders_match);

#### ADOPT_APPS_AND_ORDERS_MATCH validation

In [28]:
adopt_apps_and_orders_match_count = pydb.read_sql_query("select count(*) as count from fcsq.ADOPT_APPS_AND_ORDERS_MATCH")
adopt_apps_and_orders_match_count

Unnamed: 0,count
0,153991


In [29]:
pydb.read_sql_query("select * from fcsq.adopt_apps_and_orders_match where case_number='LV20Z04103'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court
0,LV20Z04103,AO,251,2020-11-25,2021-08-18,38.0,A76,2021,3,,Adoption,Standard,251
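
The `Wait_weeks` value for the LV20Z04103 row above can be checked by hand: it is the number of days between the application and disposal dates divided by 7. A small sketch of the same arithmetic in Python (`wait_weeks` is an illustrative function name, not something defined elsewhere in this notebook):

```python
from datetime import date


def wait_weeks(app_date, disp_date):
    """Days between application and disposal, expressed in weeks."""
    return (disp_date - app_date).days / 7


# Dates taken from the LV20Z04103 row shown above.
print(wait_weeks(date(2020, 11, 25), date(2021, 8, 18)))  # 38.0
```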


## 8. Adopt_case_data temporary tables - filters by year>2010 and by first application date
<a name="Adopt_case_data_temp"></a>

In [30]:
create_adopt_case_data_v1 = f"""
SELECT T1.YEAR,
            T1.QUARTER,
            T1.COURT,
            T1.CASE_NUMBER,
            T1.APP_TYPE,
            T1.CASE_APP_TYPE,
            T1.ADOPTION,
            T1.HIGH_COURT,
            T1.CONTESTED,
            T1.NUMBER_APPLICANTS,
            T1.ADOPTER_TYPE,
            date_format(T1.APP_DATE,'%d-%m-%Y') AS APP_DATE2
    FROM fcsq.adopt_apps_6_adoptions_only AS t1
    ORDER BY case_number, app_date2, court;
    
"""

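create_adopt_case_data_v2 = f"""
/* APP_DATE2 is dd-mm-YYYY text, so parse it back to a date before ordering */
SELECT *,(case when row_number() over (partition by case_number order by 
        date_parse(APP_DATE2, '%d-%m-%Y')) = 1 then 1 else 0 end) as case_number_id
FROM __temp__.adopt_case_data_v1
"""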

create_adopt_case_data_v3 = f"""
SELECT *
FROM __temp__.adopt_case_data_v2
where case_number_id = 1 and year > 2010;
"""
pydb.create_temp_table(create_adopt_case_data_v1,'adopt_case_data_v1')

pydb.create_temp_table(create_adopt_case_data_v2,'adopt_case_data_v2')
pydb.create_temp_table(create_adopt_case_data_v3,'adopt_case_data_v3')
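
`APP_DATE2` is produced by `date_format(..., '%d-%m-%Y')`, so it is text rather than a date. The sketch below, using two invented application dates, illustrates why picking the first application per case needs the value treated as a date: sorting the text compares the day digits first.

```python
from datetime import datetime

# Two invented application dates in the dd-mm-YYYY text format of APP_DATE2.
app_dates = ["09-03-2020", "01-04-2021"]

# Sorting the raw strings compares the day digits first.
as_text = sorted(app_dates)

# Parsing first gives chronological order.
as_dates = sorted(app_dates, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))

print(as_text)   # ['01-04-2021', '09-03-2020']
print(as_dates)  # ['09-03-2020', '01-04-2021']
```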





In [31]:
adopt_case_data_v3_check = "SELECT * from __temp__.adopt_case_data_v3 where case_number='WX20Z00011'"
pydb.read_sql_query(adopt_case_data_v3_check)

Unnamed: 0,year,quarter,court,case_number,app_type,case_app_type,adoption,high_court,contested,number_applicants,adopter_type,app_date2,case_number_id
0,2020,1,384,WX20Z00011,AO,AO,Adoption,N,N,1,single,09-03-2020,1


## 9. Applicant_Info table - joins the roles, parties and address tables for applicants to get information on the applicants
<a name="Applicant_Info"></a>

### Drop the Applicant_Info table if it already exists and remove its data from the S3 bucket

In [32]:
drop_Adopt_Applicant_Info = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Applicant_Info;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Applicant_Info)

# clean up previous Adopt_Applicant_Info files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Applicant_Info/").delete();

### Create the Applicant_Info table in Athena

In [33]:
create_Adopt_Applicant_Info = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Applicant_Info
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Applicant_Info') AS
 SELECT DISTINCT
   {database}.roles.ROLE, 
   {database}.roles.REPRESENTATIVE_ROLE, 
   {database}.roles.ROLE_MODEL, 
   {database}.roles.PARTY, 
   {database}.roles.CASE_NUMBER, 
   {database}.parties.PERSON_GIVEN_FIRST_NAME, 
   {database}.parties.PERSON_FAMILY_NAME, 
   {database}.parties.COMPANY, 
   {database}.addresses.POSTCODE, 
   {database}.parties.GENDER, 
   {database}.roles.DELETE_FLAG
FROM 
  ({database}.roles INNER JOIN {database}.parties ON {database}.roles.PARTY = {database}.parties.PARTY) 
  INNER JOIN {database}.addresses ON {database}.roles.ADDRESS = {database}.addresses.ADDRESS
WHERE 
    {database}.roles.ROLE_MODEL IN ('APLZ', 'APLA')
    AND {database}.roles.DELETE_FLAG = 'N'
    AND {database}.roles.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.parties.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.addresses.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_Adopt_Applicant_Info);



In [34]:
pydb.read_sql_query("SELECT * from fcsq.adopt_applicant_info where case_number='NE19Z02909'")

Unnamed: 0,role,representative_role,role_model,party,case_number,person_given_first_name,person_family_name,company,postcode,gender,delete_flag
0,10712248,,APLZ,8748181,NE19Z02909,Christopher Martyn,Urwin,,,1,N


#### Applicant_Info validation

In [35]:
Adopt_Applicant_Info_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_Applicant_Info")
Adopt_Applicant_Info_count

Unnamed: 0,count
0,252806


In [36]:
pydb.read_sql_query("select * from fcsq.adopt_applicant_info where case_number='HB12Z00228'")

Unnamed: 0,role,representative_role,role_model,party,case_number,person_given_first_name,person_family_name,company,postcode,gender,delete_flag
0,3232792,3437788,APLZ,2735994,HB12Z00228,Beverley Paula,Pierre,,CR2 9BA,2,N
1,3232791,3437788,APLZ,2736019,HB12Z00228,Donald Garfield,Finnikin And B Pierre,,CR2 9BA,1,N


## 10. adopt_respondent_info table - joins the roles, parties and address tables for respondents to get information on the respondents
<a name="adopt_respondent_info"></a>

### Drop the adopt_respondent_info table if it already exists and remove its data from the S3 bucket

In [37]:
drop_adopt_respondent_info = f"""
DROP TABLE IF EXISTS fcsq.adopt_respondent_info;
"""
pydb.start_query_execution_and_wait(drop_adopt_respondent_info)

# clean up previous adopt_respondent_info files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/adopt_respondent_info/").delete();

### Create the adopt_respondent_info table in Athena

In [38]:
create_adopt_respondent_info = f"""
CREATE TABLE IF NOT EXISTS fcsq.adopt_respondent_info
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/adopt_respondent_info') AS
SELECT DISTINCT
  {database}.roles.ROLE, 
  {database}.roles.REPRESENTATIVE_ROLE, 
  {database}.roles.ROLE_MODEL, 
  {database}.roles.PARTY, 
  {database}.roles.CASE_NUMBER, 
  {database}.parties.GENDER, 
  {database}.addresses.POSTCODE, 
  {database}.roles.DELETE_FLAG
FROM 
  ({database}.roles INNER JOIN {database}.parties ON {database}.roles.PARTY = {database}.parties.PARTY) 
    LEFT JOIN {database}.addresses ON {database}.roles.ADDRESS = {database}.addresses.ADDRESS
WHERE 
    {database}.roles.ROLE_MODEL IN ('RSPA', 'RSPZ')
    AND {database}.roles.DELETE_FLAG = 'N'
    AND {database}.roles.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.parties.mojap_snapshot_date = date '{snapshot_date}'
    AND {database}.addresses.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_adopt_respondent_info);



#### adopt_respondent_info validation

In [39]:
adopt_respondent_info_count = pydb.read_sql_query("select count(*) as count from fcsq.adopt_respondent_info")
adopt_respondent_info_count

Unnamed: 0,count
0,298261


In [40]:
pydb.read_sql_query("select * from fcsq.adopt_respondent_info where case_number='HB12Z00228'")

Unnamed: 0,role,representative_role,role_model,party,case_number,gender,postcode,delete_flag
0,3232774,3606633.0,RSPZ,2736056,HB12Z00228,1,CR7 8JF,N
1,3232794,,RSPZ,2736020,HB12Z00228,2,CR7 8JF,N


## 11. applicants_3 table - takes the adopt_applicant_info table and reformats it, decoding gender and representative role values into strings
<a name="applicants_3"></a>

In [41]:
create_adopt_applicants_1 = f"""
SELECT T1.role,
    T1.representative_role,
    T1.role_model,
    T1.party,
    T1.case_number,
    T1.gender,
    case when cast(gender as varchar(1)) = '1' then 'Male'
    when cast(gender as varchar(1)) = '2' then 'Female'
    else 'Unknown' end as Gender_Decode

    FROM fcsq.adopt_applicant_info AS t1
    ORDER BY t1.Case_Number;
"""
#pydb.start_query_execution_and_wait(create_adopt_applicants_1)
pydb.create_temp_table(create_adopt_applicants_1,'adopt_applicants_1')



create_adopt_applicants_2 = f"""
SELECT DISTINCT 
    T1.case_number,
    T1.party,
    max(T1.representative_role) as Rep_Role,
    max(T1.gender_decode) as Gender_Max
    from __temp__.adopt_applicants_1 AS t1
    group by Case_number, party;
"""

pydb.create_temp_table(create_adopt_applicants_2,'adopt_applicants_2')
#pydb.start_query_execution_and_wait(create_adopt_applicants_2)


create_adopt_applicants_3= f"""
SELECT t1.case_number,
    t1.party as App_Party_ID,
    t1.Rep_Role,
    t1.Gender_Max,
    case when t1.Rep_Role IS NULL then 'N'
    when t1.Rep_Role IS NOT NULL then 'Y'
    End as REPRESENTATION,
    case when Rep_Role IS NULL AND Gender_Max = 'Female' then 'Unrep_Female'
    when Rep_Role IS NULL AND Gender_Max = 'Male' then 'Unrep_Male'
    when Rep_Role IS NULL AND Gender_Max = 'Unknown' then 'Unrep_Unknown'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Female' then 'Rep_Female'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Male' then 'Rep_Male'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Unknown' then 'Rep_Unknown'
    else '' end as App_Rep_Cat
    
    from __temp__.adopt_applicants_2 AS t1;


"""
#pydb.start_query_execution_and_wait(create_adopt_applicants_3)
pydb.create_temp_table(create_adopt_applicants_3,'adopt_applicants_3')
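
The two `CASE` expressions above reduce to a small decision rule: decode the gender code, then prefix it with `Rep` or `Unrep` depending on whether a representative role is present. A sketch of that rule in Python follows; the function names are illustrative only, and the two example calls use the gender and representative role values shown for NE19Z02909 and HB12Z00228 earlier in the notebook.

```python
GENDER_LABELS = {"1": "Male", "2": "Female"}
KNOWN_LABELS = ("Female", "Male", "Unknown")


def decode_gender(code):
    """Map gender code 1/2 to Male/Female; anything else becomes Unknown."""
    return GENDER_LABELS.get(str(code), "Unknown")


def representation_category(rep_role, gender_label):
    """Combine representation status and gender label, e.g. 'Unrep_Male'."""
    if gender_label not in KNOWN_LABELS:
        return ""  # mirrors the ELSE '' branch of the SQL
    prefix = "Unrep" if rep_role is None else "Rep"
    return f"{prefix}_{gender_label}"


print(representation_category(None, decode_gender(1)))     # Unrep_Male
print(representation_category(3437788, decode_gender(2)))  # Rep_Female
```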


## 12. Adopt_respondents temporary tables - takes the adopt_respondent_info table and reformats it, decoding gender and representative role values into strings
<a name="Adopt_respondents_temp"></a>

In [42]:
create_adopt_respondents_1 = f"""
SELECT T1.role,
    T1.representative_role,
    T1.role_model,
    T1.party,
    T1.case_number,
    T1.gender,
    case when cast(gender as varchar(1)) = '1' then 'Male'
    when cast(gender as varchar(1)) = '2' then 'Female'
    else 'Unknown' end as Gender_Decode

    FROM fcsq.adopt_respondent_info AS t1
    ORDER BY t1.Case_Number;
"""
#pydb.start_query_execution_and_wait(create_adopt_respondents_1)
pydb.create_temp_table(create_adopt_respondents_1,'adopt_respondents_1')



create_adopt_respondents_2 = f"""
    SELECT DISTINCT T1.case_number,
        T1.party,
        max(T1.representative_role) as Rep_Role,
        max(T1.gender_decode) as Gender_Max
    from __temp__.adopt_respondents_1 AS t1
    group by Case_number, party;
"""

pydb.create_temp_table(create_adopt_respondents_2,'adopt_respondents_2')
#pydb.start_query_execution_and_wait(create_adopt_respondents_2)


create_adopt_respondents_3= f"""
SELECT t1.case_number,
    t1.party as Resp_Party_ID,
    t1.Rep_Role,
    t1.Gender_Max,
    case when t1.Rep_Role IS NULL then 'N'
    when t1.Rep_Role IS NOT NULL then 'Y'
    End as REPRESENTATION,
    case when Rep_Role IS NULL AND Gender_Max = 'Female' then 'Unrep_Female'
    when Rep_Role IS NULL AND Gender_Max = 'Male' then 'Unrep_Male'
    when Rep_Role IS NULL AND Gender_Max = 'Unknown' then 'Unrep_Unknown'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Female' then 'Rep_Female'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Male' then 'Rep_Male'
    when Rep_Role IS NOT NULL AND Gender_Max = 'Unknown' then 'Rep_Unknown'
    else '' end as Resp_Rep_Cat
    
    from __temp__.adopt_respondents_2 AS t1;


"""
#pydb.start_query_execution_and_wait(create_adopt_respondents_3)
pydb.create_temp_table(create_adopt_respondents_3,'adopt_respondents_3')

## 13. Create adopt_app_rep table - joins the case data with the applicant representation data 
<a name="Adopt_app_rep"></a>

In [43]:
adopt_app_rep_final = f"""
SELECT t1.YEAR, 
    t1.QUARTER,
    t1.CASE_NUMBER, 
    t1.Court,
    t2.App_Party_ID,
    t2.Representation,
    t2.Gender_Max as App_Gender,
    t2.App_Rep_Cat          
FROM __temp__.ADOPT_CASE_DATA_v3 t1
    LEFT JOIN __temp__.ADOPT_APPLICANTS_3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER);

"""

pydb.create_temp_table(adopt_app_rep_final,'adopt_app_rep_final')


In [44]:
pydb.read_sql_query("SELECT * FROM __temp__.adopt_app_rep_final where case_number='NE19Z02909'")
#pydb.read_sql_query("SELECT * FROM __temp__.adopt_case_data_v3 where case_number='HB12Z00228'")

Unnamed: 0,year,quarter,case_number,court,app_party_id,representation,app_gender,app_rep_cat
0,2019,4,NE19Z02909,278,8748181,N,Male,Unrep_Male


In [45]:
adopt_app_rep_final_check = "SELECT COUNT(*) as Count from __temp__.adopt_app_rep_final"
pydb.read_sql_query(adopt_app_rep_final_check)

Unnamed: 0,count
0,107706


## 14. Create adopt_resp_rep table - joins the case data with the respondent representation data
<a name="Adopt_resp_rep"></a>

In [46]:
adopt_resp_rep_final = f"""
   SELECT t1.YEAR, 
        t1.QUARTER,
        t1.CASE_NUMBER, 
        t1.Court,
          t2.Resp_Party_ID,
          t2.Representation,
          t2.Gender_Max as Resp_Gender,
          t2.Resp_Rep_Cat
          
      FROM __temp__.ADOPT_CASE_DATA_v3 t1
           LEFT JOIN __temp__.ADOPT_RESPONDENTS_3 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER);
"""

pydb.create_temp_table(adopt_resp_rep_final,'adopt_resp_rep_final')

In [47]:
adopt_resp_rep_final_check = "SELECT COUNT(*) as Count from __temp__.adopt_resp_rep_final"
pydb.read_sql_query(adopt_resp_rep_final_check)

Unnamed: 0,count
0,107059


In [48]:
#pydb.read_sql_query("SELECT * FROM __temp__.adopt_resp_rep_final where case_number='HB12Z00228'")
pydb.read_sql_query("SELECT * FROM __temp__.adopt_case_data_v3 where case_number='HB12Z00228'")

Unnamed: 0,year,quarter,court,case_number,app_type,case_app_type,adoption,high_court,contested,number_applicants,adopter_type,app_date2,case_number_id
0,2012,4,150,HB12Z00228,AO,AO,Adoption,N,N,2,mixed-sex couple,06-12-2012,1


## 15. Adopt_Hearing_Events table - takes the adoption hearing events from the hearings table and joins them with the events table data
<a name="Adopt_Hearing_Events"></a>

### Drop the Adopt_Hearing_Events table if it already exists and remove its data from the S3 bucket

In [49]:
drop_Adopt_Hearing_Events = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Hearing_Events;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Hearing_Events)

# clean up previous Adopt_Hearing_Events files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Hearing_Events/").delete();
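
The drop-and-clean-up step above is repeated for every permanent table in this notebook with only the table name changing. As a minimal sketch, the strings involved could be built in one place. `ctas_parts` is a hypothetical helper (it is not part of pydbtools or boto3), and its default schema, bucket and prefix are simply copied from the cells in this notebook.

```python
def ctas_parts(table, schema="fcsq", bucket_name="alpha-family-data",
               base_prefix="fcsq_processing/Adoption"):
    """Build the DROP statement, S3 prefix and external_location for one table.

    Hypothetical helper: it only assembles strings and does not touch Athena or S3.
    """
    return {
        # statement that would be passed to pydb.start_query_execution_and_wait
        "drop_sql": f"DROP TABLE IF EXISTS {schema}.{table};",
        # prefix that would be passed to bucket.objects.filter(Prefix=...)
        "s3_prefix": f"{base_prefix}/{table}/",
        # value that would be used for external_location in the CREATE TABLE ... WITH (...)
        "external_location": f"s3://{bucket_name}/{base_prefix}/{table}",
    }


parts = ctas_parts("Adopt_Hearing_Events")
print(parts["drop_sql"])
print(parts["s3_prefix"])
print(parts["external_location"])
```

Keeping the three strings together makes it harder for the S3 prefix that is deleted to drift away from the location the table is written to.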

### Create the Adopt_Hearing_Events table in Athena

In [50]:
create_Adopt_Hearing_Events = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Hearing_Events
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Hearing_Events') AS
SELECT {database}.hearings.EVENT,
  {database}.hearings.VACATED_FLAG,
  {database}.hearings.HEARING_TYPE,
  {database}.hearings.HEARING_DATE,
  {database}.events.RECEIPT_DATE,
  {database}.events.ERROR,
  {database}.events.CASE_NUMBER,
  {database}.events.EVENT_MODEL
FROM {database}.hearings
INNER JOIN {database}.events
ON {database}.hearings.EVENT            = {database}.events.EVENT
WHERE {database}.hearings.VACATED_FLAG IS NULL
AND {database}.events.ERROR             = 'N'
AND HEARING_DATE > date_parse('31-12-2009 00:00:00', '%d-%m-%Y %H:%i:%s')
AND (substring(case_number,5,1)='A' OR substring(case_number,5,1)='Z')
AND {database}.hearings.mojap_snapshot_date = date '{snapshot_date}' and {database}.events.mojap_snapshot_date = date '{snapshot_date}';
"""

pydb.start_query_execution_and_wait(create_Adopt_Hearing_Events);
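
The query above keeps a hearing only when the fifth character of the case number is `A` or `Z`. The sketch below restates that one filter in plain Python so it can be checked against individual case numbers. `is_adoption_case` is a hypothetical name, and the third case number in the example is invented for illustration.

```python
def is_adoption_case(case_number):
    """Mirror of: substring(case_number, 5, 1) = 'A' OR substring(case_number, 5, 1) = 'Z'."""
    # SQL substring positions start at 1, so position 5 is index 4 in Python.
    return len(case_number) >= 5 and case_number[4] in ("A", "Z")


# The first two case numbers appear in outputs elsewhere in this notebook;
# the third is a made-up value with a different fifth character.
for case_number in ("HB12Z00228", "MA04A00006", "XX12P00001"):
    print(case_number, is_adoption_case(case_number))
```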



#### Adopt_Hearing_Events validation

In [51]:
Adopt_Hearing_Events_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_Hearing_Events")
Adopt_Hearing_Events_count

Unnamed: 0,count
0,535689


## 16. Adopt_Hearings_Cases table - takes the hearing events, filters by adoption event codes and adds a flag indicating whether the hearing is the first in the case
<a name="Adopt_Hearings_Cases"></a>

### Drop the Adopt_Hearings_Cases table if it already exists and remove its data from the S3 bucket

In [52]:
drop_Adopt_Hearings_Cases = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Hearings_Cases;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Hearings_Cases)

# clean up previous Adopt_Hearings_Cases files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Hearings_Cases/").delete();

### Create the Adopt_Hearings_Cases table in Athena

In [53]:
# Equivalent to Hearings_Adopt_V3

create_Adopt_Hearings_Cases = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Hearings_Cases
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Hearings_Cases') AS
select t1.case_number,
    t1.error,
    t1.event,
    t1.event_model,
    t1.hearing_date,
    t1.hearing_type,
    t1.receipt_date,
    t1.vacated_flag,
    substring(Case_Number,5,1) AS Case_Type
    from fcsq.Adopt_Hearing_Events AS t1
    where t1.event_model in ('A8', 'A90', 'A91', 'G60')
    order by t1.case_number, t1.receipt_date;
"""

pydb.start_query_execution_and_wait(create_Adopt_Hearings_Cases)

create_Adopt_Hearings_Cases_v2 = f"""
SELECT *,
(case when row_number() over (partition by Case_Number order by receipt_date) = 1 then 1 else 0 end) as Case_Number_ID
FROM fcsq.Adopt_Hearings_Cases
"""

pydb.create_temp_table(create_Adopt_Hearings_Cases_v2,'adopt_hearings_cases_v2')
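
The `row_number() over (partition by Case_Number order by receipt_date)` expression above marks the earliest hearing in each case with 1 and every later hearing with 0. The following is a minimal plain-Python sketch of the same idea, using invented rows. `flag_first_hearing` is a hypothetical name. When two hearings share a receipt date the SQL picks one arbitrarily, whereas this sketch keeps their input order.

```python
from itertools import groupby
from operator import itemgetter


def flag_first_hearing(rows):
    """Add case_number_id = 1 to the earliest hearing per case and 0 to the rest."""
    # Equivalent of: partition by case_number order by receipt_date
    ordered = sorted(rows, key=itemgetter("case_number", "receipt_date"))
    flagged = []
    for _, hearings in groupby(ordered, key=itemgetter("case_number")):
        for position, row in enumerate(hearings, start=1):
            flagged.append({**row, "case_number_id": 1 if position == 1 else 0})
    return flagged


# Toy data (not taken from the real tables); ISO dates sort correctly as strings.
rows = [
    {"case_number": "C2", "receipt_date": "2020-03-01"},
    {"case_number": "C1", "receipt_date": "2020-02-01"},
    {"case_number": "C1", "receipt_date": "2020-01-15"},
]
flagged = flag_first_hearing(rows)
for row in flagged:
    print(row)
```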



#### Adopt_Hearings_Cases validation

In [54]:
Adopt_Hearings_Cases_count = pydb.read_sql_query("select count(*) as count from __temp__.Adopt_Hearings_Cases_v2")
Adopt_Hearings_Cases_count

Unnamed: 0,count
0,340760


## 17. Hearing_Adopt_Applicants table - joins the cases from the applicant representation table to the first hearing for each case in the Adopt_Hearings_Cases table
<a name="Hearing_Adopt_Applicants"></a>

### Drop the Hearing_Adopt_Applicants table if it already exists and remove its data from the S3 bucket

In [55]:
drop_Hearing_Adopt_Applicants = f"""
DROP TABLE IF EXISTS fcsq.Hearing_Adopt_Applicants;
"""
pydb.start_query_execution_and_wait(drop_Hearing_Adopt_Applicants)

# clean up previous Hearing_Adopt_Applicants files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Hearing_Adopt_Applicants/").delete();

### Create the Hearing_Adopt_Applicants table in Athena

In [56]:
create_Hearing_Adopt_Applicants = f"""
CREATE TABLE IF NOT EXISTS fcsq.Hearing_Adopt_Applicants
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Hearing_Adopt_Applicants') AS
SELECT t1.*,
t2.Case_Number_ID AS Hearing_Count
FROM __temp__.ADOPT_APP_REP_FINAL t1
LEFT JOIN __temp__.Adopt_Hearings_Cases_v2 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
where t2.Case_Number_ID > 0;
"""

pydb.start_query_execution_and_wait(create_Hearing_Adopt_Applicants);
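
Although the query above uses a `LEFT JOIN`, the `where t2.Case_Number_ID > 0` filter discards every row where the hearing side is NULL, so only cases with a first hearing survive, and each applicant row appears once. The sketch below reproduces that behaviour on invented data. SQLite stands in for Athena here purely for illustration, and the table names `app_rep` and `hearings` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_rep (case_number TEXT, representation TEXT);
CREATE TABLE hearings (case_number TEXT, case_number_id INTEGER);
-- C1 has two hearings (the first is flagged 1); C2 has no hearing at all
INSERT INTO app_rep VALUES ('C1', 'Y'), ('C2', 'N');
INSERT INTO hearings VALUES ('C1', 1), ('C1', 0);
""")

rows = conn.execute("""
SELECT t1.case_number, t1.representation, t2.case_number_id AS hearing_count
FROM app_rep t1
LEFT JOIN hearings t2 ON t1.case_number = t2.case_number
WHERE t2.case_number_id > 0
ORDER BY t1.case_number
""").fetchall()
conn.close()

# C2 is dropped (NULL > 0 is not true) and C1 appears once, not twice.
print(rows)
```

This is why the row count of `Hearing_Adopt_Applicants` can be lower than that of `adopt_app_rep_final`: cases without any qualifying hearing are removed.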



#### Hearing_Adopt_Applicants validation

In [57]:
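# NOTE: this query counts __temp__.adopt_app_rep_final (the input to the join), not fcsq.Hearing_Adopt_Applicants
Hearing_Adopt_Applicants_count = pydb.read_sql_query("select count(*) as count from __temp__.adopt_app_rep_final")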
Hearing_Adopt_Applicants_count

Unnamed: 0,count
0,107706


## 18. Hearing_Adopt_Respondents table - joins the cases from the respondent representation table to the first hearing for each case in the Adopt_Hearings_Cases table
<a name="Hearing_Adopt_Respondents"></a>

### Drop the Hearing_Adopt_Respondents table if it already exists and remove its data from the S3 bucket

In [58]:
drop_Hearing_Adopt_Respondents = f"""
DROP TABLE IF EXISTS fcsq.Hearing_Adopt_Respondents;
"""
pydb.start_query_execution_and_wait(drop_Hearing_Adopt_Respondents)

# clean up previous Hearing_Adopt_Respondents files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Hearing_Adopt_Respondents/").delete();

### Create the Hearing_Adopt_Respondents table in Athena

In [59]:
create_Hearing_Adopt_Respondents = f"""
CREATE TABLE IF NOT EXISTS fcsq.Hearing_Adopt_Respondents
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Hearing_Adopt_Respondents') AS
    SELECT t1.*,
    t2.Case_Number_ID AS Hearing_Count
    FROM __temp__.ADOPT_RESP_REP_FINAL t1
    LEFT JOIN __temp__.Adopt_Hearings_Cases_v2 t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
    where t2.Case_Number_ID > 0;
"""

pydb.start_query_execution_and_wait(create_Hearing_Adopt_Respondents);



#### Hearing_Adopt_Respondents validation

In [60]:
Hearing_Adopt_Respondents_count = pydb.read_sql_query("select count(*) as count from fcsq.Hearing_Adopt_Respondents")
Hearing_Adopt_Respondents_count

Unnamed: 0,count
0,100448


## 19. Adopt_App table - Groups the hearing_adopt_applicants table and produces a count per group
<a name="Adopt_App"></a>

### Drop the Adopt_App table if it already exists and remove its data from the S3 bucket

In [61]:
drop_Adopt_App = f"""
DROP TABLE IF EXISTS fcsq.Adopt_App;
"""
pydb.start_query_execution_and_wait(drop_Adopt_App)

# clean up previous Adopt_App files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_App/").delete();

### Create the Adopt_App table in Athena

In [62]:
create_Adopt_App = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_App
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_App') AS
SELECT
  'Adoption' AS CASE_TYPE,
  Year,
  Quarter,
  'Party' AS Category,
  'Applicant' AS PARTY,
   App_Gender AS Gender,
  Representation,
  Count (*) AS Count
FROM
  fcsq.HEARING_ADOPT_APPLICANTS
WHERE 
  Representation <> '' /* A very small number of cases from 2011/12 have a blank representation value (gender is also blank); look into whether these should be recoded as N */
GROUP BY
  'Adoption',
  Year,
  Quarter,
  'Party',
  'Applicant',
  App_Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_Adopt_App);
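
The query above is a filtered group-and-count: rows with a blank representation value are dropped, then the remaining rows are counted per year, quarter, gender and representation. A rough plain-Python equivalent on invented rows is sketched below. `count_by_group` is a hypothetical name.

```python
from collections import Counter


def count_by_group(rows):
    """Count rows per (year, quarter, gender, representation), skipping blank representation."""
    return Counter(
        (r["year"], r["quarter"], r["app_gender"], r["representation"])
        for r in rows
        # In SQL, Representation <> '' also filters out NULLs;
        # the truthiness test here skips both '' and None.
        if r["representation"]
    )


# Toy rows, not taken from the real tables.
rows = [
    {"year": 2019, "quarter": 4, "app_gender": "Male", "representation": "N"},
    {"year": 2019, "quarter": 4, "app_gender": "Male", "representation": "N"},
    {"year": 2019, "quarter": 4, "app_gender": "Female", "representation": "Y"},
    {"year": 2011, "quarter": 1, "app_gender": "", "representation": ""},
]
counts = count_by_group(rows)
for group, n in sorted(counts.items()):
    print(group, n)
```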



#### Adopt_App validation

In [63]:
Adopt_App_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_App")
Adopt_App_count

Unnamed: 0,count
0,236


## 20. Adopt_resp table - Groups the hearing_adopt_respondents table and produces a count per group
<a name="Adopt_resp"></a>

### Drop the Adopt_resp table if it already exists and remove its data from the S3 bucket

In [64]:
drop_Adopt_resp = f"""
DROP TABLE IF EXISTS fcsq.Adopt_resp;
"""
pydb.start_query_execution_and_wait(drop_Adopt_resp)

# clean up previous Adopt_resp files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_resp/").delete();

### Create the Adopt_resp table in Athena

In [65]:
create_Adopt_resp = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_resp
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_resp') AS
SELECT
  'Adoption' AS CASE_TYPE,
  Year,
  Quarter,
  'Party' AS Category,
  'Respondent' AS PARTY,
  Resp_Gender AS Gender,
  Representation,
  Count (*) AS Count
FROM
  fcsq.HEARING_ADOPT_RESPONDENTS
WHERE 
  Representation <> '' /* A very small number of cases from 2011/12 have a blank representation value (gender is also blank); look into whether these should be recoded as N */
GROUP BY
  'Adoption',
  Year,
  Quarter,
  'Party',
  'Respondent',
  Resp_Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_Adopt_resp);



#### Adopt_resp validation

In [66]:
Adopt_resp_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_resp")
Adopt_resp_count

Unnamed: 0,count
0,253


## 21. adopt_case table - groups and formats adopt_case_data_v3 table and gives a count for each group
<a name="adopt_case"></a>

### Drop the adopt_case table if it already exists and remove its data from the S3 bucket

In [67]:
drop_adopt_case = f"""
DROP TABLE IF EXISTS fcsq.adopt_case;
"""
pydb.start_query_execution_and_wait(drop_adopt_case)

# clean up previous adopt_case files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_case/").delete();

### Create the adopt_case table in Athena

In [68]:
create_adopt_case = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_case
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_case') AS
SELECT *,
    Count(*) as Count FROM
    (SELECT
        'Adoption' AS CASE_TYPE,
        Year,
        Quarter,
        'Cases' AS Category,
        'N/A' AS PARTY,
        'N/A' AS Gender,
        'N/A' AS Representation
    FROM
      __temp__.adopt_case_data_v3)
GROUP BY
  CASE_TYPE,
  Year,
  Quarter,
  Category,
  PARTY,
  Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_adopt_case);



#### adopt_case validation

In [69]:
adopt_case_count = pydb.read_sql_query("select count(*) as count from fcsq.adopt_case")
adopt_case_count

Unnamed: 0,count
0,46


## 22. Adopt_Case_Hearings table - creates a count of all the cases with a hearing per quarter
<a name="Adopt_Case_Hearings"></a>

### Drop the Adopt_Case_Hearings table if it already exists and remove its data from the S3 bucket

In [70]:
drop_Adopt_Case_Hearings = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Case_Hearings;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Case_Hearings)

# clean up previous Adopt_Case_Hearings files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Case_Hearings/").delete();

### Create the Adopt_Case_Hearings table in Athena

In [71]:
create_hearing_adopt_case =f"""
SELECT DISTINCT Year, Quarter, Case_Number
FROM fcsq.HEARING_ADOPT_Applicants;
"""

pydb.create_temp_table(create_hearing_adopt_case,'hearing_adopt_case')



create_Adopt_Case_Hearings = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Case_Hearings
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Case_Hearings') AS
SELECT *, Count(*) as Count FROM
    (SELECT
      'Adoption' AS CASE_TYPE,
      Year,
      Quarter,
      'Cases with a hearing' AS Category,
      'N/A' AS PARTY,
      'N/A' AS Gender,
      'N/A' AS Representation
    FROM
      __temp__.Hearing_ADOPT_Case)
GROUP BY
  CASE_TYPE,
  Year,
  Quarter,
  Category,
  PARTY,
  Gender,
  Representation;
"""

pydb.start_query_execution_and_wait(create_Adopt_Case_Hearings);



#### Adopt_Case_Hearings validation

In [72]:
Adopt_Case_Hearings_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_Case_Hearings")
Adopt_Case_Hearings_count

Unnamed: 0,count
0,46


## 23. Adoption table - combines (UNION ALL) the applicant and respondent representation count tables, the case count table and the case hearing count table
<a name="Adoption"></a>

### Drop the Adoption table if it already exists and remove its data from the S3 bucket

In [73]:
drop_Adoption = f"""
DROP TABLE IF EXISTS fcsq.Adoption;
"""
pydb.start_query_execution_and_wait(drop_Adoption)

# clean up previous Adoption files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adoption/").delete();

### Create the Adoption table in Athena

In [74]:
create_Adoption = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adoption
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adoption') AS
SELECT
  *
FROM
 fcsq.ADOPT_APP
UNION ALL
SELECT
  *
FROM
  fcsq.ADOPT_RESP
UNION ALL
SELECT
  *
FROM
  fcsq.ADOPT_CASE
UNION ALL
SELECT
  *
FROM
  fcsq.ADOPT_CASE_HEARINGS;
"""

pydb.start_query_execution_and_wait(create_Adoption);



#### Adoption validation

In [75]:
Adoption_count = pydb.read_sql_query("select count(*) as count from fcsq.Adoption")
Adoption_count

Unnamed: 0,count
0,581


## 24. Applicant_representation table - creates a table showing whether all, some, or none of the applicants for a case have representation
<a name="Applicant_representation"></a>

### Drop the Applicant_representation table if it already exists and remove its data from the S3 bucket

In [76]:
drop_Applicant_representation = f"""
DROP TABLE IF EXISTS fcsq.Applicant_representation;
"""
pydb.start_query_execution_and_wait(drop_Applicant_representation)

# clean up previous Applicant_representation files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Applicant_representation/").delete();

### Create the Applicant_representation table in Athena

In [77]:
create_applicants_1 = f"""
SELECT Distinct t1.Case_Number, t1.Party, MAX(t1.Representative_Role) as Max_Rep_Role
FROM fcsq.Adopt_Applicant_Info t1
Group by Case_Number, Party;
"""

pydb.create_temp_table(create_applicants_1,'applicants_1')

create_applicants_2 = f"""
SELECT  t1.*,
case when Max_Rep_Role IS NULL then 0
else 1
end as Rep_IND
FROM __temp__.Applicants_1 t1;
"""
pydb.create_temp_table(create_applicants_2,'applicants_2')

create_applicants_3 = f"""
SELECT Distinct t1.Case_Number,
Count(t1.Party) as CountOfParty,
SUM(t1.Rep_Ind) as SumOfRep_IND
FROM __temp__.Applicants_2 t1
Group by Case_Number;
"""
pydb.create_temp_table(create_applicants_3,'applicants_3')

create_Applicant_representation = f"""
CREATE TABLE IF NOT EXISTS fcsq.Applicant_representation
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Applicant_representation') AS
SELECT t1.Case_Number,
t1.CountOfParty,
t1.SumOfRep_IND,
CASE WHEN t1.SumOfRep_Ind > t1.CountOfParty then 'Error'
WHEN t1.SumOfRep_Ind = t1.CountOfParty then 'All'
WHEN t1.SumOfRep_Ind =0 then 'None' else 'Some'  end as App_Rep_Cat
FROM __temp__.Applicants_3 t1;
"""

pydb.start_query_execution_and_wait(create_Applicant_representation);
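
The `CASE` expression above turns two per-case numbers, the number of parties and the number of represented parties, into one of four labels. The sketch below restates it as a plain Python function so the branch order can be checked by hand. `rep_category` is a hypothetical name.

```python
def rep_category(count_of_party, sum_of_rep_ind):
    """Classify a case as 'All', 'Some' or 'None' represented ('Error' if the counts disagree)."""
    # Branch order matches the SQL CASE: the first true condition wins.
    if sum_of_rep_ind > count_of_party:
        return "Error"
    if sum_of_rep_ind == count_of_party:
        return "All"
    if sum_of_rep_ind == 0:
        return "None"
    return "Some"


# (count_of_party, sum_of_rep_ind) pairs chosen to hit each branch
for parties, represented in [(2, 2), (2, 1), (2, 0), (1, 2)]:
    print(parties, represented, rep_category(parties, represented))
```

The same logic applies to the respondent table, where the result is stored as `Res_Rep_Cat`.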



#### Applicant_representation validation

In [78]:
Applicant_representation_count = pydb.read_sql_query("select count(*) as count from fcsq.Applicant_representation")
Applicant_representation_count

Unnamed: 0,count
0,182174


## 25. Respondent_Representation table - creates a table showing whether all, some, or none of the respondents for a case have representation
<a name="Respondent_representation"></a>

### Drop the Respondent_Representation table if it already exists and remove its data from the S3 bucket

In [79]:
drop_Respondent_Representation = f"""
DROP TABLE IF EXISTS fcsq.Respondent_Representation;
"""
pydb.start_query_execution_and_wait(drop_Respondent_Representation)

# clean up previous Respondent_Representation files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Respondent_Representation/").delete();

### Create the Respondent_Representation table in Athena

In [80]:
create_respondents_1 = f"""
SELECT Distinct t1.Case_Number, t1.Party, MAX(t1.Representative_Role) as Max_Rep_Role
FROM fcsq.Adopt_Respondent_Info t1
Group by Case_Number, Party;
"""

pydb.create_temp_table(create_respondents_1,'respondents_1')

create_respondents_2 = f"""
SELECT  t1.*,
case when Max_Rep_Role IS NULL then 0
else 1
end as Rep_IND
FROM __temp__.respondents_1 t1;
"""
pydb.create_temp_table(create_respondents_2,'respondents_2')

create_respondents_3 = f"""
SELECT Distinct t1.Case_Number,
Count(t1.Party) as CountOfParty,
SUM(t1.Rep_Ind) as SumOfRep_IND
FROM __temp__.respondents_2 t1
Group by Case_Number;
"""
pydb.create_temp_table(create_respondents_3,'respondents_3')

create_Respondent_Representation = f"""
CREATE TABLE IF NOT EXISTS fcsq.Respondent_Representation
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Respondent_Representation') AS
SELECT t1.Case_Number,
t1.CountOfParty,
t1.SumOfRep_IND,
CASE WHEN t1.SumOfRep_Ind > t1.CountOfParty then 'Error'
WHEN t1.SumOfRep_Ind = t1.CountOfParty then 'All'
WHEN t1.SumOfRep_Ind =0 then 'None' else 'Some'  end as Res_Rep_Cat
FROM __temp__.Respondents_3 t1
"""

pydb.start_query_execution_and_wait(create_Respondent_Representation);



#### Respondent_Representation validation

In [81]:
Respondent_Representation_count = pydb.read_sql_query("select count(*) as count from fcsq.Respondent_Representation")
Respondent_Representation_count

Unnamed: 0,count
0,171474


## 26. Adopt_Disposals_Final table - joins the adopt_apps_and_orders_match table to the representation tables, bringing together the timeliness and legal representation data for each case
<a name="Adopt_Disposals_Final"></a>

In [162]:
create_ADOPT_APP_AND_ORDERS_WITH_REP = f"""
SELECT t1.*,
t2.APP_REP_CAT, 
t3.RES_REP_CAT
FROM fcsq.ADOPT_APPS_AND_ORDERS_MATCH AS t1
LEFT JOIN fcsq.Applicant_Representation AS t2 ON (t1.CASE_NUMBER = t2.CASE_NUMBER)
LEFT JOIN fcsq.RESPONDENT_REPRESENTATION as t3 ON t1.CASE_NUMBER = t3.CASE_NUMBER;
"""
pydb.create_temp_table(create_ADOPT_APP_AND_ORDERS_WITH_REP,'ADOPT_APP_AND_ORDERS_WITH_REP')

In [170]:
df = pydb.read_sql_query("SELECT type,avg(wait_weeks) as avg_wait_weeks,count(*) as count FROM __temp__.ADOPT_APP_AND_ORDERS_WITH_REP group by type")
df.to_csv(path_or_buf = 'order_type_waits.csv',index=False)

In [143]:
pydb.read_sql_query("SELECT * from __temp__.adopt_app_and_orders_with_rep WHERE adoption <> 'Adoption'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat
0,KH13Z05140,PLA,239,2013-03-08,2013-03-21,1.857143,A70,2013,1,,Non-adoption,Placement,239,All,All
1,SD21Z00076,PLA,554,2021-05-11,2021-05-19,1.142857,A70,2021,2,,Non-adoption,Placement,554,,All
2,WD20Z00932,PLA,362,2020-06-08,2020-07-17,5.571429,A70,2020,3,,Non-adoption,Placement,362,All,All
3,MA04A00006,FO,262,2004-01-12,2004-01-26,2.000000,A12,2004,1,,Non-adoption,Placement,262,,All
4,SA05A00275,FO,344,2005-03-30,2005-10-20,29.142857,A12,2005,4,,Non-adoption,Placement,344,All,Some
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70155,TF13Z03226,PLA,559,2013-06-03,2013-06-27,3.428571,A70,2013,2,,Non-adoption,Placement,559,,
70156,UN15Z00017,PLA,545,2015-08-25,2015-10-09,6.428571,A70,2015,4,,Non-adoption,Placement,545,,Some
70157,BM15Z09687,PLA,127,2015-10-23,2015-12-17,7.857143,A70,2015,4,,Non-adoption,Placement,127,,All
70158,SE16Z00326,PLA,320,2016-02-22,2016-02-24,0.285714,A70,2016,1,,Non-adoption,Placement,320,,All


In [113]:
pydb.read_sql_query("SELECT Count(*) as COUNT from __temp__.adopt_app_and_orders_with_rep")

Unnamed: 0,count
0,153991


In [144]:
create_Adopt_Disposals_Final = f"""
SELECT t1.*,
t2.Region_Pre2014, 
t2.Region,
cast(t1.Year as varchar(4)) || '-Q' || cast(t1.quarter as varchar(3)) AS Quarter2, 
case when (t1.APP_REP_CAT Is Null Or t1.RES_REP_CAT Is Null)  then '5 Unknown'
    when t1.APP_REP_CAT='None' and t1.RES_REP_CAT ='None' then '4 Neither'
    when t1.APP_REP_CAT='None' and t1.RES_REP_CAT != 'None' then '3 Respondent Only'
    when t1.APP_REP_CAT != 'None' and t1.RES_REP_CAT = 'None' then '2 Applicant Only'
Else '1 Both'
End AS REP_CAT,
Case when t1.YEAR < 2014 then t2.Region_Pre2014
Else t2.Region
End As Final_Region

FROM __temp__.ADOPT_APP_AND_ORDERS_WITH_REP AS t1 LEFT JOIN fcsq.COURT_MV_FEB21_DFJ as t2
ON t1.DSP_COURT = cast(t2.Code as varchar(3));

"""

pydb.create_temp_table(create_Adopt_Disposals_Final,'Adopt_Disposals_Final')


create_Adopt_Disposals_Final_2 = f"""
SELECT *
FROM __temp__.ADOPT_DISPOSALS_FINAL
WHERE adoption = 'Adoption';
"""

pydb.create_temp_table(create_Adopt_Disposals_Final_2,'Adopt_Disposals_Final_2')
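
The `REP_CAT` column above combines the applicant and respondent categories into a single label. The sketch below mirrors that `CASE` expression in plain Python, with `None` standing in for SQL NULL. `overall_rep_cat` is a hypothetical name. Note that, as in the SQL, any category other than `'None'` (including `'Some'` and `'Error'`) counts as represented.

```python
def overall_rep_cat(app_rep_cat, res_rep_cat):
    """Combine applicant and respondent representation categories into REP_CAT."""
    # A missing category on either side (no match in the LEFT JOIN) gives '5 Unknown'.
    if app_rep_cat is None or res_rep_cat is None:
        return "5 Unknown"
    if app_rep_cat == "None" and res_rep_cat == "None":
        return "4 Neither"
    if app_rep_cat == "None":
        return "3 Respondent Only"
    if res_rep_cat == "None":
        return "2 Applicant Only"
    return "1 Both"


for pair in [("All", "Some"), ("All", "None"), ("None", "All"), ("None", "None"), (None, "All")]:
    print(pair, overall_rep_cat(*pair))
```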



In [115]:
pydb.read_sql_query(f"SELECT * FROM {database}.courts_mv WHERE {database}.courts_mv.mojap_snapshot_date = date '{snapshot_date}' AND code=150")

Unnamed: 0,court,code,name,tel_no,fax_no,live,dx_number,sups_centralised_flag,default_printer,district_registry,fap_id,deed_pack_number,welsh_high_court_name,welsh_county_court_name,lov_name,fpc_court_indicator,a_rowid,b_rowid,welsh_court_name,dr_tel_no,open_from,closed_at,dr_open_from,dr_closed_at,by_appointment_ind,dfj_area_id,mojap_image_tag,mojap_file_land_timestamp,mojap_snapshot_date
0,BN,150,BRIGHTON,0300 123 5577,,Y,98070 BRIGHTON 3,Y,CA325758BP-PRN0724,Y,PRN0724,7,,,BRIGHTON,N,AAAPFLAAJAAACgnAAD,AAAPFLAAJAAACgnAAD,,0300 123 5577,,,,,Y,BRG,v1.2.10,1656074245,2022-06-24
1,HB,150,BRIGHTON,01273 811333,01273 607638,N,142600 BRIGHTON-12,Y,CA325758BP-PRN0724,Y,PRN0724,7,,,BRIGHTON,N,AAAPFLAAJAAACgnAAD,AAAPFLAAJAAACgnAAD,,01273 811333,,,,,Y,BRG,v1.2.10,1656074245,2022-06-24


In [116]:
pydb.read_sql_query("SELECT * FROM __temp__.Adopt_Disposals_Final where CASE_NUMBER='HB12Z00228'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region
0,HB12Z00228,AO,150,2012-12-06,2013-03-27,15.857143,A76,2013,1,,Adoption,Standard,150,All,Some,SOUTH EAST,SOUTH EAST,2013-Q1,1 Both,SOUTH EAST
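
As a quick sanity check on the row above, `wait_weeks` appears to be the number of days between `app_date` and `disp_date` divided by 7. This is inferred from the sample outputs in this notebook, so the sketch below is an assumption about the calculation (which is defined in the adopt_apps_and_orders_match step) and not a copy of it.

```python
from datetime import date


def wait_weeks(app_date, disp_date):
    """Weeks between application and disposal, assuming wait_weeks = days / 7."""
    return (disp_date - app_date).days / 7


# HB12Z00228: application 2012-12-06, disposal 2013-03-27
hb = wait_weeks(date(2012, 12, 6), date(2013, 3, 27))
# KH13Z05140: application 2013-03-08, disposal 2013-03-21
kh = wait_weeks(date(2013, 3, 8), date(2013, 3, 21))
print(round(hb, 6), round(kh, 6))
```

Both values agree with the `wait_weeks` column shown for those cases (15.857143 and 1.857143).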


In [117]:
pydb.read_sql_query("SELECT * FROM __temp__.Adopt_Disposals_Final_2 where Case_number='MB20Z01362'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region


In [118]:
pydb.read_sql_query("SELECT * FROM __temp__.Adopt_Disposals_Final_2 ORDER BY wait_weeks desc")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region


In [119]:
pydb.read_sql_query("SELECT distinct app_type FROM __temp__.Adopt_Disposals_Final_2")

Unnamed: 0,app_type


In [120]:
df =pydb.read_sql_query("select * from __temp__.Adopt_Disposals_Final_2 order by case_number")
df.to_csv(path_or_buf = '~/FCSQ_data/debug.csv',index=False)

In [121]:
pydb.read_sql_query("select * from __temp__.Adopt_Disposals_Final_2 where case_number='MB20Z01363'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region


#### Adopt_Disposals_Final validation

In [145]:
Adopt_Disposals_Final_count = pydb.read_sql_query("select count(*) as count from __temp__.Adopt_Disposals_Final_2")
Adopt_Disposals_Final_count

Unnamed: 0,count
0,63928


In [123]:
check = pydb.read_sql_query("select * from __temp__.Adopt_Disposals_Final_2 where rep_cat='4 Neither' and year=2020 and quarter=4 order by wait_weeks desc")
check.to_csv(path_or_buf = '~/FCSQ_data/check.csv',index=False)

In [124]:
pydb.read_sql_query("SELECT * FROM adopt_disposals_final_2 where case_number='LV20Z04103'")

Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region


In [125]:
pydb.read_sql_query("select * from __temp__.Adopt_Disposals_Final_2 where case_number in ('MB20Z01235','MB20Z01362','MB20Z01595','WX20Z00011','WX20Z00012','WX20Z00013','WX20Z00014','YO20Z00167')")


Unnamed: 0,case_number,app_type,court,app_date,disp_date,wait_weeks,disp_type,year,quarter,order_type,adoption,type,dsp_court,app_rep_cat,res_rep_cat,region_pre2014,region,quarter2,rep_cat,final_region


## 27. Adopt_Quarterly table - groups the Adopt_Disposals_Final_2 table by quarter and representation category, providing a case count and mean wait_weeks for each group
<a name="Adopt_Quarterly"></a>

### Drop the Adopt_Quarterly table if it already exists and remove its data from the S3 bucket

In [149]:
drop_Adopt_Quarterly = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Quarterly;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Quarterly)

# clean up previous Adopt_Quarterly files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Quarterly/").delete();

### Create the Adopt_Quarterly table in Athena

In [150]:
create_Adopt_Quarterly = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Quarterly
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Quarterly') AS
SELECT DISTINCT
        'Adoption' as type,
        year,
        'Q' || cast(quarter as varchar(3)) AS quarter,
        rep_cat,
        count(*) as n,
        avg(wait_weeks) as mean
    FROM 
        __temp__.ADOPT_DISPOSALS_FINAL_2
    WHERE year > 2010
    AND adoption='Adoption'
GROUP BY
    year,
    quarter,
    rep_cat
    
UNION ALL
SELECT DISTINCT
        'Adoption' as type,
        year,
        'Q' || cast(quarter as varchar(3)) AS quarter,
        'All' as rep_cat,
        count(*) as n,
        avg(wait_weeks) as mean
    FROM 
        __temp__.ADOPT_DISPOSALS_FINAL_2
    WHERE year > 2010 
    AND adoption='Adoption'
GROUP BY
    year,
    quarter
"""


testA = f"""
SELECT * FROM __temp__.ADOPT_DISPOSALS_FINAL_2 WHERE year=2015 AND quarter = 3;
"""
testB = f"""
SELECT * FROM fcsq.Adopt_Quarterly WHERE year=2015 AND quarter = 'Q3' ORDER BY rep_cat
"""

pydb.start_query_execution_and_wait(create_Adopt_Quarterly)
#pydb.read_sql_query(testA)
pydb.read_sql_query(testB)





Unnamed: 0,type,year,quarter,rep_cat,n,mean
0,Adoption,2015,Q3,1 Both,179,6.959298
1,Adoption,2015,Q3,2 Applicant Only,31,8.493088
2,Adoption,2015,Q3,3 Respondent Only,775,7.036313
3,Adoption,2015,Q3,4 Neither,108,9.990741
4,Adoption,2015,Q3,5 Unknown,3,10.142857
5,Adoption,2015,Q3,All,1096,7.364572
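
The `All` row in the output above should be consistent with the five category rows: its `n` is the sum of the category counts, and its `mean` is the count-weighted average of the category means. The sketch below checks this using only the figures printed above for 2015 Q3.

```python
# (rep_cat, n, mean) copied from the 2015 Q3 output above
categories = [
    ("1 Both", 179, 6.959298),
    ("2 Applicant Only", 31, 8.493088),
    ("3 Respondent Only", 775, 7.036313),
    ("4 Neither", 108, 9.990741),
    ("5 Unknown", 3, 10.142857),
]
all_n, all_mean = 1096, 7.364572

total_n = sum(n for _, n, _ in categories)
# Weight each category mean by its case count to recover the overall mean.
weighted_mean = sum(n * mean for _, n, mean in categories) / total_n

print(total_n == all_n)
print(abs(weighted_mean - all_mean) < 1e-4)
```

A simple average of the five category means would not reproduce the `All` figure, because the categories differ greatly in size.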


#### Adopt_Quarterly validation

In [151]:
Adopt_Quarterly_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_Quarterly")
Adopt_Quarterly_count

Unnamed: 0,count
0,248


In [129]:
pydb.read_sql_query("select * from fcsq.Adopt_Quarterly WHERE year = 2014 AND quarter = 'Q2' ORDER BY rep_cat")

Unnamed: 0,type,year,quarter,rep_cat,n,mean


## 28. Adopt_Annual table - groups the Adopt_Disposals_Final_2 table by year and representation category, providing a case count and mean wait_weeks for each group
<a name="Adopt_Annual"></a>

### Drop the Adopt_Annual table if it already exists and remove its data from the S3 bucket

In [152]:
drop_Adopt_Annual = f"""
DROP TABLE IF EXISTS fcsq.Adopt_Annual;
"""
pydb.start_query_execution_and_wait(drop_Adopt_Annual)

# clean up previous Adopt_Annual files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/Adopt_Annual/").delete();

### Create the Adopt_Annual table in Athena

In [153]:
create_Adopt_Annual = f"""
CREATE TABLE IF NOT EXISTS fcsq.Adopt_Annual
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/Adopt_Annual') AS
SELECT DISTINCT
        'Adoption' as type,
        year,
        'N/A' as quarter,
        rep_cat,
        count(*) as n,
        avg(wait_weeks) as mean
    FROM 
        __temp__.ADOPT_DISPOSALS_FINAL_2
    WHERE year > 2010 
    AND year < 2022
    AND adoption='Adoption'
GROUP BY
    year,
    rep_cat
    
UNION ALL
SELECT DISTINCT
        'Adoption' as type,
        year,
        'N/A' as quarter,
        'All' as rep_cat,
        count(*) as n,
        avg(wait_weeks) as mean
    FROM 
        __temp__.ADOPT_DISPOSALS_FINAL_2
    WHERE year > 2010 
    AND year < 2022
    AND adoption='Adoption'
GROUP BY
    year
"""

pydb.start_query_execution_and_wait(create_Adopt_Annual);



#### Adopt_Annual validation

In [154]:
Adopt_Annual_count = pydb.read_sql_query("select count(*) as count from fcsq.Adopt_Annual")
Adopt_Annual_count



Unnamed: 0,count
0,63


In [155]:
df = pydb.read_sql_query("SELECT * FROM fcsq.Adoption")
df.to_csv(path_or_buf = 's3://alpha-family-data/CSVs/Adoption_legrep.csv',index=False)
df.to_csv(path_or_buf = '~/FCSQ_data/Adoption_legrep.csv',index=False)
#df.to_excel('adoption.xlsx')


In [156]:
df = pydb.read_sql_query("SELECT * FROM fcsq.Adopt_Annual UNION ALL SELECT * FROM fcsq.Adopt_Quarterly ORDER BY type,year,quarter,rep_cat")
df.to_csv(path_or_buf = '~/FCSQ_data/timeliness.csv',index=False)

## 29. adopt_timeliness_combined table - combines and orders the annual and quarterly timeliness data 
<a name="Adopt_timeliness_combined"></a>

### Drop the adopt_timeliness_combined table if it already exists and remove its data from the S3 bucket

In [157]:
drop_adopt_timeliness_combined = f"""
DROP TABLE IF EXISTS fcsq.adopt_timeliness_combined;
"""
pydb.start_query_execution_and_wait(drop_adopt_timeliness_combined)

# clean up previous adopt_timeliness_combined files
bucket.objects.filter(Prefix="fcsq_processing/Adoption/adopt_timeliness_combined/").delete();

### Create the adopt_timeliness_combined table in Athena

In [158]:
create_adopt_timeliness_combined = f"""
CREATE TABLE IF NOT EXISTS fcsq.adopt_timeliness_combined
WITH (format = 'PARQUET', external_location = 's3://alpha-family-data/fcsq_processing/Adoption/adopt_timeliness_combined') AS
SELECT * FROM fcsq.Adopt_Annual 
UNION ALL 
SELECT * FROM fcsq.Adopt_Quarterly ORDER BY type,year,quarter,rep_cat
"""

pydb.start_query_execution_and_wait(create_adopt_timeliness_combined);



#### adopt_timeliness_combined validation

In [159]:
adopt_timeliness_combined_count = pydb.read_sql_query("select count(*) as count from fcsq.adopt_timeliness_combined")
adopt_timeliness_combined_count

Unnamed: 0,count
0,311


In [160]:
df = pydb.read_sql_query("SELECT * FROM fcsq.adopt_timeliness_combined")
df = df.pivot_table(index=['type','year','quarter'],columns=['rep_cat'],values = ['n','mean'],aggfunc='sum', fill_value=0).swaplevel(axis=1).sort_index(axis=1)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,rep_cat,1 Both,1 Both,2 Applicant Only,2 Applicant Only,3 Respondent Only,3 Respondent Only,4 Neither,4 Neither,5 Unknown,5 Unknown,All,All
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,mean,n,mean,n,mean,n,mean,n,mean,n,mean,n
type,year,quarter,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2
Adoption,2011,,7.487707,889,7.408623,222,7.705748,3062,7.674733,856,4.521008,17,7.638271,5046
Adoption,2011,Q1,6.65742,206,6.370262,49,8.008535,703,8.03588,215,3.795918,7,7.684625,1180
Adoption,2011,Q2,6.851935,192,10.171429,60,7.165254,708,6.144928,207,3.785714,2,7.081633,1169
Adoption,2011,Q3,7.871429,210,3.909091,55,7.744257,796,9.064815,216,7.942857,5,7.823824,1282
Adoption,2011,Q4,8.244026,281,8.746305,58,7.868505,855,7.39384,218,1.0,3,7.891368,1415
Adoption,2012,,7.693182,1144,7.854937,259,7.062042,4018,9.260298,978,6.054945,13,7.539925,6412
Adoption,2012,Q1,9.429603,277,8.966518,64,8.080847,857,10.941276,253,6.44898,7,8.864491,1458
Adoption,2012,Q2,7.236047,279,7.007389,58,6.796913,944,7.18797,209,0.142857,1,6.937626,1491
Adoption,2012,Q3,7.211567,289,7.100703,61,6.999742,1107,10.967285,262,5.857143,3,7.640534,1722
Adoption,2012,Q4,6.976589,299,8.171053,76,6.563063,1110,7.530371,254,7.928571,2,6.846968,1741


In [161]:
# keep the (type, year, quarter) index: index=False would drop the row labels of the pivoted frame
df.to_csv(path_or_buf = '~/FCSQ_data/timeliness.csv')