# Average treatment effect

Source: https://towardsdatascience.com/twenty-five-sql-practice-exercises-5fc791e24082

From the following table summarizing the results of a study, calculate the average treatment effect as well as upper and lower bounds of the 95% confidence interval. Round these numbers to 3 decimal places.

In [1]:
%run Question.ipynb

 * postgresql://fknight:***@localhost/postgres
Done.
Done.
9 rows affected.
9 rows affected.


# Part A

For the control group, write a query to find the 

* Group size
* Average outcome 
* Standard deviation

## Example answer

In [2]:
%%sql

SELECT 
    1.0*sum(outcome)/count(*) AS avg_outcome,
    stddev(outcome) AS std_dev,
    count(*) AS group_size 
FROM study
WHERE assignment = 0



avg_outcome,std_dev,group_size
0.5,0.5773502691896257,4


# Part B

For the treatment group, write a query to find the 

* Group size
* Average outcome 
* Standard deviation

## Example answer

In [3]:
%%sql

SELECT 
    1.0*sum(outcome)/count(*) AS avg_outcome,
    stddev(outcome) AS std_dev,
    count(*) AS group_size 
FROM study
WHERE assignment = 1



avg_outcome,std_dev,group_size
0.8,0.4472135954999579,5
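
As a cross-check on Parts A and B, the same three statistics can be recomputed outside the database. PostgreSQL's `stddev` is the sample standard deviation (an alias for `stddev_samp`), which corresponds to `statistics.stdev` in Python, not `statistics.pstdev`. The rows below are an assumption: the contents of `study` are not shown here, so these are hypothetical 0/1 outcomes chosen only to be consistent with the group sizes and averages printed above. The helper name `summarize` is also made up for illustration.

```python
from statistics import mean, stdev, pstdev

# Hypothetical outcomes, NOT read from the real `study` table.
# They are chosen to match the printed group sizes and averages.
control = [0, 0, 1, 1]        # assignment = 0
treatment = [1, 1, 1, 1, 0]   # assignment = 1


def summarize(outcomes):
    """Return (avg_outcome, std_dev, group_size), mirroring the SQL columns."""
    # stdev() divides by n - 1, like PostgreSQL's stddev / stddev_samp.
    return mean(outcomes), stdev(outcomes), len(outcomes)


for name, outcomes in (("control", control), ("treatment", treatment)):
    avg, sd, n = summarize(outcomes)
    print(name, avg, round(sd, 4), n)

# For comparison: the population standard deviation (divide by n) is smaller.
print("control, population sd:", pstdev(control))
```

If the sample and population figures disagree with what the database returns, the usual cause is mixing `stddev_samp` with `stddev_pop`.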


# Part C

Using the subqueries from Parts A and B, write a query that shows the difference in average outcome between the treatment group and the control group.

```sql
WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
)
```

## Example answer

In [5]:
%%sql

WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
)

SELECT t.avg_outcome - c.avg_outcome AS effect_size 
FROM control_group c, treatment_group t



effect_size
0.3


# Part D

Using the subqueries from Parts A, B, and C, compute the half-width (margin of error) of a 95% confidence interval using `z* = 1.96` and the individual standard errors, `std dev / sqrt(sample size)`.
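
Written out, the calculation combines the two standard errors in quadrature. The symbols are introduced here for illustration and do not appear in the exercise: $s_t, s_c$ are the sample standard deviations, $n_t, n_c$ the group sizes, and $\bar{y}_t, \bar{y}_c$ the average outcomes of the treatment and control groups.

```latex
\mathrm{SE}_t = \frac{s_t}{\sqrt{n_t}}, \qquad
\mathrm{SE}_c = \frac{s_c}{\sqrt{n_c}}

\text{margin} = z^{*} \sqrt{\mathrm{SE}_t^{2} + \mathrm{SE}_c^{2}}
              = 1.96 \sqrt{\frac{s_t^{2}}{n_t} + \frac{s_c^{2}}{n_c}}

\text{95\% interval} = (\bar{y}_t - \bar{y}_c) \pm \text{margin}
```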

```sql
WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
),

effect AS (
    SELECT t.avg_outcome - c.avg_outcome AS effect_size 
    FROM control_group c, treatment_group t
)
```

## Example answer

In [7]:
%%sql

WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
),

effect AS (
    SELECT t.avg_outcome - c.avg_outcome AS effect_size 
    FROM control_group c, treatment_group t
)

SELECT 
    1.96 * (t.std_dev^2 / t.group_size
            + c.std_dev^2 / c.group_size)^0.5 AS conf_int 
FROM control_group c, treatment_group t



conf_int
0.6883293785197122
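
The printed value can be checked by hand from the summary statistics in Parts A and B. The sketch below plugs those printed numbers into the same expression as the query; the function name `margin_of_error` is made up for illustration.

```python
from math import sqrt


def margin_of_error(sd_t, n_t, sd_c, n_c, z=1.96):
    """Half-width of the interval: z times the combined standard error."""
    # Each term sd**2 / n is one group's squared standard error.
    return z * sqrt(sd_t**2 / n_t + sd_c**2 / n_c)


# Standard deviations and group sizes copied from the Part A and B output.
moe = margin_of_error(sd_t=0.4472135954999579, n_t=5,
                      sd_c=0.5773502691896257, n_c=4)
print(round(moe, 3))  # 0.688
```

Note that the `conf_int` column holds this half-width, not the interval itself; the bounds come from adding it to and subtracting it from the effect size.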


# Part E

Using the subqueries from Parts A, B, C, & D, solve the original problem.

```sql
WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
),

effect AS (
    SELECT t.avg_outcome - c.avg_outcome AS effect_size 
    FROM control_group c, treatment_group t
),

interval AS (
    SELECT 
    1.96 * (t.std_dev^2 / t.group_size
            + c.std_dev^2 / c.group_size)^0.5 AS conf_int 
    FROM control_group c, treatment_group t
)
```

## Example answer

In [11]:
%%sql

WITH control_group AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0    
),

treatment_group AS (
    SELECT
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1
),

effect AS (
    SELECT t.avg_outcome - c.avg_outcome AS effect_size 
    FROM control_group c, treatment_group t
),

interval AS (
    SELECT 
    1.96 * (t.std_dev^2 / t.group_size
            + c.std_dev^2 / c.group_size)^0.5 AS conf_int 
    FROM control_group c, treatment_group t
)


SELECT 
    round(e.effect_size, 3) AS point_estimate, 
    round(e.effect_size - i.conf_int, 3) AS lower_bound, 
    round(e.effect_size + i.conf_int, 3) AS upper_bound
FROM effect e, interval i;



point_estimate,lower_bound,upper_bound
0.3,-0.388,0.988


## The solution is given below

In [2]:
%%sql

-- get average outcomes, standard deviations, and group sizes for 
-- control and treatment groups

WITH control AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 0 
),

treatment AS (
    SELECT 
        1.0*sum(outcome)/count(*) AS avg_outcome,
        stddev(outcome) AS std_dev,
        count(*) AS group_size 
    FROM study
    WHERE assignment = 1 
),

-- get average treatment effect size

effect_size AS (
    SELECT t.avg_outcome - c.avg_outcome AS effect_size 
    FROM control c, treatment t 
),

-- construct 95% confidence interval using z* = 1.96 and magnitude 
-- of individual standard errors [ std dev / sqrt(sample size) ]

conf_interval AS (
    SELECT 
        1.96 * (t.std_dev^2 / t.group_size
                + c.std_dev^2 / c.group_size)^0.5 AS conf_int 
    FROM treatment t, control c 
)

SELECT 
    round(es.effect_size, 3) AS point_estimate, 
    round(es.effect_size - ci.conf_int, 3) AS lower_bound, 
    round(es.effect_size + ci.conf_int, 3) AS upper_bound
FROM effect_size es, conf_interval ci;



point_estimate,lower_bound,upper_bound
0.3,-0.388,0.988
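
For an end-to-end check of the final row, the whole pipeline fits in a few lines of Python. This is a sketch under an assumption: the `study` rows are not shown here, so the lists are hypothetical 0/1 outcomes chosen to be consistent with the group sizes and averages reported earlier. The function name `ate_with_ci` is likewise made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def ate_with_ci(treatment, control, z=1.96):
    """Return (point_estimate, lower_bound, upper_bound), rounded to 3 places."""
    effect = mean(treatment) - mean(control)
    # variance() is the sample variance (divides by n - 1), i.e. stddev squared.
    margin = z * sqrt(variance(treatment) / len(treatment)
                      + variance(control) / len(control))
    return tuple(round(x, 3) for x in (effect, effect - margin, effect + margin))


# Hypothetical outcomes, NOT read from the real `study` table.
control = [0, 0, 1, 1]
treatment = [1, 1, 1, 1, 0]

print(ate_with_ci(treatment, control))  # (0.3, -0.388, 0.988)
```

Because the interval runs from a negative to a positive value, it includes zero, so under this normal approximation the data are also consistent with no treatment effect.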
