# Patient Support Analysis (Part 2)
## UnitedHealth SQL Interview Question

### Question:
UnitedHealth Group (UHG) has a program called Advocate4Me, which allows policy holders (or members) to call an advocate and receive support for their health care needs, whether that's claims and benefits support, drug coverage, pre- and post-authorisation, medical records, emergency assistance, or member portal services.

Calls to the Advocate4Me call centre are classified into various categories, but some calls cannot be neatly categorised. These uncategorised calls are labelled “n/a”, or the call category field is left empty when the support agent does not enter anything.

Write a query to calculate the percentage of calls that cannot be categorised. Round your answer to 1 decimal place. For example, 45.0, 48.5, 57.7.

### callers Table:
| Column Name       | Type   |
|-------------------|--------|
| policy_holder_id  | integer|
| case_id           | varchar|
| call_category     | varchar|
| call_date         | timestamp|
| call_duration_secs| integer|

#### callers Example Input:
| policy_holder_id | case_id                              | call_category     | call_date               | call_duration_secs |
|------------------|--------------------------------------|-------------------|-------------------------|--------------------|
| 1                | f1d012f9-9d02-4966-a968-bf6c5bc9a9fe | emergency assistance| 2023-04-13T19:16:53Z   | 144                |
| 1                | 41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab | authorisation      | 2023-05-25T09:09:30Z   | 815                |
| 2                | 9b1af84b-eedb-4c21-9730-6f099cc2cc5e | n/a                | 2023-01-26T01:21:27Z   | 992                |
| 2                | 8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e | emergency assistance| 2023-03-09T10:58:54Z   | 128                |
| 2                | 38208fae-bad0-49bf-99aa-7842ba2e37bc | benefits           | 2023-06-05T07:35:43Z   | 619                |

#### Example Output:
| uncategorised_call_pct |
|------------------------|
| 20.0                   |

#### Explanation:
Out of the total of 5 calls registered, one call was not categorised. Therefore, the percentage of uncategorised calls is calculated as 20.0% (1 out of 5 multiplied by 100 and rounded to one decimal place).
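The arithmetic above can be sketched in plain Python before moving to Spark. This is a minimal illustration rather than the reference solution: `uncategorised_pct` is a hypothetical helper, and treating `None` as the "left empty" case is an assumption drawn from the problem statement.

```python
def uncategorised_pct(categories):
    """Return the percentage of uncategorised calls, rounded to 1 decimal place."""
    # Assumption: a call is uncategorised when its category is None (left empty) or "n/a".
    uncategorised = sum(1 for c in categories if c is None or c == "n/a")
    return round(100.0 * uncategorised / len(categories), 1)


# call_category values taken from the example input
sample = ["emergency assistance", "authorisation", "n/a", "emergency assistance", "benefits"]
print(uncategorised_pct(sample))  # 20.0
```

The helper raises `ZeroDivisionError` on an empty list; a production version would need to decide what an empty table should return.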


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("PatientSupportAnalysis").getOrCreate()

# Sample data for the callers table
data = [
    (1, "f1d012f9-9d02-4966-a968-bf6c5bc9a9fe", "emergency assistance", "2023-04-13T19:16:53Z", 144),
    (1, "41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab", "authorisation", "2023-05-25T09:09:30Z", 815),
    (2, "9b1af84b-eedb-4c21-9730-6f099cc2cc5e", "n/a", "2023-01-26T01:21:27Z", 992),
    (2, "8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e", "emergency assistance", "2023-03-09T10:58:54Z", 128),
    (2, "38208fae-bad0-49bf-99aa-7842ba2e37bc", "benefits", "2023-06-05T07:35:43Z", 619)
]

# Column names for the callers table (Spark infers the types; call_date stays a string)
columns = ["policy_holder_id", "case_id", "call_category", "call_date", "call_duration_secs"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show DataFrame
df.show()


+----------------+--------------------+--------------------+--------------------+------------------+
|policy_holder_id|             case_id|       call_category|           call_date|call_duration_secs|
+----------------+--------------------+--------------------+--------------------+------------------+
|               1|f1d012f9-9d02-496...|emergency assistance|2023-04-13T19:16:53Z|               144|
|               1|41ce8fb6-1ddd-4f5...|       authorisation|2023-05-25T09:09:30Z|               815|
|               2|9b1af84b-eedb-4c2...|                 n/a|2023-01-26T01:21:27Z|               992|
|               2|8471a3d4-6fc7-4bb...|emergency assistance|2023-03-09T10:58:54Z|               128|
|               2|38208fae-bad0-49b...|            benefits|2023-06-05T07:35:43Z|               619|
+----------------+--------------------+--------------------+--------------------+------------------+



In [17]:
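# Flag calls whose category is missing (NULL) or recorded as 'n/a'
is_uncategorised = col('call_category').isNull() | (col('call_category') == 'n/a')

df.agg(
    round(
        100 * sum(when(is_uncategorised, 1).otherwise(0)) / count(lit(1)),
        1  # the question asks for one decimal place
    ).alias('uncategorised_call_pct')
).show()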

+----------------------+
|uncategorised_call_pct|
+----------------------+
|                  20.0|
+----------------------+



In [2]:
df.createOrReplaceTempView('callers')

In [4]:
%%sparksql
SELECT
  ROUND(
    100.0 * SUM(CASE WHEN call_category IS NULL OR call_category = 'n/a' THEN 1 ELSE 0 END)
    / COUNT(*),
    1
  ) AS uncategorised_call_pct
FROM callers

+----------------------+
|uncategorised_call_pct|
+----------------------+
|                  20.0|
+----------------------+
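As a sanity check outside Spark, an equivalent query can be run against an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite stands in for Spark SQL here (both support `CASE`, `SUM`, `COUNT` and `ROUND` as used in this query), and the column types in the `CREATE TABLE` statement are illustrative choices, not part of the original problem.

```python
import sqlite3

# Same five rows as the example input
rows = [
    (1, "f1d012f9-9d02-4966-a968-bf6c5bc9a9fe", "emergency assistance", "2023-04-13T19:16:53Z", 144),
    (1, "41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab", "authorisation", "2023-05-25T09:09:30Z", 815),
    (2, "9b1af84b-eedb-4c21-9730-6f099cc2cc5e", "n/a", "2023-01-26T01:21:27Z", 992),
    (2, "8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e", "emergency assistance", "2023-03-09T10:58:54Z", 128),
    (2, "38208fae-bad0-49bf-99aa-7842ba2e37bc", "benefits", "2023-06-05T07:35:43Z", 619),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE callers ("
    "policy_holder_id INTEGER, case_id TEXT, call_category TEXT, "
    "call_date TEXT, call_duration_secs INTEGER)"
)
conn.executemany("INSERT INTO callers VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT ROUND(
         100.0 * SUM(CASE WHEN call_category IS NULL OR call_category = 'n/a' THEN 1 ELSE 0 END)
         / COUNT(*),
         1
       ) AS uncategorised_call_pct
FROM callers
"""
pct = conn.execute(query).fetchone()[0]
print(pct)  # 20.0
conn.close()
```

Multiplying by `100.0` rather than `100` keeps the division in floating point; with integer operands SQLite would truncate the quotient.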

