# Patient Support Analysis (Part 1)

## UnitedHealth SQL Interview Question

### Question  
UnitedHealth Group (UHG) has a program called **Advocate4Me**, which allows policy holders (or members) to call an advocate and receive support for their health care needs, whether that's claims and benefits support, drug coverage, pre- and post-authorisation, medical records, emergency assistance, or member portal services.

Write a query to find how many UHG policy holders made **three or more calls**, assuming each call is identified by the `case_id` column.

If you like this question, try out **Patient Support Analysis (Part 2)!**

---

### `callers` Table:

| Column Name         | Type      |
|---------------------|-----------|
| policy_holder_id    | integer   |
| case_id             | varchar   |
| call_category       | varchar   |
| call_date           | timestamp |
| call_duration_secs  | integer   |

---

### Example Input:

| policy_holder_id | case_id                                | call_category        | call_date              | call_duration_secs |
|------------------|----------------------------------------|----------------------|------------------------|--------------------|
| 1                | f1d012f9-9d02-4966-a968-bf6c5bc9a9fe    | emergency assistance | 2023-04-13T19:16:53Z   | 144                |
| 1                | 41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab    | authorisation        | 2023-05-25T09:09:30Z   | 815                |
| 2                | 9b1af84b-eedb-4c21-9730-6f099cc2cc5e    | claims assistance    | 2023-01-26T01:21:27Z   | 992                |
| 2                | 8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e    | emergency assistance | 2023-03-09T10:58:54Z   | 128                |
| 2                | 38208fae-bad0-49bf-99aa-7842ba2e37bc    | benefits             | 2023-06-05T07:35:43Z   | 619                |

---

### Example Output:

| policy_holder_count |
|---------------------|
| 1                   |

---

### Explanation:

The only caller who made **three or more calls** is **policy holder ID 2**.  
Therefore, the number of such policy holders is **1**.
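
The counting logic behind this answer can be checked without Spark. The sketch below is a plain-Python illustration (not part of the original notebook) that tallies calls per `policy_holder_id` from the example rows and then counts the holders with three or more calls. Reducing each row to a `(policy_holder_id, case_id)` tuple is an assumption made for brevity, since the other columns do not affect the count.

```python
from collections import Counter

# Example rows reduced to (policy_holder_id, case_id); the other columns
# play no part in the count, so this sketch omits them.
rows = [
    (1, "f1d012f9-9d02-4966-a968-bf6c5bc9a9fe"),
    (1, "41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab"),
    (2, "9b1af84b-eedb-4c21-9730-6f099cc2cc5e"),
    (2, "8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e"),
    (2, "38208fae-bad0-49bf-99aa-7842ba2e37bc"),
]

# Step 1: number of calls per policy holder (like GROUP BY ... COUNT(case_id)).
calls_per_holder = Counter(holder_id for holder_id, _case_id in rows)

# Step 2: keep holders with 3+ calls (like HAVING COUNT(case_id) >= 3),
# then count how many remain.
policy_holder_count = sum(1 for n in calls_per_holder.values() if n >= 3)

print(dict(calls_per_holder))  # {1: 2, 2: 3}
print(policy_holder_count)     # 1
```

Holder 1 has two calls and holder 2 has three, so only holder 2 passes the filter.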


```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# Wildcard import of Spark SQL functions (e.g. count); note that it also
# shadows Python builtins such as sum, max and min.
from pyspark.sql.functions import *

# Create a local Spark session
spark = SparkSession.builder.master("local[1]").appName("PatientSupportAnalysisPart1").getOrCreate()

# Define the schema. call_date is a timestamp in the problem's table; it is kept
# as a string here because the date plays no part in the count.
schema = StructType([
    StructField("policy_holder_id", IntegerType(), True),
    StructField("case_id", StringType(), True),
    StructField("call_category", StringType(), True),
    StructField("call_date", StringType(), True),
    StructField("call_duration_secs", IntegerType(), True),
])

# Sample data (the example input above)
data = [
    (1, "f1d012f9-9d02-4966-a968-bf6c5bc9a9fe", "emergency assistance", "2023-04-13T19:16:53", 144),
    (1, "41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab", "authorisation", "2023-05-25T09:09:30", 815),
    (2, "9b1af84b-eedb-4c21-9730-6f099cc2cc5e", "claims assistance", "2023-01-26T01:21:27", 992),
    (2, "8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e", "emergency assistance", "2023-03-09T10:58:54", 128),
    (2, "38208fae-bad0-49bf-99aa-7842ba2e37bc", "benefits", "2023-06-05T07:35:43", 619),
]

# Create the DataFrame
callers_df = spark.createDataFrame(data, schema)

# Display it without truncating long values such as the case_id UUIDs
callers_df.show(truncate=False)
```


```text
+----------------+------------------------------------+--------------------+-------------------+------------------+
|policy_holder_id|case_id                             |call_category       |call_date          |call_duration_secs|
+----------------+------------------------------------+--------------------+-------------------+------------------+
|1               |f1d012f9-9d02-4966-a968-bf6c5bc9a9fe|emergency assistance|2023-04-13T19:16:53|144               |
|1               |41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab|authorisation       |2023-05-25T09:09:30|815               |
|2               |9b1af84b-eedb-4c21-9730-6f099cc2cc5e|claims assistance   |2023-01-26T01:21:27|992               |
|2               |8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e|emergency assistance|2023-03-09T10:58:54|128               |
|2               |38208fae-bad0-49bf-99aa-7842ba2e37bc|benefits            |2023-06-05T07:35:43|619               |
+----------------+------------------------------------+--------------------+-------------------+------------------+
```

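```python
from pyspark.sql import functions as F

# Count calls per policy holder, keep holders with 3+ calls, then count them.
# Aliasing the per-holder count lets the filter reference a plain column name
# instead of relying on Spark's auto-generated "count(case_id)" name.
(
    callers_df.groupBy("policy_holder_id")
    .agg(F.count("case_id").alias("call_count"))
    .where(F.col("call_count") >= 3)
    .agg(F.count("policy_holder_id").alias("policy_holder_count"))
    .show()
)
```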

```text
+-------------------+
|policy_holder_count|
+-------------------+
|                  1|
+-------------------+
```
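
`count("case_id")` counts non-null rows per holder, not distinct cases. If the same `case_id` could appear on more than one row (an assumption; the example data has no duplicates), counting distinct IDs follows the "each call is identified by `case_id`" wording more closely; in PySpark that would be `F.countDistinct("case_id")`. The plain-Python sketch below illustrates the difference on hypothetical data with one duplicated row.

```python
from collections import defaultdict

# Hypothetical rows: holder 3 has only two distinct cases, but case "c2"
# was recorded twice (e.g. a duplicated log entry).
rows = [
    (3, "c1"),
    (3, "c2"),
    (3, "c2"),
]

# Row count per holder (what COUNT(case_id) measures).
row_counts = defaultdict(int)
# Distinct case_ids per holder (what COUNT(DISTINCT case_id) measures).
distinct_cases = defaultdict(set)
for holder_id, case_id in rows:
    row_counts[holder_id] += 1
    distinct_cases[holder_id].add(case_id)

by_rows = sum(1 for n in row_counts.values() if n >= 3)
by_distinct = sum(1 for cases in distinct_cases.values() if len(cases) >= 3)

print(by_rows, by_distinct)  # 1 0
```

Counting rows would report holder 3 as a frequent caller; counting distinct cases would not.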



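```python
callers_df.createOrReplaceTempView("callers")

# Same logic in Spark SQL: the CTE keeps policy holders with 3+ calls,
# and the outer query counts them.
spark.sql("""
    WITH frequent_callers AS (
        SELECT policy_holder_id, COUNT(case_id) AS call_count
        FROM callers
        GROUP BY policy_holder_id
        HAVING COUNT(case_id) >= 3
    )
    SELECT COUNT(policy_holder_id) AS policy_holder_count
    FROM frequent_callers
""").show()
```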

```text
+-------------------+
|policy_holder_count|
+-------------------+
|                  1|
+-------------------+
```
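
Because the Spark SQL version uses only standard SQL (a CTE, `GROUP BY`, `HAVING`), the same query should run on other engines. As a hedged illustration outside the original notebook, the sketch below runs it against an in-memory SQLite database through Python's built-in `sqlite3` module, loading only the example rows. `call_date` is stored as ISO-8601 text, since SQLite has no dedicated date/time storage class.

```python
import sqlite3

# In-memory SQLite database holding the example `callers` rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE callers (
        policy_holder_id   INTEGER,
        case_id            TEXT,
        call_category      TEXT,
        call_date          TEXT,  -- ISO-8601 string
        call_duration_secs INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO callers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "f1d012f9-9d02-4966-a968-bf6c5bc9a9fe", "emergency assistance", "2023-04-13T19:16:53", 144),
        (1, "41ce8fb6-1ddd-4f50-ac31-07bfcce6aaab", "authorisation", "2023-05-25T09:09:30", 815),
        (2, "9b1af84b-eedb-4c21-9730-6f099cc2cc5e", "claims assistance", "2023-01-26T01:21:27", 992),
        (2, "8471a3d4-6fc7-4bb2-9fc7-4583e3638a9e", "emergency assistance", "2023-03-09T10:58:54", 128),
        (2, "38208fae-bad0-49bf-99aa-7842ba2e37bc", "benefits", "2023-06-05T07:35:43", 619),
    ],
)

# Same query as the Spark SQL cell: keep holders with 3+ calls, then count them.
query = """
    WITH frequent_callers AS (
        SELECT policy_holder_id, COUNT(case_id) AS call_count
        FROM callers
        GROUP BY policy_holder_id
        HAVING COUNT(case_id) >= 3
    )
    SELECT COUNT(policy_holder_id) AS policy_holder_count
    FROM frequent_callers
"""
(policy_holder_count,) = conn.execute(query).fetchone()
print(policy_holder_count)  # 1
conn.close()
```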

