# Pharmacy Analytics (Part 3)

## CVS Health SQL Interview Question

### Question  
CVS Health wants to gain a clearer understanding of its pharmacy sales and the performance of various products.

Write a query to calculate the total drug sales for each manufacturer.  
Round the answer to the **nearest million** and report your results in **descending order** of total sales.  
If two or more manufacturers have the same total sales, sort them **alphabetically by manufacturer name**.

Since this data will be displayed on a dashboard viewed by business stakeholders, please format your results as follows:  
**`"$36 million"`**


---

### `pharmacy_sales` Table:

| Column Name    | Type      |
|----------------|-----------|
| product_id     | integer   |
| units_sold     | integer   |
| total_sales    | decimal   |
| cogs           | decimal   |
| manufacturer   | varchar   |
| drug           | varchar   |

---

### Example Input:

| product_id | units_sold | total_sales | cogs       | manufacturer | drug             |
|------------|------------|-------------|------------|--------------|------------------|
| 94         | 132362     | 2041758.41  | 1373721.70 | Biogen       | UP and UP        |
| 9          | 37410      | 293452.54   | 208876.01  | Eli Lilly    | Zyprexa          |
| 50         | 90484      | 2521023.73  | 2742445.90 | Eli Lilly    | Dermasorb        |
| 61         | 77023      | 500101.61   | 419174.97  | Biogen       | Varicose Relief  |
| 136        | 144814     | 1084258.00  | 1006447.73 | Biogen       | Burkhart         |

---

### Example Output:

| manufacturer | sale       |
|--------------|------------|
| Biogen       | $4 million |
| Eli Lilly    | $3 million |

---

### Explanation:

The total sales for **Biogen** is about **$4 million**:  
$2,041,758.41 + $500,101.61 + $1,084,258.00 = $3,626,118.02 → rounded to **$4 million**

The total sales for **Eli Lilly** is about **$3 million**:  
$293,452.54 + $2,521,023.73 = $2,814,476.27 → rounded to **$3 million**
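
The arithmetic above can be checked with a short plain-Python sketch. The `rows` list copies the manufacturer and `total_sales` values from the example input; `format_millions` is a helper name introduced here for illustration only.

```python
from collections import defaultdict

# (manufacturer, total_sales) pairs copied from the example input
rows = [
    ("Biogen", 2041758.41),
    ("Eli Lilly", 293452.54),
    ("Eli Lilly", 2521023.73),
    ("Biogen", 500101.61),
    ("Biogen", 1084258.00),
]

# Sum total_sales per manufacturer
totals = defaultdict(float)
for manufacturer, total_sales in rows:
    totals[manufacturer] += total_sales


def format_millions(amount):
    """Round to the nearest million and format as '$N million' (illustrative helper)."""
    return f"${round(amount / 1_000_000)} million"


# Sort by total descending, then by manufacturer name ascending
ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
report = [(name, format_millions(total)) for name, total in ranked]
print(report)
```

One caveat: Python's built-in `round` rounds exact halves to the nearest even integer, while SQL `ROUND` typically rounds halves away from zero. Neither total in this example sits on a half-million boundary, so both approaches agree here.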


```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType
# Note: round and sum below are the Spark column functions and shadow the Python built-ins
from pyspark.sql.functions import concat, lit, round, sum

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("PharmacyAnalyticsPart3").getOrCreate()

# Define the schema
# FloatType is single precision, so the cents shown by show() are approximate;
# DoubleType would match the table's decimal columns more closely.
schema = StructType([
    StructField("product_id", IntegerType(), True),
    StructField("units_sold", IntegerType(), True),
    StructField("total_sales", FloatType(), True),
    StructField("cogs", FloatType(), True),
    StructField("manufacturer", StringType(), True),
    StructField("drug", StringType(), True),
])

# Example input data
data = [
    (94, 132362, 2041758.41, 1373721.70, "Biogen", "UP and UP"),
    (9, 37410, 293452.54, 208876.01, "Eli Lilly", "Zyprexa"),
    (50, 90484, 2521023.73, 2742445.90, "Eli Lilly", "Dermasorb"),
    (61, 77023, 500101.61, 419174.97, "Biogen", "Varicose Relief"),
    (136, 144814, 1084258.00, 1006447.73, "Biogen", "Burkhart"),
]

# Create the DataFrame
pharmacy_sales_df = spark.createDataFrame(data, schema)

# Show the DataFrame
pharmacy_sales_df.show(truncate=False)
```


```text
+----------+----------+-----------+----------+------------+---------------+
|product_id|units_sold|total_sales|cogs      |manufacturer|drug           |
+----------+----------+-----------+----------+------------+---------------+
|94        |132362    |2041758.4  |1373721.8 |Biogen      |UP and UP      |
|9         |37410     |293452.53  |208876.02 |Eli Lilly   |Zyprexa        |
|50        |90484     |2521023.8  |2742446.0 |Eli Lilly   |Dermasorb      |
|61        |77023     |500101.62  |419174.97 |Biogen      |Varicose Relief|
|136       |144814    |1084258.0  |1006447.75|Biogen      |Burkhart       |
+----------+----------+-----------+----------+------------+---------------+
```
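
The printed values differ from the input in their last digits (for example, 2041758.41 appears as 2041758.4) because `FloatType` stores single-precision floats. The following is a minimal plain-Python sketch of that effect, using the standard `struct` module to emulate a 32-bit round trip; `to_float32` is a helper name introduced here for illustration and is not part of PySpark.

```python
import struct


def to_float32(x):
    """Round-trip a Python float through a 32-bit IEEE 754 float (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


original = 2041758.41
stored = to_float32(original)

# At this magnitude a 32-bit float can only represent multiples of 0.125
print(stored)             # 2041758.375
print(stored - original)  # small negative error introduced by the narrower type
```

An error of a few cents does not change a result rounded to the nearest million, but for values of this size a `DoubleType` column would keep the cents intact.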



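```python
# Total sales per manufacturer, formatted as "$N million";
# sort by the unrounded total (descending), then by manufacturer name (ascending).
pharmacy_sales_df.groupBy('manufacturer')\
    .agg(concat(lit('$'), round(sum('total_sales')/1000000).cast('int'), lit(' million')).alias('sale'))\
    .orderBy(sum('total_sales'), 'manufacturer', ascending=[False, True]).show()
```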

```text
+------------+----------+
|manufacturer|      sale|
+------------+----------+
|      Biogen|$4 million|
|   Eli Lilly|$3 million|
+------------+----------+
```



```python
# Register the DataFrame as a temporary view so it can be queried with Spark SQL
pharmacy_sales_df.createOrReplaceTempView('pharmacy_sales')

spark.sql('''
SELECT
  manufacturer,
  CONCAT('$', CAST(ROUND(SUM(total_sales) / 1000000) AS INT), ' million') AS sale
FROM pharmacy_sales
GROUP BY manufacturer
ORDER BY SUM(total_sales) DESC, manufacturer ASC
''').show()
```

```text
+------------+----------+
|manufacturer|      sale|
+------------+----------+
|      Biogen|$4 million|
|   Eli Lilly|$3 million|
+------------+----------+
```
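
Both solutions above sort on the unrounded sum, so the alphabetical tie-breaker only applies when two raw totals are exactly equal. The question does not say whether "duplicates" refers to the raw totals or to the rounded labels. The sketch below uses hypothetical manufacturers and totals (not taken from the example input) to show how the two readings can produce different orderings.

```python
# Hypothetical totals (assumed for illustration): both round to $3 million
totals = {"Zeta Pharma": 3_400_000.0, "Alpha Labs": 3_200_000.0}


def millions(amount):
    """Round an amount to the nearest whole number of millions."""
    return round(amount / 1_000_000)


# Reading 1: rank by raw total, name breaks exact ties only
by_raw_total = sorted(totals, key=lambda m: (-totals[m], m))

# Reading 2: rank by rounded millions, name breaks ties between equal labels
by_rounded_total = sorted(totals, key=lambda m: (-millions(totals[m]), m))

print(by_raw_total)      # ['Zeta Pharma', 'Alpha Labs']
print(by_rounded_total)  # ['Alpha Labs', 'Zeta Pharma']
```

In SQL terms, the second reading would correspond to ordering by the rounded expression rather than by `SUM(total_sales)`. With the example input the two readings give the same result.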

