# Pharmacy Analytics (Part 2)

## CVS Health SQL Interview Question

### Question  
CVS Health is analyzing its pharmacy sales data to understand how well different products are selling in the market. Each drug is manufactured exclusively by a single manufacturer.

Write a query to identify the manufacturers associated with the drugs that resulted in **losses** for CVS Health and calculate the total amount of losses incurred.

Output the manufacturer's name, the **number of drugs** associated with losses, and the **total losses in absolute value**.  
Sort the results in descending order of total losses, so that the highest losses appear at the top.

If you like this question, try out **Pharmacy Analytics (Part 3)!**

---

### `pharmacy_sales` Table:

| Column Name    | Type      |
|----------------|-----------|
| product_id     | integer   |
| units_sold     | integer   |
| total_sales    | decimal   |
| cogs           | decimal   |
| manufacturer   | varchar   |
| drug           | varchar   |

---

### Example Input:

| product_id | units_sold | total_sales | cogs       | manufacturer | drug                          |
|------------|------------|-------------|------------|--------------|-------------------------------|
| 156        | 89514      | 3130097.00  | 3427421.73 | Biogen       | Acyclovir                     |
| 25         | 222331     | 2753546.00  | 2974975.36 | AbbVie       | Lamivudine and Zidovudine     |
| 50         | 90484      | 2521023.73  | 2742445.90 | Eli Lilly    | Dermasorb TA Complete Kit     |
| 98         | 110746     | 813188.82   | 140422.87  | Biogen       | Medi-Chord                    |

---

### Example Output:

| manufacturer | drug_count | total_loss |
|--------------|------------|------------|
| Biogen       | 1          | 297324.73  |
| AbbVie       | 1          | 221429.36  |
| Eli Lilly    | 1          | 221422.17  |

---

### Explanation:

In the first three rows, `cogs` exceeds `total_sales`, so each of these drugs was sold at a loss of `cogs - total_sales`. Among these, **Biogen** had the highest losses, followed by **AbbVie** and **Eli Lilly**.  
The **Medi-Chord** drug, also manufactured by Biogen, reported a **profit** and is therefore excluded from the result.
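
The arithmetic behind the Example Output can be checked without Spark. The following is a minimal sketch in plain Python, not part of the original notebook: the helper name `losses_by_manufacturer` and the tuple layout of `rows` are illustrative assumptions, and `decimal.Decimal` is used so that the amounts match the `decimal` column type exactly.

```python
from collections import defaultdict
from decimal import Decimal

# Example Input rows as (total_sales, cogs, manufacturer, drug).
# Amounts are kept as strings so that Decimal parses them exactly.
rows = [
    ("3130097.00", "3427421.73", "Biogen", "Acyclovir"),
    ("2753546.00", "2974975.36", "AbbVie", "Lamivudine and Zidovudine"),
    ("2521023.73", "2742445.90", "Eli Lilly", "Dermasorb TA Complete Kit"),
    ("813188.82", "140422.87", "Biogen", "Medi-Chord"),
]


def losses_by_manufacturer(rows):
    """Return (manufacturer, drug_count, total_loss), highest loss first."""
    drug_count = defaultdict(int)
    total_loss = defaultdict(Decimal)  # Decimal() is Decimal('0')
    for total_sales, cogs, manufacturer, _drug in rows:
        loss = Decimal(cogs) - Decimal(total_sales)
        if loss > 0:  # profitable drugs (loss <= 0) are skipped
            drug_count[manufacturer] += 1
            total_loss[manufacturer] += loss
    return sorted(
        ((m, drug_count[m], total_loss[m]) for m in total_loss),
        key=lambda r: r[2],
        reverse=True,
    )


for row in losses_by_manufacturer(rows):
    print(row)
```

Filtering before aggregating is what keeps Medi-Chord's profit from offsetting Biogen's loss on Acyclovir.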


In [5]:
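from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType
# Note: importing sum from pyspark.sql.functions shadows Python's built-in sum
from pyspark.sql.functions import col, count, sum

# Create Spark session
spark = SparkSession.builder.master('local[1]').appName("PharmacyAnalyticsPart2").getOrCreate()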

# Define schema
schema = StructType([
    StructField("product_id", IntegerType(), True),
    StructField("units_sold", IntegerType(), True),
    StructField("total_sales", FloatType(), True),
    StructField("cogs", FloatType(), True),
    StructField("manufacturer", StringType(), True),
    StructField("drug", StringType(), True),
])

# Sample data
data = [
    (156, 89514, 3130097.00, 3427421.73, "Biogen", "Acyclovir"),
    (25, 222331, 2753546.00, 2974975.36, "AbbVie", "Lamivudine and Zidovudine"),
    (50, 90484, 2521023.73, 2742445.90, "Eli Lilly", "Dermasorb TA Complete Kit"),
    (98, 110746, 813188.82, 140422.87, "Biogen", "Medi-Chord"),
]

# Create DataFrame
pharmacy_sales_df = spark.createDataFrame(data, schema)

# Show DataFrame
pharmacy_sales_df.show(truncate=False)


+----------+----------+-----------+---------+------------+-------------------------+
|product_id|units_sold|total_sales|cogs     |manufacturer|drug                     |
+----------+----------+-----------+---------+------------+-------------------------+
|156       |89514     |3130097.0  |3427421.8|Biogen      |Acyclovir                |
|25        |222331    |2753546.0  |2974975.2|AbbVie      |Lamivudine and Zidovudine|
|50        |90484     |2521023.8  |2742446.0|Eli Lilly   |Dermasorb TA Complete Kit|
|98        |110746    |813188.8   |140422.88|Biogen      |Medi-Chord               |
+----------+----------+-----------+---------+------------+-------------------------+



In [29]:
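# Keep only loss-making rows, then count drugs and sum losses per manufacturer
pharmacy_sales_df.where('total_sales < cogs')\
    .groupBy('manufacturer')\
    .agg(count('product_id').alias('drug_count'),
         sum(col('cogs') - col('total_sales')).alias('total_loss'))\
    .orderBy('total_loss', ascending=False).show()

+------------+----------+----------+
|manufacturer|drug_count|total_loss|
+------------+----------+----------+
|      Biogen|         1| 297324.75|
|      AbbVie|         1| 221429.25|
|   Eli Lilly|         1| 221422.25|
+------------+----------+----------+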



In [3]:
pharmacy_sales_df.createOrReplaceTempView('pharmacy_sales')

spark.sql('''
SELECT manufacturer,
       COUNT(product_id) AS drug_count,
       SUM(cogs - total_sales) AS total_loss
FROM pharmacy_sales
WHERE total_sales < cogs
GROUP BY manufacturer
ORDER BY total_loss DESC
''').show()

+------------+----------+----------+
|manufacturer|drug_count|total_loss|
+------------+----------+----------+
|      Biogen|         1| 297324.75|
|      AbbVie|         1| 221429.25|
|   Eli Lilly|         1| 221422.25|
+------------+----------+----------+
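
The totals above differ from the Example Output by a few cents (for example, 297324.75 instead of 297324.73). The likely cause is that the schema declares `total_sales` and `cogs` as `FloatType()`, which is a 32-bit float and can only represent amounts of this size in steps of 0.25. The sketch below reproduces that rounding in plain Python. The helper `to_float32` is a hypothetical name introduced here, not part of PySpark.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Biogen / Acyclovir amounts from the Example Input
cogs = to_float32(3427421.73)   # stored as 3427421.75
sales = to_float32(3130097.00)  # exactly representable

print(cogs)          # 3427421.75
print(cogs - sales)  # 297324.75, matching the Spark output above
```

Declaring the two columns as `DoubleType()`, or as `DecimalType` with `decimal.Decimal` input values, should bring the totals in line with the expected 297324.73, 221429.36, and 221422.17.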

