# 05_Gold_Analytics_and_Insights

## Setup & Table References

In [0]:
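from pyspark.sql import SparkSession

# Reuse the session provided by the notebook runtime; create one if none is active.
spark = SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()

# Fully qualified gold table name (catalog.schema.table).
GOLD_TABLE = "cost_aware_capstone.risk_decisioning.gold_decision_recommendations"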

---
## Overall Business Impact Summary

In [0]:
spark.sql(f"""
SELECT
    COUNT(*) AS total_cases,
    SUM(decision) AS investigated_cases,
    ROUND(SUM(expected_savings_if_investigated * decision), 2) AS total_expected_savings
FROM {GOLD_TABLE}
""").show()

---
## Average Value per Investigation

In [0]:
spark.sql(f"""
SELECT
    decision,
    ROUND(AVG(expected_savings_if_investigated), 2) AS avg_expected_savings
FROM {GOLD_TABLE}
GROUP BY decision
""").show()

---
## Risk vs Business Value

In [0]:
spark.sql(f"""
SELECT
    ROUND(risk_probability, 2) AS risk_bucket,
    ROUND(AVG(expected_savings_if_investigated), 2) AS avg_savings,
    COUNT(*) AS cases
FROM {GOLD_TABLE}
GROUP BY ROUND(risk_probability, 2)
ORDER BY risk_bucket DESC
LIMIT 10
""").show()

---
## Capacity Sensitivity Analysis

In [0]:
spark.sql(f"""
SELECT
    COUNT(*) AS total_cases,
    SUM(CASE WHEN expected_savings_if_investigated > 0 THEN 1 ELSE 0 END)
        AS cases_with_positive_savings
FROM {GOLD_TABLE}
""").show()

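The query above counts how many cases have positive expected savings, which bounds how many investigations are worth making at all; it does not vary capacity itself. A capacity sweep could look roughly like the following plain-Python sketch. The `savings_at_capacity` helper and the `savings` values are hypothetical and not taken from the gold table; the sketch assumes cases are investigated in descending order of `expected_savings_if_investigated`. The printed dictionary maps each capacity to the savings captured, so diminishing returns show up as the totals flatten once capacity exceeds the number of positive-savings cases.

```python
from itertools import accumulate


def savings_at_capacity(expected_savings, capacities):
    """Map each capacity k to the total expected savings captured when
    the k highest-value cases are investigated first."""
    # Only cases with positive expected savings are worth investigating.
    ranked = sorted((s for s in expected_savings if s > 0), reverse=True)
    # cumulative[k] is the savings from investigating the top k cases.
    cumulative = [0.0] + list(accumulate(ranked))
    # Capacity beyond the number of worthwhile cases adds nothing.
    return {k: cumulative[min(k, len(ranked))] for k in capacities}


# Hypothetical per-case expected savings (illustrative values only).
savings = [500.0, -20.0, 120.0, 40.0, 0.0, 300.0]
result = savings_at_capacity(savings, capacities=[1, 2, 3, 10])
print(result)
```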
---
## Key Insights

* Most cases are low risk, but a small subset drives disproportionate financial loss

* Investigation capacity is the primary bottleneck, not prediction accuracy

* Cost-aware optimization ensures limited resources are spent where financial impact is highest
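
The last insight can be illustrated with a toy comparison of two ranking rules under the same capacity. This is a minimal sketch, not the notebook's decision logic: the `total_savings` helper and the four cases are hypothetical, and `expected_savings` stands in for `expected_savings_if_investigated`. With a capacity of two, ranking by risk alone selects the two riskiest cases, while ranking by expected savings selects the two with the largest financial impact.

```python
def total_savings(cases, capacity, key):
    """Investigate the top `capacity` cases ranked by `key` and return
    the summed expected savings of the selected cases."""
    chosen = sorted(cases, key=key, reverse=True)[:capacity]
    return sum(c["expected_savings"] for c in chosen)


# Hypothetical cases: high risk does not always mean high savings.
cases = [
    {"risk": 0.95, "expected_savings": 50.0},
    {"risk": 0.90, "expected_savings": 30.0},
    {"risk": 0.60, "expected_savings": 400.0},
    {"risk": 0.40, "expected_savings": 250.0},
]

# Same capacity, two different ranking rules.
by_risk = total_savings(cases, capacity=2, key=lambda c: c["risk"])
by_value = total_savings(cases, capacity=2, key=lambda c: c["expected_savings"])
print(by_risk, by_value)
```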

---
## Final Takeaway
This system demonstrates how machine learning predictions can be transformed into optimal, explainable business decisions under real-world constraints.