# Gold Marts: Enron Communication Analytics

## Purpose
Email communication patterns and text analytics

## Source
- Gold: `fact_enron_emails`

## Mart Tables
1. **mart_email_patterns** - Communication volume and patterns

**Author:** Kevin  
**Date:** Feb 9, 2026


In [0]:
from pyspark.sql.functions import col, count, avg, sum, current_timestamp, round as spark_round

storage_account_name = "stgolistmigration"
account_key = ""

spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    account_key
)

def get_gold_path(table):
    return f"abfss://gold@{storage_account_name}.dfs.core.windows.net/{table}/"

print("✅ Config loaded")


✅ Config loaded


In [0]:
print("📖 Loading Gold tables...")

fact_emails = spark.read.format("delta").load(get_gold_path("fact_enron_emails"))
print(f"✅ fact_enron_emails: {fact_emails.count():,}")


📖 Loading Gold tables...
✅ fact_enron_emails: 51,522


In [0]:
print("📊 Building mart_email_patterns...")

mart_email = fact_emails \
    .groupBy("message_category") \
    .agg(
        count("*").alias("email_count"),
        avg("message_length").alias("avg_message_length"),
        sum("message_length").alias("total_characters"),
        count(col("is_long_email")).alias("long_email_count")
    ) \
    .withColumn("avg_message_length", spark_round(col("avg_message_length"), 0)) \
    .withColumn("pct_of_total", 
        spark_round((col("email_count") / fact_emails.count()) * 100, 2)
    ) \
    .withColumn("mart_created_at", current_timestamp()) \
    .orderBy(col("email_count").desc())

print(f"✅ Created: {mart_email.count()} email pattern categories")
mart_email.show(truncate=False)

# Write
mart_email.write.format("delta").mode("overwrite").save(get_gold_path("mart_email_patterns"))
print("💾 Saved to: mart_email_patterns")

print("\n🎉 Enron Communication Analytics Marts Complete!")


📊 Building mart_email_patterns...
✅ Created: 4 email pattern categories
+----------------+-----------+------------------+----------------+----------------+------------+--------------------------+
|message_category|email_count|avg_message_length|total_characters|long_email_count|pct_of_total|mart_created_at           |
+----------------+-----------+------------------+----------------+----------------+------------+--------------------------+
|Very Long       |18173      |5441.0            |98884471        |18173           |35.27       |2026-02-09 13:35:52.751862|
|Long            |17314      |1444.0            |25005410        |17314           |33.61       |2026-02-09 13:35:52.751862|
|Medium          |14286      |738.0             |10549265        |14286           |27.73       |2026-02-09 13:35:52.751862|
|Short           |1749       |470.0             |821702          |1749            |3.39        |2026-02-09 13:35:52.751862|
+----------------+-----------+------------------+-----------