# Azure Cost Management Data Generation - PySpark Version

This notebook generates sample Azure cost management data using PySpark for distributed processing.

## Data Attributes
- **SubscriptionGuid**: Unique subscription identifier
- **ResourceGroup**: Azure resource group name
- **ResourceLocation**: Azure region where the resource is deployed (generated as 90% East US, 10% South Central US)
- **UsageDateTime**: Timestamp of usage
- **MeterCategory**: Category of the meter (Compute, Storage, Network, etc.)
- **MeterSubCategory**: Sub-category of the meter
- **MeterId**: Unique meter identifier
- **MeterName**: Human-readable meter name
- **MeterRegion**: Region where the meter applies
- **UsageQuantity**: Amount of resource consumed
- **ResourceRate**: Rate per unit of resource
- **PreTaxCost**: Cost before taxes
- **ConsumedService**: Service that consumed the resource
- **ResourceType**: Type of Azure resource
- **InstanceId**: Unique instance identifier
- **Tags**: Key-value pairs for resource tagging
- **OfferId**: Azure offer identifier
- **AdditionalInfo**: Additional metadata
- **ServiceInfo1**, **ServiceInfo2**: Service-specific information fields
- **ServiceName**: Name of the Azure service
- **ServiceTier**: Tier of the service (Basic, Standard, Premium)
- **Currency**: Currency code (USD only)
- **UnitOfMeasure**: Unit of measurement for the resource
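
To make the attribute list concrete, here is a minimal plain-Python sketch of a single generated record covering a subset of the fields. It is not the notebook's PySpark generator. The helper name `make_sample_record`, the value ranges, the start date, the tag keys, and the rule that `PreTaxCost` equals `UsageQuantity` times `ResourceRate` are all assumptions for illustration. Only the 90/10 location split, the three service tiers, the example meter categories, and the USD currency come from the list above.

```python
import json
import random
import uuid
from datetime import datetime, timedelta

# Locations, weights, tiers, and categories follow the attribute list above;
# every other value pool or range in this sketch is an assumption.
LOCATIONS = ["East US", "South Central US"]
LOCATION_WEIGHTS = [0.9, 0.1]
SERVICE_TIERS = ["Basic", "Standard", "Premium"]
METER_CATEGORIES = ["Compute", "Storage", "Network"]


def make_sample_record(rng, start=datetime(2024, 1, 1)):
    """Build one illustrative cost record as a plain dict (hypothetical helper)."""
    usage_quantity = round(rng.uniform(0.1, 100.0), 4)  # assumed range
    resource_rate = round(rng.uniform(0.01, 2.0), 4)    # assumed range
    location = rng.choices(LOCATIONS, weights=LOCATION_WEIGHTS, k=1)[0]
    return {
        # Seeded UUID so the sketch is reproducible for a given rng
        "SubscriptionGuid": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "ResourceLocation": location,
        # Assumed 30-day window of hourly timestamps starting at `start`
        "UsageDateTime": (start + timedelta(hours=rng.randrange(24 * 30))).isoformat(),
        "MeterCategory": rng.choice(METER_CATEGORIES),
        "UsageQuantity": usage_quantity,
        "ResourceRate": resource_rate,
        # Assumption: cost is simply quantity times rate
        "PreTaxCost": usage_quantity * resource_rate,
        "ServiceTier": rng.choice(SERVICE_TIERS),
        # Assumption: tags are stored as a JSON-encoded string
        "Tags": json.dumps({"environment": rng.choice(["dev", "prod"])}),
        "Currency": "USD",
    }


rng = random.Random(42)
records = [make_sample_record(rng) for _ in range(10_000)]
east_share = sum(r["ResourceLocation"] == "East US" for r in records) / len(records)
print(f"East US share: {east_share:.3f}")
```

With enough records, the printed share should land close to 0.9, which is a quick sanity check that the weighting behaves as described.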


In [None]:
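# Import required libraries
from pyspark.sql import SparkSession
# Note: the wildcard imports below shadow Python built-ins such as sum, min,
# max, and round with their Spark column-function equivalents.
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window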
import uuid
import json
from datetime import datetime, timedelta
import random
import numpy as np

# Initialize Spark session with adaptive query execution enabled
spark = (
    SparkSession.builder
    .appName("AzureCostDataGeneration")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

# Set log level to reduce verbosity
spark.sparkContext.setLogLevel("WARN")

print("PySpark session initialized successfully!")
print(f"Spark version: {spark.version}")
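# defaultParallelism is Spark's default partition count; it matches the core count only in some deployments
print(f"Default parallelism: {spark.sparkContext.defaultParallelism}")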
