Azure Databricks is based on Apache Spark, and both use log4j as the standard logging library. In addition to the default logging that Apache Spark provides, this pattern sends logs and metrics to Azure Log Analytics by deploying custom handlers for logging events. Apache Spark logger messages are plain strings, but Azure Log Analytics requires log messages to be formatted as JSON; the com.microsoft.pnp.log4j.LogAnalyticsAppender class performs this transformation.
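Conceptually, the transformation from a plain string message to a JSON document looks something like the following Python sketch. The field names here are illustrative assumptions, not the appender's actual schema, which is defined by the LogAnalyticsAppender class itself:

```python
import json
import time

def to_log_analytics_json(level, logger_name, message):
    """Hedged sketch: wrap a plain log4j-style string message in a JSON
    envelope, roughly as the LogAnalyticsAppender does. The field names
    are illustrative, not the appender's real schema."""
    return json.dumps({
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "level": level,
        "logger": logger_name,
        "message": message,
    })

payload = to_log_analytics_json(
    "INFO", "org.apache.spark.SparkContext", "Job finished")
```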
Reference architecture: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks
You need the Log Analytics workspace ID and primary key. The workspace ID is the workspaceId value shown on the Log Analytics resource in Azure. The primary key is the shared secret used to authenticate with the service.
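These two values are ultimately used to authenticate requests to the Log Analytics HTTP Data Collector API. As an illustration of why both are needed (the appender handles this for you; this is not its actual code), the API's Authorization header is an HMAC-SHA256 signature keyed by the primary key and scoped to the workspace ID:

```python
import base64
import hashlib
import hmac

def build_auth_header(workspace_id, primary_key, content_length, date_rfc1123):
    """Sketch of the Log Analytics HTTP Data Collector API signature.
    The primary key is base64-encoded; the signature covers the request
    method, length, content type, date, and resource path."""
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{date_rfc1123}\n/api/logs"
    )
    decoded_key = base64.b64decode(primary_key)
    digest = hmac.new(decoded_key, string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"

# Dummy values for illustration only -- not a real workspace or key
header = build_auth_header(
    "00000000-0000-0000-0000-000000000000",
    base64.b64encode(b"not-a-real-key").decode(),
    128, "Mon, 01 Jan 2024 00:00:00 GMT")
```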
To configure log4j logging, open log4j.properties. Edit the following two values and save the file. You will use this file later.
log4j.appender.A1.workspaceId=[Log Analytics workspace ID]
log4j.appender.A1.secret=[Log Analytics primary key]
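For context, these two lines sit alongside the declaration of the A1 appender in log4j.properties. A minimal sketch of the relevant fragment follows; the declaration line is inferred from the appender class named above, and only the workspaceId and secret values need editing:

```properties
# A1 is the Log Analytics appender (class name from this guide)
log4j.appender.A1=com.microsoft.pnp.log4j.LogAnalyticsAppender
log4j.appender.A1.workspaceId=[Log Analytics workspace ID]
log4j.appender.A1.secret=[Log Analytics primary key]
```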
To configure custom logging, open metrics.properties. Edit the following two values and save the file. You will use this file later.
*.sink.loganalytics.workspaceId=[Log Analytics workspace ID]
*.sink.loganalytics.secret=[Log Analytics primary key]
Copy the monitoring library JAR to the Databricks file system by entering the following command:
databricks fs cp --overwrite azure-databricks-monitoring-0.9.jar dbfs:/azure-databricks-job/azure-databricks-monitoring-0.9.jar
Copy the custom logging properties from metrics.properties to the Databricks file system by entering the following command:
databricks fs cp --overwrite metrics.properties dbfs:/azure-databricks-job/metrics.properties
Copy the initialization script spark-metrics.sh to the Databricks file system by entering the following command:
databricks fs cp --overwrite spark-metrics.sh dbfs:/databricks/init/[cluster-name]/spark-metrics.sh