#### Notebook setup

Run the next cell once to initialize the notebook parameters.

In [0]:
dbutils.widgets.text("target_catalog", "my-catalog", "UC Catalog")
dbutils.widgets.text("target_schema", "my-schema", "UC Schema")
dbutils.widgets.text("target_volume", "my-volume", "UC Volume")
dbutils.widgets.text("target_cluster", "my-cluster", "Cluster ID")

#### Verify running processes

Run the next cell to verify that the integration process is running and, if enabled, that the New Relic Infrastructure agent and Fluent Bit processes are running.

The result should be something like the following.

```text
Integration process running .............. YES
Infrastructure processes running ......... YES
Fluent Bit process running ............... YES
```

If the integration process is not running, enable the [startup logs](https://github.com/newrelic/newrelic-databricks-integration/blob/main/docs/troubleshooting.md#enabling-startup-logs-when-deployed-to-a-cluster) and inspect the startup logs for errors. If the startup logs are missing or contain no errors, [collect the cluster init script logs](https://github.com/newrelic/newrelic-databricks-integration/blob/main/docs/troubleshooting.md#collecting-cluster-init-script-logs) and inspect the logs for errors.

If the New Relic Infrastructure agent is enabled but not running, or if infrastructure logs are enabled but the Fluent Bit process is not running, [collect the cluster init script logs](https://github.com/newrelic/newrelic-databricks-integration/blob/main/docs/troubleshooting.md#collecting-cluster-init-script-logs) and inspect the logs for errors.

In [0]:
%sh

if [ "$(ps -ef | grep '/databricks/driver/newrelic/newrelic-databricks-integration' | grep -v grep | wc -l)" -eq 1 ]; then
  echo "Integration process running .............. YES"
else
  echo "Integration process running .............. NO"
fi

if [ "$(ps -ef | grep '/usr/bin/newrelic-infra' | grep -v grep | wc -l)" -eq 2 ]; then
  echo "Infrastructure processes running ......... YES"
else
  echo "Infrastructure processes running ......... NO"
fi

if [ "$(ps -ef | grep '/opt/fluent-bit/bin/fluent-bit' | grep -v grep | wc -l)" -eq 1 ]; then
  echo "Fluent Bit process running ............... YES"
else
  echo "Fluent Bit process running ............... NO"
fi
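
The same check can also be sketched in Python, which makes it possible to see the actual process counts rather than only YES or NO. This is a minimal sketch, not part of the integration: the helper names `count_processes`, `check`, and `current_ps_output` are hypothetical, and the matching rule (substring match, ignoring any line that mentions `grep`) is an assumption that mirrors the shell pipeline above.

```python
import subprocess


def count_processes(ps_output: str, command_path: str) -> int:
    """Count `ps -ef` lines that mention command_path, ignoring grep itself."""
    return sum(
        1
        for line in ps_output.splitlines()
        if command_path in line and "grep" not in line
    )


def check(ps_output: str, command_path: str, expected: int) -> str:
    """Return 'YES' when exactly `expected` matching processes are listed."""
    return "YES" if count_processes(ps_output, command_path) == expected else "NO"


def current_ps_output() -> str:
    """Capture the output of `ps -ef` on the node running this code."""
    return subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
```

In a Python cell, `count_processes(current_ps_output(), "/usr/bin/newrelic-infra")` would return the number of matching processes, which can help narrow down why the shell cell printed NO.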

#### Verify connectivity

Run the next cell to verify that your cluster can access New Relic. To verify that this test is successful, first confirm that the last line of the output is the following:

```json
{"success":true, "uuid":"[RANDOM-UUID]"}
```

Second, use the [query builder](https://one.newrelic.com/data-exploration/query-builder) to run the following NRQL:

```sql
SELECT * FROM DatabricksTest
```

The result should be the following.

| Timestamp      | Message                  |
| -------------- | ------------------------ |
| [DATE], [TIME] | Hello, Databricks world! |

If the test is unsuccessful, work with your Databricks administrator to determine if proxy, firewall, or other network issues may be blocking access to New Relic. For details on New Relic endpoints, ports, IP ranges, and other networking requirements, please consult the [New Relic network traffic documentation](https://docs.newrelic.com/docs/new-relic-solutions/get-started/networks/).

In [0]:
%sh

echo '[{"eventType": "DatabricksTest", "message": "Hello, Databricks world!"}]' | \
  gzip | \
  curl -v \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "Api-Key: $NEW_RELIC_LICENSE_KEY" \
    -H "Content-Encoding: gzip" \
    https://insights-collector.newrelic.com/v1/accounts/$NEW_RELIC_ACCOUNT_ID/events \
    --data-binary \
    @-
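
For environments where `curl` is not available, the request above can be approximated with the Python standard library. The sketch below only builds the request and interprets a response body; it is an illustration under assumptions, not the integration's own code. The function names `build_test_event_request` and `is_success` are hypothetical, and the URL, headers, and payload are copied from the shell cell.

```python
import gzip
import json
import urllib.request

# Same endpoint as the curl command above, with the account ID as a placeholder.
EVENT_API_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"


def build_test_event_request(account_id: str, license_key: str) -> urllib.request.Request:
    """Build a gzip-compressed POST carrying one DatabricksTest event."""
    events = [{"eventType": "DatabricksTest", "message": "Hello, Databricks world!"}]
    body = gzip.compress(json.dumps(events).encode("utf-8"))
    return urllib.request.Request(
        EVENT_API_URL.format(account_id=account_id),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Api-Key": license_key,
            "Content-Encoding": "gzip",
        },
        method="POST",
    )


def is_success(response_body: str) -> bool:
    """Return True if the last non-empty line is JSON with "success": true."""
    lines = [line for line in response_body.splitlines() if line.strip()]
    if not lines:
        return False
    try:
        return json.loads(lines[-1]).get("success") is True
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object (for example a proxy error page).
        return False
```

To send the request, pass it to `urllib.request.urlopen` with a timeout, reading the account ID and license key from the `NEW_RELIC_ACCOUNT_ID` and `NEW_RELIC_LICENSE_KEY` environment variables used by the shell cell, then pass the decoded response body to `is_success`.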

#### Copy Databricks Integration startup logs to a volume

Ensure that the [DBFS root](https://docs.databricks.com/aws/en/dbfs/#what-is-the-dbfs-root) is enabled for your account and workspace, and that the Databricks Integration is configured to [copy startup logs](https://github.com/newrelic/newrelic-databricks-integration/blob/main/docs/troubleshooting.md#copying-startup-logs-to-dbfs). Then, use the notebook widget parameters to select the catalog, schema, and volume where the startup logs should be copied, and run the next cell to copy the startup logs to the selected volume.

If the files are copied successfully, the message "Files copied!" is displayed in the cell output. In this case, navigate to the volume that you copied the startup logs to and use the appropriate controls to view or download the log files.

If the files could not be copied, an error message will be displayed in the cell output. In this case, try [sending your startup logs to New Relic Logs](https://github.com/newrelic/newrelic-databricks-integration/blob/main/docs/troubleshooting.md#sending-startup-logs-to-new-relic-logs) instead.

**NOTE:** Copying startup logs requires that access to the [DBFS root](https://docs.databricks.com/aws/en/dbfs/#what-is-the-dbfs-root) is enabled for your account and workspace. If access to the DBFS root is not enabled, startup logs will not be copied to DBFS and running the cell below will fail.

In [0]:
catalog = dbutils.widgets.get("target_catalog")
schema = dbutils.widgets.get("target_schema")
volume = dbutils.widgets.get("target_volume")

dbutils.fs.cp("dbfs:/tmp/newrelic-databricks-integration.log", f"/Volumes/{catalog}/{schema}/{volume}")
dbutils.fs.cp("dbfs:/tmp/newrelic-databricks-integration-startup-checks.log", f"/Volumes/{catalog}/{schema}/{volume}")
dbutils.fs.cp("dbfs:/tmp/newrelic-databricks-start-integration.sh.stdout.log", f"/Volumes/{catalog}/{schema}/{volume}")
dbutils.fs.cp("dbfs:/tmp/newrelic-databricks-start-integration.sh.stderr.log", f"/Volumes/{catalog}/{schema}/{volume}")

print("Files copied!")
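
The cell above stops at the first file that cannot be copied. As a possible variation, the sketch below attempts every file and collects the failures, so a single missing log does not hide the others. The function name `copy_startup_logs` and its `cp` parameter are hypothetical; in a notebook, `cp` is assumed to be `dbutils.fs.cp`, and the file names and paths are the ones used in the cell above.

```python
from typing import Callable, List, Sequence, Tuple

# Startup log files copied by the cell above.
LOG_FILES = [
    "newrelic-databricks-integration.log",
    "newrelic-databricks-integration-startup-checks.log",
    "newrelic-databricks-start-integration.sh.stdout.log",
    "newrelic-databricks-start-integration.sh.stderr.log",
]


def copy_startup_logs(
    cp: Callable[[str, str], object],
    catalog: str,
    schema: str,
    volume: str,
    names: Sequence[str] = LOG_FILES,
) -> List[Tuple[str, str]]:
    """Try to copy each log to the volume; return (file name, error) per failure."""
    dest = f"/Volumes/{catalog}/{schema}/{volume}"
    failures = []
    for name in names:
        try:
            cp(f"dbfs:/tmp/{name}", dest)
        except Exception as exc:  # report the failure and keep going
            failures.append((name, str(exc)))
    return failures
```

In the notebook this might be called as `copy_startup_logs(dbutils.fs.cp, catalog, schema, volume)`; an empty list would mean that every file was copied.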

#### Find the latest init script directories

Ensure that [compute log delivery](https://docs.databricks.com/aws/en/compute/configure#compute-log-delivery) is configured to deliver cluster logs to a _Unity Catalog volume_ and restart the cluster if necessary. Then run the next cell to locate the latest set of init script directories. Use the links generated in the cell output to navigate directly to the init script folders for the driver and worker nodes.

**NOTE:** The init script directories are listed in descending order based on the modification time of the init script logs in each directory. There should be multiple links in the output. The init script log output for each startup attempt consists of one directory for the driver node and a separate, per-node directory for every worker in the cluster.

In [0]:
from datetime import datetime
import time

catalog = dbutils.widgets.get("target_catalog")
schema = dbutils.widgets.get("target_schema")
volume = dbutils.widgets.get("target_volume")
cluster_id = dbutils.widgets.get("target_cluster")
hours_ago = 1

min_time = (int(time.time()) - (3600 * hours_ago)) * 1000
dir = f"/Volumes/{catalog}/{schema}/{volume}/{cluster_id}/init_scripts"
dirs = {}
 
for f in dbutils.fs.ls(dir):
    for f2 in dbutils.fs.ls(f"{dir}/{f.name}"):
        # Keep the newest modification time seen in each directory.
        dirs[f.name] = max(dirs.get(f.name, 0), f2.modificationTime)

displayHTML(f"<h2>Latest Init Script Directories (since {hours_ago} hours ago)</h2>")

for d in sorted(dirs.keys(), key=lambda x: dirs[x], reverse=True):
    if dirs[d] > min_time:
        # Attach the local time zone so that %Z prints a zone name.
        dt = datetime.fromtimestamp(dirs[d] / 1000.0).astimezone()
        displayHTML(
            f"<h3>{dt.strftime('%Y-%m-%d %H:%M:%S %Z')}: <a href='/explore/data/volumes/{catalog}/{schema}/{volume}?volumePath={dir}/{d}'>{d}</a></h3>"
        )
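
The filtering and ordering that the cell applies can be isolated into small pure functions, which makes the cutoff logic easier to check outside a cluster. This is a minimal sketch under assumptions: the function names `newest_mtime` and `latest_directories` are hypothetical, and modification times are taken to be epoch milliseconds, as in the cell above.

```python
from typing import Dict, Iterable, List, Tuple


def newest_mtime(file_mtimes: Iterable[int]) -> int:
    """Return the most recent modification time, or 0 for an empty directory."""
    return max(file_mtimes, default=0)


def latest_directories(
    mtimes_by_dir: Dict[str, int], now_ms: int, hours_ago: int = 1
) -> List[Tuple[str, int]]:
    """Return (directory, mtime) pairs newer than the cutoff, newest first."""
    cutoff_ms = now_ms - hours_ago * 3600 * 1000
    recent = [
        (name, mtime) for name, mtime in mtimes_by_dir.items() if mtime > cutoff_ms
    ]
    return sorted(recent, key=lambda item: item[1], reverse=True)
```

Increasing `hours_ago` widens the window, which may help when the cluster was last restarted more than an hour ago and the cell output lists no directories.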