# Integrating MindBridge with Databricks: A Step-by-Step Guide
This tutorial guides you through integrating MindBridge with Databricks. We will walk through how to set up an Organization and Engagement using the MindBridge SDK, how to perform a General Ledger Analysis on data from Databricks, and how to extract key information from your Analysis Results. Here's what we'll cover:

1. **Installing the MindBridge SDK in Databricks**  
   We'll start by installing the MindBridge SDK in your Databricks environment, enabling access to MindBridge's API features within your notebooks.

2. **Storing your MindBridge API Token in Databricks**  
   Learn to securely save your MindBridge API token in a Databricks-backed secret scope. This step ensures that sensitive credentials are safely managed within Databricks.

3. **Loading your MindBridge API Token in Databricks**  
   With the token stored, we'll show you how to load and use it to configure your API connection.

4. **Setting Up an Organization and Engagement**  
   Here, you'll learn how to configure an Organization and Engagement for your Analysis.

5. **Uploading Files to the File Manager**  
   Learn how to upload files into the File Manager created for your Engagement.

6. **Creating and Running Analyses**  
   In this section, you'll learn how to create a new analysis, link the necessary data from the File Manager, and execute the analysis.

7. **Getting the Analysis Results**  
   Finally, we'll retrieve and display the results of your analysis, providing insights into your data that you can further leverage within Databricks.

## Installing the MindBridge SDK in Databricks
Run the following commands to install the [mindbridge-api-python-client](https://pypi.org/project/mindbridge-api-python-client/) on your currently selected cluster. Version 1.5.1 or newer is required for the steps in this tutorial.

In [0]:
%pip install --upgrade mindbridge-api-python-client "pydantic<2.12"
dbutils.library.restartPython()
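
After the restart, it can be useful to confirm that the installed client meets the 1.5.1 minimum mentioned above. The following is a minimal sketch using only the standard library; `parse_release` and `meets_minimum` are hypothetical helper names, and the comparison is deliberately simplified (it reads only the leading numeric parts, so a pre-release suffix such as `rc1` is ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple[int, ...]:
    """Turn '1.5.1' into (1, 5, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.5.1") -> bool:
    """Return True when the installed release is at least the minimum."""
    return parse_release(installed) >= parse_release(minimum)


try:
    installed = version("mindbridge-api-python-client")
    print(f"Installed: {installed}, new enough: {meets_minimum(installed)}")
except PackageNotFoundError:
    print("mindbridge-api-python-client is not installed on this cluster")
```

For stricter comparisons, a dedicated version-parsing library would be a better fit than this tuple-based check.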

In [0]:
from datetime import date

analysis_url = "https://psus.mindbridge.ai/app/organization/68f7dc86efccc75265245b0e/engagement/68f7dc87efccc75265245b10/analysis/68f7dc9eefccc75265245bf4/analyze/financial-statements?productCode=GENERAL_LEDGER#facetBarState=JTdCJTIydmlzaWJsZV9mYWNldF9pZHMlMjIlM0ElNUIlNUQlMkMlMjJmYWNldF9zdGF0ZXMlMjIlM0ElNUIlNUQlN0Q="
data_table_query = {
    "effective_date": {"$gte": date(2022, 5, 1), "$lt": date(2022, 6, 1)},
    "risk": {"$gte": 3_000},
}

assigned_user_email = "kevin.paulson@mindbridge.ai"
data_table_logical_name = "gl_journal_lines"
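
The `data_table_query` above uses `$gte` and `$lt` operators. To make the intended filter concrete, here is a small local sketch that evaluates the same query shape against plain dictionaries. It assumes the operators carry their usual MongoDB-style meaning ("greater than or equal" and "less than"); how the MindBridge API actually evaluates the query, and the scale of the `risk` values, is not confirmed here. The `matches` function and the sample rows are hypothetical.

```python
from datetime import date


def matches(row: dict, query: dict) -> bool:
    """Check a row against a query of the form {field: {operator: bound}}."""
    comparisons = {
        "$gte": lambda value, bound: value >= bound,
        "$lt": lambda value, bound: value < bound,
    }
    return all(
        comparisons[operator](row[field], bound)
        for field, conditions in query.items()
        for operator, bound in conditions.items()
    )


example_query = {
    "effective_date": {"$gte": date(2022, 5, 1), "$lt": date(2022, 6, 1)},
    "risk": {"$gte": 3_000},
}

# Made-up rows: one inside May 2022, one on the excluded upper bound
in_may_high_risk = {"effective_date": date(2022, 5, 15), "risk": 4_500}
in_june = {"effective_date": date(2022, 6, 1), "risk": 4_500}

print(matches(in_may_high_risk, example_query))
print(matches(in_june, example_query))
```

Under these assumptions, the query selects entries dated in May 2022 (the June 1 bound is exclusive) whose `risk` is at least 3,000.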

## Storing Your MindBridge API Token in Databricks
If you do not already have an API Token, follow [Create an API token](https://support.mindbridge.ai/hc/en-us/articles/9349943782039-Create-an-API-token) on our knowledge base. Once that is done, we'll [Create a Databricks-backed secret scope](https://learn.microsoft.com/en-us/azure/databricks/security/secrets/#create-a-databricks-backed-secret-scope) using the Databricks CLI to securely store your token. If you haven't set up the Databricks CLI yet, follow [Install or update the Databricks CLI](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/install).

After installing and configuring the CLI, use the following commands to create a new secret scope and add your API Token to the scope:

```sh
# Create the secret scope named "mindbridge-api-tutorials"
databricks secrets create-scope mindbridge-api-tutorials

# Add your MindBridge API Token to the scope
databricks secrets put-secret mindbridge-api-tutorials MINDBRIDGE_API_TOKEN
```

Once your token is added, you can verify that it exists by running the following:
```sh
databricks secrets list-scopes
databricks secrets list-secrets mindbridge-api-tutorials
```

In [0]:
from urllib3.util import parse_url

parsed_url = parse_url(analysis_url)
mindbridge_url = parsed_url.host

parsed_url_path = parsed_url.path.split("/")
analysis_result_id = parsed_url_path[parsed_url_path.index("analysis") + 1]
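
The cell above assumes the URL always contains an `analysis` path segment followed by an id. As a hedged alternative, the sketch below does the same extraction with the standard library's `urllib.parse` and raises a descriptive error when the URL does not have the expected shape. `split_analysis_url` is a hypothetical helper, and the URL in the example uses placeholder ids.

```python
from urllib.parse import urlsplit


def split_analysis_url(url: str) -> tuple[str, str]:
    """Return (host, id following the 'analysis' path segment)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if not parts.hostname or "analysis" not in segments:
        raise ValueError(f"Not an analysis URL: {url}")
    position = segments.index("analysis") + 1
    if position >= len(segments) or not segments[position]:
        raise ValueError(f"No id after 'analysis' in: {url}")
    return parts.hostname, segments[position]


# Hypothetical URL with placeholder ids, for illustration only
host, result_id = split_analysis_url(
    "https://example.mindbridge.ai/app/organization/org1/engagement/eng1"
    "/analysis/res1/analyze/financial-statements?productCode=GENERAL_LEDGER"
)
print(host, result_id)
```

Failing early with a clear message is easier to debug than the bare `ValueError` that `list.index` raises when the segment is missing.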

## Loading your MindBridge API Token in Databricks

In this section, we'll load the MindBridge API token securely stored in Databricks and configure the API connection. The `url` argument must be the URL of your MindBridge instance; here it comes from `mindbridge_url`, which was parsed from `analysis_url` earlier. Upon execution, you should see the data tables available for the analysis.

In [0]:
import mindbridgeapi as mbapi

# Load your token from the secret scope
token = dbutils.secrets.get(
    scope="mindbridge-api-tutorials", key="MINDBRIDGE_API_TOKEN"
)

# Create a connection to the server
server = mbapi.Server(url=mindbridge_url, token=token)

# Get the analysis
analysis_result = server.analysis_results.get_by_id(analysis_result_id)
analysis = server.analyses.get_by_id(analysis_result.analysis_id)

print("Available data tables:")
for x in analysis.data_tables:
    print(f"- logical_name: {x.logical_name} (type: {x.type}, id: {x.id})")

server.analyses.restart_data_tables(analysis)
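data_table = next(
    (x for x in analysis.data_tables if x.logical_name == data_table_logical_name),
    None,
)
if data_table is None:
    raise ValueError(f"Data table {data_table_logical_name!r} not found")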
print(
    f"Using logical_name: {data_table.logical_name} (type: {data_table.type}, id: {data_table.id})"
)

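# Collect the matches first: unpacking an empty result directly would fail
# with an unhelpful "not enough values to unpack" error
users = list(server.users.get(json={"email": assigned_user_email}))
if not users:
    raise ValueError(f"User with email {assigned_user_email} not found")

if len(users) > 1:
    raise RuntimeError(
        f"Unexpected error: multiple users found with email {assigned_user_email}"
    )

user = users[0]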
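
The "exactly one match" check used for the user lookup is a pattern that recurs whenever a filtered query is expected to return a single item. A minimal sketch of a reusable helper follows; `exactly_one` is a hypothetical name and not part of the MindBridge SDK.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def exactly_one(items: Iterable[T], description: str) -> T:
    """Return the single item in `items`, or raise if there are 0 or 2+."""
    found = list(items)
    if not found:
        raise LookupError(f"No {description} found")
    if len(found) > 1:
        raise LookupError(f"Expected one {description}, found {len(found)}")
    return found[0]


# Accepts any iterable, including a generator
print(exactly_one((n for n in [7]), "number"))
```

Materializing the iterable into a list first means the helper behaves the same for lists and generators, at the cost of reading every result.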

In [0]:
from pathlib import Path
from tempfile import NamedTemporaryFile
import csv

with NamedTemporaryFile(delete=False) as temp_file:
    temp_file_path = Path(temp_file.name)

print(f"Exporting to: {temp_file_path}")
async_result = server.data_tables.export(
    data_table,
    fields=["rowid", "txid", "risk", "effective_date"],
    query=data_table_query,
)
server.data_tables.wait_for_export(async_result)
temp_file_path = server.data_tables.download(
    async_result, output_file_path=temp_file_path
)

with temp_file_path.open(newline="", encoding="utf_8") as infile:
    reader = csv.DictReader(infile)
    print("Creating Tasks for:")
    for row in reader:
        row_id = row["rowid"]
        transaction_id = row["txid"]
        print(
            f"- row_id: {row_id}, transaction_id: {transaction_id}, risk: {row['risk']}, effective_date: {row['effective_date']}"
        )
        task = mbapi.TaskItem(
            row_id=row_id,
            transaction_id=transaction_id,
            type=mbapi.TaskType.ENTRY,
            status=mbapi.TaskStatus.OPEN,
            engagement_id=analysis.engagement_id,
            analysis_result_id=analysis_result.id,
            audit_areas=["Audit Area 1"],
            assigned_id=user.id,
        )
        task = server.tasks.create(task)

print("Done.")
temp_file_path.unlink()
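
Before creating tasks, it can help to parse the exported CSV into typed records so the rows can be inspected or filtered further in the notebook. The sketch below is an illustration under stated assumptions: it expects the four columns requested in the export above, and it assumes `risk` is numeric and `effective_date` is an ISO date (`YYYY-MM-DD`), which the real export may not guarantee. `read_export_rows` and the sample data are hypothetical.

```python
import csv
import io
from datetime import date


def read_export_rows(lines) -> list[dict]:
    """Parse exported CSV text into dicts with typed risk and date values."""
    rows = []
    for row in csv.DictReader(lines):
        rows.append(
            {
                "row_id": row["rowid"],
                "transaction_id": row["txid"],
                # Assumed formats: numeric risk, ISO-formatted date
                "risk": float(row["risk"]),
                "effective_date": date.fromisoformat(row["effective_date"]),
            }
        )
    return rows


# Made-up sample data; the real export's value formats may differ
sample = io.StringIO(
    "rowid,txid,risk,effective_date\n"
    "101,9001,4500,2022-05-03\n"
    "102,9002,3200,2022-05-17\n"
)
for record in read_export_rows(sample):
    print(record)
```

The function accepts any iterable of lines, so an open file handle such as the one returned by `temp_file_path.open(newline="", encoding="utf_8")` could be passed in place of the `StringIO` sample.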