# Integrating MindBridge with Databricks: A Step-by-Step Guide
This tutorial guides you through integrating MindBridge with Databricks. We will walk through how to set up an Organization and Engagement using the MindBridge SDK, how to perform a General Ledger Analysis on data from Databricks, and how to extract key information from your Analysis Results. Here's what we'll cover:

1. **Installing the MindBridge SDK in Databricks**  
   We'll start by installing the MindBridge SDK in your Databricks environment, enabling access to MindBridge's API features within your notebooks.

2. **Storing your MindBridge API Token in Databricks**  
   Learn to securely save your MindBridge API token in a Databricks-backed secret scope. This step ensures that sensitive credentials are safely managed within Databricks.

3. **Loading your MindBridge API Token in Databricks**  
   With the token stored, we'll show you how to load and use it to configure your API connection.

4. **Setting Up an Organization and Engagement**  
   Here, you'll learn how to configure an Organization and Engagement for your Analysis.

5. **Uploading Files to the File Manager**  
   Next, we'll upload files into the File Manager created for your Engagement.

6. **Creating and Running Analyses**  
   In this section, you'll learn how to create a new analysis, link the necessary data from the File Manager, and execute the analysis.

7. **Getting the Analysis Results**  
   Finally, we'll retrieve and display the results of your analysis, providing insights into your data that you can further leverage within Databricks.

## Installing the MindBridge SDK in Databricks
Run the following commands to install the [mindbridge-api-python-client](https://pypi.org/project/mindbridge-api-python-client/) on your currently selected cluster. Version 1.5.1 or newer is required for the steps in this tutorial.

In [0]:
%pip install --upgrade mindbridge-api-python-client
dbutils.library.restartPython()
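
Because the steps in this tutorial need version 1.5.1 or newer, it can help to confirm what actually got installed. The following is a minimal sketch using only the Python standard library; `version_tuple` and `meets_minimum` are hypothetical helper names introduced here, not part of the MindBridge SDK, and the comparison only looks at the leading numeric parts of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_VERSION = "1.5.1"  # minimum stated in this tutorial


def version_tuple(text: str) -> tuple:
    """Turn '1.5.1' into (1, 5, 1), stopping at the first non-numeric part."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("mindbridge-api-python-client")
    print(f"Installed {installed}; meets minimum: {meets_minimum(installed)}")
except PackageNotFoundError:
    print("mindbridge-api-python-client is not installed in this environment")
```

If the check reports an older version, re-running the install cell with `--upgrade` should bring it up to date.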

## Storing your MindBridge API Token in Databricks
If you do not already have an API Token, follow [Create an API token](https://support.mindbridge.ai/hc/en-us/articles/9349943782039-Create-an-API-token) on our knowledge base. Once you have one, we'll [Create a Databricks-backed secret scope](https://learn.microsoft.com/en-us/azure/databricks/security/secrets/#create-a-databricks-backed-secret-scope) using the Databricks CLI to securely store your token. If you haven't set up the Databricks CLI yet, follow [Install or update the Databricks CLI](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/install).

After installing and configuring the CLI, use the following commands to create a new secret scope and add your API Token to the scope:

```sh
# Create the secret scope named "mindbridge-api-tutorials"
databricks secrets create-scope mindbridge-api-tutorials

# Add your MindBridge API Token to the scope
databricks secrets put-secret mindbridge-api-tutorials MINDBRIDGE_API_TOKEN
```

Once your token is added, you can verify its existence by running the following:
```sh
databricks secrets list-scopes
databricks secrets list-secrets mindbridge-api-tutorials
```

## Loading your MindBridge API Token in Databricks

In this section, we'll load the MindBridge API token securely stored in Databricks and configure the API connection. Replace the `url` value with the URL for your MindBridge instance. Upon execution, you should see details about the user associated with your token.

In [0]:
import mindbridgeapi as mbapi

# Load your token from the secret scope
token = dbutils.secrets.get(
    scope="mindbridge-api-tutorials", key="MINDBRIDGE_API_TOKEN"
)

# Create a connection to the server
server = mbapi.Server(url="yoursubdomain.mindbridge.ai", token=token)

# Get the current user
user = server.users.get_current()
print(user)
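
If you also want to run the same code outside a Databricks notebook, where `dbutils` is not defined, one option is a small loader that falls back to an environment variable. This is only a sketch: `load_token` is a hypothetical helper, the environment variable name is an assumption that mirrors the secret key used above, and it assumes `dbutils` is available as a notebook global when running in Databricks.

```python
import os


def load_token(scope: str, key: str, env_var: str = "MINDBRIDGE_API_TOKEN") -> str:
    """Return the token from a Databricks secret scope, else from the environment."""
    # Assumption: Databricks notebooks expose `dbutils` as a global name
    dbutils_obj = globals().get("dbutils")
    if dbutils_obj is not None:
        return dbutils_obj.secrets.get(scope=scope, key=key)

    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"No Databricks secrets available and {env_var} is not set."
        )
    return token


try:
    token = load_token("mindbridge-api-tutorials", "MINDBRIDGE_API_TOKEN")
    print("Token loaded")
except RuntimeError as error:
    print(error)
```

Either way, avoid printing the token itself or hard-coding it in the notebook.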

## Setting up an Organization and Engagement
In this section, we'll walk through setting up an Organization and an Engagement to house your Analysis. We'll set up the Engagement using the MindBridge for-profit library. If you already have an existing Organization and Engagement you would like to use, you can update the `organization_name` and `engagement_name` values to use your preferred entities. If your Engagement uses a different Library, you may need to update later steps so that your Analysis Type and Analysis Source Types are compatible with the Library you selected.

In [0]:
# Define details for organization, engagement, and library
organization_name = "MindBridge Databricks Integration"
engagement_name = "My Engagement"
library_name = "MindBridge for-profit"

# Step 1: Create the Organization, or get it if it already exists
try:
    organization_item = mbapi.OrganizationItem(
        name=organization_name,
        external_client_code="My Client ID",  # Optional
        manager_user_ids=[user.id],  # Optional
    )
    organization = server.organizations.create(organization_item)
    print(f"Created the Organization: '{organization_name}'")
except mbapi.exceptions.ValidationError:
    organization = next(server.organizations.get({"name": organization_name}))
    print(f"Organization '{organization_name}' already exists. Fetched it instead.")

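# Step 2: Get the Library we want to use in the Engagement
system_libraries = server.libraries.get({"system": True})
mindbridge_for_profit_library = next(
    (x for x in system_libraries if x.name == library_name), None
)
if mindbridge_for_profit_library is None:
    raise ValueError(f"No system Library named '{library_name}' was found.")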

# Step 3: Create the Engagement, or get it if it already exists
try:
    engagement_item = mbapi.EngagementItem(
        organization_id=organization.id,
        name=engagement_name,
        engagement_lead_id=user.id,
        library_id=mindbridge_for_profit_library.id,
    )
    engagement = server.engagements.create(engagement_item)
    print(f"Created the Engagement: '{engagement_name}'")
except mbapi.exceptions.ValidationError:
    engagement = next(
        server.engagements.get(
            {"organizationId": organization.id, "name": engagement_name}
        )
    )
    print(
        f"Engagement '{engagement_name}' already exists within the Organization. "
        "Fetched it instead."
    )
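
The cell above uses the same "create it, or fetch it if it already exists" pattern twice. As a sketch of how that pattern could be factored out, here is a generic helper with in-memory stand-ins so it runs anywhere. `get_or_create`, `DuplicateName`, and `create_org` are hypothetical names for illustration; with the SDK, `create` would wrap a call such as `server.organizations.create(...)`, `fetch` would wrap the matching `next(server.organizations.get(...))`, and the exception would be `mbapi.exceptions.ValidationError`, as in the cell above.

```python
def get_or_create(create, fetch, already_exists_error):
    """Return (item, created). Call create(); on the given error, call fetch()."""
    try:
        return create(), True
    except already_exists_error:
        return fetch(), False


# In-memory stand-ins so the sketch runs without a MindBridge server
class DuplicateName(Exception):
    pass


store = {}


def create_org(name):
    if name in store:
        raise DuplicateName(name)
    store[name] = {"name": name}
    return store[name]


first, created_first = get_or_create(
    lambda: create_org("Acme"), lambda: store["Acme"], DuplicateName
)
second, created_second = get_or_create(
    lambda: create_org("Acme"), lambda: store["Acme"], DuplicateName
)
print(created_first, created_second)  # True False
```

Note that a validation error can have causes other than a duplicate name, so the fetch step may still find nothing; handling that case explicitly gives a clearer error than an unexpected `StopIteration`.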

## Uploading Files to the File Manager
After an Engagement is created, a File Manager entity is automatically set up to store data used by Analyses within the Engagement. We will upload our example General Ledger file from the data folder to the File Manager.

In [0]:
from pathlib import Path

relative_path = "./data/GENERAL_LEDGER_JOURNAL.csv"
full_path = Path(relative_path).resolve()

gl_file_manager_item = mbapi.FileManagerItem(engagement_id=engagement.id)

# Upload General Ledger
gl_file_manager_file = server.file_manager.upload(
    input_item=gl_file_manager_item, input_file=full_path
)

# We can check to see if our files are in the File Manager
file_manager_generator = server.file_manager.get({"engagementId": engagement.id})
print("Here are the files in the File Manager for your Engagement")
for file_manager_entity in file_manager_generator:
    print(file_manager_entity.original_name)
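
Before uploading, it can be worth checking that the file exists where the notebook expects it and is not empty. The sketch below uses only the standard library; `describe_csv` is a hypothetical helper, and the column names in the demo are made up for illustration rather than taken from the example General Ledger file.

```python
import csv
import tempfile
from pathlib import Path


def describe_csv(path):
    """Return (header, data_row_count) for a CSV file, raising if it is unusable."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}")
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if not header:
            raise ValueError(f"{path} has no header row")
        row_count = sum(1 for _ in reader)
    return header, row_count


# Demo on a throwaway file; these column names are illustrative only
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.csv"
    sample.write_text(
        "date,account,amount\n2020-01-31,1000,250.00\n", encoding="utf-8"
    )
    header, row_count = describe_csv(sample)
    print(header, row_count)
```

In the notebook, you could call `describe_csv(full_path)` just before the upload to catch a wrong working directory early.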

## Creating and Running Analyses
With our Engagement set up and files uploaded, we're ready to create an Analysis. We'll import the General Ledger file from the File Manager as an Analysis Source, mark the Account Mappings as verified, and then run the Analysis.

If you used a custom library instead of the MindBridge for-profit library used in *Setting up an Organization and Engagement*, you will need to upload a Chart of Accounts or use the Account Mappings endpoints to map the accounts within the General Ledger. In this example we will use the Verify Mappings endpoint to use the suggested Account Mappings created by MindBridge.

In [0]:
import time

# Create an analysis
analysis_item = mbapi.AnalysisItem(
    engagement_id=engagement.id,
    analysis_periods=[{"startDate": "2020-01-01", "endDate": "2021-01-01"}],
    analysis_type_id=mbapi.AnalysisTypeItem.GENERAL_LEDGER,
    currency_code="CAD",
    name="General Ledger Analysis",
)

analysis = server.analyses.create(analysis_item)

# Add the General Ledger from the File Manager as an Analysis Source
gl_analysis_source_item = mbapi.AnalysisSourceItem(
    engagement_id=engagement.id,
    analysis_id=analysis.id,
    file_manager_file_id=gl_file_manager_file.id,
    analysis_period_id=analysis.analysis_periods[0].id,
    analysis_source_type_id=mbapi.AnalysisSourceTypeItem.GENERAL_LEDGER_JOURNAL,
    target_workflow_state=mbapi.TargetWorkflowState.COMPLETED,
)

print("Creating Analysis Source")
gl_analysis_source = server.analysis_sources.create(gl_analysis_source_item)

print("Waiting for Analysis Source to be ready")
max_polls = 5 * 60  # Poll once per second for up to 5 minutes
polls = 0
prev_state = ""
while polls < max_polls:
    polls += 1
    time.sleep(1)
    gl_analysis_source = server.analysis_sources.get_by_id(gl_analysis_source.id)
    if gl_analysis_source.workflow_state.value != prev_state:
        print(f"Current State: {gl_analysis_source.workflow_state.value}")
        prev_state = gl_analysis_source.workflow_state.value
    if (
        gl_analysis_source.workflow_state.value
        == gl_analysis_source_item.target_workflow_state.value
    ):
        print("Analysis Source is ready.")
        break
else:
    # The loop ended without reaching the target state
    raise TimeoutError("Analysis Source was not ready after 5 minutes.")

print("Setting Account Mappings to Verified")
engagement = server.engagements.verify_account_mappings(engagement)
analysis = server.analyses.wait_for_analysis_sources(analysis)

print("Running the Analysis")
analysis = server.analyses.run(analysis)
print("Analysis has been started")

analysis = server.analyses.wait_for_analysis(analysis)
print("Analysis is complete")
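
The polling loop in the cell above follows a common shape: fetch the current state, stop when it matches a target, and give up after a time limit. A minimal, SDK-independent sketch of that shape is shown below. `poll_until` is a hypothetical helper, and the state strings in the demo are illustrative values, not necessarily the workflow states MindBridge returns.

```python
import time


def poll_until(fetch, is_done, timeout_s=300.0, interval_s=1.0):
    """Call fetch() until is_done(value) is true; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = fetch()
        if is_done(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up after {timeout_s} seconds")
        time.sleep(interval_s)


# Demo with a fake sequence of states instead of a real API call
states = iter(["STARTED", "RUNNING", "COMPLETED"])
final_state = poll_until(
    fetch=lambda: next(states),
    is_done=lambda state: state == "COMPLETED",
    timeout_s=5,
    interval_s=0,
)
print(final_state)  # COMPLETED
```

With the SDK, `fetch` would wrap `server.analysis_sources.get_by_id(...)` and `is_done` would compare `workflow_state` against the target, as the cell above does inline.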

## Getting the Analysis Results

### Overview of Data Table information
First we will take a look at what Data Tables exist on this Analysis and how the data is structured. Below is some code to generate an overview of each Data Table in the Analysis, including the fields, data type and whether the field is searchable.

In [0]:
import pandas as pd

# Restart data tables to reset generator
server.analyses.restart_data_tables(analysis)

# Loop through each data_table and display information
for data_table in analysis.data_tables:
    # Display general data_table information
    general_info = pd.DataFrame(
        {
            "Analysis ID": [data_table.analysis_id],
            "Data Table ID": [data_table.id],
            "Logical Name": [data_table.logical_name],
            "Type": [data_table.type],
        }
    )
    display(general_info)

    # Prepare column data for this data_table
    column_data = [
        {
            "Field": col.field,
            "Filter Only": col.filter_only,
            "Keyword Search": col.keyword_search,
            "Type": col.type.value,
        }
        for col in data_table.columns
    ]

    # Create DataFrame for column data and display it
    column_df = pd.DataFrame(column_data)
    display(column_df)

### Extracting Key Info from MindBridge
Now let's extract specific data from our Data Tables. In this example we will fetch all of the transactions with a Risk Score greater than or equal to 30%. Then we will save the results to a CSV file and load them into a pandas DataFrame.

In [0]:
from pathlib import Path

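# Define the output file path where the results will be saved
output_relative_path = "./results/result.csv"
output_full_path = Path(output_relative_path).resolve()
# Make sure the results folder exists before downloading into it
output_full_path.parent.mkdir(parents=True, exist_ok=True)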

print("Requesting Elevated Risk General Ledger Transactions")
# Restart data tables to reset generator
server.analyses.restart_data_tables(analysis)

# Select the transactions table
data_table = next(x for x in analysis.data_tables if x.logical_name == "gl_journal_tx")

# Define a query to extract transactions with a risk score greater than or equal to 30%
query = {"risk": {"$gte": 3000}}

# Export the filtered data
export_async_result = server.data_tables.export(data_table, query=query)
server.data_tables.wait_for_export(export_async_result)

# Download the exported CSV to the output path
path_output = server.data_tables.download(
    export_async_result, output_file_path=output_full_path
)
print(f"Success! Saved to: {path_output}")
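
The query above expresses 30% as `3000`, which suggests the `risk` field is stored in hundredths of a percent. Assuming that scale holds (it is inferred from this tutorial's own example, not from the API documentation), a small helper can make the threshold easier to read and change. `risk_at_least` is a hypothetical name introduced here.

```python
def risk_at_least(percent: float) -> dict:
    """Build a query for rows whose risk score is >= the given percentage.

    Assumes risk is stored as hundredths of a percent, so 30% becomes 3000.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return {"risk": {"$gte": round(percent * 100)}}


print(risk_at_least(30))  # {'risk': {'$gte': 3000}}
```

The result can be passed as the `query` argument in place of the hand-written dictionary.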

In [0]:
# Read the CSV file saved in the previous step into a pandas DataFrame

df = pd.read_csv(output_full_path)

display(df)
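
Once the results are in a DataFrame, you can rank and reshape them with ordinary pandas operations. The sketch below lists the highest-risk rows and adds a percentage column. It assumes the exported CSV has a column named `risk` on the same hundredths-of-a-percent scale as the query; check `df.columns` for the actual names in your export. `top_risk` and the sample data are hypothetical.

```python
import pandas as pd


def top_risk(frame: pd.DataFrame, risk_column: str = "risk", n: int = 10) -> pd.DataFrame:
    """Return the n highest-risk rows with an added risk_percent column."""
    ranked = frame.sort_values(risk_column, ascending=False).head(n).copy()
    # Assumption: risk is stored as hundredths of a percent
    ranked["risk_percent"] = ranked[risk_column] / 100
    return ranked


# Made-up sample rows standing in for the exported results
sample = pd.DataFrame(
    {"transaction_id": [1, 2, 3], "risk": [3100, 7250, 4500]}
)
highest = top_risk(sample, n=2)
print(highest)
```

In the notebook, `top_risk(df)` would apply the same ranking to the exported transactions.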