# BrickByte - Confluence Example

This notebook demonstrates how to sync data from Atlassian Confluence to Databricks using BrickByte.

## Prerequisites
- Confluence Cloud account
- API Token (generate at https://id.atlassian.com/manage-profile/security/api-tokens)
- Databricks workspace with Unity Catalog
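
Pasting the API token directly into a notebook cell makes it easy to leak. As a minimal sketch, the helper below builds the Confluence source config from environment variables and fails early if any value is missing. The variable names (`CONFLUENCE_DOMAIN`, `CONFLUENCE_EMAIL`, `CONFLUENCE_API_TOKEN`) and the function `load_confluence_config` are hypothetical and not part of BrickByte or Airbyte; only the config keys come from this notebook.

```python
import os

# Hypothetical environment variable names; adjust to your own setup.
REQUIRED_VARS = {
    "domain_name": "CONFLUENCE_DOMAIN",
    "email": "CONFLUENCE_EMAIL",
    "api_token": "CONFLUENCE_API_TOKEN",
}


def load_confluence_config(env=None):
    """Build the source-confluence config dict from environment variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error lists them all at once.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

The returned dict could be passed as `config=` to `ab.get_source(...)` in place of the empty strings. On Databricks, a secret scope read with `dbutils.secrets.get` is a common alternative to environment variables.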


In [None]:
%run ./_setup


In [None]:
from brickbyte import BrickByte

bb = BrickByte(
    sources=["source-confluence"],
    destination="destination-databricks",
    destination_install="git+https://github.com/park-peter/brickbyte.git#subdirectory=integrations/destination-databricks-py"
)
bb.setup()


In [None]:
import airbyte as ab

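# True re-syncs every record on each run; set to False to sync incrementally
# for streams that support it.
FORCE_FULL_REFRESH = True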
cache = bb.get_or_create_cache()

# Configure the Confluence source
# Documentation: https://docs.airbyte.com/integrations/sources/confluence
source = ab.get_source(
    "source-confluence",
    config={
        "domain_name": "",  # e.g., "your-company.atlassian.net"
        "email": "",        # Your Atlassian account email
        "api_token": "",    # Generate at https://id.atlassian.com/manage-profile/security/api-tokens
    },
    local_executable=bb.get_source_exec_path("source-confluence")
)
# Verify the credentials and connectivity before syncing
source.check()
# Sync every stream the connector exposes
source.select_all_streams()
print("Available streams:", source.get_available_streams())


In [None]:
# Configure the Databricks destination
destination = ab.get_destination(
    "destination-databricks",
    config={
        "server_hostname": "",  # e.g., "adb-xxx.azuredatabricks.net"
        "http_path": "",        # e.g., "/sql/1.0/warehouses/abc123"
        "token": "",            # Your Databricks PAT
        "catalog": "",          # Unity Catalog name
        "schema": "",           # Target schema
    },
    local_executable=bb.get_destination_exec_path()
)

write_result = destination.write(source, cache=cache, force_full_refresh=FORCE_FULL_REFRESH)
print("Sync completed!")


In [None]:
# Cleanup virtual environments
bb.cleanup()


## Query Your Data

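After the sync completes, you can query the synced Confluence data. Replace `your_catalog` and `your_schema` with the catalog and schema set in the destination config: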

```sql
-- View synced pages
SELECT 
    _airbyte_ab_id,
    _airbyte_emitted_at,
    _airbyte_data:id AS page_id,
    _airbyte_data:title AS title,
    _airbyte_data:status AS status
FROM your_catalog.your_schema._airbyte_raw_pages
LIMIT 10;
```
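
The same preview can be built from Python for any synced stream. This is a sketch under one assumption: that each stream lands in a table named `_airbyte_raw_<stream>`, generalized from the `_airbyte_raw_pages` example above. The helpers `raw_table_name` and `build_preview_query` are hypothetical names introduced here for illustration.

```python
import re

# Plain, unquoted SQL identifiers only; anything else is rejected.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def raw_table_name(catalog, schema, stream):
    """Return the fully qualified raw table name for a stream.

    Assumes the `_airbyte_raw_<stream>` naming seen in the SQL example.
    """
    for part in (catalog, schema, stream):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"Invalid identifier: {part!r}")
    return f"{catalog}.{schema}._airbyte_raw_{stream}"


def build_preview_query(catalog, schema, stream, limit=10):
    """Build a SELECT that previews the raw records of one stream."""
    table = raw_table_name(catalog, schema, stream)
    return (
        "SELECT _airbyte_ab_id, _airbyte_emitted_at, _airbyte_data "
        f"FROM {table} LIMIT {int(limit)}"
    )


query = build_preview_query("your_catalog", "your_schema", "pages")
print(query)
```

In a Databricks notebook, the resulting string can be run with `spark.sql(query)`. Validating the identifiers before formatting them into the statement keeps a mistyped catalog or schema from producing malformed SQL.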
