# 01 - Infrastructure Setup

**Prerequisites:** Before running this notebook, ensure you have completed the local setup:
1. Run `scripts/setup_azure_infrastructure.sh` (creates Azure resources)
2. Run `scripts/setup_databricks_secrets.sh` (configures Databricks secrets)

This notebook will:
- Install custom SFTP data source package
- Verify secrets and SSH keys are configured
- Test SFTP connections
- Save configuration to Unity Catalog

## 1. Install Custom SFTP Data Source Package

**Important Instructions:**
1. Run the cell below to install the package
2. Wait for Python to restart (you'll see a message)

In [0]:
# Install dependencies from requirements.txt
%pip install -r ../requirements.txt
# Alternative (disabled): editable install of the package from the repository root
# %pip install -q -e ../
dbutils.library.restartPython()

## 2. Import Installed Packages

**Important:** Run this cell AFTER Python has restarted from the previous cell.

In [0]:
import os
import tempfile

from src.ingest import SFTPWriter, SFTPDataSource

## 3. Configure Catalog and Schema

Set the catalog, schema, and connection names for this demo. Each value can be customized via the notebook widgets.

In [0]:
# Create widgets for catalog, schema, and connection names
dbutils.widgets.text("catalog_name", "sftp_demo", "Catalog Name")
dbutils.widgets.text("schema_name", "default", "Schema Name")
dbutils.widgets.text("source_connection_name", "source_sftp_connection", "Source Connection Name")
dbutils.widgets.text("target_connection_name", "target_sftp_connection", "Target Connection Name")

# Get widget values
CATALOG_NAME = dbutils.widgets.get("catalog_name")
SCHEMA_NAME = dbutils.widgets.get("schema_name")
SOURCE_CONNECTION_NAME = dbutils.widgets.get("source_connection_name")
TARGET_CONNECTION_NAME = dbutils.widgets.get("target_connection_name")

print(f"Catalog: {CATALOG_NAME}")
print(f"Schema: {SCHEMA_NAME}")
print(f"Source Connection: {SOURCE_CONNECTION_NAME}")
print(f"Target Connection: {TARGET_CONNECTION_NAME}")
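
The widget values are later interpolated directly into SQL statements such as `CREATE CATALOG IF NOT EXISTS`, so it may be worth validating them first. The following is a minimal sketch, assuming names are restricted to letters, digits, and underscores. `validate_identifier` is a hypothetical helper, not part of this notebook or of Databricks.

```python
import re

# Assumption: only letters, digits, and underscores are accepted.
# Unity Catalog permits more characters; loosen the pattern if you need them.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return the name wrapped in backticks, or raise ValueError if it looks unsafe."""
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Invalid catalog/schema name: {name!r}")
    return f"`{name}`"


print(validate_identifier("sftp_demo"))
```

In the notebook, the result could be used in place of the raw widget value, for example `spark.sql(f"CREATE CATALOG IF NOT EXISTS {validate_identifier(CATALOG_NAME)}")`.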

## 4. Verify Databricks Secrets

Before proceeding, ensure you've run the setup scripts on your local machine:
- `scripts/setup_azure_infrastructure.sh`
- `scripts/setup_databricks_secrets.sh`

These scripts create the secret scope and store all necessary credentials (host, username, SSH private key).
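
This section has no code cell of its own, so here is a minimal sketch of one way to check that the scope contains every key this notebook reads. It assumes `dbutils.secrets.list(scope)` returns objects with a `key` attribute. `missing_secret_keys` is a hypothetical helper, and the `SimpleNamespace` stand-in exists only so the sketch runs outside Databricks.

```python
from types import SimpleNamespace

# Secret keys read elsewhere in this notebook.
REQUIRED_KEYS = [
    "ssh-private-key",
    "ssh-key-fingerprint",
    "source-host",
    "source-username",
    "target-host",
    "target-username",
]


def missing_secret_keys(secrets, scope, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the scope.

    `secrets` is dbutils.secrets, or any object with a list(scope) method
    whose items expose a `key` attribute.
    """
    present = {item.key for item in secrets.list(scope)}
    return [key for key in required if key not in present]


# Stand-in for dbutils.secrets (assumption: used for illustration only).
fake_secrets = SimpleNamespace(
    list=lambda scope: [
        SimpleNamespace(key="ssh-private-key"),
        SimpleNamespace(key="source-host"),
    ]
)
print(missing_secret_keys(fake_secrets, "sftp-credentials"))
```

In the notebook, the call would be `missing_secret_keys(dbutils.secrets, "sftp-credentials")`; an empty list means every expected key is present.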

## 5. Verify SSH Private Key in Secrets

Verify that the SSH private key was stored by the setup script:

In [0]:
# Verify SSH private key exists in secrets
try:
    ssh_key = dbutils.secrets.get(scope="sftp-credentials", key="ssh-private-key")
    print("✓ SSH private key found in secrets")
    print("  Scope: sftp-credentials")
    print("  Key: ssh-private-key")
    print(f"  Length: {len(ssh_key)} characters")
except Exception as e:
    print("✗ SSH private key not found!")
    print(f"  Error: {e}")
    print("  Please run: ./scripts/setup_databricks_secrets.sh")
    # Re-raise with the original traceback so the notebook run stops here
    raise

## 6. Configure SFTP Connection Parameters

In [0]:
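# Get SSH private key from secrets and write it to a temporary file.
# delete=False keeps the file on disk after close() so later cells can use it;
# remove it with os.remove(tmp_key_file.name) once the connections are no longer needed.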
ssh_key_content = dbutils.secrets.get(scope="sftp-credentials", key="ssh-private-key")
tmp_key_file = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='_sftp_key')
tmp_key_file.write(ssh_key_content)
tmp_key_file.close()
os.chmod(tmp_key_file.name, 0o600)

# Source SFTP configuration
source_config = {
    "host": dbutils.secrets.get(scope="sftp-credentials", key="source-host"),
    "username": dbutils.secrets.get(scope="sftp-credentials", key="source-username"),
    "private_key_path": tmp_key_file.name,
    "port": 22
}

# Target SFTP configuration
target_config = {
    "host": dbutils.secrets.get(scope="sftp-credentials", key="target-host"),
    "username": dbutils.secrets.get(scope="sftp-credentials", key="target-username"),
    "private_key_path": tmp_key_file.name,
    "port": 22
}

print("SFTP configurations loaded from secrets")
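
The key file written above stays on disk until it is removed explicitly. Where the key is needed only for a short block of work, a context manager can handle cleanup automatically. This is a minimal sketch using only the standard library; `temporary_key_file` is a hypothetical helper, not part of the SFTP package.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_key_file(key_content: str):
    """Write key_content to a private temp file and delete the file on exit."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix="_sftp_key")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(key_content)
        yield path
    finally:
        # Runs on normal exit and on exceptions, so the key never lingers.
        os.remove(path)


with temporary_key_file("dummy key material") as key_path:
    print(os.path.exists(key_path))  # True while inside the block
print(os.path.exists(key_path))  # False once the block exits
```

The trade-off is scope: this notebook reuses the key path across several cells, so the pattern fits best when connection setup and use live in one cell.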

## 7. Test Source SFTP Connection

In [0]:
# Test connection to source SFTP
source_writer = SFTPDataSource.create_writer(source_config)

with source_writer.session():
    files = source_writer.list_files(".")
    print("Source SFTP files:")
    for f in files:
        print(f"  - {f}")

## 8. Test Target SFTP Connection

In [0]:
# Test connection to target SFTP
target_writer = SFTPDataSource.create_writer(target_config)

with target_writer.session():
    files = target_writer.list_files(".")
    print("Target SFTP files:")
    for f in files:
        print(f"  - {f}")
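
Sections 7 and 8 run the same check twice, and a failure on the source stops the notebook before the target is tried. The sketch below shows one way to test every connection and report all failures together. `check_connections` and its `list_fn` parameter are hypothetical names; in this notebook `list_fn` would wrap the `create_writer`, `session`, and `list_files` calls shown above.

```python
from typing import Callable, Dict, Iterable, Tuple


def check_connections(
    configs: Dict[str, dict],
    list_fn: Callable[[dict], Iterable[str]],
) -> Dict[str, Tuple[bool, str]]:
    """Run list_fn against every config and map each name to (ok, detail)."""
    results = {}
    for name, config in configs.items():
        try:
            files = list(list_fn(config))
            results[name] = (True, f"{len(files)} entries listed")
        except Exception as exc:
            # Record the failure and keep going so every connection is reported.
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


# Possible use in this notebook (assumes the API used in the cells above):
# def list_root(config):
#     writer = SFTPDataSource.create_writer(config)
#     with writer.session():
#         return writer.list_files(".")
#
# check_connections({"source": source_config, "target": target_config}, list_root)
```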

## 9. Save Configuration to Catalog

In [0]:
# Create the catalog and a dedicated `config` schema for configuration
# (the schema name is fixed here; it is not taken from SCHEMA_NAME)
spark.sql(f"CREATE CATALOG IF NOT EXISTS {CATALOG_NAME}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG_NAME}.config")

# Get SSH key fingerprint from secrets
ssh_key_fingerprint = dbutils.secrets.get(scope="sftp-credentials", key="ssh-key-fingerprint")

# Store configuration. The SSH private key stays in secrets; hosts, usernames,
# and the key fingerprint are read from secrets and written to the table as plain values.
config_data = [
    ("catalog_name", CATALOG_NAME),
    ("schema_name", SCHEMA_NAME),
    ("source_connection_name", SOURCE_CONNECTION_NAME),
    ("target_connection_name", TARGET_CONNECTION_NAME),
    ("source_host", source_config["host"]),
    ("source_username", source_config["username"]),
    ("target_host", target_config["host"]),
    ("target_username", target_config["username"]),
    ("secret_scope", "sftp-credentials"),
    ("ssh_key_secret", "ssh-private-key"),
    ("ssh_key_fingerprint", ssh_key_fingerprint)
]

config_df = spark.createDataFrame(config_data, ["key", "value"])
config_df.write.mode("overwrite").saveAsTable(f"{CATALOG_NAME}.config.connection_params")

print(f"Configuration saved to {CATALOG_NAME}.config.connection_params")
print("Note: The SSH private key is stored in Databricks secrets, not in this table")

## 10. Verify Configuration

In [0]:
# Display configuration
display(spark.table(f"{CATALOG_NAME}.config.connection_params"))
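
Beyond displaying the table, the saved rows can be checked programmatically. This is a minimal sketch, assuming rows arrive as `(key, value)` pairs in that column order, which is how the table is written above. `rows_to_config` and `missing_config_keys` are hypothetical helpers.

```python
# Keys written to the connection_params table in section 9.
EXPECTED_KEYS = {
    "catalog_name", "schema_name",
    "source_connection_name", "target_connection_name",
    "source_host", "source_username",
    "target_host", "target_username",
    "secret_scope", "ssh_key_secret", "ssh_key_fingerprint",
}


def rows_to_config(rows):
    """Convert (key, value) rows into a dict. Works with tuples and Spark Row objects."""
    return {row[0]: row[1] for row in rows}


def missing_config_keys(config, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from config, sorted by name."""
    return sorted(set(expected) - set(config))


# Plain tuples stand in for collected Spark rows so the sketch runs anywhere.
sample = rows_to_config([("catalog_name", "sftp_demo"), ("schema_name", "default")])
print(missing_config_keys(sample))
```

In the notebook, the rows would come from `spark.table(f"{CATALOG_NAME}.config.connection_params").collect()`.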

## Summary

Infrastructure setup completed:
- ✓ Custom SFTP data source package installed
- ✓ Databricks secrets verified (host, username, SSH private key)
- ✓ SSH private key retrieved from secrets
- ✓ SFTP connections tested
- ✓ Configuration saved to Unity Catalog

Next step: Run notebook `02_uc_connection_setup.ipynb` to configure Unity Catalog connections