# 01 - Infrastructure Setup

**Prerequisites:** Before running this notebook, ensure you have completed the local setup:
1. Run `scripts/setup_azure_infrastructure.sh` (creates Azure resources)
2. Run `scripts/setup_databricks_secrets.sh` (configures Databricks secrets)

This notebook will:
- Install the custom SFTP data source package
- Verify that the secrets and the SSH key are configured
- Test SFTP connections
- Save configuration to Unity Catalog

## 1. Install Custom SFTP Data Source Package

In [None]:
# Install dependencies from requirements.txt
%pip install -r /Workspace/Repos/<your-repo>/databricks-sftp-data-source/requirements.txt

# Install the custom SFTP package
%pip install -e /Workspace/Repos/<your-repo>/databricks-sftp-data-source

dbutils.library.restartPython()

## 2. Import Libraries

In [None]:
from pyspark.sql import SparkSession
from ingest import SFTPWriter, SFTPDataSource
import json

## 3. Verify Databricks Secrets

Before proceeding, ensure you've run the setup scripts on your local machine:
- `scripts/setup_azure_infrastructure.sh`
- `scripts/setup_databricks_secrets.sh`

These scripts create the secret scope and store all necessary credentials.
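
The scope can also be checked from the notebook before any connection is attempted. The following is a minimal sketch, not part of the setup scripts: `find_missing_secrets` and `REQUIRED_KEYS` are hypothetical names, and the key list is simply the four keys this notebook reads in section 5. The lookup function is passed in as an argument so the helper can be exercised without a cluster.

```python
from typing import Callable, Iterable, List


def find_missing_secrets(
    get_secret: Callable[..., str], scope: str, keys: Iterable[str]
) -> List[str]:
    """Return the keys that cannot be read from the scope or that are empty."""
    missing = []
    for key in keys:
        try:
            value = get_secret(scope=scope, key=key)
        except Exception:
            # The lookup raised, for example because the scope or key does not exist.
            missing.append(key)
            continue
        if not value:
            # The key exists but holds an empty value.
            missing.append(key)
    return missing


# Keys read later in this notebook (section 5); adjust if your scripts use other names.
REQUIRED_KEYS = ["source-host", "source-username", "target-host", "target-username"]
```

In a notebook cell this would be called as `find_missing_secrets(dbutils.secrets.get, "sftp-credentials", REQUIRED_KEYS)`. An empty list means all four secrets are readable; any key it returns points back to `scripts/setup_databricks_secrets.sh`.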

## 4. Verify SSH Key in DBFS

Verify that the SSH private key was uploaded by the setup script:

In [None]:
# Verify SSH key exists in DBFS
key_path = "/FileStore/ssh-keys/sftp_key"
try:
    key_info = dbutils.fs.ls(key_path)
    print("✓ SSH private key found in DBFS:")
    print(f"  Path: dbfs:{key_path}")
    print(f"  Size: {key_info[0].size} bytes")
except Exception:
    # dbutils.fs.ls raises when the path does not exist or cannot be read
    print("✗ SSH private key not found!")
    print("  Please run: scripts/setup_databricks_secrets.sh")
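
Finding the file does not show that it is a usable key. As a rough sanity check, the sketch below inspects only the first line of the file for a PEM-style private key header. `looks_like_private_key` is a hypothetical helper, and it assumes the key is stored in PEM or OpenSSH format; other formats (for example PuTTY `.ppk`) would not match.

```python
from pathlib import Path


def looks_like_private_key(path: str) -> bool:
    """Return True if the file's first line is a PEM-style private key header."""
    key_file = Path(path)
    if not key_file.is_file():
        return False
    # Only the header line is read; the key material itself is never printed.
    with key_file.open("r", errors="replace") as handle:
        first_line = handle.readline().strip()
    return first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----")
```

On a cluster the local file path would be the one used in section 5, for example `looks_like_private_key("/dbfs/FileStore/ssh-keys/sftp_key")`.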

## 5. Configure SFTP Connection Parameters

In [None]:
# Source SFTP configuration
source_config = {
    "host": dbutils.secrets.get(scope="sftp-credentials", key="source-host"),
    "username": dbutils.secrets.get(scope="sftp-credentials", key="source-username"),
    "private_key_path": "/dbfs/FileStore/ssh-keys/sftp_key",
    "port": 22
}

# Target SFTP configuration
target_config = {
    "host": dbutils.secrets.get(scope="sftp-credentials", key="target-host"),
    "username": dbutils.secrets.get(scope="sftp-credentials", key="target-username"),
    "private_key_path": "/dbfs/FileStore/ssh-keys/sftp_key",
    "port": 22
}

print("SFTP configurations loaded")
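
Before opening a connection, it can help to confirm that each dictionary has the expected shape. The sketch below is an illustration only: `validate_sftp_config` is a hypothetical helper, and the required fields are just the four keys used in the dictionaries above.

```python
def validate_sftp_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    # These three fields must be non-empty strings.
    for field in ("host", "username", "private_key_path"):
        value = config.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is missing or empty")
    # The port must be a real integer (bool is excluded) in the valid TCP range.
    port = config.get("port")
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("port must be an integer between 1 and 65535")
    return problems
```

Calling `validate_sftp_config(source_config)` and `validate_sftp_config(target_config)` should each return an empty list; the function reports field names only, never the values.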

## 6. Test Source SFTP Connection

In [None]:
# Test connection to source SFTP
source_writer = SFTPDataSource.create_writer(source_config)

with source_writer.session():
    files = source_writer.list_files(".")
    print("Source SFTP files:")
    for f in files:
        print(f"  - {f}")

## 7. Test Target SFTP Connection

In [None]:
# Test connection to target SFTP
target_writer = SFTPDataSource.create_writer(target_config)

with target_writer.session():
    files = target_writer.list_files(".")
    print("Target SFTP files:")
    for f in files:
        print(f"  - {f}")
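
If either test above fails, a plain TCP check can help separate network problems (firewall, wrong host or port) from authentication problems. The sketch below uses only the Python standard library; `can_reach` is a hypothetical helper, and it confirms only that the port accepts connections, not that the SSH key is accepted.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach(source_config["host"], source_config["port"])` returning `False` suggests checking network rules before looking at credentials.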

## 8. Save Configuration to Catalog

In [None]:
# Create catalog and schema for configuration
spark.sql("CREATE CATALOG IF NOT EXISTS sftp_demo")
spark.sql("CREATE SCHEMA IF NOT EXISTS sftp_demo.config")

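# Store connection parameters only (no private key material).
# Note: host and username are read from secrets above and are written to the table as plain text.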
config_data = [
    ("source_host", source_config["host"]),
    ("source_username", source_config["username"]),
    ("target_host", target_config["host"]),
    ("target_username", target_config["username"]),
    ("ssh_key_path", "/dbfs/FileStore/ssh-keys/sftp_key")
]

config_df = spark.createDataFrame(config_data, ["key", "value"])
config_df.write.mode("overwrite").saveAsTable("sftp_demo.config.connection_params")

print("Configuration saved to sftp_demo.config.connection_params")

## 9. Verify Configuration

In [None]:
# Display configuration
display(spark.table("sftp_demo.config.connection_params"))
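
Later notebooks may want the saved parameters back as a dictionary. The sketch below shows one way to rebuild it from the table's two columns; `pairs_to_config` is a hypothetical helper, and the duplicate-key check is an added safeguard rather than something the table enforces.

```python
from typing import Dict, Iterable, Tuple


def pairs_to_config(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    config: Dict[str, str] = {}
    for key, value in pairs:
        if key in config:
            raise ValueError(f"duplicate configuration key: {key}")
        config[key] = value
    return config
```

In a notebook cell, the pairs could come from the collected rows: `rows = spark.table("sftp_demo.config.connection_params").collect()` followed by `params = pairs_to_config((r["key"], r["value"]) for r in rows)`.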

## Summary

Infrastructure setup completed:
- ✓ Custom SFTP data source package installed
- ✓ Databricks secrets read from the `sftp-credentials` scope
- ✓ SSH private key verified in DBFS
- ✓ SFTP connections tested
- ✓ Configuration saved to Unity Catalog

Next step: Run notebook `02_uc_connection_setup.ipynb` to configure Unity Catalog connections.