# 00 - Prerequisites: Azure Storage and SFTP Setup

This notebook contains Azure CLI commands to set up the infrastructure prerequisites:
- Create Azure Storage accounts
- Enable SFTP on storage accounts
- Generate SSH keys
- Upload sample CSV files

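**Note:** The `%%bash` cells call the Azure CLI, so run this notebook where `az` is installed and signed in (`az login`), or run the same commands in your local terminal or Azure Cloud Shell. Commands that use `--auth-mode login` also require a data-plane role such as Storage Blob Data Contributor on the storage account.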

## 1. Set Variables

In [None]:
# Configuration variables
import os

RESOURCE_GROUP = "rg-databricks-sftp-demo"
LOCATION = "eastus"
SOURCE_STORAGE = "sftpsourcestorage001"
TARGET_STORAGE = "sftptargetstorage001"
SOURCE_CONTAINER = "source-data"
TARGET_CONTAINER = "target-data"
SFTP_USER = "sftpuser"
# Expand "~" here: %%bash receives the value as a literal argument, so the shell never expands it
SSH_KEY_PATH = os.path.expanduser("~/.ssh/sftp_key")

print(f"Resource Group: {RESOURCE_GROUP}")
print(f"Source Storage: {SOURCE_STORAGE}")
print(f"Target Storage: {TARGET_STORAGE}")
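
Storage account names must be globally unique, 3 to 24 characters long, and made up of lowercase letters and digits only. Checking the format before calling `az` catches a bad name early. The helper below is a minimal sketch; `is_valid_storage_account_name` is a hypothetical name, not part of any Azure SDK, and it cannot check global uniqueness.

```python
import re

# Storage account naming rule: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` matches the format rule (hypothetical helper)."""
    return _STORAGE_NAME_RE.fullmatch(name) is not None


# Check the two account names used in this notebook.
for candidate in ("sftpsourcestorage001", "sftptargetstorage001"):
    print(candidate, is_valid_storage_account_name(candidate))
```

A name that passes this check can still be rejected by Azure if another subscription already uses it.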

## 2. Generate SSH Key Pair

In [None]:
%%bash -s "$SSH_KEY_PATH"

# Generate SSH key pair (RSA 4096-bit, empty passphrase)
ssh-keygen -t rsa -b 4096 -f "$1" -N ""

echo "SSH key pair generated at: $1"
echo "Public key:"
cat "$1.pub"
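
If a key already exists at the target path, `ssh-keygen` asks before overwriting it, which a non-interactive cell cannot answer. One option is to check for an existing pair first and skip generation. This is a minimal sketch; `key_pair_exists` is a hypothetical helper, and the path in the example call is the one assumed in the configuration cell.

```python
from pathlib import Path


def key_pair_exists(private_key_path: str) -> bool:
    """Return True if both the private key and its .pub file exist (hypothetical helper)."""
    private_key = Path(private_key_path).expanduser()
    public_key = Path(str(private_key) + ".pub")
    return private_key.is_file() and public_key.is_file()


# Example call with the key path assumed in the configuration cell.
print(key_pair_exists("~/.ssh/sftp_key"))
```

When the function returns `True`, the generation cell can be skipped and the existing public key reused.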

## 3. Create Resource Group

In [None]:
%%bash -s "$RESOURCE_GROUP" "$LOCATION"

# Create resource group
az group create \
  --name $1 \
  --location $2

echo "Resource group created: $1"

## 4. Create Source Storage Account with SFTP

In [None]:
%%bash -s "$SOURCE_STORAGE" "$RESOURCE_GROUP" "$LOCATION"

# Create source storage account
az storage account create \
  --name $1 \
  --resource-group $2 \
  --location $3 \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true \
  --enable-sftp true

echo "Source storage account created: $1"

## 5. Create Target Storage Account with SFTP

In [None]:
%%bash -s "$TARGET_STORAGE" "$RESOURCE_GROUP" "$LOCATION"

# Create target storage account
az storage account create \
  --name $1 \
  --resource-group $2 \
  --location $3 \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true \
  --enable-sftp true

echo "Target storage account created: $1"

## 6. Create Containers

In [None]:
%%bash -s "$SOURCE_STORAGE" "$SOURCE_CONTAINER" "$TARGET_STORAGE" "$TARGET_CONTAINER"

# Create source container
az storage fs create \
  --name $2 \
  --account-name $1 \
  --auth-mode login

# Create target container
az storage fs create \
  --name $4 \
  --account-name $3 \
  --auth-mode login

echo "Containers created successfully"

## 7. Create SFTP Local User for Source Storage

In [None]:
%%bash -s "$SOURCE_STORAGE" "$SFTP_USER" "$SOURCE_CONTAINER" "$SSH_KEY_PATH" "$RESOURCE_GROUP"

# Read public key
PUBLIC_KEY=$(cat "$4.pub")

# Create SFTP user for source storage
# The home directory and permission scope both point at the source container
az storage account local-user create \
  --account-name $1 \
  --resource-group $5 \
  --name $2 \
  --home-directory $3 \
  --permission-scope permissions=rwdlc service=blob resource-name=$3 \
  --ssh-authorized-key key="$PUBLIC_KEY"

echo "SFTP user created for source storage"
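
The `permissions=rwdlc` value in the permission scope is a string of single-letter grants on the container. The sketch below spells out the five letters used here; `describe_permissions` is a hypothetical helper, and the mapping covers only these letters, so treat it as an illustration rather than a complete list of what Azure accepts.

```python
from typing import List

# Meaning of each letter in the permission string used in this notebook.
PERMISSION_LETTERS = {
    "r": "read",
    "w": "write",
    "d": "delete",
    "l": "list",
    "c": "create",
}


def describe_permissions(permissions: str) -> List[str]:
    """Translate a permission string such as 'rwdlc' into words (hypothetical helper)."""
    unknown = [ch for ch in permissions if ch not in PERMISSION_LETTERS]
    if unknown:
        raise ValueError(f"Letters not covered by this sketch: {unknown}")
    return [PERMISSION_LETTERS[ch] for ch in permissions]


print(describe_permissions("rwdlc"))
```

A read-only consumer could be given `permissions=rl` instead, which limits the user to reading and listing.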

## 8. Create SFTP Local User for Target Storage

In [None]:
%%bash -s "$TARGET_STORAGE" "$SFTP_USER" "$TARGET_CONTAINER" "$SSH_KEY_PATH" "$RESOURCE_GROUP"

# Read public key
PUBLIC_KEY=$(cat "$4.pub")

# Create SFTP user for target storage
# The home directory and permission scope both point at the target container
az storage account local-user create \
  --account-name $1 \
  --resource-group $5 \
  --name $2 \
  --home-directory $3 \
  --permission-scope permissions=rwdlc service=blob resource-name=$3 \
  --ssh-authorized-key key="$PUBLIC_KEY"

echo "SFTP user created for target storage"

## 9. Upload Sample CSV Files to Source Storage

In [None]:
%%bash -s "$SOURCE_STORAGE" "$SOURCE_CONTAINER"

# Upload customers.csv
az storage fs file upload \
  --file-system $2 \
  --account-name $1 \
  --source ../data/customers.csv \
  --path customers.csv \
  --auth-mode login

# Upload orders.csv
az storage fs file upload \
  --file-system $2 \
  --account-name $1 \
  --source ../data/orders.csv \
  --path orders.csv \
  --auth-mode login

echo "Sample CSV files uploaded successfully"

## 10. Get SFTP Connection Details

In [None]:
%%bash -s "$SOURCE_STORAGE" "$TARGET_STORAGE" "$SFTP_USER"

# Get source SFTP endpoint (SFTP connects to the Blob service host, not the dfs host)
SOURCE_ENDPOINT=$(az storage account show \
  --name $1 \
  --query 'primaryEndpoints.blob' -o tsv | sed -e 's|https://||' -e 's|/$||')

# Get target SFTP endpoint (SFTP connects to the Blob service host, not the dfs host)
TARGET_ENDPOINT=$(az storage account show \
  --name $2 \
  --query 'primaryEndpoints.blob' -o tsv | sed -e 's|https://||' -e 's|/$||')

echo "=== SFTP Connection Details ==="
echo ""
echo "Source SFTP:"
echo "  Host: $SOURCE_ENDPOINT"
echo "  Username: $1.$3"
echo "  Port: 22"
echo ""
echo "Target SFTP:"
echo "  Host: $TARGET_ENDPOINT"
echo "  Username: $2.$3"
echo "  Port: 22"
echo ""
echo "SSH Key: ~/.ssh/sftp_key"
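
The same connection details can be assembled in Python, for example to pass them to a later notebook. This is a minimal sketch: `sftp_host` and `sftp_username` are hypothetical helpers, and the endpoint URL in the example is an illustrative value, not output captured from `az`.

```python
from urllib.parse import urlparse


def sftp_host(endpoint_url: str) -> str:
    """Strip the scheme and trailing slash from a storage endpoint URL (hypothetical helper)."""
    return urlparse(endpoint_url).hostname


def sftp_username(storage_account: str, local_user: str) -> str:
    """Build the '<account>.<user>' login name used for SFTP (hypothetical helper)."""
    return f"{storage_account}.{local_user}"


# Illustrative endpoint URL; the real value comes from `az storage account show`.
endpoint = "https://sftpsourcestorage001.blob.core.windows.net/"
print("Host:", sftp_host(endpoint))
print("Username:", sftp_username("sftpsourcestorage001", "sftpuser"))
print("Port:", 22)
```

Parsing the URL with `urlparse` avoids the chained `sed` substitutions and does not depend on whether the URL ends with a slash.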

## 11. Test SFTP Connection

In [None]:
%%bash -s "$SOURCE_STORAGE" "$SFTP_USER" "$SSH_KEY_PATH"

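# Get SFTP endpoint (SFTP connects to the Blob service host, not the dfs host)
SOURCE_ENDPOINT=$(az storage account show \
  --name $1 \
  --query 'primaryEndpoints.blob' -o tsv | sed -e 's|https://||' -e 's|/$||')

# Test connection and list files
# accept-new (OpenSSH 7.6+) trusts the host key on first connect, because a notebook cell cannot answer the prompt
sftp -i "$3" -P 22 -o StrictHostKeyChecking=accept-new "$1.$2@$SOURCE_ENDPOINT" <<EOF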
ls
bye
EOF

echo "SFTP connection test completed"

## Summary

Prerequisites completed:
- ✓ SSH key pair generated
- ✓ Resource group created
- ✓ Source and target storage accounts created with SFTP enabled
- ✓ Containers created
- ✓ SFTP users configured
- ✓ Sample CSV files uploaded

Next steps:
1. Run notebook `01_infrastructure_setup.ipynb` to configure Databricks infrastructure
2. Run notebook `02_uc_connection_setup.ipynb` to set up Unity Catalog connections