**About this notebook**

This notebook scans a Snowflake database schema for Iceberg tables and creates a corresponding [Iceberg shortcut](https://learn.microsoft.com/fabric/onelake/onelake-iceberg-tables) for each one in a Fabric Lakehouse.

**Before you run this notebook**

In Microsoft Fabric, create a cloud connection to the Azure Data Lake Storage (ADLS) Gen2 storage account. The connection must point to the same ADLS Gen2 account that Snowflake uses as an [external volume](https://docs.snowflake.com/user-guide/tables-iceberg) to host Iceberg tables.

Once you have configured a connection, you can use its Connection ID (a unique identifier) across Microsoft Fabric experiences such as external shortcuts, Data Pipelines, and Dataflow Gen2. In this notebook, the Connection ID is used to create external shortcuts. For more information on cloud connections and how to obtain a Connection ID, see the [Azure Data Lake Storage Gen2 connector documentation](https://learn.microsoft.com/fabric/data-factory/connector-azure-data-lake-storage-gen2). You can also create a cloud connection using the [Fabric REST API](https://learn.microsoft.com/fabric/data-factory/connector-rest).

**Input parameters**
- [Connection ID](https://learn.microsoft.com/fabric/data-factory/data-source-management#retrieve-a-data-source-connection-id) - You can retrieve this through either the API or the Fabric web interface. The Connection ID is used to create shortcuts once the notebook has generated the list of Iceberg tables.
- Storage account name - The storage account that Snowflake uses to store Apache Iceberg tables.
- Name of Snowflake database
- Name of Snowflake schema
- Username and password for the Snowflake database - These are used to scan metadata and build the list of Iceberg tables in Snowflake. The credentials only need permission to read this metadata; admin privileges are not required.
- [Snowflake account identifier](https://docs.snowflake.com/user-guide/admin-account-identifier)


In [None]:
# Install Snowflake connector
!pip install -q snowflake-connector-python

In [None]:
# Import libraries
import snowflake.connector
import pandas as pd
import requests, json
from requests.adapters import HTTPAdapter, Retry
import time
from datetime import datetime
import notebookutils
import sempy.fabric as fabric


Use the cell below to set variable values.

In [None]:
# Connection ID of the ADLS Gen2 cloud connection
connection_id = "<replace with connection id>"

# Storage account name on ADLS
storage_account = "<ADLS storage account name used with Snowflake to store iceberg tables>"

# Snowflake variables
# Name of Snowflake database
catalog = '<replace with Snowflake database name>'

# Name of schema on Snowflake database where iceberg tables are deployed
db = '<replace with Snowflake schema name>'


# Attention: This sample defines username and password inline. For production scenarios, please use Azure Key Vault (AKV) for secrets management.

# Username for Snowflake
user = "<replace with Snowflake user>"

# Password for Snowflake
password = "<replace with password for Snowflake user>"

# Snowflake account
account_snow = "<Snowflake account identifier>"
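
For production use, the inline `user` and `password` values can be replaced with a secret-store lookup, as the comment in the cell above recommends. The sketch below is one possible shape, not the notebook's own method: `load_snowflake_credentials`, the vault URL, and the secret names are hypothetical, and the secret reader is passed in as a callable so the helper can be exercised outside Fabric. In a Fabric notebook, `notebookutils.credentials.getSecret` (called with a Key Vault URL and a secret name) is the kind of function you would pass in.

```python
from typing import Callable, Tuple

def load_snowflake_credentials(
    get_secret: Callable[[str, str], str],
    vault_url: str,
    user_secret: str,
    password_secret: str,
) -> Tuple[str, str]:
    """Fetch the Snowflake username and password from a secret store."""
    user = get_secret(vault_url, user_secret)
    password = get_secret(vault_url, password_secret)
    # Fail early rather than attempting a connection with empty credentials
    if not user or not password:
        raise ValueError("Snowflake credentials are missing from the secret store")
    return user, password

# Stand-in secret reader for demonstration only (hypothetical values)
_fake_vault = {
    "snowflake-user": "svc_reader",
    "snowflake-password": "not-a-real-password",
}

def fake_get_secret(vault_url: str, name: str) -> str:
    return _fake_vault.get(name, "")

demo_user, demo_password = load_snowflake_credentials(
    fake_get_secret,
    "https://<your-vault>.vault.azure.net/",
    "snowflake-user",
    "snowflake-password",
)
```

Injecting the reader keeps the secret names and vault URL in one place and avoids hard-coding credentials in the notebook.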

The cells below do not require any user input and should run as-is once the variables in the cell above have been configured. You can modify the code below to extend the notebook's capabilities.
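
Because the later cells assume every variable has been filled in, a quick check for leftover placeholders can catch mistakes before any connection is attempted. This is a minimal sketch: `find_unset_variables` is a hypothetical helper, and it assumes unset values still look like the `<replace with ...>` placeholders used in the configuration cell. The example values are made up.

```python
def find_unset_variables(settings: dict) -> list:
    """Return names whose values are empty or still look like '<...>' placeholders."""
    unset = []
    for name, value in settings.items():
        text = str(value).strip()
        if not text or (text.startswith("<") and text.endswith(">")):
            unset.append(name)
    return unset

# Example with hypothetical values: two settings are still placeholders
example_settings = {
    "connection_id": "<replace with connection id>",
    "storage_account": "mystorageaccount",
    "catalog": "ANALYTICS",
    "db": "<replace with Snowflake schema name>",
}
print(find_unset_variables(example_settings))  # ['connection_id', 'db']
```

In the notebook, you could pass a dictionary of the real variables and stop with an error if the returned list is not empty.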

In [None]:
# Fabric - workspace and lakehouse IDs 
# This is the workspace / lakehouse where iceberg shortcuts will be created
lakehouse_id = fabric.get_lakehouse_id()
workspace_id = fabric.get_notebook_workspace_id()

# Sync configuration
sync_config = {
    "batch_size": 10,
    "max_retries": 3,
    "retry_delay": 5,
    # On a name conflict, create a uniquely named shortcut instead of aborting
    "shortcut_conflict_policy": "GenerateUniqueName",
    "sync_metadata": True,
    "create_sync_log": True
}

In [None]:
# Table discovery and metadata extraction from Snowflake
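def get_snowflake_iceberg_metadata():
    """Return the Iceberg tables in the configured schema with their storage locations"""
    conn = snowflake.connector.connect(
        user=user,
        account=account_snow,
        password=password,
        # insecure_mode=True disables OCSP certificate revocation checks;
        # remove it unless your network blocks OCSP traffic.
        insecure_mode=True
    )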
    
    snow = conn.cursor()
    snow.execute(f"USE {catalog}.{db}")
    
    df = snow.execute(f'''
    SELECT table_name 
    FROM INFORMATION_SCHEMA.TABLES 
    WHERE table_schema = '{db}' 
    AND IS_ICEBERG = 'YES'
''').fetch_pandas_all()

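    # With no Iceberg tables the UNION ALL below would be empty and the query
    # would be invalid, so return early; the caller checks `.empty`.
    if df.empty:
        conn.close()
        return df

    # Look up the metadata of every table in a single query using UNION ALL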
    results_df = snow.execute(f'''
        SELECT 
            catalog, schema_name,
            table_name as name,
            CASE 
                WHEN CONTAINS(info_json, '"metadataLocation"') THEN
                    REGEXP_REPLACE(
                        SPLIT_PART(
                            PARSE_JSON(info_json):metadataLocation::STRING, 
                            '/metadata/', 1
                        ), 
                        '^(abfss://|azure://)[^/]+\\.(blob\\.core\\.windows\\.net|dfs\\.core\\.windows\\.net)/', ''
                    ) || '/'
                ELSE ''
            END as location
        FROM (
            {" UNION ALL ".join([f"SELECT'{catalog}' as catalog,'{db}' as schema_name, '{tbl}' as table_name, SYSTEM$GET_ICEBERG_TABLE_INFORMATION('{tbl}') as info_json" for tbl in df['TABLE_NAME']])}
        )
        WHERE info_json IS NOT NULL
    ''').fetch_pandas_all()
    # Release the Snowflake session before returning
    conn.close()
    return results_df
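
To make the `location` expression in the query easier to follow, here is a plain-Python mirror of its `SPLIT_PART` and `REGEXP_REPLACE` steps. It is an illustration only, not part of the notebook's flow: `table_subpath` is a hypothetical helper, and the sample metadata path is made up.

```python
import re

# Same pattern the SQL uses: scheme plus storage host, up to the first path slash
_HOST_PREFIX = re.compile(
    r"^(abfss://|azure://)[^/]+\.(blob\.core\.windows\.net|dfs\.core\.windows\.net)/"
)

def table_subpath(metadata_location: str) -> str:
    """Return the table folder, relative to the storage host, for a metadata file."""
    # SPLIT_PART(..., '/metadata/', 1): keep everything before the first '/metadata/'
    table_root = metadata_location.split("/metadata/", 1)[0]
    # REGEXP_REPLACE(..., pattern, ''): drop the scheme and storage host
    return _HOST_PREFIX.sub("", table_root) + "/"

sample = (
    "azure://myaccount.blob.core.windows.net/"
    "mycontainer/db/orders/metadata/00001-abc.metadata.json"
)
print(table_subpath(sample))  # mycontainer/db/orders/
```

The result is what the notebook later passes as the shortcut `subpath`. Note that for URIs of the form `abfss://<container>@<account>.dfs.core.windows.net/...`, the container name sits before the host and is stripped along with it.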

In [None]:
# Use OneLake shortcut API to create shortcuts based on iceberg table list.

class FabricSyncManager:
    def __init__(self, workspace_id, lakehouse_id, storage_account, connection_id):
        self.workspace_id = workspace_id
        self.lakehouse_id = lakehouse_id
        self.storage_account = storage_account
        self.connection_id = connection_id
        self.api_base = "api.fabric.microsoft.com/v1"
        self.sync_log = []
        
    def get_auth_headers(self):
        return {
            "Authorization": "Bearer " + notebookutils.credentials.getToken("pbi"),
            "Content-Type": "application/json"
        }
    
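    def invoke_api(self, method, uri, payload=None):
        """Call the Fabric REST API, retrying transient failures"""
        url = f"https://{self.api_base}/{uri}"
        
        session = requests.Session()
        # Note: by default urllib3 applies status-based retries only to
        # idempotent methods, so POST requests are not retried on these codes.
        retries = Retry(
            total=sync_config["max_retries"], 
            backoff_factor=sync_config["retry_delay"], 
            status_forcelist=[429, 502, 503, 504]
        )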
        adapter = HTTPAdapter(max_retries=retries)
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        
        try:
            response = session.request(
                method, url, 
                headers=self.get_auth_headers(), 
                json=payload, 
                timeout=240
            )
            
            return {
                'status_code': response.status_code,
                'response': response.text,
                'success': response.status_code < 400
            }
            
        except requests.RequestException as ex:
            return {
                'status_code': 0,
                'response': str(ex),
                'success': False
            }
    
    def check_existing_shortcuts(self):
        """Check existing shortcuts in lakehouse"""
        uri = f"workspaces/{self.workspace_id}/items/{self.lakehouse_id}/shortcuts"
        result = self.invoke_api("GET", uri)
        
        if result['success']:
            shortcuts_data = json.loads(result['response'])
            return {s['name']: s for s in shortcuts_data.get('value', [])}
        return {}
    
    def create_shortcut(self, table_info):
        """Create a shortcut for one table and record the outcome in the sync log"""
        shortcut_name = table_info['NAME']
        
        payload = {
            "path": "Tables/dbo",
            "name": shortcut_name,
            "target": {
                "adlsGen2": {
                    "location": f"https://{self.storage_account}.dfs.core.windows.net/",
                    "subpath": table_info['LOCATION'],
                    "connectionId": self.connection_id
                }
            },
            "description": f"Snowflake Iceberg table: {table_info['CATALOG']}.{table_info['SCHEMA_NAME']}.{table_info['NAME']}"
        }
        
        uri = f"workspaces/{self.workspace_id}/items/{self.lakehouse_id}/shortcuts?shortcutConflictPolicy={sync_config['shortcut_conflict_policy']}"
        
        result = self.invoke_api("POST", uri, payload)
        
        # Log the sync operation
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'name': shortcut_name,
            'operation': 'create_shortcut',
            'status': 'success' if result['success'] else 'failed',
            'details': result
        }
        self.sync_log.append(log_entry)
        
        return result
    
    def sync_tables(self, tables_df):
        """Create shortcuts for all tables one at a time, skipping existing ones"""
        print(f"Starting sync of {len(tables_df)} tables...")
        
        # Check existing shortcuts
        existing_shortcuts = self.check_existing_shortcuts()
        print(f"Found {len(existing_shortcuts)} existing shortcuts")
        
        success_count = 0
        error_count = 0
        
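        # Count positions explicitly so progress is correct for any DataFrame index
        for position, (_, row) in enumerate(tables_df.iterrows(), start=1):
            name = row['NAME']
            
            print(f"Processing {position}/{len(tables_df)}: {name}")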
            
            if name in existing_shortcuts:
                print(f"  - Shortcut already exists, skipping...")
                continue
            
            if not row['LOCATION']:
                print(f"  - No data location found, skipping...")
                continue
                
            result = self.create_shortcut(row)
            
            if result['success']:
                print(f"  ✓ Successfully created shortcut")
                success_count += 1
            else:
                print(f"  ✗ Failed: {result['response']}")
                error_count += 1
            
            # Rate limiting
            time.sleep(1)
        
        print(f"\nSync completed: {success_count} success, {error_count} errors")
        return self.sync_log
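
To see what `create_shortcut` sends for a single table, the request body can be built and printed without calling the API. The sketch below mirrors the payload from the class above as a standalone function; `build_shortcut_payload` is a hypothetical name, and the sample row, storage account, and connection ID are placeholder values.

```python
import json

def build_shortcut_payload(table_info: dict, storage_account: str, connection_id: str) -> dict:
    """Build the same request body that create_shortcut posts for one table."""
    return {
        "path": "Tables/dbo",
        "name": table_info["NAME"],
        "target": {
            "adlsGen2": {
                "location": f"https://{storage_account}.dfs.core.windows.net/",
                "subpath": table_info["LOCATION"],
                "connectionId": connection_id,
            }
        },
        "description": (
            "Snowflake Iceberg table: "
            f"{table_info['CATALOG']}.{table_info['SCHEMA_NAME']}.{table_info['NAME']}"
        ),
    }

# Hypothetical row, shaped like one row of the metadata DataFrame
sample_row = {
    "CATALOG": "ANALYTICS",
    "SCHEMA_NAME": "SALES",
    "NAME": "ORDERS",
    "LOCATION": "mycontainer/db/orders/",
}
payload = build_shortcut_payload(sample_row, "mystorageaccount", "<connection id>")
print(json.dumps(payload, indent=2))
```

Inspecting the payload this way is a cheap check that `location` and `subpath` together point at the table folder before any shortcut is created.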

Bringing it all together: this last cell calls the functions defined above to create the shortcuts in Microsoft Fabric.

In [None]:
# Run sync
print("=== Snowflake to Fabric Sync ===")
print(f"Catalog: {catalog}, Schema: {db}")
print(f"Target Lakehouse: {lakehouse_id}")
print()

# Get table metadata
print("1. Extracting Iceberg table metadata from Snowflake...")
tables_df = get_snowflake_iceberg_metadata()
print(f"Found {len(tables_df)} Iceberg tables")

if not tables_df.empty:
    display(tables_df[['CATALOG', 'SCHEMA_NAME', 'NAME', 'LOCATION']])
    
    # Initialize sync
    print("\n2. Initializing ...")
    sync_manager = FabricSyncManager(
        workspace_id, lakehouse_id, 
        storage_account, connection_id
    )
    
    # Execute sync
    print("\n3. Syncing tables to Fabric...")
    sync_log = sync_manager.sync_tables(tables_df)
    
    # Display sync results
    print("\n4. Sync Summary:")
    sync_df = pd.DataFrame(sync_log)
    if not sync_df.empty:
        display(sync_df[['timestamp', 'name', 'operation', 'status']])
        
        # Save sync log for audit
        sync_df.to_json(f"sync_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json")
        print("Sync log saved for audit purposes")
    
else:
    print("No Iceberg tables found to sync")

print("\n=== Sync Complete ===")
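
After a run, the sync log can be condensed into counts and a list of failed tables. This is a minimal sketch that assumes log entries shaped like those appended in `create_shortcut` (with `name` and `status` keys); `summarize_sync_log` is a hypothetical helper and the sample entries are made up.

```python
from collections import Counter

def summarize_sync_log(sync_log: list) -> dict:
    """Count outcomes and collect the names of tables whose shortcut failed."""
    counts = Counter(entry["status"] for entry in sync_log)
    failed = [entry["name"] for entry in sync_log if entry["status"] == "failed"]
    return {
        "success": counts.get("success", 0),
        "failed": counts.get("failed", 0),
        "failed_tables": failed,
    }

# Hypothetical log entries for illustration
sample_log = [
    {"name": "ORDERS", "operation": "create_shortcut", "status": "success"},
    {"name": "CUSTOMERS", "operation": "create_shortcut", "status": "failed"},
    {"name": "ITEMS", "operation": "create_shortcut", "status": "success"},
]
print(summarize_sync_log(sample_log))
# {'success': 2, 'failed': 1, 'failed_tables': ['CUSTOMERS']}
```

The list of failed tables gives a starting point for a targeted rerun instead of repeating the whole sync.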