### SQLAlchemy pre-configuration

In [None]:
!pip install snowflake-sqlalchemy ipython-sql

In [None]:
import re
from os.path import expanduser
from snowflake.sqlalchemy import URL

USER_PATH = expanduser("~")
# Set CONNECTION_KEY to the connection name defined in your local SnowSQL configuration file,
# i.e. <name> in a [connections.<name>] section of ~/.snowsql/config
CONNECTION_KEY = "sfcsupport-feature_training"

connection = {}  # connection name -> {parameter: raw value}
config = None    # parameters of the section being read; None while skipping a section
with open(f"{USER_PATH}/.snowsql/config") as f:
    for line in f:
        buffer = line.strip()
        # Skip blank lines and comments
        if not buffer or buffer.startswith("#"):
            continue
        if buffer.startswith("["):
            # Collect [connections.<name>] sections, except the bundled "example" template;
            # any other section (e.g. [options]) is skipped
            result = re.match(r"\[connections\.([^\]]+)\]", buffer)
            if result is not None and result.group(1) != "example":
                config = connection.setdefault(result.group(1), {})
            else:
                config = None
        elif config is not None and "=" in buffer:
            # Split on the first "=" only, so values that contain "=" stay intact
            key, value = buffer.split("=", 1)
            config[key.strip()] = value.strip()

def strip_quotes(v):
    """Remove single and double quotes from a config value; None passes through."""
    if v is not None:
        return v.replace("'", "").replace("\"", "")
    return v

if CONNECTION_KEY not in connection:
    raise KeyError(f"Connection '{CONNECTION_KEY}' not found in {USER_PATH}/.snowsql/config")

params = connection[CONNECTION_KEY]
SNOW_LOCATOR = strip_quotes(params.get("accountname"))
SNOW_USER = strip_quotes(params.get("username"))
SNOW_PASSWD = strip_quotes(params.get("password"))
SNOW_DB = strip_quotes(params.get("database"))
SNOW_WAREHOUSE = strip_quotes(params.get("warehouse"))
SNOW_ROLE = strip_quotes(params.get("role"))

required = {"accountname": SNOW_LOCATOR, "username": SNOW_USER, "password": SNOW_PASSWD,
            "database": SNOW_DB, "warehouse": SNOW_WAREHOUSE, "role": SNOW_ROLE}
missing = [name for name, value in required.items() if value is None]
if missing:
    raise Exception("The following connection parameters are not defined: " + ", ".join(missing))

# URL() assembles the snowflake:// connection string and URL-encodes the password,
# so special characters in the password cannot break the connection string
SNOW_URL = URL(account=SNOW_LOCATOR, user=SNOW_USER, password=SNOW_PASSWD,
               database=SNOW_DB, warehouse=SNOW_WAREHOUSE, role=SNOW_ROLE)

%reload_ext sql
%sql {SNOW_URL}
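The SnowSQL config file is INI-style, so Python's standard-library `configparser` can also read it without a hand-written loop. The following is a minimal alternative sketch, not part of the original lab, assuming every connection is declared as a `[connections.<name>]` section using `key = value` lines; the function name `load_snowsql_connections` and the sample connection values are hypothetical.

```python
import configparser
import os
import tempfile

def load_snowsql_connections(path):
    """Return {connection_name: {parameter: value}} for every [connections.<name>] section."""
    # interpolation=None keeps '%' characters (common in passwords) literal;
    # note that configparser lowercases option names by default
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    connections = {}
    for section in parser.sections():
        if section.startswith("connections.") and section != "connections.example":
            name = section[len("connections."):]
            # Strip the quotes that SnowSQL values are often wrapped in
            connections[name] = {k: v.strip("'\"") for k, v in parser.items(section)}
    return connections

# Usage example with a throwaway config file (hypothetical values)
sample = """
[connections.example]
accountname = "ignored"

[connections.my_training]
accountname = "xy12345.us-west-2"
username = "jdoe"
password = "p%ss=word"

[options]
log_level = DEBUG
"""
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write(sample)
conns = load_snowsql_connections(tmp.name)
os.remove(tmp.name)
print(conns["my_training"]["password"])  # p%ss=word
```

In the notebook you would call it as `load_snowsql_connections(f"{USER_PATH}/.snowsql/config")[CONNECTION_KEY]`.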

### SRR Resources

- Training
    - Engineering
        - [Iceberg Tables Training for Support](https://snowflakecomputing.atlassian.net/wiki/spaces/CustomerSupport/pages/3028713756/Iceberg+Tables+Training+for+Support)
    - Support
        - [Intro to Iceberg Tables - Key Concepts training](https://snowflake.zoom.us/rec/share/997cOWJYC6rBIuyzs2P02oD8f28M7dzWj0ygDG9NuIhw3uti2EUqq8_h1m9VUQ8l.dpBvyZ9p_YEQVkOx)
            - Passcode: <code>AAPQ&W5$</code>    
- Runbook(s)
    - [Iceberg Support Runbook](https://docs.google.com/document/d/18MjH6n3ypi4VEbs5_wa_Vpxgi0XtVWCLW4_EEe7DBLQ/edit#heading=h.1bu2wjpos1k9)
- Product documentation
    - [Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg)

### What is Apache Iceberg?

[Apache Iceberg](https://iceberg.apache.org/docs/latest/) is an open-source table format specification developed for huge analytic datasets. 

### What are Iceberg Tables?

Snowflake Iceberg tables are a table type that follows the [Apache Iceberg table specification](https://iceberg.apache.org/spec/) to represent a large collection of slowly changing files in cloud object storage (Amazon S3, Azure Blob Storage, Google Cloud Storage), with query performance close to that of native Snowflake tables.

The architecture of an Apache Iceberg table consists of three distinct layers:
- [Iceberg Catalog](https://iceberg.apache.org/concepts/catalog/#iceberg-catalogs) - Used to manage a collection of tables and to track the current metadata file of each table
- [Metadata](https://iceberg.apache.org/spec/#specification) - Used to manage table states through a combination of metadata files (JSON), manifest lists (Avro), and manifest files (Avro)
- Data - The data files (for example, Parquet) that hold the rows of each table in the catalog

<div>
<img src="https://iceberg.apache.org/img/iceberg-metadata.png" width="50%"/>
</div>
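To make the metadata layer concrete: a table's metadata file (JSON) names the current snapshot, and each snapshot points to a manifest list (Avro), which in turn lists the manifest files. The sketch below follows that first pointer using the `current-snapshot-id`, `snapshots`, `snapshot-id`, and `manifest-list` fields from the Iceberg table spec. The helper name and the sample metadata are hypothetical, and the sample is trimmed to just those fields.

```python
import json

def current_manifest_list(metadata_json):
    """Return the manifest-list location of the table's current snapshot, or None."""
    metadata = json.loads(metadata_json)
    # The metadata file records which snapshot is current...
    current_id = metadata.get("current-snapshot-id")
    # ...and each snapshot points at exactly one manifest list
    for snapshot in metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == current_id:
            return snapshot.get("manifest-list")
    return None  # e.g. an empty table with no snapshots yet

# Hypothetical, heavily trimmed metadata file
sample = json.dumps({
    "format-version": 2,
    "current-snapshot-id": 222,
    "snapshots": [
        {"snapshot-id": 111, "manifest-list": "s3://bucket/tbl/metadata/snap-111.avro"},
        {"snapshot-id": 222, "manifest-list": "s3://bucket/tbl/metadata/snap-222.avro"},
    ],
})
print(current_manifest_list(sample))  # s3://bucket/tbl/metadata/snap-222.avro
```

Older snapshots stay listed in the metadata file, which is what makes time travel possible; only the current one is followed here.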


Snowflake supports Iceberg tables with externally managed catalogs (<strong>unmanaged</strong>) and natively managed catalogs (<strong>managed</strong>).

The following catalog types are supported for unmanaged Iceberg tables:
- [AWS Glue data catalog](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html)
- Object storage
    - Iceberg
    - Loose Parquet files (private preview)

To create an unmanaged Iceberg table, you will need to configure:
- [EXTERNAL VOLUME](https://docs.snowflake.com/sql-reference/sql/create-external-volume) - Defines the cloud storage location of the table's metadata and data files, and the IAM role Snowflake uses to access it
- [CATALOG INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration) - Defines the details of the external catalog

### Exercise 1: Create an unmanaged AWS Iceberg table (type=OBJECT_STORE)
<u>Prerequisites</u>:
1. You must have a personal AWS S3 bucket in <code>us-west-2</code>
2. You must have an AWS IAM policy that allows read/write access to your personal AWS S3 bucket
3. You must have an AWS IAM role with the IAM policy from prerequisite 2 attached

In [None]:
# Replace <MY_S3_BUCKET> with the name of your bucket in us-west-2
MY_S3_BUCKET='<MY_S3_BUCKET>'
# Replace <MY_IAM_ROLE_ARN> with the ARN of your IAM role
MY_IAM_ROLE_ARN='<MY_IAM_ROLE_ARN>'
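A quick sanity check can catch a placeholder that was never replaced before the values are used in later cells. The following hypothetical helper, `check_aws_inputs`, is a sketch that is not part of the original lab; it uses simplified patterns for S3 bucket names (3 to 63 lowercase letters, digits, dots, or hyphens, starting and ending with a letter or digit) and IAM role ARNs.

```python
import re

# Simplified patterns; S3 and IAM enforce additional rules not checked here
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")

def check_aws_inputs(bucket, role_arn):
    """Return a list of problems with the bucket name and IAM role ARN (empty if none)."""
    problems = []
    if not BUCKET_RE.match(bucket):
        problems.append(f"'{bucket}' does not look like an S3 bucket name")
    if not ROLE_ARN_RE.match(role_arn):
        problems.append(f"'{role_arn}' does not look like an IAM role ARN")
    return problems

# Unreplaced placeholders are reported; well-formed (hypothetical) values pass
print(check_aws_inputs("<MY_S3_BUCKET>", "<MY_IAM_ROLE_ARN>"))
print(check_aws_inputs("my-training-bucket", "arn:aws:iam::123456789012:role/my-iceberg-role"))  # []
```

In the notebook, `check_aws_inputs(MY_S3_BUCKET, MY_IAM_ROLE_ARN)` should return an empty list before you continue.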

#### Step 1: Create an AWS Iceberg table using AWS Athena
NOTE: Run the cell below to generate instructions

In [19]:
from IPython.display import Markdown as md

SOURCE_TABLE='kterada_db.kt_os_iceberg_tpcds_sf10tcl_web_sales'
TARGET_DB_NAME=SNOW_USER + '_db'
TARGET_TABLE_NAME='iceberg_tcpds_sf10tcl_web_sales_t'

here = """
<u>Instructions</u>:
1. Log in to the AWS CE-Sandbox Console via [SnowBiz Okta](https://snowbiz.okta.com/)
2. Navigate to the [AWS Athena query editor](https://us-west-2.console.aws.amazon.com/athena/home?region=us-west-2#/query-editor) in <code>us-west-2</code>
3. Execute the following SQL to create a new database:
```
create database if not exists {username}_db;
```
4. Execute the following SQL to create an AWS Iceberg table:
```
create table if not exists {db_name}.{table_name} with (table_type='iceberg', location='s3://{my_s3_bucket}/iceberg_feature_lab/tcpds_sf10tcl/web_sales/', is_external=false)
as select * from {source_table} limit 1000;
```
""".format(my_s3_bucket=MY_S3_BUCKET, username=SNOW_USER, db_name=TARGET_DB_NAME, table_name=TARGET_TABLE_NAME, source_table=SOURCE_TABLE)

md(here)


<u>Instructions</u>:
1. Log in to the AWS CE-Sandbox Console via [SnowBiz Okta](https://snowbiz.okta.com/)
2. Navigate to the [AWS Athena query editor](https://us-west-2.console.aws.amazon.com/athena/home?region=us-west-2#/query-editor) in <code>us-west-2</code>
3. Execute the following SQL to create a new database:
```
create database if not exists kterada_db;
```
4. Execute the following SQL to create an AWS Iceberg table:
```
create table if not exists kterada_db.iceberg_tcpds_sf10tcl_web_sales_t with (table_type='iceberg', location='s3://<MY_S3_BUCKET>/iceberg_feature_lab/tcpds_sf10tcl/web_sales/', is_external=false)
as select * from kterada_db.kt_os_iceberg_tpcds_sf10tcl_web_sales limit 1000;
```


#### Step 2: Create an external volume

References:
- Snowflake Documentation: [Configure an external volume for Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume)

In [None]:
%%sql
use role support_rl;
create external volume if not exists {SNOW_USER}_ext_vol
    storage_locations = (
        (
            name = 's3_iceberg_unmanaged'
            storage_provider = 's3'
            storage_base_url = 's3://{MY_S3_BUCKET}/'
            storage_aws_role_arn = '{MY_IAM_ROLE_ARN}'
            encryption = (type='aws_sse_s3')
        )
    )
;
desc external volume {SNOW_USER}_ext_vol;

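NOTE: After creating the external volume, update the trust relationship of your AWS IAM role using the `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID` values in the `STORAGE_LOCATION_1` property of the `DESCRIBE` output. Snowflake cannot access the bucket until the role's trust policy allows it to assume the role.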
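The `STORAGE_LOCATION_1` property returned by `desc external volume` holds a JSON value that includes `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID`. The sketch below turns those two values into the trust policy shape Snowflake documents for external volumes, so the printed JSON can be pasted into the role's trust relationship in the IAM console. It assumes the property value parses as a JSON object with those keys; the helper name and sample values are hypothetical.

```python
import json

def trust_policy_from_storage_location(property_value):
    """Build an IAM trust policy that lets Snowflake assume the external volume's role.

    property_value: the JSON string shown for STORAGE_LOCATION_1 in DESC EXTERNAL VOLUME.
    """
    location = json.loads(property_value)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Snowflake-side IAM user that will assume your role
                "Principal": {"AWS": location["STORAGE_AWS_IAM_USER_ARN"]},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused-deputy problem
                "Condition": {
                    "StringEquals": {"sts:ExternalId": location["STORAGE_AWS_EXTERNAL_ID"]}
                },
            }
        ],
    }

# Hypothetical DESC output value, trimmed to the two fields used here
sample = ('{"STORAGE_AWS_IAM_USER_ARN": "arn:aws:iam::111122223333:user/abc-s-example", '
          '"STORAGE_AWS_EXTERNAL_ID": "EXAMPLE_EXTERNAL_ID"}')
policy = trust_policy_from_storage_location(sample)
print(json.dumps(policy, indent=2))
```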

#### Step 3: Create a catalog integration

References:
- Snowflake Documentation: [Configure a catalog integration for Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration)

##### OBJECT STORAGE (ICEBERG)

In [None]:
%%sql

use role support_rl;

create catalog integration if not exists {SNOW_USER}_os_iceberg_cat_int
    catalog_source = object_store
    table_format = iceberg
    enabled = true
;
desc integration {SNOW_USER}_os_iceberg_cat_int;

#### Step 4: Create the unmanaged Snowflake Iceberg table

References:
- Snowflake Documentation: [Create an Iceberg table](https://docs.snowflake.com/user-guide/tables-iceberg-create)

In [None]:
%%sql

use role support_rl;
create warehouse if not exists {SNOW_USER}_wh warehouse_size='xsmall';
create database if not exists {SNOW_USER}_db;
create schema if not exists {SNOW_USER}_db.iceberg;

##### OBJECT STORE (ICEBERG)

NOTE: Before running the `create iceberg table` command, replace `<MY_ICEBERG_TABLE_METADATA_FILENAME>` with the name of the `*.metadata.json` file created in Step 1. You can find it by inspecting the `iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/` path in your S3 bucket.
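The `metadata_file_path` parameter is relative to the external volume's storage base URL (here `s3://<bucket>/`), so a full S3 URI copied from the console has to be trimmed first. A minimal sketch; the helper name, bucket, and file name below are hypothetical.

```python
def relative_metadata_path(metadata_uri, storage_base_url):
    """Return metadata_uri relative to the external volume's STORAGE_BASE_URL."""
    # Normalize the base so it ends with exactly one "/"
    base = storage_base_url.rstrip("/") + "/"
    if not metadata_uri.startswith(base):
        raise ValueError(f"{metadata_uri} is not under {base}")
    return metadata_uri[len(base):]

# Hypothetical bucket and metadata file name
uri = ("s3://my-training-bucket/iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/"
       "00000-example.metadata.json")
print(relative_metadata_path(uri, "s3://my-training-bucket/"))
# iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/00000-example.metadata.json
```

Raising on a mismatched prefix surfaces a file from the wrong bucket before Snowflake rejects the path.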

In [None]:
%%sql

use role support_rl;
create iceberg table if not exists {SNOW_USER}_db.iceberg.{SNOW_USER}_unmanaged_os_iceberg_web_sales_t
    external_volume = {SNOW_USER}_ext_vol
    catalog = {SNOW_USER}_os_iceberg_cat_int
    metadata_file_path = 'iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/<MY_ICEBERG_TABLE_METADATA_FILENAME>'
;

In [None]:
%%sql

use role support_rl;
use warehouse {SNOW_USER}_wh;
select
    *
from {SNOW_USER}_db.iceberg.{SNOW_USER}_unmanaged_os_iceberg_web_sales_t
where true
;

#### Step 5: Update the AWS Iceberg table
NOTE: Run the cell below to generate instructions

In [21]:
from IPython.display import Markdown as md

here = """
<u>Instructions</u>:
1. In the AWS CE-Sandbox Console, navigate to the [AWS Athena query editor](https://us-west-2.console.aws.amazon.com/athena/home?region=us-west-2#/query-editor) in <code>us-west-2</code>
2. Execute the following SQL to insert additional records into the AWS Iceberg table:
```
insert into {db_name}.{table_name} select * from (select * from {source_table} except select * from {db_name}.{table_name}) limit 1000;
```
""".format(my_s3_bucket=MY_S3_BUCKET, username=SNOW_USER, db_name=TARGET_DB_NAME, table_name=TARGET_TABLE_NAME, source_table=SOURCE_TABLE)

md(here)


<u>Instructions</u>:
1. In the AWS CE-Sandbox Console, navigate to the [AWS Athena query editor](https://us-west-2.console.aws.amazon.com/athena/home?region=us-west-2#/query-editor) in <code>us-west-2</code>
2. Execute the following SQL to insert additional records into the AWS Iceberg table:
```
insert into kterada_db.iceberg_tcpds_sf10tcl_web_sales_t select * from (select * from kterada_db.kt_os_iceberg_tpcds_sf10tcl_web_sales except select * from kterada_db.iceberg_tcpds_sf10tcl_web_sales_t) limit 1000;
```


#### Step 6: Manually refresh the unmanaged Snowflake Iceberg table

NOTE: Before running the `alter iceberg table ... refresh` command, replace `<MY_ICEBERG_TABLE_METADATA_FILENAME>` with the name of the new `*.metadata.json` file created in Step 5, found in the `iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/` path of your S3 bucket. (HINT: Look for the JSON file with the most recent timestamp.)
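Following the hint, the newest metadata file can also be picked programmatically from an S3 object listing. The sketch below assumes entries shaped like the `Contents` items returned by boto3's S3 client `list_objects_v2()` (each with `Key` and `LastModified`); the helper name and the sample listing are hypothetical.

```python
from datetime import datetime, timezone

def latest_metadata_key(objects):
    """Pick the most recently modified *.metadata.json key from S3 object entries."""
    # Ignore manifest lists, manifests, and other non-metadata files
    candidates = [o for o in objects if o["Key"].endswith(".metadata.json")]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o["LastModified"])["Key"]

# Hypothetical listing; in practice something like
# boto3.client("s3").list_objects_v2(Bucket=..., Prefix=...)["Contents"]
prefix = "iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/"
listing = [
    {"Key": prefix + "00000-a.metadata.json",
     "LastModified": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)},
    {"Key": prefix + "snap-1-a.avro",
     "LastModified": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)},
    {"Key": prefix + "00001-b.metadata.json",
     "LastModified": datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)},
]
print(latest_metadata_key(listing))  # ends with 00001-b.metadata.json
```

A single `list_objects_v2` call returns at most 1,000 keys, so a long-lived table may need a paginator to see every metadata file.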

References:
- Snowflake Documentation: [Manage an Iceberg table](https://docs.snowflake.com/user-guide/tables-iceberg-manage)

In [None]:
%%sql

alter iceberg table {SNOW_USER}_db.iceberg.{SNOW_USER}_unmanaged_os_iceberg_web_sales_t refresh 'iceberg_feature_lab/tcpds_sf10tcl/web_sales/metadata/<MY_ICEBERG_TABLE_METADATA_FILENAME>';