# Verify Polaris Setup

This notebook checks whether the Apache Polaris setup succeeded and whether we can work with the catalog, e.g. create namespaces and tables.

## Imports

In [2]:
import os
import traceback
from pathlib import Path

import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError
from pyiceberg.types import StringType

## Retrieve Principal Credentials
As part of the catalog setup script, the credentials of the principal (`super_user`) are stored in `$PROJECT_HOME/work/principal.txt`. Let us retrieve them for further operations.

In [3]:

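# The first line of principal.txt is read as: realm,client_id,client_secret
principal_creds = Path(os.getcwd()).parent.joinpath("work", "principal.txt")
with open(principal_creds, "r") as file:
    # strip() drops the trailing newline so it does not end up in client_secret
    realm, client_id, client_secret = file.readline().strip().split(",")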
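
The parsing in the cell above assumes the line holds exactly three comma-separated fields. A minimal sketch of a stricter parser follows; `parse_principal_line` is a hypothetical helper name, not part of the setup scripts, and the `realm,client_id,client_secret` layout is taken from the cell above. It trims surrounding whitespace and fails early when a field is missing.

```python
def parse_principal_line(line: str) -> tuple[str, str, str]:
    """Split a 'realm,client_id,client_secret' line into its three fields.

    Hypothetical helper: the field order is assumed from the notebook cell above.
    """
    # Trim the line (including a trailing newline) and each individual field.
    parts = [part.strip() for part in line.strip().split(",")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'realm,client_id,client_secret', got {len(parts)} field(s)"
        )
    realm, client_id, client_secret = parts
    return realm, client_id, client_secret
```

It could be used in place of the inline split, e.g. `realm, client_id, client_secret = parse_principal_line(file.readline())`.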

## Define Variables
Let us define some variables for use across the notebook.

In [4]:
# database
namespace = "balloon_pops"
# IMPORTANT: use /api/catalog, or get the prefix from your OpenCatalog instance
CATALOG_URI = "http://localhost:18181/api/catalog"
catalog_name = "balloon-game"

## Working with Catalog
Let us connect to the catalog named in `catalog_name` (here `balloon-game`) that we created earlier using the `catalog_setup.yml` script.

In [5]:
catalog = RestCatalog(
    name=catalog_name,
    **{
        "uri": CATALOG_URI,
        "credential": f"{client_id}:{client_secret}",
        "header.content-type": "application/vnd.api+json",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "header.Polaris-Realm": realm,
        "warehouse": catalog_name,
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)
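
The properties passed to `RestCatalog` are plain strings, so they can be assembled and inspected before any network call is made. The sketch below uses a hypothetical helper, `polaris_catalog_properties`, that builds the same dictionary as the cell above; the property keys are copied from that cell, and the helper name and its parameters are assumptions made for illustration.

```python
def polaris_catalog_properties(
    uri: str, warehouse: str, realm: str, client_id: str, client_secret: str
) -> dict[str, str]:
    """Build the REST catalog properties used in this notebook (hypothetical helper)."""
    return {
        "uri": uri,
        # OAuth2 client credentials in the "<client_id>:<client_secret>" form
        "credential": f"{client_id}:{client_secret}",
        "header.content-type": "application/vnd.api+json",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "header.Polaris-Realm": realm,
        "warehouse": warehouse,
        "scope": "PRINCIPAL_ROLE:ALL",
    }
```

With this in place the catalog could be created as `RestCatalog(name=catalog_name, **polaris_catalog_properties(CATALOG_URI, catalog_name, realm, client_id, client_secret))`.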



### Create Namespace
Create the namespace held in `namespace` (`balloon_pops`) if it does not already exist.

In [6]:
try:
    catalog.create_namespace(namespace)
except NamespaceAlreadyExistsError:
    print(f"Namespace '{namespace}' already exists")
except Exception as e:
    print(e)

Namespace 'balloon_pops' already exists
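
An alternative to catching the exception is to check for the namespace first. The sketch below is one way to do that, assuming a top-level namespace and that `catalog.list_namespaces()` returns identifiers as tuples such as `("balloon_pops",)`, as PyIceberg catalogs do. `ensure_namespace` is a hypothetical helper name. Note that check-then-create is not atomic, so the `try`/`except` form above remains the safer choice when several clients may create the namespace concurrently.

```python
def ensure_namespace(catalog, namespace: str) -> bool:
    """Create `namespace` if it is missing; return True when it was created.

    Hypothetical helper: assumes a top-level namespace and a catalog object
    exposing list_namespaces() and create_namespace().
    """
    existing = {tuple(identifier) for identifier in catalog.list_namespaces()}
    if (namespace,) in existing:
        return False
    catalog.create_namespace(namespace)
    return True
```

For example, `ensure_namespace(catalog, namespace)` would return `False` in the run shown above, since `balloon_pops` already exists.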


### Load Table
Let us load the `leaderboard` table from the namespace.

In [7]:
try:
    table = catalog.load_table(f"{namespace}.leaderboard")
    print(table)
except Exception as e:
    print(e)

leaderboard(
  1: player: optional string,
  2: total_score: optional long,
  3: bonus_hits: optional long (Total number of bonus hits popped by the player)
),
partition by: [],
sort order: [],
snapshot: Operation.APPEND: id=2, parent_id=1, schema_id=2


In [8]:
df = table.scan().to_pandas()
print(df.head(10))

           player  total_score  bonus_hits
0  Bouncy Balloon         6190         NaN
1      Wild Cloud         3380         NaN
2   Lucky Phoenix         4520         NaN
3     Mighty Star         3100         NaN
4   Gentle Dragon         3595         NaN
5      Swift Star         2785         NaN
6   Cosmic Dragon         4275         NaN
7     Wild Spirit         4205         NaN
8      Lucky Star        13240         NaN
9  Bouncy Phoenix        10140         NaN


## Schema Evolution

An optional example of schema evolution with the `leaderboard` table. The table was defined with two columns, `player` and `total_score`, but as part of my analytics I decided to add `bonus_hits`. The sink from Rising will now fail, because the query returns three columns whereas the target table has two.

**Solution**: evolve the schema to accommodate the new column :)

In [None]:
from pyiceberg.types import LongType  # int64

# The schema change is committed when the `with` block exits.
with table.update_schema() as update:
    update.add_column("bonus_hits", LongType(), "Total number of bonus hits popped by the player")
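
Re-running the cell above is likely to fail once the column exists. A minimal sketch of a guard follows; `add_column_if_missing` is a hypothetical helper name, and it assumes the table object exposes `schema().column_names` and `update_schema()` the way a PyIceberg `Table` does.

```python
def add_column_if_missing(table, name, field_type, doc=None) -> bool:
    """Add an optional column unless the current schema already has it.

    Hypothetical helper; returns True when the column was added.
    """
    if name in table.schema().column_names:
        return False
    # The update is committed when the `with` block exits without an error.
    with table.update_schema() as update:
        update.add_column(name, field_type, doc)
    return True
```

In this notebook it could be called as `add_column_if_missing(table, "bonus_hits", LongType(), "Total number of bonus hits popped by the player")`.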

Scanning the table again now returns the additional column, with null values for the existing rows.

In [16]:
df = table.scan().to_pandas()
print(df.head(10))

           player  total_score  bonus_hits
0  Bouncy Balloon         6190         NaN
1      Wild Cloud         3380         NaN
2   Lucky Phoenix         4520         NaN
3     Mighty Star         3100         NaN
4   Gentle Dragon         3595         NaN
5      Swift Star         2785         NaN
6   Cosmic Dragon         4275         NaN
7     Wild Spirit         4205         NaN
8      Lucky Star        13240         NaN
9  Bouncy Phoenix        10140         NaN


Let's recreate the sink.

In [24]:
table.inspect.snapshots()


pyarrow.Table
committed_at: timestamp[ms] not null
snapshot_id: int64 not null
parent_id: int64
operation: string
manifest_list: string not null
summary: map<string, string>
  child 0, entries: struct<key: string not null, value: string> not null
      child 0, key: string not null
      child 1, value: string
----
committed_at: [[2025-02-16 15:42:46.404,2025-02-16 16:28:13.910]]
snapshot_id: [[1,2]]
parent_id: [[null,1]]
operation: [["append","append"]]
manifest_list: [["s3://balloon-game/balloon_pops/leaderboard/metadata/snap-1-1-9fd29ce6-fa20-4dde-949c-7e8643fbdbf7.avro","s3://balloon-game/balloon_pops/leaderboard/metadata/snap-2-1-3b3b2a39-101b-4dd8-a609-2790c062d733.avro"]]
summary: [[keys:["added-delete-files","added-data-files","added-position-delete-files","total-delete-files","total-records",...,"added-records","added-position-deletes","added-equality-delete-files","total-position-deletes","total-files-size"]values:["0","4","0","0","18",...,"18","0","0","0","4822"],keys:["tota

In [10]:
try:
    table2 = catalog.load_table(f"{namespace}.realtime_scores")
    print(table2)
except Exception as e:
    print(e)

realtime_scores(
  1: player: optional string,
  2: total_score: optional long,
  3: window_start: optional timestamptz,
  4: window_end: optional timestamptz
),
partition by: [],
sort order: [],
snapshot: Operation.APPEND: id=1, schema_id=0


In [11]:
df = table2.scan().to_pandas()
print(df.head(10))

           player  total_score              window_start  \
0   Cosmic Dragon          125 2025-02-16 15:08:45+00:00   
1     Wild Spirit          120 2025-02-16 15:04:45+00:00   
2  Bouncy Balloon          120 2025-02-16 14:37:30+00:00   
3   Lucky Phoenix          150 2025-02-16 14:33:15+00:00   
4      Lucky Star          340 2025-02-16 14:35:15+00:00   
5      Lucky Wind           80 2025-02-16 14:36:00+00:00   
6   Lucky Balloon           70 2025-02-16 14:36:45+00:00   
7   Swift Balloon          220 2025-02-16 14:41:30+00:00   
8     Bouncy Star          185 2025-02-16 15:04:00+00:00   
9      Lucky Wind          110 2025-02-16 14:36:30+00:00   

                 window_end  
0 2025-02-16 15:09:00+00:00  
1 2025-02-16 15:05:00+00:00  
2 2025-02-16 14:37:45+00:00  
3 2025-02-16 14:33:30+00:00  
4 2025-02-16 14:35:30+00:00  
5 2025-02-16 14:36:15+00:00  
6 2025-02-16 14:37:00+00:00  
7 2025-02-16 14:41:45+00:00  
8 2025-02-16 15:04:15+00:00  
9 2025-02-16 14:36:45+00:00  
