# Verify Polaris Setup

This notebook checks that the Apache Polaris setup succeeded and that we can work with the catalog, e.g. create namespaces and tables.

## Imports

In [None]:
import os
import traceback
from pathlib import Path

import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError
from pyiceberg.types import StringType

## Retrieve Principal Credentials
As part of the catalog setup script, the credentials of the principal (`super_user`) are stored in `$PROJECT_HOME/work/principal.txt`. Let us retrieve them for further operations.

In [None]:

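# The first line of the file is unpacked as realm, client_id, client_secret.
principal_creds = Path(os.getcwd()).parent.joinpath("work", "principal.txt")
with open(principal_creds, "r") as file:
    # strip() drops the trailing newline that would otherwise end up in client_secret
    realm, client_id, client_secret = file.readline().strip().split(",")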
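
If the credentials file is malformed, the unpacking above fails with a fairly opaque error. The following is a minimal sketch of a more defensive parser. It assumes the three-field, comma-separated layout implied by the unpacking in this notebook; `parse_principal_line` and `read_principal` are hypothetical helper names, not part of any library.

```python
from pathlib import Path


def parse_principal_line(line: str) -> tuple[str, str, str]:
    """Split a 'realm,client_id,client_secret' line into its three parts.

    Hypothetical helper: the field order is assumed from this notebook.
    """
    # Remove the trailing newline and any whitespace around each field.
    parts = [part.strip() for part in line.strip().split(",")]
    if len(parts) != 3 or not all(parts):
        # Do not echo the line itself, since it contains a secret.
        raise ValueError("expected exactly three non-empty comma-separated fields")
    return parts[0], parts[1], parts[2]


def read_principal(path: Path) -> tuple[str, str, str]:
    """Read the first line of the credentials file and parse it."""
    with open(path, "r") as file:
        return parse_principal_line(file.readline())
```

With these helpers, the cell above could be written as `realm, client_id, client_secret = read_principal(principal_creds)`.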

## Define Variables
Let us define some variables for use across the notebook.

In [None]:
namespace = "demo_db"
table_name = "fruits"
# IMPORTANT: the URI must end with /api/catalog, or use the prefix from your OpenCatalog instance
CATALOG_URI = "http://localhost:18181/api/catalog"
catalog_name = "balloon-game"

## Working with Catalog
Let us retrieve the catalog that we created earlier using the `catalog_setup.yml` script. Its name is the value of `catalog_name` defined above.

In [None]:
catalog = RestCatalog(
    name=catalog_name,
    **{
        "uri": CATALOG_URI,
        "credential": f"{client_id}:{client_secret}",
        "header.content-type": "application/vnd.api+json",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "header.Polaris-Realm": realm,
        "warehouse": catalog_name,
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)
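
When debugging connection problems, it helps to inspect these properties without printing the client secret. The following is a minimal sketch under that assumption; `build_catalog_properties` and `redact` are hypothetical helpers (not part of PyIceberg), and the keys simply mirror the ones passed to `RestCatalog` above.

```python
def build_catalog_properties(uri, realm, client_id, client_secret, warehouse):
    """Assemble the same properties this notebook passes to RestCatalog."""
    return {
        "uri": uri,
        "credential": f"{client_id}:{client_secret}",
        "header.content-type": "application/vnd.api+json",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "header.Polaris-Realm": realm,
        "warehouse": warehouse,
        "scope": "PRINCIPAL_ROLE:ALL",
    }


def redact(properties):
    """Return a copy that is safe to print: the credential is masked."""
    return {
        key: ("***" if key == "credential" else value)
        for key, value in properties.items()
    }
```

For example, `print(redact(build_catalog_properties(CATALOG_URI, realm, client_id, client_secret, catalog_name)))` shows the effective configuration, and the unredacted dictionary can be passed to `RestCatalog(name=catalog_name, **properties)`.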

### Create Namespace
Create a new namespace named `demo_db`

In [None]:
try:
    catalog.create_namespace(namespace)
except NamespaceAlreadyExistsError:
    print(f"Namespace '{namespace}' already exists")
except Exception as e:
    print(e)

### Create Table
Create a table named `fruits` with two columns.

In [None]:
_schema = pa.schema(
    [
        pa.field("id", pa.int64(), nullable=False),
        pa.field("name", pa.string(), nullable=True),
    ]
)
try:
    new_tbl = catalog.create_table(
        identifier=f"{namespace}.{table_name}",
        schema=_schema,
    )
    print(new_tbl)
except TableAlreadyExistsError:
    print(f"Table '{table_name}' already exists")
except Exception as e:
    print(e)
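
Because the cells above only print errors, it can be useful to confirm that the namespace and table are actually registered in the catalog. The following is a minimal sketch using the `list_namespaces` and `list_tables` methods of the PyIceberg catalog, which return identifiers as tuples of strings. The helper names `namespace_exists` and `list_table_names` are hypothetical, and the sketch assumes a single-level namespace such as `demo_db`.

```python
def namespace_exists(catalog, namespace: str) -> bool:
    """Return True if `namespace` is among the catalog's top-level namespaces."""
    # Each namespace identifier is a tuple, e.g. ("demo_db",).
    return (namespace,) in [tuple(ns) for ns in catalog.list_namespaces()]


def list_table_names(catalog, namespace: str) -> list[str]:
    """Return the bare table names registered under `namespace`."""
    # Each table identifier is a tuple ending in the table name,
    # e.g. ("demo_db", "fruits").
    return [identifier[-1] for identifier in catalog.list_tables(namespace)]
```

In this notebook, `namespace_exists(catalog, namespace)` and `table_name in list_table_names(catalog, namespace)` should both evaluate to `True` once the two cells above have run successfully.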

### Load Table
Let us load the created table and preview its rows.

In [None]:
try:
    table = catalog.load_table(f"{namespace}.{table_name}")
    df = table.scan().to_pandas()
    print(df.head())
except Exception as e:
    print(e)

### Insert Data
Insert some fruit data.

In [None]:
try:
    data = pa.Table.from_pylist(
        [
            {"id": 1, "name": "mango"},
            {"id": 2, "name": "banana"},
            {"id": 3, "name": "orange"},
        ],
        schema=_schema,
    )
    table.append(data)
except Exception:
    print(traceback.format_exc())
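
Note that `append` adds rows every time the cell runs, so re-running the notebook duplicates the three fruits. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to skip the insert whenever the table already holds any rows. `append_if_empty` is a hypothetical helper; it relies only on `table.scan().to_arrow()` and `table.append()`.

```python
def append_if_empty(table, data) -> bool:
    """Append `data` only when the table currently holds no rows.

    Returns True if rows were appended, False if the table already had data.
    """
    # Reading the whole table is fine for a small demo table,
    # but would be wasteful on a large one.
    existing_rows = table.scan().to_arrow().num_rows
    if existing_rows > 0:
        return False
    table.append(data)
    return True
```

Calling `append_if_empty(table, data)` in place of `table.append(data)` keeps repeated runs from growing the table.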

### Query Data
Query the inserted data.

In [None]:
df = table.scan().to_pandas()
df.head(10)

### Schema Evolution
Let us now add a new column named `season` to the `fruits` table.

In [None]:

with table.update_schema() as update:
    update.add_column("season", StringType(), doc="Fruit Season")

Print the table to view its structure and other details.

In [None]:
print(table)
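
To check for the new column programmatically rather than by reading the printed output, the table schema can be inspected directly. The following is a minimal sketch, assuming the PyIceberg `table.schema()` method and the `name`, `field_type`, and `required` attributes of its fields. `describe_columns` is a hypothetical helper name.

```python
def describe_columns(table) -> list[tuple[str, str, bool]]:
    """Return (name, type, required) for each top-level column of the table."""
    return [
        (field.name, str(field.field_type), field.required)
        for field in table.schema().fields
    ]
```

After the schema update, `"season" in [name for name, _, _ in describe_columns(table)]` should evaluate to `True`.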

If you query the table again, the query still works and returns the new column with null values for the existing rows.

In [None]:
df = table.scan().to_pandas()
df.head()

Let us write season data using the updated schema. Note that `overwrite` replaces the existing rows rather than appending to them.

In [None]:
new_schema = _schema.append(pa.field("season", pa.string(), nullable=True))
# New data with season column
new_table = pa.Table.from_pylist(
    [
        {"id": 1, "name": "mango", "season": "summer"},
        {"id": 2, "name": "banana", "season": "all"},
        {"id": 3, "name": "orange", "season": "winter"},
    ],
    schema=new_schema,
)
table.overwrite(new_table)

Querying again now shows the updated data.

In [None]:
df = table.scan().to_pandas()
df.head()