# PyIceberg 🐍 + Polars 🐻‍❄️ Guide
Hey, welcome! 

This guide gets you started with Polars and PyIceberg on Tabular.

### Installation:
- clone this repo (just do it, it takes 2 seconds)
- cd into this folder
- install these Python packages: `pyiceberg pyarrow numpy polars`. If you use pipenv like I do (`pip install pipenv`), you can just run `pipenv install` and it'll handle everything.

See, that wasn't so bad.

### Tabular Requirements:
- head over to app.tabular.io and log in (or sign up if you don't already have an account)
- go to connections > security > service account and hit the big + button to create a new credential
- assign your service account credential to a role that has the correct access for what you want to do (if you don't know, `EVERYONE` is a pretty safe default)
- copy that credential!
- come back here and create a `.env` file in this directory (`guides/pyiceberg/.env`). Edit it to look like below and make sure to SAVE IT.
```
TABULAR_CREDENTIAL=t-asdf:1234
```
⬆️ replace `t-asdf:1234` with the Tabular credential you just created.

Good job! Now we're ready to get down to business 💪

### Starting Jupyter Lab:
- Seriously, make sure you save that env file. 
- pipenv users can just run `pipenv run jupyter lab` to fire up Jupyter Lab. pipenv will load your credential from the `.env` file for you and all will be well.
- if this is scary, you can ignore the `.env` file and just paste your credential in plaintext directly in this notebook -- but you should feel bad about your craftsmanship.
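
If you're not using pipenv, nothing reads the `.env` file for you. Here's a minimal sketch of loading it by hand with only the standard library. `load_env_file` is a hypothetical helper (it isn't part of this repo or of PyIceberg), and it only understands plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical helper for illustration only. Variables that are already
    set in the environment are left alone, so a value exported in your
    shell wins over the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}

    loaded = {}
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" or ":"
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Call `load_env_file()` at the top of the notebook, before anything reads `os.environ['TABULAR_CREDENTIAL']`.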


*One last note* -- you definitely don't have the weather data that I have in your warehouse. It's just an example. Connect to whatever data you want to.

Happy building! 🧊

In [5]:
import os, time

from pyiceberg.catalog import load_catalog
import polars as pl

# You'll need a tabular credential. Member credential or service account will work fine
TABULAR_CREDENTIAL       = os.environ['TABULAR_CREDENTIAL']
TABULAR_TARGET_WAREHOUSE = 'enterprise_data_warehouse' # replace this with your tabular warehouse name
TABULAR_CATALOG_URI      = 'https://api.tabular.io/ws' # unless you're a single tenant user, you don't need to change this

catalog_properties = {
    'uri':        TABULAR_CATALOG_URI,
    'credential': TABULAR_CREDENTIAL,
    'warehouse':  TABULAR_TARGET_WAREHOUSE
}
catalog = load_catalog(**catalog_properties)
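
Since your warehouse won't have my weather table, it helps to see what it does have. Here's a rough sketch that walks the catalog and collects table names. `list_all_tables` is a made-up helper name, and it assumes the catalog's `list_namespaces()` and `list_tables()` return tuples of strings, which is how PyIceberg catalogs represent identifiers. It only looks at top-level namespaces.

```python
def list_all_tables(catalog):
    """Return every table in the catalog as a dotted 'namespace.table' string.

    Hypothetical helper for illustration; nested namespaces are not walked.
    """
    names = []
    for namespace in catalog.list_namespaces():
        # Each identifier is a tuple such as ('batch_raw', 'serverless_weather_raw')
        for identifier in catalog.list_tables(namespace):
            names.append(".".join(identifier))
    return sorted(names)
```

Running `print(list_all_tables(catalog))` in a new cell should give you names you can pass straight to `catalog.load_table(...)`.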

In [29]:
# load the weather data 🌞
tbl = catalog.load_table("batch_raw.serverless_weather_raw")
df = pl.scan_iceberg(tbl).unnest('main')

# Get the average temp by city over the last couple hours
few_hours_ago = int(time.time() - 2*60*60) # 2 hours ago
df_indy = pl.SQLContext(frame=df).execute(
    f"""
    select 
        name as city, 
        avg(temp) as avg_recent_temp_f,
        max(dt) as data_last_loaded_at

    from frame

    where dt > {few_hours_ago}

    group by name

    order by data_last_loaded_at desc
    """
)

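# dt holds epoch seconds. pl.Datetime defaults to microsecond precision,
# so scale seconds to microseconds before casting.
df_indy_with_time = df_indy.with_columns([
    pl.col("data_last_loaded_at").cast(pl.Int64) * 1_000_000
]).with_columns([
    # The cast gives a naive datetime, then we shift it to Eastern time
    # and format it. Note %I is the 12-hour clock; add %p if you want AM/PM.
    pl.col(
        "data_last_loaded_at"
    ).cast(
        pl.Datetime
    ).dt.convert_time_zone(
        'America/New_York'
    ).dt.strftime("%Y-%m-%d %I:%M:%S")
])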


print('Polars 🐻‍❄️ average temp by city over the last few hours:')
# collect() runs the lazy query; print the resulting DataFrame
print(df_indy_with_time.collect())

Polars 🐻‍❄️ average temp by city over the last few hours:
shape: (9, 3)
┌──────────────┬───────────────────┬─────────────────────┐
│ city         ┆ avg_recent_temp_f ┆ data_last_loaded_at │
│ ---          ┆ ---               ┆ ---                 │
│ str          ┆ f64               ┆ str                 │
╞══════════════╪═══════════════════╪═════════════════════╡
│ Tokyo        ┆ 53.383636         ┆ 2024-02-20 12:56:33 │
│ New York     ┆ 34.834091         ┆ 2024-02-20 12:56:18 │
│ Paris        ┆ 49.161364         ┆ 2024-02-20 12:54:46 │
│ St Louis     ┆ 54.140952         ┆ 2024-02-20 12:51:49 │
│ Nashville    ┆ 56.29             ┆ 2024-02-20 12:51:32 │
│ San Jose     ┆ 54.96             ┆ 2024-02-20 12:51:15 │