# PyIceberg 🐍 + Polars 🐻‍❄️ Guide
Hey, welcome! 

This guide gets you started with Polars and PyIceberg on Tabular.

### Installation:
- clone this repo (just do it, it takes 2 seconds)
- cd into this folder
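- install these Python packages: `pyiceberg pyarrow numpy polars` (plus `jupyterlab` if you don't already have it, since we'll be running the notebook in Jupyter Lab). If you use pipenv like I do (`pip install pipenv`), you can just run `pipenv install` and it'll handle everything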

See, that wasn't so bad.

### Tabular Requirements:
- head over to app.tabular.io and log in (or sign up if you don't already have an account)
- go to connections > security > service account and hit the big + button to create a new credential
- assign your service account credential to a role that has the correct access for what you want to do (if you don't know, `EVERYONE` is a pretty safe default)
- copy that credential!
- come back here and create a `.env` file in this directory (`guides/pyiceberg/.env`). Edit it to look like below and make sure to SAVE IT.
```
TABULAR_CREDENTIAL=t-asdf:1234
```
⬆️ replace `t-asdf:1234` with the tabular credential you just created. 

Good job! Now we're ready to get down to business 💪

### Starting Jupyter Lab:
- Seriously, make sure you save that env file. 
- pipenv users can just run `pipenv run jupyter lab` to fire up jupyter lab. pipenv will load up your credential for you and all will be well
- if this is scary, you can ignore the `.env` file and just paste your credential in plaintext directly in this notebook -- but you should feel bad about your craftsmanship.
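
If you aren't using pipenv, nothing loads the `.env` file for you. Here's a minimal sketch of doing it by hand with the standard library, assuming the file only holds simple `KEY=VALUE` lines like the one above. `load_env_file` is a hypothetical helper name, not something PyIceberg or Polars provides.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Returns a dict of the keys found in the file and their effective values.
    Hypothetical helper for illustration; handles simple KEY=VALUE lines only.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in your shell take precedence over the file
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Run `load_env_file()` at the top of the notebook, before the cell that reads `os.environ['TABULAR_CREDENTIAL']`.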


*One last note* -- your warehouse almost certainly doesn't have the weather data I'm using here. It's just an example. Connect to whatever data you want to.

Happy building! 🧊

In [None]:
import os, time

from pyiceberg.catalog import load_catalog
import polars as pl

# You'll need a Tabular credential. Either a member credential or a service account credential will work fine
TABULAR_CREDENTIAL       = os.environ['TABULAR_CREDENTIAL']
TABULAR_TARGET_WAREHOUSE = 'enterprise_data_warehouse' # replace this with your tabular warehouse name
TABULAR_CATALOG_URI      = 'https://api.tabular.io/ws' # unless you're a single tenant user, you don't need to change this

catalog_properties = {
    'uri':        TABULAR_CATALOG_URI,
    'credential': TABULAR_CREDENTIAL,
    'warehouse':  TABULAR_TARGET_WAREHOUSE
}
catalog = load_catalog(**catalog_properties)

In [40]:
# load the weather data 🌞
tbl = catalog.load_table("batch_raw.serverless_weather_raw")
df = pl.scan_iceberg(tbl).unnest('main')

# Get the average temp by city over the last couple of hours
# (comparing against time.time() assumes `dt` holds Unix epoch seconds)
few_hours_ago = int(time.time() - 2*60*60) # 2 hours ago
avg_temp_by_city = pl.SQLContext(frame=df).execute(
    f"""
    select 
        name as city, 
        avg(temp) as avg_recent_temp_f

    from frame

    where dt > {few_hours_ago}

    group by name
    """
)

print('Polars 🐻‍❄️ average temp by city over the last few hours:')
avg_temp_by_city.collect() # execute() returns a LazyFrame by default, so collect() actually runs the query

Polars 🐻‍❄️ average temp by city over the last few hours:


| city | avg_recent_temp_f |
| --- | --- |
| str | f64 |
| "New York" | 37.430455 |
| "London" | 45.571818 |
| "Sydney" | 71.984762 |
| "Indianapolis" | 44.572273 |
| "San Jose" | 59.684545 |
| "St Louis" | 53.871364 |
| "Tokyo" | 61.255714 |
| "Paris" | 46.866087 |
| "Nashville" | 55.564091 |

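The `few_hours_ago` cutoff in the query cell is a Unix timestamp in whole seconds, so it only works if your `dt` column stores epoch seconds too (that's what the example table appears to do, but check your own data). Here's a small sketch that makes the window size explicit. `epoch_seconds_ago` is a hypothetical helper name, not part of Polars or PyIceberg.

```python
from datetime import datetime, timedelta, timezone


def epoch_seconds_ago(hours, now=None):
    """Return the Unix timestamp (whole seconds) for `hours` before `now`.

    `now` defaults to the current UTC time; pass a fixed datetime to get
    reproducible results.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - timedelta(hours=hours)).timestamp())


# Same cutoff as in the query cell: two hours back from right now
few_hours_ago = epoch_seconds_ago(2)
```

Swap the `2` for whatever window you want and drop the result into the `where dt > ...` clause.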