<div style="text-align: right;">
  <img src="https://raw.githubusercontent.com/exasol/ai-lab/refs/heads/main/assets/Exasol_Logo_2025_Dark.svg" style="width:200px; margin: 10px;" />
</div>

## GPU Resource Considerations

In this tutorial we will learn about how to manage access to GPU's from UDF's.

## Open Secure Configuration Storage


In [1]:
import os
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

Output()

Box(children=(Box(children=(Label(value='Configuration Store', layout=Layout(border_bottom='solid 1px', borderâ€¦

### Instantiate ScriptLanguagesContainer

The following cell creates an instance of class `ScriptLanguageContainer` from the notebook-connector,
which enables us to use the Script Language Container (SLC).

In [2]:
from exasol.nb_connector.slc import ScriptLanguageContainer
slc = ScriptLanguageContainer(secrets=ai_lab_config, name="gpu_slc")

### Import Python modules

In [211]:
from exasol.nb_connector.connections import open_pyexasol_connection
import textwrap
import pandas as pd

### Check for Database compatibility

This notebooks works only correct on an Exasol Database version 2025.2.0 or later.

In [208]:
from packaging.version import Version
with open_pyexasol_connection(ai_lab_config, compression=True, schema=ai_lab_config.db_schema) as conn:
    res = conn.execute("SELECT PARAM_VALUE FROM EXA_METADATA WHERE PARAM_NAME='databaseProductVersion';").fetchall();
dbVersion = Version(res[0][0])
if dbVersion < Version("2025.2.0"):
    popup_message(f"This tutorial will not work correctly with the used database: The used queries here will not show the correct result.")

### How GPU resources are managed

In each select/sub-select statement, only a single UDF call is allowed to exclusively use all the available GPUs. The resource reservation by the GPU resource management happens per UDF call and individually for each UDF call.

When multiple queries that each contain a single GPU-accelerated UDF are executed concurrently, the executions (UDF calls) are serialized. Each UDF call will wait to execute until an exclusive resource usage is made possible by accelerator resources being freed up by other UDF calls.

This means that simultaneous execution of multiple GPU-accelerated UDF calls as part of a single select/sub-select statement is not possible. All UDF calls except one will in this case either fail or fall back to CPU usage, depending on the configuration.

To simulate two simultaneous SQL queries, we create first a long running UDF:

In [209]:
# TEMPORARYYYYYYYYYYYYYYYYYYYYY
alter_session = textwrap.dedent("ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=builtin_python3 R=builtin_r JAVA=builtin_java CUSTOM_SLC_GPU_SLC=localzmq+protobuf:///bfsdefault/default/container/template-Exasol-8-python-3.10-cuda-conda-release-custom_slc_gpu_slc/?lang=python#buckets/bfsdefault/default/container/template-Exasol-8-python-3.10-cuda-conda-release-custom_slc_gpu_slc/exaudf/exaudfclient'")

In [215]:
sql = f"""
CREATE OR REPLACE {slc.language_alias} SCALAR SCRIPT
long_running_gpu(id INTEGER)
EMITS (ID INT, START_TIME VARCHAR(1000), END_TIME VARCHAR(1000), NVIDIA_VISIBLE_DEVICES VARCHAR(10000)) AS
 %perInstanceRequiredAcceleratorDevices GpuNvidia|None;

import time

def timestamp():
    now = datetime.datetime.now().time()
    return str(now)
    
def run(ctx):
    start_timestamp = timestamp() 
    time.sleep(5)
    nvidia_vis_devices = os.getenv('NVIDIA_VISIBLE_DEVICES', "<invalid>")
    end_timestamp = timestamp()
    ctx.emit(ctx.id, start_timestamp, end_timestamp,  nvidia_vis_devices)
/
"""

with open_pyexasol_connection_with_lang_definitions(ai_lab_config, compression=True, schema=ai_lab_config.db_schema) as connection:
    connection.execute(alter_session)
    connection.execute(sql)

Now we execute this UDF in 2 different pyexasol connections, which run in parallel:

In [216]:
import multiprocessing

def run_long_running_udf(id):
    with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as local_conn:
        local_conn.execute(alter_session)
        sql = f"SELECT long_running_gpu({id});"
        result = local_conn.execute(sql)
        df = pd.DataFrame(result.fetchall(), columns=result.column_names())
        return df

with multiprocessing.Pool(processes=2) as pool:
    results = pool.map(run_long_running_udf, [1, 2])

res = pd.concat(results, ignore_index=True)
res.sort_values(by='START_TIME')


Unnamed: 0,ID,START_TIME,END_TIME,NVIDIA_VISIBLE_DEVICES
1,2,15:21:21.524198,15:21:26.549229,all
0,1,15:21:26.770926,15:21:31.795908,all


As you can see in the result above, the UDFs were executed sequentially.

### Multiple GPU accelerated UDF calls in single query 

Using more than one GPU accelerated UDF call in a single query will either raise an error or fall back to CPU usage.

Let's define a UDF which requires a GPU and then use the UDF twice in the same query.

In [219]:
sql = textwrap.dedent(f"""
CREATE OR REPLACE {slc.language_alias} SCALAR SCRIPT
gpu_required()
RETURNS VARCHAR(10000) AS
 %perInstanceRequiredAcceleratorDevices GpuNvidia;

def run(ctx):
    return os.getenv('NVIDIA_VISIBLE_DEVICES', "<invalid>")
/
""")
with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    conn.execute(sql)

try:
    with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
        conn.execute(alter_session)
        stmt = conn.execute("SELECT gpu_required(), gpu_required();")
except Exception as e:
    print(e)



(
    message     =>  Accelerator device request failed for a user defined function (requested types: GpuNvidia). Impossible Accelerator Device Request: Insufficient accelerator resources to execute all user defined functions concurrently. (Session: 1849953905434492928)
    dsn         =>  ec2-54-246-49-57.eu-west-1.compute.amazonaws.com:8563
    user        =>  sys
    schema      =>  AI_LAB
    session_id  =>  1849953905434492928
    code        =>  22069
    query       =>  SELECT gpu_required(), gpu_required()
)



The same problem can happen with only one UDF which requires an GPU, and other UDFs with CPU fallback, as execution order is unpredictable.

In [221]:
sql = textwrap.dedent(f"""
CREATE OR REPLACE {slc.language_alias} SCALAR SCRIPT
gpu_optional()
RETURNS VARCHAR(10000) AS
 %perInstanceRequiredAcceleratorDevices GpuNvidia|None;

def run(ctx):
    return os.getenv('NVIDIA_VISIBLE_DEVICES', "<invalid>")
/
""")

with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    conn.execute(sql)

try:
    with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
        conn.execute(alter_session)
        stmt = conn.execute("SELECT gpu_required(), gpu_required();")
except Exception as e:
    print(e)


(
    message     =>  Accelerator device request failed for a user defined function (requested types: GpuNvidia). Impossible Accelerator Device Request: Insufficient accelerator resources to execute all user defined functions concurrently. (Session: 1849954128010149888)
    dsn         =>  ec2-54-246-49-57.eu-west-1.compute.amazonaws.com:8563
    user        =>  sys
    schema      =>  AI_LAB
    session_id  =>  1849954128010149888
    code        =>  22069
    query       =>  SELECT gpu_required(), gpu_required()
)



If your query contains only UDFs that support CPU fallback, one of the UDFs will be assigned the GPU accelerator. However, it is not possible to predict which one.

In [222]:
with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    res = conn.export_to_pandas("SELECT gpu_optional() as udf1, gpu_optional() as udf2, gpu_optional() as udf3, gpu_optional() as udf4;")
res

Unnamed: 0,UDF1,UDF2,UDF3,UDF4
0,all,<invalid>,<invalid>,<invalid>


### GPU usage across UDF instances

If there are multiple UDF instances of a single UDF call within the same query, all those instances will have access to all (GPU) accelerators. To reduce the risk of running out of GPU memory, we recommend that you use UDF instance limiting to control how many accelerators each instance can use, and that you configure the instance limit based on the number of accelerators and the use case.

#### Scalar UDFs
For Scalar UDFs there is no mechanism to assign UDF to a certain GPU. Hence, we recommend to use Scalar UDF only to read some GPU device information, but do not any processing on the GPUs. A good example is to execute the `nvidia-smi` tool in a scalar UDF.

In [231]:
sql = textwrap.dedent(f"""
CREATE OR REPLACE {slc.language_alias} SCALAR SCRIPT
gpu_nvidia_smi()
RETURNS VARCHAR(10000) AS
 %perInstanceRequiredAcceleratorDevices GpuNvidia;

import subprocess

def run(ctx):
    cmd = ["nvidia-smi"] #List GPU's
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    exit_code = process.wait()
    
    if exit_code != 0:
        raise Exception(f"nvidia-smi returned non-zero exit code: '{{process.stderr}}'")
    return process.stdout.read()

/
""")
with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    conn.execute(sql)

with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    res = conn.execute("SELECT gpu_nvidia_smi();").fetchall()
print(res[0][0])

Thu Nov 27 16:00:25 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+----------------------------------------------

#### Set UDFs

For set UDF it is possible to generate a group per GPU and pass this `GPU id` to the UDF as parameter.

For this purpose we create a table with some European cities:

In [232]:
with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(textwrap.dedent(f"""
    CREATE TABLE IF NOT EXISTS cities (
        id INTEGER,
        city_name VARCHAR(1000),
        country VARCHAR(1000),
        population INTEGER
    );
    """))
    conn.execute(textwrap.dedent(f"""
    INSERT INTO cities (id, city_name, country, population) VALUES
        (1, 'Berlin', 'Germany', 3645000),
        (2, 'Munich', 'Germany', 1600000),
        (3, 'Frankfurt', 'Germany', 800000),
        (4, 'Paris', 'France', 2148000),
        (5, 'Toulouse', 'France', 500000),
        (6, 'Marseille', 'France', 900000),
        (7, 'Madrid', 'Spain', 3223000),
        (8, 'Barcelona', 'Spain', 1700000),
        (9, 'Valencia', 'Spain', 800000);
    """))


Now we need the number of available GPUs. This can be achieved by reading the `EXA_METADATA` parameter `acceleratorDeviceGpuNvidiaCount`.

In [233]:
with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    nAvailableGPU = conn.execute("SELECT PARAM_VALUE FROM EXA_METADATA WHERE PARAM_NAME='acceleratorDeviceGpuNvidiaCount'").fetchall()[0][0]
nAvailableGPU

'1'

In the next step we create UDF with the option `perNodeAndCallInstanceLimit` set to the number of available GPUs. Thus, we avoid that two different UDF instances access the same GPU at the same time.

In [237]:
sql = textwrap.dedent(f"""
CREATE OR REPLACE {slc.language_alias} SET SCRIPT
gpu_country_population(population INTEGER, country VARCHAR(1000), gpu_id INTEGER)
EMITS (country VARCHAR(1000), population_sum INTEGER) AS
%perNodeAndCallInstanceLimit {nAvailableGPU};
%perInstanceRequiredAcceleratorDevices GpuNvidia;
import torch

def run(ctx):
    country = None
    populations = list()
    device = torch.device(f'cuda:{{gpu_id}}')

    while(ctx.next()):
         if country != ctx.country:
             if populations:
                tensor = torch.tensor(populations, dtype=torch.int32)
                tensor_gpu = tensor.to(device)
                ctx.emit(country, torch.sum(tensor_gpu, dim=0).item())
             country = ctx.country
             populations = list()
         populations.append(ctx.population)
/
""")

with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    conn.execute(sql)


Now we write a simple agreggation query which will create 3 UDF instances, one per country in the table.

In [240]:
sql = textwrap.dedent(f"""
WITH COUNTRIES AS
(
SELECT DISTINCT(country) FROM CITIES
),
COUNTRY_IDS AS
(
SELECT ROW_NUMBER() OVER (ORDER BY country) AS COUNTRY_ID, country FROM COUNTRIES
),
GPU_IDS AS
(
SELECT COUNTRY_ID MOD 1 as GPU_ID, COUNTRY FROM COUNTRY_IDS 
),
ORDERED_CITIES AS
(
SELECT * FROM CITIES ORDER BY COUNTRY
)
SELECT gpu_country_population(ORDERED_CITIES.POPULATION, ORDERED_CITIES.COUNTRY, GPU_IDS.GPU_ID) FROM GPU_IDS, ORDERED_CITIES WHERE ORDERED_CITIES.COUNTRY = GPU_IDS.COUNTRY GROUP BY GPU_ID, iproc();
""")

with open_pyexasol_connection_with_lang_definitions(ai_lab_config, schema=ai_lab_config.db_schema, compression=True) as conn:
    conn.execute(alter_session)
    conn.execute(sql)


ExaQueryError: 
(
    message     =>  VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-PYTHON-1002: F-UDF-CL-SL-PYTHON-1026: ExaUDFError: F-UDF-CL-SL-PYTHON-1114: Exception during run 
GPU_COUNTRY_POPULATION:8 run
NameError: name 'gpu_id' is not defined
 (Session: 1849960754452889600)
    dsn         =>  ec2-54-246-49-57.eu-west-1.compute.amazonaws.com:8563
    user        =>  sys
    schema      =>  AI_LAB
    session_id  =>  1849960754452889600
    code        =>  22002
    query       =>  WITH COUNTRIES AS
(
SELECT DISTINCT(country) FROM CITIES
),
COUNTRY_IDS AS
(
SELECT ROW_NUMBER() OVER (ORDER BY country) AS COUNTRY_ID, country FROM COUNTRIES
),
GPU_IDS AS
(
SELECT COUNTRY_ID MOD 1 as GPU_ID, COUNTRY FROM COUNTRY_IDS 
),
ORDERED_CITIES AS
(
SELECT * FROM CITIES ORDER BY COUNTRY
)
SELECT gpu_country_population(ORDERED_CITIES.POPULATION, ORDERED_CITIES.COUNTRY, GPU_IDS.GPU_ID) FROM GPU_IDS, ORDERED_CITIES WHERE ORDERED_CITIES.COUNTRY = GPU_IDS.COUNTRY GROUP BY GPU_ID, iproc()
)


The following is a more generic approach, with which you can assign UDF instances to GPU's, on multiple nodes.

In [179]:
sql = textwrap.dedent(f""" 
WITH CITY_TABLE_WITH_ID AS
(
SELECT rowid FROM CITY_TABLE_WITH_ID
)
SELECT gpu_city(id, city_name, population, country) FROM CITY_TABLE_WITH_ID
GROUP BY iproc(), id
""")

res = conn.export_to_pandas(sql)
res

Unnamed: 0,"GPU_CITY(CITY_TABLE_WITH_ID.ID,CITY_TABLE_WITH_ID.CITY_NAME,CITY_TABLE_WITH_ID.POPULATION,CITY_TABLE_WITH_ID.COUNTRY)"
0,0
