# LLM Queries in DuckDB

This notebook shows how to call an LLM from SQL as a user-defined function (UDF) in DuckDB, using [vLLM](https://github.com/vllm-project/vllm) as the inference engine.

## Initialize the LLM Engine

In [1]:
import llmsql
from llmsql.llm.vllm import vLLM
from vllm import EngineArgs

# Engine configuration: a GPTQ-quantized Llama 2 13B chat model
args = EngineArgs(model="TheBloke/Llama-2-13B-chat-GPTQ")

# Initialize llmsql with vLLM as the backend
llmsql.init(vLLM(engine_args=args))


Starting vLLM engine...
INFO 04-10 18:22:10 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='TheBloke/Llama-2-13B-chat-GPTQ', tokenizer='TheBloke/Llama-2-13B-chat-GPTQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 04-10 18:22:11 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-10 18:22:11 selector.py:25] Using XFormers backend.
INFO 04-10 18:22:14 weight_utils.py:177] Using model weights format ['*.safetensors']
INFO 04-10 18:22:17 model_runner.py:104] Loading model weights took 6.8127 GB
INFO 04-10 18:22:22 gpu_executor.py:94] # GPU blocks: 433, # CPU blocks: 327
INFO 04-10 18:22:25 model_runner.py:791] Capturing the model for CUDA graphs. This may lead t

## Load the movies dataset as a DuckDB table

In [None]:
# Import duckdb through llmsql rather than the stock duckdb package;
# the llmsql wrapper is what provides the LLM UDF used below
from llmsql.duckdb import duckdb

# Create a table from the movies dataset
conn = duckdb.connect(database=':memory:', read_only=False)
conn.execute("CREATE TABLE movies AS SELECT * FROM read_csv('movies_small.csv')")

<duckdb.duckdb.DuckDBPyConnection at 0x70c7c578af30>

In [None]:
# List the tables, then show the columns of the movies table
print(conn.sql("SHOW TABLES"))

print(conn.sql("DESCRIBE movies"))

┌─────────┐
│  name   │
│ varchar │
├─────────┤
│ movies  │
└─────────┘

┌──────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│     column_name      │ column_type │  null   │   key   │ default │  extra  │
│       varchar        │   varchar   │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ rotten_tomatoes_link │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ review_content       │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_title          │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_info           │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ id                   │ BIGINT      │ YES     │ NULL    │ NULL    │ NULL    │
└──────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘



## Run the LLM Queries

In [None]:
query = "SELECT LLM('Recommend movies for the user based on {movie_info} and {review_content}', movie_info, review_content) FROM movies LIMIT 1"
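
The query passes a prompt template with `{movie_info}` and `{review_content}` placeholders, followed by the columns that supply them. How llmsql fills the template internally is not shown in this notebook; the sketch below only illustrates the idea, assuming each row's column values are substituted by name. The helper `fill_prompt` and the sample row are hypothetical and are not taken from the dataset.

```python
# A minimal sketch of per-row prompt templating (an assumption about the
# mechanism, not llmsql's actual implementation).

def fill_prompt(template: str, **fields: str) -> str:
    """Substitute named placeholders in the template with column values."""
    return template.format(**fields)


template = "Recommend movies for the user based on {movie_info} and {review_content}"

# Hypothetical row; real values would come from the movies table.
row = {
    "movie_info": "A heist thriller set in Paris.",
    "review_content": "Tense and stylish.",
}

prompt = fill_prompt(template, **row)
print(prompt)
```

Under this assumption, one prompt is built for each row, so prompt length (and therefore GPU memory use) grows with the length of the `movie_info` and `review_content` text.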

In [None]:
query_result = conn.execute(query).fetchall()

InvalidInputException: Invalid Input Error: Python exception occurred while executing the UDF: RuntimeError: CUDA out of memory. Tried to allocate 270.00 MiB. GPU 0 has a total capacty of 14.58 GiB of which 241.56 MiB is free. Including non-PyTorch memory, this process has 14.33 GiB memory in use. Of the allocated memory 12.21 GiB is allocated by PyTorch, and 677.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:1438 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x70c9040ba617 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x30f6c (0x70c904158f6c in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x3139e (0x70c90415939e in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x3175e (0x70c90415975e in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x16c1461 (0x70c88e3a9461 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x14 (0x70c88e3a1674 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::detail::empty_cuda(c10::ArrayRef<long>, c10::ScalarType, c10::optional<c10::Device>, c10::optional<c10::MemoryFormat>) + 0x111 (0x70c862256061 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #7: at::detail::empty_cuda(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x31 (0x70c862256331 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::empty_cuda(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x20 (0x70c8623833c0 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x2d403a9 (0x70c86416e3a9 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #10: <unknown function> + 0x2d4048b (0x70c86416e48b in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #11: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe7 (0x70c88f2cf277 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x295eaef (0x70c88f646aef in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::empty_memory_format::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1a3 (0x70c88f3133e3 in /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) + 0x23d (0x70c8138017cd in /home/ray/anaconda3/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #15: gptq_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, int) + 0x2dd (0x70c8137fd9bd in /home/ray/anaconda3/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #16: <unknown function> + 0x95a02 (0x70c813816a02 in /home/ray/anaconda3/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #17: <unknown function> + 0x91713 (0x70c813812713 in /home/ray/anaconda3/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #18: /home/ray/anaconda3/bin/python() [0x507387]
frame #19: _PyObject_MakeTpCall + 0x2ec (0x4f073c in /home/ray/anaconda3/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x5263 (0x4ecc93 in /home/ray/anaconda3/bin/python)
frame #21: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #22: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/ray/anaconda3/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x685 (0x4e80b5 in /home/ray/anaconda3/bin/python)
frame #24: /home/ray/anaconda3/bin/python() [0x4f8123]
frame #25: /home/ray/anaconda3/bin/python() [0x505121]
frame #26: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #27: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #28: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/ray/anaconda3/bin/python)
frame #29: /home/ray/anaconda3/bin/python() [0x505121]
frame #30: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #31: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #32: _PyObject_FastCallDictTstate + 0x13e (0x4effae in /home/ray/anaconda3/bin/python)
frame #33: _PyObject_Call_Prepend + 0x66 (0x502d86 in /home/ray/anaconda3/bin/python)
frame #34: /home/ray/anaconda3/bin/python() [0x5cbd93]
frame #35: _PyObject_MakeTpCall + 0x2ec (0x4f073c in /home/ray/anaconda3/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x5263 (0x4ecc93 in /home/ray/anaconda3/bin/python)
frame #37: /home/ray/anaconda3/bin/python() [0x4f8123]
frame #38: /home/ray/anaconda3/bin/python() [0x505121]
frame #39: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #40: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #41: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/ray/anaconda3/bin/python)
frame #42: /home/ray/anaconda3/bin/python() [0x505121]
frame #43: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #44: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #45: _PyObject_FastCallDictTstate + 0x13e (0x4effae in /home/ray/anaconda3/bin/python)
frame #46: _PyObject_Call_Prepend + 0x66 (0x502d86 in /home/ray/anaconda3/bin/python)
frame #47: /home/ray/anaconda3/bin/python() [0x5cbd93]
frame #48: _PyObject_MakeTpCall + 0x2ec (0x4f073c in /home/ray/anaconda3/bin/python)
frame #49: _PyEval_EvalFrameDefault + 0x5263 (0x4ecc93 in /home/ray/anaconda3/bin/python)
frame #50: /home/ray/anaconda3/bin/python() [0x4f8123]
frame #51: /home/ray/anaconda3/bin/python() [0x505121]
frame #52: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #53: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #54: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/ray/anaconda3/bin/python)
frame #55: /home/ray/anaconda3/bin/python() [0x505121]
frame #56: _PyEval_EvalFrameDefault + 0x3e14 (0x4eb844 in /home/ray/anaconda3/bin/python)
frame #57: /home/ray/anaconda3/bin/python() [0x4e6b2a]
frame #58: _PyObject_FastCallDictTstate + 0x13e (0x4effae in /home/ray/anaconda3/bin/python)
frame #59: _PyObject_Call_Prepend + 0xe0 (0x502e00 in /home/ray/anaconda3/bin/python)
frame #60: /home/ray/anaconda3/bin/python() [0x5cbd93]
frame #61: _PyObject_MakeTpCall + 0x2ec (0x4f073c in /home/ray/anaconda3/bin/python)
frame #62: _PyEval_EvalFrameDefault + 0x4b5a (0x4ec58a in /home/ray/anaconda3/bin/python)
frame #63: /home/ray/anaconda3/bin/python() [0x4e6b2a]


At:
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/gptq.py(208): apply_weights
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py(215): forward
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/llama.py(75): forward
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/llama.py(223): forward
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/llama.py(271): forward
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/llama.py(345): forward
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py(663): execute_model
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/worker.py(221): execute_model
  /home/ray/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/executor/gpu_executor.py(114): execute_model
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/engine/llm_engine.py(676): step
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/entrypoints/llm.py(218): _run_engine
  /home/ray/anaconda3/lib/python3.9/site-packages/vllm/entrypoints/llm.py(190): generate
  /home/ray/default_cld_cv8egzp1tm3uvi738tt5bycjmm/LLM-SQL-Demo/llmsql/llm/vllm.py(51): execute
  /home/ray/default_cld_cv8egzp1tm3uvi738tt5bycjmm/LLM-SQL-Demo/llmsql/duckdb/__init__.py(14): llm_udf
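
The query above fails with `CUDA out of memory`: the error reports 14.33 GiB in use on a GPU with 14.58 GiB total, so the forward pass had almost no headroom. One possible mitigation is to start the engine with more conservative settings. The sketch below uses `gpu_memory_utilization`, `max_model_len`, and `enforce_eager`, which are existing `EngineArgs` fields in vLLM; the specific values are illustrative assumptions, not tuned for this model or GPU, and the helper `build_engine_args` is hypothetical.

```python
# A sketch of lower-memory engine settings. The values are assumptions
# chosen for illustration; adjust them for your hardware.
engine_kwargs = {
    "model": "TheBloke/Llama-2-13B-chat-GPTQ",
    # vLLM's default is 0.9; a lower fraction leaves more free GPU memory
    # for allocations made outside the preallocated KV cache.
    "gpu_memory_utilization": 0.80,
    # Cap the context length below the model's 4096 to shrink the KV cache.
    "max_model_len": 2048,
    # Skip CUDA graph capture, which needs extra GPU memory.
    "enforce_eager": True,
}


def build_engine_args(kwargs):
    """Return vllm.EngineArgs built from kwargs, or None if vLLM is not installed."""
    try:
        from vllm import EngineArgs
    except ImportError:
        return None
    return EngineArgs(**kwargs)


# Usage (requires vLLM and a GPU, and a fresh kernel so the earlier
# engine no longer holds GPU memory):
#   args = build_engine_args(engine_kwargs)
#   llmsql.init(vLLM(engine_args=args))
```

Whether these settings are enough depends on the prompt lengths in the table; shortening the text passed to `LLM` is another way to reduce memory use.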
