# LLM Queries in DuckDB

This notebook shows how to call an LLM directly from SQL as a user-defined function (UDF) in DuckDB, using [vLLM](https://github.com/vllm-project/vllm) as the inference engine.

## Initialize the LLM Engine

In [1]:
import llmsql
from llmsql.llm.vllm import vLLM
from vllm import EngineArgs

args = EngineArgs(model="TheBloke/Llama-2-13B-chat-GPTQ")

# Initialize llmsql with vLLM as the inference backend
llmsql.init(vLLM(engine_args=args))


Starting vLLM engine...
INFO 04-11 12:31:13 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='TheBloke/Llama-2-13B-chat-GPTQ', tokenizer='TheBloke/Llama-2-13B-chat-GPTQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 04-11 12:31:14 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-11 12:31:14 selector.py:25] Using XFormers backend.
INFO 04-11 12:31:17 weight_utils.py:177] Using model weights format ['*.safetensors']
INFO 04-11 12:31:20 model_runner.py:104] Loading model weights took 6.8127 GB
INFO 04-11 12:31:25 gpu_executor.py:94] # GPU blocks: 433, # CPU blocks: 327
INFO 04-11 12:31:28 model_runner.py:791] Capturing the model for CUDA graphs. This may lead t

## Load the movies dataset as a DuckDB table

In [2]:
# Import duckdb via llmsql, not the stock package: the queries below call LLM(), which stock DuckDB does not define
from llmsql.duckdb import duckdb

# Load the CSV into a table, then keep a 10-row sample for the LLM queries
conn = duckdb.connect(database=':memory:', read_only=False)
conn.execute("CREATE TABLE movies AS SELECT * FROM read_csv('movies_small.csv')")
conn.execute("CREATE TABLE movies_small AS SELECT * FROM movies LIMIT 10")

<duckdb.duckdb.DuckDBPyConnection at 0x7141999169b0>

In [3]:
# List the tables, then show the columns of movies_small

print(conn.sql("SHOW TABLES"))

print(conn.sql("DESCRIBE movies_small"))

┌──────────────┐
│     name     │
│   varchar    │
├──────────────┤
│ movies       │
│ movies_small │
└──────────────┘

┌──────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│     column_name      │ column_type │  null   │   key   │ default │  extra  │
│       varchar        │   varchar   │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ rotten_tomatoes_link │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ review_content       │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_title          │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_info           │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ id                   │ BIGINT      │ YES     │ NULL    │ NULL    │ NULL    │
└──────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘



## Run the LLM Queries

In [4]:
query = "SELECT LLM('Summarize the {review_content}. Return just the summary and nothing else.', review_content) FROM movies_small"
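
The prompt passed to `LLM` contains a `{review_content}` placeholder that matches the column given as the second argument. One plausible reading, and an assumption here rather than documented llmsql behavior, is that the placeholder is filled with each row's value before the prompt reaches the model. A minimal sketch of that idea in plain Python, where `render_prompt` is a hypothetical helper and the two review strings are invented for illustration (they are not rows from the dataset):

```python
# Hypothetical helper, not part of llmsql: fill a prompt template once per row.
PROMPT = "Summarize the {review_content}. Return just the summary and nothing else."


def render_prompt(template: str, **fields: str) -> str:
    """Substitute named column values into the template's {placeholders}."""
    return template.format(**fields)


# Invented example rows, keyed by column name.
rows = [
    {"review_content": "A sharp, funny script carried by a strong cast"},
    {"review_content": "Overlong and oddly paced, though the visuals impress"},
]

# One rendered prompt per row.
prompts = [render_prompt(PROMPT, **row) for row in rows]
for p in prompts:
    print(p)
```

Keeping the template separate from the row values means the same instruction text is reused for every row, with only the substituted field changing.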

In [5]:
query_result = conn.execute(query).fetchall()


No chat template is defined for this tokenizer - using the default template for the CachedLlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))
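
The `huggingface/tokenizers` warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the variable is set at the top of the notebook before the engine is initialized. The choice of `false` is an assumption; the warning accepts either value.

```python
import os

# Set this before any tokenizer is used so that forked processes inherit it.
# "false" disables tokenizer parallelism; "true" is the other accepted value.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```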

In [7]:
query_result

[("  Sure! Here's the summary of the review content:\n\nNo",),
 ('  Sure! Here is the summary of the review content:\n\n"ted',),
 ('  Sure! Here\'s the summary of the review content:\n\n"',),
 ('  Sure! Here\'s the summary of the review content:\n\n"',),
 ('  Sure! Here\'s the summary of the review content:\n\n"',),
 ('  Sure! Based on the provided data, the summary of the review content is',),
 ("  Sure, I'd be happy to help! Based on the data provided",),
 ("  Sure, I'd be happy to help! Here is the summary of",),
 ('  Sure! Here\'s the summary of the review content:\n\n"',),
 ('  Sure! Based on the provided JSON data, the summary of the review content',)]
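
Each result above is a one-element tuple, because `fetchall()` returns one tuple per row, and each string carries leading whitespace. A small sketch of how the rows might be unpacked and trimmed before further use, with two rows copied from the output above standing in for the full `query_result`:

```python
# Two rows copied from the output above; fetchall() yields one tuple per row.
query_result = [
    ("  Sure! Here's the summary of the review content:\n\nNo",),
    ("  Sure, I'd be happy to help! Based on the data provided",),
]

# Unpack each single-column tuple and trim surrounding whitespace.
summaries = [text.strip() for (text,) in query_result]

for s in summaries:
    print(repr(s))
```

This only tidies the strings. It does not recover text that the model never generated, so the cut-off responses stay cut off.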