# LLM Queries in DuckDB

This notebook walks through how to call LLMs directly as a UDF in a DuckDB database using [vLLM](https://github.com/vllm-project/vllm) as the inference engine.

## Initialize the LLM Engine

Choose between OpenAI or vLLM with a quantized version of Llama-3 8B

In [1]:
# Uncomment the below code to initialize llmsql with OpenAI
# import llmsql
# from llmsql.llm.openai import OpenAI


# llmsql.init(OpenAI(base_url="https://api.openai.com/v1", api_key="<INSERT_OPENAI_KEY>"))


# Uncomment the below code to initialize llmsql with vLLM

import llmsql
from llmsql.llm.vllm import vLLM
from vllm import EngineArgs
args = EngineArgs(model="TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ")

llmsql.init(vLLM(engine_args=args))


Starting vLLM engine...
INFO 04-25 14:57:57 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ', speculative_config=None, tokenizer='TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


INFO 04-25 14:57:58 utils.py:608] Found nccl from library /home/ray/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 04-25 14:57:58 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-25 14:57:58 selector.py:33] Using XFormers backend.
INFO 04-25 14:58:00 weight_utils.py:193] Using model weights format ['*.safetensors']
INFO 04-25 14:58:02 model_runner.py:173] Loading model weights took 5.3472 GB
INFO 04-25 14:58:08 gpu_executor.py:119] # GPU blocks: 2602, # CPU blocks: 2048
INFO 04-25 14:58:10 model_runner.py:976] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-25 14:58:10 model_runner.py:980] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed

## Load the movies dataset as a DuckDB table

In [2]:
# Make sure you import duckdb from llmsql
from llmsql.duckdb import duckdb

# Create a table from the movies dataset
conn = duckdb.connect(database=':memory:', read_only=False)
conn.execute("CREATE TABLE movies AS SELECT * FROM read_csv('../movies_small.csv')")
conn.execute("CREATE TABLE movies_limit as SELECT * FROM movies WHERE review_content IS NOT NULL LIMIT 20")

<duckdb.duckdb.DuckDBPyConnection at 0x75b4c9e173f0>

In [3]:
# View the table and fields in the table

print(conn.sql("SHOW TABLES"))

print(conn.sql("DESCRIBE movies_limit"))

┌──────────────┐
│     name     │
│   varchar    │
├──────────────┤
│ movies       │
│ movies_limit │
└──────────────┘

┌──────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│     column_name      │ column_type │  null   │   key   │ default │  extra  │
│       varchar        │   varchar   │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ rotten_tomatoes_link │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ review_content       │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_title          │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ movie_info           │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ id                   │ BIGINT      │ YES     │ NULL    │ NULL    │ NULL    │
└──────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘



## Run sample LLM Queries

### LLMs in Projection Queries

In [4]:
query = (
    "SELECT review_content, LLM('Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL." 
    "Respond with just the category and no other text.', review_content) FROM movies_limit")
query_result = conn.execute(query).fetchall()

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

In [5]:
# Print first 5 results
for result in query_result[:5]:
    print(f"Movie review: {result[0]}")
    print(f"Sentiment: {result[1]}\n")

Movie review: It's a series of routines within a routine formula, and the result is as tedious as it sounds.
Sentiment: NEGATIVE

Movie review: A vulgar exercise of terror that, despite its defects, manages to stand out from its delectable predecessors. [Full Review in Spanish]
Sentiment: POSITIVE

Movie review: After the Thin Man hasn't quite the spontaneity and charm of the original, but it's good mystery-comedy, the dialogue bright, the handling expert, and the principals as ingratiating as ever.
Sentiment: POSITIVE

Movie review: You never really get angry at it. You just want to shake it up because the elements for a first-class comedy thriller are all there. It's simply .that everything is always ten per cent off.
Sentiment: NEUTRAL

Movie review: An excellent film.
Sentiment: POSITIVE



In [6]:
query = (
    "SELECT LLM('Given {movie_title}, {movie_info} and a movie review as {review_content}, extract all character names that are mentioned."
    "Respond with just the character names and no other text. If there are no characters, respond with just None', movie_title, movie_info, review_content) FROM movies_limit")
query_result = conn.execute(query).fetchall()

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

In [7]:
for result in query_result[:5]:
    print(f"Character names: {result[0]}")

Character names: None
Character names: None
Character names: Nick Charles, Nora, Selma, Robert, David Graham
Character names: Gene Wilder, Jill Clayburgh, Richard Pryor
Character names: Dana Andrews, Lee J. Cobb, Henry Harvey


### Filter Queries

We can also use the LLM query as filters, possibly in combination with projections

In [8]:
filter_query = (
    "SELECT movie_title FROM movies_limit WHERE "
    "LLM('Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL. " 
    "Respond with just the category and no other text.', review_content) == 'POSITIVE'")
query_result = conn.execute(filter_query).fetchall()

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

In [9]:
# Print first 5 results
for result in zip(query_result[:5]):
    print(f"{result[0][0]}")

Amityville: The Awakening
After the Thin Man
Boomerang!
American Gun
White Fang


In [10]:
query = (
    "SELECT movie_title, LLM('Given {movie_title} and {movie_info}, determine if this movie is suitable for kids.', movie_title, movie_info) "
    "FROM movies_limit WHERE "
    "LLM('Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL." 
    "Respond with just the category and no other text.', review_content) == 'POSITIVE'"
)
query_result = conn.execute(query).fetchall()

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

In [11]:
# Print first 5 results
for result in query_result[:5]:
    print(f"Movie title: {result[0]}")
    print(f"Suitable for kids: {result[1]}\n")

Movie title: Amityville: The Awakening
Suitable for kids: No, this movie is not suitable for kids.

Movie title: After the Thin Man
Suitable for kids: No

Movie title: Silver Streak
Suitable for kids: Not suitable for kids.

Movie title: Boomerang!
Suitable for kids: No

Movie title: American Gun
Suitable for kids: Based on the movie title and information, I would say the movie is NOT suitable



### Aggregate Query

LLM queries can also be used in aggregates.

In [12]:
query = (
    "SELECT movie_title, " 
        "AVG(CAST(LLM("
            "'Given a movie review as {review_content}, score the movie either as 1, 2, 3, with 3 as the highest. Return just the score and nothing else.', movie_title, movie_info) "
        "AS integer)) "
    "FROM movies_limit GROUP BY movie_title "
)
query_result = conn.execute(query).fetchall()

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

In [13]:
# Print first 5 results
for result in query_result[:5]:
    print(f"Movie title: {result[0]}")
    print(f"Average review score: {result[1]}\n")

Movie title: Amityville: The Awakening
Average review score: 2.3333333333333335

Movie title: Silent Night, Deadly Night
Average review score: 2.0

Movie title: Hoodlum
Average review score: 2.0

Movie title: American Gun
Average review score: 2.0

Movie title: Armstrong
Average review score: 2.0



In [14]:
query = "SELECT movie_title from movies_limit"

: 