# LLM Queries in DuckDB

This notebook shows how to call an LLM directly as a user-defined function (UDF) in a DuckDB database, using [vLLM](https://github.com/vllm-project/vllm) as the inference engine.

## Initialize the LLM Engine

Choose between OpenAI and vLLM. The vLLM path below loads a GPTQ-quantized version of Llama-3 8B Instruct.

In [1]:
# Uncomment the code below to initialize llmsql with OpenAI
# import llmsql
# from llmsql.llm.openai import OpenAI


# llmsql.init(OpenAI(base_url="https://api.openai.com/v1", api_key="<INSERT_OPENAI_KEY>"))


# Initialize llmsql with vLLM (active by default; comment this block out if you use OpenAI instead)

import llmsql
from llmsql.llm.vllm import vLLM
from vllm import EngineArgs
args = EngineArgs(model="TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ")

llmsql.init(vLLM(engine_args=args))


Starting vLLM engine...
INFO 04-25 15:23:16 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ', speculative_config=None, tokenizer='TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


INFO 04-25 15:23:16 utils.py:608] Found nccl from library /home/ray/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 04-25 15:23:17 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-25 15:23:17 selector.py:33] Using XFormers backend.
INFO 04-25 15:23:19 weight_utils.py:193] Using model weights format ['*.safetensors']
INFO 04-25 15:23:21 model_runner.py:173] Loading model weights took 5.3472 GB
INFO 04-25 15:23:27 gpu_executor.py:119] # GPU blocks: 2602, # CPU blocks: 2048
INFO 04-25 15:23:29 model_runner.py:976] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-25 15:23:29 model_runner.py:980] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed
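
The log above names two options for reducing GPU memory pressure: `enforce_eager` and `gpu_memory_utilization`. A minimal sketch of how they might be collected before constructing `EngineArgs` follows. The value `0.8` is an illustrative assumption, not a setting taken from this notebook.

```python
# Keyword arguments that could be passed as EngineArgs(**engine_kwargs).
# enforce_eager and gpu_memory_utilization are the options named in the
# vLLM log output; the specific values below are illustrative assumptions.
engine_kwargs = {
    "model": "TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ",
    "enforce_eager": True,          # skip CUDA graph capture to save memory
    "gpu_memory_utilization": 0.8,  # fraction of GPU memory vLLM may claim
}

for name, value in engine_kwargs.items():
    print(f"{name}={value!r}")
```

With vLLM installed, `args = EngineArgs(**engine_kwargs)` would take the place of the `EngineArgs(...)` call in the initialization cell.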

## Load the movies dataset as a DuckDB table

In [31]:
import pandas as pd

# Create a table from the movies dataset
df = pd.read_csv("../movies_small.csv")
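# Keep only rows that have review text, then take the first 20 as a small sample
df = df[df["review_content"].notnull()]
df = df[:20].copy()  # copy so later column assignments do not write to a slice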

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20 entries, 1 to 26
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   rotten_tomatoes_link  20 non-null     object
 1   review_content        20 non-null     object
 2   movie_title           20 non-null     object
 3   movie_info            20 non-null     object
 4   id                    20 non-null     int64 
dtypes: int64(1), object(4)
memory usage: 960.0+ bytes


## Run sample LLM Queries

In [22]:
from llmsql.pandas import query

### LLMs in Projection Queries

In [4]:
prompt = "Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL. Respond with just the category and no other text."
result = query(prompt, df)

Processed prompts: 100%|██████████| 20/20 [00:02<00:00,  7.82it/s]
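
The `{review_content}` placeholder in the prompt names a DataFrame column, and one prompt is produced per row. The sketch below shows one way such substitution could work using only the standard library. `placeholder_fields` and `render_prompts` are hypothetical helpers, and the assumption that llmsql uses `str.format`-style substitution is mine, not something the source confirms.

```python
from string import Formatter


def placeholder_fields(template: str) -> list[str]:
    """Return the {field} names used in a str.format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render_prompts(template: str, rows: list[dict]) -> list[str]:
    """Fill the template once per row, using only the referenced columns."""
    fields = placeholder_fields(template)
    return [template.format(**{f: row[f] for f in fields}) for row in rows]


template = "Given a movie review as {review_content}, classify the review."
rows = [
    {"movie_title": "Hypothetical Movie", "review_content": "An excellent film."},
    {"movie_title": "Another Movie", "review_content": "Tedious."},
]

prompts = render_prompts(template, rows)
for p in prompts:
    print(p)
```

With a DataFrame, `df.to_dict("records")` would yield rows in the shape `render_prompts` expects.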


In [5]:
# Print first 5 results
for review, sentiment in list(zip(df["review_content"], result))[:5]:
    print(f"Movie review: {review}")
    print(f"Sentiment: {sentiment}\n")

Movie review: It's a series of routines within a routine formula, and the result is as tedious as it sounds.
Sentiment: NEGATIVE

Movie review: A vulgar exercise of terror that, despite its defects, manages to stand out from its delectable predecessors. [Full Review in Spanish]
Sentiment: POSITIVE

Movie review: After the Thin Man hasn't quite the spontaneity and charm of the original, but it's good mystery-comedy, the dialogue bright, the handling expert, and the principals as ingratiating as ever.
Sentiment: POSITIVE

Movie review: You never really get angry at it. You just want to shake it up because the elements for a first-class comedy thriller are all there. It's simply .that everything is always ten per cent off.
Sentiment: NEUTRAL

Movie review: An excellent film.
Sentiment: POSITIVE



In [6]:
prompt = (
    "Given {movie_title}, {movie_info} and a movie review as {review_content}, extract all character names that are mentioned. "
    "Respond with just the character names and no other text. If there are no characters, respond with just None")
result = query(prompt, df)

Processed prompts:   0%|          | 0/20 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processed prompts: 100%|██████████| 20/20 [00:04<00:00,  4.81it/s]
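
The tokenizers warning in the output above suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs before the tokenizer is first used (ideally in the notebook's first cell):

```python
import os

# Set before any tokenizer is created so the forked worker processes
# inherit the value and the warning is not emitted.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```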


In [7]:
for characters in result[:5]:
    print(f"Character names: {characters}")

Character names: Jon Hamm
Character names: James Coburn, Virginia Madsen
Character names: Aaron Eckhart, Bill
Character names: Lilyan Chauvin, Gilmer McCormick
Character names: Annibal Ramirez, Carlos Sanchez, Jack Shaw, Amos
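
Each answer comes back as a single comma-separated string. If downstream code needs a list per row, a small post-processing step along these lines may help. `parse_names` is a hypothetical helper, and treating `None` as an empty list follows the instruction given in the prompt.

```python
def parse_names(raw: str) -> list[str]:
    """Split a comma-separated LLM answer into names; 'None' means no names."""
    cleaned = raw.strip()
    if not cleaned or cleaned.lower() == "none":
        return []
    return [part.strip() for part in cleaned.split(",") if part.strip()]


print(parse_names("James Coburn, Virginia Madsen"))
print(parse_names("None"))
```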


### Filter Queries

LLM queries can also act as filters, optionally combined with projections.

In [12]:
prompt = (
    "Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL. "
    "Respond with just the category and no other text.")
result = query(prompt, df)
# Keep only the titles whose review was classified as POSITIVE
filtered_movies = df[[r == "POSITIVE" for r in result]]["movie_title"]

Processed prompts: 100%|██████████| 20/20 [00:00<00:00, 29.71it/s]
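
The filter relies on an exact match against `"POSITIVE"`, which breaks if the model adds whitespace, punctuation, or different casing. One possible hardening step is to normalize each answer first. `normalize_label` and the sample answers are hypothetical and only illustrate the idea.

```python
from typing import Optional

VALID_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def normalize_label(raw: str) -> Optional[str]:
    """Map a raw LLM answer to one of the valid labels, or None if unclear."""
    cleaned = raw.strip().strip(".!\"'").upper()
    return cleaned if cleaned in VALID_LABELS else None


# Hypothetical raw answers, including some a strict == check would miss.
answers = ["POSITIVE", " positive.\n", "Negative", "I think it is positive"]
mask = [normalize_label(a) == "POSITIVE" for a in answers]
print(mask)
```

The resulting boolean list can index a DataFrame the same way as the list comprehension in the cell above.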


In [16]:
# Print first 5 results
for title in filtered_movies[:5]:
    print(title)

Amityville: The Awakening
After the Thin Man
Silver Streak
Boomerang!
Hoodlum


In [17]:
prompt = (
    "Given a movie review as {review_content}, classify the review as either POSITIVE, NEGATIVE, or NEUTRAL. "
    "Respond with just the category and no other text.")
result = query(prompt, df)
filtered_movies = df[[r == "POSITIVE" for r in result]]

# Second query: runs only on the rows that passed the sentiment filter
prompt = "Given {movie_title} and {movie_info}, determine if this movie is suitable for kids. "
result = query(prompt, filtered_movies)

Processed prompts: 100%|██████████| 20/20 [00:00<00:00, 30.23it/s]
Processed prompts: 100%|██████████| 6/6 [00:00<00:00, 10.99it/s]


In [19]:
# Print first 5 results
for title, kids in list(zip(filtered_movies["movie_title"], result))[:5]:
    print(f"Movie title: {title}")
    print(f"Suitable for kids: {kids}\n")

Movie title: Amityville: The Awakening
Suitable for kids: No, this movie is not suitable for kids.

Movie title: After the Thin Man
Suitable for kids: unsuitable

Movie title: Boomerang!
Suitable for kids: unsuitable

Movie title: Hoodlum
Suitable for kids: No

Movie title: Meet Bill
Suitable for kids: Not suitable for kids
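
Because this prompt does not constrain the answer format, the responses vary ("No", "unsuitable", "Not suitable for kids"). A rough sketch of mapping such free-form answers to a boolean follows. `parse_suitability` is a hypothetical helper with deliberately simple rules, so ambiguous answers may be misread.

```python
import re
from typing import Optional


def parse_suitability(answer: str) -> Optional[bool]:
    """Return True/False for a yes/no style answer, or None if unclear."""
    text = answer.strip().lower()
    # Check negative cues first: "unsuitable" and "not suitable" both
    # contain the substring "suitable".
    if re.match(r"no\b", text) or "unsuitable" in text or "not suitable" in text:
        return False
    if re.match(r"yes\b", text) or "suitable" in text:
        return True
    return None


samples = [
    "No, this movie is not suitable for kids.",
    "unsuitable",
    "No",
    "Not suitable for kids",
    "Yes, suitable for all ages",
    "Maybe",
]
parsed = [parse_suitability(s) for s in samples]
print(parsed)
```

Asking the model to respond with a single word, as the sentiment prompt does with "Respond with just the category and no other text", would likely reduce the need for this kind of parsing.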



### Aggregate Queries

LLM queries can also feed aggregates: the cell below scores each review with the LLM and then averages the scores per movie.

In [43]:
prompt = "Given a movie review as {review_content}, score the movie either as 1, 2, 3, with 3 as the highest. Return just the score and nothing else."
df["review_score"] = [int(score) for score in query(prompt, df)]
grouped_df = df.groupby("movie_title")["review_score"].mean()

Processed prompts: 100%|██████████| 20/20 [00:00<00:00, 35.40it/s]
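
The `int(score)` conversion raises `ValueError` if the model returns anything other than a bare number. A more forgiving sketch extracts the first standalone 1, 2, or 3 and skips answers that cannot be parsed. `parse_score` and the sample answers are hypothetical, and the grouping uses the standard library so the example runs without pandas.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Optional


def parse_score(raw: str) -> Optional[int]:
    """Return the first standalone 1, 2, or 3 in the answer, else None."""
    match = re.search(r"\b([123])\b", raw)
    return int(match.group(1)) if match else None


# Hypothetical (title, raw LLM answer) pairs.
answers = [
    ("Movie A", "3"),
    ("Movie A", "Score: 2"),
    ("Movie B", "2\n"),
    ("Movie B", "great"),  # unparseable, so it is left out of the mean
]

scores_by_title = defaultdict(list)
for title, raw in answers:
    score = parse_score(raw)
    if score is not None:
        scores_by_title[title].append(score)

averages = {title: mean(scores) for title, scores in scores_by_title.items()}
print(averages)
```

In the pandas version, the same effect could be had by storing `parse_score` results in the column and letting `groupby(...).mean()` ignore missing values.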


In [44]:
grouped_df

movie_title
After the Thin Man            2.0
Alexandra's Project           2.0
American Gun                  2.0
Amityville: The Awakening     2.0
Armstrong                     3.0
Badland                       2.0
Boomerang!                    2.5
Death in Love                 2.0
Hoodlum                       2.0
Meet Bill                     3.0
Silent Night, Deadly Night    2.0
Silver Streak                 3.0
Stolen                        2.0
The Assignment                3.0
The Jazz Singer               3.0
White Fang                    2.0
Name: review_score, dtype: float64