# Getting Started with sqlite_vec

This is a toy notebook for experimenting with the new `sqlite_vec` extension, which lets you store vectors (e.g. embeddings) in a SQLite database and perform standard operations on them (such as finding the most similar vectors in the database).

In this example, we use an artificial toy dataset of tweet-like statements, each labelled with a sentiment (hateful / not hateful).

The goal of this notebook is to classify a new sentence using the `sqlite_vec` extension.

Find out more about sqlite_vec [here](https://github.com/asg017/sqlite-vec).

## Installation of `sqlite_vec` and `openai`
If you have not yet installed `sqlite_vec` and `openai` (used to compute the embeddings), run the following cell first:

In [2]:
!pip install sqlite_vec openai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Imports
Now we import the libraries required for the demo.

In [3]:
import sqlite3
import sqlite_vec

## Using sqlite_vec as extension ("setup")

In [4]:
db = sqlite3.connect(":memory:") # Create an in-memory database
db.enable_load_extension(True) # Enable extension loading so we can add sqlite_vec
sqlite_vec.load(db) # Load the sqlite_vec extension into this connection
db.enable_load_extension(False) # Disable extension loading again (security)

## Create a small sentiment dataset and load it into the database
Note that this example dataset was generated with GPT-4o and the following query:
``
``


In [5]:
dataset = [
    ["Just saw the most beautiful sunset. Life is good! #blessed", "not hateful"],
    ["I can't believe how stupid some people can be. Ugh.", "not hateful"],
    ["I hate it when people chew with their mouths open. So gross!", "not hateful"],
    ["Why are people so dumb these days? It's embarrassing.", "not hateful"],
    ["Wow, can't believe how ignorant some folks are. Smh.", "not hateful"],
    ["You're the reason this world is going to hell. So pathetic.", "hateful"],
    ["I wish all these idiots would just disappear. World would be better.", "hateful"],
    ["These fools need to learn their place. So sick of them.", "hateful"],
    ["How are people so trash? They really need to stop existing.", "hateful"],
    ["Just saw a dog on my way to work. Made my day!", "not hateful"],
    ["I hope everyone has a wonderful day today! #positivevibes", "not hateful"],
    ["Why are some people so annoying? Can't stand them.", "not hateful"],
    ["Stop being so lazy and actually do something with your life.", "hateful"],
    ["You're such a waste of space. Just go away.", "hateful"],
    ["If you can't contribute anything, then just stay out of the way.", "hateful"],
    ["Sending love and light to everyone today! #grateful", "not hateful"],
    ["Can't believe the hate I'm seeing online. It's so toxic.", "not hateful"],
    ["Why do people spread so much negativity? It's sad.", "not hateful"],
    ["People like you make me lose faith in humanity.", "hateful"],
    ["You don't belong here. Just leave.", "hateful"],
    ["You're the reason why this place sucks. Go away.", "hateful"],
    ["Woke up feeling amazing! Ready to take on the day.", "not hateful"],
    ["Just finished a great workout. Feeling strong!", "not hateful"],
    ["I can't stand people who think they're better than everyone else.", "not hateful"],
    ["It's disgusting how some people think they're entitled to everything.", "hateful"],
    ["Why are you even here? No one wants you around.", "hateful"],
    ["If you're not going to help, then just stay out of the way.", "hateful"],
    ["You're such a joke. No one takes you seriously.", "hateful"],
    ["I love how supportive everyone has been. Thank you!", "not hateful"],
    ["It's amazing how kind people can be. Restores my faith in humanity.", "not hateful"],
    ["Some people are just too sensitive these days.", "not hateful"],
    ["Why do people get offended by everything? It's ridiculous.", "not hateful"],
    ["You're so weak, it's pathetic.", "hateful"],
    ["No one cares about your opinion. Just stop talking.", "hateful"],
    ["I don't understand why people are so rude for no reason.", "not hateful"],
    ["Why do you even try? You're never going to be good enough.", "hateful"],
    ["You're a failure and everyone knows it.", "hateful"],
    ["Stop pretending to be something you're not. It's embarrassing.", "hateful"],
    ["Had a great day with friends! Can't wait to do it again.", "not hateful"],
    ["So grateful for all the love and support I've received.", "not hateful"],
    ["Why can't people just mind their own business?", "not hateful"],
    ["You're so full of yourself. It's disgusting.", "hateful"],
    ["No one wants to hear what you have to say. Just go away.", "hateful"],
    ["You're just a burden to everyone around you.", "hateful"],
    ["Stop acting like you're better than everyone else.", "hateful"],
    ["Had the best time at the concert last night!", "not hateful"],
    ["Can't believe how amazing this weekend has been.", "not hateful"],
    ["Why are people so obsessed with being perfect? It's exhausting.", "not hateful"],
    ["You're such a loser. No one wants to be around you.", "hateful"],
    ["If you disappeared, no one would even notice.", "hateful"],
    ["You're so annoying, it's unbelievable.", "hateful"],
    ["Why can't people just be happy for each other?", "not hateful"],
    ["You're nothing but a waste of space.", "hateful"],
    ["Why are you always so negative? It's bringing everyone down.", "not hateful"],
    ["You're such a disgrace. Just leave already.", "hateful"],
    ["No one likes you. Just face the truth.", "hateful"],
    ["You're not worth anyone's time.", "hateful"],
    ["Stop being so selfish and think about others for once.", "hateful"],
    ["Had an amazing dinner with my family tonight. So grateful.", "not hateful"],
    ["Can't believe how lucky I am to have such great friends.", "not hateful"],
    ["You're so fake, it's pathetic.", "hateful"],
    ["No one respects you, and for good reason.", "hateful"],
    ["You're such a failure, it's sad.", "hateful"],
    ["Stop trying so hard. It's not going to work.", "hateful"],
    ["Feeling so blessed today. Life is good!", "not hateful"],
    ["I love how supportive everyone has been lately.", "not hateful"],
    ["You're nothing but a joke to everyone.", "hateful"],
    ["You're just a waste of resources.", "hateful"],
    ["No one wants you here. Just leave.", "hateful"],
    ["Had a fantastic day at the beach. So relaxing.", "not hateful"],
    ["Can't believe how rude some people can be.", "not hateful"],
    ["Why do people feel the need to be so hateful?", "not hateful"],
    ["You're the worst person I've ever met.", "hateful"],
    ["If you left, no one would miss you.", "hateful"],
    ["You're just a burden on society.", "hateful"],
    ["Stop pretending like you matter. You don't.", "hateful"],
    ["Had a great time with my family today. So much love.", "not hateful"],
    ["Feeling so grateful for everything I have.", "not hateful"],
    ["You're so annoying. No one likes you.", "hateful"],
    ["Why are you even here? No one wants you around.", "hateful"],
    ["You're just a waste of everyone's time.", "hateful"],
    ["Why can't people just be kind to each other?", "not hateful"],
    ["You're such a disappointment.", "hateful"],
    ["No one cares about you. Just stop trying.", "hateful"],
    ["Why do people have to be so cruel?", "not hateful"],
    ["You're just a nuisance to everyone.", "hateful"],
    ["Had an amazing weekend with friends. Can't wait to do it again.", "not hateful"],
    ["Why can't people just mind their own business?", "not hateful"],
    ["You're so pathetic. Just go away.", "hateful"],
    ["No one respects you, and for good reason.", "hateful"],
    ["Feeling so blessed to have such amazing friends.", "not hateful"],
    ["You're such a failure. No one cares about you.", "hateful"],
    ["Can't believe how kind people can be. It restores my faith in humanity.", "not hateful"],
    ["You're so fake, it's embarrassing.", "hateful"],
    ["No one wants you here. Just leave.", "hateful"],
    ["You're just a waste of resources.", "hateful"],
    ["Had an amazing time at the concert last night. So much fun!", "not hateful"],
    ["You're such a loser. No one wants to be around you.", "hateful"]
]


In [None]:
# Now we create a table and load the data
create_table_sql = """
    CREATE TABLE sentences(
        id INTEGER PRIMARY KEY,
        sentence TEXT, -- The sentence which we want to embed
        sentiment TEXT
    )
"""

insert_sentence_sql = """
    INSERT INTO sentences(sentence, sentiment) VALUES (?, ?)
"""
cursor = db.cursor()
cursor.execute(create_table_sql)

# Next we insert the data
cursor.executemany(insert_sentence_sql, dataset)
cursor.close()
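
A quick sanity check after loading can be useful, because the join in the later query relies on `sentences.id` values being assigned in insertion order, starting at 1. The following is a minimal, self-contained sketch using a separate in-memory database and a few stand-in rows (`sample_rows`, `db_check`, `label_counts` and `first_id` are illustrative names, not part of the notebook):

```python
import sqlite3

# A tiny stand-in for the notebook's `dataset` (hypothetical rows, same shape).
sample_rows = [
    ["Have a great day!", "not hateful"],
    ["Nobody wants you here.", "hateful"],
    ["What a lovely morning.", "not hateful"],
]

db_check = sqlite3.connect(":memory:")
db_check.execute(
    "CREATE TABLE sentences(id INTEGER PRIMARY KEY, sentence TEXT, sentiment TEXT)"
)
db_check.executemany(
    "INSERT INTO sentences(sentence, sentiment) VALUES (?, ?)", sample_rows
)

# Count the rows per label to confirm that every row was inserted.
label_counts = dict(
    db_check.execute(
        "SELECT sentiment, COUNT(*) FROM sentences GROUP BY sentiment"
    ).fetchall()
)
print(label_counts)

# With an INTEGER PRIMARY KEY and no explicit id, SQLite numbers the rows
# of an empty table 1, 2, 3, ... in insertion order.
first_id = db_check.execute("SELECT MIN(id) FROM sentences").fetchone()[0]
print(first_id)
```

Running the same two queries against the notebook's `db` should show the label distribution of the full dataset.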

## Embed the sentences using OpenAI Embeddings
Now we embed the sentences of our example dataset using the OpenAI model `text-embedding-3-small` (which is pretty cheap and fast).

In [31]:
import openai

# Set your key here
OPENAI_KEY = "YOUR_API_KEY_HERE"

client = openai.OpenAI(api_key=OPENAI_KEY)

# Embed all sentences of the dataset in a single request
response = client.embeddings.create(
    input=[datapoint[0] for datapoint in dataset],
    model="text-embedding-3-small",
)
embeddings = [data.embedding for data in response.data]

## Create the embedding table using a virtual table
We use a virtual table based on `vec0`, the virtual table module provided by the `sqlite_vec` extension.

Note that "virtual" does NOT mean temporary. A virtual table is backed by callback methods implemented in the extension, which is what allows `vec0` to run operations such as distance calculations inside a query.

In [8]:
# Now we create the vector table
import struct
from typing import List


create_vector_table_sql = """
    CREATE VIRTUAL TABLE sentences_embeddings USING vec0(
          id INTEGER PRIMARY KEY,
          sentence_embedding FLOAT[1536]
    );
"""

insert_embedding_sql = """
    INSERT INTO sentences_embeddings(sentence_embedding) VALUES (?)
"""

def serialize(vector: List[float]) -> bytes:
    """serializes a list of floats into a compact "raw bytes" format"""
    return struct.pack("%sf" % len(vector), *vector)

cursor = db.cursor()
cursor.execute(create_vector_table_sql)
cursor.executemany(insert_embedding_sql, [(serialize(embedding), ) for embedding in embeddings])
cursor.close()
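
To make the "raw bytes" format more concrete, here is a small round-trip sketch. It repeats the `serialize` function from the cell above and adds a `deserialize` helper, which is a hypothetical name introduced only for illustration. Each value is packed as a 4-byte float32, so a 1536-dimensional embedding occupies 6144 bytes.

```python
import struct
from typing import List


def serialize(vector: List[float]) -> bytes:
    """Pack a list of floats into consecutive float32 values."""
    return struct.pack("%sf" % len(vector), *vector)


def deserialize(blob: bytes) -> List[float]:
    """Inverse of serialize (hypothetical helper, not part of the notebook)."""
    return list(struct.unpack("%sf" % (len(blob) // 4), blob))


# These sample values are exactly representable as float32,
# so the round trip is lossless for them.
blob = serialize([0.5, -1.25, 2.0])
print(len(blob))          # 12 (3 values * 4 bytes each)
print(deserialize(blob))  # [0.5, -1.25, 2.0]
```

Real embedding values are generally not exactly representable as float32, so expect tiny rounding differences after a round trip. The `sqlite_vec` Python package also ships a `serialize_float32` helper that can be used instead of a hand-written function.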

## Classify a new sentence
Now we create a new sample that we want to classify with the help of our SQLite database.
First, we embed it with the same model as before (`text-embedding-3-small`). Then we query the vector table using the following syntax:

```
sentence_embedding MATCH ([1, 0.231, 0.31232, ...])
AND k = 3
```

This returns the three embeddings closest to our new sample (the number is set by the `k` parameter in the query).
We can then assign the new sample the classification of its closest sentences in the database.
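
Conceptually, a k-nearest-neighbor lookup computes the distance from the query vector to every stored vector and keeps the `k` smallest. The pure-Python sketch below illustrates that idea on toy 2-dimensional vectors. It assumes Euclidean (L2) distance, which to my knowledge is the default metric of `vec0`. The names `l2_distance` and `brute_force_knn` are illustrative, and this is not how the extension is implemented internally.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (list index, distance) for the k vectors closest to the query."""
    scored = [(index, l2_distance(query, vector)) for index, vector in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
neighbours = brute_force_knn([0.0, 0.0], toy_vectors, k=3)
print(neighbours)  # [(0, 0.0), (1, 1.0), (2, 2.0)]
```

Note that the indices here are 0-based list positions, whereas the ids in the database tables start at 1.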


In [17]:
# Now we try to classify one new sentence
sentence_to_classify = "I hate you and I hope you never talk to me again!"
sentence_to_classify_embedded = client.embeddings.create(input = sentence_to_classify, model = 'text-embedding-3-small').data[0].embedding


In [27]:
get_closest_sentiment_from_database_sql = """
    SELECT sentence, sentiment, distance
    FROM sentences_embeddings
    LEFT JOIN sentences ON sentences.id = sentences_embeddings.id
    WHERE sentences_embeddings.sentence_embedding MATCH ?
    AND k = 3 -- kNN
    ORDER BY distance
"""

cursor = db.cursor()
resultset = cursor.execute(get_closest_sentiment_from_database_sql, (serialize(sentence_to_classify_embedded),))

In [28]:
for result in resultset:
    print(result)
cursor.close()

("You're the worst person I've ever met.", 'hateful', 1.043708086013794)
("You're so annoying. No one likes you.", 'hateful', 1.1006056070327759)
("You're such a loser. No one wants to be around you.", 'hateful', 1.1094571352005005)


All three nearest neighbors are labelled `hateful`, so we would classify our new sample as hateful.
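
When the neighbors disagree, one common option is a majority vote over their labels. The following is a minimal sketch under that assumption. `majority_label` is a hypothetical helper, and the rows are copied from the output above. In the notebook, the rows would have to be collected with `cursor.fetchall()` before the cursor is closed.

```python
from collections import Counter
from typing import List, Tuple


def majority_label(rows: List[Tuple[str, str, float]]) -> str:
    """Return the most frequent sentiment among (sentence, sentiment, distance) rows."""
    labels = [sentiment for _sentence, sentiment, _distance in rows]
    return Counter(labels).most_common(1)[0][0]


# The three result rows printed by the query above.
rows = [
    ("You're the worst person I've ever met.", "hateful", 1.043708086013794),
    ("You're so annoying. No one likes you.", "hateful", 1.1006056070327759),
    ("You're such a loser. No one wants to be around you.", "hateful", 1.1094571352005005),
]

predicted = majority_label(rows)
print(predicted)  # hateful
```

With only two labels, an odd `k` (such as the 3 used here) avoids ties.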