# Setup Postgres

We install Postgres and its development tools, which are needed to build Lantern from source. We then start Postgres, set the password of the `postgres` user to `postgres`, and create a database called `ourdb`.




In [None]:
# We install postgres and its dev tools
!sudo apt-get -y -qq update
!sudo apt-get -y -qq install postgresql postgresql-server-dev-all
#  Start postgres
!sudo service postgresql start

# Create user, password, and db
!sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'postgres';"
!sudo -u postgres psql -U postgres -c 'DROP DATABASE IF EXISTS ourdb;'
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE ourdb;'

debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 26.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package logrotate.
(Reading database ... 120874 files and directories currently installed.)
Preparing to unpack .../00-logrotate_3.19.0-1ubuntu1.1_amd64.deb ...
Unpacking logrotate (3.19.0-1ubuntu1.1) ...
Selecting previously unselected package netbase.
Preparing to unpack .../01-netbase_6.3_all.deb ...
Unpacking netbase (6.3) ...
Selecting previously unselected package python3-yaml.
Preparing to unpack .../02-python3-yaml_5.4.1-1ubuntu1_amd64.deb ...
Unpacking python3-yaml (5.4.1-1ubuntu1) ...
Selecting previous

# Install Lantern and build it from source

In [None]:
!git clone --recursive https://github.com/lanterndata/lantern.git

Cloning into 'lantern'...
remote: Enumerating objects: 2562, done.
remote: Counting objects: 100% (1342/1342), done.
remote: Compressing objects: 100% (414/414), done.
remote: Total 2562 (delta 1068), reused 1003 (delta 922), pack-reused 1220
Receiving objects: 100% (2562/2562), 578.18 KiB | 3.85 MiB/s, done.
Resolving deltas: 100% (1698/1698), done.
Submodule 'third_party/hnswlib' (https://github.com/ngalstyan4/hnswlib) registered for path 'third_party/hnswlib'
Submodule 'third_party/usearch' (https://github.com/ngalstyan4/usearch) registered for path 'third_party/usearch'
Cloning into '/content/lantern/third_party/hnswlib'...
remote: Enumerating objects: 1723, done.        
remote: Counting objects: 100% (333/333), done.        
remote: Compressing objects: 100% (40/40), done.        
remote: Total 1723 (delta 306), reused 293 (delta 293), pack-reused 1390        
Receiving objects: 100% (1723/1723), 530.50 KiB | 9.15 MiB/s, done.
Resolving deltas: 100% (1097/1097), done.

In [None]:
# We build lantern from source
%cd lantern
!mkdir build
%cd build
!pwd
!cmake ..
!make install

/content/lantern
/content/lantern/build
/content/lantern/build
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: 
-- Found pg_config as /usr/bin/pg_config
-- Found postgres binary at /usr/lib/postgresql/14/bin/postgres
-- PostgreSQL version PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) fou

# Lantern Quickstart

Here, we use the `psycopg2` library to interact with Postgres from Python.
The first step is to obtain a connection object, `conn`, for the Postgres instance on our machine. We call `connect` with the user, password, and database name we set up earlier. The `host` and `port` parameters tell `psycopg2` where to find Postgres; the values below are the defaults for a local installation.

NOTE: if a query raises an error, call `conn.rollback()` before running anything else. The failed statement leaves the current transaction aborted, and rolling back discards its uncommitted changes so that the connection can be used again.
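
The rollback advice in the note above can be wrapped in a small helper. This is only a sketch: `execute_or_rollback` is a hypothetical name, not part of `psycopg2` or Lantern, and it assumes `conn` is a DB-API connection like the one created in the next cell.

```python
def execute_or_rollback(conn, sql, params=None):
    """Run one statement and commit it; roll back if it fails.

    Hypothetical helper for illustration, not a psycopg2 or Lantern API.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        conn.commit()
    except Exception:
        # A failed statement leaves the transaction aborted. Rolling back
        # makes the connection usable again; we then re-raise the error.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

Once `conn` exists, a call would look like `execute_or_rollback(conn, "CREATE EXTENSION IF NOT EXISTS lantern;")`.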

In [None]:
import psycopg2

# We use the dbname, user, and password that we specified above
conn = psycopg2.connect(
    dbname="ourdb",
    user="postgres",
    password="postgres",
    host="localhost",
    port="5432" # default port for Postgres
)

# Enabling the Lantern Extension
Lantern is a Postgres extension, so we need to enable it in our database before we can use it for vector search. We do this with the `CREATE EXTENSION` statement, adding `IF NOT EXISTS` so that no error is raised if the extension is already enabled.

In [None]:
# Get a new cursor
cursor = conn.cursor()

# Execute the query to load the Lantern extension in
cursor.execute("CREATE EXTENSION IF NOT EXISTS lantern;")

# Commit the transaction
conn.commit()

# Close the cursor
cursor.close()

# Creating a Table

Let's create a simple table called `small_world` with two columns: an `id` column of type `INTEGER`, and a `vector` column holding an array of real numbers (of type `REAL[]`).

Note that although we write "3" in the type of `vector` (`REAL[3]`), the size serves only as documentation: Postgres will NOT enforce this length! This is by design in Postgres.
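
Because Postgres does not enforce the declared length, one option is to check it on the Python side before inserting. The helper below is a minimal sketch; `check_dim` is a hypothetical name introduced here for illustration, not something provided by Lantern or `psycopg2`.

```python
def check_dim(vec, dim=3):
    """Return vec as a list of floats, or raise if it does not have dim entries.

    Hypothetical client-side check; Postgres itself accepts any length.
    """
    values = [float(x) for x in vec]
    if len(values) != dim:
        raise ValueError(f"expected {dim} components, got {len(values)}")
    return values


print(check_dim([0, 0, 1]))  # [0.0, 0.0, 1.0]
```

Calling `check_dim([1, 2])` raises a `ValueError` instead of letting a two-element vector reach the table.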

In [None]:
cursor = conn.cursor()

create_table_query = "CREATE TABLE small_world (id integer, vector real[3]);"

cursor.execute(create_table_query)

conn.commit()
cursor.close()

# Inserting Data

Let's insert some data! We insert a few vectors into our table using the `INSERT` statement. As pointed out earlier, declaring the column as `REAL[3]` when we created the table does not make Postgres reject a vector whose length is not 3.
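
The cell below writes out one `(%s, %s)` placeholder pair per row by hand. For a longer list of rows, the same statement and parameter tuple could be built programmatically. This is a sketch only; `build_insert` is a hypothetical helper, and it relies on `psycopg2` adapting Python lists to Postgres arrays, as the cell below also does.

```python
def build_insert(rows):
    """Build a multi-row INSERT for small_world from (id, vector) pairs.

    Hypothetical helper: returns the SQL text and the flat parameter
    tuple expected by cursor.execute(sql, params).
    """
    if not rows:
        raise ValueError("need at least one row")
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    sql = f"INSERT INTO small_world (id, vector) VALUES {placeholders};"
    # Flatten [(id, vec), ...] into (id, vec, id, vec, ...)
    params = tuple(value for row in rows for value in row)
    return sql, params


sql, params = build_insert([(1, [0, 0, 1]), (2, [0, 1, 1])])
print(sql)     # INSERT INTO small_world (id, vector) VALUES (%s, %s), (%s, %s);
print(params)  # (1, [0, 0, 1], 2, [0, 1, 1])
```

The values still travel as query parameters rather than being formatted into the SQL string, so `psycopg2` handles quoting and array conversion.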

In [None]:
cursor = conn.cursor()

# Let's insert a vector [0,0,0] with id 0 (note that postgres uses {} braces)
cursor.execute("INSERT INTO small_world (id, vector) VALUES (0, '{0, 0, 0}');")

# Now let's insert some more vectors
v1 = [0, 0, 1]
v2 = [0, 1, 1]
v3 = [1, 1, 1]
v4 = [2, 0, 1]

cursor.execute("INSERT INTO small_world (id, vector) VALUES (%s, %s), (%s, %s), (%s, %s), (%s, %s);", (1, v1, 2, v2, 3, v3, 4, v4))

conn.commit()
cursor.close()

# Creating an Index
In order to perform vector search, we need to create an index. In Postgres, an index is an auxiliary data structure that speeds up queries and can enable new ways to interact with your data. The `hnsw` index comes from Lantern, and it provides fast approximate nearest-neighbor search over vectors.

Note that we can pass options and parameters when creating the index. For example, we can choose the distance function, which determines how the distance between two vectors is computed when we search for a vector's nearest neighbors. The default, used below, is `l2sq`: the squared L2 distance, that is, the square of the familiar Euclidean distance.
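
To make the distance concrete, here is the squared L2 distance written in plain Python. It is an illustrative helper that only mirrors the definition; it is not Lantern's implementation.

```python
import math


def l2sq(a, b):
    """Squared L2 distance between two equal-length vectors (illustrative)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


print(l2sq([0, 1, 1], [0, 0, 0]))       # 2
print(math.sqrt(l2sq([3, 4], [0, 0])))  # 5.0, the ordinary Euclidean distance
```

Squaring does not change which vectors are closest, so ranking neighbors by `l2sq` gives the same order as ranking by Euclidean distance while skipping the square root.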

In [None]:
cursor = conn.cursor()

cursor.execute("CREATE INDEX ON small_world USING hnsw (vector);")

# We can also specify additional parameters to the index like this
# (the string below is shown for reference only and is not executed):
"""CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
WITH
(M=2, ef_construction=10, ef=4, dim=3);"""

conn.commit()
cursor.close()

# Vector Search!
Now that we have created an index, we can start doing nearest-neighbor vector search!

However, we first need to set `enable_seqscan` to false. The details are beyond the scope of this quickstart, but the gist is that we need Postgres to use the index we created above when it runs queries (such as `SELECT`). Turning off this Postgres runtime setting discourages the query planner from choosing a sequential scan, so that our query goes through the Lantern index.

Then, we search our table for the 3 nearest neighbors of the vector [0,0,0]. Note that this "target" vector ([0,0,0]) does not need to be in our index. It is simply the vector to which distances are computed when finding nearest neighbors.

Our index was built to use the L2-squared distance (squared Euclidean distance), so that is the distance used during the search below. Note that the `l2sq_dist` call in the first part of the statement only recomputes the distance to each returned neighbor so that it shows up in the results for our convenience! The actual search happens in the second half of the SQL statement (the `ORDER BY ... <->` clause), which searches the index that was configured to use L2-squared distance when we built it above. Hence, we see this distance reflected in the printed output below.
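
As a sanity check on what the query below should return, the same search can be done by brute force in plain Python over the five vectors inserted earlier. This is a sketch for comparison only; `brute_force_knn` is a hypothetical helper, and it is an exact linear scan rather than the HNSW search that Lantern performs.

```python
def brute_force_knn(rows, target, k=3):
    """Return the k (id, squared L2 distance) pairs closest to target."""
    scored = [
        (row_id, float(sum((x - y) ** 2 for x, y in zip(vec, target))))
        for row_id, vec in rows
    ]
    # Sort by distance, smallest first, and keep the first k entries
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# The same (id, vector) pairs that were inserted into small_world
rows = [(0, [0, 0, 0]), (1, [0, 0, 1]), (2, [0, 1, 1]), (3, [1, 1, 1]), (4, [2, 0, 1])]
neighbors = brute_force_knn(rows, [0, 0, 0], k=3)
print(neighbors)  # [(0, 0.0), (1, 1.0), (2, 2.0)]
```

An HNSW index is approximate in general, so on large tables its results may occasionally differ from an exact scan like this one.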

In [None]:
cursor = conn.cursor()

# We only need to set this at the beginning of a session
cursor.execute("SET enable_seqscan = false;")
cursor.execute("SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist, vector FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 3;")

record = cursor.fetchone()
while record:
    print(f"Vector {record[2]} with ID {record[0]} has a L2-squared distance of {record[1]} from [0,0,0]")
    record = cursor.fetchone()

cursor.close()

Vector [0.0, 0.0, 0.0] with ID 0 has a L2-squared distance of 0.0 from [0,0,0]
Vector [0.0, 0.0, 1.0] with ID 1 has a L2-squared distance of 1.0 from [0,0,0]
Vector [0.0, 1.0, 1.0] with ID 2 has a L2-squared distance of 2.0 from [0,0,0]


# Conclusion
That's how you get up and running using Lantern and `psycopg2`! Feel free to explore more of our tutorials and demos.

### Cleanup

In [None]:
# Close the postgres connection
conn.close()