Merge branch 'pg-15' into 'master'
REFACTOR postgresql-15.3

See merge request oelmekki/postgres-350d!1
oelmekki committed Jul 10, 2023
2 parents 60767d5 + afecdbe commit 9968740
Showing 5 changed files with 57 additions and 41 deletions.
12 changes: 12 additions & 0 deletions .gitlab-ci.yml
@@ -0,0 +1,12 @@
image: docker:latest
services:
  - docker:dind

stages:
  - test

testing_image:
  stage: test
  script:
    - docker build -t pg-350d .
    - ./test.sh
10 changes: 5 additions & 5 deletions Dockerfile
@@ -1,8 +1,8 @@
FROM postgres:9.6
FROM postgres:15.3
MAINTAINER Olivier El Mekki <olivier@el-mekki.com>

RUN apt-get update && apt-get install -y build-essential curl postgresql-server-dev-9.6
RUN curl https://ftp.postgresql.org/pub/source/v9.6.0/postgresql-9.6.0.tar.bz2 -o /postgresql-9.6.0.tar.bz2
RUN cd / && tar xvf postgresql-9.6.0.tar.bz2
RUN cd /postgresql-9.6.0/contrib/cube && sed -i 's/#define CUBE_MAX_DIM (100)/#define CUBE_MAX_DIM (350)/' cubedata.h && \
RUN apt update && apt install -y build-essential curl postgresql-server-dev-15
RUN curl https://ftp.postgresql.org/pub/source/v15.3/postgresql-15.3.tar.bz2 -o /postgresql-15.3.tar.bz2
RUN cd / && tar xvf postgresql-15.3.tar.bz2
RUN cd /postgresql-15.3/contrib/cube && sed -i 's/#define CUBE_MAX_DIM (100)/#define CUBE_MAX_DIM (350)/' cubedata.h && \
USE_PGXS=true make && USE_PGXS=true make install
51 changes: 15 additions & 36 deletions README.md
@@ -1,51 +1,30 @@
# pg350d

Docker build of postgresql-9.6 changing the dimension limit for the cube extension, raising it to 350.
Docker build of postgresql-15.3 changing the dimension limit for the cube
extension, raising it to 350.

This is needed to be able to work with word embeddings with postgres.
This is needed to be able to work with word embeddings or other machine
learning related vectors with postgres.

You can easily generate a build for your own needs in terms of dimensions by editing this Dockerfile.
> Note: since pg350d was released, there have been some efforts to support
> machine-learning-friendly vectors in [pgvector](https://github.com/pgvector/pgvector),
> which supports up to 16k dimensions.
You can easily generate a build for your own needs in terms of dimensions by
editing this Dockerfile.

## What is the problem again?

The cube extension, which you'll use to perform operations on vectors, has a hard limit of 100 dimensions per vector.


## But I can create vectors with more than 100 dimensions!

Yup, I managed to do it too. With `INSERT` and `UPDATE`, the hard limit seems not to be properly checked.

The problem happens (at least) when you try to import a dump. It will fail saying that you can't have
vectors with more than 100 dimensions.

If you are currently using vectors with 101+ dimensions in postgres, know that you won't be able to restore
your backups (or upgrade postgres, if you usually do so through dump/import) :)



## Download

The image dockerhub page is [here](https://hub.docker.com/r/oelmekki/pg350d/).

To pull it:

```
docker pull oelmekki/pg350d:9.6
```

The cube extension, which you'll use to perform operations on vectors, has
a hard limit of 100 dimensions per vector.

## Is it safe?

Patching the hardcoded limit is [the recommended way in postgres doc](https://www.postgresql.org/docs/9.5/static/cube.html#AEN169535).

I've been using it for several months on my main business, and haven't encountered any problems so far.


## Variants

If you want more than 350d and don't want to change it yourself, [@lisitsky made a 2000d variant](https://github.com/lisitsky/postgres-2kd).
Patching the hardcoded limit is [the recommended way in postgres
doc](https://www.postgresql.org/docs/current/cube.html#id-1.11.7.20.9).

I've been using it for a few years in production, and haven't encountered
any problems.

## How to raise postgresql's cube extension dimensions limit?

19 changes: 19 additions & 0 deletions test.sh
@@ -0,0 +1,19 @@
#!/usr/bin/env bash

echo "Starting database…"
ID=$(docker run --rm -e POSTGRES_HOST_AUTH_METHOD=trust -d pg-350d)
sleep 10

echo "Running test…"
IP=$(docker inspect $ID | grep '"IPAddress"' | head -n 1 | awk '{ print $2 }' | sed 's/[",]//g')
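ERRORS=$(psql -U postgres -h "$IP" -f ./test.sql 2>&1 | grep -i "error") # psql reports errors on stderr
ERR=0 # exit status of the test; stays 0 unless an error line was found

if [[ -n "$ERRORS" ]]; then
  echo "$ERRORS"
  ERR=1
else
  echo "Success."
fi

docker stop "$ID" &> /dev/null
exit $ERR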
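
The fixed `sleep 10` in test.sh assumes the database is always ready within ten seconds. One possible alternative, sketched here and not part of this commit, is to poll until the server accepts connections. The `wait_for` helper below is hypothetical; the usage comment assumes `pg_isready` (shipped with the PostgreSQL client tools) is available wherever `psql` is run, and that `$IP` has already been resolved.

```shell
# wait_for TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts fail.
# Hypothetical helper, not part of the original test.sh.
wait_for() {
  wf_tries=$1
  wf_delay=$2
  shift 2
  while [ "$wf_tries" -gt 0 ]; do
    # Silence the probe; only its exit status matters.
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    wf_tries=$((wf_tries - 1))
    sleep "$wf_delay"
  done
  return 1
}

# Possible usage in place of `sleep 10` (assumes $IP is set beforehand):
#   wait_for 30 1 pg_isready -h "$IP" -U postgres || exit 1
```

Probing over TCP from outside the container, rather than through `docker exec`, is a deliberate choice in this sketch: it only succeeds once the server is reachable the same way the test's `psql` call will reach it.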
6 changes: 6 additions & 0 deletions test.sql
@@ -0,0 +1,6 @@
CREATE EXTENSION cube;
CREATE TABLE vectors(vector cube);
-- vector of 350 dimensions
INSERT INTO vectors(vector) VALUES(cube(ARRAY[
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1
]));
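
The 350-value literal in test.sql is written out by hand, which makes it awkward to test a build with a different `CUBE_MAX_DIM`. As a rough sketch only, the statement could be generated instead. The `make_vector_sql` function name is hypothetical, and the sketch assumes the `vectors` table from test.sql already exists.

```shell
# make_vector_sql DIMS: print an INSERT statement with a DIMS-dimensional
# cube whose coordinates are all 0.1. Hypothetical helper, not part of the commit.
make_vector_sql() {
  dims=$1
  # Build "0.1, 0.1, ..." with exactly DIMS entries.
  values=$(awk -v n="$dims" 'BEGIN {
    for (i = 1; i <= n; i++) printf "%s0.1", (i > 1 ? ", " : "")
  }')
  printf 'INSERT INTO vectors(vector) VALUES(cube(ARRAY[%s]));\n' "$values"
}

# Possible usage: append a 350-dimension insert to a SQL script.
#   make_vector_sql 350 >> ./test.sql
```

Calling it with 351 would give a statement that is expected to fail on this image, which could serve as a negative test of the patched limit.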
