Config: Add support for PostgreSQL #47

Open
sokoow opened this issue Oct 24, 2018 · 99 comments
Labels
help wanted: Help with this would be much appreciated!
idea: Feedback wanted / feature request
priority: Supported by early sponsors or popular demand

Comments

@sokoow

sokoow commented Oct 24, 2018

Nice idea lads, I totally support it. Have you ever considered switching to Postgres? For the deployment size I'm predicting this is going to have, MySQL might be a bit of a suboptimal choice :)


Details on possible implementation strategies can be found in this comment:

@lastzero
Member

Not right now, but in general: anything to store a few tables will do... as simple and stable as possible... many developers are familiar with MySQL, so that's my default when I start a new project. Tooling is also good.

SQLite is a very lean option, but obviously, if you run multiple processes or want to directly access or back up your data, it doesn't scale well, or at all.

@lastzero lastzero added the idea Feedback wanted / feature request label Oct 24, 2018
@lastzero lastzero added the declined Cannot be merged or implemented at this time label Nov 17, 2018
@lastzero
Member

It became clear that we have to build a single binary for distribution to reach broad adoption. Differences between SQL dialects are too large to have them abstracted away by our current ORM library, for example when doing date range queries. They already differ between MySQL and SQLite.

For those reasons we will not implement Postgres support for our MVP / first release. If you have time & energy, you are welcome to help us. I will close this issue for now, we can revisit it later when there is time and enough people want this 👍

@sokoow
Author

sokoow commented Nov 17, 2018

Ok, fair point. I was raising this because the cost of maintenance and troubleshooting at scale is much lower with Postgres, and lots of successful projects have this support. So, from what you wrote about the differences, it seems that you don't have pluggable ORM-like generic read/write storage methods just yet, right?

@lastzero
Member

@sokoow We do use GORM, but it doesn't help with search queries that use database-specific SQL.

If you'd like to dive into the subject, DATEDIFF is a great example: MySQL and SQL Server use DATEDIFF(), Postgres seems to prefer DATE_PART(), whereas SQLite only has julianday().
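To make the difference concrete, here is a minimal sketch in the spirit of the dialect switches that appear later in this thread; the dialect names and columns are illustrative, not PhotoPrism's actual code:

// daysBetweenExpr returns a dialect-specific SQL expression for the
// number of days between two timestamp columns (column names are made up).
func daysBetweenExpr(dialect string) string {
	switch dialect {
	case "mysql":
		// MySQL has DATEDIFF(); SQL Server has a three-argument variant.
		return "DATEDIFF(p.taken_at, p.created_at)"
	case "postgres":
		// Postgres extracts fields from the interval produced by subtraction.
		return "DATE_PART('day', p.taken_at - p.created_at)"
	case "sqlite":
		// SQLite only offers julianday(), so the difference is computed manually.
		return "CAST(julianday(p.taken_at) - julianday(p.created_at) AS INTEGER)"
	default:
		return ""
	}
}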

It goes even deeper when you look into how tables are organized. You can't abstract and optimize at the same time. We want to provide the best performance to our users.

See

@sokoow
Author

sokoow commented Nov 17, 2018

No, that's a fair point. You're not the first project that has this challenge; it's something to think about at a higher abstraction level.

@LeKovr

LeKovr commented Nov 17, 2018

If you have time & energy, you are welcome to help us.

I guess it won't be so hard, so I'll give it a try.

@lastzero
Member

Getting it to work somehow at a single point in time is not hard; getting it to work with decent performance, finding developers who are comfortable with it, and constantly maintaining the code is incredibly hard.

Keep in mind: You also need to maintain continuous integration infrastructure and effectively run all tests with every database.
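As a rough illustration in Go of what running one suite against every database can look like: the TEST_DRIVER/TEST_DSN variable names below are hypothetical, and the drivers are the standard GORM ones.

package entity

import (
	"os"
	"testing"

	"gorm.io/driver/mysql"
	"gorm.io/driver/postgres"
	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// openTestDB picks the backend from the environment so the same test
// suite can run once per database in the CI matrix.
func openTestDB(t *testing.T) *gorm.DB {
	var dialector gorm.Dialector
	switch os.Getenv("TEST_DRIVER") {
	case "mysql":
		dialector = mysql.Open(os.Getenv("TEST_DSN"))
	case "postgres":
		dialector = postgres.Open(os.Getenv("TEST_DSN"))
	default:
		dialector = sqlite.Open(":memory:") // fallback for local runs
	}
	db, err := gorm.Open(dialector, &gorm.Config{})
	if err != nil {
		t.Fatalf("failed to open test database: %v", err)
	}
	return db
}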

@LeKovr

LeKovr commented Nov 20, 2018

Of course, tests might be the same for every supported database, and this might be solved within #60.
Also, SQLite support will probably entail some architectural changes (like search using Bleve and db-driver-dependent SQL queries). It won't be hard to add PostgreSQL support after that. And maybe you'll find "developers who are comfortable with it" by this time.

@lastzero
Member

@LeKovr Did you see how we renamed the label from "rejected" to "descoped"? 😉 Yes indeed, later it might be great to add support for additional databases, if users actually need it in practice. Maybe everyone will be happy with an embedded database if we do it well. It is hard to predict.

What I meant was that if you change some code that involves SQL you might feel uncomfortable because you only have experience with one database, so you end up doing nothing. And that can be very dangerous for a project.

@LeKovr

LeKovr commented Nov 20, 2018

@lastzero, you are right. Maybe later. There are more important things to do for now.

@bobobo1618

I had a quick look and it looks like the queries at least are trivial to add. The biggest problem is the models. The varbinary and datetime types are hard-coded into the models but don't exist in PostgreSQL, so the migration fails.

I'm not sure what the solution is here. I'd guess that the solution is to use the types Gorm expects (e.g. []byte instead of string when you want a column filled with bytes) but there's probably a good reason why it wasn't done that way to start with.

I'll play with it some more and see. It'd be nice to put everything in my PostgreSQL DB instead of SQLite.

@LeKovr

LeKovr commented Jul 9, 2020

The varbinary and datetime types are hard-coded into the models

Maybe creating a DOMAIN named varbinary... might help

@bobobo1618

All of the varbinary columns have different lengths and seem to have different purposes, so I don't think that'll help, unfortunately.

@lastzero
Member

lastzero commented Jul 9, 2020

Yes, we use binary for plain ASCII, especially when strings need to be sorted, indexed or compared and should not be normalized in any way.

@bobobo1618

Shouldn't that be the case by default for string fields? I know MySQL does some stupid stuff with character encodings but it shouldn't modify plain ASCII, right?

@lastzero
Member

lastzero commented Jul 9, 2020

But it uses 4 BYTES per ASCII character, so the index becomes very big. Also, when you compare strings, it's somewhat more complex with Unicode than just comparing bytes. I'm aware you can PROBABLY do the same with VARCHAR with the right settings and enough time to test, but it was hard to see business value in such experiments.

@bobobo1618

But it uses 4 BYTES per ASCII character

As far as I can tell looking at the SQLite docs, the MySQL docs and PostgreSQL docs, that isn't the case at all. A varchar uses a 1-4 byte prefix depending on the size of the field but each byte of payload consumes one byte of storage.

Also when you compare strings, it's somewhat more complex with unicode than just to compare bytes.

But we're not storing unicode, we're storing ASCII in a field that could contain unicode. I don't think any of those edge-cases apply here.

I'm aware you can PROBABLY do the same with VARCHAR with the right settings and enough time to test, but it was hard to see business value in such experiments.

Fair enough.

Also, queries aren't so straightforward after all. The queries extensively use 0 and 1 instead of false and true, which isn't supported by PostgreSQL (and, as a side note, makes a query more difficult to read, since you don't know whether it's meant to be a boolean comparison or an integer comparison).

I managed to do a little bit of cleanup of that and got something working, at least.
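For illustration, a hedged sketch of that kind of cleanup (the model and function are made up): binding a typed boolean parameter instead of writing 0/1 into the SQL text lets each driver emit its own boolean literal.

package query

import "gorm.io/gorm"

// Marker is a hypothetical model with a boolean column.
type Marker struct {
	ID            uint
	MarkerInvalid bool
}

// validMarkers avoids the non-portable literal form "marker_invalid = 0",
// which PostgreSQL rejects for boolean columns ("operator does not exist:
// boolean = integer"), and also makes the intent explicit.
func validMarkers(db *gorm.DB) ([]Marker, error) {
	var markers []Marker
	err := db.Where("marker_invalid = ?", false).Find(&markers).Error
	return markers, err
}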

@lastzero
Member

lastzero commented Jul 9, 2020

Not in the index, check again. Maybe also not in memory when comparing.

@bobobo1618

I couldn't find documentation so I just ran a quick test to see.

import sqlite3

c = sqlite3.connect('test.db')
# A table with an integer primary key and an indexed varchar(32) column.
c.execute('CREATE TABLE things (id INTEGER PRIMARY KEY, testcolumn varchar(32))')
c.execute('CREATE INDEX test_idx ON things(testcolumn)')
# Fill the column with 10,000 pseudo-random hex strings.
for x in range(0, 10000):
    c.execute('INSERT INTO things(testcolumn) VALUES (?)', (hex(x * 472882049 % 15485867),))
c.commit()

Which resulted in 79.3k of actual data:

SELECT SUM(length(testcolumn)) FROM things;
79288

I analyzed it with sqlite3_analyzer.

Table:

Bytes of storage consumed......................... 167936
Bytes of payload.................................. 109288      65.1%
Bytes of metadata................................. 50517       30.1%

Index:

Bytes of storage consumed......................... 163840
Bytes of payload.................................. 129160      78.8%
Bytes of metadata................................. 30476       18.6%

So for 79288 bytes of actual data sitting in the column, we have 109288 bytes total for the data itself (1.38 bytes per byte) and 129160 for the index (1.63 bytes per byte).

I repeated the test with varbinary(32) instead of varchar(32) and got precisely the same result, down to the exact number of bytes.

So I don't see any evidence that a varchar consumes more space in an index than a varbinary.

@lastzero
Member

lastzero commented Jul 9, 2020

You'll find some information on this page: https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html

You might also want to read this and related RFCs: https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings

Note that Microsoft, as far as I know, still uses UCS-2 instead of UTF-8 in Windows, for all the reasons I mentioned. Maybe they switched to UTF-16. Their Linux database driver for SQL Server used null-terminated strings; guess how well that works with UCS-2. Not at all.

For MySQL, we use 4-byte UTF-8 (utf8mb4), which needs 4 bytes per character in indexes unless somebody completely refactored InnoDB in the meantime. Note that the MySQL manual was wrong about InnoDB for a long time, insisting that MySQL doesn't know or support index-organized tables while InnoDB ONLY uses index-organized tables.

When you're done with this, enjoy learning about the four Unicode normalization forms: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization

Did you know there's a difference between Linux and OS X? Apple uses decomposed form, so you need to convert all strings when copying files. Their bundled command line tools were not compiled with iconv support, so you had to compile them yourself. Some of this is still not fixed today.

@lastzero
Member

lastzero commented Jul 9, 2020

Note that SQLite ignores VARBINARY and probably also VARCHAR to some degree. It uses dynamic typing. That's why all string keys are prefixed with at least one non-numeric character. It would convert the value to INT otherwise, and comparisons with binary data or strings would fail:

SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases should work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases.

See https://www.sqlite.org/datatype3.html
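A quick way to see this behavior from Go, assuming the mattn/go-sqlite3 driver: VARBINARY is not a native SQLite type and gets NUMERIC affinity, so a purely numeric key silently becomes an integer (losing its leading zeros), while a key with a non-numeric prefix stays text.

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// VARBINARY falls back to NUMERIC affinity in SQLite.
	db.Exec(`CREATE TABLE keys (uid VARBINARY(16))`)
	db.Exec(`INSERT INTO keys VALUES ('00123'), ('p00123')`)

	rows, _ := db.Query(`SELECT uid, typeof(uid) FROM keys`)
	defer rows.Close()
	for rows.Next() {
		var uid, storageClass string
		rows.Scan(&uid, &storageClass)
		fmt.Println(uid, storageClass) // prints "123 integer", then "p00123 text"
	}
}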

@bobobo1618

I'm aware of Unicode encodings and some of the important differences between them. I still don't see anything in the docs indicating that using a varchar containing ASCII will consume 4 bytes in an index but I'll take your word for it.

To be clear, in case there's some miscommunication going on, my assumption is that even if the column is switched to varchar, plain ASCII (i.e. the first 128 Unicode code points, which UTF-8 encodes as a single byte each) will still be stored in it. That being the case, 1 character = 1 byte and comparisons are bog-standard string comparisons.

In other news, here's a PoC of PostgreSQL mostly working. It's intended as an overview of the work that needs to be done, not as a serious proposal.

@bobobo1618

Actually, on string vs. []byte, it occurred to me: if you only want to store ASCII here and don't want to treat this as something that's semantically a string, is it a bad thing to use a []byte type? Is it the hassle of converting to/from strings when dealing with other APIs that's off-putting?

With []byte, gorm will choose an appropriate type for each DB by default.

@bobobo1618

bobobo1618 commented Jul 9, 2020

Ah, looks like the string vs. []byte question is mostly solved by GORM v2 anyhow. You'll just be able to put type:bytes in the tag and it'll handle it for you.
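For example, a hypothetical GORM v2 model fragment (field names and sizes are illustrative): declaring the column as bytes lets GORM pick the native binary type per dialect (VARBINARY on MariaDB, bytea on Postgres, BLOB on SQLite, as the mapping tables later in this thread show).

// Sketch only; not PhotoPrism's actual entity definitions.
type Photo struct {
	// Human-readable text that may be matched case-insensitively.
	PhotoTitle string `gorm:"size:200;"`
	// Case-sensitive ASCII key, stored as bytes instead of a VARCHAR type.
	PhotoUID string `gorm:"type:bytes;size:42;index;"`
}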

@lastzero
Member

lastzero commented Jul 10, 2020

See https://mathiasbynens.be/notes/mysql-utf8mb4

The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than 191 characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed VARCHAR(255) columns to VARCHAR(191).)

Maybe we can switch to []byte in Go. Let's revisit this later; there are a ton of items on our todo list with higher priority, and that's by far not the only change we need to support other databases.

Edit: As you can see in the code, I already implemented basic support for multiple dialects when we added SQLite. For Postgres there's more to consider, especially data types; SQLite pretty much doesn't care. Bool columns and date functions might also need attention. I'm fully aware Postgres is very popular in the GIS community, so it will be worth adding when we have the resources needed to implement and maintain it (see next comment).

@lastzero
Member

We also need to consider the impact on testing and continuous integration when adding support for additional storage engines and APIs. That's often underestimated and causes permanent overhead. From a contributor's perspective, it might just be a one time pull request. Anyhow, we value your efforts and feedback! Just so that you see why we're cautious.

@keif888
Contributor

keif888 commented Feb 26, 2025

I have a blocking issue, and my knowledge of PostgreSQL consists of what I have read in the documentation in the last couple of days.
MariaDB for PhotoPrism uses specific character sets and collations.
They are not deterministic and are case insensitive.
The default collations for PostgreSQL are all deterministic.
This is causing some queries to fail.

The MariaDB startup command specifies the following two settings for dealing with character strings:

--character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

Can someone please let me know how to set up the equivalent in PostgreSQL?

@pashagolub

pashagolub commented Feb 26, 2025

@keif888 do you need these params for a new database(s) or for the whole instance (all databases)?

We can specify default params for the whole instance, so every database created will inherit them. Or we can control those settings at the database level.

Where can I find the startup command for MariaDB to guess the best choice?

@pashagolub

pashagolub commented Feb 26, 2025

Fortunately I had only completed 4 with consolidated SQL statements. The 5th one required separate statements as PostgreSQL can NOT do a MAX on a bytea column

Postgres can do MAX, but we either need to cast explicitly or create a custom aggregate:

pgwatch=# with vals(b) as (
  values ('one'::bytea), ('two'), ('three')
)
select max(b::text) from vals;
   max
----------
 \x74776f
(1 row)

But I feel something is terribly wrong if we're trying to get the max of a bytea.

@keif888
Contributor

keif888 commented Feb 27, 2025

@pashagolub the MariaDB Docker startup is here:

command: --port=4001 --innodb-strict-mode=1 --innodb-buffer-pool-size=256M --transaction-isolation=READ-COMMITTED --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci --max-connections=512 --innodb-rollback-on-timeout=OFF --innodb-lock-wait-timeout=120

The max of a bytea I solved as per this --> MAX(convert_from(m.thumb,'UTF8'))

My understanding is that MAX of a bytea is required because PhotoPrism was developed needing both case-insensitive and case-sensitive matching, and m.thumb (above) is an example of a case-sensitive match. I used convert_from to turn the value back into a string before doing the MAX, so as to match what SQLite and MariaDB are doing. (I do need to retest this to make sure that it really is achieving the same result.)

Where case-sensitive matching is needed the developers used VARBINARY(), and where case-insensitive matching is needed they used VARCHAR().
See below, where MarkerName is VARCHAR and Thumb is VARBINARY.
Gorm V1

Definition                                                  DBMS        Result
MarkerName string gorm:"type:VARCHAR(160);"                 SQLite      marker_name VARCHAR(160)
MarkerName string gorm:"type:VARCHAR(160);"                 MariaDB     marker_name VARCHAR(160)
MarkerName string gorm:"type:VARCHAR(160);"                 PostgreSQL  marker_name VARCHAR(160)
Thumb string gorm:"type:VARBINARY(128);index;default:'';"   SQLite      thumb VARBINARY(128)
Thumb string gorm:"type:VARBINARY(128);index;default:'';"   MariaDB     thumb VARBINARY(128)
Thumb string gorm:"type:VARBINARY(128);index;default:'';"   PostgreSQL  error

Gorm V2

Definition                                                  DBMS        Result
MarkerName string gorm:"size:160;"                          SQLite      marker_name text
MarkerName string gorm:"size:160;"                          MariaDB     marker_name VARCHAR(160)
MarkerName string gorm:"size:160;"                          PostgreSQL  marker_name VARCHAR(160)
Thumb string gorm:"type:bytes;size:128;index;default:'';"   SQLite      thumb blob
Thumb string gorm:"type:bytes;size:128;index;default:'';"   MariaDB     thumb VARBINARY(128)
Thumb string gorm:"type:bytes;size:128;index;default:'';"   PostgreSQL  thumb bytea

@lastzero
Member

@keif888 Is this the query you are having trouble with?

switch DbDialect() {
case MySQL:
	res = Db().Exec(`UPDATE subjects LEFT JOIN (
		SELECT m.subj_uid, m.q, MAX(m.thumb) AS marker_thumb
		FROM markers m
		JOIN files f ON f.file_uid = m.file_uid AND f.deleted_at IS NULL
		JOIN photos p ON ?
		WHERE m.subj_uid <> '' AND m.subj_uid IS NOT NULL
		AND m.marker_invalid = 0 AND m.thumb IS NOT NULL AND m.thumb <> ''
		GROUP BY m.subj_uid, m.q
		) b ON b.subj_uid = subjects.subj_uid
		SET thumb = marker_thumb WHERE ?`,
		photosJoin,
		condition,
	)

If so, don't worry about the MAX(), because from what I can see/remember, it's just a way to make sure that the markers.thumb value assigned to subjects.thumb is deterministic and not empty (so that the thumb doesn't break or change all the time):

  1. Given enough time to think about it, there are probably other (better) ways to solve this problem (as long as the query is deterministic and no empty value is set as thumb, it doesn't seem to matter too much how the thumb is selected).

  2. It is planned (and strongly requested by our users) that the cover images can be set manually from the UI (so the queries to set them automatically will become less important), see Albums: Add "Set as Album Cover" action to change the cover #383.

You may find similar patterns elsewhere and are welcome to suggest improvements or solve the same problem in a different way for PostgreSQL/SQLite! In this case, please add a code comment so we can find the relevant queries later to check them and also use them for MariaDB if possible.

@keif888
Contributor

keif888 commented Feb 27, 2025

@lastzero Yes, that was the one I was looking at. That one was easy to make work as it does in the other DBMSs.
The harder one was searchPhotos, as the way that MariaDB and SQLite handle GROUP BY is not the same as PostgreSQL. That is working now, and hopefully in a way that doesn't add maintenance nightmares.

I am working to get PostgreSQL working the same way that MariaDB does before trying to refactor the way the existing SQL statements work.

There are quite a few differences between the SQL DML engines in the three DBMSs, which makes the complex queries difficult.

BTW: I had to change the PostgreSQL version that I had chosen from 17-alpine to 16-alpine, as the PhotoPrism container is using Ubuntu 16.6-0ubuntu0.24.10.1, and that was preventing backup and restore from working.

@keif888
Contributor

keif888 commented Feb 28, 2025

And now another nasty issue.
After many hours I haven't found a way around this one, yet...

Gorm is returning timestamps with time.Local instead of time.UTC.
I have the server running as UTC, and the connection string has TimeZone=UTC.
The times are added to the server correctly.
Just when Go gets them back, they have the wrong timezone attached.

eg.

photo_test.go:1397: 
     |         	Error Trace:	/go/src/github.com/photoprism/photoprism/internal/entity/photo_test.go:1397
     |         	Error:      	Not equal: 
     |         	            	expected: time.Date(2016, time.November, 11, 8, 7, 18, 0, time.UTC)
     |         	            	actual  : time.Date(2016, time.November, 11, 8, 7, 18, 0, time.Local)
     |         	            	
     |         	            	Diff:
     |         	Test:       	TestPhoto_UnscopedSearch/Ok

There is a fix for this in the pgx driver, which Gorm is using, but Gorm is unable to utilise that fix from what I can discover.

The fix is to add code similar to this. See also comment here.

conn.TypeMap().RegisterType(&pgtype.Type{
	Name:  "timestamptz",
	OID:   pgtype.TimestamptzOID,
	Codec: &pgtype.TimestamptzCodec{ScanLocation: time.UTC},
})

BUT, that has to be done if you are using pgx directly, which Gorm doesn't.
It uses it via database/sql or via pgx's stdlib, and neither of those allows access to TypeMap().

There is a possibility that I can do something similar to this Gorm test, but it's using database/sql, and I need to use pgx directly.

I have tried changing the server's timezone, the database's timezone, and the connection string's timezone. None of these change the returned value (always Local). PostgreSQL is working as designed.

There is a similar issue on Gorm here. It's open, but I think it has two issues confused: the 1st is an incorrect timezone in the connection string, so PostgreSQL was changing the timestamp on the way in, and the 2nd is the one we have, with pgx marking it as Local.

As an FYI: the test works if I add a .UTC() as shown below (and as per a comment in the issue), but there is no way that is an acceptable solution.

		assert.Equal(t, photo1.TakenAt, photo.TakenAt.UTC())

@keif888
Contributor

keif888 commented Mar 1, 2025

I have raised an issue against gorm for the timestamp location <> UTC.
I replicated it in the go-gorm playground.

go-gorm/gorm#7377

@keif888
Contributor

keif888 commented Mar 1, 2025

Good News

I worked out how to get a pgxpool into Gorm, so I have the timestamptz working as UTC now.
I've included the work around in the issue noted above, and added it to my branch. That makes the internal/entity tests all pass now.
147 fixed tests done.

Only have the collation issue to solve now, as I'm assuming that will fix the 26 API tests that failed (fingers crossed).
123 failed tests to go.
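For reference, here is a minimal sketch of that workaround, assuming pgx/v5 and gorm.io/driver/postgres (the function name and wiring are illustrative): register the UTC-scanning codec in an AfterConnect hook, wrap the pool with pgx's stdlib, and hand the resulting *sql.DB to Gorm.

package entity

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/jackc/pgx/v5/stdlib"
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

// openPostgres builds a pgxpool whose connections scan timestamptz values
// as UTC, then hands the pool to Gorm via database/sql.
func openPostgres(ctx context.Context, dsn string) (*gorm.DB, error) {
	cfg, err := pgxpool.ParseConfig(dsn)
	if err != nil {
		return nil, err
	}
	cfg.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		// Same registration as the pgx fix quoted earlier in this thread.
		conn.TypeMap().RegisterType(&pgtype.Type{
			Name:  "timestamptz",
			OID:   pgtype.TimestamptzOID,
			Codec: &pgtype.TimestamptzCodec{ScanLocation: time.UTC},
		})
		return nil
	}
	pool, err := pgxpool.NewWithConfig(ctx, cfg)
	if err != nil {
		return nil, err
	}
	return gorm.Open(postgres.New(postgres.Config{Conn: stdlib.OpenDBFromPool(pool)}), &gorm.Config{})
}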

Bad News

Collation in PostgreSQL can be set at the server or database level, but only if it is deterministic. And we need DETERMINISTIC=false, which can only be done by creating a collation within a database and applying it to table columns/indexes.

These don't work

  • 16-Alpine
  • 16-Bookworm
POSTGRES_INITDB_ARGS: "--locale-provider=icu --icu-locale=und-u-ks-level2" doesn't error...

postgres-1      | Using language tag "und-u-ks-level2" for ICU locale "und-u-ks-level2".
postgres-1      | The database cluster will be initialized with this locale configuration:
postgres-1      |   provider:    icu
postgres-1      |   ICU locale:  und-u-ks-level2
postgres-1      |   LC_COLLATE:  en_US.utf8
postgres-1      |   LC_CTYPE:    en_US.utf8
postgres-1      |   LC_MESSAGES: en_US.utf8
postgres-1      |   LC_MONETARY: en_US.utf8
postgres-1      |   LC_NUMERIC:  en_US.utf8
postgres-1      |   LC_TIME:     en_US.utf8
postgres-1      | The default database encoding has accordingly been set to "UTF8".
postgres-1      | The default text search configuration will be set to "english".

BUT, it works in a deterministic fashion, so Aaa <> AAA.
Ditto for:

CREATE DATABASE acceptance OWNER acceptance TEMPLATE "template0" LOCALE_PROVIDER "icu" ICU_LOCALE "und-u-ks-level2";

This is what the collation string means:

und <-- undetermined language (the root locale)
-u- <-- Unicode extension keywords follow
ks <-- collation strength key
level2 <-- secondary strength: case insensitive, accent sensitive

This works, but...

The only way, and I've read the documentation now, is to create a database, create a collation, and then add that COLLATE clause to every varchar column in the database.

CREATE COLLATION utf8unicodeci (PROVIDER=icu, locale='@colStrength=secondary', DETERMINISTIC=false);
-- This creates a collation that is und-u-ks-level2
create table test_coll (
col1 varchar(30)
, col2 varchar(30) COLLATE utf8unicodeci
, col3 varchar(30) COLLATE utf8unicodeci);

insert into test_coll values ('asdf','asdf','asdf'), ('asdF','asdF','asdf'), ('asdf','asdf','asdF');
select * from test_coll where col1 = 'asdf';
-- 2 records returned as col1 doesn't have a COLLATE
select * from test_coll where col2 = 'ASDF';
-- 3 records returned as col2 has the COLLATE

The problem is that Gorm doesn't have a way to do that only for PostgreSQL, unless I modify gorm.io/driver/postgres to have an option to add the collations when migrating. And that's a lot of reflection code that does my head in when I try to work on it.

And what I am doing about it

The other option is to change every varchar-based equality/LIKE check to be lower-cased.
e.g. any query that does a LIKE or = against struct fields like these, which are varchar in the databases:

	AlbumTitle       string         `gorm:"size:160;index;" json:"Title" yaml:"Title"`
	AlbumLocation    string         `gorm:"size:160;" json:"Location" yaml:"Location,omitempty"`
	AlbumCategory    string         `gorm:"size:100;index;" json:"Category" yaml:"Category,omitempty"`
	AlbumCaption     string         `gorm:"size:1024;" json:"Caption" yaml:"Caption,omitempty"`

I am working down that path now, as I can't see modifying the driver being a simple change.
But it is making the code base messy, for want of a better term.
eg.
keif888@7867b5f

	// Filter by title.
	if txt.NotEmpty(frm.Title) {
		likeString := "photos.photo_title"
		if entity.DbDialect() == entity.Postgres {
			likeString = "lower(photos.photo_title)"
		}
		where, values := OrLike(likeString, frm.Title)
		s = s.Where(where, values...)
	}

And I'm concerned that I will miss some if they are buried in First and Find etc.
We will see how the unit testing ends up.

@lastzero
Member

lastzero commented Mar 1, 2025

@keif888 Leaving aside case-sensitivity, Unicode is inherently "non-deterministic" when it comes to comparisons, since there are (often many) different encodings for the same characters:

  • Even if you focus only on UTF-8, the encoding can be composed and decomposed, and there are equivalent characters.
  • A common solution to this is to index/store the values in canonical form, so that when you compare them, you can do so byte by byte (see the sketch after this list). When you look at it this way, the comparisons are completely deterministic and straightforward, so saying they're "non-deterministic" would be misleading.
  • Only if you play dumb and store the supposed Unicode strings as bytes without any validation or normalization will you be unable to sort/compare them properly according to Unicode rules.
  • Now, I know that MariaDB's utf8mb4_unicode_ci collation goes a step further and also supports expansions, i.e. when a character is compared as equal to combinations of other characters. But that's nice to have, and if it's slow or complicated with PostgreSQL, it's certainly something our users can live with.
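A minimal sketch of the canonical-form idea mentioned above, using golang.org/x/text/unicode/norm (the helper name is made up): with NFC applied before storing and indexing, "e" followed by U+0301 (combining acute) and the precomposed "é" end up as the same byte sequence, so byte-by-byte comparison is sufficient.

package txt

import "golang.org/x/text/unicode/norm"

// NormalizeNFC converts a string to Unicode canonical composition (NFC)
// so that equivalent sequences share one byte representation and can be
// stored, indexed, and compared byte by byte.
func NormalizeNFC(s string) string {
	return norm.NFC.String(s)
}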

@keif888
Contributor

keif888 commented Mar 2, 2025

Status Report

I have created a draft pull request, as all unit tests now pass, and PhotoPrism starts and seems to work without errors or missing photos.

PhotoPrism starts and allows adding photos. Rescanning finds and adds photos. Search works (flower and Flower both find the same set of flowers, for example).
Database encoding is UTF8.
Collation is the out-of-the-box default, which in my case was en_US.utf8, from memory.

To Do:

  • Find out how to get psql into docker without having to run the command manually every time I restart the docker container
  • Investigate Inconsistencies
  • Continue investigation of SQL errors (unique key, foreign key violations) generated from unit tests to ensure they are all from FirstOrCreate functionality, or deliberate error condition testing
  • Create performance data creation functions for PostgreSQL
  • Run performance tests
  • Rerun all unit tests against all three DBMS' just in case

@pashagolub

Find out how to get psql into docker without having to run the command manually every time I restart the docker container

What do you mean by this? psql is part of Postgres, so it's part of the Docker container. What command do you need to run manually? Why do you need to run that command after a container restart?

@keif888
Contributor

keif888 commented Mar 3, 2025

FYI: I have fixed the issue around postgresql-client being missing. There is a need to ensure that it's included in the base photoprism:develop image, though.

@pashagolub
I used the wrong terms above regarding psql and containers.
To clear up any confusion:

PhotoPrism development has a number of services that are initiated from a compose.yaml file.

  • photoprism service
  • DBMS's (one or more of)
    • postgres service
    • mariadb service
  • traefik service
  • dummy-webdav service
  • dummy-oidc service
  • dummy-ldap service
  • keycloak service

The photoprism app within the photoprism service has to be able to communicate with the postgres service via command line tools for backup and restore.
Some make commands (within make terminal) which run within the photoprism service need the psql command.

Specific commands needed are:

I had a much larger post following the above, with links to everything I had done, and then realised that there was an option to call specific makefile targets via the PHOTOPRISM_INIT environment setting for photoprism.
So I have updated compose.postgres.yaml, adding postgresql to the list of items to init, and after rebuilding the photoprism service, it is all working.

For those interested:
I had to run the following:

docker compose -f compose.postgres.yaml build
make docker-postgresql

The docker compose build is needed to ensure that the updated makefile and scripts are included in the service, and on restart I saw the following:

photoprism-1    | init: postgresql
photoprism-1    | /scripts/install-postgresql.sh postgresql-client
photoprism-1    | Installing "postgresql-client" distribution packages for AMD64...
photoprism-1    | Get:1 https://deb.nodesource.com/node_22.x nodistro InRelease [12.1 kB]
photoprism-1    | Get:2 https://dl.google.com/linux/chrome/deb stable InRelease [1825 B]
photoprism-1    | Get:3 http://security.ubuntu.com/ubuntu oracular-security InRelease [126 kB]
photoprism-1    | Get:4 http://archive.ubuntu.com/ubuntu oracular InRelease [265 kB]
photoprism-1    | Get:5 https://deb.nodesource.com/node_22.x nodistro/main amd64 Packages [5206 B]
photoprism-1    | Get:6 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1211 B]
photoprism-1    | Get:7 http://security.ubuntu.com/ubuntu oracular-security/universe amd64 Packages [167 kB]
photoprism-1    | Get:8 http://archive.ubuntu.com/ubuntu oracular-updates InRelease [126 kB]
photoprism-1    | Get:9 http://security.ubuntu.com/ubuntu oracular-security/main amd64 Packages [277 kB]
photoprism-1    | Get:10 http://archive.ubuntu.com/ubuntu oracular-backports InRelease [126 kB]
photoprism-1    | Get:11 http://security.ubuntu.com/ubuntu oracular-security/multiverse amd64 Packages [10.4 kB]
photoprism-1    | Get:12 http://security.ubuntu.com/ubuntu oracular-security/restricted amd64 Packages [142 kB]
photoprism-1    | Get:13 http://archive.ubuntu.com/ubuntu oracular/main amd64 Packages [1835 kB]
photoprism-1    | Get:14 http://archive.ubuntu.com/ubuntu oracular/universe amd64 Packages [19.6 MB]
photoprism-1    | Get:15 http://archive.ubuntu.com/ubuntu oracular/multiverse amd64 Packages [308 kB]
photoprism-1    | Get:16 http://archive.ubuntu.com/ubuntu oracular/restricted amd64 Packages [67.0 kB]
photoprism-1    | Get:17 http://archive.ubuntu.com/ubuntu oracular-updates/main amd64 Packages [392 kB]
photoprism-1    | Get:18 http://archive.ubuntu.com/ubuntu oracular-updates/restricted amd64 Packages [148 kB]
photoprism-1    | Get:19 http://archive.ubuntu.com/ubuntu oracular-updates/universe amd64 Packages [232 kB]
photoprism-1    | Get:20 http://archive.ubuntu.com/ubuntu oracular-updates/multiverse amd64 Packages [11.4 kB]
photoprism-1    | Get:21 http://archive.ubuntu.com/ubuntu oracular-backports/universe amd64 Packages [5417 B]
photoprism-1    | Fetched 23.8 MB in 8s (2943 kB/s)
photoprism-1    | Reading package lists...
photoprism-1    | debconf: unable to initialize frontend: Dialog
photoprism-1    | debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
photoprism-1    | debconf: falling back to frontend: Readline
photoprism-1    | debconf: unable to initialize frontend: Readline
photoprism-1    | debconf: (This frontend requires a controlling tty.)
photoprism-1    | debconf: falling back to frontend: Teletype
photoprism-1    | dpkg-preconfigure: unable to re-open stdin: 
photoprism-1    | Selecting previously unselected package libpq5:amd64.
(Reading database ... 53983 files and directories currently installed.)
photoprism-1    | Preparing to unpack .../libpq5_16.6-0ubuntu0.24.10.1_amd64.deb ...
photoprism-1    | Unpacking libpq5:amd64 (16.6-0ubuntu0.24.10.1) ...
photoprism-1    | Selecting previously unselected package postgresql-client-common.
photoprism-1    | Preparing to unpack .../postgresql-client-common_262_all.deb ...
photoprism-1    | Unpacking postgresql-client-common (262) ...
photoprism-1    | Selecting previously unselected package postgresql-client-16.
photoprism-1    | Preparing to unpack .../postgresql-client-16_16.6-0ubuntu0.24.10.1_amd64.deb ...
photoprism-1    | Unpacking postgresql-client-16 (16.6-0ubuntu0.24.10.1) ...
photoprism-1    | Selecting previously unselected package postgresql-client.
photoprism-1    | Preparing to unpack .../postgresql-client_16+262_all.deb ...
photoprism-1    | Unpacking postgresql-client (16+262) ...
photoprism-1    | Setting up postgresql-client-common (262) ...
photoprism-1    | Setting up libpq5:amd64 (16.6-0ubuntu0.24.10.1) ...
photoprism-1    | Setting up postgresql-client-16 (16.6-0ubuntu0.24.10.1) ...
photoprism-1    | update-alternatives: using /usr/share/postgresql/16/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
photoprism-1    | Setting up postgresql-client (16+262) ...
photoprism-1    | Processing triggers for libc-bin (2.40-1ubuntu3.1) ...
photoprism-1    | Done.

@pashagolub

BTW: I had to change the PostgreSQL version that I had chosen from 17-alpine to 16-alpine as the PhotoPrism container is using Ubuntu 16.6-0ubuntu0.24.10.1, and that was preventing backup and restore from working.

Sorry. I'm trying to catch up but you're too fast for me. :) Would you please elaborate on this? Because I don't see Ubuntu 16 here

@pashagolub

Another thing I want to emphasize: it's better to use the Postgres packages rather than the system-shipped ones, e.g.

...
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
    && apt-get -qy install curl gnupg postgresql-common apt-transport-https lsb-release \
    && sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y \
    && curl -L "https://www.postgresql.org/media/keys/ACCC4CF8.asc" | apt-key add - \
    && apt-get -qy install postgresql-17 postgresql-plpython3-17 postgresql-17-pg-qualstats \
    && apt-get purge -y --auto-remove \
    && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
...

@keif888
Contributor

keif888 commented Mar 3, 2025

Hi,

I have the output of the os version and psql version from my photoprism service below.

OS Version

photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.10"
NAME="Ubuntu"
VERSION_ID="24.10"
VERSION="24.10 (Oracular Oriole)"
VERSION_CODENAME=oracular
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=oracular
LOGO=ubuntu-logo

psql version

photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ psql --version
psql (PostgreSQL) 16.6 (Ubuntu 16.6-0ubuntu0.24.10.1)
photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ 

Ubuntu includes the postgresql-client in their list of packages (I'm probably mangling the reality here, I'm not a Linux expert), but it's the 16.6 version. And that cannot connect to the alpine-17 postgresql service, as it's a lower version.
So I chose to use alpine-16 instead. It's still a supported version.
When the Ubuntu version is updated for the photoprism service, then (assuming that Ubuntu has changed their PostgreSQL version) the compose.postgres.yaml file can be updated to alpine-17.

@pashagolub

I believe we don't need to rely on any packages shipped with the OS. We are in charge of what is used and how. That said, it's simple to specify the exact client version we want to have.

@keif888
Contributor

keif888 commented Mar 3, 2025

Yes, and that part I would leave to people who know what they are doing. I know that you have to deal with keys to allow apt to reference other repositories, but then it all starts getting a bit fuzzy. I am in no way an expert on Linux or PostgreSQL.

@lastzero
Member

lastzero commented Mar 3, 2025

There should already be an install script in /scripts/dist for MariaDB binaries, so you could also add one for PostgreSQL with a fallback to the default system packages.

@keif888
Contributor

keif888 commented Mar 3, 2025

Preliminary benchmarks of SQLite vs Postgres, MariaDB vs Postgres, and SQLite vs MariaDB.
I created a database of 100k randomly generated photos for each of the DBMSs, and then executed some benchmarks against it.
All databases are managed by gorm, so they have the same table, foreign key, and index structures.
There is no tuning of the postgres service (I haven't read up on how to do that yet).
Overall, PostgreSQL is faster than SQLite and slower than MariaDB.

photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ go run golang.org/x/perf/cmd/benchstat storage/Benchmark100k.sqlite.txt storage/Benchmark100k.postgres.txt 
goos: linux
goarch: amd64
pkg: github.com/photoprism/photoprism/internal/performancetest
cpu: AMD Ryzen 7 5700X 8-Core Processor             
                                │ storage/Benchmark100k.sqlite.txt │   storage/Benchmark100k.postgres.txt   │
                                │              sec/op              │    sec/op      vs base                 │
100k/CreateDeleteAlbum-4                              5.036m ±  6%   28.101m ±  1%  +458.02% (p=0.000 n=10)
100k/ListAlbums-4                                     287.5m ±  6%    179.1m ±  5%   -37.69% (p=0.000 n=10)
100k/CreateDeleteCamera-4                             3.766m ± 11%    1.286m ±  6%   -65.86% (p=0.000 n=10)
100k/CreateDeleteCellAndPlace-4                       7.165m ± 16%   14.739m ±  3%  +105.70% (p=0.000 n=10)
100k/FileRegenerateIndex-4                            4.790m ±  7%    1.762m ±  3%   -63.21% (p=0.000 n=10)
100k/CreateDeletePhoto-4                              60.58m ±  7%    43.29m ±  7%   -28.55% (p=0.000 n=10)
100k/ListPhotos-4                                    291.25m ±  3%    74.86m ± 11%   -74.30% (p=0.000 n=10)
geomean                                               22.90m          17.70m         -22.69%
photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ go run golang.org/x/perf/cmd/benchstat storage/Benchmark100k.sqlite.txt storage/Benchmark100k.mariadb.txt 
goos: linux
goarch: amd64
pkg: github.com/photoprism/photoprism/internal/performancetest
cpu: AMD Ryzen 7 5700X 8-Core Processor             
                                │ storage/Benchmark100k.sqlite.txt │  storage/Benchmark100k.mariadb.txt   │
                                │              sec/op              │    sec/op     vs base                │
100k/CreateDeleteAlbum-4                              5.036m ±  6%   2.658m ±  4%  -47.22% (p=0.000 n=10)
100k/ListAlbums-4                                     287.5m ±  6%   113.2m ±  3%  -60.62% (p=0.000 n=10)
100k/CreateDeleteCamera-4                             3.766m ± 11%   1.521m ±  4%  -59.61% (p=0.000 n=10)
100k/CreateDeleteCellAndPlace-4                       7.165m ± 16%   3.744m ±  7%  -47.75% (p=0.000 n=10)
100k/FileRegenerateIndex-4                            4.790m ±  7%   1.257m ± 11%  -73.76% (p=0.000 n=10)
100k/CreateDeletePhoto-4                              60.58m ±  7%   32.33m ±  2%  -46.63% (p=0.000 n=10)
100k/ListPhotos-4                                     291.2m ±  3%   478.6m ±  5%  +64.31% (p=0.000 n=10)
geomean                                               22.90m         11.88m        -48.14%
photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ go run golang.org/x/perf/cmd/benchstat storage/Benchmark100k.mariadb.txt storage/Benchmark100k.postgres.txt 
goos: linux
goarch: amd64
pkg: github.com/photoprism/photoprism/internal/performancetest
cpu: AMD Ryzen 7 5700X 8-Core Processor             
                                │ storage/Benchmark100k.mariadb.txt │   storage/Benchmark100k.postgres.txt   │
                                │              sec/op               │    sec/op      vs base                 │
100k/CreateDeleteAlbum-4                               2.658m ±  4%   28.101m ±  1%  +957.27% (p=0.000 n=10)
100k/ListAlbums-4                                      113.2m ±  3%    179.1m ±  5%   +58.21% (p=0.000 n=10)
100k/CreateDeleteCamera-4                              1.521m ±  4%    1.286m ±  6%   -15.46% (p=0.000 n=10)
100k/CreateDeleteCellAndPlace-4                        3.744m ±  7%   14.739m ±  3%  +293.66% (p=0.000 n=10)
100k/FileRegenerateIndex-4                             1.257m ± 11%    1.762m ±  3%   +40.24% (p=0.000 n=10)
100k/CreateDeletePhoto-4                               32.33m ±  2%    43.29m ±  7%   +33.88% (p=0.000 n=10)
100k/ListPhotos-4                                     478.56m ±  5%    74.86m ± 11%   -84.36% (p=0.000 n=10)
geomean                                                11.88m          17.70m         +49.06%
photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ 

@keif888
Contributor

keif888 commented Mar 3, 2025

There should already be an install script in /scripts/dist for MariaDB binaries, so you could also add one for PostgreSQL with a fallback to the default system packages.

I cloned that to create one for PostgreSQL.
I have updated it to get the latest version of postgresql-client, and updated the yaml file to 17-alpine.
https://github.com/keif888/photoprism/blob/PostgreSQL/scripts/dist/install-postgresql.sh
https://github.com/keif888/photoprism/blob/PostgreSQL/compose.postgres.yaml

Performance comparison:

photoprism@250221-oracular:/go/src/github.com/photoprism/photoprism$ go run golang.org/x/perf/cmd/benchstat storage/Benchmark100k.postgres.txt storage/Benchmark100k.postgres-17.txt 
goos: linux
goarch: amd64
pkg: github.com/photoprism/photoprism/internal/performancetest
cpu: AMD Ryzen 7 5700X 8-Core Processor             
                                │ storage/Benchmark100k.postgres.txt │ storage/Benchmark100k.postgres-17.txt │
                                │               sec/op               │    sec/op      vs base                │
100k/CreateDeleteAlbum-4                                28.10m ±  1%    23.37m ±  1%  -16.85% (p=0.000 n=10)
100k/ListAlbums-4                                       179.1m ±  5%    183.7m ±  1%        ~ (p=0.075 n=10)
100k/CreateDeleteCamera-4                               1.286m ±  6%    1.286m ± 16%        ~ (p=0.796 n=10)
100k/CreateDeleteCellAndPlace-4                         14.74m ±  3%    15.94m ± 16%   +8.16% (p=0.023 n=10)
100k/FileRegenerateIndex-4                              1.762m ±  3%    1.869m ±  8%   +6.02% (p=0.043 n=10)
100k/CreateDeletePhoto-4                                43.29m ±  7%    44.80m ±  8%        ~ (p=0.143 n=10)
100k/ListPhotos-4                                       74.86m ± 11%    70.73m ± 17%        ~ (p=0.165 n=10)
geomean                                                 17.70m          17.59m         -0.63%

@keif888
Contributor

keif888 commented Mar 7, 2025

Status Report

All unit tests pass.
Latest fixes from the gorm2 branch merged in.
No unexpected SQL/gorm errors are being reported.
The inconsistency with MariaDB is an issue within gorm: it does execute the update, but doesn't report the number of records affected correctly.
The pull request is ready for review.

lastzero added a commit that referenced this issue Mar 29, 2025
lastzero added a commit that referenced this issue Mar 29, 2025
lastzero added a commit that referenced this issue Apr 2, 2025
lastzero added a commit that referenced this issue Apr 2, 2025
lastzero added a commit that referenced this issue Apr 2, 2025
lastzero added a commit that referenced this issue Apr 9, 2025
lastzero added a commit that referenced this issue Apr 9, 2025