Config: Add support for PostgreSQL #47
Comments
Not right now, but in general: anything to store a few tables will do... as simple and stable as possible... many developers are familiar with mysql, so that's my default when I start a new project. Tooling is also good. sqlite is a very lean option, but obviously - if you run multiple processes or want to directly access / backup your data - it doesn't scale well or at all. |
It became clear that we have to build a single binary for distribution to reach broad adoption. Differences between SQL dialects are too large to have them abstracted away by our current ORM library, for example when doing date range queries. They are already different between MySQL and sqlite. For those reasons we will not implement Postgres support for our MVP / first release. If you have time & energy, you are welcome to help us. I will close this issue for now, we can revisit it later when there is time and enough people want this 👍 |
Ok, fair point - I was raising this because the cost of maintenance and troubleshooting at scale is much lower with postgres, and lots of successful projects have this support. So, from what you wrote about differences, it seems that you don't have pluggable ORM-like generic read/write storage methods just yet, right? |
@sokoow We do use GORM, but it doesn't help with search queries that use database-specific SQL. If you'd like to dive into the subject: it goes even deeper when you look into how tables are organized. You can't abstract and optimize at the same time. We want to provide the best performance to our users. See |
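To make the dialect problem concrete, here is a minimal, hypothetical Go sketch of per-database SQL selection using GORM v2's Dialector API; the helper, table, and column names are made up for illustration and are not PhotoPrism's actual search code.

```go
package search

import "gorm.io/gorm"

// yearExpr returns an SQL expression that extracts the year from taken_at.
// Date/time functions are one of the places where MySQL/MariaDB, SQLite,
// and PostgreSQL diverge, so the expression is chosen per dialect.
func yearExpr(db *gorm.DB) string {
	switch db.Dialector.Name() {
	case "mysql":
		return "YEAR(taken_at)"
	case "postgres":
		return "EXTRACT(YEAR FROM taken_at)"
	default: // sqlite and anything else
		return "CAST(strftime('%Y', taken_at) AS INTEGER)"
	}
}

// PhotosTakenIn filters a photos query by year using the dialect-specific expression.
func PhotosTakenIn(db *gorm.DB, year int) *gorm.DB {
	return db.Table("photos").Where(yearExpr(db)+" = ?", year)
}
```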
No that's a fair point, you're not the first project that has this challenge - something to think about on higher abstraction level. |
I guess it won't be so hard, so I would try |
Getting it to work somehow at a single point in time is not hard, getting it to work with decent performance, finding developers who are comfortable with it and constantly maintaining the code is incredibly hard. Keep in mind: You also need to maintain continuous integration infrastructure and effectively run all tests with every database. |
Of course, tests might be the same for every supported database, and this might be solved within #60. |
@LeKovr Did you see how we renamed the label from "rejected" to "descoped"? 😉 Yes indeed, later it might be great to add support for additional databases, if users actually need it in practice. Maybe everyone will be happy with an embedded database if we do it well. It is hard to predict. What I meant was that if you change some code that involves SQL you might feel uncomfortable because you only have experience with one database, so you end up doing nothing. And that can be very dangerous for a project. |
@lastzero, you are right. Maybe later. There are more important things to do for now. |
I had a quick look and it looks like the queries at least are trivial to add. The biggest problem is the models. I'm not sure what the solution is here. I'd guess that the solution is to use the types Gorm expects. I'll play with it some more and see. It'd be nice to put everything in my PostgreSQL DB instead of SQLite. |
may be |
All of the |
Yes, we use binary for plain ASCII, especially when strings need to be sorted, indexed or compared and should not be normalized in any way. |
Shouldn't that be the case by default for string fields? I know MySQL does some stupid stuff with character encodings but it shouldn't modify plain ASCII, right? |
But it uses 4 BYTES per ASCII character, so the index becomes very big. Also, when you compare strings, it's somewhat more complex with Unicode than just comparing bytes. I'm aware you can PROBABLY do the same with VARCHAR with the right settings and enough time to test, but it was hard to see business value in such experiments. |
As far as I can tell looking at the SQLite docs, the MySQL docs and PostgreSQL docs, that isn't the case at all. A
But we're not storing unicode, we're storing ASCII in a field that could contain unicode. I don't think any of those edge-cases apply here.
Fair enough. Also, queries aren't so straightforward after all. The queries extensively use
I managed to do a little bit of cleanup of that and got something working at least. |
Not in the index, check again. Maybe also not in memory when comparing. |
I couldn't find documentation so I just ran a quick test to see.

```python
import sqlite3

c = sqlite3.connect('test.db')
c.execute('CREATE TABLE things (integer PRIMARY KEY, testcolumn varchar(32))')
c.execute('CREATE INDEX test_idx ON things(testcolumn)')
for x in range(0, 10000):
    c.execute('INSERT INTO things(testcolumn) VALUES (?)', (hex(x * 472882049 % 15485867),))
c.commit()
```

Which resulted in 79.3k of actual data:

```sql
SELECT SUM(length(testcolumn)) FROM things;
79288
```

I analyzed it with

Table:

Index:

So for 79288 bytes of actual data sitting in the column, we have 109288 bytes total for the data itself (1.38 bytes per byte) and 129160 for the index (1.63 bytes per byte). I repeated the test with So I don't see any evidence that a |
You'll find some information on this page: https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html You might also want to read this and related RFCs: https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings

Note that Microsoft, as far as I know, still uses UCS-2 instead of UTF-8 in Windows, for all the reasons I mentioned. Maybe they switched to UTF-16. Their Linux database driver for SQL Server used null-terminated strings; guess how well this works with UCS-2. Not at all.

For MySQL, we use 4-byte UTF-8, which needs 4 bytes in indexes unless somebody completely refactored InnoDB in the meantime. Note that the MySQL manual was wrong on InnoDB for a long time, insisting that MySQL doesn't know or support index-organized tables while InnoDB ONLY uses index-organized tables.

When you're done with this, enjoy learning about the four Unicode normalization forms: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization Did you know there's a difference between Linux and OS X? Apple uses decomposed form, so you need to convert all strings when copying files. Their bundled command line tools were not compiled with iconv support, so you had to compile it yourself. Some of this is still not fixed today. |
Note that Sqlite ignores VARBINARY and probably also VARCHAR to some degree. It uses dynamic typing. That's why all string keys are prefixed with at least one non-numeric character. It would convert the value to INT otherwise, and comparisons with binary data or strings would fail:
|
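As a hypothetical illustration of the dynamic-typing point above (the table, column, and values are invented, and the mattn/go-sqlite3 driver is assumed): a column declared VARBINARY gets NUMERIC affinity in SQLite, so a purely numeric string is silently stored as an integer, which is why a non-numeric prefix keeps comparisons working.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// VARBINARY is not a native SQLite type; the column ends up with NUMERIC
	// affinity, so '123' is stored as the integer 123 while 'pr123' stays text.
	db.Exec(`CREATE TABLE things (uid VARBINARY(42))`)
	db.Exec(`INSERT INTO things VALUES ('123'), ('pr123')`)

	rows, err := db.Query(`SELECT uid, typeof(uid) FROM things`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var uid, typ string
		if err := rows.Scan(&uid, &typ); err != nil {
			log.Fatal(err)
		}
		fmt.Println(uid, typ) // prints "123 integer" and "pr123 text"
	}
}
```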
I'm aware of Unicode encodings and some of the important differences between them. I still don't see anything in the docs indicating that using a
To be clear, in case there's some miscommunication going on, my assumption is that even if the column is switched to
In other news, here's a PoC of PostgreSQL mostly working. It's intended as an overview of the work that needs to be done, not as a serious proposal. |
Actually on With |
Ah, looks like the |
See https://mathiasbynens.be/notes/mysql-utf8mb4
Maybe we can switch to []byte in Go. Let's revisit this later, there are a ton of items on our todo with higher priority and that's by far not the only change we need to support other databases. Edit: As you can see in the code, I already implemented basic support for multiple dialects when we added Sqlite. For Postgres there's more to consider, especially data types. Sqlite pretty much doesn't care. Bool columns and date functions might also need attention. I'm fully aware Postgres is very popular in the GIS community, so it will be worth adding when we have the resources needed to implement and maintain it (see next comment). |
We also need to consider the impact on testing and continuous integration when adding support for additional storage engines and APIs. That's often underestimated and causes permanent overhead. From a contributor's perspective, it might just be a one time pull request. Anyhow, we value your efforts and feedback! Just so that you see why we're cautious. |
I have a blocking issue, and my knowledge of PostgreSQL consists of what I have read in the documentation in the last couple of days. The MariaDB startup command specifies the following two settings for dealing with character strings:
Can someone please let me know how to set up the equivalent in PostgreSQL? |
@keif888 do you need these params for a new database(s) or for the whole instance (all databases)? We can specify default params for the whole instance, so every database created will inherit those. Or we can control those setting per database level. Where can I find the startup command for MariaDB to guess the best choice? |
Postgres can do MAX, but either we need to type cast explicitly, or to create a custom aggregate:

```sql
pgwatch=# with vals(b) as (
  values ('one'::bytea), ('two'), ('three')
)
select max(b::text) from vals;
   max
----------
 \x74776f
(1 row)
```

But I feel something is terribly wrong if we're trying to get max of bytea |
@pashagolub the MariaDB Docker startup is here:
I solved the MAX of a bytea like this: MAX(convert_from(m.thumb, 'UTF8')). My understanding is that MAX of a bytea is required because PhotoPrism was developed needing both case-insensitive and case-sensitive matching, and m.thumb (above) is an example of a case-sensitive match. I used convert_from to turn the value back into a string before doing the MAX so as to match what SQLite and MariaDB are doing. (I do need to retest this to make sure that it really is achieving the same result.) Where case-sensitive matching is needed the developers used VARBINARY(), and where case-insensitive matching is needed they used VARCHAR().
Gorm V2
|
@keif888 Is this the query you are having trouble with?
photoprism/internal/entity/query/covers.go (lines 269 to 283 in 4a4e45e)
If so, don't worry about the
You may find similar patterns elsewhere and are welcome to suggest improvements or solve the same problem in a different way for PostgreSQL/SQLite! In this case, please add a code comment so we can find the relevant queries later to check them and also use them for MariaDB if possible. |
@lastzero Yes, that was the one I was looking at. That one was easy to make work as it does in the other DBMSs. I am working to get PostgreSQL working the same way that MariaDB does, before trying to refactor the way the existing SQL statements work. There are quite a few differences between the SQL DML engines in the 3 DBMSs, which makes the complex queries difficult. BTW: I had to change the PostgreSQL version that I had chosen from 17-alpine to 16-alpine, as the PhotoPrism container uses Ubuntu's postgresql-client 16.6 (16.6-0ubuntu0.24.10.1), and that was preventing backup and restore from working. |
And now another nasty issue. Gorm is returning timestamps with time.Local instead of time.UTC, e.g.
There is a fix for this in the pgx driver, which Gorm is using, but Gorm is unable to utilise that fix from what I can discover. The fix is to add code similar to this. See also comment here.
BUT, that has to be done if you are using pgx directly, which Gorm doesn't do. There is a possibility that I can do something similar to this Gorm test, but it's using database/sql, and I need to use pgx directly. I have tried changing the server's timezone, the database's timezone, and the connection string's timezone. None of these change the returned value (always Local). PostgreSQL is working as designed. There is an issue on Gorm similar to it here. It's open, but I think it has 2 issues confused. The 1st issue is an incorrect timezone in the connection string, so PostgreSQL was changing the timestamp on the way in, and the 2nd is the one that we have, with pgx marking it as Local. As an FYI: the test works if I add a .UTC() as shown below (and as per a comment in the issue), but there is no way that is an acceptable solution.
|
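To illustrate why the Local vs. UTC location matters in tests even though the instant is the same, here is a generic Go sketch (the date is arbitrary; this is not the failing PhotoPrism test itself):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	want := time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC)

	// A timestamptz scanned with a Local location represents the same instant:
	got := want.In(time.Local)

	fmt.Println(got == want)       // false: == also compares the location
	fmt.Println(got.Equal(want))   // true:  Equal compares the instant only
	fmt.Println(got.UTC() == want) // true:  the .UTC() workaround mentioned above
}
```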
I have raised an issue against gorm for the timestamp location <> UTC. |
Good News

I worked out how to get a pgxpool into Gorm, so I have the timestamptz working as UTC now. I only have the collation issue to solve now, as I'm assuming that will fix the 26 API tests that failed (fingers crossed).

Bad News

Collation in PostgreSQL can be set at a server or database level, but... only if it is deterministic. And we need DETERMINISTIC=false, which can only be done by creating a collation within a database and applying that to table columns/indexes. These don't work:

BUT, it works in deterministic fashion, so Aaa <> AAA.

This is what the collation string means

und-u <-- Unicode

This works, but...

The only way, and I've read the documentation now, is to create a database, create a collation, and then add the COLLATION to every varchar column in the database.

The problem is that Gorm doesn't have a way to do that only for PostgreSQL, unless I modify gorm.io/driver/postgres to have an option to add the collations when migrating. And that's a lot of reflection code that does my head in when I try to work on it.

And what I am doing about it

The other option is to change every varchar-based equality/like check to be lower-cased.
I am working down that path now, as I can't see modifying the driver being a simple change.
And I'm concerned that I will miss some if they are buried in First and Find etc. |
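For reference, here is a rough sketch of the two pieces discussed above, with the DSN and all table/column names invented for illustration: handing a pgx v5 pool to GORM via database/sql (assuming gorm.io/driver/postgres and pgx's stdlib adapter), and creating a non-deterministic ICU collation inside the database that can then be attached to individual varchar columns. This is not the PhotoPrism implementation.

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/jackc/pgx/v5/stdlib"
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

func main() {
	// Open the pool with pgx directly so pgx-level behaviour stays configurable.
	pool, err := pgxpool.New(context.Background(), "postgres://user:pass@localhost:5432/photoprism")
	if err != nil {
		log.Fatal(err)
	}

	// Wrap the pool as *sql.DB and hand the existing connection to GORM.
	db, err := gorm.Open(postgres.New(postgres.Config{Conn: stdlib.OpenDBFromPool(pool)}), &gorm.Config{})
	if err != nil {
		log.Fatal(err)
	}

	// A case-insensitive collation must be non-deterministic, which is only
	// possible as a named collation created inside the database (PostgreSQL 12+).
	db.Exec(`CREATE COLLATION IF NOT EXISTS case_insensitive
	         (provider = icu, locale = 'und-u-ks-level2', deterministic = false)`)

	// It then has to be attached column by column; table and column are examples.
	db.Exec(`ALTER TABLE photos ALTER COLUMN photo_title TYPE VARCHAR(200) COLLATE case_insensitive`)
}
```

One caveat worth noting: non-deterministic collations do not support pattern matching (LIKE), at least up to PostgreSQL 17, which is another reason the lower-casing approach described above may be the more practical route.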
@keif888 Leaving aside case-sensitivity, Unicode is inherently "non-deterministic" when it comes to comparisons, since there are (often many) different encodings for the same characters:
|
Status Report

I have created a draft pull request, as all unit tests now pass, and PhotoPrism starts and seems to work without errors or missing photos. PhotoPrism starts, and allows adding photos. Rescanning finds and adds photos. Search works (flower and Flower both find the same set of flowers, for example).

To Do:
|
What do you mean by this? psql is part of Postgres, so it's part of the Docker container. What command do you need to run manually? Why do you need to run that command after a container restart? |
FYI: I have fixed the issue around postgresql-client being missing. There is a need to ensure that it's included in the base photoprism:develop image though. @pashagolub PhotoPrism development has a number of services that are initiated from a compose.yaml file.
The photoprism app within the photoprism service has to be able to communicate with the postgres service via command line tools for backup and restore. Specific commands needed are:
I had a much larger post following the above, with links to everything I had done, and realised that there was an option to call specific make file targets via the PHOTOPRISM_INIT environment setting for photoprism. For those interested:
The docker compose build is needed to ensure that the updated makefile and scripts are included in the service, and on restart I saw the following:
|
Sorry. I'm trying to catch up but you're too fast for me. :) Would you please elaborate on this? Because I don't see Ubuntu 16 here |
Another thing I want to emphasize: it's better to install the Postgres packages from the PostgreSQL repositories, not the ones shipped with the system, e.g.:

```dockerfile
...
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
    && apt-get -qy install curl gnupg postgresql-common apt-transport-https lsb-release \
    && sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y \
    && curl -L "https://www.postgresql.org/media/keys/ACCC4CF8.asc" | apt-key add - \
    && apt-get -qy install postgresql-17 postgresql-plpython3-17 postgresql-17-pg-qualstats \
    && apt-get purge -y --auto-remove \
    && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
...
```
|
Hi, I have the output of the os version and psql version from my photoprism service below. OS Version
psql version
Ubuntu includes the postgresql-client in their list of packages (I'm probably mangling the reality here, I'm not a linux expert), but it's the 16.6 version. And that cannot connect to the alpine 17 postgresql service, as it's a lower version. |
I believe we don't need to rely on any packages shipped with the OS. We are in charge of what is used and how. That said, it's simple to specify the exact version of the client we want to have. |
Yes, and that part I would leave to people that know what they are doing. I know that you have to deal with keys to allow apt to reference other repositories, but then it all starts getting a bit fuzzy. I am in no way an expert on linux or postgresql. |
There should already be an install script in /scripts/dist for MariaDB binaries, so you could also add one for PostgreSQL with a fallback to the default system packages. |
Preliminary benchmarks of sqlite vs postgres, mariadb vs postgres, and sqlite vs mariadb.
|
I cloned that to create one for PostgreSQL. Performance comparison:
|
Status Report

All unit tests pass. |
Nice idea lads, I totally support it. Were you ever considering switching to postgres? For the deployment size I'm predicting this is going to have, mysql might be a bit of a suboptimal choice :)
Details on possible implementation strategies can be found in this comment: