Skip to content

Conversation

@l-wang
Copy link
Owner

@l-wang l-wang commented Feb 27, 2025

No description provided.

jeff-davis and others added 30 commits February 20, 2025 10:21
Per buildfarm and reports, it seems that 9.X to 18 upgrades were
failing after commit 1fd1bd8 due to an incorrect regex. Loosen the
regex to accommodate older versions.

Reported-by: vignesh C <vignesh21@gmail.com>
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALDaNm3GUs+U8Nt4S=V5zmb+K8-RfAc03vRENS0teeoq0Lc6Tw@mail.gmail.com
Discussion: https://postgr.es/m/ea4cbbc1-c5a5-43d1-9618-8ff3f2155bfe@dunslane.net
Remove a number of (char *) casts that are unnecessary.  Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
Commit b3f0be7 accidentally missed adding the oauth client test
binary to the relevant .gitignore.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2839306.1740082041@sss.pgh.pa.us
A patch touching this area of the code is under review, and this format
makes the readability of the code slightly harder to parse.

Extracted from a larger patch by the same author.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol.  However, it was not
possible to send these queries using libpq's pipeline mode.  This
feature has two advantages:
- Testing.  Pipeline tests were only possible with pgbench, using TAP
tests.  It now becomes possible to have more SQL tests that are able to
stress the backend with pipelines and extended queries.  More tests will
be added in a follow-up commit that were discussed on some other
threads.  Some external projects in the community had to implement their
own facility to work around this limitation.
- Emulation of custom workloads, with more control over the actions
taken by a client with libpq APIs.  It is possible to emulate more
workload patterns to bottleneck the backend with the extended query
protocol.

This patch adds six new meta-commands to be able to control pipelines:
* \startpipeline starts a new pipeline.  All extended queries are queued
until the end of the pipeline are reached or a sync request is sent and
processed.
* \endpipeline ends an existing pipeline.  All queued commands are sent
to the server and all responses are processed by psql.
* \syncpipeline queues a synchronisation request, without flushing the
commands to the server, equivalent of PQsendPipelineSync().
* \flush, equivalent of PQflush().
* \flushrequest, equivalent of PQsendFlushRequest()
* \getresults reads the server's results for the queries in a pipeline.
Unsent data is automatically pushed when \getresults is called.  It is
possible to control the number of results read in a single meta-command
execution with an optional parameter, 0 means that all the results
should be read.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com
Use RelationGetIndexExpressions() rather than rd_indexprs directly.

Author: Corey Huinker <corey.huinker@gmail.com>
Change backend launcher functions to take void * for binary data
instead of char *.  This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
The type argument wasn't actually really necessary.  It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6,
622f678).  For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h).  But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.

To simplify this, change the gist stategy support procedure to take
"any", "any" as argument.  (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.)  Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6.

This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified.  (It relied implicitly on
convention mentioned above.)

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
NO INDENT is the default, and is added if no explicit indentation
flag was provided with XMLSERIALIZE().

Oversight in 483bdb2.

Author: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/bebd457e-5b43-46b3-8fc6-f6a6509483ba@uni-muenster.de
Backpatch-through: 16
Previously, a WARNING was issued at the time of defining a subscription
with origin=NONE only when the publisher subscribed to the same table from
other publishers, indicating potential data origination from different
origins. However, the publisher can subscribe to the partition ancestors
or partition children of the table from other publishers, which could also
result in mixed-origin data inclusion. So, give a WARNING in those cases
as well.

Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/5eda6a9c-63cf-404d-8a49-8dcb116a29f3@postgrespro.ru
The bibliography entries for olsen93 and ong90 lacked links to
online copies.  While ong90 is available in digital form, the
olsen93 thesis is only available as a physical copy in the UCB
library.  To save people from searching for it, we still link
to it via the UCB library page.

Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxFcJYdRvzgt59N26XjFp2tFFUXu+VN+x8Uo0NbDUCMCbw@mail.gmail.com
This allows using text position search functions with nondeterministic
collations.  These functions are

- position, strpos
- replace
- split_part
- string_to_array
- string_to_table

which all use common internal infrastructure.

There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.

Unlike with deterministic collations, the search cannot use any
byte-by-byte optimized techniques but has to go substring by
substring.  We also need to consider that the found match could have a
different length than the needle and that there could be substrings of
different length matching at a position.  In most cases, we need to
find the longest such substring (greedy semantics), but this can be
configured by each caller.

Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
Dumps from versions older than v16 do not know about NO INDENT in a
XMLSERIALIZE() clause.  This commit adjusts AdjustUpgrade.pm so as NO
INDENT is discarded in the contents of the new dump adjusted for
comparison when the old version is v15 or older.  This should be enough
to make the cross-version upgrade tests pass.

Per report from buildfarm member crake.  Oversight in 984410b.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/88b183f1-ebf9-4f51-9144-3704380ccae7@dunslane.net
Backpatch-through: 16
Previously the portlock logic, added in 9b4eafc, didn't actually work
properly when the tests were run via meson. 9b4eafc used the
MESON_BUILD_ROOT environment variable to determine the directory for the port
lock directory, but that's never set for running the tests.  That meant that
each test used its own portlock dir, unless the PG_TEST_PORT_DIR environment
variable was set.

Fix the problem by setting top_builddir for the environment. That's also used
for the autoconf/make build.

Backpatch back to 16, where meson support was added.

Reported-by: Zharkov Roman <r.zharkov@postgrespro.ru>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Backpatch-through: 16
Also simplify and correct data checksum wording in master now that it is
the default.  PG 13 did not have the awkward wording.

Reported-by: Felix <afripowered@gmail.com>

Reviewed-by: Laurenz Albe

Discussion: https://postgr.es/m/173928241056.707.3989867022954178032@wrigleys.postgresql.org

Backpatch-through: 14
The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on aarch
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.

This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.

Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.

Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION
(for the additions in ControlFileData).

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
…signedness.

With the newly added option --char-signedness, pg_resetwal updates the
default char signedness flag in the controlfile. This option is
primarily intended for an upcoming patch that pg_upgrade supports
preserving the default char signedness during upgrades, and is not
meant for manual operation.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
Commit 44fe30f introduced the 'default_char_signedness' field in
controlfile. Newly created database clusters always set this field to
'signed'.

This change ensures that pg_upgrade updates the
'default_char_signedness' to 'unsigned' if the source database cluster
has signedness=false. For source clusters from v17 or earlier, which
lack the 'default_char_signedness' information, pg_upgrade assumes the
source cluster was initialized on the same platform where pg_upgrade
is running. It then sets the 'default_char_signedness' value according
to the current platform's default character signedness.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
…ess of new cluster.

This change adds a new option --set-char-signedness to pg_upgrade. It
enables user to set arbitrary signedness during pg_upgrade. This helps
cases where user who knew they copied the v17 source cluster from
x86 (signedness=true) to ARM (signedness=false) can pg_upgrade
properly without the prerequisite of acquiring an x86 VM.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
…different architectures.

GIN and GiST indexes utilizing pg_trgm's opclasses store sorted
trigrams within index tuples. When comparing and sorting each trigram,
pg_trgm treats each character as a 'char[3]' type in C. However, the
char type in C can be interpreted as either signed char or unsigned
char, depending on the platform, if the signedness is not explicitly
specified. Consequently, during replication between different CPU
architectures, there was an issue where index scans on standby servers
could not locate matching index tuples due to the differing treatment
of character signedness.

This change introduces comparison functions for trgm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

The default char signedness information was introduced in 44fe30f,
so no backpatch.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
There is a race condition between "GRANT role" and "DROP ROLE",
which allows GRANT to install pg_auth_members entries that refer to
dropped roles.  (Commit 6566133 prevented that for the grantor
field, but not for the granted or grantee roles.)  We'll soon fix
that, at least in HEAD, but pg_dumpall needs to cope with the
situation in case of pre-existing inconsistency.  As pg_dumpall
stands, it will emit invalid commands like 'GRANT foo TO ""',
which causes pg_upgrade to fail.  Fix it to emit warnings and skip
those GRANTs, instead.

There was some discussion of removing the problem by changing
dumpRoleMembership's query to use JOIN not LEFT JOIN, but that
would result in silently ignoring such entries.  It seems better
to produce a warning.

Pre-v16 branches already coped with dangling grantor OIDs by simply
omitting the GRANTED BY clause.  I left that behavior as-is, although
it's somewhat inconsistent with the behavior of later branches.

Reported-by: Virender Singla <virender.cse@gmail.com>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
Backpatch-through: 13
When nloops > 1, we now display two digits after the decimal point,
rather than none. This is important because what we print is actually
planstate->instrument->ntuples / nloops, and sometimes what you want
to know is planstate->instrument->ntuples. You can estimate that by
multiplying the displayed row count by the displayed nloops value, but
the fact that the displayed value is rounded makes that inexact. It's
still inexact even if we show these two extra decimal places, but less
so. Perhaps we will agree on a way to further improve this output later,
but for now this seems better than doing nothing.

Author: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Naeem Akhter <akhternaeem@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@percona.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Guillaume Lelarge <guillaume@lelarge.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: http://postgr.es/m/603c8f070905281830g2e5419c4xad2946d149e21f9d%40mail.gmail.com
Concurrently dropping either the granted role or the grantee
does not stop GRANT from completing, instead resulting in a
dangling role reference in pg_auth_members.  That's relatively
harmless in the short run, but inconsistent catalog entries
are not a good thing.

This patch solves the problem by adding the granted and grantee
roles as explicit shared dependencies of the pg_auth_members entry.
That's a bit indirect, but it works because the pg_shdepend code
applies the necessary locking and rechecking.

Commit 6566133 previously established similar handling for
the grantor column of pg_auth_members; it's not clear why it
didn't cover the other two role OID columns.

A side-effect of this approach is that DROP OWNED BY will now drop
pg_auth_members entries that mention the target role as either the
granted or grantee role.  That's clearly appropriate for the
grantee, since we'll drop its other privileges too.  It doesn't
seem too far out of line for the granted role, since we're
presumably about to drop it and besides we're removing all reasons
why it'd matter to be a member of it.  (One could argue that this
makes DropRole's code to auto-drop pg_auth_members entries
unnecessary, but I chose to leave it in place since perhaps some
people's workflows expect that to work without a DROP OWNED BY.)

Note to patch readers: CreateRole's first CommandCounterIncrement
call is now unconditional, because this change creates another
case in which it's needed, and it seemed to be more trouble than
it's worth to preserve that micro-optimization.

Arguably this is a bug fix, but the fact that it changes the
expected contents of pg_shdepend seems like not a great thing
to do in the stable branches, and perhaps we don't want the
change in DROP OWNED BY semantics there either.  On the other
hand, I opted not to force a catversion bump in HEAD, because
the presence or absence of these entries doesn't matter for
most purposes.

Reported-by: Virender Singla <virender.cse@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
Per the buildfarm, these tests appear to be unstable in the wake of
commit ddb17e3. I'm not sure that
just hiding this output is the right way forward, because I think
there may be other test cases that will fail with lower probability
even after this fix. However, it's hard to tell right now, because
this is failing on a number of buildfarm animals. So let's try this
for now to either get a clearer picture of what else is broken, or
as a stopgap until we decide what the permanent fix should be, or
perhaps this will be the permanent fix after all.
To implement AIO writes, the backend initiating writes needs to transfer the
lock ownership to the AIO subsystem, so the lock held during the write can be
released in another backend.

Other backends need to be able to "complete" an asynchronously started IO to
avoid deadlocks (consider e.g. one backend starting IO for a buffer and then
waiting for a heavyweight lock held by another relation followed by the
current holder of the heavyweight lock waiting for the IO to complete).

To that end, this commit adds LWLockDisown() and LWLockReleaseDisowned(). If
code uses LWLockDisown() it's the code's responsibility to ensure that the
lock is released in case of errors.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
The 'cached-plan-inval' test suite, introduced in 525392d under
src/test/modules/delay_execution, aimed to verify that cached plan
invalidation triggers replanning after deferred locks are taken.
However, its ExecutorStart_hook-based approach relies on lock timing
assumptions that, in retrospect, are fragile. This instability was
exposed by failures on BF animal trilobite, which builds with
CLOBBER_CACHE_ALWAYS.

One option was to dynamically disable the cache behavior that causes
the test suite to fail by setting "debug_discard_caches = 0", but it
seems better to remove the suite. The risk of future failures due to
other cache flush hazards outweighs the benefit of catching real
breakage in the backend behavior it tests.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2990641.1740117879@sss.pgh.pa.us
(Initially the proposal was to keep \conninfo alone and add this feature
as \conninfo+, but we decided against keeping the original.)

Also display more fields than before, though not as many as were
suggested during the discussion.  In particular, we don't show 'role'
nor 'session authorization', for both which a case can probably be made.
These can be added as followup commits, if we agree to it.

Some (most?) reviewers actually reviewed rather different versions of
the patch and do not necessarily endorse the current one.

Co-authored-by: Maiquel Grassi <grassi@hotmail.com.br>
Co-authored-by: Hunaid Sohail <hunaidpgml@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Sami Imseih <simseih@amazon.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Luzanov <p.luzanov@postgrespro.ru>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Erik Wienhold <ewie@ewie.name>
Discussion: https://postgr.es/m/CP8P284MB24965CB63DAC00FC0EA4A475EC462@CP8P284MB2496.BRAP284.PROD.OUTLOOK.COM
michaelpq and others added 26 commits February 26, 2025 14:26
The original message did not mention where the checkpoint record LSN was
found, a control file or a backup_label file.  A couple of LOG messages
are generated before this FATAL check is reached, providing more details
about the way recovery is set up.  However, knowing this information in
this specific message is useful for debugging.  This is also useful for
instances where log_min_messages is set to FATAL or more, where LOG
messages do not show up.

Author: Benoit Lobréau <benoit.lobreau@dalibo.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/4ed10bc8-5513-4d8e-8643-8abcaa08336d@dalibo.com
This patch introduces the '--enable-two-phase' option to the
'pg_createsubscriber' utility, allowing users to enable two-phase commit
for all subscriptions during their creation.

Note that even without this option users can enable the two_phase option
for the subscriptions created by pg_createsubscriber. However, it requires
the subscription to be disabled first which could be inconvenient for
users.

When two-phase commit is enabled, prepared transactions are sent to the
subscriber at the time of 'PREPARE TRANSACTION', and they are processed as
two-phase transactions on the subscriber as well. If disabled, prepared
transactions are sent only when committed and are processed immediately by
the subscriber.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjLPdFP=kA5LNSmWZ=+GMXmO+LczvV6p9HJjsXxZz10KGA@mail.gmail.com
All the processes that generate WAL should call pgstat_report_wal() to
report all their statistics related to WAL, and this is already what
happens in the tree.  Keeping pgstat_report_wal() is confusing while the
other routine is encouraged.

This routine is not required since fc415ed, where it was lastly
used in pgstat_report_stat() before an equivalent callback existed.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z71oPkJJICrRB5Ws@paquier.xyz
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so as it can be shared across more than one Stats structure in
pg_stat.h.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.

Bump PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
Using the Python Limited API would allow building PL/Python against
any Python 3.x version and using another Python 3.x version at run
time.  This commit does not activate that, but it prepares the code to
only use APIs supported by the Limited API.

Implementation details:

- Convert static types to heap types
  (https://docs.python.org/3/howto/isolating-extensions.html#heap-types).

- Replace PyRun_String() with component functions.

- Replace PyList_SET_ITEM() with PyList_SetItem().

Reviewed-by: Jakob Egger <jakob@eggerapps.at>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
The test in 005_char_signedness.pl was missing a dash in the
--set-char-signedness option. Although the test didn't fail since it
doesn't check the error message, it resulted in an unexpected error
message instead of the intended one.

Oversight in 1aab680.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87tt8h1vb7.fsf@wibble.ilmari.org
This reverts commit c47e8df.

That commit makes the plpython tests crash with Python 3.6.* and
3.7.*.  It will need further investigation and testing, so revert for
now.
Previously we used attname for both table and index columns, but
that is problematic for indexes because their attnames are assigned
by internal rules that don't guarantee to preserve the names across
dump and reload.  (This is what's causing the remaining buildfarm
failures in cross-version-upgrade tests.)  Fortunately we can use
attnum instead, since there's no such thing as adding or dropping
columns in an existing index.  We met this same problem previously
with ALTER INDEX ... SET STATISTICS, and solved it the same way,
cf commit 5b6d13e.

In pg_restore_attribute_stats() itself, we accept either attnum or
attname, but the policy used by pg_dump is to always use attname
for tables and attnum for indexes.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1457469.1740419458@sss.pgh.pa.us
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z77IkjmmfbFfNh3f@paquier.xyz
9d9b9d4 has added spinlocks to protect the fields in ProcSignal
flags, introducing a code path in ProcSignalInit() where a spinlock
could be released twice if the pss_pid field of a ProcSignalSlot is
found as already set.  Multiple spinlock releases have no effect with
most spinlock implementations, but this could cause the code to run into
issues when the spinlock is acquired concurrently by a different
process.

This sanity check on pss_pid generates a LOG that can be delayed until
after the spinlock is released as, like older versions up to v17, the
code expects the initialization of the ProcSignalSlot to happen even if
pss_pid is found incorrect.  The code is changed so as the old pss_pid
is read while holding the slot's spinlock, with the LOG from the sanity
check generated after releasing the spinlock, preventing the double
release.

Author: Maksim Melnikov <m.melnikov@postgrespro.ru>
Co-authored-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/dca47527-2d8b-4e3b-b5a0-e2deb73371a4@postgrespro.ru
This commit adds to pgstatfuncs.c a new routine called
pg_stat_wal_build_tuple(), helper routine for pg_stat_get_wal().  This
is in charge of filling one tuple based on the contents of
PgStat_WalStats retrieved from pgstats.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics, simplifying the main patch.  Note that
it is not possible for stats_reset to be NULL in pg_stat_wal; backend
statistics need to be able to handle this case.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0zsFUYpe-tLha+-sp3K8KmBXu0o=LUN=8FFtxMLYikPA@mail.gmail.com
After commit f41d846, a process could acquire and use a replication
slot that had just been invalidated, leading to failures while accessing
WAL.

To ensure that we don't accidentally start using invalid slots, we must
perform the invalidation check after acquiring the slot or under the
spinlock where we associate the slot with a particular process. We choose
the earlier method to keep the code simple.

Reported-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CABdArM7J-LbGoMPGUPiFiLOyB_TZ5+YaZb=HMES0mQqzVTn8Gg@mail.gmail.com
The function in charge of freeing the memory from a result created by
PQescapeIdentifier() has to be PQfreemem(), to ensure that both
allocation and free come from libpq, but one spot in pg_amcheck was
missing that.

Oversight in b859d94.

Author: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAEudQArD_nKSnYCNUZiPPsJ2tNXgRmLbXGSOrH1vpOF_XtP0Vg@mail.gmail.com
Discussion: https://postgr.es/m/CAEudQArbTWVSbxq608GRmXJjnNSQ0B6R7CSffNnj2hPWMUsRNg@mail.gmail.com
Backpatch-through: 14
Previously the internal queue of buffers was capped at max_ios * 4,
though not less than io_combine_limit, at allocation time.  That was
done in the first version based on conservative theories about resource
usage and heuristics pending later work.  The configured I/O depth could
not always be reached with dense random streams generated by ANALYZE,
VACUUM, the proposed Bitmap Heap Scan patch, and also sequential streams
with the proposed AIO subsystem to name some examples.

The new formula is (max_ios + 1) * io_combine_limit, enough buffers for
the full configured I/O concurrency level using the full configured I/O
combine size, plus the buffers from one finished but not yet consumed
full-sized I/O.  Significantly more memory would be needed for high GUC
values if the client code requests a large per-buffer data size, but
that is discouraged (existing and proposed stream users try to keep it
under a few words, if not zero).

With this new formula, an intermediate variable could have overflowed
under maximum GUC values, so its data type is adjusted to cope.

Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types causes possible overflow and undefined behavior.  Instead of trying to
fix the expression, this commit eliminates the relied local variable.  The
explicit branching is used to replace the -1 value.  That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
pgbench wants to record the starting line number of each command
in its scripts.  It was computing that by scanning from the script
start and counting newlines, so that O(N^2) work had to be done
for an N-command script.  In a script with 50K lines, this adds
up to about 10 seconds on my machine.

To add insult to injury, the results were subtly wrong, because
expr_scanner_offset() scanned to find the NUL that flex inserts
at the end of the current token --- and before the first yylex
call, no such NUL has been inserted.  So we ended by computing the
script's last line number not its first one.  This was visible only
in case of \gset at the start of a script, which perhaps accounts
for the lack of complaints.

To fix, steal an idea from plpgsql and track the current lexer
ending position and line count as we advance through the script.
(It's a bit simpler than plpgsql since we can't need to back up.)
Also adjust a couple of other places that were invoking scans
from script start when they didn't really need to.  I made a new
psqlscan function psql_scan_get_location() that replaces both
expr_scanner_offset() and expr_scanner_get_lineno(), since in
practice expr_scanner_get_lineno() was only being invoked to find
the line number of the current lexer end position.

Reported-by: Daniel Vérité <daniel@manitou-mail.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/84a8a89e-adb8-47a9-9d34-c13f7150ee45@manitou-mail.org
ParseScript only needs the lineno for meta-commands, so let's not
bother computing it otherwise.  While this doesn't save much given
the previous patch, there's no point in doing unnecessary work.
While we're at it, avoid calling psql_scan_get_location() twice for
a meta-command.

One reason for making this change is that the line number computed
in ParseScript's main loop was actually wrong in most cases: it
would point just past the semicolon of the previous SQL command,
not at what the user thinks the current command's line number is.
We could add some code to skip whitespace before capturing the line
number, but it would be pretty pointless at present.  Just move the
call to avoid the temptation to rely on that value.  (Once we've
lexed the backslash, the computed line number will be right.)

This change also means that pgbench never inquires about the
location before it's lexed something, so that the care taken in
the previous patch to behave sanely in that case is unnecessary.
It seems best to keep that logic, though, as future callers
might depend on it.

Author: Daniel Vérité <daniel@manitou-mail.org>
Discussion: https://postgr.es/m/84a8a89e-adb8-47a9-9d34-c13f7150ee45@manitou-mail.org
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
This is a preparation step for allowing subscripting containers to
transform only a prefix of an indirection list and modify the list
in-place by removing the processed elements. Currently, all elements
are consumed, and the list is set to NIL after transformation.

In the following commit, subscripting containers will gain the
flexibility to stop transformation when encountering an unsupported
indirection and return the remaining indirections to the caller.
This change extends generic type subscripting to recognize dot
notation (.) in addition to bracket notation ([]). While this does not
yet provide full support for dot notation, it enables subscripting
containers to process it in the future.

For now, container-specific transform functions only handle
subscripting indices and stop processing when encountering dot
notation. It is up to individual containers to decide how to transform
dot notation in subsequent updates.
This is a preparation step for a future commit that will reuse the
aforementioned function.
This is a preparation step for a future commit that will reuse the
aforementioned function.
Now that we are allowing container generic subscripting to take dot
notation in the list of indirections, and it is transformed as a
String node.

For jsonb, we want to represent field accessors as String nodes in
refupperexprs for distinguishing from ordinary text subscripts which
can be needed for correct EXPLAIN.

Strings node is no longer a valid expression nodes, so added special
handling for them in walkers in nodeFuncs etc.
This patch introduces JSONB member access using dot notation, wildcard
access, and array subscripting with slicing, aligning with the JSON
simplified accessor specified in SQL:2023. Specifically, the following
syntax enhancements are added:

1. Simple dot-notation access to JSONB object fields
2. Wildcard dot-notation access to JSONB object fields
2. Subscripting for index range access to JSONB array elements

Examples:

-- Setup
create table t(x int, y jsonb);
insert into t select 1, '{"a": 1, "b": 42}'::jsonb;
insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb;
insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb;

-- Existing syntax predates the SQL standard:
select (t.y)->'b' from t;
select (t.y)->'b'->'c' from t;
select (t.y)->'d'->0 from t;

-- JSON simplified accessor specified by the SQL standard:
select (t.y).b from t;
select (t.y).b.c from t;
select (t.y).d[0] from t;

The SQL standard states that simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)

where:
  VEP = <value expression primary>
  JC = <JSON simplified accessor op chain>

For example, the JSON_QUERY equivalents of the above queries are:

select json_query(y, 'lax $.b' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from t;
select json_query(y, 'lax $.b.c' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from t;
select json_query(y, 'lax $.d[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from t;

Implementation details:

Extends the existing container subscripting interface to support
container-specific information, specifically a JSONPath expression for
jsonb.

During query transformation, detects dot-notation, wildcard access,
and sliced subscripting. If any of these accessors are present,
constructs a JSONPath expression representing the access chain.

During execution, if a JSONPath expression is present in
JsonbSubWorkspace, executes it via JsonPathQuery().

Does not transform accessors directly into JSON_QUERY during
transformation to preserve the original query structure for EXPLAIN
and CREATE VIEW.
@l-wang l-wang closed this Feb 27, 2025
l-wang pushed a commit that referenced this pull request Nov 10, 2025
If the number of sync requests is big enough, the palloc() call in
AbsorbSyncRequests() will attempt to allocate more than 1 GB of memory,
resulting in failure.  This can lead to an infinite loop in the checkpointer
process, as it repeatedly fails to absorb the pending requests.

This commit introduces the following changes to cope with this problem:
 1. Turn pending checkpointer requests array in shared memory into a bounded
    ring buffer.
 2. Limit maximum ring buffer size to 10M items.
 3. Make AbsorbSyncRequests() process requests incrementally in 10K batches.

Even #2 makes the whole queue size fit the maximum palloc() size of 1 GB.
of continuous lock holding.

This commit is for master only.  Simpler fix, which just limits a request
queue size to 10M, will be backpatched.

Reported-by: Ekaterina Sokolova <e.sokolova@postgrespro.ru>
Discussion: https://postgr.es/m/db4534f83a22a29ab5ee2566ad86ca92%40postgrespro.ru
Author: Maxim Orlov <orlovmg@gmail.com>
Co-authored-by:  Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
l-wang pushed a commit that referenced this pull request Nov 10, 2025
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan.  To help address that, here we
adjust the EXPLAIN output to display the following additional details:

1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio

Technically postgres#4 can be calculated using #1, #2 and #3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated.  The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily.  For example, if #2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
l-wang pushed a commit that referenced this pull request Nov 10, 2025
truncate_useless_pathkeys() seems to have neglected to account for
PathKeys that might be useful for WindowClause evaluation.  Modify it so
that it properly accounts for that.

Making this work required adjusting two things:

1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys

For #1, query_pathkeys gets set in standard_qp_callback() according to the
sort order requirements for the first operation to be applied after the
join planner is finished, so this changes depending on which upper
planner operations a particular query needs.  If the query has window
functions and no GROUP BY, then query_pathkeys gets set to
window_pathkeys.  Before this change, this meant PathKeys useful for the
ORDER BY were not accounted for in queries with window functions.

Because of #1, #2 is now required so that we explicitly check to ensure
we don't truncate away PathKeys useful for window functions.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.