Fix misbehavior in contrib/pg_trgm with an unsatisfiable regex.
If the regex compiler can see that a regex is unsatisfiable
(for example, '$foo') then it may emit an NFA having no arcs.
pg_trgm's packGraph function did the wrong thing in this case;
it would access off the end of a work array, and with bad luck
could produce a corrupted output data structure causing more
problems later.  This could end with wrong answers or crashes
in queries using a pg_trgm GIN or GiST index with such a regex.

Fix by not trying to de-duplicate if there aren't at least 2 arcs.

Per bug #17830 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17830-57ff5f89bdb02b09@postgresql.org
tglsfdc committed Mar 11, 2023
1 parent 53a53ea commit 7865280
Showing 3 changed files with 25 additions and 10 deletions.
6 changes: 6 additions & 0 deletions contrib/pg_trgm/expected/pg_word_trgm.out
@@ -1044,3 +1044,9 @@ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kaban
  Waikala | 0.3
 (89 rows)
 
+-- test unsatisfiable pattern
+select * from test_trgm2 where t ~ '.*$x';
+ t
+---
+(0 rows)
+
3 changes: 3 additions & 0 deletions contrib/pg_trgm/sql/pg_word_trgm.sql
@@ -43,3 +43,6 @@ select t,word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <% t
 select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
 select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
 select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+
+-- test unsatisfiable pattern
+select * from test_trgm2 where t ~ '.*$x';
26 changes: 16 additions & 10 deletions contrib/pg_trgm/trgm_regexp.c
@@ -1944,9 +1944,7 @@ packGraph(TrgmNFA *trgmNFA, MemoryContext rcontext)
 				arcsCount;
 	HASH_SEQ_STATUS scan_status;
 	TrgmState  *state;
-	TrgmPackArcInfo *arcs,
-			   *p1,
-			   *p2;
+	TrgmPackArcInfo *arcs;
 	TrgmPackedArc *packedArcs;
 	TrgmPackedGraph *result;
 	int			i,
@@ -2018,17 +2016,25 @@ packGraph(TrgmNFA *trgmNFA, MemoryContext rcontext)
 	qsort(arcs, arcIndex, sizeof(TrgmPackArcInfo), packArcInfoCmp);
 
 	/* We could have duplicates because states were merged. Remove them. */
-	/* p1 is probe point, p2 is last known non-duplicate. */
-	p2 = arcs;
-	for (p1 = arcs + 1; p1 < arcs + arcIndex; p1++)
+	if (arcIndex > 1)
 	{
-		if (packArcInfoCmp(p1, p2) > 0)
+		/* p1 is probe point, p2 is last known non-duplicate. */
+		TrgmPackArcInfo *p1,
+				   *p2;
+
+		p2 = arcs;
+		for (p1 = arcs + 1; p1 < arcs + arcIndex; p1++)
 		{
-			p2++;
-			*p2 = *p1;
+			if (packArcInfoCmp(p1, p2) > 0)
+			{
+				p2++;
+				*p2 = *p1;
+			}
 		}
+		arcsCount = (p2 - arcs) + 1;
 	}
-	arcsCount = (p2 - arcs) + 1;
+	else
+		arcsCount = arcIndex;
 
 	/* Create packed representation */
 	result = (TrgmPackedGraph *)
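
For readers who want to try the guarded de-duplication outside PostgreSQL, here is a minimal standalone sketch. It is not the code from trgm_regexp.c: `ArcInfo`, `arc_cmp`, and `dedup_sorted_arcs` are hypothetical stand-ins for `TrgmPackArcInfo`, `packArcInfoCmp`, and the loop in `packGraph`, and the struct fields are assumed for illustration only. It shows why the unguarded form goes wrong on an empty array: with zero arcs the loop body never runs, yet `(p2 - arcs) + 1` still evaluates to 1, so later code would read an arc that does not exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TrgmPackArcInfo; field names are assumptions. */
typedef struct
{
	int			sourceState;
	int			targetState;
	int			colorTrgm;
} ArcInfo;

/* Hypothetical three-way comparator, playing the role of packArcInfoCmp. */
static int
arc_cmp(const ArcInfo *a, const ArcInfo *b)
{
	if (a->sourceState != b->sourceState)
		return (a->sourceState < b->sourceState) ? -1 : 1;
	if (a->colorTrgm != b->colorTrgm)
		return (a->colorTrgm < b->colorTrgm) ? -1 : 1;
	if (a->targetState != b->targetState)
		return (a->targetState < b->targetState) ? -1 : 1;
	return 0;
}

/*
 * Remove adjacent duplicates from an already-sorted array, in place.
 * Returns the number of distinct entries kept at the front of the array.
 */
static size_t
dedup_sorted_arcs(ArcInfo *arcs, size_t n)
{
	ArcInfo    *probe;
	ArcInfo    *last;

	assert(arcs != NULL || n == 0);

	/*
	 * With fewer than 2 entries there is nothing to compare.  Skipping the
	 * loop also avoids reporting a count of 1 for an empty array.
	 */
	if (n < 2)
		return n;

	/* probe is the probe point, last is the last known non-duplicate. */
	last = arcs;
	for (probe = arcs + 1; probe < arcs + n; probe++)
	{
		if (arc_cmp(probe, last) > 0)
		{
			last++;
			*last = *probe;
		}
	}
	return (size_t) (last - arcs) + 1;
}
```

Calling `dedup_sorted_arcs(arcs, 0)` returns 0, which corresponds to the new `arcsCount = arcIndex` branch in the patch; the early return for `n == 1` is equivalent to what the loop would have computed anyway.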