Initial work on COPY progress - take two. #5

simi · 2020-06-14T00:55:42Z

COPY (SELECT * FROM test) TO '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed 
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
(1 row)

COPY test FROM '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed 
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
(1 row)

Since we're bypassing the buffer manager, we need to call PageSetChecksumInplace() directly. As reported by Justin Pryzby. In passing, add RelationOpenSmgr() calls before all smgrwrite() and smgrextend() calls. Tom added one before the first smgrextend() call in commit c2bb287, which seems to be enough, but let's play it safe and do it before each one. That's how it's done in the similar code in nbtsort.c, too. Discussion: https://www.postgresql.org/message-id/20200920224446.GF30557@telsasoft.com



This completes the project of making all our derived files be pgindent-clean (or else explicitly excluded from indentation), so that no surprises result when running pgindent in a built-out development tree. Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com

The existing message about "a column definition list is only allowed for functions returning "record"" could be given in some cases where it was fairly confusing; in particular, a function with multiple OUT parameters *does* return record according to pg_proc. Break it down into a couple more cases to deliver a more on-point complaint. Per complaint from Bruce Momjian. Discussion: https://postgr.es/m/798909.1600562993@sss.pgh.pa.us

pgindent messes up entries in this file if their names match typedef names. While there's reason to avoid choosing conflicting names, we have some historical exceptions, and there's no guarantee that more duplicates won't appear in future. Since this is a derived file anyway, there's little harm in just excluding it. I said yesterday that all our derived files are pgindent-clean, or else explicitly excluded from indentation, but I'd forgotten about this one. Now that project is really done, as confirmed by a test run. Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com

Further experience says that the appending behavior offered by pg_get_line_append is useful to only a very small minority of callers. For most, the requirement to reset the buffer after each line is just an error-prone nuisance. Hence, invent another alternative call pg_get_line_buf, which takes care of that detail. Noted while reviewing a patch from Daniel Gustafsson. Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
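The difference between the appending call and the reset-then-read call can be sketched in plain C. This is a minimal illustration, not PostgreSQL's implementation: `LineBuf`, `linebuf_init`, `read_line_append`, and `read_line_buf` are hypothetical stand-ins for StringInfo, `pg_get_line_append`, and `pg_get_line_buf`, and `malloc`/`realloc` stand in for the backend's memory routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo: a growable, NUL-terminated buffer. */
typedef struct LineBuf
{
    char   *data;
    size_t  len;    /* bytes used, not counting the trailing NUL */
    size_t  cap;    /* bytes allocated */
} LineBuf;

static void
linebuf_init(LineBuf *buf)
{
    buf->cap = 16;
    buf->len = 0;
    buf->data = malloc(buf->cap);
    assert(buf->data != NULL);
    buf->data[0] = '\0';
}

/*
 * Append one line (including its newline, if present) to whatever is
 * already in buf.  Returns false only if nothing could be read.
 */
static bool
read_line_append(FILE *fp, LineBuf *buf)
{
    size_t  start = buf->len;
    int     c;

    while ((c = fgetc(fp)) != EOF)
    {
        /* Keep room for this byte plus the trailing NUL. */
        if (buf->len + 2 > buf->cap)
        {
            buf->cap *= 2;
            buf->data = realloc(buf->data, buf->cap);
            assert(buf->data != NULL);
        }
        buf->data[buf->len++] = (char) c;
        if (c == '\n')
            break;
    }
    buf->data[buf->len] = '\0';
    return buf->len > start;
}

/*
 * Convenience variant: reset the buffer first, so each call yields exactly
 * one line and callers cannot forget the reset.
 */
static bool
read_line_buf(FILE *fp, LineBuf *buf)
{
    buf->len = 0;
    buf->data[0] = '\0';
    return read_line_append(fp, buf);
}
```

A typical caller would loop on `read_line_buf(fp, &buf)` and process `buf.data` each time through; only the rare caller that wants to accumulate several lines in one buffer needs the appending form.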

Use a StringInfo instead of a fixed-size buffer in parseServiceInfo(). While we've not heard complaints about the existing 255-byte limit, it certainly seems possible that complex cases could run afoul of it. Daniel Gustafsson Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se

pg_restore previously coped with overlength TOC-file lines using some complicated logic to ignore additional bufferloads. While this isn't wrong, since we don't expect that the interesting part of a line would run to more than a dozen or so bytes, it's more complex than it needs to be. Use a StringInfo instead of a fixed-size buffer so that we can process long lines as single entities and thus not need the extra logic. Daniel Gustafsson Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se

Due to flaws in commit 3347c98, using WaitLatch() without WL_LATCH_SET could cause an assertion failure or crash. Repair. While here, also add a check that the latch we're switching to belongs to this backend, when changing from one latch to another. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com

tsearch_readline() saves the string pointer it returns to the caller for possible use in the associated error context callback. However, the caller will usually pfree that string sometime before it next calls tsearch_readline(), so that there is a window where an ereport will try to print an already-freed string. The built-in users of tsearch_readline() happen to all do that pfree at the bottoms of their loops, so that the window is effectively empty for them. However, this is not documented as a requirement, and contrib/dict_xsyn doesn't do it like that, so it seems likely that third-party dictionaries might have live bugs here. The practical consequences of this seem pretty limited in any case, since production builds wouldn't clobber the freed string immediately, besides which you'd not expect syntax errors in dictionary files being used in production. Still, it's clearly a bug waiting to bite somebody. Fix by pstrdup'ing the string to be saved for the error callback, and then pfree'ing it next time through. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
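The shape of that fix can be sketched in standalone C. This is an illustration under stated assumptions rather than the actual tsearch code: `ReaderState`, `reader_next_line`, and `reader_error_context` are hypothetical names, `malloc`/`free` stand in for `pstrdup`/`pfree`, and the fixed 256-byte read buffer exists only to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ReaderState
{
    FILE   *fp;
    char   *curline;    /* private copy, used only for error reporting */
} ReaderState;

/* Portable string duplication (stand-in for pstrdup). */
static char *
copy_string(const char *s)
{
    size_t  n = strlen(s) + 1;
    char   *copy = malloc(n);

    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/*
 * Return the next line as a string owned by the caller, or NULL at EOF.
 * The reader keeps its own copy for error reporting, so the caller may
 * free the returned string at any time without leaving a dangling pointer.
 */
static char *
reader_next_line(ReaderState *st)
{
    char    tmp[256];   /* fixed size only to keep the sketch short */

    /* Release the copy saved on the previous call. */
    free(st->curline);
    st->curline = NULL;

    if (fgets(tmp, sizeof(tmp), st->fp) == NULL)
        return NULL;

    st->curline = copy_string(tmp);
    return copy_string(tmp);
}

/* What an error callback would print: always valid memory. */
static const char *
reader_error_context(const ReaderState *st)
{
    return st->curline ? st->curline : "(no current line)";
}
```

The cost is one extra copy per line; in exchange, the error path no longer depends on when each caller chooses to free the line it was handed.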

We failed to pass down the query string to check_new_partition_bound, so that its attempts to provide error cursor positions were for naught; one must have the query string to get parser_errposition to do anything. Adjust its API to require a ParseState to be passed down. Also, improve the logic inside check_new_partition_bound so that the cursor points at the partition bound for the specific column causing the issue, when one can be identified. That part is also for naught if we can't determine the query position of the column with the problem. Improve transformPartitionBoundValue so that it makes sure that const-simplified partition expressions will be properly labeled with positions. In passing, skip calling evaluate_expr if the value is already a Const, which is surely the most common case. Alexandra Wang, Ashwin Agrawal, Amit Langote; reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/CACiyaSopZoqssfMzgHk6fAkp01cL6vnqBdmTw2C5_KJaFR_aMg@mail.gmail.com Discussion: https://postgr.es/m/CAJV4CdrZ5mKuaEsRSbLf2URQ3h6iMtKD=hik8MaF5WwdmC9uZw@mail.gmail.com

Harmonize behavior by moving responsibility for fsyncing directories down into slru.c. In 10 and later, only the multixact directories were missed (see commit 1b02be2), and in older branches all SLRUs were missed. Back-patch to all supported releases. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGLtsTUOScnNoSMZ-2ZLv%2BwGh01J6kAo_DM8mTRq1sKdSQ%40mail.gmail.com

Commit fbeb9da, which added the tsearch_readline APIs, left t_readline() in place as a compatibility measure. But that function has been unused and deprecated for twelve years now, so that seems like enough time to remove it. Doing so, and merging t_readline's code into tsearch_readline, aids in making several useful improvements:

* The hard-wired 4K limit on line length in tsearch data files is removed, by using a StringInfo buffer instead of a fixed-size buffer.
* We can buy back the per-line palloc/pfree added by 3ea7e95 in the common case where encoding conversion is not required.
* We no longer need a separate pg_verify_mbstr call, as that functionality was folded into encoding conversion some time ago.

(We could have done some of this stuff while keeping t_readline as a separate API, but there seems little point, since there's no reason for anyone to still be using t_readline directly.)

Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se


This makes it possible for code outside snapmgr.c to examine the contents of this data structure. This commit does not add any code which actually does so; a subsequent commit will make that change. Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar. Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com

You can use this to view the contents of the time to XID mapping which the server maintains when old_snapshot_threshold != -1. Being able to view that information may be interesting for users, and it's definitely useful for figuring out whether the mapping is being maintained correctly. It isn't, so that will need to be fixed in a subsequent commit. Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar. Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com


simi force-pushed the copy-progress branch 2 times, most recently from bb528a2 to f9cb4fe on June 20, 2020 17:27

simi force-pushed the copy-progress branch 3 times, most recently from 1dd84b5 to 5b1dddd on June 21, 2020 11:23

simi force-pushed the copy-progress branch from 5b1dddd to 64e0785 on September 20, 2020 21:40

hlinnaka and others added 20 commits September 21, 2020 14:50

Standardize order of use strict and use warnings in Perl code

80fc96e

The standard order in PostgreSQL and other code is use strict first, but some code was uselessly inconsistent about this.

Copy editing: fix a bunch of misspellings and poor wording.

9436041

99% of this is docs, but also a couple of comments. No code changes. Justin Pryzby Discussion: https://postgr.es/m/20200919175804.GE30557@telsasoft.com

Doc: sync lobj.sgml's copy of testlo.c with the latter file.

fc5f107

Zhijie Hou Discussion: https://postgr.es/m/ce2cd951fe9b448a9cda99dc1a871fb9@G08CNEXMBPEKD05.g08.fujitsu.local

Initial work on COPY progress.

01a187a

Enhance copy progress with more info.

6d62a8d

- add pg_stat_progress_copy system view

Support STDIN, STDOUT. Add docs.

8eff7b4

simi force-pushed the copy-progress branch from 64e0785 to 8eff7b4 on September 24, 2020 19:04

simi closed this Aug 22, 2021
