Skip to content

Conversation

@rheaton
Copy link
Contributor

@rheaton rheaton commented Nov 16, 2020

In the cases when the connection string is parsable, we scrub and remove the password prior to logging in.
In the cases where the connection string is un-parsable, we have removed it from the log messages.

Additionally, this PR adds (C Unit tests)[https://libcheck.github.io/check/] that can be run via make test_unit

If we decide that we like these unit tests, we should integrate them into the travis builds.

Fixes #466

Copy link
Collaborator

@DimCitus DimCitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this PR! We have some implementation details to change, as per my comments. The most important is that you removed the connection string entirely from some error messages, leaving our users in a position where they won't be able to understand and fix their situations.

Other than that, we don't need too free stack allocated memory buffers, those buffer initial value should be { 0 }, and I think we can do better with the “scrubbing” API. I think for the sake of user sanity, showing either stars or unicode bullets (•) would be best, so that they know we did parse and use their password and just don't want to expose their secrets in the clear in the logs.

About unicode bullets, unfortunately I am not sure if we can use that freely. Many systems didn't get the memo from 1991 yet. Same with IPv6. Let's not get started on SQL:2016.

{
log_fatal("Failed to determine monitor hostname when parsing "
"Postgres URI \"%s\"", config->monitor_pguri);
"Postgres URI");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we remove the connection string in the error message? How is the user going to figure out what's happening there in a way they can fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it here because I assumed that it was un-parsable. Do you think in the cases of un-parsable connection strings we should log it, even if it may reveal private information?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we failed to parse the monitor Postgres URI, there is nothing that can work in the deployment, and we want the user to fix that. How can they fix the situation without us showing what's wrong?

So while I have some level of sympathy to hiding private information in the connection string, my personal opinion is that allowing easy understanding of what's wrong in order to fix it takes priority. We don't know if the URI contains sensitive information: we failed to parse it. Let's at least show what we failed to parse, hopefully making it obvious what is to be fixed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a possible compromise be to print the mangled URI if and only if the logging level is debug or verbose? Most folks don't run in that configuration in production, and that's the place we feel that inapprioriately logging credentials is most sensitive. This way, if someone really needs to see what pg_auto_failover is tripping over, they can re-run in verbose mode, pick up the error, fix any mangling and also rotate credentials at that time?

I feel that this may be the right way to go, in that the biggest concern is credentials being silently printed to logs without the operator realizing. This way, we only print sensitive information to the log when we know that someone is explicitly choosing to look at the log.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good first step would be a version of this PR that omits the controversial points, and then we can open a new PR for what's left in your audit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look and make the changes, I've been in the process of moving, so I haven't been very available lately. :)

Comment on lines 37 to 40
char pg_autoctl_argv0[MAXPGPATH];
char pg_autoctl_program[MAXPGPATH];

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that with producing a new binary file we might need to move things in a shared space. Why cli_do_tmux.c though? This at least requires some comments about it, and I guess a better choice of a common module...

"The monitor SSL setup is ready and your current "
"connection string is \"%s\", you might need to update it",
config->monitor_pguri);
"The monitor SSL setup is ready, you might need to update it");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why are we removing the connection string entirely here? If the user has to update the monitor URI in the configuration file, which URI should they write in there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add it back for this one :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait, this one is actually kind of surprising to me -- were we not able to parse the string? What are we actually doing here (I would expect it to return false).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_pguri_info_key_vals might have other bugs than a failure in PQconninfoParse...

}


char scrubbedConnectionString[MAXCONNINFO];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a default value of { 0 }.

	char scrubbedConnectionString[MAXCONNINFO] = { 0 };



char scrubbedConnectionString[MAXCONNINFO];
parse_and_scrub_connection_string(newPgURI, scrubbedConnectionString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API returns a boolean to signal if something went wrong, let's preferably use it, or if you believe we can bypass checking the result, add a (void) prefix before the function call to make that intend clear to readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where we can't parse/scrub the connection string, what do you think would be the best option? Maybe print "Un-parsable connection string" instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the user is impacted by the error and we need to help them in their journey to setup our software correctly, or they are not impacted and should not be distracted by our internal limits. Are the users impacted by this failure here? How badly?

log_debug("pg exists: %s", postgresInstanceExists ? "yes" : "no");
log_debug("pg is primary: %s", postgresInstanceIsPrimary ? "yes" : "no");

PQfreemem(scrubbedConnectionString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no need.

Comment on lines 22 to 34
char pg_autoctl_argv0[MAXPGPATH];
char pg_autoctl_program[MAXPGPATH];
int pgconnect_timeout = 2; /* see also POSTGRES_CONNECT_TIMEOUT */

char *ps_buffer; /* will point to argv area */
size_t ps_buffer_size; /* space determined at run time */
size_t last_status_len; /* use to minimize length of clobber */

Semaphore log_semaphore; /* allows inter-process locking */

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to solve that problem would be to copy/paste the same definitions in the other main file that you introduce, in order to minimise changes?

Comment on lines 542 to 535
if (conninfo == NULL)
{
log_error("Failed to parse pguri \"%s\": %s", pguri, errmsg);
log_error("Failed to parse pguri: %s", errmsg);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we failed to parse a connection string. Which connection string? How is the user going to be able to fix their setup now, if we refrain from showing them any clue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with my comment above -- do you think it's worth printing it here, even if it may reveal a secret to the logs? I was torn.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the user fix the Postgres URI that we fail to parse if they don't know which URI it is? The only caller to that function is in cli_enable_disable in update_monitor_connection_string, which means the configuration file might be “corrupted” somehow. Yes, I think it's worth printing what we got.

Comment on lines 871 to 876
/*
* remove_password_from_parameters is a non-exported function that will remove the keyword
* and value if the keyword matches "password" for scrubbing a connection string in logs.
*/
void
remove_password_from_parameters(KeyVal *parameters)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a better way to approach this: what about introducing a KeyVal *secrets to the buildPostgresURIfromPieces API? When building the connection string it is then easy to replace secret parameters with a sequence of stars, or even a unicode bullet • (see https://en.wikipedia.org/wiki/Bullet_(typography)).

The KeyVal *secrets in buildPostgresURIfromPieces would take as input the keyword names that are secrets and implement the computation of their associated “bullets” value in the secrets->values[index] and use that in the final URI.

Copy link
Contributor Author

@rheaton rheaton Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm maybe we can use the existing overrides option in parse_pguri_info_key_vals as well -- let me try it.

Are there other secret parameters other than password?

Copy link
Contributor Author

@rheaton rheaton Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looks like overrides get set even if the original parameters didn't have the value -- I'll try your idea :)

Failing test:

test_parse_and_scrub_connection_string_with_no_password_in_it:0: Assertion 'scrubbedPguri == "postgres://admin@monitor.local:9999/testdb?"' failed: scrubbedPguri == "postgres://admin@monitor.local:9999/testdb?password=****", "postgres://admin@monitor.local:9999/testdb?" == "postgres://admin@monitor.local:9999/testdb?"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you did publish the new code that is failing the test, and I don't know if the test should be updated or the code fixed, here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implemented an update that uses overrides.

Comment on lines 922 to 923
parseUri = parse_pguri_info_key_vals(pguri, &overrides, &uriParams);
if (!parseUri)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style issue: we would usually just test the return value from calling the function:

    if (!parse_pguri_info_key_vals(pguri, &overrides, &uriParams))
    {
        log_error(...);
        return false;
    }

For debugging purposes, it might be better to have the return value in a variable that we can inspect, in that case the variable name that we usually would prefer is success.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We updated to the inlined style.

@rheaton rheaton force-pushed the pw-in-log branch 5 times, most recently from 6b697e0 to 31fdcc0 Compare January 12, 2021 00:02
@rheaton
Copy link
Contributor Author

rheaton commented Jan 12, 2021

@DimCitus Hi Dimitri! I think we've addressed all of your concerns for this PR, but I am getting some flakiness on one of the python tests.

test_basic_operation.test_019_run_secondary is giving me the following issue:

======================================================================
ERROR: test_basic_operation.test_019_run_secondary
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/python/3.7.6/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/rheaton/pg_auto_failover/tests/test_basic_operation.py", line 226, in test_019_run_secondary
    assert node2.has_needed_replication_slots()
  File "/home/travis/build/rheaton/pg_auto_failover/tests/pgautofailover_utils.py", line 1480, in has_needed_replication_slots
    current_slots = self.list_replication_slot_names()
  File "/home/travis/build/rheaton/pg_auto_failover/tests/pgautofailover_utils.py", line 1460, in list_replication_slot_names
    raise e
  File "/home/travis/build/rheaton/pg_auto_failover/tests/pgautofailover_utils.py", line 1456, in list_replication_slot_names
    result = self.run_sql_query(query)
  File "/home/travis/build/rheaton/pg_auto_failover/tests/pgautofailover_utils.py", line 362, in run_sql_query
    with psycopg2.connect(self.connection_string()) as conn:
  File "/opt/python/3.7.6/lib/python3.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  sorry, too many clients already

Do you know of connections that might be left open? The test passes when I run test_basic_operation in isolation, but fails when in suite (make run-test TEST=single or make run-test). Any ideas of where I should start hunting around for this? I'm not seeing anything intuitively that would have trigger this failure in this PR.

@rheaton rheaton force-pushed the pw-in-log branch 2 times, most recently from a3762ce to 95af913 Compare January 12, 2021 00:51
@DimCitus
Copy link
Collaborator

Hi folks, @rheaton,

I just did a review. There are still some small problems, that we could fix very easily (log messages, variable default assignment styles, etc). That said I would like to first review/finish/merge the password editing parts, and then in a separate PR think about the added unit tests. Can you please split the work there in two?

@rheaton
Copy link
Contributor Author

rheaton commented Jan 25, 2021

@DimCitus I will split it into another PR -- based off this branch. We did use the unit-tests to TDD some of the code and only added the missing ones for characterization tests to understand how things were working. I did not attempt to backfill all unit-tests at once.

rheaton pushed a commit to rheaton/pg_auto_failover that referenced this pull request Jan 25, 2021
- Tests added for hapostgres#512
- Some Characterization tests added
rheaton pushed a commit to rheaton/pg_auto_failover that referenced this pull request Jan 26, 2021
- Tests added for hapostgres#512
- Some Characterization tests added
@rheaton
Copy link
Contributor Author

rheaton commented Jan 26, 2021

@DimCitus I've pulled out the unit tests (and I'll deal with that PR once we are through this one 🙂). I'm still getting the nose test failure from test 019. Same issue as before. Any idea what could have been added that would introduce the "too many connections" issue? #512 (comment)

Additionally, let's get those other small changes in!

Thank you,
Rachel

@rheaton
Copy link
Contributor Author

rheaton commented Apr 13, 2021

@DimCitus I've made updates to this PR to make it as slim as possible. There is some more work to be done to completely close out the issue, but implementation probably should be discussed.
In the situation where there could be a failure to get the scrubbed uri, I've logged a message slightly differently. In most cases, the function won't return false.
Let me know if you need anything else updated. Thanks!

Copy link
Collaborator

@DimCitus DimCitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, I think with a pass of make indent and the Semaphore declaration clean-up we should be good to merge!

int semId;
} Semaphore;

Semaphore log_semaphore; /* allows inter-process locking */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change part of this PR?

Comment on lines 33 to 35
Semaphore log_semaphore; /* allows inter-process locking */


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I don't think this PR should change that.

@rheaton
Copy link
Contributor Author

rheaton commented Apr 19, 2021

@DimCitus I'm not seeing any changes with make indent and I've pushed up the removal of the unnecessary change

Copy link
Collaborator

@DimCitus DimCitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of whitespace changes that I am not comfortable with, still. The most important are around the static SQL string definitions, but also white lines that have no reason to be added or removed. It might be just a setup or tooling problem? you might have to use a different editor to finish up the PR unfortunately...

if (!parse_and_scrub_connection_string(config->monitor_pguri,
scrubbedConnectionString))
{
log_error("Monitor connection string is unparseable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better written in the same style as the other errors messages, such as: "Failed to parse the monitor connection string".

Comment on lines 964 to 966
/* */
/* Look for a populated password connection parameter */
/* */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange formatting.

if (
strcmp(option->keyword, "password") == 0 &&
option->val != NULL &&
strcmp(option->val, "") != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use IS_EMPTY_STRING_BUFFER here?

#define ERRCODE_OBJECT_IN_USE "55006"

static char * ConnectionTypeToString(ConnectionType connectionType);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this PR changes the white lines here, and in the following lines in this file?

Comment on lines 1556 to 1607
char *sqlTemplate =
/*
* We could simplify the writing of this query, but we prefer that it
* looks as much as possible like the query used in
* pgsql_replication_slot_maintain() so that we can maintain both
* easily.
*/
"WITH nodes(slot_name, lsn) as ("
" SELECT '" REPLICATION_SLOT_NAME_DEFAULT "_' || id, lsn"
" FROM (%s) as sb(id, lsn) "
"), \n"
"dropped as ("
" SELECT slot_name, pg_drop_replication_slot(slot_name) "
" FROM pg_replication_slots pgrs LEFT JOIN nodes USING(slot_name) "
" WHERE nodes.slot_name IS NULL "
" AND ( slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" OR slot_name ~ '" REPLICATION_SLOT_NAME_DEFAULT "' )"
" AND not active"
" AND slot_type = 'physical'"
"), \n"
"created as ("
"SELECT c.slot_name, c.lsn "
" FROM nodes LEFT JOIN pg_replication_slots pgrs USING(slot_name), "
" LATERAL pg_create_physical_replication_slot(slot_name, true) c"
" WHERE pgrs.slot_name IS NULL "
") \n"
"SELECT 'create', slot_name, lsn FROM created "
" union all "
"SELECT 'drop', slot_name, NULL::pg_lsn FROM dropped";
char *sqlTemplate =
/*
* We could simplify the writing of this query, but we prefer that it
* looks as much as possible like the query used in
* pgsql_replication_slot_maintain() so that we can maintain both
* easily.
*/
"WITH nodes(slot_name, lsn) as ("
" SELECT '" REPLICATION_SLOT_NAME_DEFAULT "_' || id, lsn"
" FROM (%s) as sb(id, lsn) "
"), \n"
"dropped as ("
" SELECT slot_name, pg_drop_replication_slot(slot_name) "
" FROM pg_replication_slots pgrs LEFT JOIN nodes USING(slot_name) "
" WHERE nodes.slot_name IS NULL "
" AND ( slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" OR slot_name ~ '" REPLICATION_SLOT_NAME_DEFAULT "' )"
" AND not active"
" AND slot_type = 'physical'"
"), \n"
"created as ("
"SELECT c.slot_name, c.lsn "
" FROM nodes LEFT JOIN pg_replication_slots pgrs USING(slot_name), "
" LATERAL pg_create_physical_replication_slot(slot_name, true) c"
" WHERE pgrs.slot_name IS NULL "
") \n"
"SELECT 'create', slot_name, lsn FROM created "
" union all "
"SELECT 'drop', slot_name, NULL::pg_lsn FROM dropped";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a whitespace change that we should not have in this PR, please fix.

Comment on lines 1626 to 1678
char *sqlTemplate =
"WITH nodes(slot_name, lsn) as ("
" SELECT '" REPLICATION_SLOT_NAME_DEFAULT "_' || id, lsn"
" FROM (%s) as sb(id, lsn) "
"), \n"
"dropped as ("
" SELECT slot_name, pg_drop_replication_slot(slot_name) "
" FROM pg_replication_slots pgrs LEFT JOIN nodes USING(slot_name) "
" WHERE nodes.slot_name IS NULL "
" AND slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" AND not active"
" AND slot_type = 'physical'"
"), \n"
"advanced as ("
"SELECT a.slot_name, a.end_lsn"
" FROM pg_replication_slots s JOIN nodes USING(slot_name), "
" LATERAL pg_replication_slot_advance(slot_name, lsn) a"
" WHERE nodes.lsn <> '0/0' and nodes.lsn >= s.restart_lsn "
"), \n"
"created as ("
"SELECT c.slot_name, c.lsn "
" FROM nodes LEFT JOIN pg_replication_slots pgrs USING(slot_name), "
" LATERAL pg_create_physical_replication_slot(slot_name, true) c"
" WHERE pgrs.slot_name IS NULL "
") \n"
"SELECT 'create', slot_name, lsn FROM created "
" union all "
"SELECT 'drop', slot_name, NULL::pg_lsn FROM dropped "
" union all "
"SELECT 'advance', slot_name, end_lsn FROM advanced ";
char *sqlTemplate =
"WITH nodes(slot_name, lsn) as ("
" SELECT '" REPLICATION_SLOT_NAME_DEFAULT "_' || id, lsn"
" FROM (%s) as sb(id, lsn) "
"), \n"
"dropped as ("
" SELECT slot_name, pg_drop_replication_slot(slot_name) "
" FROM pg_replication_slots pgrs LEFT JOIN nodes USING(slot_name) "
" WHERE nodes.slot_name IS NULL "
" AND slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" AND not active"
" AND slot_type = 'physical'"
"), \n"
"advanced as ("
"SELECT a.slot_name, a.end_lsn"
" FROM pg_replication_slots s JOIN nodes USING(slot_name), "
" LATERAL pg_replication_slot_advance(slot_name, lsn) a"
" WHERE nodes.lsn <> '0/0' and nodes.lsn >= s.restart_lsn "
"), \n"
"created as ("
"SELECT c.slot_name, c.lsn "
" FROM nodes LEFT JOIN pg_replication_slots pgrs USING(slot_name), "
" LATERAL pg_create_physical_replication_slot(slot_name, true) c"
" WHERE pgrs.slot_name IS NULL "
") \n"
"SELECT 'create', slot_name, lsn FROM created "
" union all "
"SELECT 'drop', slot_name, NULL::pg_lsn FROM dropped "
" union all "
"SELECT 'advance', slot_name, end_lsn FROM advanced ";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a whitespace change that we should not have in this PR, please fix.

Comment on lines 2424 to 2486
char *sql =
/*
* Make it so that we still have the current WAL LSN even in the case
* where there's no replication slot in use by any standby.
*
* When on the primary, we might have multiple standby nodes connected.
* We're good when at least one of them is either 'sync' or 'quorum'.
* We don't check individual replication slots, we take the "best" one
* and report that.
*/
"select pg_is_in_recovery(),"
" coalesce(rep.sync_state, '') as sync_state,"
" case when pg_is_in_recovery()"
" then coalesce(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())"
" else pg_current_wal_flush_lsn()"
" end as current_lsn,"
" pg_control_version, catalog_version_no, system_identifier"
" from (values(1)) as dummy"
" full outer join"
" (select pg_control_version, catalog_version_no, system_identifier "
" from pg_control_system()"
" )"
" as control on true"
" full outer join"
" ("
" select sync_state"
" from pg_replication_slots slot"
" join pg_stat_replication rep"
" on rep.pid = slot.active_pid"
" where slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" or slot_name = '" REPLICATION_SLOT_NAME_DEFAULT "' "
"order by case sync_state "
" when 'quorum' then 4 "
" when 'sync' then 3 "
" when 'potential' then 2 "
" when 'async' then 1 "
" else 0 end "
" desc limit 1"
" ) "
"as rep on true";
char *sql =
/*
* Make it so that we still have the current WAL LSN even in the case
* where there's no replication slot in use by any standby.
*
* When on the primary, we might have multiple standby nodes connected.
* We're good when at least one of them is either 'sync' or 'quorum'.
* We don't check individual replication slots, we take the "best" one
* and report that.
*/
"select pg_is_in_recovery(),"
" coalesce(rep.sync_state, '') as sync_state,"
" case when pg_is_in_recovery()"
" then coalesce(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())"
" else pg_current_wal_flush_lsn()"
" end as current_lsn,"
" pg_control_version, catalog_version_no, system_identifier"
" from (values(1)) as dummy"
" full outer join"
" (select pg_control_version, catalog_version_no, system_identifier "
" from pg_control_system()"
" )"
" as control on true"
" full outer join"
" ("
" select sync_state"
" from pg_replication_slots slot"
" join pg_stat_replication rep"
" on rep.pid = slot.active_pid"
" where slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" or slot_name = '" REPLICATION_SLOT_NAME_DEFAULT "' "
"order by case sync_state "
" when 'quorum' then 4 "
" when 'sync' then 3 "
" when 'potential' then 2 "
" when 'async' then 1 "
" else 0 end "
" desc limit 1"
" ) "
"as rep on true";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a whitespace change that we should not have in this PR, please fix.

Comment on lines 2608 to 2639
char *sql =
" select $1::pg_lsn <= flush_lsn, flush_lsn "
" from pg_replication_slots slot"
" join pg_stat_replication rep"
" on rep.pid = slot.active_pid"
" where ( slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" or slot_name = '" REPLICATION_SLOT_NAME_DEFAULT "') "
" and sync_state in ('sync', 'quorum') "
"order by flush_lsn desc limit 1";
char *sql =
" select $1::pg_lsn <= flush_lsn, flush_lsn "
" from pg_replication_slots slot"
" join pg_stat_replication rep"
" on rep.pid = slot.active_pid"
" where ( slot_name ~ '" REPLICATION_SLOT_NAME_PATTERN "' "
" or slot_name = '" REPLICATION_SLOT_NAME_DEFAULT "') "
" and sync_state in ('sync', 'quorum') "
"order by flush_lsn desc limit 1";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a whitespace change that we should not have in this PR, please fix.

static void reload_configuration(Monitor *monitor);
static bool monitor_ensure_configuration(Monitor *monitor);


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a whitespace change that we should not have in this PR, please fix.

@rheaton rheaton force-pushed the pw-in-log branch 2 times, most recently from e7b7cb7 to 6a1260a Compare April 20, 2021 19:24
@rheaton
Copy link
Contributor Author

rheaton commented Apr 20, 2021

Pushed up fixes. My indent was doing strange things. Appears to be working alright now!

Copy link
Collaborator

@DimCitus DimCitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple very small changes and we can merge!

}
else
{
log_info(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now notice we return false in that case. I suppose we should then either log_error or maybe log_warn if appropriate (not sure it would be).

if (!parse_and_scrub_connection_string(config->monitor_pguri,
scrubbedConnectionString))
{
log_error("Monitor connection string is unparseable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another error message that would better read "Failed to parse the monitor connection string".

- Exceptions are unparsable monitor connection strings, and
connection strings that are too long
@rheaton
Copy link
Contributor Author

rheaton commented Apr 21, 2021

Updated

@DimCitus DimCitus merged commit cfa5b2c into hapostgres:master Apr 21, 2021
@DimCitus DimCitus added this to the Sprint 2021 W16 W17 milestone Apr 21, 2021
@DimCitus DimCitus added the enhancement New feature or request label Apr 21, 2021
@DimCitus
Copy link
Collaborator

Thanks for your contribution @rheaton ; and all the fixes you accepted to go through for this PR to get merged! I appreciate your work here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Passwords are logged in plain text

4 participants