Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't crash in SQL_ASCII DBs in PyString_AsString #43

Merged
merged 4 commits into from
May 11, 2024

Conversation

mfenniak
Copy link
Collaborator

When a PostgreSQL database is running in SQL_ASCII encoding, anything in multicorn that touches PyString_AsString or PyString_AsStringAndSize will encounter an encoding error because "SQL_ASCII" isn't a valid Python encoding; and then the errorCheck() function will attempt to get the error that occurred, call reportException, which calls PyString_AsString, resulting in infinite recursion until the process segfaults.

It seems that there's already a workaround in the form of the getPythonEncodingName name which translates SQL_ASCII to "ascii", used in some functions, but not in these ones; so I've applied those here.

Reproduction steps:

  1. Create a DB with initdb -E SQL_ASCII
  2. Run the a schema import from TestForeignDataWrapper, or run a DELETE command, for example:
CREATE EXTENSION multicorn;
CREATE SERVER multicorn_srv foreign data wrapper multicorn options (
    wrapper 'multicorn.testfdw.TestForeignDataWrapper'
);
CREATE FOREIGN TABLE testmulticorn (
    test1 character varying,
    test2 character varying
) server multicorn_srv options (
    option1 'option1'
);
DELETE FROM testmulticorn;

Behavior before this patch: segfault; when debugged this codepath can be found (for the schema import scenario, if I remember correctly):

#130945 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130946 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8efbf0) at src/python.c:173
#130947 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
... repeat a few thousand times ...
#130948 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130949 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8efc70) at src/python.c:173
#130950 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
#130951 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130952 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8e78c0) at src/python.c:173
#130953 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
#130954 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130955 0x00007fd6ff0fe7fb in getClass (className=className@entry=0x7fd6fe8a4450) at src/python.c:210
#130956 0x00007fd6ff0fe9b5 in getClassString (className=className@entry=0x558da58a7820 "dynamodbfdw.dynamodbfdw.DynamoFdw") at src/python.c:310
#130957 0x00007fd6ff1042f1 in multicorn_validator (fcinfo=<optimized out>) at src/multicorn.c:225
#130958 0x0000558da50c715a in FunctionCall2Coll ()
#130959 0x0000558da50c780d in OidFunctionCall2Coll ()
#130960 0x0000558da4de1c9d in transformGenericOptions ()
#130961 0x0000558da4de27b9 in CreateForeignServer ()
#130962 0x0000558da4fa3c2c in ProcessUtilitySlow.isra.0 ()
#130963 0x0000558da4fa2ce0 in standard_ProcessUtility ()
#130964 0x0000558da4fa154f in PortalRunUtility ()
#130965 0x0000558da4fa166b in PortalRunMulti ()
#130966 0x0000558da4fa1ac1 in PortalRun ()
#130967 0x0000558da4f9e1a8 in exec_simple_query ()
#130968 0x0000558da4fa047a in PostgresMain ()
#130969 0x0000558da4f1f801 in ServerLoop ()
#130970 0x0000558da4f207eb in PostmasterMain ()
#130971 0x0000558da4ca4c8a in main ()

Behavior after: both an import and a delete works without segfault, at least.

@ShaheedHaque
Copy link
Collaborator

ShaheedHaque commented May 11, 2024

I've not tested this, but am happy to merge in a few days unless vetoed by @luss.

Question: let's say your efforts under #44 were merged. What would a new test based on the notes above look like? (This may be better answered in the PR for #44 to avoid blocking this PR).

@luss
Copy link
Contributor

luss commented May 11, 2024 via email

@@ -169,7 +169,7 @@ char *
PyString_AsString(PyObject *unicode)
{
char *rv;
PyObject *o = PyUnicode_AsEncodedString(unicode, GetDatabaseEncodingName(), NULL);
PyObject *o = PyUnicode_AsEncodedString(unicode, getPythonEncodingName(), NULL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise that getPythonEncodingName() predates this commit, and my C is VERY rusty, but technically, is that function not buggy in returning the address of a local (i.e. temporary)? IMHO, since this commit will greatly increase the use of that function, it would be nice to move the string "ascii" to file scope, and return it's address to prevent an over-eager compiler messing things up. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your assessment. I've moved the constant to a file scoped static and retested it.

@mfenniak
Copy link
Collaborator Author

Question: let's say your efforts under #44 were merged. What would a new test based on the notes above look like? (This may be better answered in the PR for #44 to avoid blocking this PR).

In #46 I've had to code the test DB to run under "UTF8" in order to avoid this bug --

REGRESS_OPTS = --inputdir=test-$(PYTHON_TEST_VERSION) --encoding=UTF8
. So a regression test to cover this case would either be removing that hack 🤣, or even better, running the test suites under a small collection of targeted/supported encodings.

I'll likely remove that in #46 after merging this.

Based upon the positive reviews and having addressed the C local comment, I'm going to make use of @luss's newly granted merge rights to go ahead with this. Thanks!

@mfenniak mfenniak merged commit ea5b3f0 into pgsql-io:main May 11, 2024
@mfenniak mfenniak deleted the error-handling-bugfix branch May 11, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants