Implement `schemas.add` RPC method #3620

seancolsen · 2024-06-12T18:18:38Z

Notes

I went around in circles a bit with this PR, which is why it took a while. I'll describe my thought process chronologically in hopes that it might be useful as these same problems could potentially apply to other DB objects as well...

First I proceeded to remove the if_not_exists parameter from msar.create_schema because I didn't want the additional complexity in there. I wanted to default to not using IF NOT EXISTS for the sake of simplicity. This is in line with comments in this meeting (starting at 21:10) on which @mathemancer and @Anish9901 were supportive. I wanted to raise an exception if the supplied schema name already existed.
But then I noticed that our codebase actually made use of that if_not_exists parameter in a few places for internal schemas and creating ephemeral schemas on the fly for type inference. Ugh!
My initial instinct was to modify the IF NOT EXISTS code locations by resorting to calling raw SQL from within the service layer (instead of calling functions). I had a tricky time figuring out how to do that cleanly with SQLAlchemy. I imagine it's possible, but it didn't seem to fit our patterns, and I didn't want to fiddle with it too much.
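For illustration only, here is a minimal Python sketch of how such a raw statement could be built safely. The helper names `quote_ident` and `create_schema_if_not_exists_sql` are hypothetical and not part of this PR. The quoting rule (wrap the name in double quotes and double any embedded double quote) is the standard SQL rule for identifiers. Actually executing the string through SQLAlchemy is deliberately left out.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap it in double quotes, doubling embedded ones."""
    if "\x00" in name:
        raise ValueError("identifiers cannot contain NUL bytes")
    return '"' + name.replace('"', '""') + '"'


def create_schema_if_not_exists_sql(sch_name: str) -> str:
    """Build (but do not execute) a CREATE SCHEMA IF NOT EXISTS statement."""
    return f"CREATE SCHEMA IF NOT EXISTS {quote_ident(sch_name)}"


# The schema name below is a made-up example.
print(create_schema_if_not_exists_sql('my "odd" schema'))
```

Keeping the quoting in one small helper is the main point: the schema name never reaches the statement unquoted.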
After some hemming and hawing, I decided to retain support for if_not_exists in msar.create_schema while also adding support for setting descriptions on schemas within the same function. This choice sent me down a rabbit hole, making me wish I had trusted my early intuition on if_not_exists more strongly.
The msar.create_schema function got more and more complex as I discovered more edge cases and inconsistencies. If it can return an existing schema, then we need to make sure that NULL description values don't overwrite an existing description. Ok fine. Then for the front end's sake, we should actually be fetching and returning the description. Ok. But that means the return value gets much more complex. It should be an object instead of a simple oid. And if it's an object, it ought to match the structure of the objects returned from msar.get_schemas, meaning it should supply a table_count property too. Ugh. So I started modifying msar.get_schemas to accept a sch_oid filter parameter which would allow me to compose that function to reliably get the full schema details in a consistent manner. But then I just threw up my hands and said, "this is ridiculous!"
So I decided to split the function into two:
- msar.create_schema_if_not_exists(sch_name text)
- msar.create_schema(sch_name text, description text DEFAULT '')

This means you can use IF NOT EXISTS or you can supply a description, but you can't do both at the same time. And at the API layer there's no way to use IF NOT EXISTS.

This approach has some limitations. But compared to cramming all that logic in one function, it's much simpler and less prone to weird edge cases and bugs.
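The contract of the two functions can be modeled in a few lines of Python. This is only a sketch under assumptions: `FakeCatalog` and `SchemaExistsError` are hypothetical names, the in-memory dictionaries stand in for the database catalog, and returning an integer OID from both functions is assumed rather than taken from the PR.

```python
from itertools import count


class SchemaExistsError(Exception):
    """Raised when create_schema is asked to create a duplicate schema."""


class FakeCatalog:
    """In-memory stand-in for the database's schema catalog (illustration only)."""

    def __init__(self) -> None:
        self._next_oid = count(1)
        self.oids: dict[str, int] = {}
        self.descriptions: dict[int, str] = {}

    def create_schema_if_not_exists(self, sch_name: str) -> int:
        """Return the schema's OID, creating the schema first if needed."""
        if sch_name not in self.oids:
            self.oids[sch_name] = next(self._next_oid)
        return self.oids[sch_name]

    def create_schema(self, sch_name: str, description: str = "") -> int:
        """Create a schema with a description, or fail if the name is taken."""
        if sch_name in self.oids:
            raise SchemaExistsError(sch_name)
        oid = self.create_schema_if_not_exists(sch_name)
        self.descriptions[oid] = description
        return oid


catalog = FakeCatalog()
scratch = catalog.create_schema_if_not_exists("scratch")  # created
same = catalog.create_schema_if_not_exists("scratch")     # reused, no error
library = catalog.create_schema("library", "Books and authors")
```

Because only `create_schema` touches descriptions and it never returns a preexisting schema, the question of a NULL description overwriting an existing one does not arise in this model.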

Checklist

My pull request has a descriptive title (not a vague title like Update index.md).
My pull request targets the develop branch of the repository.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

mathemancer

Okay, this PR has the required functionality and doesn't seem to add any bugs. So, we'll merge it. However, I have some feedback on the SQL functions.

I disagree with your statement that the needed logic is so complicated that it can't all be in one function. I suspect you got tangled up using the actual IF NOT EXISTS SQL, because it doesn't give any feedback w.r.t. whether the schema already existed, or was newly created. This makes the branching for whether to add a comment or not cumbersome.

From another perspective, though, there is only one decision to make: Attempt all the logic (i.e., create the schema and comment), or none of it. All logic should be skipped if both are true:

- The if_not_exists flag is true, and
- the schema already exists.

In this case, we can't create the schema, don't want to comment on it, and don't want to throw an error. So, we just exit the function.

Now, suppose the opposite. I.e., at least one of the checks above fails. Then we should attempt to create the schema and comment on it, in that order.

- If the schema already exists, we'll throw an error and exit (before attempting to comment).
- If it doesn't already exist, we'll create it and comment on it.
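The rule described above can be written out as a small truth table. The Python below is only an illustrative model, not code from the PR: `Outcome` and `decide` are hypothetical names, and the `ERROR` case stands for the database itself rejecting the duplicate `CREATE SCHEMA`.

```python
from enum import Enum


class Outcome(Enum):
    SKIP = "skip everything and return quietly"
    CREATE_AND_COMMENT = "create the schema, then comment on it"
    ERROR = "CREATE SCHEMA raises, so the comment is never attempted"


def decide(if_not_exists: bool, schema_exists: bool) -> Outcome:
    """Model the single decision: skip all the logic, or attempt all of it."""
    if if_not_exists and schema_exists:
        return Outcome.SKIP
    # Otherwise attempt everything; a duplicate name simply fails at creation.
    return Outcome.ERROR if schema_exists else Outcome.CREATE_AND_COMMENT


for flag in (False, True):
    for exists in (False, True):
        outcome = decide(flag, exists)
        print(f"if_not_exists={flag!s:5} exists={exists!s:5} -> {outcome.name}")
```

Only the first branch is an explicit decision. The split between `ERROR` and `CREATE_AND_COMMENT` is spelled out here for clarity, but in the database it falls out of ordinary error handling.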

While this is technically more branching, we don't actually have to implement or think about it. We're just letting nature take its course w.r.t. error handling. We also don't need the IF NOT EXISTS SQL clause anymore. Here's an example of what I mean, demonstrated via a function that would be (I think) an improvement:

CREATE OR REPLACE FUNCTION
msar.create_schema(
  sch_name text,
  description text,
  skip_preexisting boolean DEFAULT false
) RETURNS oid AS $$/*
Create a schema with an optional comment, returning its OID.

Args:
  sch_name: The unquoted name of the schema to be created.
  description: a comment to add to the schema, un-(quoted/escaped).
  skip_preexisting: Whether to skip the function logic if the schema already exists.
*/
DECLARE
  schema_oid oid  := to_regnamespace(quote_ident(sch_name));
  schema_preexists boolean := schema_oid IS NOT NULL;
BEGIN
  IF NOT (skip_preexisting AND schema_preexists) THEN
    -- No need for 'IF NOT EXISTS', since we want to throw an error here if the schema exists.
    EXECUTE format('CREATE SCHEMA %I', sch_name);
    schema_oid := to_regnamespace(quote_ident(sch_name));
    PERFORM msar.comment_on_schema(schema_oid, description);
  END IF;
  RETURN schema_oid;
END;
$$ LANGUAGE plpgsql;

I've changed if_not_exists to skip_preexisting since I think it's clearer.

As for why not just have multiple functions:

- The single function puts all branching in one spot where it's visible.
- I think the single function will be more extendable over time (e.g., if we want to create a schema and apply default permissions for it someday).

seancolsen added this to the Beta milestone Jun 12, 2024

seancolsen added the pr-status: review label Jun 12, 2024

seancolsen assigned mathemancer Jun 12, 2024

Add schemas.add RPC function

c6f994c

seancolsen force-pushed the schemas_add branch from ec3350f to c6f994c June 12, 2024 18:21

Fix failing test

05eb176

mathemancer approved these changes Jun 14, 2024

mathemancer added this pull request to the merge queue Jun 14, 2024

Merged via the queue into develop with commit 78b6b31 Jun 14, 2024
37 checks passed

mathemancer deleted the schemas_add branch June 14, 2024 04:34
