
Implement Table.delete_rows. #7709

Merged
40 commits merged into develop from wip/radeusgd/7238-table-delete-rows on Sep 7, 2023

Conversation

radeusgd
Member

@radeusgd commented Aug 31, 2023

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd self-assigned this Aug 31, 2023
test/Table_Tests/src/Database/Upload_Spec.enso: two outdated review threads (resolved)
Comment on lines +125 to +126
_ = [source_table, update_action, key_columns, error_on_missing_columns, on_problems]
Error.throw (Illegal_Argument.Error "Table.update_rows modifies the underlying table, so it is only supported for Database tables - in-memory tables are immutable.")
Member

I think we might want this behavior to work.

It's effectively a find and replace operation.

Member Author

I'm not sure this is a good idea. I appreciate the potential usefulness of such an operation, but the practical API is vastly different.

In the Database backend, we mutably modify the target table (and return it). This is a side-effecting operation: from now on, all references to the table will see the updated data.

In the in-memory backend, Tables are immutable, so all this could do is return a new table with modified contents. A bit like a DB dry run (but currently our dry run returns the original table unchanged - so even here the semantics do not match).

Member Author

We aim for our APIs to be compatible and mostly interchangeable between DB and in-memory, to the extent possible.

I'm afraid that introducing an operation which behaves in a superficially similar but fundamentally different way is asking for trouble - when switching between in-memory and Database, users may not notice the subtle differences and get bad results.


I'm not completely sure, to be honest - the difference is indeed rather small, and it feels like in many practical workflows it would not be felt. So this could work.

But I think it can introduce hard-to-understand errors (due to the unexpected change in behaviour).

IMHO it is better to keep a clear distinction between side-effecting WRITE operations (like this one, or writing to a file, etc. - anything that updates some external state) and normal 'immutable' transformations that create a new, modified data structure (which is what this would have to become for the in-memory approach to work).

If we really want the 'search and replace' behaviour, maybe we should introduce a separate method that returns a new modified table with these changes applied. We could then implement it for both in-memory and Database and ensure that in both cases it does not modify the original table, but just returns a new one (i.e. in Database it would be a modified query which performs the replacement, essentially a kind of VIEW; it may not be the most efficient in that backend, but at least it would be consistent).

What do you think? IMO creating a single operation that is a side-effecting WRITE for one backend and an immutable TRANSFORM for another is risky. If anything, I'd simply create a separate operation for the TRANSFORM scenario.
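
To make the distinction concrete, here is a rough sketch; the replace_rows name is purely hypothetical (no such method exists), only update_rows is real:

    # WRITE: update_rows mutates the underlying database table and returns it;
    # every existing reference to that table now sees the updated data.
    updated = db_table.update_rows source_table key_columns=["X"]

    # TRANSFORM (hypothetical): a replace_rows-style method would leave db_table
    # untouched and return a new, view-like table with the replacements applied -
    # the only shape an in-memory 'search and replace' could take, since in-memory
    # Tables are immutable.
    # replaced = db_table.replace_rows source_table key_columns=["X"]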

Contributor

It's a bit extreme, but we could append _write to every method that writes to the database, so it would always be clear.

@@ -457,6 +457,8 @@ generate_query dialect query = case query of
    Query.Drop_Table name if_exists ->
        maybe_if_exists = if if_exists then Builder.code "IF EXISTS " else Builder.empty
        Builder.code "DROP TABLE " ++ maybe_if_exists ++ dialect.wrap_identifier name
    Query.Truncate_Table name ->
        Builder.code "DELETE FROM " ++ dialect.wrap_identifier name
Member

Why not TRUNCATE TABLE ...?

Member Author

TRUNCATE is not available in SQLite (DELETE FROM without a WHERE clause is the closest equivalent there). I guess I could try to use TRUNCATE in Postgres.
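
For illustration, a Postgres-specific variant could plausibly mirror the DELETE FROM case from the diff above. This is only a sketch and assumes the Postgres dialect can override generate_query for this case:

    Query.Truncate_Table name ->
        Builder.code "TRUNCATE TABLE " ++ dialect.wrap_identifier name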

@radeusgd
Member Author

radeusgd commented Sep 5, 2023

I checked the following tests to see how the presence of NULLs affects update_rows:

        Test.specify "TODO 1" <|
            t1 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2]], ["Y", ['zzz']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [2]
            m1.at "Y" . to_vector . should_equal ['zzz']

        Test.specify "TODO 1.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2]], ["Y", ['zzz']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3]
            m2.at "Y" . to_vector . should_equal ['d', 'a', 'zzz', 'c']

        Test.specify "TODO 2" <|
            t1 = target_table_builder [["X", [1, 2, 3, 4]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [Nothing, 2]
            m1.at "Y" . to_vector . should_equal ['www', 'zzz']

        Test.specify "TODO 2.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, 4]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3, 4]
            m2.at "Y" . to_vector . should_equal ['www', 'a', 'zzz', 'c', 'd']

        Test.specify "TODO 3" <|
            t1 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [Nothing, 2]
            m1.at "Y" . to_vector . should_equal ['d','zzz']

        Test.specify "TODO 3.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3]
            m2.at "Y" . to_vector . should_equal ['www', 'a', 'zzz', 'c']

The results are:

    - [FAILED] TODO 1 [71ms]
        Reason: [Nothing, 2] did not equal [2]; lengths differ (2 != 1)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:675:13-52).
    - TODO 1.5 [78ms]
    - [FAILED] TODO 2 [72ms]
        Reason: [2, 3] did not equal [Nothing, 2]; first difference at index 0  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:691:13-61).
    - [FAILED] TODO 2.5 [70ms]
        Reason: [1, 2, 3, 4] did not equal [Nothing, 1, 2, 3, 4]; lengths differ (4 != 5)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:699:13-70).
    - [FAILED] TODO 3 [89ms]
        Reason: [Nothing, 2, 3] did not equal [Nothing, 2]; lengths differ (3 != 2)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:707:13-61).
    - [FAILED] TODO 3.5 [63ms]
        Reason: ['d', 'a', 'zzz', 'www'] did not equal ['www', 'a', 'zzz', 'c']; first difference at index 0  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:716:13-73).

We can see that Align_Records behaves in unexpected ways, and so does Update_Or_Insert, when the source table contains NULL values. This is most likely a consequence of SQL NULL semantics: NULL = NULL does not evaluate to TRUE, so rows with a NULL key are never matched during the merge. That's why, at least for now, I plan to raise an error if the source contains NULL keys.
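
A rough sketch of such a guard (the helper name and error message are illustrative only, not the final implementation; it assumes the usual Standard.Base imports are in scope):

    # Reject NULL values in the source key columns up front: in SQL, NULL = NULL
    # does not evaluate to TRUE, so rows with NULL keys would otherwise be matched
    # inconsistently by the merge.
    check_no_null_keys source_table key_columns =
        has_null_key = key_columns.any name-> source_table.at name . to_vector . any (v-> v.is_nothing)
        if has_null_key then Error.throw (Illegal_Argument.Error "The source table contains NULL values in the key columns.") else source_table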

Comment on lines +28 to +34
// Workaround for https://github.com/oracle/graal/issues/7359
if (!functionsLibrary.hasExecutableName(function)) {
  err.enter();
  Builtins builtins = EnsoContext.get(this).getBuiltins();
  throw new PanicException(
      builtins.error().makeTypeError(builtins.function(), function, "function"), this);
}
Member Author

Here's a (possibly temporary) workaround for oracle/graal#7359

Essentially, it is possible that getExecutableName does not throw but still returns null, and the resulting null then triggers a compiler error that crashes the interpreter.

Comment on lines -59 to +66
-builder.append ('\n <error message="' + escaped_message + '">\n')
+builder.append ('\n <failure message="' + escaped_message + '">\n')
 if details.is_nothing.not then
     ## We duplicate the message, because sometimes the
        attribute is skipped if the node has any content.
     builder.append (escape_xml msg)
     builder.append '\n'
     builder.append (escape_xml details)
-builder.append '</error>\n'
+builder.append '</failure>\n'
Member Author

I was testing visualizations of the JUnit XML output and it seems that the failure tag is more appropriate here: in the JUnit XML format, failure conventionally marks a failed assertion, while error marks an unexpected exception.

Comment on lines +97 to +99
name = Panic.catch Type_Error (Polyglot.get_executable_name el) _->
    # Workaround for a bug where some stack frames' root nodes do not have a name.
    "<unknown node>"
Member Author

This was sometimes failing around SQL_Type_Reference.get, do_fetch and Connection.fetch_columns: whenever there were warnings attached to the query result, one entry in the stack trace had rootNode.getName() == null, which crashed the stack trace operation.

I think it is better to have a stack trace with an <unknown node> entry than to prevent adding warnings altogether.

Member Author

Ideally, I'd like to figure out why rootNode.getName() returns null in the first place, but I could not easily find a minimal reproduction.

@JaroslavTulach any ideas?

Member

@JaroslavTulach left a comment

Great to see a fix proposed to GraalVM: oracle/graal#7360

@radeusgd added the "CI: Ready to merge" label (this PR is eligible for automatic merge) on Sep 7, 2023
@mergify bot merged commit 7d424bf into develop on Sep 7, 2023
28 checks passed
@mergify bot deleted the wip/radeusgd/7238-table-delete-rows branch on September 7, 2023 at 11:07
Successfully merging this pull request may close these issues.

Add Table.delete_rows function allowing deleting rows in a database table.
4 participants