
Implement Table.delete_rows. #7709

Merged
40 commits merged into develop from wip/radeusgd/7238-table-delete-rows on Sep 7, 2023

Conversation

radeusgd
Member

@radeusgd commented Aug 31, 2023

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd self-assigned this Aug 31, 2023
test/Table_Tests/src/Database/Upload_Spec.enso: two outdated review threads (resolved)
Comment on lines +125 to +126
_ = [source_table, update_action, key_columns, error_on_missing_columns, on_problems]
Error.throw (Illegal_Argument.Error "Table.update_rows modifies the underlying table, so it is only supported for Database tables - in-memory tables are immutable.")
Member

I think we might want this behavior to work.

It's effectively a find and replace operation.

Member Author

I'm not sure this is a good idea. I appreciate the potential usefulness of such an operation, but the practical API is vastly different.

In the Database backend, we mutably modify the target table (and return it). This is a side-effecting operation: from now on, all references to the table will see the updated data.

In the in-memory backend, Tables are immutable, so all this could do is return a new table with modified contents. A bit like a DB dry run (but currently our dry run returns the original table unchanged - so even here the semantics do not match).

Member Author

We aim for our APIs to be compatible and mostly interchangeable between DB and in-memory, to the extent possible.

I'm afraid that introducing an operation which behaves in a superficially similar but fundamentally different way is asking for trouble - when switching between in-memory and Database, users may not notice the subtle differences and get bad results.


I'm not completely sure, to be honest - the difference is indeed rather small, and it feels like in many practical workflows it would not be felt. So this could work.

But I think it can introduce hard-to-understand errors (due to the unexpected change in behaviour).

IMHO it is better to keep a clear distinction between side-effecting WRITE operations (like this one, or writing to a file, etc. - anything that updates some external state) and normal 'immutable' transformations that create a new, modified data structure (which is what this would have to become for the in-memory approach to work).

If we really want the 'search and replace' behaviour, maybe we should introduce a separate method that returns a new modified table with these changes applied. We could then implement it for both in-memory and Database and ensure that in both cases it does not modify the original table, but just returns a new one (i.e. in Database it would be a modified query which performs the replacement, essentially a kind of VIEW; it may not be the most efficient in that backend, but at least it would be consistent).

What do you think? IMO creating a single operation that is a side-effecting WRITE for one backend and an immutable TRANSFORM for another is risky. If anything, I'd simply create a separate operation for the TRANSFORM scenario.
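
To make the distinction concrete, here is a rough sketch; the replace_rows name is purely hypothetical (no such method exists), only update_rows is real:

    # WRITE: update_rows mutates the underlying database table and returns it;
    # every existing reference to that table now sees the updated data.
    updated = db_table.update_rows source_table key_columns=["X"]

    # TRANSFORM (hypothetical): a replace_rows-style method would leave db_table
    # untouched and return a new, view-like table with the replacements applied -
    # the only shape an in-memory 'search and replace' could take, since in-memory
    # Tables are immutable.
    # replaced = db_table.replace_rows source_table key_columns=["X"]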

Contributor

It's a bit extreme, but we could append _write to every method that writes to the database, so it would always be clear.

@@ -457,6 +457,8 @@ generate_query dialect query = case query of
    Query.Drop_Table name if_exists ->
        maybe_if_exists = if if_exists then Builder.code "IF EXISTS " else Builder.empty
        Builder.code "DROP TABLE " ++ maybe_if_exists ++ dialect.wrap_identifier name
    Query.Truncate_Table name ->
        Builder.code "DELETE FROM " ++ dialect.wrap_identifier name
Member

Why not TRUNCATE TABLE ...?

Member Author

TRUNCATE is not available in SQLite (DELETE FROM without a WHERE clause is the closest equivalent there). I guess I could try to use TRUNCATE in Postgres.
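
For illustration, a Postgres-specific variant could plausibly mirror the DELETE FROM case from the diff above. This is only a sketch and assumes the Postgres dialect can override generate_query for this case:

    Query.Truncate_Table name ->
        Builder.code "TRUNCATE TABLE " ++ dialect.wrap_identifier name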

@radeusgd
Member Author

radeusgd commented Sep 5, 2023

I checked the following tests to see how the presence of NULLs affects update_rows:

        Test.specify "TODO 1" <|
            t1 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2]], ["Y", ['zzz']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [2]
            m1.at "Y" . to_vector . should_equal ['zzz']

        Test.specify "TODO 1.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2]], ["Y", ['zzz']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3]
            m2.at "Y" . to_vector . should_equal ['d', 'a', 'zzz', 'c']

        Test.specify "TODO 2" <|
            t1 = target_table_builder [["X", [1, 2, 3, 4]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [Nothing, 2]
            m1.at "Y" . to_vector . should_equal ['www', 'zzz']

        Test.specify "TODO 2.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, 4]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3, 4]
            m2.at "Y" . to_vector . should_equal ['www', 'a', 'zzz', 'c', 'd']

        Test.specify "TODO 3" <|
            t1 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s1 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r1 = t1.update_rows s1 key_columns=["X"] update_action=Update_Action.Align_Records
            m1 = r1.read . order_by ["X"]
            m1.at "X" . to_vector . should_equal [Nothing, 2]
            m1.at "Y" . to_vector . should_equal ['d','zzz']

        Test.specify "TODO 3.5" <|
            t2 = target_table_builder [["X", [1, 2, 3, Nothing]], ["Y", ['a', 'b', 'c', 'd']]]
            s2 = source_table_builder [["X", [2, Nothing]], ["Y", ['zzz', 'www']]]
            r2 = t2.update_rows s2 key_columns=["X"] update_action=Update_Action.Update_Or_Insert
            m2 = r2.read . order_by ["X"]
            m2.at "X" . to_vector . should_equal [Nothing, 1, 2, 3]
            m2.at "Y" . to_vector . should_equal ['www', 'a', 'zzz', 'c']

The results are:

    - [FAILED] TODO 1 [71ms]
        Reason: [Nothing, 2] did not equal [2]; lengths differ (2 != 1)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:675:13-52).
    - TODO 1.5 [78ms]
    - [FAILED] TODO 2 [72ms]
        Reason: [2, 3] did not equal [Nothing, 2]; first difference at index 0  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:691:13-61).
    - [FAILED] TODO 2.5 [70ms]
        Reason: [1, 2, 3, 4] did not equal [Nothing, 1, 2, 3, 4]; lengths differ (4 != 5)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:699:13-70).
    - [FAILED] TODO 3 [89ms]
        Reason: [Nothing, 2, 3] did not equal [Nothing, 2]; lengths differ (3 != 2)  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:707:13-61).
    - [FAILED] TODO 3.5 [63ms]
        Reason: ['d', 'a', 'zzz', 'www'] did not equal ['www', 'a', 'zzz', 'c']; first difference at index 0  (at C:\NBO\enso\test\Table_Tests\src\Database\Upload_Spec.enso:716:13-73).

We can see that Align_Records behaves in unexpected ways, and so does Update_Or_Insert, when the source table contains NULL values. This is most likely a consequence of SQL NULL semantics: NULL = NULL does not evaluate to TRUE, so rows with a NULL key are never matched during the merge. That's why, at least for now, I plan to raise an error if the source contains NULL keys.
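
A rough sketch of such a guard (the helper name and error message are illustrative only, not the final implementation; it assumes the usual Standard.Base imports are in scope):

    # Reject NULL values in the source key columns up front: in SQL, NULL = NULL
    # does not evaluate to TRUE, so rows with NULL keys would otherwise be matched
    # inconsistently by the merge.
    check_no_null_keys source_table key_columns =
        has_null_key = key_columns.any name-> source_table.at name . to_vector . any (v-> v.is_nothing)
        if has_null_key then Error.throw (Illegal_Argument.Error "The source table contains NULL values in the key columns.") else source_table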

Comment on lines +28 to +34
// Workaround for https://github.com/oracle/graal/issues/7359
if (!functionsLibrary.hasExecutableName(function)) {
  err.enter();
  Builtins builtins = EnsoContext.get(this).getBuiltins();
  throw new PanicException(
      builtins.error().makeTypeError(builtins.function(), function, "function"), this);
}
Member Author

Here's a (possibly temporary) workaround for oracle/graal#7359

Essentially, it is possible that getExecutableName does not throw but still returns null, and the resulting null then triggers a compiler error that crashes the interpreter.

Comment on lines -59 to +66
-builder.append ('\n <error message="' + escaped_message + '">\n')
+builder.append ('\n <failure message="' + escaped_message + '">\n')
 if details.is_nothing.not then
     ## We duplicate the message, because sometimes the
        attribute is skipped if the node has any content.
     builder.append (escape_xml msg)
     builder.append '\n'
     builder.append (escape_xml details)
-builder.append '</error>\n'
+builder.append '</failure>\n'
Member Author

I was testing visualizations of the JUnit XML output and it seems that the failure tag is more appropriate here: in the JUnit XML format, failure conventionally marks a failed assertion, while error marks an unexpected exception.

Comment on lines +97 to +99
name = Panic.catch Type_Error (Polyglot.get_executable_name el) _->
    # Workaround for a bug where some stack frames' root nodes do not have a name.
    "<unknown node>"
Member Author

This was sometimes failing around SQL_Type_Reference.get, do_fetch and Connection.fetch_columns: whenever there were warnings attached to the query result, one entry in the stack trace had rootNode.getName() == null, which crashed the stack trace operation.

I think it is better to have a stack trace with an <unknown node> entry than to prevent adding warnings altogether.

Member Author

Ideally, I'd like to figure out why rootNode.getName() returns null in the first place, but I could not easily find a minimal reproduction.

@JaroslavTulach any ideas?

Member

@JaroslavTulach left a comment

Great to see a fix proposed to GraalVM: oracle/graal#7360

@radeusgd added the "CI: Ready to merge" label (this PR is eligible for automatic merge) on Sep 7, 2023
@mergify bot merged commit 7d424bf into develop on Sep 7, 2023
28 checks passed
@mergify bot deleted the wip/radeusgd/7238-table-delete-rows branch on September 7, 2023 at 11:07
Successfully merging this pull request may close these issues.

Add Table.delete_rows function allowing deleting rows in a database table.
4 participants