Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create database table from memory #6429

Merged
merged 23 commits into from
Apr 27, 2023

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Apr 26, 2023

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd linked an issue Apr 26, 2023 that may be closed by this pull request
@radeusgd radeusgd force-pushed the wip/radeusgd/6326-create-database-table-from-memory branch from 880cfdc to a2ef0a7 Compare April 26, 2023 12:46
@radeusgd radeusgd marked this pull request as ready for review April 26, 2023 12:46
Comment on lines 151 to 158
type Non_Unique_Primary_Key
## PRIVATE
Indicates that the columns selected for the primary key do not uniquely
identify rows in the table.

Arguments:
- primary_key: The primary key that is not unique.
# TODO [RW] should we include `duplicate_rows` here to show example ids of rows that are not unique?
Error (primary_key : Vector Text)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jdunkerley I wanted to figure out if we want to have these duplicate_rows.
It could make the user's life easier to show them indices of rows of one of the duplicated groups.

OTOH, the user can always compute these themselves and computing them is pretty costly for big tables.

Third thing is, when trying to compute them I realised it's actually not that easy. I'm thinking if we should add some methods that could help. I'm thinking of Aggregate_Column.As_Vector which could create a vector of all elements of a single group. Or something else that could help - as currently finding out duplicated rows is not that easy - unless I'm not seeing some cool trick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be very useful; I would think you could do it with a Vector.group_by on the rows, returning a Vector of Vectors, and show the inner vectors with > 1 row. Assign this to me if you like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I reported this as #6437 and we can handle it when we have a bit of spare time.

returning a Vector of Vectors

I guess that may require extending Aggregate_Column, but yeah that sounds like a good idea.

Comment on lines 151 to 158
type Non_Unique_Primary_Key
## PRIVATE
Indicates that the columns selected for the primary key do not uniquely
identify rows in the table.

Arguments:
- primary_key: The primary key that is not unique.
# TODO [RW] should we include `duplicate_rows` here to show example ids of rows that are not unique?
Error (primary_key : Vector Text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be very useful; I would think you could do it with a Vector.group_by on the rows, returning a Vector of Vectors, and show the inner vectors with > 1 row. Assign this to me if you like.

@radeusgd radeusgd force-pushed the wip/radeusgd/6326-create-database-table-from-memory branch from df2e655 to 79b9af6 Compare April 26, 2023 14:13
@radeusgd radeusgd force-pushed the wip/radeusgd/6326-create-database-table-from-memory branch 5 times, most recently from 9c6c854 to 4627742 Compare April 27, 2023 11:52
@radeusgd radeusgd self-assigned this Apr 27, 2023
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Apr 27, 2023
@radeusgd radeusgd force-pushed the wip/radeusgd/6326-create-database-table-from-memory branch from d73bdd8 to 4ea5fe9 Compare April 27, 2023 15:08
@radeusgd radeusgd force-pushed the wip/radeusgd/6326-create-database-table-from-memory branch from 4ea5fe9 to 2d398aa Compare April 27, 2023 19:17
@mergify mergify bot merged commit 462016a into develop Apr 27, 2023
@mergify mergify bot deleted the wip/radeusgd/6326-create-database-table-from-memory branch April 27, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add create_database_table to In-Memory table.
4 participants