Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement auto_value_type operation #7908

Merged
merged 31 commits into from
Sep 27, 2023
Merged

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Sep 27, 2023

Pull Request Description

Closes #6113

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd self-assigned this Sep 27, 2023
@radeusgd
Copy link
Member Author

image

@radeusgd
Copy link
Member Author

image

to be done manually through `cast`).
- If all elements in a text column have the same length, the type
will become fixed length.
- Otherwise, if a text column is variable length, but all text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is length 255 treated specially here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently because it is usually the cutoff point where Databases will stop storing the strings 'in-place'.

That was the design discussed by @jdunkerley and @NedHarding.

I think the rationale is that we don't want to shrink the column length just to the maximum length of the longest entry because it may be quite arbitrary and in other runs the length may differ - thus if we e.g. create a Database column out of this, we could get too small a limit.

So instead, we see if the values are 'small' (i.e. fit in this 255 cutoff) and if so, we shrink the type to that and otherwise keep the type 'large' - i.e. unlimited.

The user can always manually choose whatever limit they want through cast.

if cols.distinct .getName . length != cols.length then
Panic.throw (Illegal_Argument.Error "Column names must be distinct.")

mismatched_size_column = cols.find if_missing=Nothing c->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there aren't going to be a lot of columns most of the time, I think it would be good to report here the lengths of all the columns so it's easier for the user to know which columns are wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO if we just list lengths of all columns it will be harder to find which ones fit and which ones do not, especially if there are >5 columns which is rather common.

IMO best to just show one that is wrong - then the user can fix it - if there are any more wrong ones - a new one will appear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just the lengths, but the names of the columns as well, so it wouldn't be confusing. If we only show one, and there are 10 columns with different lengths, that would take 9-10 runs to find all the problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but keep in mind that our IDE error message box is currently very small, so the bigger the error is the less readable it becomes.

I'm happy to revisit it at later point, e.g. when our error messages will be displayed larger. For now IMO this is already improving that we know at least one 'offending example', so I'd keep it as is.


If the user has just a few columns they will just see the sizes. If there's more of them, the message will be too long to easily see anyway - the user can always do columns.map .row_count or something like that.

Positive_Integer.from (that : Integer) = Positive_Integer.new that

## PRIVATE
Integer.from (that : Positive_Integer) = that.integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this pattern, I hope we use it more.

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Sep 27, 2023
@mergify mergify bot merged commit c690559 into develop Sep 27, 2023
27 checks passed
@mergify mergify bot deleted the wip/radeusgd/6113-auto-value-type branch September 27, 2023 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Table.auto_value_types
3 participants