-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement auto_value_type
operation
#7908
Conversation
to be done manually through `cast`). | ||
- If all elements in a text column have the same length, the type | ||
will become fixed length. | ||
- Otherwise, if a text column is variable length, but all text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is length 255 treated specially here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently because it is usually the cutoff point where Databases will stop storing the strings 'in-place'.
That was the design discussed by @jdunkerley and @NedHarding.
I think the rationale is that we don't want to shrink the column length just to the maximum length of the longest entry because it may be quite arbitrary and in other runs the length may differ - thus if we e.g. create a Database column out of this, we could get too small a limit.
So instead, we see if the values are 'small' (i.e. fit in this 255 cutoff) and if so, we shrink the type to that and otherwise keep the type 'large' - i.e. unlimited.
The user can always manually choose whatever limit they want through cast
.
if cols.distinct .getName . length != cols.length then | ||
Panic.throw (Illegal_Argument.Error "Column names must be distinct.") | ||
|
||
mismatched_size_column = cols.find if_missing=Nothing c-> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there aren't going to be a lot of columns most of the time, I think it would be good to report here the lengths of all the columns so it's easier for the user to know which columns are wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if we just list lengths of all columns it will be harder to find which ones fit and which ones do not, especially if there are >5 columns which is rather common.
IMO best to just show one that is wrong - then the user can fix it - if there are any more wrong ones - a new one will appear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just the lengths, but the names of the columns as well, so it wouldn't be confusing. If we only show one, and there are 10 columns with different lengths, that would take 9-10 runs to find all the problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but keep in mind that our IDE error message box is currently very small, so the bigger the error is the less readable it becomes.
I'm happy to revisit it at later point, e.g. when our error messages will be displayed larger. For now IMO this is already improving that we know at least one 'offending example', so I'd keep it as is.
If the user has just a few columns they will just see the sizes. If there's more of them, the message will be too long to easily see anyway - the user can always do columns.map .row_count
or something like that.
Positive_Integer.from (that : Integer) = Positive_Integer.new that | ||
|
||
## PRIVATE | ||
Integer.from (that : Positive_Integer) = that.integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this pattern, I hope we use it more.
Pull Request Description
Closes #6113
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build
.