Improve generation of long operation in presence of column name length limit #7556
c1 = a + b
Problems.assume_no_problems c1
# The column names should be truncated so that prefixes of both names and the operator all fit:
c1.name . should_contain "+"
c1.name . should_contain "..."
c1.name . should_contain "aaa"
c1.name . should_contain "bbb"
In the Postgres run, this produced the following generated name:
[aaaaaaaaaaaaaaaaaaaaaaaaaa... + [bbbbbbbbbbbbbbbbbbbbbbbbbb...
c2 = (a == b).iif b 0
Problems.assume_no_problems c2
c2.name . should_start_with "if [[aaa"
c2.name . should_contain " then [bbb"
c2.name . should_contain "bb... else 0"
This produces:
if [[aaaaaaaaaaaaaaaaaa... then [bbbbbbbbbbbbbbbbbbb... else 0
# We repeat the argument maaany times.
args = Vector.new (max_column_name_length * 2) _-> b
c3 = a.max args
Problems.assume_no_problems c3
c3.name.should_start_with "max([aaa"
c3.name.should_contain "..., "
c3.name.should_contain ")"
And this produces:
max([aaaa..., [bbbb..., [bbbb..., [bbbb..., [bbbb..., [bbbb...)
The resulting name does not make it clear that there are more arguments than listed. But I think it is good enough: the user should rename the result of this operation anyway, and it is rather unlikely that anyone will pass so many arguments. The name is only meant as a hint, not as a definitive description of the operation being performed.
## Now having accounted for the parts that do not need truncation,
   we distribute the remaining space among the ones that do.
This is, of course, all based on estimates and may well still generate names that do not fully use the space available.
But that is not the point of this algorithm. The idea is to use the available space reasonably well to generate a name that roughly shows which operation was performed, even if the argument column names are long. We don't need it to be perfect.
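The idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Enso code from this PR: the names `truncate` and `join_with_limit`, the `(text, truncatable)` pair representation, and the equal split of the remaining space are all assumptions made for the example.

```python
ELLIPSIS = "..."


def truncate(text: str, max_len: int) -> str:
    """Cut text to at most max_len characters, marking the cut with '...'."""
    if len(text) <= max_len:
        return text
    if max_len <= len(ELLIPSIS):
        # No room for the ellipsis marker, so cut hard.
        return text[:max_len]
    return text[: max_len - len(ELLIPSIS)] + ELLIPSIS


def join_with_limit(parts, max_size: int, separator: str = " ") -> str:
    """Join parts, given as (text, truncatable) pairs, within max_size characters.

    Hypothetical helper: fixed parts (e.g. the operator) are kept whole and
    the truncatable parts (e.g. column names) share the remaining space equally.
    """
    joined = separator.join(text for text, _ in parts)
    if len(joined) <= max_size:
        return joined

    # Space used by separators and by the parts that must stay intact.
    fixed = len(separator) * (len(parts) - 1)
    fixed += sum(len(text) for text, truncatable in parts if not truncatable)

    flexible_count = sum(1 for _, truncatable in parts if truncatable)
    remaining = max(max_size - fixed, 0)
    share = remaining // flexible_count if flexible_count else 0

    pieces = [truncate(text, share) if truncatable else text
              for text, truncatable in parts]
    # Just to be sure, truncate the end result as well.
    return truncate(separator.join(pieces), max_size)


a = "[" + "a" * 100 + "]"
b = "[" + "b" * 100 + "]"
name = join_with_limit([(a, True), ("+", False), (b, True)], max_size=63)
```

As in the comment above, this is only an estimate: a short truncatable part does not hand its unused share over to a longer one, so the result may leave some of the available space unused.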
# Just to be sure, we still truncate the end result.
self.naming_properties.truncate new_joined max_size
I tried to design this algorithm so that the result never exceeds the maximum size, but I might have made a mistake. Since this is just a helper name and not critical, we neither risk corruption by letting it be too long nor add an assertion that would fail the script. Instead, we simply truncate the result to be sure it fits.
separator = if add_spaces then " " else ""
joined = texts.join separator
case self.naming_properties.size_limit of
    Nothing -> joined
    max_size -> self.naming_properties.truncate joined max_size
    max_size ->
This is a good, comprehensive solution. But I wonder if, for very long column names, it might be better to just name it after the operation. Since this has to truncate pieces of the name, and even leave some out, it might wind up being more confusing to have partial information.
Good point.
Although I'm not sure that "+" on its own is a good column name. IMO "[aaa... + [bbbb..." is still a bit better.
Also, such simple names will not be very distinct, so if we add the columns back to the table, collisions will be more frequent. This approach, by contrast, tries to come up with a name that is as unique as possible.
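To illustrate the collision concern, here is a small Python sketch (not code from this PR). The helper names `operator_only_name` and `truncated_name`, the sample column names, and the `keep=4` prefix length are made up for the example.

```python
def operator_only_name(left: str, op: str, right: str) -> str:
    """Hypothetical alternative: name the result after the operation only."""
    return op


def truncated_name(left: str, op: str, right: str, keep: int = 4) -> str:
    """Hypothetical variant: keep a short prefix of each argument name."""
    def short(name: str) -> str:
        return name if len(name) <= keep else name[:keep] + "..."
    return f"{short(left)} {op} {short(right)}"


pairs = [
    ("[price_before_tax]", "[tax_amount]"),
    ("[quantity_ordered]", "[quantity_returned]"),
]

simple = [operator_only_name(l, "+", r) for l, r in pairs]
detailed = [truncated_name(l, "+", r) for l, r in pairs]

# Both derived columns get the same name when only the operator is used,
# while the truncated names stay distinct for these inputs.
simple_collides = len(set(simple)) < len(simple)
detailed_collides = len(set(detailed)) < len(detailed)
```

Truncated names can of course still collide when the argument names share long prefixes; they just do so less often than operator-only names.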
Well, this is not urgent, so I'd wait for a second opinion from @jdunkerley once he's back and we can see :)
I'm happy with both, although slightly prefer the current one (but I'm also biased having written it).
I think this approach is reasonable. Hopefully it will be a rare case, as I would expect users to rename columns as they are created.
Pull Request Description
I planned to do this as part of #7428, but I forgot. Making up for that now.