Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve generation of long operation in presence of column name length limit #7556

Merged
merged 6 commits into from
Aug 14, 2023

Conversation

radeusgd
Copy link
Member

Pull Request Description

I planned to do this as part of #7428, but I forgot. Making up for that now.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Aug 10, 2023
@radeusgd radeusgd self-assigned this Aug 10, 2023
Comment on lines +327 to +333
c1 = a + b
Problems.assume_no_problems c1
# The column names should be truncated so that prefixes of both names and the operator all fit:
c1.name . should_contain "+"
c1.name . should_contain "..."
c1.name . should_contain "aaa"
c1.name . should_contain "bbb"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Postgres run, this got the following generated name:

[aaaaaaaaaaaaaaaaaaaaaaaaaa... + [bbbbbbbbbbbbbbbbbbbbbbbbbb...

Comment on lines +335 to +339
c2 = (a == b).iif b 0
Problems.assume_no_problems c2
c2.name . should_start_with "if [[aaa"
c2.name . should_contain " then [bbb"
c2.name . should_contain "bb... else 0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets

if [[aaaaaaaaaaaaaaaaaa... then [bbbbbbbbbbbbbbbbbbb... else 0

Comment on lines +341 to +347
# We repeat the argument maaany times.
args = Vector.new (max_column_name_length * 2) _-> b
c3 = a.max args
Problems.assume_no_problems c3
c3.name.should_start_with "max([aaa"
c3.name.should_contain "..., "
c3.name.should_contain ")"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this gets

max([aaaa..., [bbbb..., [bbbb..., [bbbb..., [bbbb..., [bbbb...)

It's unclear from the form that there are more arguments than listed. But I think it is good enough - the user shall rename this operation anyway. It is also rather unlikely to add so many arguments. The name is supposed to just be a help, not definitely distinguish the operation being performed.

Comment on lines +148 to +149
## Now having accounted for the parts that do not need truncation,
we distribute the remaining space among the ones that do.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all of course based on estimates and may very well still generate texts that do not fully use the space that is available for the name.

But that is not the point of this algorithm - the idea is to exploit the available space reasonably well to generate a name that +- shows what is the performed operation even if argument column names are large. We don't need it to be perfect.

Comment on lines +164 to +165
# Just to be sure, we still truncate the end result.
self.naming_properties.truncate new_joined max_size
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this algorithm in such a way that the result will not exceed the maximum size. But I might have made a mistake. Since this is just a helper name that is not super important, instead of risking corruption by allowing it to be too long, and instead of making an assertion that will fail the script, we just ignore it and truncate it to be sure it fits.

separator = if add_spaces then " " else ""
joined = texts.join separator
case self.naming_properties.size_limit of
Nothing -> joined
max_size -> self.naming_properties.truncate joined max_size
max_size ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good, comprehensive solution. But I wonder if, for very long column names, it might be better to just name it after the operation. Since this has to truncate pieces of the name, and even leave some out, it might wind up being more confusing to have partial information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Although I'm not sure if a column name + is a good name. IMO [aaa... + [bbbb... is a bit better still.

Also, such simple names will not be very distinct so if we add them back to the table, there will more often be collisions. Whereas this tries to come up with unique name as much as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is not urgent, so I'd wait for a second opinion from @jdunkerley once he's back and we can see :)

I'm happy with both, although slightly prefer the current one (but I'm also biased having written it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is reasonable - it should be a rare case hopefully as would expect users to rename columns as they are created.

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Aug 14, 2023
@mergify mergify bot merged commit 8541a9e into develop Aug 14, 2023
28 checks passed
@mergify mergify bot deleted the wip/radeusgd/improve-column-name-concat branch August 14, 2023 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants