Improve generation of long operation in presence of column name length limit #7556

radeusgd · 2023-08-10T16:10:37Z

Pull Request Description

I planned to do this as part of #7428, but I forgot. Making up for that now.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed, the GUI was tested when built using ./run ide build.

radeusgd · 2023-08-10T16:11:10Z

test/Table_Tests/src/Database/Common/Names_Length_Limits_Spec.enso

+                c1 = a + b
+                Problems.assume_no_problems c1
+                # The column names should be truncated so that prefixes of both names and the operator all fit:
+                c1.name . should_contain "+"
+                c1.name . should_contain "..."
+                c1.name . should_contain "aaa"
+                c1.name . should_contain "bbb"


In Postgres run, this got the following generated name:

[aaaaaaaaaaaaaaaaaaaaaaaaaa... + [bbbbbbbbbbbbbbbbbbbbbbbbbb...

radeusgd · 2023-08-10T16:11:24Z

test/Table_Tests/src/Database/Common/Names_Length_Limits_Spec.enso

+                c2 = (a == b).iif b 0
+                Problems.assume_no_problems c2
+                c2.name . should_start_with "if [[aaa"
+                c2.name . should_contain " then [bbb"
+                c2.name . should_contain "bb... else 0"


This gets

if [[aaaaaaaaaaaaaaaaaa... then [bbbbbbbbbbbbbbbbbbb... else 0

radeusgd · 2023-08-10T16:12:18Z

test/Table_Tests/src/Database/Common/Names_Length_Limits_Spec.enso

+                # We repeat the argument maaany times.
+                args = Vector.new (max_column_name_length * 2) _-> b
+                c3 = a.max args
+                Problems.assume_no_problems c3
+                c3.name.should_start_with "max([aaa"
+                c3.name.should_contain "..., "
+                c3.name.should_contain ")"


And this gets

max([aaaa..., [bbbb..., [bbbb..., [bbbb..., [bbbb..., [bbbb...)

It's unclear from the form that there are more arguments than listed. But I think it is good enough - the user shall rename this operation anyway. It is also rather unlikely to add so many arguments. The name is supposed to just be a help, not definitely distinguish the operation being performed.

radeusgd · 2023-08-10T16:16:34Z

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Column_Naming_Helper.enso

+                    ## Now having accounted for the parts that do not need truncation,
+                       we distribute the remaining space among the ones that do.


This is all of course based on estimates and may very well still generate texts that do not fully use the space that is available for the name.

But that is not the point of this algorithm - the idea is to exploit the available space reasonably well to generate a name that +- shows what is the performed operation even if argument column names are large. We don't need it to be perfect.

radeusgd · 2023-08-10T16:17:43Z

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Column_Naming_Helper.enso

+                    # Just to be sure, we still truncate the end result.
+                    self.naming_properties.truncate new_joined max_size


I tried to make this algorithm in such a way that the result will not exceed the maximum size. But I might have made a mistake. Since this is just a helper name that is not super important, instead of risking corruption by allowing it to be too long, and instead of making an assertion that will fail the script, we just ignore it and truncate it to be sure it fits.

GregoryTravis · 2023-08-10T16:45:33Z

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Column_Naming_Helper.enso

        separator = if add_spaces then " " else ""
        joined = texts.join separator
        case self.naming_properties.size_limit of
            Nothing  -> joined
-            max_size -> self.naming_properties.truncate joined max_size
+            max_size ->


This is a good, comprehensive solution. But I wonder if, for very long column names, it might be better to just name it after the operation. Since this has to truncate pieces of the name, and even leave some out, it might wind up being more confusing to have partial information.

Good point.

Although I'm not sure if a column name + is a good name. IMO [aaa... + [bbbb... is a bit better still.

Also, such simple names will not be very distinct so if we add them back to the table, there will more often be collisions. Whereas this tries to come up with unique name as much as possible.

Well, this is not urgent, so I'd wait for a second opinion from @jdunkerley once he's back and we can see :)

I'm happy with both, although slightly prefer the current one (but I'm also biased having written it).

I think this approach is reasonable - it should be a rare case hopefully as would expect users to rename columns as they are created.

radeusgd added 6 commits August 10, 2023 17:50

add tests for better concat

e8d5f5d

implement better concat

8566f34

improving, fixing tests

103798a

remove debug prints

2d4bf4e

remove obsolete TODO

39e565b

explain the logic

b362dcb

radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Aug 10, 2023

radeusgd self-assigned this Aug 10, 2023

radeusgd requested review from jdunkerley and GregoryTravis as code owners August 10, 2023 16:10

radeusgd commented Aug 10, 2023

View reviewed changes

GregoryTravis approved these changes Aug 10, 2023

View reviewed changes

jdunkerley approved these changes Aug 14, 2023

View reviewed changes

jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Aug 14, 2023

mergify bot merged commit 8541a9e into develop Aug 14, 2023
28 checks passed

mergify bot deleted the wip/radeusgd/improve-column-name-concat branch August 14, 2023 16:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve generation of long operation in presence of column name length limit #7556

Improve generation of long operation in presence of column name length limit #7556

radeusgd commented Aug 10, 2023

radeusgd Aug 10, 2023

radeusgd Aug 10, 2023

radeusgd Aug 10, 2023

radeusgd Aug 10, 2023

radeusgd Aug 10, 2023

GregoryTravis Aug 10, 2023

radeusgd Aug 10, 2023

radeusgd Aug 10, 2023

jdunkerley Aug 14, 2023

		## Now having accounted for the parts that do not need truncation,
		we distribute the remaining space among the ones that do.

		# Just to be sure, we still truncate the end result.
		self.naming_properties.truncate new_joined max_size

Improve generation of long operation in presence of column name length limit #7556

Improve generation of long operation in presence of column name length limit #7556

Conversation

radeusgd commented Aug 10, 2023

Pull Request Description

Important Notes

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment