Disable tests that fail because of locale-specific Postgres Unicode collation differences #8869

GregoryTravis · 2024-01-25T19:22:40Z

Pull Request Description

Disables two Order_By tests that fail when the Postgres Unicode collation locale is not en_GB.UTF8. Further research would be needed to figure out exactly how to handle locale-specific collation.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed, the GUI was tested when built using ./run ide build.

GregoryTravis · 2024-01-25T19:23:28Z

distribution/lib/Standard/Database/0.0.0-dev/src/Internal/Postgres/Postgres_Connection.enso

+
+    ## PRIVATE
+       Returns the collation setting for the current database.
+    is_collation_en_gb_utf8 : Text


@radeusgd Should this method be added to Connection as well?

IMO this method should not be added here at all.

This is a detail of an assumption our tests are making. It's not something that should be part of our library, even as a PRIVATE helper, if we can avoid it.

IMO there is no reason to keep this logic inside of the library itself and not just put it in the tests.

I've moved it to the test. However, I can only run this check against Postgres -- what's the best way to find out if it's Postgres, without modifying the _Connection classes?

We were usually checking the prefix which contains the name of the DB:

enso/test/Table_Tests/src/Common_Table_Operations/Filter_Spec.enso

Lines 98 to 105 in ad7fad4

# In PostgreSQL, NaN is greater than any other value, so it is > 10.0; in other implementations it is usually not greater nor smaller, so it gets filtered out.
t.filter "X" (Filter_Condition.Greater than=10.0) . at "ix" . to_vector . should_equal <|
    if prefix.contains "PostgreSQL" . not then [1, 5] else [1, 4, 5]

# Similarly, PostgreSQL treats NaN==NaN
t.filter "X" (Filter_Condition.Equal to=Number.nan) . at "ix" . to_vector . should_equal <|
    if prefix.contains "PostgreSQL" . not then [] else [4]
t.filter "X" (Filter_Condition.Equal to=Number.positive_infinity) . at "ix" . to_vector . should_equal [5]

GregoryTravis · 2024-01-25T19:24:34Z

test/Table_Tests/src/Common_Table_Operations/Main.enso

@@ -88,6 +88,8 @@ type Test_Selection
       - supports_unicode_normalization: Specifies if the backend compares
         strings taking Unicode Normalization into account, i.e. whether
         's\u0301' is considered equal to 'ś'.
+       - is_collation_en_gb_utf8: Specifies if the backend is running with the


Not a great name, but I'm not sure of the right way to handle this; I cannot reproduce the difference locally.

It feels like this is actually redundant with the order_by_unicode_normalization_by_default flag. Please check if we can avoid adding this flag at all.

I think there are two different settings here -- order_by_unicode_normalization_by_default is whether it can use normalization in ordering, and collation is about what the ordering is. I wasn't able to try different collations locally; the setting didn't seem to matter. I think this needs further exploration, but not for this test fix.

No, I don't agree.

That is exactly the same thing. This switches whether the ordering uses normalization or not. I think there is only one valid ordering (for this example) if Unicode normalization is used. At least that's what I meant in this test by "order_by_unicode_normalization_by_default".

So the two options really are redundant for the purposes of these tests; there does not seem to be any point in keeping them separate.

Also note that this setting is only used in this one test.
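
For readers following the normalization argument, here is a minimal Python sketch (not Enso, and not part of this PR) of how comparing and ordering change once strings are Unicode-normalized. It uses the 's\u0301' versus 'ś' pair from the flag documentation above. The helper names `nfc`, `sort_raw`, and `sort_normalized` are made up for illustration, and plain code point order stands in for the database ordering, so this says nothing about how any particular Postgres locale actually collates.

```python
import unicodedata

# Two spellings of the same visible character.
decomposed = "s\u0301"   # 's' followed by COMBINING ACUTE ACCENT
precomposed = "\u015b"   # 'ś' as a single code point


def nfc(text: str) -> str:
    """Return the NFC (composed) normalization of text."""
    return unicodedata.normalize("NFC", text)


def sort_raw(values):
    """Order by raw code points, with no normalization (hypothetical helper)."""
    return sorted(values)


def sort_normalized(values):
    """Order by code points of the NFC form (hypothetical helper)."""
    return sorted(values, key=nfc)


if __name__ == "__main__":
    sample = ["t", decomposed, precomposed]
    print([ascii(v) for v in sort_raw(sample)])
    print([ascii(v) for v in sort_normalized(sample)])
```

Without normalization the two spellings compare as different strings and land on opposite sides of "t"; with normalization they compare equal and stay adjacent. A locale-aware collation adds a further layer on top of this, which is the part the thread leaves open.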

radeusgd · 2024-01-25T21:24:19Z

test/Table_Tests/src/Database/Postgres_Spec.enso

@@ -521,7 +521,9 @@ run_tests connection db_name =

    Common_Spec.spec prefix connection

-    common_selection = Common_Table_Operations.Main.Test_Selection.Config supports_case_sensitive_columns=True order_by_unicode_normalization_by_default=True allows_mixed_type_comparisons=False fixed_length_text_columns=True removes_trailing_whitespace_casting_from_char_to_varchar=True supports_decimal_type=True supported_replace_params=supported_replace_params
+    is_collation_en_gb_utf8 = connection.is_collation_en_gb_utf8


IMO we should move the code checking for the collation here instead of keeping it inside of the library.

Moreover, the code can be simplified, I'd suggest:

(connection.read "SELECT datcollate FROM pg_database WHERE datname = current_database()" . at "datcollate" . at 0) == "en_GB.UTF8"

I wonder if this check is too narrow, though. What settings does our CI DB use?

Maybe the check should be (col != "C.UTF-8") && (col.contains "UTF8") instead?

Ideally, if you can, please check that this test actually runs on our CI. Otherwise it will be running only on my laptop and nowhere else 😅

I'd suggest adding an IO.println here and inspecting the logs of the CI workflow.
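
To make the two candidate checks from this comment easier to compare, here is a small Python sketch (a stand-in for the Enso expressions, not code from the PR). It applies both predicates to a few collation names. The function names and the sample names other than "en_GB.UTF8" and "C.UTF-8" are assumptions for illustration; the actual `datcollate` value on the CI database is not stated anywhere in this thread.

```python
def is_en_gb_utf8(datcollate: str) -> bool:
    """Strict check: only the exact en_GB.UTF8 collation name passes."""
    return datcollate == "en_GB.UTF8"


def is_non_c_utf8(datcollate: str) -> bool:
    """Broader check: any name containing "UTF8", except "C.UTF-8"."""
    return datcollate != "C.UTF-8" and "UTF8" in datcollate


if __name__ == "__main__":
    # Sample names are illustrative, not taken from a real server.
    for name in ["en_GB.UTF8", "en_US.UTF8", "C.UTF-8", "C"]:
        print(name, is_en_gb_utf8(name), is_non_c_utf8(name))
```

In this sketch the broader check accepts any name containing "UTF8", while the strict one accepts en_GB.UTF8 only. Note that "C.UTF-8" is already rejected by the substring test, because that spelling contains "UTF-8" rather than "UTF8", and that the substring test here is case-sensitive.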

radeusgd

I really don't think we should put this collation check logic into our main library, it may just as well be kept inside of tests.

radeusgd · 2024-01-29T15:47:52Z

test/Table_Tests/src/Common_Table_Operations/Order_By_Spec.enso

@@ -34,6 +34,13 @@ type Data
            table_builder [col1, col2, col3, col4, col5, col6, col7, col8, col9, col10] connection=connection
        [connection, mk_table]

+    is_unexpected_collation self _setup =
+        IO.println "ABCDE "+(Meta.get_simple_type_name self.connection)
+        if Meta.get_simple_type_name self.connection != "Postgres_Connection" then False else


please try

Suggested change

-        if Meta.get_simple_type_name self.connection != "Postgres_Connection" then False else
+        if prefix.contains "Postgre" . not then False else

After discussion with @radeusgd, I am simply marking these tests pending. It will require more research to know how to test this properly.

GregoryTravis added 3 commits January 25, 2024 12:47

get collation

b3fd6c2

make it a test_selection

8228878

cleanup

1f2bd4e

GregoryTravis added the "CI: No changelog needed" label Jan 25, 2024

GregoryTravis commented Jan 25, 2024

GregoryTravis marked this pull request as ready for review January 25, 2024 19:24

GregoryTravis requested review from jdunkerley, radeusgd and AdRiley as code owners January 25, 2024 19:24

radeusgd reviewed Jan 25, 2024

radeusgd requested changes Jan 25, 2024

GregoryTravis added 5 commits January 26, 2024 11:43

merge

72c3f99

move collation check into test

f71eb35

cleanup

29d9ef7

only check for postgres

3dd8392

Merge branch 'develop' into wip/gmt/8827-collation-fix

bf5d681

radeusgd reviewed Jan 29, 2024

disable collation-dependent tests

c9d0c73

GregoryTravis requested a review from radeusgd January 29, 2024 17:42

radeusgd approved these changes Jan 29, 2024

GregoryTravis added the "CI: Ready to merge" label Jan 29, 2024

GregoryTravis linked an issue Jan 29, 2024 that may be closed by this pull request

Two Postgres unicode order_by tests fail locally on some machines #8827

Closed

GregoryTravis added 3 commits January 30, 2024 11:23

Merge branch 'develop' into wip/gmt/8827-collation-fix

418faa4

Merge branch 'develop' into wip/gmt/8827-collation-fix

15903e1

Merge branch 'develop' into wip/gmt/8827-collation-fix

af9fb42

GregoryTravis added the "CI: Clean build required" label Feb 6, 2024

mergify bot merged commit 62cfa8a into develop Feb 7, 2024
29 checks passed

mergify bot deleted the wip/gmt/8827-collation-fix branch February 7, 2024 16:47
