Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable tests that fail because of locale-specific Postgres Unicode collation differences #8869

Merged
merged 12 commits into from
Feb 7, 2024

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Jan 25, 2024

Pull Request Description

Disables two Order_By tests that fail when the Postgres Unicode collation locale is not en_GB.UTF8. Further research would be needed to figure out exactly how to handle locale-specific collation.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@GregoryTravis GregoryTravis added the CI: No changelog needed Do not require a changelog entry for this PR. label Jan 25, 2024

## PRVIATE
Returns the collation setting for the current database.
is_collation_en_gb_utf8 : Text
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radeusgd Should this method be added to Connection as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this method should not be added here at all.

This is a detail of an assumption our tests are making. It's not something that should be part of our library, even as a PRIVATE helper, if we can avoid it.

IMO there is no reason to keep this logic inside of the library itself and not just put it in the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to the test. However, I can only run this check against Postgres -- what's the best way to find out if it's Postges, without modifying the _Connection classes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were usually checking the prefix which contains the name of the DB:

# In PostgreSQL, NaN is greater than any other value, so it is > 10.0; in other implementations it is usually not greater nor smaller, so it gets filtered out.
t.filter "X" (Filter_Condition.Greater than=10.0) . at "ix" . to_vector . should_equal <|
if prefix.contains "PostgreSQL" . not then [1, 5] else [1, 4, 5]
# Similarly, PostgreSQL treats NaN==NaN
t.filter "X" (Filter_Condition.Equal to=Number.nan) . at "ix" . to_vector . should_equal <|
if prefix.contains "PostgreSQL" . not then [] else [4]
t.filter "X" (Filter_Condition.Equal to=Number.positive_infinity) . at "ix" . to_vector . should_equal [5]

@@ -88,6 +88,8 @@ type Test_Selection
- supports_unicode_normalization: Specifies if the backend compares
strings taking Unicode Normalization into account, i.e. whether
's\u0301' is considered equal to 'ś'.
- is_collation_en_gb_utf8: Specifies if the backend is running with the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a great name, but I'm not sure of the right way to handle this; I cannot reproduce the difference locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this is actually redundant with the order_by_unicode_normalization_by_default flag. Please check if we can avoid adding this flag at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two different settings here -- order_by_unicode_normalization_by_default is whether it can use normalization in ordering, and collation is about what the ordering is. I wasn't able to try different collations locally; the setting didn't seem to matter. I think this needs further exploration, but not for this test fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't agree.

That is exactly the same thing. This switches if the ordering uses normalization or not. I think there is only one valid ordering (for this example) if Unicode normalization is used. At least that's what I meant in this test by "order_by_unicode_normalization_by_default".

So the two options really are redundant for the purposes of these tests, there does not seem to be any point in keeping them separate.

Also note that this setting is only used in this 1 test.

@GregoryTravis GregoryTravis marked this pull request as ready for review January 25, 2024 19:24
@@ -521,7 +521,9 @@ run_tests connection db_name =

Common_Spec.spec prefix connection

common_selection = Common_Table_Operations.Main.Test_Selection.Config supports_case_sensitive_columns=True order_by_unicode_normalization_by_default=True allows_mixed_type_comparisons=False fixed_length_text_columns=True removes_trailing_whitespace_casting_from_char_to_varchar=True supports_decimal_type=True supported_replace_params=supported_replace_params
is_collation_en_gb_utf8 = connection.is_collation_en_gb_utf8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should move the code checking for the collaction here instead of keeping it inside of the library.

Moreover, the code can be simplified, I'd suggest:

(connection.read "SELECT datcollate FROM pg_database WHERE datname = current_database()" . at "datcollate" . at 0) == "en_GB.UTF8"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is not too narrow check though - wondering what settings are on our CI DB?

Maybe the check should be (col != "C.UTF-8") && (col.contains "UTF8") instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is not too narrow check though - wondering what settings are on our CI DB?

Maybe the check should be (col != "C.UTF-8") && (col.contains "UTF8") instead?

Ideally if you can, please check, to ensure that this test does run on our CI. Otherwise it will be running only on my laptop and nowhere else 😅

I'd suggest adding an IO.println here and inspecting the logs of the CI workflow.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think we should put this collation check logic into our main library, it may just as well be kept inside of tests.

@@ -34,6 +34,13 @@ type Data
table_builder [col1, col2, col3, col4, col5, col6, col7, col8, col9, col10] connection=connection
[connection, mk_table]

is_unexpected_collation self _setup =
IO.println "ABCDE "+(Meta.get_simple_type_name self.connection)
if Meta.get_simple_type_name self.connection != "Postgres_Connection" then False else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please try

Suggested change
if Meta.get_simple_type_name self.connection != "Postgres_Connection" then False else
if prefix.contains "Postgre" . not then False else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @radeusgd, I am simply marking these tests pending. It will require more research to know how to test this properly.

@GregoryTravis GregoryTravis added the CI: Ready to merge This PR is eligible for automatic merge label Jan 29, 2024
@GregoryTravis GregoryTravis added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Feb 6, 2024
@mergify mergify bot merged commit 62cfa8a into develop Feb 7, 2024
29 checks passed
@mergify mergify bot deleted the wip/gmt/8827-collation-fix branch February 7, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two Postgres unicode order_by tests fail locally on some machines
2 participants