Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach a warning when Nothing is used in a Filter_Condition #8865

Merged
merged 14 commits into from
Jan 27, 2024

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Jan 25, 2024

Pull Request Description

Attach a warning when Nothing is used as a value in a comparison or is_in Filter_Condition.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@GregoryTravis GregoryTravis marked this pull request as ready for review January 25, 2024 16:57
case values of
_ : Vector ->
values.fold action (flip (warn_on_nothing_in_comparison filter_condition))
_ -> action
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a column is passed in, shouldn't we also raise the warning if it contains nulls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the column is costly to compute, are we concerned about the overhead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the intention is that we wanted to detect explicit Nothing values used to construct Filter_Conditions, not the presence of Nothing in the data itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the intention is that we wanted to detect explicit Nothing values used to construct Filter_Conditions, not the presence of Nothing in the data itself.

Then I'm not sure if we should warn about the Vector case either.

I think we should be consistent, as otherwise we are building user expectations (I can expect a warning) and then breaking them (If I provide a Column instead of the Vector, there's no warning anymore)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Vector case is different. That is a direct argument to Filter_Condition.Is_In, and we are warning the user that they are explicitly using a value that cannot match anything. A Column is generally from the user's source-of-truth dataset, and could be very large and often contain Nothing.

My understanding is that we are providing a warning about what was explicitly passed in to a Filter_Condition constructor, not about what the user's dataset might contain.

In addition, scanning the Column for Nothing requires that it be computed, which could be costly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, from that point of view the current behaviour seems OK.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I'd like to test a few more edge cases.

Comment on lines 449 to 451
Test.specify "should not attach a warning when comparing with a column containing Nothing in a comparison or `is_in` `Filter_Condition`" <|
t = table_builder [["x", [1, 2, 3]], ["y", [1, Nothing, 2]]]
Problems.assume_no_problems (t.filter "x" (Filter_Condition.Equal (t.at "y")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this test has nothing to do with is_in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Problems.assume_no_problems (t.filter "x" (Filter_Condition.Equal [Nothing, Nothing]))
Problems.assume_no_problems (t.filter "x" (Filter_Condition.Is_In [[Nothing, Nothing]]))
Problems.assume_no_problems (t.filter "x" (Filter_Condition.Is_In [(t.at "y")]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Problems.assume_no_problems (t.filter "x" (Filter_Condition.Is_In [(t.at "y")]))
Problems.assume_no_problems (t.filter "x" (Filter_Condition.Is_In (t.at "y")))

I think this should go this way. A column inside of a vector is 'too deep' nesting - it would check equality of each value from x if it is equal to the whole column "y", that does not seem right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some minor comments

@GregoryTravis GregoryTravis added CI: Keep up to date Automatically update this PR to the latest develop. CI: Ready to merge This PR is eligible for automatic merge labels Jan 26, 2024
@mergify mergify bot merged commit 644c9af into develop Jan 27, 2024
25 of 26 checks passed
@mergify mergify bot deleted the wip/gmt/8814-fc-null-warn branch January 27, 2024 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Keep up to date Automatically update this PR to the latest develop. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants