filter: Restore automatic boolean conversion #1410

Merged
merged 10 commits into from Feb 12, 2024

Conversation

victorlin
Member

@victorlin victorlin commented Feb 9, 2024

Description of proposed changes

Boolean conversion was not considered when automatic nullable numeric conversion was applied¹, but it continued to work because augur filter relied on pandas.read_csv's automatic type inference up until it was disabled in favor of reading all columns as string².

A note to include boolean was added³ but removed inadvertently in another big change⁴.

¹ b325b97: Try converting all columns to numerical type
² 9f9be3a: Read all metadata as string type
³ 725e1b4: Expand comment on numeric conversion
⁴ b0a0d11: Add --query-columns option
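
The regression described above can be reproduced outside of augur with plain pandas. The following is a minimal sketch, not the code from this PR: the column names, the sample data, and the helper `to_nullable_boolean` are hypothetical and only illustrate the idea of restoring boolean values after every column has been read as a string.

```python
import io

import pandas as pd

CSV = "strain,is_outbreak\nA,True\nB,False\nC,True\n"

# Default type inference: the True/False column is parsed as bool.
inferred = pd.read_csv(io.StringIO(CSV))

# All columns read as strings: the same values stay as text.
as_text = pd.read_csv(io.StringIO(CSV), dtype=str)


def to_nullable_boolean(column: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a text column to nullable boolean.

    The column is returned unchanged if any value is something other
    than true/false (case-insensitive) or an empty/missing value.
    """
    values = []
    for value in column:
        if pd.isna(value) or value == "":
            values.append(pd.NA)
        elif str(value).lower() in ("true", "false"):
            values.append(str(value).lower() == "true")
        else:
            return column
    return pd.Series(values, index=column.index, name=column.name, dtype="boolean")


restored = as_text.assign(is_outbreak=to_nullable_boolean(as_text["is_outbreak"]))

# The restored column can be used directly as a boolean mask again.
selected = restored[restored["is_outbreak"]]["strain"].tolist()
```

Returning the column unchanged when any value does not look boolean keeps the conversion opt-out by default, which mirrors the "try converting" wording of the numeric conversion commit. Whether augur uses the same fallback is an assumption here.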

Related issue(s)

Checklist

  • Tests added
  • Checks pass
  • If making user-facing changes, add a message in CHANGES.md summarizing the changes in this PR

@victorlin victorlin self-assigned this Feb 9, 2024

codecov bot commented Feb 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e4353b4) 67.38% compared to head (d44506d) 67.46%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1410      +/-   ##
==========================================
+ Coverage   67.38%   67.46%   +0.08%     
==========================================
  Files          69       69              
  Lines        7465     7484      +19     
  Branches     1836     1840       +4     
==========================================
+ Hits         5030     5049      +19     
  Misses       2161     2161              
  Partials      274      274              


@victorlin victorlin force-pushed the victorlin/fix-filter-query-bool branch from a3fada4 to df73174 Compare February 9, 2024 20:02
Use one comment to explain both the example file and command. This will
make the file more readable with multiple commands.
Previous wording was wordy. Make it more concise.
@victorlin victorlin force-pushed the victorlin/fix-filter-query-bool branch from df73174 to 6765b8b Compare February 9, 2024 22:05
@victorlin victorlin marked this pull request as ready for review February 9, 2024 22:13
@victorlin victorlin requested a review from a team February 9, 2024 22:13
Contributor

@joverlee521 joverlee521 left a comment

Thanks for the quick fix @victorlin!

I think there's a potential bug if a user specifies --query-columns with a numeric type. Otherwise, the changes look good to me.


Since filter can now automatically infer boolean types, should bool be added to the ACCEPTED_TYPES for the --query-columns flag? Maybe outside of the scope of this PR, but just wondering if there are plans to add it later.

augur/filter/include_exclude_rules.py
Automatic conversion is applied by default. Reserve the 'numeric' type
for automatic conversions and keep the accepted values to
--query-columns strict (for numeric, either 'int' or 'float').
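
The strictness described in this commit message can be pictured as a small validation step. This is a sketch under stated assumptions rather than augur's implementation: the `column:type` pair format, the contents of `ACCEPTED_TYPES`, and the function `parse_query_columns` are all assumed for illustration.

```python
# Assumed set of user-facing types. 'numeric' is deliberately absent because
# the commit message reserves it for automatic conversion.
ACCEPTED_TYPES = {"int", "float", "str"}


def parse_query_columns(pairs):
    """Hypothetical parser for 'column:type' pairs.

    Raises ValueError for a malformed pair or an unaccepted type.
    """
    parsed = {}
    for pair in pairs:
        # Split on the last colon so column names may contain colons.
        column, separator, type_name = pair.rpartition(":")
        if not separator or not column:
            raise ValueError(f"Expected 'column:type', got {pair!r}")
        if type_name not in ACCEPTED_TYPES:
            raise ValueError(
                f"Unsupported type {type_name!r} for column {column!r}; "
                f"expected one of {sorted(ACCEPTED_TYPES)}"
            )
        parsed[column] = type_name
    return parsed


parsed = parse_query_columns(["coverage:float", "region:str"])
```

Under this sketch a value such as `coverage:numeric` is rejected up front, so the automatic path and the user-specified path cannot be confused.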
Boolean conversion was not considered when automatic nullable numeric
conversion was applied¹, but it continued to work because augur filter
relied on pandas.read_csv's automatic type inference up until it was
disabled in favor of reading all columns as string².

A note to include boolean was added³ but removed inadvertently in
another big change⁴.

¹ b325b97: Try converting all columns to numerical type
² 9f9be3a: Read all metadata as string type
³ 725e1b4: Expand comment on numeric conversion
⁴ b0a0d11: Add --query-columns option
This was not previously supported by pandas.read_csv's built-in type
inference, but it aligns with the existing support for nullable numeric
columns.
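
To illustrate the point about missing values, here is a small pandas-only sketch with made-up data. It shows that a True/False column containing an empty value is not inferred as bool by `pandas.read_csv`, while the nullable `boolean` dtype can represent it, in the same way that nullable numeric dtypes represent missing numbers.

```python
import io

import pandas as pd

CSV = "strain,is_outbreak\nA,True\nB,\nC,False\n"

# With a missing value present, default inference does not produce a bool
# column (it falls back to a generic dtype that can hold the missing value).
inferred = pd.read_csv(io.StringIO(CSV))
inferred_is_bool = inferred["is_outbreak"].dtype == bool

# The nullable boolean dtype keeps True/False and marks the gap as <NA>.
nullable = inferred["is_outbreak"].astype("boolean")
missing = nullable.isna().tolist()

# Treat missing values as non-matching when selecting rows.
matches = inferred.loc[nullable.fillna(False), "strain"].tolist()
```

Treating a missing value as "does not match" is a design choice made for this example; a filter could equally report or reject such rows.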
@victorlin victorlin merged commit 499f0e9 into master Feb 12, 2024
20 checks passed
@victorlin victorlin deleted the victorlin/fix-filter-query-bool branch February 12, 2024 18:16
Development

Successfully merging this pull request may close these issues.

filter: Automatic type inference does not work on boolean columns
2 participants