filter: Grouping by day works when it shouldn't #1069

Closed
victorlin opened this issue Oct 24, 2022 · 0 comments · Fixed by #1070
Labels: bug (Something isn't working)

victorlin commented Oct 24, 2022

Current Behavior

--group-by month day is accepted even though day is not a supported grouping. It groups on the day integer extracted from the YYYY-MM-DD date string. This generally happens whenever day is used together with month and/or year, because either of those triggers the creation of the day column.
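The mechanism described above can be illustrated with a minimal sketch. This is not augur's actual code: split_date and group_strains are hypothetical names, and the assumption is simply that requesting year or month causes all three date parts to be derived, so day becomes available as a side effect.

```python
from collections import defaultdict

# Hypothetical helper: split a YYYY-MM-DD string into integer parts.
def split_date(date):
    year, month, day = (int(part) for part in date.split("-"))
    return {"year": year, "month": month, "day": day}

# Hypothetical grouping routine. Assumption: asking for year or month
# derives all three date parts, which also creates a "day" column.
def group_strains(records, group_by):
    derive_date_parts = bool({"year", "month"} & set(group_by))
    groups = defaultdict(list)
    for record in records:
        row = dict(record)
        if derive_date_parts:
            row.update(split_date(record["date"]))  # "day" appears here too
        key = tuple(row[column] for column in group_by)
        groups[key].append(row["strain"])
    return dict(groups)

# Same rows as the metadata.tsv in the reproduction steps.
records = [
    {"strain": "SEQ1", "date": "2022-01-01"},
    {"strain": "SEQ2", "date": "2022-01-01"},
    {"strain": "SEQ3", "date": "2022-01-02"},
    {"strain": "SEQ4", "date": "2022-01-03"},
    {"strain": "SEQ5", "date": "2022-01-04"},
]
groups = group_strains(records, ["month", "day"])
```

Under these assumptions the five records fall into four (month, day) groups, which is consistent with four rows being kept when one sequence is sampled per group.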

Expected behavior

A warning saying that the day column was not found, after which the command should behave the same as --group-by month.
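One way to picture the expected behavior is the sketch below. It is an illustration only, not the fix that was merged: resolve_group_by, GENERATED_COLUMNS, and the warning text are all hypothetical, and the sketch assumes that only year and month count as generated columns.

```python
import warnings

# Assumption: only these columns may be derived from the date column.
GENERATED_COLUMNS = {"year", "month"}

# Hypothetical helper: keep group-by columns that exist in the metadata
# or are generated, and warn about (then ignore) everything else.
def resolve_group_by(group_by, metadata_columns):
    resolved = []
    for column in group_by:
        if column in metadata_columns or column in GENERATED_COLUMNS:
            resolved.append(column)
        else:
            warnings.warn(
                f"group-by column {column!r} was not found in the metadata "
                "and will be ignored."
            )
    return resolved
```

With metadata columns strain and date, a call such as resolve_group_by(["month", "day"], {"strain", "date"}) would warn about day and return only month. A real day column in the metadata would still be accepted.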

How to reproduce

cat >metadata.tsv <<~~
strain	date
SEQ1	2022-01-01
SEQ2	2022-01-01
SEQ3	2022-01-02
SEQ4	2022-01-03
SEQ5	2022-01-04
~~

augur filter \
   --metadata metadata.tsv \
   --group-by month day \
   --sequences-per-group 1 \
   --subsample-seed 0 \
   --output-metadata out.tsv

cat out.tsv
# strain	date
# SEQ1	2022-01-01
# SEQ3	2022-01-02
# SEQ4	2022-01-03
# SEQ5	2022-01-04

Possible solutions

  1. Formally enable --group-by day. This has been ruled out as impractical in filter: Reduce over-sampling in partial months with --group-by month #960 (comment).
  2. Disable --group-by day.

victorlin added the bug label on Oct 24, 2022
victorlin self-assigned this on Oct 24, 2022