Read all metadata columns when query parsing fails
In the case that column names cannot be extracted from the query, read
all columns from the metadata. The behavior in this case is comparable
to the behavior prior to "filter: Read a subset of metadata columns"
(d0f36a1).
victorlin committed Feb 16, 2024
1 parent 06a68d0 commit 59a91d9
Showing 3 changed files with 33 additions and 8 deletions.
12 changes: 11 additions & 1 deletion augur/filter/io.py
@@ -3,6 +3,7 @@
 from argparse import Namespace
 import os
 import re
+from textwrap import dedent
 from typing import Sequence, Set
 import numpy as np
 from collections import defaultdict
@@ -65,7 +66,16 @@ def get_useful_metadata_columns(args: Namespace, id_column: str, all_columns: Se
     # Attempt to automatically extract columns from the query.
     variables = extract_variables(args.query)
     if variables is None and not args.query_columns:
-        raise AugurError("Could not infer columns from the pandas query. If the query is valid, please specify columns using --query-columns.")
+        print_err(dedent(f"""\
+            WARNING: Could not infer columns from the pandas query. Reading all metadata columns,
+            which may impact execution time. If the query is valid, please open a new issue:
+
+            <https://github.com/nextstrain/augur/issues/new/choose>
+
+            and add the query to the description:
+
+            {args.query}"""))
+        columns.update(all_columns)
     else:
         columns.update(variables)
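
The fallback in this change can be illustrated with a minimal, standalone sketch. It is not Augur's implementation: `extract_variables_sketch` is a hypothetical stand-in that uses Python's `ast` module to collect names, `columns_to_read` is a hypothetical helper, and the `--query-columns` check from the real code is left out for brevity. The point is only the control flow: when the query cannot be parsed, warn and read every column instead of raising an error.

```python
import ast
import sys
from typing import Optional, Sequence, Set


def extract_variables_sketch(query: str) -> Optional[Set[str]]:
    """Hypothetical stand-in for Augur's extract_variables.

    Returns the bare names used in the query, or None when the query
    cannot be parsed as a Python expression (for example, when it uses
    pandas backtick quoting or is simply malformed).
    """
    try:
        tree = ast.parse(query, mode="eval")
    except SyntaxError:
        return None
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def columns_to_read(query: str, id_column: str, all_columns: Sequence[str]) -> Set[str]:
    """Hypothetical helper: choose which metadata columns to load for a query."""
    columns = {id_column}
    variables = extract_variables_sketch(query)
    if variables is None:
        # Parsing failed: warn and fall back to reading every column
        # rather than aborting.
        print(
            "WARNING: Could not infer columns from the query. "
            "Reading all metadata columns.",
            file=sys.stderr,
        )
        columns.update(all_columns)
    else:
        columns.update(variables)
    return columns
```

With `all_columns = ["strain", "region name", "date", "country"]`, a query such as `country == "USA"` selects only the `strain` and `country` columns, while the backtick-quoted query from the test below falls back to all four columns. The trade-off is the one the warning describes: the fallback keeps valid but unparseable queries working at the cost of reading more data.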

17 changes: 11 additions & 6 deletions tests/functional/filter/cram/filter-query-backtick-quoting.t
@@ -12,16 +12,21 @@ Create metadata file for testing.
   > SEQ_4
   > ~~

-The 'region name' column should be query-able by backtick quoting.
-This does not currently work due to a bug.
+The 'region name' column is query-able by backtick quoting.

   $ ${AUGUR} filter \
   > --metadata metadata.tsv \
   > --query '(`region name` == "A")' \
   > --output-strains filtered_strains.txt > /dev/null
-  ERROR: Could not infer columns from the pandas query. If the query is valid, please specify columns using --query-columns.
-  [2]
+  WARNING: Could not infer columns from the pandas query. Reading all metadata columns,
+  which may impact execution time. If the query is valid, please open a new issue:
+
+  <https://github.com/nextstrain/augur/issues/new/choose>
+
+  and add the query to the description:
+
+  (`region name` == "A")

   $ sort filtered_strains.txt
-  sort: No such file or directory
-  [2]
+  SEQ_1
+  SEQ_2
12 changes: 11 additions & 1 deletion tests/functional/filter/cram/filter-query-errors.t
@@ -41,5 +41,15 @@ However, other Pandas errors are not so helpful, so a link is provided for users
   > --metadata "$TESTDIR/../data/metadata.tsv" \
   > --query "some bad syntax" \
   > --output-strains filtered_strains.txt > /dev/null
-  ERROR: Could not infer columns from the pandas query. If the query is valid, please specify columns using --query-columns.
+  WARNING: Could not infer columns from the pandas query. Reading all metadata columns,
+  which may impact execution time. If the query is valid, please open a new issue:
+
+  <https://github.com/nextstrain/augur/issues/new/choose>
+
+  and add the query to the description:
+
+  some bad syntax
+  ERROR: Internal Pandas error when applying query:
+  invalid syntax (<unknown>, line 1)
+  Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.
   [2]
