Feature request: allow wildcard filtering based on assembly name #75

jdwinkler-lanzatech · 2022-08-21T23:15:54Z

Hi,

I was wondering if it would be possible to provide a filtering option based on assembly (species/assigned) name? I often want to pull a group of microbes with a general metabolic capability (say, methanogenesis), but currently I have to pick out the TaxIDs manually to do so. Not a major problem, but the feature might be useful for other people too!

pirovc · 2022-08-22T12:32:50Z

Hi, thanks for the suggestion. genome_updater selects and filters data based on the assembly_summary.txt file provided by NCBI (more info: https://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt). Besides the filter parameters, the -F option allows custom filtering for data selection. However, I'm not sure the information you refer to is contained in that file.

jdwinkler-lanzatech · 2022-08-22T13:43:42Z

Column 8 would be the target, I think. I believe the -F option currently requires an exact match though, so I am thinking of another flag that basically uses grep behind the scenes to do the matching. I'd want to grab all the assemblies with an organism name matching "methano*", if that makes sense. Obviously it would not be perfect, but it could be handy if you have a specific enough search string.

pirovc · 2022-08-24T11:44:24Z

Partial matching should be doable; I will mark it as an enhancement. For now, one can download the full assembly_summary.txt from GenBank or RefSeq, apply the filter/grep manually, and use the resulting file as an external assembly_summary.txt (param. -e).
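
For anyone who wants to try the manual workaround in the meantime, here is a minimal sketch of the filtering step in Python. It is not part of genome_updater: the function names and file paths are hypothetical, and it assumes that column 8 of the tab-separated assembly_summary.txt holds the organism name, as discussed above. It keeps the `#` comment lines and every row whose organism name matches a case-insensitive wildcard such as "methano*". Whether the comment lines need to be kept for -e is not something this thread confirms.

```python
import fnmatch

# Zero-based index of column 8 (assumed to be the organism name).
ORGANISM_NAME_COL = 7


def filter_assembly_summary(lines, pattern):
    """Yield comment lines plus rows whose organism name matches `pattern`.

    `pattern` is a shell-style wildcard (e.g. "methano*"), matched
    case-insensitively against the whole organism name.
    """
    pattern = pattern.lower()
    for line in lines:
        # Pass through the '#' header/comment lines unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows that have fewer than 8 columns.
        if len(fields) <= ORGANISM_NAME_COL:
            continue
        if fnmatch.fnmatchcase(fields[ORGANISM_NAME_COL].lower(), pattern):
            yield line


def write_filtered(src_path, dst_path, pattern):
    """Filter the file at `src_path` and write the result to `dst_path`."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(filter_assembly_summary(src, pattern))
```

Calling `write_filtered("assembly_summary.txt", "methano_summary.txt", "methano*")` would produce a reduced file that could then be passed to genome_updater with -e. Note that a trailing `*` is needed for prefix matching, since the wildcard is matched against the full name.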

jdwinkler-lanzatech · 2022-08-24T13:08:10Z

Great, thanks! I figure it is a logical addition to the custom filtering offered by -F already.

pirovc added the question label Aug 22, 2022

pirovc added the enhancement label Aug 24, 2022
