`filter_obs`: consider allowing multiple datasets at once #179

Closed
zachary-foster opened this issue Jul 23, 2018 · 0 comments

If I have an object like:

<Taxmap>
  629 taxa: aac. Bacteria ... azs. Nitrospinaceae
  629 edges: NA->aac, aac->aad, aac->aae ... anm->azr, ann->azs
  3 data sets:
    tax_data:
      # A tibble: 3,070,243 x 3
        taxon_id class                      input                    
        <chr>    <chr>                      <chr>                    
      1 ano      "=Root;rootrank;Bacteria;… "\tLineage=Root;rootrank…
      2 ano      "=Root;rootrank;Bacteria;… "\tLineage=Root;rootrank…
      3 ano      "=Root;rootrank;Bacteria;… "\tLineage=Root;rootrank…
      # ... with 3.07e+06 more rows
    class_data:
      # A tibble: 16,684,640 x 4
        taxon_id input_index taxon_name           rdp_rank
        <chr>          <int> <chr>                <chr>   
      1 aac                1 Bacteria             d       
      2 aad                1 "\"Actinobacteria\"" p       
      3 aca                1 Actinobacteria       c       
      # ... with 1.668e+07 more rows
    sequence:
      3070243 DNA sequences in binary format stored in a list.
      
      Mean sequence length: 1042.096 
         Shortest sequence: 400 
          Longest sequence: 2922 
      
      Labels:
      ano
      ano
      ano
      ano
      ano
      ano
      ...
      
      More than 10 million nucleotides: not printing base composition
  0 functions:

where several datasets have the same length and the same taxon IDs, it would be nice to filter more than one of them in a single call:

filter_obs(rdp, c("tax_data", "sequence"), vapply(sequence, length, numeric(1)) >= min_seq_length, drop_taxa = TRUE)

instead of:

  long_enough <- vapply(rdp$data$sequence, length, numeric(1)) >= min_seq_length
  rdp <- filter_obs(rdp, "sequence", long_enough, drop_taxa = TRUE)
  rdp <- filter_obs(rdp, "tax_data", long_enough, drop_taxa = TRUE)
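
Until something like that is supported, a small wrapper could stand in for it. The sketch below is only an illustration: `filter_obs_multi` and `toy_filter` are hypothetical names, not part of the package, and the toy object is a plain list rather than a real `Taxmap`. It assumes the filtering function takes the object, a dataset name, and a logical vector, as in the calls above.

```r
# Hypothetical helper (not part of the package): apply one precomputed
# logical vector to several datasets by calling a filter function per name.
filter_obs_multi <- function(obj, datasets, keep, filter_fun, ...) {
  for (dataset in datasets) {
    obj <- filter_fun(obj, dataset, keep, ...)
  }
  obj
}

# Toy stand-in for filter_obs so the sketch runs without the package.
# It subsets rows of data frames and elements of lists/vectors.
toy_filter <- function(obj, dataset, keep, ...) {
  target <- obj$data[[dataset]]
  if (is.data.frame(target)) {
    obj$data[[dataset]] <- target[keep, , drop = FALSE]
  } else {
    obj$data[[dataset]] <- target[keep]
  }
  obj
}

# Toy object: two datasets of the same length, one entry per observation.
toy <- list(data = list(
  tax_data = data.frame(taxon_id = c("ano", "anp", "anq"),
                        stringsAsFactors = FALSE),
  sequence = list("ACGT", "AC", "ACGTACGT")
))

min_seq_length <- 4
long_enough <- vapply(toy$data$sequence, nchar, integer(1)) >= min_seq_length

filtered <- filter_obs_multi(toy, c("tax_data", "sequence"),
                             long_enough, toy_filter)
```

With the real object, the equivalent call would presumably be `filter_obs_multi(rdp, c("tax_data", "sequence"), long_enough, filter_obs, drop_taxa = TRUE)`, but that is untested here.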