OTU filtering #18

mlbendall · 2016-07-20T21:41:34Z

For those not at BU, we had a conversation today about how different analyses have different filtering requirements for the data. For example, you should not filter low-abundance OTUs for alpha diversity calculations, but there are other situations where you might want to filter for analysis or visualization. So we concluded:

1. The entire raw PathoID reports should be read in and stored.
2. We need a general-purpose function for filtering the data: for example, keep only the top 10 OTUs, keep all OTUs that account for >1% of the data, or remove OTUs that are present in only one sample.
3. There will be an intermediate layer that performs this filtering. Downstream functions should assume they are being handed a properly filtered object.
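A minimal sketch of the general-purpose filter described in point 2. It is written in base R so it runs without phyloseq. The function `filter_otus()` and its arguments `top_n`, `min_prop` and `min_samples` are hypothetical names, not part of PathoStat. It assumes OTUs are rows and samples are columns, and it reads ">1% of the data" as >1% of all reads in the table (a per-sample threshold would be another reasonable interpretation).

```r
# Hypothetical general-purpose OTU filter (not part of PathoStat or phyloseq).
# `counts`: numeric matrix, OTUs as rows, samples as columns (raw read counts).
filter_otus <- function(counts, top_n = NULL, min_prop = NULL, min_samples = NULL) {
  keep <- rep(TRUE, nrow(counts))
  # Drop OTUs observed (count > 0) in fewer than `min_samples` samples.
  if (!is.null(min_samples)) {
    keep <- keep & rowSums(counts > 0) >= min_samples
  }
  # Drop OTUs whose share of all reads in the table is not above `min_prop`.
  if (!is.null(min_prop)) {
    keep <- keep & rowSums(counts) / sum(counts) > min_prop
  }
  filtered <- counts[keep, , drop = FALSE]
  # Keep only the `top_n` remaining OTUs with the largest total counts.
  if (!is.null(top_n)) {
    ord <- order(rowSums(filtered), decreasing = TRUE)
    filtered <- filtered[head(ord, top_n), , drop = FALSE]
  }
  filtered
}

# Toy table: OTU3 is seen in one sample only, OTU4 is rare overall.
counts <- matrix(c(500, 300, 0,
                   400,   0, 2,
                     0, 250, 0,
                     3,   1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

rownames(filter_otus(counts, min_samples = 2))  # OTU1 OTU2 OTU4
rownames(filter_otus(counts, min_prop = 0.01))  # OTU1 OTU2 OTU3
rownames(filter_otus(counts, top_n = 2))        # OTU1 OTU2
```

Every criterion is optional, so calling `filter_otus(counts)` with no arguments returns the raw table unchanged, which is what an alpha-diversity calculation would want.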

There are other details that need to be sorted out, such as how to track whether users upload pre-filtered data.

ecastron · 2016-07-20T23:14:44Z

The Santiago team agrees with this 100%!

I'd like to add that while PathoID writes a sorted .tsv file, it's sorted by Final Guess, and sometimes you want it sorted by Final Read Numbers instead.
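As a hedged illustration of re-sorting in R, the sketch below uses a small hand-made data frame standing in for a PathoID report. The column names "Final Guess" and "Final Read Numbers" are taken from the comment above and may not match the real report headers exactly; with a real file, the table would come from something like `read.delim()` instead.

```r
# Toy stand-in for a PathoID report (values and column names are assumptions).
report <- data.frame(
  Genome = c("taxA", "taxB", "taxC"),
  `Final Guess` = c(0.50, 0.30, 0.20),
  `Final Read Numbers` = c(120, 480, 60),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Re-sort by read numbers (descending) instead of by Final Guess.
by_reads <- report[order(report[["Final Read Numbers"]], decreasing = TRUE), ]
by_reads$Genome  # "taxB" "taxA" "taxC"
```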
If we read the full PathoID output without any cutoffs, then in phyloseq you can easily get the top X by issuing something like:

top10 <- names(sort(taxa_sums(physeq), TRUE)[1:10])

Someone may want to define the top X by proportions instead of counts, in which case a transformation is needed:

physeq <- transform_sample_counts(physeq, function(x) x / sum(x) )
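The choice matters because summed raw counts let deeply sequenced samples dominate the ranking, while per-sample proportions weight every sample equally. Below is a base-R sketch on a toy matrix (OTUs as rows); `top_taxa()` is a hypothetical helper mirroring the `taxa_sums()`/`sort()` line above, and `prop.table(counts, 2)` plays the role of the `transform_sample_counts()` call.

```r
# Toy OTU-by-sample counts: S1 has 1000 reads, S2 only 20.
counts <- matrix(c(900,  1,
                   100,  4,
                     0, 15),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2")))

# Hypothetical helper: names of the n taxa with the largest row sums.
top_taxa <- function(m, n) {
  names(sort(rowSums(m), decreasing = TRUE)[1:n])
}

# Divide each column (sample) by its total, like x / sum(x) per sample.
props <- prop.table(counts, 2)

top_taxa(counts, 2)  # "OTU1" "OTU2"
top_taxa(props, 2)   # "OTU1" "OTU3"
```

OTU3 makes up 75% of S2 but contributes only 15 reads overall, so it enters the top 2 only once the samples are put on the same scale.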

Regarding point 3, I think users should be warned to upload only unfiltered results and let PathoStat decide when it's appropriate to filter.

BTW, @mlosada323 mentioned rarefaction for 16S data. That's also a one-liner in phyloseq:

physeq_rare <- rarefy_even_depth(physeq, sample.size = 1000, replace = FALSE, rngseed = 1)
physeq_rare
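For readers new to rarefaction: `rarefy_even_depth()` subsamples each sample down to `sample.size` reads, and `replace = FALSE` draws those reads without replacement. The base-R sketch below shows the idea for a single sample; `rarefy_sample()` is a hypothetical helper, not phyloseq's implementation, and expanding every read with `rep()` is only practical for modest depths. Per the phyloseq documentation, `rarefy_even_depth()` also drops samples with fewer than `sample.size` reads and, with the default `trimOTUs = TRUE`, removes OTUs that end up absent from every sample.

```r
# Hypothetical helper: rarefy one sample's OTU counts to `depth` reads,
# drawing reads without replacement (the replace = FALSE case).
rarefy_sample <- function(x, depth, seed = 1) {
  stopifnot(sum(x) >= depth)   # too-shallow samples cannot be rarefied
  set.seed(seed)               # fixed seed so the subsample is reproducible
  # Expand counts into one entry per read, labelled by OTU, then subsample.
  reads <- rep(names(x), times = x)
  drawn <- sample(reads, size = depth, replace = FALSE)
  # Tabulate back to counts, keeping OTUs that were never drawn as zeros.
  tab <- table(factor(drawn, levels = names(x)))
  setNames(as.vector(tab), names(x))
}

s1 <- c(OTU1 = 900, OTU2 = 100, OTU3 = 0)
r1 <- rarefy_sample(s1, depth = 50)
r1  # counts summing to 50; OTU3 stays at 0
```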

Cheers,

Eduardo

PS: The alluvial plot is almost done! @Sanrrone

mlbendall · 2016-07-20T23:20:43Z

Wow looks nice @Sanrrone!

mlbendall · 2016-07-20T23:23:04Z

Can you make a remote branch and push up what you have currently? I'd like to look at how you are getting the sample condition.

Sanrrone · 2016-07-21T14:57:14Z

I'm confused about how remote branches work. I made a pull request; is that the same thing? Anyway, you can look at the changes in my fork: https://github.com/Sanrrone/PathoStat

mlbendall · 2016-07-21T15:13:17Z

Oh, didn't know you were working on a fork.

A remote branch lives in the same repository, while a fork creates a new repository. There is ongoing debate about when to branch versus fork, but it boils down to how closely you are involved with the original project and whether your changes will eventually be incorporated into it.

Just make sure to keep your fork in sync with master, and (ideally) merge the upstream master and test your code before making a pull request. Same goes for branches.
