|
| 1 | +deblur version 2021.09 |
| 2 | +====================== |
| 3 | + |
| 4 | +deblur version 2021.09 fixes a bug in the fragment insertion parsing and cache |
| 5 | +that caused some fragments to be ignored when placing sequences in the tree. In |
| 6 | +summary, SEPP occasionally returns multiple fragments in a single entry, which the |
| 7 | +qp-deblur plugin parser did not expect because it assumed one fragment per entry. The |
| 8 | +plugin treated the extra fragments as missing, and that information was |
| 9 | +sent to and stored in the cache provided by Qiita, then propagated to future studies and |
| 10 | +meta-analyses. |
| 11 | + |
| 12 | +This bug was resolved in this `pull request <https://github.com/qiita-spots/qp-deblur/pull/60>`__. |
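
To illustrate the kind of parsing assumption described above, here is a minimal Python sketch. The entry layout (a dict with a list of fragment names under ``n``) is a simplified, hypothetical stand-in for SEPP's placement output, and both function names are illustrative; this is not the actual qp-deblur code.

```python
def parse_single(entries):
    """Buggy sketch: assumes each entry holds exactly one fragment name."""
    return {entry["n"][0] for entry in entries}


def parse_all(entries):
    """Fixed sketch: collect every fragment name from every entry."""
    return {name for entry in entries for name in entry["n"]}


# Hypothetical placement entries; the second one carries two fragments.
entries = [
    {"p": "placement-1", "n": ["frag_a"]},
    {"p": "placement-2", "n": ["frag_b", "frag_c"]},
]

# Fragments the single-name parser would report as missing.
missed = parse_all(entries) - parse_single(entries)
print(sorted(missed))  # ['frag_c']
```

In this toy case the second fragment of the multi-fragment entry is never seen, which mirrors how such fragments could end up recorded as missing in a cache.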
| 13 | + |
| 14 | +Sample counts implications |
| 15 | +-------------------------- |
| 16 | + |
| 17 | +At the time of writing, Qiita had 978,052 deblurred 16S samples, private or public. |
| 18 | +The figure below shows, for each trimming length, how many samples we will recover |
| 19 | +based on the minimum number of sequences per sample. This is an important consideration |
| 20 | +because we normally need to remove samples below a given threshold for beta diversity |
| 21 | +calculations (via rarefaction) or differential abundance testing. |
| 22 | + |
| 23 | +.. figure:: deblur2021.09_private_public.png |
| 24 | +   :align: center |
| 25 | + |
| 26 | +A few conclusions from this plot: |
| 27 | + |
| 28 | +- The maximum number of samples that we will recover is 6,771, at `Trimming (length: 150)` |
| 29 | +  and min_seqs of 1,500, which represents a 0.7% increase in private and public samples. |
| 30 | +- At all trimming lengths, the number of recovered samples tends to rise and then fall as min_seqs increases, |
| 31 | +  which is a common trend seen in rarefaction plots. |
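
The 0.7% figure follows directly from the two counts quoted above. The sketch below checks that arithmetic and shows one way recovered samples could be counted at a given ``min_seqs``. The per-sample counts and the ``recovered_samples`` helper are made up for illustration and are not taken from Qiita.

```python
TOTAL_SAMPLES = 978_052  # deblurred 16S samples, from the text
MAX_RECOVERED = 6_771    # at 150 nt trimming and min_seqs of 1,500, from the text

pct = 100 * MAX_RECOVERED / TOTAL_SAMPLES
print(f"{pct:.1f}%")  # 0.7%


def recovered_samples(before, after, min_seqs):
    """Samples that fall below min_seqs before the fix but reach it after."""
    return sorted(
        sample
        for sample, count in after.items()
        if count >= min_seqs and before.get(sample, 0) < min_seqs
    )


# Hypothetical per-sample sequence counts before and after the fix.
before = {"s1": 1400, "s2": 2000, "s3": 900}
after = {"s1": 1600, "s2": 2100, "s3": 1000}

print(recovered_samples(before, after, min_seqs=1500))  # ['s1']
```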
| 32 | + |
| 33 | + |
| 34 | +Reaching out to affected study owners |
| 35 | +------------------------------------- |
| 36 | + |
| 37 | +As shown in the previous section, the effect of the missing fragments depends on the |
| 38 | +study, the trimming length, and the minimum sequence count per sample selected. As a |
| 39 | +general rule of thumb, for a first analytical pass in a 16S meta-analysis, we use |
| 40 | +5,000 sequences per sample and prefer 150 base pair trimming. Thus, we directly |
| 41 | +contacted all study owners who would recover more than 5% of the samples in their study |
| 42 | +(24 in total). |
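
The selection rule above can be sketched as a simple filter. The study names, counts, and the ``studies_to_contact`` helper are hypothetical; only the 5% cutoff comes from the text, and the recovered counts are assumed to have been computed at 5,000 sequences per sample with 150 base pair trimming.

```python
def studies_to_contact(recovered, totals, threshold_pct=5):
    """Studies whose recovered samples exceed threshold_pct of their total.

    Uses integer arithmetic so that a study at exactly the threshold
    is excluded without floating-point rounding concerns.
    """
    return sorted(
        study
        for study, n_recovered in recovered.items()
        if totals[study] > 0 and 100 * n_recovered > threshold_pct * totals[study]
    )


# Hypothetical per-study counts.
recovered = {"study_A": 12, "study_B": 1, "study_C": 5}
totals = {"study_A": 100, "study_B": 200, "study_C": 100}

print(studies_to_contact(recovered, totals))  # ['study_A']
```

Here ``study_C`` recovers exactly 5% and is therefore not selected, since the rule asks for more than 5%.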