Interpretation of smudgy smudgeplot for diatom genome #108
Comments
Hi Kamil, thanks so much for your detailed response! We've spent the last several months wrapping our heads around the data in lots of different ways. We're fairly sure that our Merqury results are consistent with a diploid genome, and we now think we must just have some interesting transposon-related duplication, as you suggested. Soon I will try to build some phylogenies from the repeats to get at this question. We may yet attempt a haplotype-resolved assembly, but we are somewhat limited by sequencing depth and read length, so we have yet to see whether this is viable. We also now think our genome is closer to 80% repetitive, which introduces further complications. These organisms are so fun and interesting, but sometimes difficult to work with! The shirt is beautiful! Thank you for sharing. The diatoms really are living art pieces.
My group is working on sequencing the genome of a freshwater diatom, one member of a large class of unicellular algae. We have no previous knowledge or indication of its ploidy, though most published diatom genomes are diploid. This genome is more repetitive (>60%) than any previously published diatom genome, and the assembly (~500 Mb) is nearly twice as large. This repetitiveness and large assembly size, together with the GenomeScope evidence and individual inspection of k-mer alignments to our genome, made us think there might be some polyploidy at play.
We have been doing the assembly with both Nanopore and PE150 Illumina reads. Recently we have been running GenomeScope and Smudgeplot on our Illumina reads.
The Illumina data were trimmed, deduplicated, and filtered for Q>20, and k-mers were counted with Jellyfish. I then ran the following lines in smudgeplot. I have generated plots for a range of k from 13 to 31, but here I will include plots for k=21.
Resulting log10 smudgeplot looks like this. No warnings.
For most smudgeplots with k >= 19, the proposed ploidy is diploid. For smaller k, and if I plot with -q 0.9, I often get a proposed tetraploid.
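To make the effect of the `-q 0.9` setting concrete, here is a minimal sketch of a quantile filter on k-mer pair coverages. It assumes that `-q` drops pairs whose summed coverage exceeds the given quantile, which is my reading of the option and not a copy of smudgeplot's code. The function name `drop_high_coverage_pairs` and the nearest-rank quantile are hypothetical choices for illustration.

```python
def drop_high_coverage_pairs(pairs, q=0.9):
    """Keep k-mer pairs whose summed coverage is at or below the q-th quantile.

    pairs: list of (coverage_A, coverage_B) tuples.
    Assumption: this mirrors the intent of `-q`, not smudgeplot's exact method.
    """
    if not pairs:
        return []
    totals = sorted(a + b for a, b in pairs)
    # Nearest-rank quantile: simple and dependency-free (hypothetical choice).
    idx = max(0, min(len(totals) - 1, int(q * len(totals)) - 1))
    cutoff = totals[idx]
    return [(a, b) for a, b in pairs if a + b <= cutoff]


if __name__ == "__main__":
    toy_pairs = [(i, i) for i in range(1, 11)]  # summed coverages 2, 4, ..., 20
    print(drop_high_coverage_pairs(toy_pairs, q=0.9))
```

Because the pairs with the highest summed coverage are the ones that would sit in the high-ploidy part of the plot, a filter like this changes the relative weight of the remaining smudges, which may be one reason the proposed ploidy shifts.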
Here is the 21-mer genomescope:
Our interpretation of the GenomeScope plot (which we ran before smudgeplot) was that our organism is potentially tetraploid, possibly even hexaploid or octoploid judging by the smaller, higher-coverage bumps in the log plot. In addition, the log-scale smudgeplot seems to my eyes to have smudges at 6n, 7n, and 8n that aren't detected and labeled. Could we be getting a diploid prediction for a highly heterozygous polyploid genome, as in the allohexaploid wheat example given in the paper?
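For reference when eyeballing unlabeled smudges at 6n to 8n, here is a small sketch that enumerates where idealized smudges would sit. It assumes the usual reading of a smudgeplot: a k-mer pair with coverages A >= B is placed by its minor fraction B/(A+B) and by its summed coverage in multiples of the haploid coverage n. The function name `expected_smudges` is hypothetical, and real smudges are blurred by coverage noise.

```python
from fractions import Fraction


def expected_smudges(max_ploidy=8):
    """Enumerate idealized smudge positions up to max_ploidy.

    Returns a list of (label, minor_fraction, total_copies) tuples, where
    total_copies is the summed pair coverage in units of haploid coverage n.
    """
    smudges = []
    for total in range(2, max_ploidy + 1):
        # The minor k-mer B can be present in 1 .. total // 2 copies.
        for minor in range(1, total // 2 + 1):
            label = "A" * (total - minor) + "B" * minor
            smudges.append((label, Fraction(minor, total), total))
    return smudges


if __name__ == "__main__":
    for label, frac, total in expected_smudges():
        print(f"{total}n  {label:<8}  B/(A+B) = {float(frac):.2f}")
```

Under these assumptions a 6n signal could appear at minor fractions of 1/6, 1/3, or 1/2, so a candidate smudge can be checked against both axes and not only against its height in the plot.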
Any advice on reconciling the genomescope and smudgeplot, or just rectifying misunderstandings we have about interpreting these plots, is much appreciated.
Thanks for this tool and GenomeScope! It's been a great help.