invertebrate support #34
Comments
Hi Luohao, I tried LTR_retriever on fruit fly, mouse, micro- and megabats, and human, and it worked much as it does in plants, although most of these species have far less LTR content in their genomes. For an accurate evaluation, LAI requires at least 5% total LTR and 0.1% intact LTR sequence in the genome, so you may need to check these two values. For classification of LTR superfamilies, LTR_retriever uses models trained on rice LTR classifications, so the same models may not be applicable to invertebrate genomes. However, the classification information is not the major factor in identifying LTR elements. You may need to do the classification yourself based on the identified LTR elements. Best,
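The two minimums mentioned above can be checked with a few lines of shell before reading too much into an LAI score. This is only a sketch: the function name `lai_eligible` is hypothetical and not part of LTR_retriever, and the two percentages are assumed to have been read off your own annotation summary by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTR_retriever): report whether a genome
# meets the two minimums quoted above for LAI to be meaningful.
#   $1 = total LTR content of the genome, in percent
#   $2 = intact LTR content of the genome, in percent
lai_eligible() {
    awk -v total="$1" -v intact="$2" 'BEGIN {
        # "+ 0" forces a numeric comparison in awk
        if (total + 0 >= 5 && intact + 0 >= 0.1) print "ok"
        else print "too low"
    }'
}

# Placeholder values for illustration only, not measurements from any genome.
lai_eligible 6 0.2      # prints "ok"
lai_eligible 0.9 0.05   # prints "too low"
```

A genome that fails either check can still be run through LTR_retriever; only the LAI value itself should be treated with caution.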
Hi Shujun,
Thanks for your email. In amphioxus the LTR content seems to be less than 1%, so that might be the reason.
On another note, LTR_retriever annotated 25.27% LTRs (according to the .tbl file) in a tilapia genome, while the actual proportion should be about 4%. I wonder if it has a lot of false positives? The LAI score is also unexpectedly low for a PacBio assembly: 3.01. Below is the script I used. Would you have any suggestions for reducing false positives?
`/apps/genometools/1.5.9/bin/gt suffixerator -db $genome -indexname gt_index/$g -suf -lcp -des -ssp -sds -dna
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 > $g.harvest.scn
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 -motif TGCA -motifmis 1 > $g.harvest.motif.scn
/scratch/luohao/software/LTR_Finder/source/ltr_finder -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9 $genome > $g.finder.scn
perl /scratch/luohao/software/mgescan-1.1/mgescan/ltr/find_ltr.pl -seq=$genome -min-ltr=100 -max-ltr=7000 -min_iden=90
/scratch/luohao/software/LTR_retriever-2.0/LTR_retriever -genome $g -nonTGCA $g.harvest.scn -inharvest $g.harvest.motif.scn -infinder $g.finder.scn -threads=20`
Thanks!

Hi Luohao,
Where did you download mgescan-1.1? Can you give me the URL? I have downloaded three mgescan packages, but none of them worked.
Thank you very much!
Zhennan

I did not actually use mgescan-1.1, as Shujun suggested in some of the threads.
@lurebgi Sorry for the delayed response (somehow I thought I had replied). Your commands look good, but I have no idea about the total LTR content of amphioxus. If you suspect a high proportion of false positives, you may manually curate a couple of them to verify (try NCBI BLAST and see what they are). If you do find some, please post example sequences here with 100 bp of flanking sequence added upstream and downstream, which would help with debugging. If the LTR content is too low, then LAI is not accurate. You may plot the regional LAI values in the *.LAI file to see whether the distribution is uneven. Using long reads does not guarantee assembly quality, which also depends on many other factors. Shujun
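As a rough first look before plotting, the regional values can be listed from lowest to highest LAI. This is a sketch under stated assumptions: the helper name `regional_lai` is hypothetical, and the file is assumed to be tab-separated with one header line, the columns Chr, From, To, Intact, Total, raw_LAI, LAI, and a `whole_genome` summary row. Please check your own *.LAI file before relying on those column positions.

```shell
#!/bin/sh
# Hypothetical helper: list the windows of a *.LAI file in ascending LAI order.
# ASSUMPTION: tab-separated input with one header line and the columns
#   Chr  From  To  Intact  Total  raw_LAI  LAI
# plus a "whole_genome" summary row, which is skipped here.
#   $1 = path to the *.LAI file
regional_lai() {
    awk -F'\t' 'NR > 1 && $1 != "whole_genome" {
        # print the LAI value first so that sort can order on it
        print $7 "\t" $1 ":" $2 "-" $3
    }' "$1" | sort -k1,1n
}
```

Calling `regional_lai` on the file and piping the result to `head` shows the windows with the lowest scores first; a long run of very low windows on particular sequences would be one sign of the uneven distribution mentioned above.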
Hi, thanks for getting back to me. Yes, you replied before on the amphioxus
issue. However, I am no longer interested in amphioxus LTRs since there are
not many anyway.
My second question (sorry for mixing up questions) was about a cichlid fish
(tilapia), which should have about 4% LTR. If you are interested in the
false positives, maybe you can download the genome from
https://www.ncbi.nlm.nih.gov/assembly/GCF_001858045.2 and test your
program? Sorry, but at least for now I am not going to analyze the
LTR_retriever results for tilapia any further.
L
@lurebgi I am curious how the 4% LTR content in tilapia was estimated.
By RepeatMasker, using a library from Repbase plus a RepeatModeler library.
This paper shows a similar result:
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3723-5
@lurebgi Repbase is a database of known TEs. LTR element sequences vary widely between species, so using LTR sequences from other species to identify tilapia LTRs will likely give an underestimate. RepeatModeler is a general method for TE identification. It attempts to classify TEs, but in our experience that classification is not accurate either. RepeatModeler can work as a supplement once some good identifications are in hand, but Repbase is not a good approach for LTR finding.
Thanks for the explanation. However, according to
https://www.nature.com/articles/nature13726, it is likely true that cichlid
fish (including tilapia) have a relatively low content of LTRs. That said,
it would be very interesting to note that LTR_retriever actually identified
many unannotated LTRs in cichlids.
@lurebgi Thanks for sharing the paper. I read the methods section. The TE annotations were based on RepeatModeler or RepeatScout, so the argument is somewhat circular. Since both methods are copy-number based, low-copy-number TEs will be missed. You may try to figure out which new elements are annotated by LTR_retriever. I'll be happy to see how it works or fails.
Hi,
I was wondering if LTR_retriever supports invertebrate genomes. We have an amphioxus genome assembled from 60X PacBio sequencing; however, its LAI score is only 7.07. Moreover, all of the 206 LTRs in LTRlib.fa were classified as 'Unknown'. Does this look normal to you?
Thank you!
Luohao