About repeat elements in the results of EDTA! #29

Closed
sunnycqcn opened this issue Nov 19, 2019 · 2 comments
Labels
question Further information is requested


@sunnycqcn

Hello,
I tried a small dataset and got the following results:
Confusion matrix of BL06.R11.pilon.fasta.EDTA.TE.fa.stat for the all category

| | DNA/DTC | DNA/DTH | DNA/DTM | LTR/Copia | LTR/unknown | MITE/DTM | Misclas_rate |
|---|---|---|---|---|---|---|---|
| DNA/DTC | 7 | 0 | 0 | 0 | 0 | 0 | 0.0000 |
| DNA/DTH | 0 | 1163 | 1 | 0 | 0 | 0 | 0.0009 |
| DNA/DTM | 0 | 0 | 7936 | 0 | 3 | 1 | 0.0005 |
| LTR/Copia | 0 | 0 | 0 | 259 | 0 | 0 | 0.0000 |
| LTR/unknown | 1 | 1 | 4 | 0 | 25193 | 1 | 0.0003 |
| MITE/DTM | 0 | 0 | 2 | 0 | 0 | 168 | 0.0118 |
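
For readers checking the numbers: the Misclas_rate column appears to be the off-diagonal count of each row divided by the row total. That reading is an assumption inferred from the figures themselves, not something stated in the thread. A minimal Python sketch that reproduces the column from the counts above (the function name `misclassification_rates` is made up for illustration):

```python
# Counts copied from the confusion matrix above; rows and columns share the same order.
labels = ["DNA/DTC", "DNA/DTH", "DNA/DTM", "LTR/Copia", "LTR/unknown", "MITE/DTM"]
matrix = [
    [7, 0, 0, 0, 0, 0],
    [0, 1163, 1, 0, 0, 0],
    [0, 0, 7936, 0, 3, 1],
    [0, 0, 0, 259, 0, 0],
    [1, 1, 4, 0, 25193, 1],
    [0, 0, 2, 0, 0, 168],
]


def misclassification_rates(matrix):
    """Per-row rate, ASSUMED to be (row total - diagonal entry) / row total."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        off_diagonal = total - row[i]
        rates.append(off_diagonal / total if total else 0.0)
    return rates


rates = misclassification_rates(matrix)
for label, rate in zip(labels, rates):
    print(f"{label}\t{rate:.4f}")
```

Under that assumption the rounded values match the reported column, for example 2 / 170 ≈ 0.0118 for MITE/DTM.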
So my question is: can EDTA also analyze repeat elements such as AT-rich regions, GC-rich regions, and short repeats like (AT)n?
Thanks,
Fuyou

@oushujun
Owner

oushujun commented Nov 19, 2019

Dear Fuyou,

You must be using a very standardized dataset. Your results are very good! The misclassification rate is even lower than that of the curated library (0.1%-2%).
To answer your question: no, EDTA cannot identify low-complexity sequences or tandem repeats. You can use RepeatMasker for that.

Best,
Shujun
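
As a rough illustration of the RepeatMasker suggestion above, here is a minimal Python sketch that assembles a RepeatMasker call restricted to low-complexity and simple repeats. The `-noint`, `-pa`, and `-dir` options are RepeatMasker's own. The genome file name (guessed from the `.stat` file name in the question), the output directory, the thread count, and the helper `build_repeatmasker_cmd` are assumptions made for the example, not part of the thread.

```python
import os
import shutil
import subprocess


def build_repeatmasker_cmd(genome, out_dir, threads=4):
    """Build a RepeatMasker command that masks only low-complexity/simple repeats.

    -noint : skip interspersed repeats, mask only low-complexity and simple repeats
    -pa    : number of parallel jobs
    -dir   : directory for the output files
    """
    return ["RepeatMasker", "-noint", "-pa", str(threads), "-dir", out_dir, genome]


# Hypothetical input and output names, chosen only for illustration.
cmd = build_repeatmasker_cmd("BL06.R11.pilon.fasta", "rm_simple_out")
print(" ".join(cmd))

# Run only if RepeatMasker is installed and the input file actually exists.
if shutil.which("RepeatMasker") and os.path.exists(cmd[-1]):
    subprocess.run(cmd, check=True)
```

Running EDTA for transposable elements and a separate pass like this for simple repeats keeps the two annotations apart. How to combine them afterwards is not covered in this thread.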

@sunnycqcn
Author

Thanks.
Fuyou

@oushujun oushujun added the question Further information is requested label Jan 2, 2020