AT trim defaults parameters #26
Comments
As a first guess, use the parameters from Lucy.
From: David Streett notifications@github.com
Hey, @samhunter
Specific to AT trim - what should the default values be for min trim length and number of mismatches? In general, min accepted length default? All trimming algorithms will also have parameters for stranded, 3' trim, 5' trim. Anything I am missing? We can run tests later to actually get optimal values, but is there a decent first guess?
Thank you!
By "min trim length" do you mean the minimum number of bases to trim or the minimum size that is kept after trimming? I think Lucy uses a 10bp sliding window and continues to slide the window until 3 mismatches are encountered? At a quick glance I don't see any rationale for this strategy, and it seems like we could be a little more sensitive to short bits of poly A/T if we used a 5bp sliding window allowing for 2 mismatches, moving from either end of the sequence towards the center. Either way should produce sequences that bwa mem will map/trim, I would guess? This is the Lucy strategy: Poly-A/T tail removal
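To make the 5bp window / 2 mismatch idea above concrete, here is a minimal sketch for the 3' end only. It is not Lucy's implementation and not code from this project; the function name `window_trim_3prime` and its parameters are hypothetical, and the "give back non-target bases at the inner edge" step is an assumption about how one might avoid over-trimming.

```python
def window_trim_3prime(seq, base="A", window=5, max_mismatch=2):
    """Trim a poly-`base` tail from the 3' end using a sliding window.

    Hypothetical sketch: the window starts at the 3' end and moves one base
    at a time toward the center while it holds at most `max_mismatch`
    bases that differ from `base`.
    """
    n = len(seq)
    cut = n                      # kept sequence is seq[:cut]
    start = n - window           # reads shorter than the window are left alone
    while start >= 0:
        mismatches = sum(1 for c in seq[start:start + window] if c != base)
        if mismatches > max_mismatch:
            break
        cut = start
        start -= 1
    # The last accepted window may begin with non-target bases; return them
    # to the kept sequence so that only the tail itself is removed.
    while cut < n and seq[cut] != base:
        cut += 1
    return seq[:cut]


def window_trim_5prime(seq, base="T", window=5, max_mismatch=2):
    """Same idea for the 5' end, done by reversing the read."""
    return window_trim_3prime(seq[::-1], base, window, max_mismatch)[::-1]
```

Swapping in `window=10, max_mismatch=3` would approximate the Lucy-style settings mentioned above, which makes it easy to compare the two on the same reads.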
I'd still say that is a starting place, then modify; have no idea on
Matt
On Sep 6, 2016 12:13 PM, "Sam Hunter" notifications@github.com wrote:
Hello again, @msettles! So, for the sickle reboot and the poly A/T tail remover, I wasn't planning on doing a sliding window. I was just planning on doing a simple loop starting at both ends for both of these algorithms. Any reason we should keep the sliding window? Thanks!
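For comparison, the "simple loop starting at both ends" could look roughly like the sketch below. This is only an illustration under stated assumptions: the names `run_length` and `trim_both_ends`, the defaults `max_mismatch=1` and `min_trim=5`, and the choice to test A and T separately at each end are all hypothetical, not values agreed in this thread.

```python
def run_length(seq, base, max_mismatch):
    """Length of the longest prefix of `seq` that ends on `base` and
    contains at most `max_mismatch` other bases."""
    mismatches = 0
    best = 0
    for i, c in enumerate(seq):
        if c == base:
            best = i + 1         # only extend the run on a matching base
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    return best


def trim_both_ends(seq, max_mismatch=1, min_trim=5):
    """Trim poly-A or poly-T runs from both ends with a plain loop."""
    # 5' end: take the longer of the A run and the T run.
    left = max(run_length(seq, b, max_mismatch) for b in "AT")
    if left < min_trim:
        left = 0
    rest = seq[left:]
    # 3' end: same scan on the reversed remainder.
    right = max(run_length(rest[::-1], b, max_mismatch) for b in "AT")
    if right < min_trim:
        right = 0
    return rest[:len(rest) - right]
```

Here `min_trim` is read as "minimum number of bases to trim", which is one of the two interpretations raised earlier in the thread.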
Don’t know, actually! But in talking with @shunter just now, I may have the perfect dataset (SE100) to test with: it is mouse and 5’ biased, meaning there should be A LOT of polyA/T tails of differing lengths. I think we can use the mapping results, and how, say, BWA mem soft-clips the right side of the read, to validate and tune with.
Matt
From: David Streett notifications@github.com
Hello again, @msettles! So, for the sickle reboot and the poly A/T tail remover, I wasn't planning on doing a sliding window. I was just planning on doing a simple loop starting at both ends for both of these algorithms. Any reason we should keep the sliding window? Thanks!
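One way to use soft clipping for validation, as suggested above, is to read the soft-clip lengths out of each alignment's CIGAR string and compare them with what the trimmer removed. The sketch below only parses a CIGAR string as defined in the SAM format; the function name `soft_clips` is hypothetical, and how the numbers are compared to trimming results is left open.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string.

    Unmapped reads (CIGAR "*") yield (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips; ignore them here.
    ops = [o for o in ops if o[1] != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right
```

Note that for reads aligned to the reverse strand, the sequence in the SAM record is reverse-complemented, so the "right" clip there corresponds to the 5' end of the original read; any comparison would need to account for the strand flag.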
There must be some information on whether poly-A trimming impacts analysis? I'm not sure I've ever seen it rigorously analyzed, however. Anyone have a citation? Does bwa mem just happily soft-clip off all of those AAA's and map anyway? Maybe Kallisto/Salmon/etc. aren't impacted much?
Well, my thoughts,
But with this dataset we should be able to determine that! And I have no citation.
Matt
From: Sam Hunter notifications@github.com
There must be some information on whether poly-A trimming impacts analysis? I'm not sure I've ever seen it rigorously analyzed, however. Anyone have a citation? Does bwa mem just happily soft-clip off all of those AAA's and map anyway? Maybe Kallisto/Salmon/etc. aren't impacted much?
I was also wondering, @msettles, if there were any assumptions we could build into this, such as T's only appearing on the 5' end and A's only appearing on the 3' end?
Depends on the library preparation method. The data I have in mind has a specific set of assumptions on what/where the polyA/T will occur, but generic RNAseq could have A or T at the beginning or the end of the read. But we should think of how to specify some of those possibilities as parameters, with the default being to look at all possible. And stats for all, for now.
Matt
From: David Streett notifications@github.com
I was also wondering, @msettles, if there were any assumptions we could build into this, such as T's only appearing on the 5' end and A's only appearing on the 3' end?
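The "parameters with a default of looking at everything, plus stats for all" idea could be sketched as below. Everything here is hypothetical (the `TrimOptions` fields, the `min_run` default, the stats keys), and for brevity the scan uses exact matching only, with no mismatch allowance.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TrimOptions:
    bases: str = "AT"          # homopolymers to look for; default: both
    trim_5prime: bool = True   # default: check both ends
    trim_3prime: bool = True
    min_run: int = 5           # assumed minimum run length to trim


def leading_run(seq, base):
    """Number of consecutive `base` characters at the start of `seq`."""
    n = 0
    for c in seq:
        if c != base:
            break
        n += 1
    return n


def trim_with_stats(seq, opts, stats):
    """Trim according to `opts` and count each (end, base) trim in `stats`."""
    if opts.trim_5prime:
        base, n = max(((b, leading_run(seq, b)) for b in opts.bases),
                      key=lambda t: t[1], default=("", 0))
        if base and n >= opts.min_run:
            seq = seq[n:]
            stats[("5prime", base)] += 1
    if opts.trim_3prime:
        rev = seq[::-1]
        base, n = max(((b, leading_run(rev, b)) for b in opts.bases),
                      key=lambda t: t[1], default=("", 0))
        if base and n >= opts.min_run:
            seq = seq[:len(seq) - n]
            stats[("3prime", base)] += 1
    return seq
```

A library prep with known orientation could then restrict the search, for example `TrimOptions(bases="A", trim_5prime=False)`, while the default still reports all four end/base combinations in the `Counter`.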
Hey, @samhunter
Specific to AT trim - what should the default values be for min trim length and number of mismatches?
In general, min accepted length default?
All trimming algorithms will also have parameters for stranded, 3' trim, 5' trim.
Anything I am missing? We can run tests later to actually get optimal values, but is there a decent first guess?
Thank you!