Issue with Lighter performance #12

Open
flashton2003 opened this issue Dec 23, 2014 · 5 comments

@flashton2003

Hello,

I'm not sure a GitHub issue is the best place for this, but it is the suggested channel for support, so I will give it a go.

I had some good initial experiences with Lighter, so I ran it on a larger number of samples (n = 2000). The hypothesis of the experiment was that Lighter would help to reduce the errors that were causing 'mixed positions', i.e. positions where the consensus base is supported by fewer than 90% of the reads that map there.
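To make the 'mixed position' definition concrete, here is a minimal sketch in Python. It assumes each position is represented as a string of the base calls from the reads covering it; the function names and the toy pileup are hypothetical and are not taken from the pipeline used in this issue.

```python
from collections import Counter

def is_mixed(bases, min_support=0.90):
    """Return True if the most common base is supported by fewer than
    min_support of the reads covering this position."""
    if not bases:
        return False  # no coverage: nothing to call, so not counted as mixed
    top_count = Counter(bases).most_common(1)[0][1]
    return top_count / len(bases) < min_support

def count_mixed_positions(pileup, min_support=0.90):
    """Count mixed positions in a list of per-position base-call strings."""
    return sum(is_mixed(column, min_support) for column in pileup)

# Toy pileup (hypothetical data): one string of base calls per position.
pileup = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACC", "GGGGGTTTTT"]
print(count_mixed_positions(pileup))
```

In this toy data the second position has exactly 90% support and so is not counted as mixed under the "fewer than 90%" rule; whether the boundary is inclusive in the real analysis is an assumption.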

However, my initial good experience did not continue. The image below shows 100 randomly selected samples from our 2000. For each sample, it gives the number of mixed positions obtained when reads are mapped against the reference after quality trimming only (uncor_trimmed) and after quality trimming plus Lighter correction (cor_trimmed).

[Image: screen shot 2014-12-23 at 16 38 21]

As you can see, the general trend is for there to be more mixed positions in the alignments that have been Lightered, rather than those that have been just trimmed. This was not expected!

When I looked more closely at the positions that were mixed after Lighter, but not before, I saw something like this:

Before:
[Image: before 10 36 01]

After:
[Image: after 10 36 01]

I was initially using alpha = 0.05 and k = 17. Changing this to alpha = 0.1 and k = 25 made no difference to this phenomenon. Do you have any insight into what might be causing this?

OS is Red Hat Enterprise Linux Server release 6.4 (Santiago).

@mourisl
Owner

mourisl commented Dec 23, 2014

What is the average coverage for these data sets? It looks like the depth of coverage is much lower than in the figures shown before.

@flashton2003
Author

All of these have an average coverage across the whole genome of greater than 30 fold (average 55.5).

@mourisl
Owner

mourisl commented Dec 24, 2014

I just added a "-K" feature, which infers alpha from the total number of bases and the genome size (a very naive method). It should take care of the different coverage between samples. Can you give this a try?
If the problem persists, it may be that all the reads covering the miscorrected region have low quality, in which case Lighter will not store those k-mers. If this is the case, you can try the parameter "-noQual".
If that does not solve the problem, I think you can try a more conservative correction by setting "-maxcor" to 2 or even 1.
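As a rough illustration of inferring alpha from the total number of bases and the genome size, the sketch below estimates coverage C as total bases divided by genome size and then applies alpha = 7 / C. The 7 / C rule of thumb is an assumption here (the thread does not state the formula "-K" uses), and the function name and the 5 Mb genome size are hypothetical.

```python
def estimate_alpha(total_bases, genome_size, factor=7.0):
    """Estimate a sampling rate alpha from sequencing depth.

    Assumption: alpha = factor / coverage, with factor = 7 as a rule of
    thumb. This is not confirmed to be what Lighter's -K option computes.
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    coverage = total_bases / genome_size
    # alpha is a sampling probability, so cap it at 1.0 for shallow data
    return min(1.0, factor / coverage)

# Hypothetical 5 Mb genome sequenced to the 55.5-fold average coverage
# reported in this thread.
genome_size = 5_000_000
total_bases = 277_500_000
print(round(estimate_alpha(total_bases, genome_size), 3))
```

Under these assumptions a 55.5-fold sample gives an alpha of roughly 0.126, which is somewhat higher than the 0.05 and 0.1 values tried earlier in the thread; samples with different coverage would each get their own value.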

@tseemann

@flashton2003 did you ever go back and try the auto-alpha mode?

@flashton2003
Author

No, I never did :-(
