Selecting mapping instance(s) for a read which maps to multiple positions #90

Closed
ffranr opened this issue Feb 28, 2018 · 12 comments

@ffranr
Contributor

ffranr commented Feb 28, 2018

Quasimapping gives multiple mapping options

A read maps to 7 "positions" within the PRG:

  1. non-variant region 1
  2. non-variant region 2
  3. site 2
  4. site 2 + site 3 + site 4
  5. site 1 (within allele 1 twice or more)
  6. site 5 (completely encapsulated within allele 1 and completely encapsulated within allele 2)
  7. site 6 (completely encapsulated within allele 1, completely encapsulated within allele 2, and not completely encapsulated within allele 3).

A single option is chosen from [1, 7] via uniform random selection.


Handling Selected Option

  1. Move on to the next read without modifying coverage information.
  2. Move on to the next read without modifying coverage information.
  3. Record coverage information for all relevant alleles.
  4. Record coverage information for all relevant alleles.
  5. Uniform random selection on options within allele 1, followed by recording coverage information for all relevant alleles.
  6. Record coverage information for all relevant alleles.
  7. Record coverage information for all relevant alleles.
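
The two-stage choice described above can be sketched roughly as follows. This is only an illustration in Python, not the gramtools implementation: `MappingOption`, `select_mapping`, and the `placements` field are hypothetical names, and each option is assumed to carry a flag saying whether it touches a variant site.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MappingOption:
    """One top-level "position" a read maps to (hypothetical type)."""
    label: str
    in_variant_site: bool
    # Distinct placements inside the option, e.g. two hits within site 1 allele 1.
    placements: list = field(default_factory=lambda: [0])


def select_mapping(options, rng):
    """Pick one option uniformly, then one placement uniformly within it.

    Returns None when the chosen option is a non-variant region, meaning
    the read is skipped and no coverage is recorded.
    """
    chosen = rng.choice(options)
    if not chosen.in_variant_site:
        return None
    # For options 6 and 7 there is a single placement: coverage is recorded
    # for all relevant alleles, with no further random choice.
    placement = rng.choice(chosen.placements)
    return chosen.label, placement


options = [
    MappingOption("non-variant region 1", False),
    MappingOption("non-variant region 2", False),
    MappingOption("site 2", True),
    MappingOption("site 2 + site 3 + site 4", True),
    MappingOption("site 1, allele 1", True, placements=[0, 1]),
    MappingOption("site 5, alleles 1 and 2", True),
    MappingOption("site 6, alleles 1, 2 and 3", True),
]
result = select_mapping(options, random.Random())
```

A `None` result corresponds to points 1 and 2 (move on to the next read); any other result names the option and placement for which coverage would then be recorded.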
@ffranr
Contributor Author

ffranr commented Feb 28, 2018

@martinghunt @iqbal-lab How does that sound?

@martinghunt
Member

Fine with me. So long as we're happy that there are two positions in 5, but uniform random selection is on [1,2,3,4,5], halving the probability of the two positions within site 1 allele 1?
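
To make the probabilities in that question concrete, here is a small sketch. It assumes five top-level options where only option 5 contains two positions; `placement_probabilities` is a hypothetical helper, not part of the code base.

```python
from fractions import Fraction


def placement_probabilities(placements_per_option):
    """Probability of each (option, placement) under two-stage uniform choice."""
    n_options = len(placements_per_option)
    probs = {}
    for option, count in placements_per_option.items():
        for placement in range(count):
            # Uniform over options first, then uniform within the chosen option.
            probs[(option, placement)] = Fraction(1, n_options) * Fraction(1, count)
    return probs


counts = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2}
probs = placement_probabilities(counts)
print(probs[(1, 0)])  # 1/5
print(probs[(5, 0)])  # 1/10
```

Under these assumptions each position inside option 5 is picked with probability 1/10, half that of any other option, whereas a flat choice over all six positions would give 1/6 each.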

@iqbal-lab
Collaborator

iqbal-lab commented Feb 28, 2018

I'm ok with this, but for option 5, shouldn't it read:

Uniform random selection on options within allele 1, followed by recording coverage information for all relevant alleles.

@iqbal-lab
Collaborator

Does that make sense?

@ffranr
Contributor Author

ffranr commented Feb 28, 2018

@iqbal-lab OK, updated point 5.

@iqbal-lab
Collaborator

Cool, just checking I had not misunderstood anything.

@ffranr
Contributor Author

ffranr commented Mar 6, 2018

@martinghunt This work should be completed with this commit: 47e41d7

This commit also contributed to correctly implementing this issue: 2c776c2

If there are further changes to be made, let's keep them to this issue.

@iqbal-lab
Collaborator

To discuss after Easter: I think our handling of item 4 is not right.
The right thing would be to choose the site where the mate read maps closest, but that's an enhancement.
Not sure what the right thing is.
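
As a rough illustration of the mate-based idea only, the sketch below assumes each candidate mapping and the mate can be reduced to a single linear coordinate, which is a simplification for a PRG. `choose_closest_to_mate` is a hypothetical name, and ties are broken by uniform random choice.

```python
import random


def choose_closest_to_mate(candidate_positions, mate_position, rng):
    """Return the candidate position nearest the mate, breaking ties at random."""
    distances = [abs(pos - mate_position) for pos in candidate_positions]
    best = min(distances)
    closest = [pos for pos, dist in zip(candidate_positions, distances) if dist == best]
    return rng.choice(closest)


picked = choose_closest_to_mate([100, 900, 5000], 1000, random.Random())
```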

@ffranr
Contributor Author

ffranr commented May 24, 2018

@martinghunt @iqbal-lab I'm currently handling this issue: quasimap Assertion `index_end_boundary >= allele_coverage_offset' failed. I understand why the error occurs and I've implemented a partial solution. However, that issue has led me back to this issue.

I'm uncertain about how to handle point 6 and point 7 (see first post in this issue: "Quasimapping gives multiple mapping options").

Please help me understand how I should deal with this.

@iqbal-lab
Collaborator

OK, so suppose we now have

A read maps to SEVEN "positions" within the PRG:

  1. non-variant region 1
  2. non-variant region 2
  3. site 2
  4. site 2 + site 3 + site 4
  5. site 1 (within allele 1 twice or more)
  6. site 5 (completely encapsulated within allele 1 and completely encapsulated within allele 2)
  7. site 6 (completely encapsulated within allele 1, completely encapsulated within allele 2, and not completely encapsulated within allele 3).

A single option is chosen from [1, 7] via uniform random selection.
Once you have chosen, deal with it as follows:

For 1..5 this is identical to above

  1. Move on to the next read without modifying coverage information.
  2. Move on to the next read without modifying coverage information.
  3. Record coverage information for all relevant alleles.
  4. Record coverage information for all relevant alleles.
  5. Uniform random selection on options within allele 1, followed by recording coverage information for all relevant alleles.
  6. Record coverage information for all relevant alleles.
  7. Record coverage information for all relevant alleles.

A read that lies completely within an allele is handled the same way as one that partially overlaps an allele. Record the per-base coverage and the equivalence classes/partitions as normal.
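
A minimal sketch of that rule, under assumptions: per-base coverage is a list of counts per allele, the read is given as a half-open interval in allele coordinates that may run past either end, and an equivalence class is the set of allele IDs the read is compatible with at a site. The function names are hypothetical and not taken from the code base.

```python
from collections import Counter


def record_per_base_coverage(allele_coverage, read_start, read_end):
    """Increment coverage for every base of the allele that the read overlaps."""
    # Clipping to the allele bounds makes "completely within" and
    # "partially overlapping" follow exactly the same code path.
    start = max(read_start, 0)
    end = min(read_end, len(allele_coverage))
    for base in range(start, end):
        allele_coverage[base] += 1


def record_equivalence_class(class_counts, allele_ids):
    """Count one read for the set of alleles it is compatible with."""
    class_counts[frozenset(allele_ids)] += 1


coverage = [0] * 6
record_per_base_coverage(coverage, 2, 5)   # read completely within the allele
record_per_base_coverage(coverage, -2, 3)  # read overlapping the allele start
print(coverage)  # [1, 1, 2, 1, 1, 0]

class_counts = Counter()
record_equivalence_class(class_counts, [1, 2])  # e.g. option 6: alleles 1 and 2
```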

@ffranr
Contributor Author

ffranr commented May 29, 2018

@iqbal-lab Thanks! I've updated the first post.

@iqbal-lab
Collaborator

Sounds good - does this need to be open?

3 participants