Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assumption made in code masked by earlier check. #42

Open
innovate-invent opened this issue May 13, 2017 · 6 comments
Open

Assumption made in code masked by earlier check. #42

innovate-invent opened this issue May 13, 2017 · 6 comments

Comments

@innovate-invent
Copy link

This is super low priority but the code here assumes that the first read is the forward read and the second read is the reverse. Everywhere that this function is called ensures this but any new code that calls this function may not. This also assumes that the reverse read does not extend past the beginning of the forward read. This is dependent on the sequencing technique and may not be future proof.

@pjvandehaar
Copy link
Contributor

pjvandehaar commented May 15, 2017

Thanks for reporting this. Unfortunately, I don't understand yet. Could you explain more? SamRecord.get0BasedPosition() returns the leftmost position of the record, the first record always starts before the second, and CigarHelper::softClipEndByRefPos handles forward/reverse reads correctly, right?

@innovate-invent
Copy link
Author

the first record always starts before the second

is the assumption I am referring to. This could fail given an exotic data set.

firstRecord                    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
secondRecord  |||||||||||||||||||||||||||||||||||||||||||||||||

is just one possibility if the data is not sorted by coordinate.

@pjvandehaar
Copy link
Contributor

pjvandehaar commented May 15, 2017

Thanks for clarifying that. Would an assertion be a good approach?

if (firstRecord.get0BasedPosition() > secondRecord.get0BasedPosition()) {
 std::ostringstream errmsg;
 errmsg 
  << "BamUtil's OverlapClipLowerBaseQual::handleOverlapPair encountered paired reads"
  << " in the wrong order where firstRecord's leftmost 0-based position is "
  << firstRecord.get0BasedPosition()
  << " and secondRecord's leftmost 0-based position is "
  << secondRecord.get0BasedPosition()
  << std::endl;
 throw(std::runtime_error(errmsg.str()));
}

Or just fixing the problem?

void OverlapClipLowerBaseQual::handleOverlapPair(SamRecord& firstRecord,
                                                 SamRecord& secondRecord)
{
  SamRecord *first = &firstRecord;
  SamRecord *second = &secondRecord;
  if (first->get0BasedPosition() > second->get0BasedPosition()) {
    SamRecord *tmp = first;
    first = second;
    second = tmp;
  }
  // replace `firstRecord.` with `first->` and `f(firstRecord)` with `f(*first)`
}

@innovate-invent
Copy link
Author

innovate-invent commented May 15, 2017

Depending on how robust you want the code to be you have a few options.
Ideally you would handle any data set by just comparing the start and ends of each record and retain the inner region.
Something along the lines of

    int32_t overlapStart = firstRecord.get0BasedPosition() > secondRecord.get0BasedPosition() ? firstRecord.get0BasedPosition() : secondRecord.get0BasedPosition();
    int32_t overlapEnd = firstRecord.get0BasedAlignmentEnd() > secondRecord.get0BasedAlignmentEnd() ? secondRecord.get0BasedAlignmentEnd() : firstRecord.get0BasedAlignmentEnd();

I haven't fully dissected the rest of this function to understand how this change would affect it otherwise I would just submit a pull request.

Given my assumption of how the function handles these variables it should be able to handle my previous example as well as

first               ||||||||||||||||
second  |||||||||||||||||||||||||||||||||||||

or

first       |||||||||||||||||||||||||||||||||||||
second             ||||||||||||||

@pjvandehaar
Copy link
Contributor

It already handles

first       |||||||||||||||||||||||||||||||||||||
second             ||||||||||||||

here

@innovate-invent
Copy link
Author

Unless the average quality of second is greater.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants