Update batch processing length normalization to match non-batch processing length normalization #441

Merged
merged 4 commits into from
May 16, 2024

Conversation

bound-to-love
Collaborator

No description provided.

@mschilli87 mschilli87 left a comment

I had a quick look and think this needs work.

src/main.cpp Outdated
//<< " --error_rate Estimated error rate of long reads (required for --long)" << endl
<< " --threshold Threshold for rate of unmapped kmers per read" << endl

This looks accidental.

Collaborator Author

Could you clarify what looks accidental?

According to the diff on GitHub, you are removing whitespace here:
[image]
In line 2073 you fix the alignment of 'Treat', but in line 2075 'Threshold' is shifted too far to the left.
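
One way to keep help-text columns from drifting like this is to pad the option name to a fixed width instead of aligning it with hand-typed spaces. The following is only a minimal sketch: `formatOption` and the column width are hypothetical and are not part of the code under review.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not taken from src/main.cpp): pads the option name to a
// fixed column width so that every description starts in the same column.
// std::setw does not truncate, so an option longer than column_width is
// followed immediately by its description.
std::string formatOption(const std::string& option,
                         const std::string& description,
                         int column_width = 30) {
  std::ostringstream out;
  out << std::left << std::setw(column_width) << option << description;
  return out.str();
}
```

With a helper like this, a usage line could be written as `cout << formatOption("    --threshold", "Threshold for rate of unmapped kmers per read") << endl;`, so changing an option name would no longer require re-counting spaces.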

@@ -2180,7 +2180,7 @@ void usageTCCQuant(bool valid_input = true) {
<< " (default: equivalence classes are taken from the index)" << endl
<< "-f, --fragment-file=FILE File containing fragment length distribution" << endl
<< " (default: effective length normalization is not performed)" << endl
<< "--long Use version of EM for long reads " << endl

See above.

Collaborator Author

Could you clarify your question?

The same problem as in line 2075.

@@ -2380,7 +2380,7 @@ int main(int argc, char *argv[]) {
if (fld_lr_c[i] > 0.5) {
//Good results with comment below.
//flensout_f << std::fabs((double)fld_lr[i] / (double)fld_lr_c[i] - index.k);//index.target_lens_[i] - (double)fld_lr[i] / (double)fld_lr_c[i] - k); // take mean of recorded uniquely aligning read lengths
flensout_f << std::fabs(index.target_lens_[i] - ((double)fld_lr[i] / (double)fld_lr_c[i]) - index.k);

Care to elaborate a bit? Ideally in a comment, or else at least in the commit message?

Collaborator Author

Hi! Based on our analysis for effective length normalization for long reads, the updated effective length provides better results.

I still don't follow what you are changing and why, but I am not familiar with the code. So if this makes sense to others without further explanation, feel free to ignore my comment.
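
For readers trying to follow the hunk, the two expressions that appear in it (the commented-out line and the active line) can be written out side by side. This is a rough sketch under assumptions: the function and parameter names are hypothetical, and reading `fld_lr[i]` as a sum of read lengths and `fld_lr_c[i]` as the matching read count is inferred only from the trailing comment about taking the mean of recorded read lengths. The hunk alone does not show which expression the PR replaced.

```cpp
#include <cmath>

// Expression in the commented-out line: |mean read length - k|.
// length_sum and read_count stand in for fld_lr[i] and fld_lr_c[i] (assumed meaning).
double meanReadLengthMinusK(double length_sum, double read_count, int k) {
  return std::fabs(length_sum / read_count - k);
}

// Expression in the active line: |target length - mean read length - k|.
// target_len stands in for index.target_lens_[i].
double targetMinusMeanReadLengthMinusK(double target_len, double length_sum,
                                       double read_count, int k) {
  return std::fabs(target_len - length_sum / read_count - k);
}
```

In the hunk, both expressions sit inside the `fld_lr_c[i] > 0.5` check, so the division only happens when the count is non-zero; the sketch leaves that guard to the caller. For example, with a length sum of 3000 over 3 reads and k = 31, the first expression gives 969, and for a target of length 1500 the second gives 469.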

@mschilli87 mschilli87 left a comment

@bound-to-love: I tried to answer your questions.

3 participants