Skip to content

Implement alignment-adjusted PEP recalculation and re-ranking#8

Draft
Copilot wants to merge 4 commits intomasterfrom
copilot/update-pep-recalculation
Draft

Implement alignment-adjusted PEP recalculation and re-ranking#8
Copilot wants to merge 4 commits intomasterfrom
copilot/update-pep-recalculation

Conversation

Copy link

Copilot AI commented Oct 31, 2025

Building on PRs #1 and #7 which integrated alignment data into exports, this PR computes adjusted posterior error probabilities (PEPs) that combine MS2 and alignment evidence, then re-ranks peak groups and computes new model-based FDR.

Changes

Core Implementation

  • Added compute_adjusted_pep_and_rerank() in pyprophet/io/util.py
    • Computes pep_adj = 1 - (1 - pep_ms2) × (1 - pep_align) for aligned features
    • Reference features (alignment anchors) maintain MS2 PEP unchanged to avoid double-counting
    • Re-ranks within each (run_id, transition_group_id) by adjusted PEP
    • Computes new qvalues via compute_model_fdr on top-1 features per group

Integration

  • Called automatically in all readers (OSW, Parquet, SplitParquet)
    • Executes after data augmentation, before export
    • Gracefully skips when no alignment data present

Output Schema

  • Preserves MS2-only results: m_scorems2_m_score, peak_group_rankms2_peak_group_rank
  • New columns:
    • ms2_aligned_adj_pep: Combined PEP from MS2 and alignment
    • m_score: Model-based qvalues from adjusted PEPs (replaces old m_score)
    • peak_group_rank: New ranking based on adjusted PEPs (replaces old ranking)

Example

# Reference feature (alignment anchor)
pep_ms2 = 0.02
alignment_pep = N/A  # Reference doesn't align to itselfms2_aligned_adj_pep = 0.02  # Unchanged

# Aligned feature with weak MS2 but good alignment
pep_ms2 = 0.90
alignment_pep = 0.06ms2_aligned_adj_pep = 1 - (1-0.90)×(1-0.06) = 0.906

Documentation

  • Updated ALIGNMENT_INTEGRATION_WORKFLOW.md with detailed process overview, formula interpretation, and output column descriptions

Bug Fix

  • Fixed incorrect return statement placement in SplitParquetReader.read() that prevented proper cleanup
Original prompt

From PR's #1 and #7, we've endabled the incorproation and export of alignment results if they're present, when exporting regular standard OpenSwath results. Now the next step is to use the alignment results to perform a recalulcation of the PEPs, qvalue and re-ranking of peak-groups.

  1. What to do with the reference feature’s missing alignment_pep?

Treat it as neutral: set alignment_pep = 1.0.
Reasoning:

The reference is the anchor; the alignment statistic is defined relative to it, so it doesn’t provide extra independent evidence for itself.

Setting it to 0 (perfect support) or to pep_ms2 would double-count evidence and bias the anchor downward (over-confident).

With alignment_pep = 1, the combined posterior stays pep_adj = pep_ms2 for the reference row, which is exactly what we want.

(If you want a reporting column that summarizes group support, you can propagate the group-best adjusted score to all siblings for display, but keep the ranking/FDR based on each row’s own pep_adj and use only top-1 per (run, precursor) for FDR.)

  1. Can you reuse compute_model_fdr on pep_adj?

Yes. That function expects a vector of PEPs and returns model-based q-values (the mean PEP up to each rank). Use it like this:

Compute pep_adj = 1 - (1 - pep_ms2)·(1 - pep_align) (clip to (ε, 1−ε) as usual).

Within each (run_id, FullPeptideName, Charge) (or your key), keep top-1 = argmin(pep_adj).

Pass that top-1 vector of pep_adj to compute_model_fdr. The result is your new m_score_adj (q-values) for those winners.

If desired, join m_score_adj back to all rows and/or propagate the group-best for display.

That’s fully consistent with PyProphet’s semantics: you combined calibrated posteriors first, then computed model-based FDR on the winners. If you instead want decoy-based q-values, you’d refit the usual target/decoy q from the new score order—but for model-based FDR your compute_model_fdr is exactly the right tool.

Tiny implementation detail

For the reference row (your feature id 5405272318039692409), set alignment_pep=1.0 ⇒ pep_adj = pep_ms2 = 0.02.

For the three alignment-only rows (pep_ms2=1), pep_adj = alignment_pep (0.06, 0.14, 0.08).

Re-rank within each run/precursor by pep_adj (or -log10(pep_adj)); if you want MS2 to win on exact ties, break ties with from_alignment (MS2 preferred).

That’s it—you’ll get calibrated, conservative q-values that let alignment-rescued peaks rise when the alignment evidence is strong, while keeping the anchor unbiased.

So the end result of what we would want basically, is the old m_score, and peak_group_rank would be kept in the output tsv and renamed to ms2_m_score, and providence_peak_group_rank. We would then add the add the pep_adj (which we can name ms2_aligned_adj_pep), and m_score (which would represent the new qvalue from the new adjusted PEPs).


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@coderabbitai
Copy link

coderabbitai bot commented Oct 31, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI and others added 3 commits October 31, 2025 04:05
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Copilot AI changed the title [WIP] Update PEP recalculation using alignment results Implement alignment-adjusted PEP recalculation and re-ranking Oct 31, 2025
Copilot AI requested a review from singjc October 31, 2025 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants