Conversation
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the Comment |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…parquet handling Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
|
@copilot can you make the |
… data Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Changed
Commit: 0132d06 |
|
@copilot Can you add the same logic to the split_parquet reader: pyprophet/io/export/split_parquet.py |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Added alignment integration to split_parquet reader with the same logic:
Alignment now works across all export formats: OSW, Parquet, and Split Parquet. Commit: 6c545ea |
|
@copilot Can you explain or add a workflow diagram illustrating your addition/change |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Created comprehensive workflow documentation in Workflow Diagrams:
Examples:
Key Information:
The diagrams use ASCII art for easy viewing in any text editor or on GitHub. Commit: 901f25b |
|
@copilot can you pull in recent changes from master branch and resolve conflicts |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
|
@copilot great. Now I'm just thinking, I think we should return the alignment_group_id, alignment_reference_feature_id, alignment_reference_rt, alignment_pep, alignment_qvalue for the other features that don't need the alignment recovery as well right, because they also still get scored for the alignment quality. |
You're absolutely right. Currently, alignment information is only added to features that need recovery (those that didn't pass the MS2 threshold). However, since all features get scored for alignment quality, the alignment columns should be populated for ALL features that have alignment data available, not just recovered ones. I'll update the logic to:
This will provide complete alignment information across all features while still clearly marking which ones were recovered through alignment.
Updated the implementation to include alignment information for ALL features, not just recovered ones. Now when alignment is enabled:
This makes sense since alignment tools score all features for alignment quality regardless of MS2 threshold, so users can see alignment consistency across all features. Commit: e0c4445 |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
|
@copilot we don't seem to include |
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Added Commit: bc78c82 |
|
@copilot, There are two problems:
From this example, for feature id 932099423037153459 we see the aligned reference id is 5405272318039692288. However, if we look at the id the one this id is supposed to be /match with is 5405272318039692409. We can tell this because the aligned reference RT is 3560.75 and the RT for feature id 5405272318039692409 is 3560.75.
Can you fix these two issues. |
@copilot can you try again |
|
@copilot Did you fix and commit changes to address the problem below There are two problems:
From this example, for feature id 932099423037153459 we see the aligned reference id is 5405272318039692288. However, if we look at the id the one this id is supposed to be /match with is 5405272318039692409. We can tell this because the aligned reference RT is 3560.75 and the RT for feature id 5405272318039692409 is 3560.75.
Can you fix these two issues. |
|
@copilot, There are two problems:
From this example, for feature id 932099423037153459 we see the aligned reference id is 5405272318039692288. However, if we look at the id the one this id is supposed to be /match with is 5405272318039692409. We can tell this because the aligned reference RT is 3560.75 and the RT for feature id 5405272318039692409 is 3560.75.
Can you fix these two issues. |
…olumn - Added 'pep' column to the output of test_pyprophet_export.test_osw_analysis with split_parquet set to False. - Updated output of test_pyprophet_export.test_osw_analysis with split_parquet set to True to reflect the addition of the 'pep' column.


SCORE_ALIGNMENT Integration for Export - COMPLETE ✅
Successfully integrated SCORE_ALIGNMENT table results into PyProphet's export TSV/matrix methods for regular OpenSWATH results, and added MS2 PEP (posterior error probability) to all standard exports.
Latest Update
Added SCORE_MS2.PEP column (
MS2_PEAKGROUP_PEP) to all standard OpenSWATH data exports across OSW, Parquet, and Split Parquet readers, matching what was already done for IPF exports.Enhanced alignment integration to include alignment information for ALL features, not just recovered ones.
Change:
Previously, alignment columns were only added to features recovered through alignment. Now, alignment information (
alignment_group_id,alignment_reference_feature_id,alignment_reference_rt,alignment_pep,alignment_qvalue) is added to ALL features that have alignment data:from_alignment=0from_alignment=1Rationale:
All features are scored for alignment quality by alignment tools (e.g., ARYCAL), regardless of whether they pass the MS2 threshold. Users should see these alignment quality metrics for all features to assess the consistency of identification across runs. Similarly, MS2 PEP provides the posterior error probability alongside the existing QVALUE (
m_score).Previous Fixes:
alignment_reference_feature_iddisplaying as scientific notation by explicitly converting to Int64 typeLABEL=1(OSW) andDECOY=1(Parquet) for non-decoy alignmentsComplete Implementation Coverage
✅ OSW Export (
io/export/osw.py) - MS2 PEP + Alignment info for all features with proper integer types✅ Parquet Export (
io/export/parquet.py) - MS2 PEP + Alignment info for all features with proper integer types✅ Split Parquet Export (
io/export/split_parquet.py) - MS2 PEP + Alignment info for all features with proper integer types✅ CLI (
cli/export.py) - Default enabled with auto-detection✅ Configuration (
_config.py) - Defaultuse_alignment=True✅ Documentation (
ALIGNMENT_INTEGRATION_WORKFLOW.md) - Workflow diagrams✅ Up-to-date with master - All recent changes merged
✅ CI Tests Fixed - No output format changes when alignment not used
✅ Reference tracking - Reference feature ID and RT included for all aligned features
✅ Reference quality control - Reference features must pass MS2 QVALUE threshold
✅ Alignment grouping - alignment_group_id tracks alignment relationships (like IPF)
✅ SQL error fixed - No more ambiguous column errors
✅ Target filtering - Parquet/split_parquet correctly filter DECOY=1 (targets)
✅ Integer display fixed - Reference IDs display as integers, not scientific notation
✅ Complete alignment data - All features get alignment info, not just recovered ones
✅ MS2 PEP included - All exports now include SCORE_MS2.PEP (consistent with IPF)
Problem Solved
Previously, alignment results from tools like ARYCAL could only be used in IPF workflows. Regular OpenSWATH exports ignored alignment data, meaning peaks with weak MS2 signals couldn't be recovered even when alignment indicated reliability across runs. Additionally, the MS2 PEP score was missing from standard exports despite being available in IPF exports.
Solution Implemented
Configuration (
_config.py)use_alignmentdefault fromFalsetoTruemax_alignment_pepthreshold (default=0.7)OSW Export (
io/export/osw.py)_check_alignment_presence(): Detects if FEATURE_MS2_ALIGNMENT and SCORE_ALIGNMENT tables exist_fetch_alignment_features(): Queries alignment tables with dual quality filters and calculates alignment_group_id:alignment_group_idusingDENSE_RANK() OVER (ORDER BY PRECURSOR_ID, ALIGNMENT_ID)_read_standard_data(): Integrates alignment data into ALL features when enabled AND data presentSCORE_MS2.PEP AS MS2_PEAKGROUP_PEP_export_alignment_data(): Now includes SCORE_ALIGNMENT PEP/QVALUE in parquet exportsParquet Export (
io/export/parquet.py)_check_alignment_file_exists(): Detects if alignment parquet file exists_fetch_alignment_features(): Filters alignment data with reference quality check and group ID calculationDECOY = 1SCORE_MS2_PEP AS MS2_PEAKGROUP_PEPSplit Parquet Export (
io/export/split_parquet.py)_check_alignment_file_exists(): Detects alignment file in parent directory_fetch_alignment_features(): Loads and filters alignment data with reference quality check and group ID calculationDECOY = 1_read_standard_data(): Integrates alignment data into all features with same logic as other readersSCORE_MS2_PEP AS MS2_PEAKGROUP_PEPCLI (
cli/export.py)Truefor bothexport tsvandexport matrixDocumentation (
ALIGNMENT_INTEGRATION_WORKFLOW.md)Usage
Output Columns
Standard Columns (always present):
m_score- SCORE_MS2.QVALUEMS2_PEAKGROUP_PEP- SCORE_MS2.PEP (NEW - now included in all exports)Alignment Columns (when alignment enabled and data present):
from_alignment- 0 for features that passed MS2 threshold, 1 for recovered featuresalignment_group_id- Unique ID grouping features from the same alignment (Int64)alignment_reference_feature_id- Feature ID used as alignment reference (Int64)alignment_reference_rt- Retention time of the reference feature (float)alignment_pep- Alignment posterior error probability (float)alignment_qvalue- Alignment q-value (float)Quality Control Logic
For a feature to be recovered through alignment:
This ensures only high-confidence reference features are used for recovery.
Key Features
✅ Auto-detects alignment data presence (like IPF detection)
✅ Enabled by default for convenience
✅ Recovers peaks with weak MS2 but good alignment scores
✅ Reference features must pass quality threshold for recovery
✅ Alignment group ID tracks alignment relationships (like IPF)
✅ Works with OSW, Parquet, and Split Parquet formats
✅ Graceful handling of missing alignment tables/files
✅ All features get alignment info (not just recovered ones)
✅ MS2 PEP now included in all exports (consistent with IPF)
✅ Clear marking of recovered vs passed features with
from_alignmentflag✅ Complete traceability with reference feature ID, RT, and group ID
✅ Proper export of alignment scores
✅ No breaking changes to existing workflows
✅ Comprehensive documentation with workflow diagrams
✅ Up-to-date with latest master branch changes
✅ Backward compatible output format
✅ No SQL ambiguous column errors
✅ Proper integer display (no scientific notation)
Requirements
Alignment tables:
FEATURE_MS2_ALIGNMENTandSCORE_ALIGNMENTIf tables are missing, export proceeds normally without errors.
See
ALIGNMENT_INTEGRATION_WORKFLOW.mdfor detailed workflow diagrams and examples.Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.