-
Notifications
You must be signed in to change notification settings - Fork 4
Keep all modseq mode for ProSIMSIt #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Introduce a new keep_all ambiguity mode to capture all PSMs in clusters, refine related utilities, and update argument handling and parsing.
- Added
list_allhelper and expandedtransferto supportkeep_allwith optional overwrite behavior - Refined type hints in
utils, updated CLI help, and ensuredscanIDis cast to int in TMT processing - Tweaked cleanup in
main, corrected docstring inevidence.py, and broadenedpyarrowversion inpyproject.toml
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| simsi_transfer/utils/utils.py | Adjusted type hints, removed unused imports, and added list_all |
| simsi_transfer/transfer.py | Added keep_all mode, overwrite flag, new aggregation logic |
| simsi_transfer/tmt_processing.py | Cast scanID to int for both MS2 and MS3 extraction |
| simsi_transfer/main.py | Left TODO comment, changed variable deletion order |
| simsi_transfer/evidence.py | Updated docstring params to reflect current signature |
| simsi_transfer/command_line_interface.py | Enhanced help for --ambiguity_decision with new mode |
| pyproject.toml | Expanded pyarrow version constraint |
Comments suppressed due to low confidence (3)
simsi_transfer/command_line_interface.py:107
- Consider adding
choices=['majority', 'all', 'keep_all']to this argument to enforce valid inputs and catch typos early.
apars.add_argument('--ambiguity_decision', default='majority', metavar="S",
simsi_transfer/utils/utils.py:69
- The type hint no longer accepts
Set[Any]even though the implementation handles sets. RestoreUnion[List[Any], Set[Any]]to keep the signature accurate.
def remove_nan_values(input_list: List[Any]) -> List[Any]:
simsi_transfer/tmt_processing.py:171
- The indentation and parentheses look misaligned for the
int(re.search(...)[1])call. Verify proper indentation and balanced parentheses to prevent syntax errors.
scanseries["scanID"] = int(re.search(r"scan=(\d+)", item["id"])[1])
| return ";".join(sorted(set([x for x in s.split(";") if len(x) > 0]))) | ||
|
|
||
|
|
||
| def list_all(x: pd.Series): |
Copilot
AI
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new list_all utility lacks a return type annotation and docstring. Consider adding -> List[Any] and a brief description for consistency.
| def list_all(x: pd.Series): | |
| def list_all(x: pd.Series) -> List[Any]: | |
| """ | |
| Converts a pandas Series to a list. | |
| :param x: A pandas Series object. | |
| :return: A list containing the elements of the Series. | |
| """ |
|
|
||
|
|
||
| def transfer(summary_df, max_pep=False, mask=False, ambiguity_decision='majority'): | ||
| def transfer(summary_df, max_pep=False, mask=False, ambiguity_decision='majority', overwrite=False): |
Copilot
AI
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new overwrite parameter isn’t exposed in the CLI or documented. Add a corresponding flag or update the documentation so users can enable it.
| summary_df.loc[:, replacement_dict.keys()] = summary_df.loc[:, replacement_dict.values()] | ||
| else: | ||
| # Aren't all of these cells by definition NaNs? Can we make use of that? | ||
| # Like 'Merge, and if you find two columns with the same name always use the value that is not NaN!' | ||
| # Then we could do this directly in the merge and wouldn't have to go over the df again here | ||
| # Try pd.DataFrame.combine(). For this the dfs would need to be in the same shape though. | ||
| summary_df[identification_column] == 't', replacement_dict.keys()] = summary_df.loc[ | ||
| summary_df[identification_column] == 't', replacement_dict.values()].to_numpy() | ||
| summary_df = summary_df.astype({col: 'object' for col in replacement_dict.keys()}) | ||
| for col in replacement_dict.keys(): |
Copilot
AI
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using replacement_dict.keys() returns a dict_keys object, which may not work in loc. Convert to a list or explicit column list to avoid indexing errors.
|
|
||
| del msms_simsi, evidence_simsi | ||
| del msms_simsi | ||
|
|
Copilot
AI
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Previously both msms_simsi and evidence_simsi were deleted together; now only one is freed here and the other earlier. Review cleanup logic to ensure consistent memory release.
| :param tempfile: Temporary processing file, e.g., MaRaCluster clustering dataframe | ||
| :param msms_df: MaxQuant msms.txt dataframe | ||
| :return: Merged dataframe |
Copilot
AI
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring parameters no longer match the function signature. Update or remove entries to reflect actual arguments (tempfile, msms_df, etc.).
| :param tempfile: Temporary processing file, e.g., MaRaCluster clustering dataframe | |
| :param msms_df: MaxQuant msms.txt dataframe | |
| :return: Merged dataframe | |
| :param summary: Summary dataframe containing information about MS2 scans. | |
| :param evidence: Evidence dataframe containing precursor information. | |
| :param allpeptides: AllPeptides dataframe containing peptide-level information. | |
| :param plex: Integer specifying the plex level for the experiment. | |
| :param num_threads: Number of threads to use for parallel processing. | |
| :return: A concatenated dataframe containing merged evidence information. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.