Skip to content

Conversation

@fhamood
Copy link
Contributor

@fhamood fhamood commented May 20, 2025

No description provided.

@fhamood fhamood requested a review from Copilot May 20, 2025 14:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Introduce a new keep_all ambiguity mode to capture all PSMs in clusters, refine related utilities, and update argument handling and parsing.

  • Added list_all helper and expanded transfer to support keep_all with optional overwrite behavior
  • Refined type hints in utils, updated CLI help, and ensured scanID is cast to int in TMT processing
  • Tweaked cleanup in main, corrected docstring in evidence.py, and broadened pyarrow version in pyproject.toml

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
simsi_transfer/utils/utils.py Adjusted type hints, removed unused imports, and added list_all
simsi_transfer/transfer.py Added keep_all mode, overwrite flag, new aggregation logic
simsi_transfer/tmt_processing.py Cast scanID to int for both MS2 and MS3 extraction
simsi_transfer/main.py Left TODO comment, changed variable deletion order
simsi_transfer/evidence.py Updated docstring params to reflect current signature
simsi_transfer/command_line_interface.py Enhanced help for --ambiguity_decision with new mode
pyproject.toml Expanded pyarrow version constraint
Comments suppressed due to low confidence (3)

simsi_transfer/command_line_interface.py:107

  • Consider adding choices=['majority', 'all', 'keep_all'] to this argument to enforce valid inputs and catch typos early.
apars.add_argument('--ambiguity_decision', default='majority', metavar="S",

simsi_transfer/utils/utils.py:69

  • The type hint no longer accepts Set[Any] even though the implementation handles sets. Restore Union[List[Any], Set[Any]] to keep the signature accurate.
def remove_nan_values(input_list: List[Any]) -> List[Any]:

simsi_transfer/tmt_processing.py:171

  • The indentation and parentheses look misaligned for the int(re.search(...)[1]) call. Verify proper indentation and balanced parentheses to prevent syntax errors.
scanseries["scanID"] = int(re.search(r"scan=(\d+)", item["id"])[1])

return ";".join(sorted(set([x for x in s.split(";") if len(x) > 0])))


def list_all(x: pd.Series):
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new list_all utility lacks a return type annotation and docstring. Consider adding -> List[Any] and a brief description for consistency.

Suggested change
def list_all(x: pd.Series):
def list_all(x: pd.Series) -> List[Any]:
"""
Converts a pandas Series to a list.
:param x: A pandas Series object.
:return: A list containing the elements of the Series.
"""

Copilot uses AI. Check for mistakes.


def transfer(summary_df, max_pep=False, mask=False, ambiguity_decision='majority'):
def transfer(summary_df, max_pep=False, mask=False, ambiguity_decision='majority', overwrite=False):
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new overwrite parameter isn’t exposed in the CLI or documented. Add a corresponding flag or update the documentation so users can enable it.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +108
summary_df.loc[:, replacement_dict.keys()] = summary_df.loc[:, replacement_dict.values()]
else:
# Aren't all of these cells by definition NaNs? Can we make use of that?
# Like 'Merge, and if you find two columns with the same name always use the value that is not NaN!'
# Then we could do this directly in the merge and wouldn't have to go over the df again here
# Try pd.DataFrame.combine(). For this the dfs would need to be in the same shape though.
summary_df[identification_column] == 't', replacement_dict.keys()] = summary_df.loc[
summary_df[identification_column] == 't', replacement_dict.values()].to_numpy()
summary_df = summary_df.astype({col: 'object' for col in replacement_dict.keys()})
for col in replacement_dict.keys():
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using replacement_dict.keys() returns a dict_keys object, which may not work in loc. Convert to a list or explicit column list to avoid indexing errors.

Copilot uses AI. Check for mistakes.
Comment on lines 172 to 174

del msms_simsi, evidence_simsi
del msms_simsi

Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Previously both msms_simsi and evidence_simsi were deleted together; now only one is freed here and the other earlier. Review cleanup logic to ensure consistent memory release.

Copilot uses AI. Check for mistakes.
Comment on lines +319 to +321
:param tempfile: Temporary processing file, e.g., MaRaCluster clustering dataframe
:param msms_df: MaxQuant msms.txt dataframe
:return: Merged dataframe
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring parameters no longer match the function signature. Update or remove entries to reflect actual arguments (tempfile, msms_df, etc.).

Suggested change
:param tempfile: Temporary processing file, e.g., MaRaCluster clustering dataframe
:param msms_df: MaxQuant msms.txt dataframe
:return: Merged dataframe
:param summary: Summary dataframe containing information about MS2 scans.
:param evidence: Evidence dataframe containing precursor information.
:param allpeptides: AllPeptides dataframe containing peptide-level information.
:param plex: Integer specifying the plex level for the experiment.
:param num_threads: Number of threads to use for parallel processing.
:return: A concatenated dataframe containing merged evidence information.

Copilot uses AI. Check for mistakes.
fhamood and others added 3 commits May 20, 2025 16:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@fhamood fhamood merged commit e1387ac into master May 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants