Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge command returned nothing when input msdfs had no confidence column #350

Merged
merged 8 commits into from
Mar 16, 2023

Conversation

hrshdhgd
Copy link
Contributor

@hrshdhgd hrshdhgd commented Mar 13, 2023

Fixes #348

@hrshdhgd hrshdhgd requested a review from matentzn March 13, 2023 19:09
@hrshdhgd
Copy link
Contributor Author

I made this decision. Does this seem logical @matentzn ?

@hrshdhgd hrshdhgd changed the title merge returned nothing when msdfs had no confidence column merge command returned nothing when msdfs had no confidence column Mar 13, 2023
@hrshdhgd hrshdhgd changed the title merge command returned nothing when msdfs had no confidence column merge command returned nothing when input msdfs had no confidence column Mar 13, 2023
@hrshdhgd hrshdhgd marked this pull request as ready for review March 13, 2023 19:11
sssom/util.py Outdated
@@ -273,6 +273,8 @@ def filter_redundant_rows(
:param ignore_predicate: If true, the predicate_id column is ignored, defaults to False
:return: Filtered pandas DataFrame
"""
if CONFIDENCE not in df.columns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if we assumed that missing confidence means 100% confidence? I think this assumption is more often true. Also, if you do ad the IF clause here, the dataframe will not be sorted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized, line 280 has assign_default_confidence which adds a new column 'confidence' if absent BUT it initiates to np.NaN. Is this what we still intend to do?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but it is important that the absence of a confidence value does not lead to a removal of the entire mapping! While it is not the same as confidence 1.0, it should probably be interpreted as such during reconciliation.. Unless you see a strong reason against it?

Copy link
Contributor Author

@hrshdhgd hrshdhgd Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of np.NaN is causing the df returned to be empty in the code between lines 297 to 311

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to itiniate it to "0.0" instead of np.Nan

@hrshdhgd
Copy link
Contributor Author

Added a flag: confidence_column_created , thoughts?

@hrshdhgd hrshdhgd self-assigned this Mar 15, 2023
@hrshdhgd hrshdhgd requested a review from matentzn March 15, 2023 15:56
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a xomment

sssom/util.py Outdated Show resolved Hide resolved
sssom/util.py Outdated Show resolved Hide resolved
sssom/util.py Outdated Show resolved Hide resolved
hrshdhgd and others added 2 commits March 16, 2023 12:07
Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
@hrshdhgd
Copy link
Contributor Author

Good catch! sorry about that!

Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case for reconcile false?

@hrshdhgd
Copy link
Contributor Author

Done

@hrshdhgd hrshdhgd requested a review from matentzn March 16, 2023 19:32
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment

@matentzn
Copy link
Collaborator

Nice! Thanks!

@hrshdhgd hrshdhgd merged commit 66eb981 into master Mar 16, 2023
@hrshdhgd hrshdhgd deleted the issue-348 branch March 16, 2023 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

merge two data frames with --reconcile true leads to empty result
2 participants