New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A Table Matcher module for matching identical noisy observations across two tables. #4238
Conversation
Hello @Raahul-Singh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-06-03 16:07:26 UTC |
setup.cfg
Outdated
@@ -40,6 +40,7 @@ install_requires = | |||
pandas>=0.23.0 | |||
astropy>=3.2 | |||
parfive[ftp]>=1.1.0 | |||
scikit-learn>=0.23.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved into its own extra dependency (as below) and the imports moved into the functions with an import check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the changes I have made suffice?
Re: the discussion at the community meeting last week, I think there needs to be a larger discussion about whether this feature is really in the scope of sunpy core. I'm of course not debating whether this functionality is useful. It most certainly is! But this seems to have broad utility outside of solar physics and may be more widely used/discoverable if it was placed in a package with a broader scope than sunpy. It looks like this just works on pandas dataframes currently (please correct me if I'm wrong!). If this operated on astropy tables, perhaps it could be upstreamed to astropy? Or maybe even to pandas in its current state? I recognize there's probably a significantly larger effort involved in merging it into either of those two, particularly the latter. Thoughts? |
I'm working on making it compliant with Astropy tables. It should be done soon enough. Hopefully by the next community meeting. 😅 I also request everyone to suggest any other metrics that would be useful, besides the cosines similarity and euclidean distance. |
This adds a module to match the rows across two dataframes, where the rows correspond to the same observation but vary in the exact values of the measured values.
This module requires adding Scikit-learn as a dependency. If this is not acceptable (which I assume it isn't), I will need to implement the pairwise cosine similarity, euclidean distance, etc. algorithms which will be used here.
Currently, this supports the following metrics:
Possible impronvements:
This was earlier part of #4233 but on further discussion with @dpshelio, it was decided that this should be implemented as a separate module.