Modified search to take in multiple strings #4650

Merged
kratman merged 27 commits into pybamm-team:develop from medha-14:multi_string
Dec 26, 2024
Conversation

@medha-14
Contributor

@medha-14 medha-14 commented Dec 9, 2024

Description

Fixes #4629

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python -m pytest (or $ nox -s tests)
  • The documentation builds: $ python -m pytest --doctest-plus src (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ nox -s quick.

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@medha-14 medha-14 requested a review from a team as a code owner December 9, 2024 10:14
@medha-14
Contributor Author

medha-14 commented Dec 9, 2024

I have modified the search method to accept multiple strings. In cases where an exact match is not found, I pass the concatenated string of the multiple inputs to the get_close_matches method. Will this approach suffice for what we are trying to achieve here?
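
For illustration, the approach described in this comment could be sketched roughly as follows. This is not the code from the PR: `search_keys` is a hypothetical helper that works on a plain list of names, and joining the keys with a space is an assumption. Only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib


def search_keys(names, keys):
    """Hypothetical helper: find names containing every search key.

    Falls back to fuzzy matching on the joined keys when nothing matches.
    Returns a pair (substring_matches, close_matches).
    """
    if isinstance(keys, str):
        keys = [keys]
    lowered = [k.lower() for k in keys]

    # Substring match: keep names that contain all of the keys.
    exact = sorted(n for n in names if all(k in n.lower() for k in lowered))
    if exact:
        return exact, []

    # No match: pass the concatenated keys to get_close_matches.
    # Joining with a space is an assumption made for this sketch.
    joined = " ".join(keys)
    return [], difflib.get_close_matches(joined, names, n=3, cutoff=0.5)
```

One consequence of concatenating the keys is that a single unmatched key turns the whole query into a fuzzy search, which is the behaviour questioned later in this thread.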

@codecov

codecov Bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.22%. Comparing base (a7253b8) to head (2e59daa).
Report is 121 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4650      +/-   ##
===========================================
- Coverage    99.22%   99.22%   -0.01%     
===========================================
  Files          303      303              
  Lines        23070    23102      +32     
===========================================
+ Hits         22891    22922      +31     
- Misses         179      180       +1     


Contributor

@kratman kratman left a comment

I did not test the code changes locally, but they were just renamings.

Can you add a test with multiple keys in the search? It looks like you only fixed the test for the output formatting.

Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
@medha-14
Contributor Author

I have implemented the suggested changes and added tests for searching multiple strings as well.

Member

@brosaplanella brosaplanella left a comment

Looks good to me, just needs an entry to the CHANGELOG before merging

Member

@agriyakhetarpal agriyakhetarpal left a comment

Thanks, @medha-14! Happy to approve/merge after these suggestions.

Comment thread CHANGELOG.md
Comment thread src/pybamm/util.py
@agriyakhetarpal
Member

Actually, in the case of a partial match, would it be better to indicate that it is such? For example,

model.variables.search(["NotAVariable", "concentration"])

now returns

No results for search using '['NotAVariable', 'concentration']'.
Best matches are ['Electrolyte concentration']

but we could say something like

Partial match for key 'concentration' in search using keys '['NotAVariable', 'concentration']'.
Best matches are ['Electrolyte concentration']

because we do have a match here for "concentration", but not for "NotAVariable".
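
A minimal sketch of how such a message could be assembled is shown here. `describe_search` is a hypothetical helper over a plain list of names, not PyBaMM's implementation, and the exact wording of the messages is an assumption loosely based on the proposal in this comment.

```python
def describe_search(names, keys):
    """Hypothetical helper: report which keys matched at least one name."""
    # A key counts as matched if it is a substring of any name (case-insensitive).
    matched = [k for k in keys if any(k.lower() in n.lower() for n in names)]
    hits = sorted(
        n for n in names if any(k.lower() in n.lower() for k in matched)
    )

    if not matched:
        return f"No results for search using keys {keys}."
    if len(matched) == len(keys):
        header = f"Results for search using keys {keys}."
    else:
        # Only some of the keys matched, so say which ones did.
        header = f"Partial match for keys {matched} in search using keys {keys}."
    return f"{header}\nBest matches are {hits}"
```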

@medha-14
Contributor Author

medha-14 commented Dec 11, 2024

In cases where some keys have an exact match while others only have a best match, what should the expected result be? Should we only prioritize the exact matches in such cases or should we also have the best matches printed separately?

@agriyakhetarpal
Member

> In cases where some keys have an exact match while others only have a best match, what should the expected result be? Should we only prioritize the exact matches in such cases or should we also have the best matches printed separately?

Do you mean the case where we have an exact match for $m$ keys and a partial/best match for $n - m$ keys? Yes, we should print both the exact matches and the best matches according to the keys. Could you share an example?

If I understood your question correctly, then an input as follows:

model.variables.search(["Electrolyte concentration", "Electrolite concentration"])

should return something, in my opinion, like:

Results matched against 'Electrolyte concentration' in search:
Electrolyte concentration

Partial match for 'Electrolite concentration' in search:
Best matches are ['Electrolyte concentration', 'Electrode potential']

We can figure out the best way to display the output later. There is also a case to be made to say that this improvement to the search functionality to accept multiple strings means that the result is returned for only the string that does return a match, but we are not really implementing a search engine, so I feel it is acceptable to have all results for all input strings (as if we are looping over them in the search). @brosaplanella, what do you think?

@medha-14
Contributor Author

For now, I have modified the method to first search for exact matches that contain all of the search keys. If no such matches are found, it gives search results for each term individually.

model.variables.search(["electrolyte", "concentration"])

Since both terms are present together in a single key, the result will be:

 Results for 'Electrolyte concentration': ['Electrolyte concentration']

For the cases where there are no such matches it will iterate over each string individually and give results as such:

model.variables.search(["RandomKey", "electrolyte concentration", "electrolite"])

will give results as:

No matches found for 'RandomKey'.
Exact matches for 'electrolyte concentration': ['Electrolyte concentration [Molar]', 'Electrolyte concentration [mol.m-3]']
No exact matches found for 'electrolite'. Best matches are: ['Electrolyte potential [V]']
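
The two-stage behaviour described above could be sketched like this. It is an illustration under stated assumptions rather than the merged code: `search`, its `min_similarity` parameter, and the returned dictionary layout are all hypothetical, and names are held in a plain list.

```python
import difflib


def search(names, keys, min_similarity=0.4):
    """Hypothetical two-stage search over a list of names."""
    lowered = [k.lower() for k in keys]

    # Stage 1: names that contain every key at once.
    combined = sorted(n for n in names if all(k in n.lower() for k in lowered))
    if combined:
        return {" ".join(keys): ("exact", combined)}

    # Stage 2: nothing contains all keys, so report each key separately.
    results = {}
    for key, low in zip(keys, lowered):
        exact = sorted(n for n in names if low in n.lower())
        if exact:
            results[key] = ("exact", exact)
            continue
        # Fuzzy fallback for this key only.
        close = difflib.get_close_matches(key, names, n=3, cutoff=min_similarity)
        results[key] = ("best", close) if close else ("none", [])
    return results
```

Returning a structure instead of printing keeps the matching logic easy to test; a caller could format it into the three message styles shown in this comment.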

Member

@agriyakhetarpal agriyakhetarpal left a comment

Almost there. It's looking good, but I tested with a few adversarial inputs – we should raise a helpful error where applicable for them, and an associated test case that catches it as well:

For example, for:

model.variables = {
    "Concentration [mol.m-3]": 0,
    "Surface concentration [mol.m-3]": 1,
    "Flux [mol.m-2.s-1]": 2,
}

model.variables.search([""]) returns Results for '': ['Concentration [mol.m-3]', 'Flux [mol.m-2.s-1]', 'Surface concentration [mol.m-3]'], but it should ask the user to input a non-empty string instead.
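
The empty string matches everything because, in Python, `"" in s` is true for any string `s`. One way to guard against this is sketched here. `validate_search_keys` is a hypothetical helper, and the exception types and messages are assumptions; the PR may handle this case differently.

```python
def validate_search_keys(keys):
    """Hypothetical check: normalise keys and reject empty search strings."""
    if isinstance(keys, str):
        keys = [keys]
    if not all(isinstance(k, str) for k in keys):
        raise TypeError("Search keys must be strings.")

    # Strip whitespace so that "   " is treated the same as "".
    cleaned = [k.strip() for k in keys]
    if not cleaned or any(not k for k in cleaned):
        raise ValueError("Please provide a non-empty search term.")
    return cleaned
```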

Another case, here I tried the following (with the same model.variables):

In [4]: model.variables.search(["abcd", "concentr"])
No matches found for 'abcd'.
Exact matches for 'concentr': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [5]: model.variables.search(["abcd", "concent"])
No matches found for 'abcd'.
Exact matches for 'concent': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [6]: model.variables.search(["abcd", "concen"])
No matches found for 'abcd'.
Exact matches for 'concen': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [7]: model.variables.search(["abcd", "conce"])
No matches found for 'abcd'.
Exact matches for 'conce': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [8]: model.variables.search(["abcd", "conc"])
No matches found for 'abcd'.
Exact matches for 'conc': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [9]: model.variables.search(["abcd", "con"])
No matches found for 'abcd'.
Exact matches for 'con': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [10]: model.variables.search(["abcd", "co"])
No matches found for 'abcd'.
Exact matches for 'co': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

In [11]: model.variables.search(["abcd", "c"])
No matches found for 'abcd'.
Exact matches for 'c': ['Concentration [mol.m-3]', 'Surface concentration [mol.m-3]']

and the last few of these don't really make a lot of sense. However, this case is wrong on the main branch as well, so it probably doesn't need to be addressed in this PR itself. The best approach would be to use difflib to determine what part of the search string corresponds to at least a significant value (maybe 50%?) of the search results. Please feel free to take this up in a follow-up PR if you'd like to.
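
One possible reading of the difflib suggestion is sketched here: a name is kept only if the query is at least 50% similar to one of its words. `is_significant` and its `threshold` parameter are hypothetical, and comparing against whitespace-separated words is an assumption; the follow-up PR may take a different approach.

```python
import difflib


def is_significant(query, name, threshold=0.5):
    """Hypothetical filter: does `query` resemble some word in `name`?"""
    query = query.lower()
    words = name.lower().split()
    # SequenceMatcher.ratio() is 2*M/T, where M is the number of matching
    # characters and T is the combined length of both strings.
    ratios = (difflib.SequenceMatcher(None, query, w).ratio() for w in words)
    return max(ratios, default=0.0) >= threshold


names = [
    "Concentration [mol.m-3]",
    "Surface concentration [mol.m-3]",
    "Flux [mol.m-2.s-1]",
]
kept = [n for n in names if is_significant("concentr", n)]
```

With this filter, a long fragment such as "concentr" still selects both concentration variables, while a single letter such as "c" selects nothing.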

So, I'm happy to approve once we manage the empty string case (even that is currently failing on the main branch, but it's quite easy to handle). Thanks for your work!

@medha-14
Contributor Author

Thank you for the detailed review! I’ve modified the method to handle the empty string case, and I’ve also added tests to cover it. Regarding the issue with partial matches and using difflib to refine the search logic, I’ll take that up in a separate PR.

arjxn-py
arjxn-py previously approved these changes Dec 17, 2024
Member

@arjxn-py arjxn-py left a comment

Awesome work, @medha-14! Just a small question and a comment that may help. Otherwise it looks good :)

Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
@medha-14 medha-14 dismissed stale reviews from arjxn-py and agriyakhetarpal via 3a1b130 December 18, 2024 05:02
@medha-14
Contributor Author

Thanks everyone for the detailed reviews and suggestions, I have made the necessary changes accordingly. Please take a look and let me know if anything else needs attention.

Comment thread src/pybamm/util.py Outdated
Saransh-cpp
Saransh-cpp previously approved these changes Dec 18, 2024
Member

@Saransh-cpp Saransh-cpp left a comment

Thanks, @medha-14! This looks amazing!

Most of the suggestions below are "good practices" and they should be applied to the PR. Approving this as I am on holiday from tomorrow, and I don't want my review to block the merge :)

Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py Outdated
@kratman
Contributor

kratman commented Dec 18, 2024

@medha-14 Thanks for working on this, I will re-review after you finish @Saransh-cpp's suggestions

@medha-14 medha-14 dismissed stale reviews from Saransh-cpp and agriyakhetarpal via 6f36d36 December 19, 2024 05:28
Comment thread src/pybamm/util.py Outdated
Comment thread src/pybamm/util.py
@kratman
Contributor

kratman commented Dec 26, 2024

I will merge this after the tests pass.

Development

Successfully merging this pull request may close these issues.

Allow dictionary search to take multiple substrings

6 participants