Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using bibtexparser instead of pybtex for bibtex. #3740

Closed
wants to merge 19 commits into from

Conversation

prady0t
Copy link
Contributor

@prady0t prady0t commented Jan 17, 2024

Fixes #3648

Still in progress, Opened a PR so that it's easier to communicate changes and progress.

Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
@prady0t prady0t marked this pull request as draft January 17, 2024 09:02
pre-commit-ci bot and others added 4 commits January 17, 2024 09:02
Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
…tation-parser

# Conflicts:
#	pybamm/citations.py
Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
@agriyakhetarpal
Copy link
Member

Note: the style job has to pass in order for code-related tests to proceed

@prady0t
Copy link
Contributor Author

prady0t commented Jan 17, 2024

sciunto-org/python-bibtexparser#433
Response here might be helpful.

prady0t and others added 3 commits January 23, 2024 20:44
Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
@prady0t
Copy link
Contributor Author

prady0t commented Jan 23, 2024

@agriyakhetarpal I don't understand the failing test case here : https://github.com/pybamm-team/PyBaMM/actions/runs/7627904153/job/20779262646?pr=3740#step:11:2390:~:text=AssertionError%3A%20ModuleNotFoundError%20not%20raised

This is the only failing test case. Can you help me with this?

Comment on lines +99 to +100
bibtexparser = sys.modules["bibtexparser"]
sys.modules["bibtexparser"] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have to think a bit about this because it wasn't really prominent in the first read. We are using the assertRaisesRegex context manager clause to test the ModuleNotFoundError message, and pybamm.print_citations used to call the have_optional_dependency utility function for pybtex, which was removed – which means that the function is not called and subsequently an error was not raised.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the case indeed. If you feel this should be changed to some other stable optional dependency then it can also be looked into.

pybamm/citations.py Outdated Show resolved Hide resolved
Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
Signed-off-by: Pradyot Ranjan <99216956+pradyotRanjan@users.noreply.github.com>
@prady0t
Copy link
Contributor Author

prady0t commented Jan 24, 2024

I've added changes. Now only thing there is is changing output format. I might raise a PR in bibtexparser repo related to this which once merged should be easy to integrate. Meanwhile can we go with this output format?

@prady0t prady0t marked this pull request as ready for review January 24, 2024 09:20
Copy link

codecov bot commented Jan 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.60%. Comparing base (e765855) to head (c43805d).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3740   +/-   ##
========================================
  Coverage    99.60%   99.60%           
========================================
  Files          259      259           
  Lines        21270    21279    +9     
========================================
+ Hits         21186    21195    +9     
  Misses          84       84           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@arjxn-py arjxn-py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @prady0t, changes looks nice.
Some comments below -

pyproject.toml Outdated
@@ -84,7 +84,8 @@ plot = [
]
# For the Citations class
cite = [
"pybtex>=0.24.0",
"pybtex>=0.24.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed now

return
except PybtexError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does bibtexparser also have some attribute corresponding to PybtexError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't. The only thing close to exception handling in bibtexparser is this.

def _string_formatting(self, entry):
bibtexparser = have_optional_dependency("bibtexparser")
if not isinstance(entry, bibtexparser.model.Entry):
raise TypeError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage is failing as this is not covered by tests, would be nice to add some test case covering this part too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and a helpful error message would be nice as well – a TypeError in itself is not descriptive enough

@prady0t
Copy link
Contributor Author

prady0t commented Jan 30, 2024

@agriyakhetarpal
Copy link
Member

Unrelated error, I triggered a re-run which should fix it

Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @prady0t – I haven't looked at the change in functionality in detail as of now, but this PR should – in addition, perform the following changes:

  1. Replace pybtex with bibtexparser in asv.conf.json and in the optional dependencies section/table of the installation guide
  2. Replace instances of pybtex in CONTRIBUTING.md and comments in noxfile.py
  3. Remove any remaining comments referencing pybtex in pybamm/citations.py

Also, this is the output from pybamm.print_citations(), currently:

   {Array programming with NumPy} Harris, Charles R. and Millman, K. Jarrod and van der Walt, St{\'{e}}fan J. and Gommers, Ralf and Virtanen, Pauli and Cournapeau, David and Wieser, Eric and Taylor, Julian and Berg, Sebastian and Smith, Nathaniel J. and others Nature 585 7825 357--362 2020 Nature Publishing Group 10.1038/s41586-020-2649-2 

   {Python Battery Mathematical Modelling (PyBaMM)} Sulzer, Valentin and Marquis, Scott G. and Timms, Robert and Robinson, Martin and Chapman, S. Jon 10.5334/jors.309 Journal of Open Research Software Software Sustainability Institute 9 1 14 2021 

which doesn't suggest that the string parsing is working very well at this moment – I know it is unsupported functionality in bibtexparser, but we can't compromise on losing proper formatting (missing commas and periods, unneeded curly brackets, missing diacritics) – it is essential for the NumPy and PyBaMM papers, and no less for any others.

Another solution is to use something like diffrax's autocitation functionality that I had suggested before and hardcode citation strings inside __init__ methods for model and solver classes, which would mean losing out on plaintext functionality, but brings us towards having clearer outputs in standard BibTeX – which is something I prefer. @brosaplanella and @kratman, what do you think about this?

@@ -178,10 +177,10 @@ def _tag_citations(self):
for key, entry in self._citation_tags.items():
print(f"{key} was cited due to the use of {entry}")

def print(self, filename=None, output_format="text", verbose=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should remove this option currently. Is there a way to convert to BibTeX, and if there isn't, is it possible that such a way be devised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can convert it into bibtex.

Comment on lines -233 to -244
if output_format == "text":
citations = pybtex.format_from_strings(
self._cited, style="plain", output_backend="plaintext"
)
elif output_format == "bibtex":
citations = "\n".join(self._cited)
else:
raise pybamm.OptionError(
f"Output format {output_format} not recognised."
"It should be 'text' or 'bibtex'."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be removed unless it is necessary to do so since this counts as a breaking change. I don't mind it a lot though since this is not a commonly used feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that we can do now is having an additional option like filtered_output along with bibtex. This option can print the essential hardcoded values (inside init) which might lead to data loss but we can provide much cleaner output if that's what user is more concerned about. Additionally we can also have a option raw which outputs the parsed bibtex as is (which is out only current output).

With this we can solve both problems. Not having data loss and also providing cleaner output.

f"Citations could not be registered. If you are on Google Colab - "
"pybtex does not work with Google Colab due to a known bug - "
"https://bitbucket.org/pybtex-devs/pybtex/issues/148/. "
f"Citations could not be registered."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we can test this – pybtex is quite inactive and did not work on Google Colab, but bibtexparser might work well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try this out.

tests/unit/test_citations.py Outdated Show resolved Hide resolved
@prady0t
Copy link
Contributor Author

prady0t commented Mar 7, 2024

@agriyakhetarpal How should we move forward with this?

Copy link
Member

@arjxn-py arjxn-py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing some conflicts raised with #3866

pybamm/citations.py Outdated Show resolved Hide resolved
pybamm/citations.py Outdated Show resolved Hide resolved
pybamm/citations.py Outdated Show resolved Hide resolved
pybamm/citations.py Outdated Show resolved Hide resolved
tests/unit/test_util.py Outdated Show resolved Hide resolved
Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
pybamm/citations.py Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Mar 7, 2024

@agriyakhetarpal How should we move forward with this?

Tagging @Saransh-cpp @kratman since they were involved in the discussions where we discussed replacing pybtex earlier – we do not yet have functionality in bibtexparser to parse BibTeX in order to receive plaintext. @prady0t is able to parse that in a hacky way to retrieve all fields, join all of the strings and return a plaintext citation, but it doesn't conform to APA/MLA/Chicago style references (of course, doing that is a mighty task): please see my review above for more.

P.S. to let go of the setuptools installation hack in the nox sessions that we had introduced earlier (see point 2 of my review), I had realised that should be fixable with

nox.options.default_venv_backend = "virtualenv"

i.e., just the one line in noxfile.py. 🙂 This is because setuptools is one of the seed packages that gets installed on virtual environment creation (along with pip and wheel) – this is for virtualenv.
OTOH and fun fact: the venv from stdlib does not add seed packages at all, and post Python 3.12 it does not even bootstrap setuptools – just pip is included (but it does add both setuptools and pip for Python 3.11 and earlier versions).

@Saransh-cpp
Copy link
Member

Hmm, using pybam.print_citations() now works for me on Google Colab.

We could maybe create a feature request inbibtexparser and keep using pybtex for now?

@agriyakhetarpal
Copy link
Member

Hmm, using pybam.print_citations() now works for me on Google Colab.

Nice!

We could maybe create a feature request inbibtexparser and keep using pybtex for now?

Yes, @prady0t has created a feature request upstream earlier: sciunto-org/python-bibtexparser#433

So, yes – we can keep using pybtex for now in this case, because the above request is still welcoming contributions (which may take an indefinite amount of time). We can add the noxfile.py change from #3740 (comment) and remove the Google Colab warning from pybamm/citations.py now.

Thanks for contributing and trying this out, @prady0t – sorry for the mess that this causes 🙂 Let's hope we can have pybtex back to action in the coming years (it is still the best option for a citations parser out there). Could you please revisit this PR and make the above changes, while reverting the rest?

@Saransh-cpp
Copy link
Member

Or, you could close this PR without deleting the branch so that we have some working bibtexparser code on GitHub (given that "squash and merge" will get rid off the intermediate code and commit history). You could create a new PR for the changes listed by @agriyakhetarpal.

@prady0t
Copy link
Contributor Author

prady0t commented Mar 11, 2024

Glad to see progress here. I'll close this PR here and open a new one with suggested changes.

@prady0t prady0t closed this Mar 11, 2024
@prady0t
Copy link
Contributor Author

prady0t commented Mar 11, 2024

Do I have to replace :

nox.options.reuse_existing_virtualenvs = True

with

nox.options.default_venv_backend = "virtualenv"

@agriyakhetarpal
Copy link
Member

Do I have to replace :

nox.options.reuse_existing_virtualenvs = True

with

nox.options.default_venv_backend = "virtualenv"

We should keep both

@prady0t prady0t mentioned this pull request Mar 11, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to an active citations parser dependency to replace pybtex
4 participants