
parseFully() may still abort parsing #26

Open
tillneum opened this issue Oct 1, 2021 · 4 comments

Comments

@tillneum

tillneum commented Oct 1, 2021

BibTeXParser.parseFully() is not very robust yet. It does continue parsing when it encounters a non-matching "{", but it still aborts parsing in the following situations:

  • non-matching "}"
  • references to undefined @string declarations, like journal = siamdm (a value without braces or quotation marks)

Tested with library versions 1.0.15 and 1.0.19.

@vruusmann
Member

The JBibTeX project relies on parsers that are auto-generated from (JavaCC) grammar files.

I'm not sure it's possible to implement very "deep" error detection and recovery this way at all. A multi-layer/multi-tier parsing approach is probably needed, where the incoming BibTeX data is first segmented using some very robust technique (regexes?), and these segments are then parsed using the JBibTeX parsers.
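A rough sketch of the first tier, assuming that every object definition starts with "@" at the beginning of a line (a heuristic, not something JBibTeX guarantees). BibTeXSegmenter is a hypothetical helper, not part of JBibTeX: it splits the raw text with a regex, so an unbalanced "{" or "}" can damage at most the one segment it appears in. Each segment could then be handed to a JBibTeX parser on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical first-tier splitter; not part of JBibTeX.
public class BibTeXSegmenter {

    // Assumption: each object definition begins with '@' at the start of a line
    // (leading spaces/tabs allowed). A line like "%% @STRING..." does not match.
    private static final Pattern ENTRY_START = Pattern.compile("(?m)^[ \\t]*@");

    /** Splits raw BibTeX text into one trimmed string per definition. */
    public static List<String> segment(String text) {
        List<String> segments = new ArrayList<>();
        Matcher m = ENTRY_START.matcher(text);
        int start = -1; // offset of the current segment, -1 before the first '@'
        while (m.find()) {
            if (start >= 0) {
                segments.add(text.substring(start, m.start()).trim());
            }
            start = m.start();
        }
        if (start >= 0) {
            segments.add(text.substring(start).trim());
        }
        return segments;
    }

    public static void main(String[] args) {
        String bib = "@ARTICLE{a, TITLE = {Broken}}}\n"
                + "@ARTICLE{b, TITLE = {Fine}}\n";
        // The stray '}' stays inside the first segment; the second one is untouched.
        for (String s : segment(bib)) {
            System.out.println(s);
        }
    }
}
```

Text before the first match (for example a commented-out %% @STRING line) is dropped, and a field value that continues onto a line beginning with "@" would be split incorrectly, so this only approximates real BibTeX tokenization.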

non-matching "}"

That's a deep structural flaw in the BibTeX data.

If you open your BibTeX database file in a BibTeX-aware text editor, does it perform syntax highlighting correctly?
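Outside an editor, the same check can be approximated by counting braces. The hypothetical BraceChecker below reports each "}" that has no open "{" and each "{" that is never closed, with line numbers. It counts every brace character and ignores quoting, escapes and comments, so it is a diagnostic aid rather than a BibTeX validator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical diagnostic helper; not part of JBibTeX.
public class BraceChecker {

    /** Reports unmatched braces with 1-based line numbers. */
    public static List<String> check(String text) {
        List<String> problems = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // lines of '{' not yet closed, newest first
        int line = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                line++;
            } else if (c == '{') {
                open.push(line);
            } else if (c == '}') {
                if (open.isEmpty()) {
                    problems.add("unmatched '}' on line " + line);
                } else {
                    open.pop();
                }
            }
        }
        // Report never-closed '{' in order of appearance.
        List<Integer> unclosed = new ArrayList<>(open);
        Collections.reverse(unclosed);
        for (int l : unclosed) {
            problems.add("unmatched '{' on line " + l);
        }
        return problems;
    }

    public static void main(String[] args) {
        String bib = "@ARTICLE{a,\n  TITLE = {x}}}\n@ARTICLE{b,\n  TITLE = {y}\n";
        check(bib).forEach(System.out::println);
    }
}
```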

References to undefined @string declarations

Shouldn't this be caught/handled programmatically by this method (or similar)?
https://github.com/jbibtex/jbibtex/blob/1.0.19/src/main/javacc/bibtex.jj#L93-L107

@tillneum
Author

tillneum commented Oct 1, 2021

I agree that a non-matching "}" is a deep structural flaw, just like a non-matching "{". Unfortunately, such errors occur quite often in user data files. Since the library can already deal with one of these errors, it would be nice if it could deal with the other one, too.

The problem with references to undefined @string declarations is that the ObjectResolutionException is not caught, so parseFully() does not just skip the problematic entry but aborts the whole parser run.
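The skip-and-continue behavior asked for here can be sketched generically: parse one definition at a time, record any exception, and move on. RecoveringParser and SegmentParser are hypothetical names, not JBibTeX's implementation; the parse function would wrap whatever actually parses a single definition (for example a JBibTeX parser call), which is left abstract here as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical error-recovery loop; not JBibTeX's implementation.
public class RecoveringParser {

    /** Parses one definition; a real implementation might wrap a JBibTeX call. */
    @FunctionalInterface
    public interface SegmentParser<T> {
        T parse(String segment) throws Exception;
    }

    private final List<Exception> exceptions = new ArrayList<>();

    /** Parses every segment, skipping (and recording) the ones that fail. */
    public <T> List<T> parseAll(List<String> segments, SegmentParser<T> parser) {
        List<T> results = new ArrayList<>();
        for (String segment : segments) {
            try {
                results.add(parser.parse(segment));
            } catch (Exception e) { // checked and unchecked exceptions alike
                exceptions.add(e);   // remember the problem, then continue
            }
        }
        return results;
    }

    /** Error conditions collected so far, analogous to the described getExceptions(). */
    public List<Exception> getExceptions() {
        return exceptions;
    }

    public static void main(String[] args) {
        RecoveringParser rp = new RecoveringParser();
        // Stand-in parser: the middle "definition" fails, the others succeed.
        List<Integer> ok = rp.parseAll(Arrays.asList("1", "oops", "3"), s -> Integer.valueOf(s));
        System.out.println(ok + " / " + rp.getExceptions().size() + " error(s)");
    }
}
```

The key design choice is that the try/catch wraps a single definition, so one failure can never end the whole run; Errors are deliberately left uncaught.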

Citing your project description:

#parseFully(Reader). Error recovery mode. The parser skips an erroneous object definition and continues with the next object definition. The list of error conditions can be accessed via #getExceptions().

The Javadoc of parseFully() is more cautious ;-)

The parser does its best to recover from typical error conditions by skipping the problematic object definition.

@vruusmann
Member

The problem with references to undefined @string declarations is that the ObjectResolutionException is not caught.

Got it: the parser component does not handle the exceptions that it raises itself.

Do you have a small self-contained BibTeX database, which could be used for unit testing purposes?

@tillneum
Author

tillneum commented Oct 1, 2021

This should do:

%% @STRING{siamdm = "SIAM Journal on Discrete Mathematics"}

@ARTICLE{AleDji96,
	AUTHOR = "L. G. Aleksandrov and H. N. Djidjev",
	TITLE = "Linear Algorithms for Partitioning Embedded Graphs of Bounded Genus",
	JOURNAL = siamdm,
	YEAR = 1996,
	month = "",
	volume = 9,
	number = "",
	pages = "129--150",
	copy = ""
}
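For a test like this one, undefined references can also be detected before parsing. UndefinedMacroFinder is a hypothetical regex-based check: it collects names declared with @STRING at the start of a line (so the commented-out declaration above does not count) and reports bare identifiers used as field values that were never declared. Treating jan to dec as defined is an assumption based on the month macros that the standard BibTeX styles provide; an "=" or "#" inside a quoted or braced value (a URL, "C#") can produce false positives.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parse check; not part of JBibTeX.
public class UndefinedMacroFinder {

    // @STRING{name = ...} at the start of a line; "%% @STRING..." does not match.
    private static final Pattern STRING_DECL =
            Pattern.compile("(?im)^[ \\t]*@string\\s*[{(]\\s*([a-z][^\\s=#{}\",()]*)\\s*=");

    // A bare identifier used as a value: after '=' or '#', not quoted or braced.
    private static final Pattern BARE_VALUE =
            Pattern.compile("(?i)[=#]\\s*([a-z][^\\s=#{}\",()]*)");

    // Assumed defined: month macros supplied by the standard BibTeX styles.
    private static final Set<String> MONTHS = new HashSet<>(Arrays.asList(
            "jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"));

    /** Returns referenced macro names (lower-cased) that are never declared. */
    public static Set<String> find(String text) {
        Set<String> defined = new HashSet<>(MONTHS);
        Matcher d = STRING_DECL.matcher(text);
        while (d.find()) {
            defined.add(d.group(1).toLowerCase(Locale.ROOT)); // macro names are case-insensitive
        }
        Set<String> undefined = new LinkedHashSet<>();
        Matcher v = BARE_VALUE.matcher(text);
        while (v.find()) {
            String name = v.group(1).toLowerCase(Locale.ROOT);
            if (!defined.contains(name)) {
                undefined.add(name);
            }
        }
        return undefined;
    }

    public static void main(String[] args) {
        String bib = "%% @STRING{siamdm = \"SIAM Journal on Discrete Mathematics\"}\n"
                + "@ARTICLE{AleDji96,\n\tJOURNAL = siamdm,\n\tYEAR = 1996\n}\n";
        System.out.println(find(bib));
    }
}
```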
