Leading comma bibtex syntax is not supported by the parser #48

grochmal · 2014-11-12T17:16:02Z

Hi guys,

I managed to edit the example from the wiki into a valid BibTeX item that is not correctly parsed by bibtexparser 0.6.0.

It looks as follows (I've removed the multiline for simplicity):

@ARTICLE{Cesar2013
, author = {Jean César}
, title = {An amazing title}
, year = {2013}
, month = jan
, volume = {12}
, pages = {12--23}
, journal = {Nice Journal}
, abstract = {This is an abstract. This line should be long enough to test}
, comments = {A comment}
, keywords = {keyword1, keyword2}
}

The comma-first syntax is valid in BibTeX, e.g. I have a reasonably big BibTeX database in a working project and good ol' Patashnik's bibtex has no problems with it. Patashnik's parser uses a BNF grammar, so it does not care where lines start or end.

On the other hand, bibtexparser only splits on commas at the end of lines (see bparser.py), which does not work for the comma-first syntax. If you change

kvs = [i.strip() for i in record.split(',\n')]

to

kvs = [i.strip() for i in record.split(',')]

at line 239 of bparser.py, it seems to do the trick and parses the file correctly.

This change should not affect the rest of the package, as the newline is stripped by i.strip() right away, in the same list comprehension.

I have tested this change with and without multiline values and with and without the comma-first syntax, and it seems to work fine.
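A minimal sketch of the difference between the two splits. The `trailing` and `leading` strings are simplified stand-ins for the body of one entry; the real `record` value inside bparser.py may be preprocessed differently, and the helper names are made up for illustration.

```python
# Simplified stand-ins for one entry body (assumption: not the exact
# string bparser.py passes around internally).
trailing = "Cesar2013,\nauthor = {Jean César},\nyear = {2013}"
leading = "Cesar2013\n, author = {Jean César}\n, year = {2013}"


def split_eol(record):
    """Original behaviour: split only on a comma directly before a newline."""
    return [i.strip() for i in record.split(',\n')]


def split_any(record):
    """Proposed behaviour: split on every comma, then strip whitespace."""
    return [i.strip() for i in record.split(',')]


print(split_eol(trailing))  # three separate fields
print(split_eol(leading))   # one unsplit chunk: no ',\n' occurs in the text
print(split_any(leading))   # three separate fields again
```

With comma-first input the original split never finds `,\n`, so the whole entry comes back as a single element.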

If no one has anything against the BibTeX comma-first syntax (Algol60 purists maybe?), I'll make a pull request in 24-48h.

grochmal · 2014-11-17T02:26:48Z

Oops...

After some more testing my solution shows its flaws: very often conference locations have a comma in them, e.g. "New York, US" or "London, UK", and the solution above breaks these lines into different (broken) key-value pairs.
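The failure is easy to reproduce with a plain split on every comma. The record below is a made-up, simplified entry body used only for illustration:

```python
# Hypothetical entry body with a comma inside a field value.
record = "Conf2014\n, title = {A talk}\n, address = {New York, US}"

# Splitting on every comma also cuts the address value in two.
kvs = [i.strip() for i in record.split(',')]

for kv in kvs:
    print(repr(kv))
```

The address ends up as two broken pieces, `address = {New York` and `US}`, instead of one key-value pair.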

Another solution is to use re.split() as follows:

kvs = [i.strip() for i in re.split(r',\s*\n|\n\s*,', record)]

which overcomes the conference location problem.
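A small sketch of how this pattern behaves on both comma styles. The sample records and the `split_fields` helper are hypothetical; only the regular expression comes from the comment above.

```python
import re

# A comma followed by optional whitespace and a newline (trailing style),
# or a newline followed by optional whitespace and a comma (leading style).
FIELD_SEP = re.compile(r',\s*\n|\n\s*,')


def split_fields(record):
    """Split an entry body into stripped chunks at line-boundary commas."""
    return [i.strip() for i in FIELD_SEP.split(record)]


leading = "Conf2014\n, title = {A talk}\n, address = {New York, US}"
trailing = "Conf2014,\ntitle = {A talk},\naddress = {New York, US}"

print(split_fields(leading))
print(split_fields(trailing))
```

Both styles yield the same three chunks, and the comma inside `{New York, US}` is left alone because it does not sit next to a line break. One caveat: a multiline value whose line happens to end with a comma would still be split, so this remains a heuristic rather than a full parser.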

Still, I will write a proper unit test before going forward with it (but the pull request will come).

grochmal · 2014-11-24T19:07:00Z

I finally had the time to build the pull request; it adds handling of the comma-first syntax when parsing and writing BibTeX files.

The pull request is here: #49

And this closes the issue.

grochmal closed this as completed Nov 24, 2014
