Leading comma bibtex syntax is not supported by the parser #48

Closed
grochmal opened this issue Nov 12, 2014 · 2 comments

Comments

@grochmal
Contributor

Hi guys,

I managed to edit the example from the wiki into a valid BibTeX entry that bibtexparser 0.6.0 does not parse correctly.

It looks as follows (I've removed the multiline for simplicity):

@ARTICLE{Cesar2013
, author = {Jean César}
, title = {An amazing title}
, year = {2013}
, month = jan
, volume = {12}
, pages = {12--23}
, journal = {Nice Journal}
, abstract = {This is an abstract. This line should be long enough to test}
, comments = {A comment}
, keywords = {keyword1, keyword2}
}

The comma-first syntax is valid in BibTeX. For example, I have a reasonably big BibTeX database in a working project, and good ol' Patashnik's bibtex has no problems with it. Patashnik's parser works from a BNF grammar, so it does not care where lines start or end.

bibtexparser, on the other hand, only splits on commas at the end of lines (see bparser.py), which does not work for the comma-first syntax. If you change

kvs = [i.strip() for i in record.split(',\n')]

to

kvs = [i.strip() for i in record.split(',')]

at line 239 of bparser.py, it seems to do the trick and parse the file correctly.

This change should have no impact on the rest of the package, as the newline is stripped by i.strip() right away, in the same list comprehension.

I have tested this change with and without multiline fields, and with and without comma-first syntax, and it seems to work fine.
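To make the difference concrete, here is a minimal sketch of the two splits applied to a comma-first record. The `record` string is a hypothetical stand-in for what the parser sees for one entry (I am assuming the `@ARTICLE{` prefix and closing brace are already gone); it is not taken from bparser.py.

```python
# Hypothetical stand-in for the body of one comma-first entry.
record = (
    "Cesar2013\n"
    ", author = {Jean César}\n"
    ", title = {An amazing title}\n"
    ", year = {2013}\n"
)

# Current behaviour: split only on a comma that ends a line.
# A comma-first record never contains ',\n', so nothing is split.
end_of_line = [i.strip() for i in record.split(',\n')]

# Proposed behaviour: split on every comma, then strip the newlines.
every_comma = [i.strip() for i in record.split(',')]

print(len(end_of_line))  # 1: the whole record comes back as a single item
print(every_comma)
# ['Cesar2013', 'author = {Jean César}', 'title = {An amazing title}', 'year = {2013}']
```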

If no one has anything against the BibTeX comma-first syntax (Algol60 purists maybe?), I'll make a pull request in 24-48h.

@grochmal
Contributor Author

Oops...

After some more testing, my solution shows its flaws: conference locations very often contain a comma, e.g. "New York, US" or "London, UK", and the solution above breaks these lines into different (broken) key-value pairs.
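A small sketch of the failure, using a made-up record (the key `Smith2014` and the `address` field are only illustrative):

```python
# Hypothetical comma-first record whose value contains a comma.
record = (
    "Smith2014\n"
    ", title = {A talk}\n"
    ", address = {New York, US}\n"
)

# Splitting on every comma also cuts inside the braces.
kvs = [i.strip() for i in record.split(',')]

print(kvs)
# ['Smith2014', 'title = {A talk}', 'address = {New York', 'US}']
```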

Another solution is to use re.split() as follows:

kvs = [i.strip() for i in re.split(r',\s*\n|\n\s*,', record)]

which overcomes the conference location problem.
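Here is a rough, self-contained check of that expression against both layouts. The records and the `split_fields` helper are hypothetical, written only for this illustration; they are not part of bibtexparser.

```python
import re

# Split on a comma that ends a line, or on a comma that starts one.
SEPARATOR = re.compile(r',\s*\n|\n\s*,')


def split_fields(record):
    """Hypothetical helper: split one entry body into stripped pieces."""
    return [i.strip() for i in SEPARATOR.split(record)]


comma_first = "Smith2014\n, title = {A talk}\n, address = {New York, US}\n"
comma_last = "Smith2014,\n title = {A talk},\n address = {New York, US}\n"

print(split_fields(comma_first))
# ['Smith2014', 'title = {A talk}', 'address = {New York, US}']
print(split_fields(comma_last) == split_fields(comma_first))  # True
```

Note that the expression is not brace-aware: a multiline value that happens to have a comma right before or right after a line break would still be split, so that case deserves its own test.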

Still, I will write a proper unit test before going forward with it (but the pull request will come).

@grochmal
Contributor Author

I finally had the time to build the pull request; it adds handling of comma-first syntax when parsing and writing BibTeX files.

The pull request is here: #49

And this closes the issue.
