Skip to content

pinard/parsemacs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

parsemacs — Emacs Lisp parser (temporary)

This parsemacs project is exploratory, and temporary. It contains the skeleton of an Emacs Lisp parser written in Python, and is meant to be included in another project once ready.

Description

For a good while now, I’ve written many ad hoc Python scripts for reading Org files (Org is a wonderful tool available as an Emacs mode). Org version 8.0 introduces a so-called new exporter, in which the way to parse an Org file has been standardized. Previously, each exporter used its own Org file parser, and exporters were not all agreeing on how to understand an Org file. In the new scheme, all exporters have been rewritten to use a common parser, and the Org format has also been documented more precisely.

However, the parser is written in Emacs Lisp, while my applications are usually written in Python, and I’ve been pondering the idea of writing a Python parser which would be as close as possible with the genuine, Emacs Lisp parser which comes with Org 8.0. On the evening of [2013-04-25 jeu], I enjoyed a sympathetic Python coding fiesta, and decided to use that evening to explore what it would mean to write that Org format Python parser.

I’ve been rather surprised by the richness and volume of the specification, and quickly understood it is not going to be a simple job to do correctly. Even a straightforward Emacs Lisp to Python rewriting of the parser is more daunting that I roughly estimated.

One of the participants suggested that I implement a very small subset of the specification and complete it incrementally as the need arises, but if I take that route, it would surely take forever to complete. Another problem is that the Emacs Lisp parser surely may be amended and corrected: it might be difficult to really follow that evolution.

Thinking around these problems, it came to my mind that I should humbly take lessons from how Pascal TeX was translated to C, long ago. That is: some tool does the translation mechanically, yet with limited cleverness, with some mechanism to replace or amend sections with human-crafted translation, whenever the automatic translation falls short of working correctly, for speed or algorithmic considerations, or otherwise. For my actual need, the tool would read Emacs Lisp and produce Python code.

This parsemacs project is just an exploratory step about reading Emacs Lisp from within Python. I’m not much trying to make it overly general, as the parser is surely going to be adapted to the specific needs described above, when they will get more precise. Yet, as it stands, it can read all of Emacs 23, and the development versions of Gnus and Org mode, save for Lisp files encoded with euc-japan or iso-2022-7bit (I do not know how to decipher these from Python).

Moreover, Org mode is intimately tied to Emacs Lisp at many places, so maybe parsemacs may get other uses as well, later in the project.

François

P.S. I should also mention the Orgnode project, which is a bit less attractive to me, as it is not currently based on the new exporter.

Exchange with Karl Voit on [2014-01-06 lun]

> I did not get the impression that [ply] is a parsing engine that is > done the Python way.

PLY has pros and cons. SPARK[1] always attracted me as being more elegant. While it accepts a wider set of grammars than PLY, SPARK can become quite slow on grammars which are less “natural” (admittedly a very fuzzy, subjective term). For simpler grammars, recursive descent does the job at good enough speed, and often, grammars can be rearranged a bit so the lexer could cleverly help the parser. Of course, it looks like more work writing a recursive descent parser, yet many times in my experience, the programmer is amply repaid with simplicity and clarity.

>> You don’t need a Lisp interpreter written in Python, only Python >> code that understands org syntax without getting confused.

> if you are going to use a ELISP interpreter to parse Org-mode syntax > for Python, this should completely re-use the original Org-parser > and nothing else. I have no idea if this is possible or not. If > you have to implement a parser on your own, you probably should > stick to Python-only.

Hey hey, it’s fun! :-) You misunderstood me, but this is constructive actually, as you raise good points. In my dreams, a pure Python parser parses Org mode files. However, here and there in the parsed files, as data, we can see bits of Emacs Lisp code, or even Calc syntax at some places. That Emacs Lisp code could be mere constants or identifiers, but sometimes more complex, evalable S-expressions.

A parser is probably of limited use if it does not come with some extra-tools covering most frequent use cases around the syntax, and I guess that pressure will develop to have some kind of Emacs Lisp interpreter, hardly complete, probably only mild or even ridiculous.

The interesting idea in your comments is that, if we had an Emacs Lisp interpreter of serious quality, that interpreter could use “the original Org-parser and nothing else”. That would solve maintenance, as the parser would be wholly external, to be found in Org mode distribution, all standard. But this avenue is quite unlikely: it looks like a major undertaking to me, and while such a parser would be useful on small data excerpts within an Org file, it might be inordinately slow if it had to interpret a lot of Lisp code while deciphering big Org files.

Worse, keeping a Python parser in sync with the true Emacs Lisp parser would require much energy, maybe only once in a while, but extended over a long period of time. Unless a great enthusiasm exists, distributed on many people, such projects are always doomed to fail. Not many people are ready to commit themselves for life in the required maintenance.

François


[1] http://pages.cpsc.ucalgary.ca/~aycock/spark/

About

Emacs Lisp parser in Python (temporary)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages