Implementation Information

jseutter edited this page Sep 13, 2010 · 1 revision

The current version of ofxparse uses the BeautifulSoup Python package to parse OFX files. It’s kind of a hack, because BeautifulSoup was meant for scraping HTML, but since HTML and OFX documents are both SGML-based, it works. What are the implications of this?

The BeautifulSoup library will probably never move to Python 3, so ofxparse in its current incarnation will not work on Python 3. Why? BeautifulSoup uses the sgmllib module from the Python standard library, which was deprecated in Python 2.6 because it doesn’t parse SGML properly and hardly anybody parses SGML these days. In Python 3 this module is gone, and the author of BeautifulSoup has no intention of doing further work on BeautifulSoup.

My intention is to eventually parse the OFX format natively in this library. This can be done either by using an SGML parser someone else has written or by writing our own. It will probably be a progression: first use a freely available parser, then switch to a parser built into ofxparse itself.

There are two SGML parsers listed on PyPI: sgmlplus and sgmlop. sgmlop seems unmaintained. sgmlplus seems to work, but only marginally.
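
To give a feel for what the "write our own parser" route could look like, here is a minimal sketch that uses only the standard library and runs on Python 3. It is not ofxparse's implementation: the names `parse_ofx`, `parse_ofx_body` and `_add` are hypothetical, and the sample document is made up for illustration. It assumes the common OFX 1.x layout, where a block of `KEY:VALUE` header lines precedes `<OFX>`, aggregates have closing tags, and leaf elements usually do not.

```python
import re

# One token = a tag (optionally closing) plus the text that follows it.
TOKEN = re.compile(r"<(/?)([A-Za-z0-9_.]+)>([^<]*)")


def _add(node, name, value):
    """Store value under name, turning repeated names into a list."""
    if name not in node:
        node[name] = value
    elif isinstance(node[name], list):
        node[name].append(value)
    else:
        node[name] = [node[name], value]


def parse_ofx_body(text):
    """Parse the SGML body into nested dicts (hypothetical helper)."""
    root = {}
    stack = [("", root)]  # (tag name, dict collecting that tag's children)
    for closing, name, data in TOKEN.findall(text):
        data = data.strip()
        if closing:
            # Unwind to the matching aggregate. A closer with no open
            # aggregate (e.g. </CODE> after a leaf) is simply ignored.
            if any(tag == name for tag, _ in stack):
                while stack.pop()[0] != name:
                    pass
        elif data:
            # Assumption: a tag followed by text is a leaf element.
            _add(stack[-1][1], name, data)
        else:
            # Assumption: a tag with no text opens an aggregate.
            child = {}
            _add(stack[-1][1], name, child)
            stack.append((name, child))
    return root


def parse_ofx(text):
    """Split the KEY:VALUE header block from the body and parse both."""
    start = text.find("<OFX>")
    if start == -1:
        raise ValueError("no <OFX> element found")
    headers = {}
    for line in text[:start].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    return headers, parse_ofx_body(text[start:])


# Made-up sample document, for illustration only.
SAMPLE = """OFXHEADER:100
DATA:OFXSGML
VERSION:102

<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<LANGUAGE>ENG
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
"""

headers, body = parse_ofx(SAMPLE)
print(headers["VERSION"])
print(body["OFX"]["SIGNONMSGSRSV1"]["SONRS"]["STATUS"]["SEVERITY"])
```

The sketch leans on the one rule that makes OFX easier than general SGML: a tag followed by text is treated as a leaf, and a tag followed by nothing opens an aggregate. A leaf with an empty value would be misread as an aggregate, and character entities, date parsing and encoding headers are not handled at all, so a real parser would need more care.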
