gl-yziquel/mud-parse2D

mud-parse2D

Toying with python-parse-2d.

Note: a fork of this project exists that packages the 2D parsing library. Please use that fork, since it lets the library be imported into the current project as a Python library.

Context

Current code parses the content of a 2D text box.
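To make "parsing the content of a 2D text box" concrete, here is a minimal plain-Python sketch. It does not use python-parse-2d and is not the code in this repository; the function name `box_content` and the border convention (`+` corners, `-` and `|` edges) are assumptions chosen for illustration.

```python
def box_content(text: str) -> list[str]:
    """Return the lines enclosed by a box drawn with '+', '-' and '|'.

    Assumes (hypothetically) that the box fills the whole input:
    first and last lines are horizontal borders, every other line
    starts and ends with a vertical border character.
    """
    lines = text.strip("\n").splitlines()
    if len(lines) < 2:
        raise ValueError("a box needs at least a top and a bottom border")

    width = len(lines[0])
    if width < 2 or any(len(line) != width for line in lines):
        raise ValueError("box lines must all have the same width (at least 2)")

    # Check the top and bottom borders: '+' corners joined by '-' only.
    for border in (lines[0], lines[-1]):
        if border[0] != "+" or border[-1] != "+" or set(border[1:-1]) - {"-"}:
            raise ValueError(f"malformed horizontal border: {border!r}")

    # Strip the vertical border from each inner line.
    content = []
    for line in lines[1:-1]:
        if line[0] != "|" or line[-1] != "|":
            raise ValueError(f"malformed box row: {line!r}")
        content.append(line[1:-1])
    return content


if __name__ == "__main__":
    box = "+---+\n|abc|\n|d e|\n+---+"
    print(box_content(box))
```

The sketch is deliberately strict: anything that is not a perfectly rectangular box is rejected, which is the kind of simplification a real 2D parsing toolkit would need to go beyond.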

There is a dearth of well-structured parsing toolkits for textual data or textual programming languages with a 2D layout. One exception is python-parse-2d, which seems to work but may be too simplistic for our needs. So we need to examine it in some detail to evaluate it.

Ideally, we would like to be able to detect two boxes laid out at arbitrary positions on a page. The toody comonadic parser combinator library for Haskell is claimed to handle that, but it no longer builds out of the box. It is discussed cursorily on Reddit.
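As an illustration of the goal stated above, the following plain-Python sketch scans a page for rectangular boxes wherever they sit. It is not how toody or python-parse-2d work; it is a brute-force scan, and the names `find_boxes` and `box_lines` as well as the `+`, `-`, `|` border convention are assumptions made for this example.

```python
Box = tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_boxes(page: str) -> list[Box]:
    """Find boxes drawn with '+', '-' and '|' anywhere on the page."""
    rows = page.splitlines()
    width = max((len(r) for r in rows), default=0)
    grid = [r.ljust(width) for r in rows]  # pad to a rectangle
    boxes = []
    for top, row in enumerate(grid):
        for left, ch in enumerate(row):
            if ch != "+":
                continue  # only a '+' can be a top-left corner
            # Walk right along the top edge until the next corner.
            right = left + 1
            while right < width and grid[top][right] == "-":
                right += 1
            if right == left + 1 or right >= width or grid[top][right] != "+":
                continue
            # Walk down while both vertical edges continue.
            bottom = top + 1
            while (bottom < len(grid)
                   and grid[bottom][left] == "|"
                   and grid[bottom][right] == "|"):
                bottom += 1
            if bottom == top + 1 or bottom >= len(grid):
                continue
            # The row where the edges stop must be the bottom border.
            expected = "+" + "-" * (right - left - 1) + "+"
            if grid[bottom][left:right + 1] == expected:
                boxes.append((top, left, bottom, right))
    return boxes


def box_lines(page: str, box: Box) -> list[str]:
    """Return the text enclosed by a box reported by find_boxes."""
    top, left, bottom, right = box
    rows = page.splitlines()
    return [rows[r].ljust(right)[left + 1:right] for r in range(top + 1, bottom)]


PAGE = "\n".join([
    "+---+",
    "| a |     +----+",
    "+---+     | bc |",
    "          | de |",
    "          +----+",
])

if __name__ == "__main__":
    for box in find_boxes(PAGE):
        print(box, box_lines(PAGE, box))
```

This sketch requires at least one content row and one `-` per box, and it ignores nested or overlapping boxes and boxes sharing an edge, which are the cases where a combinator-style approach would likely be more useful.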

If you know of other 2D text parsing toolkits, please let me know. Problems relevant to 2D parsing notably include parsing various Markdown dialects, such as markdeep. The comonadic structure also appears in the ascii art to unicode parsing technology. One may also mention the 2D language Befunge, whose interpreter must indeed parse two-dimensional text.

Subtler parsing technologies able to handle 2D layout matter because modern artificial intelligence workflows (such as in the txtai software) seem to hit a bottleneck: preprocessing the data coming out of ugly things such as PDF files into something these workflows can consume satisfactorily.

Incidentally, it may be interesting to develop 2D parsing technology to create more modern alternatives to things like the par formatting tool.

Build system considerations

Caveat: the project builds out of the box, provided you have the same toolkit as I do: just, fd, par, bat (or batcat), hatch, tomlq (from the Python yq package), and sed. If you have these, you're good to go. If you don't, leave me a note, and I'll attempt to make my setup as portable as possible.

Type just for the list of commands at your disposal to manage the project.

Not yet surveyed literature

  • Masaru Tomita (1989). Parsing 2-Dimensional Language. Proceedings of the First International Workshop on Parsing Technologies, 414–424.
