Hans Jeuken edited this page Aug 11, 2017 · 4 revisions

What is Kamelon?

Kamelon is a versatile and fully programmable textual content parser that is extremely well suited for syntax highlighting and code folding. Out of the box it comes with about 260 syntax definitions. It is written in Perl and lives in the namespace Syntax::Kamelon.

Kamelon borrows its syntax highlighting technology from the Kate text editor, which is part of the KDE desktop environment. Kate keeps its highlighting and code folding rules in XML files. From now on I refer to these XML files as syntax definitions. These syntax definitions are freely available, usually under the LGPL license. I have reverse engineered the highlighting technology of Kate and duplicated it in Perl.

Kamelon started as Syntax::Highlight::Engine::Kate (S::H::E::K), a module that has been around on the web since 2004. I abandoned this project in 2008 due to personal circumstances. At the beginning of 2016 I picked it up again. I noticed that the editor Kate had developed since then but S::H::E::K had not. I also discovered several bugs that I had not noticed in 2008, among them a nasty memory leak. So I started a rewrite.

Kamelon takes a different approach than S::H::E::K. For S::H::E::K I automatically generated Perl modules from the syntax definition files and bundled them into a package. Kamelon loads the syntax definition files directly and creates a compact and efficient data structure (a lexer, if you wish) on the fly. This makes development and testing a lot easier. It also opens up a broader field of applications than just syntax highlighting and code folding.

Present stage of development

Where does Kamelon stand at this moment? Here is an overview of what has happened so far.

Commodities

At present we have:

A fully operational development and testing framework for this package, although it lacks a versioning system.

A bit more than 260 syntax definitions. They need some adjusting to work properly in the context of Kamelon, because I refuse to go along with a couple of design aspects that complicate matters and limit the freedom of the end user.

A builder module that compiles a syntax definition into an efficient and compact lexer. All decisions for the highlighting process that can be taken at compile time are taken here. The lexer consists of arrays of anonymous callbacks that are fired like bullets from a machine gun.

A functional engine. There are no systemic issues, but I do not know if it is 100% free of errors.

A diagnostics module that shows you what a loaded syntax definition file looks like internally. That is a better read than raw XML.

A debugger that monitors the highlighting process and allows you to set a watch point. It is still in its infancy but has been essential to the development so far.

A graphical debugging console with a text area that shows the highlighting results. It works as an editor: modifications in the text are re-highlighted live.
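The idea of a lexer built from arrays of anonymous callbacks can be illustrated with a small, language-neutral sketch. It is written in Python here, not Perl, and it is not Kamelon's actual code: the names `make_regex_rule`, `build_lexer`, `highlight` and the two-context `DEFINITION` are all hypothetical. It only shows the principle of compiling each rule into a closure once and then firing the closures of the current context in order.

```python
import re

def make_regex_rule(pattern, tag, next_context=None):
    """Build one rule callback; the regex is compiled once, at build time."""
    compiled = re.compile(pattern)

    def rule(text, pos):
        match = compiled.match(text, pos)
        if match and match.end() > pos:
            return match.end() - pos, tag, next_context
        return None

    return rule

def build_lexer(definition):
    """Turn {context: [(pattern, tag, next_context), ...]} into {context: [callback, ...]}."""
    return {
        context: [make_regex_rule(*spec) for spec in specs]
        for context, specs in definition.items()
    }

def highlight(lexer, text, start="Normal"):
    """Fire the callbacks of the current context in order; return (fragment, tag) pairs."""
    context, pos, out = start, 0, []
    while pos < len(text):
        for rule in lexer[context]:
            hit = rule(text, pos)
            if hit:
                length, tag, next_context = hit
                out.append((text[pos:pos + length], tag))
                pos += length
                context = next_context or context
                break
        else:
            # No rule matched: emit one character, tagged with the context name.
            out.append((text[pos], context))
            pos += 1
    return out

# Hypothetical definition (an assumption for illustration): strings and integers.
DEFINITION = {
    "Normal": [
        (r'"', "String", "InString"),
        (r"\d+", "Number", None),
    ],
    "InString": [
        (r'"', "String", "Normal"),
        (r'[^"]+', "String", None),
    ],
}

lexer = build_lexer(DEFINITION)
tokens = highlight(lexer, 'x = 42 "hi"')
```

In this sketch all pattern compilation happens in `build_lexer`, so the highlighting loop does nothing but call closures and follow context switches.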

Documentation

Ouch! There is some basic documentation from the original S::H::E::K project, but other than that I have to start from scratch.

Speed

Ever since I started developing S::H::E::K I have been running a benchmark test. It has always been the same test, and now I use it for Kamelon. I started out with an average speed of 2,100 characters per second. At present it does about 15,400 characters per second. I guess that is about the limit for a pure Perl setup. By the way, I use a 12-year-old PC with an Athlon64 X2 and openSUSE 13.2.

Development plan

Increase speed

My aim is to make Kamelon the fastest highlighter available. Kamelon is about four times faster than S::H::E::K. That still does not make it a fast highlighter. Perl has integrated facilities to interface with other programming languages, especially C and C++, since Perl itself is written in C. I plan to rewrite the heavy and frequently called methods in C++. I estimate it will become somewhere between 10 and 50 times faster.

Improve debugging

At present the debugger issues warnings when it finds something fishy in the highlighting process. You can also set a watch point, so you can determine which rule in which context parsed a certain position in the text. I want to add a mechanism that allows you to step through the highlighting process. I also need some tools to benchmark and time the rules. This way I can prioritize development and measure results more accurately. The debugger console cannot yet handle code folding.

Daemonize it

Make Kamelon independent of Perl. This can be done by encapsulating Kamelon in a daemon that listens and responds on an IO socket or TCP port. This way web servers and other software can interface with it the same way they interface with a MySQL database.
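As a rough illustration of the daemon idea, here is a minimal sketch of a highlighting service on a TCP port. It is written in Python and makes several assumptions that are not part of Kamelon: the line-based protocol (one line of text in, one line of JSON out), the names `HighlightHandler` and `make_server`, and the stand-in `highlight` function that only tags runs of digits.

```python
import json
import re
import socketserver

def highlight(line):
    """Stand-in highlighter (an assumption): digits are Number, the rest is Normal."""
    return [
        (m.group(0), "Number" if m.group(1) else "Normal")
        for m in re.finditer(r"(\d+)|\D+", line)
    ]

class HighlightHandler(socketserver.StreamRequestHandler):
    """Assumed protocol: one UTF-8 line in, one JSON line of [fragment, tag] pairs out."""

    def handle(self):
        for raw in self.rfile:
            line = raw.decode("utf-8").rstrip("\r\n")
            reply = json.dumps(highlight(line)) + "\n"
            self.wfile.write(reply.encode("utf-8"))

def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the operating system pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), HighlightHandler)
    server.daemon_threads = True  # do not let open connections block shutdown
    return server
```

Calling `make_server(port=9090).serve_forever()` would run the service until interrupted (the port number is arbitrary). Any client that can open a socket and parse JSON could then request highlighting without knowing that Perl, or anything else, is behind it.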

Create format parsers

At this moment Kamelon is capable of returning the raw output of the lexers, and we have a simple facility that formats it into basic HTML. Development has so far been focused on getting the process right. The output side of it all has received little care and love.

I want to create a basic Raw parser module that other users can build upon for their own output formatting. It would be a good thing to have parser modules for HTML4, HTML5, PDF, PostScript, RTF, ODF, DOC etc. The code folding information in the syntax definitions can be used to create active folding points, or to create a clickable index. Of course there should be an option to show line numbers, preferably in such a way that the user can copy text without copying the numbers along.
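One possible way to show line numbers that are not copied along with the text is to let the browser generate them with CSS counters instead of writing them into the markup. The sketch below is in Python and is only an assumption about how an HTML formatter could do this; `render_html`, the `code` and `line` class names and the styling are all hypothetical. Because the numbers come from generated content, they are not part of the document text, and browsers typically leave them out of a copied selection.

```python
from html import escape

# Line numbers are produced by a CSS counter in a ::before pseudo-element,
# so they never appear in the text content of the page.
STYLE = """<style>
pre.code { counter-reset: line; }
pre.code span.line { counter-increment: line; }
pre.code span.line::before {
  content: counter(line);
  display: inline-block;
  width: 3em;
  margin-right: 1em;
  text-align: right;
  color: #888;
}
</style>"""

def render_html(lines):
    """Wrap each source line in a span; the numbering itself is left to CSS."""
    body = "\n".join(
        f'<span class="line">{escape(line)}</span>' for line in lines
    )
    return f'{STYLE}\n<pre class="code">{body}</pre>'

page = render_html(["my $x = 1;", "print $x < 2;"])
```

A real formatter would also wrap each highlighted fragment in its own tagged span; that part is left out here to keep the numbering technique visible.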

Improve and add syntax definitions

Unfortunately the syntax definition for Perl contains a couple of flaws. That's bad for my ego. We also do not have lexers for Perl6 and PerlXS, and I am not happy with the way the Perl syntax definition handles POD sections in a Perl script. The good thing is that the technology behind it is sound, so this can be remedied. I would like to work with the original author of the Perl syntax definition.