Commit
Darren Hiebert
committed
Nov 2, 2001
1 parent
dc8eabf
commit 759ae2c
Showing
87 changed files
with
24,049 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
configure | ||
config.h.in | ||
ctags.html | ||
.o | ||
.od | ||
ctags | ||
dctags |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,212 @@ | ||
<html> | ||
<head> | ||
<title>Exuberant Ctags: Adding a new parser</title> | ||
</head> | ||
<body> | ||
|
||
<h1>How to Add a New Parser to Exuberant Ctags</h1> | ||
|
||
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own | ||
custom language parser. | ||
|
||
<h2>Operational background</h2> | ||
As ctags considers each file name, it tries to determine the language of the | ||
file by applying the following three tests in order: whether the file extension | ||
has been mapped to a language, whether the file name matches a shell pattern | ||
mapped to a language, and finally whether the file is executable and its first | ||
line specifies an interpreter using the Unix-style "#!" specification (if | ||
supported on the platform). If a language is identified, the file is opened and | ||
the appropriate language parser is called to operate on the currently open | ||
file. The parser reads through the file and, whenever it finds an interesting | ||
token, calls a function to define a tag entry. | ||
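<p>
The three tests can be sketched as a small standalone lookup. This is only an
illustration of the order of the tests, not the code ctags itself uses: the
<code>demo</code> names and tables are hypothetical, the executable check is left
to the caller (pass <code>NULL</code> for the first line when the file is not
executable), and an interpreter line of the form "#!/usr/bin/env perl" is not
handled.

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-pattern matching */
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping tables; ctags keeps its own per-language maps. */
typedef struct { const char *key; const char *language; } demoMapping;

static const demoMapping demoExtensions[] = {
    { "c", "C" }, { "swn", "Swine" }, { "mak", "Makefile" }
};
static const demoMapping demoPatterns[] = {
    { "[Mm]akefile", "Makefile" }
};
static const demoMapping demoInterpreters[] = {
    { "perl", "Perl" }, { "sh", "Sh" }
};

#define DEMO_COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* Returns the language name, or NULL if none of the three tests match. */
static const char *demoGuessLanguage (const char *path, const char *firstLine)
{
    const char *slash, *base, *dot;
    size_t i;

    assert (path != NULL);
    slash = strrchr (path, '/');
    base = (slash != NULL) ? slash + 1 : path;
    dot = strrchr (base, '.');

    /* Test 1: file extension (a leading dot alone is not an extension). */
    if (dot != NULL && dot != base)
        for (i = 0; i < DEMO_COUNT (demoExtensions); ++i)
            if (strcmp (dot + 1, demoExtensions[i].key) == 0)
                return demoExtensions[i].language;

    /* Test 2: shell pattern applied to the base file name. */
    for (i = 0; i < DEMO_COUNT (demoPatterns); ++i)
        if (fnmatch (demoPatterns[i].key, base, 0) == 0)
            return demoPatterns[i].language;

    /* Test 3: "#!" line; compare the last path component of the interpreter. */
    if (firstLine != NULL && strncmp (firstLine, "#!", 2) == 0)
    {
        const char *p = firstLine + 2;
        const char *start;
        size_t len;

        while (*p == ' ' || *p == '\t')
            ++p;
        start = p;
        for (; *p != '\0' && *p != ' ' && *p != '\t' && *p != '\n'; ++p)
            if (*p == '/')
                start = p + 1;
        len = (size_t) (p - start);
        for (i = 0; i < DEMO_COUNT (demoInterpreters); ++i)
            if (strlen (demoInterpreters[i].key) == len &&
                strncmp (start, demoInterpreters[i].key, len) == 0)
                return demoInterpreters[i].language;
    }
    return NULL;
}
```

Because the tests run in order, a file named <code>Makefile.swn</code> would be
classified by its extension before the pattern test is ever tried.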
|
||
<h2>Integrating a new parser</h2> | ||
|
||
Let's assume that I want to add support for my new language, <em>Swine</em>, | ||
the successor to Perl (i.e. Perl before Swine <wince>). | ||
|
||
<p> | ||
First, I create a new module, <code>swine.c</code>, and add one externally | ||
visible function to it, <code>extern parserDefinition *SwineParser(void)</code>, | ||
and add its name to the table in <code>parsers.h</code>. The job of this | ||
parser definition function is to create an instance of the | ||
<code>parserDefinition</code> structure (using <code>parserNew()</code>) and | ||
populate it with information defining how files of this language are | ||
recognized, what kinds of tags it can locate, and the function used to invoke | ||
the parser on the currently open file. | ||
|
||
<p> | ||
The structure <code>parserDefinition</code> allows assignment of the following | ||
fields: | ||
|
||
<p> | ||
<pre> | ||
const char *name; /* name of language */ | ||
kindOption *kinds; /* tag kinds handled by parser */ | ||
unsigned int kindCount; /* size of `kinds' list */ | ||
const char *const *extensions; /* list of default extensions */ | ||
const char *const *patterns; /* list of default file name patterns */ | ||
parserInitialize initialize; /* initialization routine, if needed */ | ||
simpleParser parser; /* simple parser (common case) */ | ||
rescanParser parser2; /* rescanning parser (unusual case) */ | ||
boolean regex; /* is this a regex parser? */ | ||
</pre> | ||
|
||
<p> | ||
The <code>name</code> field must be set to a non-empty string. Also, unless | ||
<code>regex</code> is set true (see below), either <code>parser</code> or | ||
<code>parser2</code> must be set to point to a parsing routine which will | ||
generate the tag entries. All other fields are optional. | ||
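<p>
The rule for the required fields can be written down as a small check. The
sketch below uses a reduced stand-in structure holding only the four fields the
rule mentions; the <code>demo</code> types and functions are hypothetical and are
not part of <code>parse.h</code>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real ones live in ctags. */
typedef int demoBoolean;
typedef void (*demoSimpleParser) (void);
typedef demoBoolean (*demoRescanParser) (unsigned int passCount);

typedef struct {
    const char *name;           /* name of language */
    demoSimpleParser parser;    /* simple parser (common case) */
    demoRescanParser parser2;   /* rescanning parser (unusual case) */
    demoBoolean regex;          /* is this a regex parser? */
} demoParserDefinition;

/* Placeholder parsing routine so that a definition has something to point at. */
static void demoNoopParser (void)
{
}

/* Returns non-zero if the definition satisfies the rule stated above. */
static demoBoolean demoIsValidDefinition (const demoParserDefinition *def)
{
    assert (def != NULL);
    if (def->name == NULL || def->name[0] == '\0')
        return 0;               /* name must be a non-empty string */
    if (def->regex)
        return 1;               /* regex parsers need no parsing routine */
    return def->parser != NULL || def->parser2 != NULL;
}
```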
|
||
<p> | ||
Now all that is left is to implement the parser. In order to do its job, the | ||
parser should read the file stream using one of the two I/O interfaces: | ||
either the character-oriented <code>fileGetc()</code>, or the line-oriented | ||
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser | ||
can put back a character using <code>fileUngetc()</code>. How our Swine parser | ||
actually parses the contents of the file is entirely up to the writer of the | ||
parser--it can be as crude or elegant as desired. You will note a variety of | ||
examples from the most complex (c.c) to the simplest (make.c). | ||
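<p>
The character-oriented style with one character of put-back can be illustrated
with the standard C library. The sketch below uses <code>getc()</code> and
<code>ungetc()</code> as stand-ins for <code>fileGetc()</code> and
<code>fileUngetc()</code>, on the assumption that the ctags functions are used
in the same way; <code>demoReadIdentifier()</code> is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Reads the next identifier ([A-Za-z_][A-Za-z0-9_]*) from fp into buf.
 * Returns its length, or 0 if end of file is reached first.  The character
 * that terminates the identifier is put back for the next reader. */
static size_t demoReadIdentifier (FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert (fp != NULL && buf != NULL && size > 0);

    /* Skip everything that cannot start an identifier. */
    while ((c = getc (fp)) != EOF && !(isalpha (c) || c == '_'))
        ;

    /* Collect identifier characters, truncating if buf is too small. */
    while (c != EOF && (isalnum (c) || c == '_'))
    {
        if (len + 1 < size)
            buf[len++] = (char) c;
        c = getc (fp);
    }

    /* One character too many was read: give it back to the stream. */
    if (c != EOF)
        ungetc (c, fp);

    buf[len] = '\0';
    return len;
}
```

Putting the terminating character back means the caller can still see, for
example, the "(" that follows a function name.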
|
||
<p> | ||
When the Swine parser identifies an interesting token for which it wants to | ||
add a tag to the tag file, it should create a <code>tagEntryInfo</code> | ||
structure and initialize it by calling <code>initTagEntry()</code>, which | ||
initializes defaults and fills information about the current line number and | ||
the file position of the beginning of the line. After filling in information | ||
defining the current entry (and possibly overriding the file position or other | ||
defaults), the parser passes this structure to <code>makeTagEntry()</code>. | ||
|
||
<p> | ||
Instead of writing a parser, it may be possible to specify regular expressions | ||
which define the tags. In this case, instead of defining a parsing function, | ||
<code>SwineParser()</code> sets <code>regex</code> to true and points | ||
<code>initialize</code> to a function which calls | ||
<code>addLanguageRegex()</code> to install the regular expressions which | ||
define its tags. | ||
|
||
<p> | ||
This is all there is to it. All other details are specific to the parser and | ||
how it wants to do its job. There are some support functions which can take | ||
care of some commonly needed parsing tasks, such as keyword table lookups (see | ||
keyword.c), which you can make use of if desired (examples of its use can be | ||
found in c.c, eiffel.c, and fortran.c). Almost everything is already taken care | ||
of automatically for you by the infrastructure. Writing the actual parsing | ||
algorithm is the hardest part, but is not constrained by any need to conform | ||
to anything in ctags other than that mentioned above. | ||
|
||
<p> | ||
There are several different approaches used in the parsers inside <b>Exuberant | ||
Ctags</b> and you can browse through these as examples of how to go about it. | ||
|
||
<h2>Examples</h2> | ||
|
||
Below you will find two example parsers. The first is a parser for our mythical | ||
Swine language, which provides tags for lines beginning with | ||
"<code>def</code>" followed by some name. The second is a simple regex parser | ||
for makefile macros. | ||
|
||
<pre> | ||
/*************************************************************************** | ||
* swine.c | ||
* Parser for Swine definitions | ||
**************************************************************************/ | ||
/* INCLUDE FILES */ | ||
#include "general.h" /* always include first */ | ||
|
||
#include <string.h> /* to declare strxxx() functions */ | ||
#include <ctype.h> /* to define isxxx() macros */ | ||
|
||
#include "parse.h" /* always include */ | ||
#include "read.h" /* to define fileReadLine() */ | ||
|
||
/* DATA DEFINITIONS */ | ||
typedef enum eSwineKinds { | ||
K_DEFINE | ||
} swineKind; | ||
|
||
static kindOption SwineKinds [] = { | ||
{ TRUE, 'd', "definition", "pig definition" } | ||
}; | ||
|
||
/* FUNCTION DEFINITIONS */ | ||
|
||
static void findSwineTags (void) | ||
{ | ||
vString *name = vStringNew (); | ||
const unsigned char *line; | ||
|
||
while ((line = fileReadLine ()) != NULL) | ||
{ | ||
/* Look for a line beginning with "def" followed by name */ | ||
if (strncmp ((const char*) line, "def", (size_t) 3) == 0 && | ||
isspace ((int) line [3])) | ||
{ | ||
const unsigned char *cp = line + 4; | ||
while (isspace ((int) *cp)) | ||
++cp; | ||
while (isalnum ((int) *cp) || *cp == '_') | ||
{ | ||
vStringPut (name, (int) *cp); | ||
++cp; | ||
} | ||
vStringTerminate (name); | ||
makeSimpleTag (name, SwineKinds, K_DEFINE); | ||
vStringClear (name); | ||
} | ||
} | ||
vStringDelete (name); | ||
} | ||
|
||
/* Create parser definition structure */ | ||
extern parserDefinition* SwineParser (void) | ||
{ | ||
static const char *const extensions [] = { "swn", NULL }; | ||
parserDefinition* def = parserNew ("Swine"); | ||
def->kinds = SwineKinds; | ||
def->kindCount = KIND_COUNT (SwineKinds); | ||
def->extensions = extensions; | ||
def->parser = findSwineTags; | ||
return def; | ||
} | ||
</pre> | ||
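<p>
The line-matching logic of <code>findSwineTags()</code> can be tried out on its
own, without the ctags infrastructure. The sketch below is a standalone
restatement of that loop body: <code>demoExtractDefName()</code> is a
hypothetical helper, it writes into a caller-supplied buffer instead of a
<code>vString</code>, and, unlike the listing above, it reports no match when
"def" is not followed by a name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If line begins with "def", whitespace, and an identifier, copies the
 * identifier into out (truncating to size - 1 characters) and returns 1.
 * Otherwise leaves out empty and returns 0. */
static int demoExtractDefName (const char *line, char *out, size_t size)
{
    const unsigned char *cp;
    size_t len = 0;

    assert (line != NULL && out != NULL && size > 0);
    out[0] = '\0';

    /* strncmp() fails on lines shorter than "def", so line[3] is only
     * examined when it is known to exist. */
    if (strncmp (line, "def", 3) != 0 || !isspace ((unsigned char) line[3]))
        return 0;

    cp = (const unsigned char *) line + 4;
    while (isspace (*cp))       /* skip any further whitespace */
        ++cp;
    while ((isalnum (*cp) || *cp == '_') && len + 1 < size)
        out[len++] = (char) *cp++;
    out[len] = '\0';

    return len > 0;
}
```

A line such as "define x" is rejected because the character after "def" is not
whitespace, which is the same test the parser above applies.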
|
||
|
||
<pre> | ||
/*************************************************************************** | ||
* make.c | ||
* Regex-based parser for makefile macros | ||
**************************************************************************/ | ||
/* INCLUDE FILES */ | ||
#include "general.h" /* always include first */ | ||
#include "parse.h" /* always include */ | ||
|
||
/* FUNCTION DEFINITIONS */ | ||
|
||
static void installMakefileRegex (const langType language) | ||
{ | ||
addLanguageRegex (language, "/(^|[ \t])([A-Z0-9_]+)[ \t]*:?=/\\2/m,macro/i"); | ||
} | ||
|
||
/* Create parser definition structure */ | ||
extern parserDefinition* MakefileParser (void) | ||
{ | ||
static const char *const patterns [] = { "[Mm]akefile", NULL }; | ||
static const char *const extensions [] = { "mak", NULL }; | ||
parserDefinition* const def = parserNew ("Makefile"); | ||
def->patterns = patterns; | ||
def->extensions = extensions; | ||
def->initialize = installMakefileRegex; | ||
def->regex = TRUE; | ||
return def; | ||
} | ||
</pre> | ||
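<p>
To see what the makefile pattern matches, it can be exercised with the POSIX
regex library. The sketch below assumes that the pattern is an extended regular
expression, that the trailing <code>i</code> flag requests case-insensitive
matching, and that <code>\2</code> selects the second parenthesized group as
the tag name; <code>demoMatchMacro()</code> is a hypothetical helper, not a
ctags function.

```c
#include <assert.h>
#include <regex.h>     /* POSIX regcomp(), regexec(), regfree() */
#include <stddef.h>
#include <string.h>

/* Applies the makefile macro pattern to line.  On a match, copies the text
 * of group 2 (the macro name) into out and returns 1; otherwise returns 0. */
static int demoMatchMacro (const char *line, char *out, size_t size)
{
    static const char pattern[] = "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=";
    regex_t re;
    regmatch_t m[3];            /* whole match, group 1, group 2 */
    int found = 0;

    assert (line != NULL && out != NULL && size > 0);
    out[0] = '\0';

    if (regcomp (&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return 0;

    if (regexec (&re, line, 3, m, 0) == 0 && m[2].rm_so >= 0)
    {
        size_t len = (size_t) (m[2].rm_eo - m[2].rm_so);
        if (len >= size)
            len = size - 1;     /* truncate to fit the buffer */
        memcpy (out, line + m[2].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree (&re);
    return found;
}
```

Both "=" and ":=" assignments match, while a rule line such as
"all: ctags" does not, because no "=" follows the name.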
|
||
<hr> | ||
<a href="http:index.html"> | ||
<img align=left border=0 src="http:../images/return.gif" alt="Return"> | ||
Back to <strong>Exuberant Ctags</strong> | ||
</a> | ||
|
||
</body> | ||
</html> |