Trie-Gen
~~~~~~~~
Stefan Vargyas, stvar@yahoo.com

Jul 15, 2016


Table of Contents
-----------------
0. Copyright
1. The Trie-Gen Program
2. Configuring Trie-Gen
3. Building and Testing Trie-Gen
4. References
5. Appendix


0. Copyright
============

This program is GPL-licensed free software. Its author is Stefan Vargyas. You
can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

You should have received a copy of the GNU General Public License along with
this program (look for the file COPYING in the top directory of the source
tree). If not, see http://gnu.org/licenses/gpl.html.


1. The Trie-Gen Program
=======================

The program came into existence out of a need for a small tool capable of
generating C++ code for string look-up functions defined on small sets of
keywords (see the Appendix for an edifying example). The generated code was to
be integrated into a larger program that itself generates and processes a
certain kind of abstract syntax trees.

To solve the problem I relied on a well-known data structure -- the trie -- and
on its associated algorithms. A standard reference for tries is [2]; follow
also the references therein.


2. Configuring Trie-Gen
=======================

The Trie-Gen program is written in modern C++ (ISO/IEC 14882:2011 compliant,
aka C++11) and was developed under a GNU/Linux environment using the GCC C++
compiler v4.8.0 [1] and a few common GNU/Linux power-tools.

While writing C++ code, I regularly pay a good deal of attention to making as
explicit as possible each of the low-level choices assumed.
To that effect, I reused a useful header file 'config.h' which collects a
handful of relevant configuration parameters:

    $ grep -how 'CONFIG_[A-Z0-9_]*[A-Z0-9]' src/config.h|sort -u
    CONFIG_ARITHMETIC_SHIFT_MACHINE
    CONFIG_BITOPS_SIGN_AND_BITS_MACHINE
    CONFIG_MEM_ALIGNOF
    CONFIG_PTR_TO_INT_IDENTOP
    CONFIG_TWOS_COMPLEMENT_MACHINE
    CONFIG_VA_END_NOOP

From the above list, only one of the parameters is actually used:

    $ grep -how 'CONFIG_[A-Z0-9_]*[A-Z0-9]' src/*.?pp|sort -u
    CONFIG_VA_END_NOOP

Before proceeding further to build 'trie', please make sure that the above
parameter has a meaningful value, in accordance with the target platform
(machine and compiler) on which the program is supposed to be built.


3. Building and Testing Trie-Gen
================================

In the development of 'trie', I used an experimental GCC v4.8.0 (built from
sources) and GNU make v3.81. If using a non-GNU make, then it is very likely
that the makefiles 'common.mk' and 'Makefile' would need a bit of tweaking.

Note that a prerequisite of building the program is that the version of GCC be
at least the above v4.8.0. That is because the program requires that a
substantial number of C++11 features be implemented by the compiler and by its
Standard C++ Library.

Even though I did not investigate thoroughly, I know of no impediment to using
more recent versions of the tools named above.

To build the program, simply issue the following command:

    $ make

Expect to get neither warnings nor errors out of 'make'. If everything went OK,
'make' is supposed to have produced the binary 'trie'.
To clean up the directory containing the source files of generated object
files or, alternatively, to remove all the files obtained from 'make', issue
one of the following commands respectively:

    $ make clean

    $ make allclean

After building the program by running 'make' successfully, the first thing to
do is to see the brief help info provided by the program itself:

    $ ./trie --help

The package comes along with a complete test-suite. For that, look into the
directory 'test' for shell script files of the form 'test*.sh'. The main script
is 'test.sh': it starts the whole regression testing process. Note that these
bash shell scripts depend upon a few utility programs commonly found on every
GNU/Linux installation. The 'test' directory additionally contains the source
from which the 'test*.sh' bash scripts were generated: the file 'test.txt'.

The shell script 'test/test.sh' can be invoked simply by issuing:

    $ make test
    test: NAME RESULT
    ...

NAME is the name of the test case and RESULT is either 'OK' or 'failed'. The
expected behaviour is that all test cases succeed. If a particular test case
fails, more verbose output is obtainable by running the corresponding
'test-*.sh' shell script on its own. It will produce a diff between the
expected and the actual output of 'trie'.

Note that any explicit invocation of these bash test scripts by the user must
be initiated from within the 'test' directory.

The programs used by the testing scripts 'test/test*.sh' are the following:

  * GNU bash 3.2.51,
  * GNU diff 2.8.7 (part of diffutils package),
  * GNU awk 3.1.8, and
  * GNU sed 4.1.5.


4. References
=============

[1] GNU Compilers
    http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/

[2] Robert Sedgewick & Philippe Flajolet:
    An Introduction to the Analysis of Algorithms, 2nd ed.
    Addison-Wesley, 2013, ISBN: 978-0-321-90575-8
    Chapter 8: Strings and Tries, pp. 448-471


5. Appendix
===========

For the following set of keywords (which is part of a longer list of AST node
names describing a certain subset of the Python language):

    1  AndTest
    2  ArgsCallExpr
    3  KeyDatum
    4  KeyDatumDictExpr
    5  KeyDatumList

the (compacted) trie structure is:

    {
        "A" {
            "ndTest": 1
            "rgsCallExpr": 2
        }
        "KeyDatum": 3 {
            "DictExpr": 4
            "List": 5
        }
    }

and the corresponding 'lookup' function might look like the one below:

    int lookup(const char* p)
    {
        switch (*p ++) {
        case 'A':
            switch (*p ++) {
            case 'n':
                if (*p ++ == 'd' &&
                    *p ++ == 'T' &&
                    *p ++ == 'e' &&
                    *p ++ == 's' &&
                    *p ++ == 't' &&
                    *p == 0)
                    return 1;
                return error;
            case 'r':
                if (*p ++ == 'g' &&
                    *p ++ == 's' &&
                    *p ++ == 'C' &&
                    *p ++ == 'a' &&
                    *p ++ == 'l' &&
                    *p ++ == 'l' &&
                    *p ++ == 'E' &&
                    *p ++ == 'x' &&
                    *p ++ == 'p' &&
                    *p ++ == 'r' &&
                    *p == 0)
                    return 2;
            }
            return error;
        case 'K':
            if (*p ++ == 'e' &&
                *p ++ == 'y' &&
                *p ++ == 'D' &&
                *p ++ == 'a' &&
                *p ++ == 't' &&
                *p ++ == 'u' &&
                *p ++ == 'm') {
                if (*p == 0)
                    return 3;
                switch (*p ++) {
                case 'D':
                    if (*p ++ == 'i' &&
                        *p ++ == 'c' &&
                        *p ++ == 't' &&
                        *p ++ == 'E' &&
                        *p ++ == 'x' &&
                        *p ++ == 'p' &&
                        *p ++ == 'r' &&
                        *p == 0)
                        return 4;
                    return error;
                case 'L':
                    if (*p ++ == 'i' &&
                        *p ++ == 's' &&
                        *p ++ == 't' &&
                        *p == 0)
                        return 5;
                }
            }
        }
        return error;
    }
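
The character-by-character walk that the 'lookup' function above hard-codes as
nested switches can also be followed at run time. Below is a minimal sketch,
not the code that 'trie' generates: a plain (non-compacted) trie in C++11
holding the same five keywords. The names 'TrieNode', 'trie_insert',
'trie_lookup', 'make_example_trie' and the sentinel 'lookup_error' (standing in
for the 'error' value left undefined above) are assumptions made for
illustration only.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical sentinel: stands in for the 'error' value that the
// appendix code leaves undefined.
const int lookup_error = -1;

// One trie node: 'value' is the keyword number ending at this node,
// or 0 if no keyword ends here.
struct TrieNode {
    int value = 0;
    std::map<char, std::unique_ptr<TrieNode>> children;
};

// Insert 'key' with the positive number 'value', creating nodes as needed.
void trie_insert(TrieNode& root, const char* key, int value)
{
    assert(value > 0);
    TrieNode* node = &root;
    for (; *key; ++key) {
        std::unique_ptr<TrieNode>& child = node->children[*key];
        if (!child)
            child.reset(new TrieNode());  // C++11 lacks std::make_unique
        node = child.get();
    }
    node->value = value;
}

// Walk the trie one character at a time, much as the nested switches do;
// a keyword number is returned only when the whole input was consumed.
int trie_lookup(const TrieNode& root, const char* p)
{
    const TrieNode* node = &root;
    for (; *p; ++p) {
        auto it = node->children.find(*p);
        if (it == node->children.end())
            return lookup_error;
        node = it->second.get();
    }
    return node->value ? node->value : lookup_error;
}

// Build the trie for the five keywords listed in this appendix.
TrieNode make_example_trie()
{
    TrieNode root;
    trie_insert(root, "AndTest", 1);
    trie_insert(root, "ArgsCallExpr", 2);
    trie_insert(root, "KeyDatum", 3);
    trie_insert(root, "KeyDatumDictExpr", 4);
    trie_insert(root, "KeyDatumList", 5);
    return root;
}
```

With 'TrieNode t = make_example_trie();', the call
'trie_lookup(t, "KeyDatumList")' yields 5, while a proper prefix such as
"KeyDat" yields 'lookup_error', just as the generated function demands the
terminating 0 before returning a keyword number. The generated code trades
this run-time structure for straight-line comparisons fixed at compile time.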