-
Notifications
You must be signed in to change notification settings - Fork 1
Document parser library based on libxml2
License
karpet/libswish3
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
libswish3 ============================================================================ libswish3 is a document parser compatible with the Swish-e 2.4 -S prog API. libswish3 is a C library for parsing documents into a data structure that can then be stored and searched with a variety of IR backends. The libswish3 model is simple: you define a handler function and pass a pointer to that function to whichever parser input function you choose (in-memory, stdin, or filesystem). After each document is parsed, your handler function is passed the swish_ParserData object to do with as it pleases, most likely storing the data in an index. See the swish_lint.c program for how you might write a handler function. Differences from Swish-e 2.4 parser -------------------------------------- The following Swish-e configuration options are not supported as part of libswish3. It is up to the code that uses libswish3 to implement the features these configuration options represent. # Delay # EquivalentServer # FileFilter # FileFilterMatch # FileMatch # FileRules # FileRules # MaxDepth # ReplaceRules # SpiderDirectory # TmpDir Getting Started ----------------- See the INSTALL doc. But basically: ./bootstrap (only necessary if you are doing development) ./configure make && make test sudo make install Profiling with gprof --------------------------- The default Makefile.am files include the -pg flag in order to get profiling information. You should remove the -pg before compiling for a production (i.e., non-development) system. See the gprof man page. Basically: gprof .libs/swish_lint should give the 'make test' profile.
About
Document parser library based on libxml2
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published