Library for document analysis (segmentation, tokenization, normalization, aggregation) with the goal of producing a set of items that can be inserted into a strus storage. It also provides functions for analysing tokens or phrases of a strus query.

README

Library for building the document analysis for information retrieval engines.
Used as an extension of Strus.
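
The four stages named in the project description (segmentation, tokenization, normalization, aggregation) can be pictured with a small toy pipeline. The sketch below is plain Python and does not use the strusAnalyzer API; every function name, the tagged input format, and the choice of a per-section term count as the aggregated value are assumptions made only for illustration.

```python
import re
import unicodedata
from collections import Counter

def segment(document):
    """Segmentation: split a toy tagged document into (section, text) chunks."""
    return re.findall(r"<(\w+)>(.*?)</\1>", document, flags=re.S)

def tokenize(text):
    """Tokenization: return (position, word) pairs, positions counted from 1."""
    return [(pos, m.group())
            for pos, m in enumerate(re.finditer(r"\w+", text), start=1)]

def normalize(token):
    """Normalization: lowercase and strip diacritical marks."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def analyze(document):
    """Run all stages and return the items a storage could index (hypothetical shape)."""
    terms = []
    for section, text in segment(document):
        for pos, word in tokenize(text):
            terms.append((section, normalize(word), pos))
    # Aggregation: derive a per-document statistic from the collected terms.
    counts = Counter(section for section, _, _ in terms)
    return {"terms": terms, "counts": dict(counts)}
```

Calling analyze() on a string such as "<title>Caf\u00e9 Society</title>" yields normalized terms with their positions plus the per-section counts, which is roughly the kind of item set a storage would then accept.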

Licensed under MPLv2 (Mozilla Public License, Version 2.0 - https://www.mozilla.org/en-US/MPL/2.0)
For third-party licenses, see LICENSE.3rdParty

The Strus project implements a set of libraries and tools for building a competitive,
scalable search engine for text retrieval.
It is a solution for small projects as well as larger-scale applications.
The Strus project homepage at http://project-strus.net offers articles, links, and documentation.

For installation, see the INSTALL.<platform> files in the top-level directory of the project.

The project is built regularly with Travis (https://travis-ci.org/patrickfrey/strusAnalyzer)
and on the openSUSE Build Service (https://build.opensuse.org/package/show/home:PatrickFrey/strusanalyzer):