Encapsulation for multi-language support.
- Language identifier. The language identifier library recognizes the language of a text. The corresponding technical report can be accessed here.
- Multi-language analyzer. This is an encapsulation of icma and ijma; the language identifier decides which analyzer is used. The corresponding technical report can be accessed here.
- Tokenizers. Several utility tokenizers for Chinese verticals are also provided, because vertical search portals often have different tokenization requirements.
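The dispatch idea described above (identify the language first, then pick the analyzer) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual API: `Language`, `Analyzer`, `identify_language`, `split_on_whitespace` and `MultiLanguageAnalyzer` are hypothetical names, the identifier is a toy heuristic based on Unicode ranges, and the real icma and ijma analyzers would be registered in place of the callbacks.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical language tags; the real identifier's output type is not shown here.
enum class Language { Chinese, Japanese, Other };

// Hypothetical analyzer signature: UTF-8 text in, tokens out.
using Analyzer = std::function<std::vector<std::string>(const std::string&)>;

// Toy stand-in for the language identifier (assumption): any kana marks the
// text as Japanese, otherwise any CJK ideograph marks it as Chinese.
Language identify_language(const std::string& utf8) {
    bool saw_han = false;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;  // sequence length derived from the lead byte
        if (b0 >= 0xF0) len = 4;
        else if (b0 >= 0xE0) len = 3;
        else if (b0 >= 0xC0) len = 2;
        if (len == 3 && i + 2 < utf8.size()) {
            // Decode a three-byte UTF-8 sequence into a code point.
            const unsigned cp =
                ((b0 & 0x0Fu) << 12) |
                ((static_cast<unsigned char>(utf8[i + 1]) & 0x3Fu) << 6) |
                (static_cast<unsigned char>(utf8[i + 2]) & 0x3Fu);
            if (cp >= 0x3040 && cp <= 0x30FF) return Language::Japanese;  // hiragana, katakana
            if (cp >= 0x4E00 && cp <= 0x9FFF) saw_han = true;             // CJK ideographs
        }
        i += len;
    }
    return saw_han ? Language::Chinese : Language::Other;
}

// Fallback used when no analyzer is registered for the detected language.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}

// Hypothetical wrapper: routes each text to the analyzer for its language.
class MultiLanguageAnalyzer {
public:
    void register_analyzer(Language lang, Analyzer analyzer) {
        analyzers_[lang] = std::move(analyzer);
    }

    std::vector<std::string> analyze(const std::string& text) const {
        const auto it = analyzers_.find(identify_language(text));
        return it != analyzers_.end() ? it->second(text) : split_on_whitespace(text);
    }

private:
    std::map<Language, Analyzer> analyzers_;
};
```

In this sketch a caller would register one callback per language (for example one wrapping icma for `Language::Chinese` and one wrapping ijma for `Language::Japanese`) and then call `analyze` on each input; text in any other language falls back to whitespace splitting.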
We recently switched SF1R to C++11, so GCC 4.8 is required to build SF1R. We do not recommend building the project on Ubuntu because of the nested references among many libraries; CentOS / Redhat / Gentoo / CoreOS are the preferred platforms. You also need CMake and Boost 1.56 to build the repository. The dependent repositories are:
- cmake: The cmake modules required to build all iZENECloud C++ projects.
- icma: The Chinese morphological analyzer.
- ijma: The Japanese morphological analyzer.
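As a rough illustration of the compiler requirements mentioned above, a translation unit can check them at compile time. This is a minimal sketch and not part of the project's build system: `has_cxx11` and `gcc_version_ok` are hypothetical helpers, and reading "GCC 4.8" as "4.8 or later" is an assumption. Other compilers are not checked, and the Boost and CMake requirements are not covered.

```cpp
// Hypothetical helper: true when this translation unit is compiled as C++11 or later.
constexpr bool has_cxx11() { return __cplusplus >= 201103L; }

// Hypothetical helper: true when a GCC major.minor version is at least 4.8.
// Treating newer releases as acceptable is an assumption.
constexpr bool gcc_version_ok(int major, int minor) {
    return major > 4 || (major == 4 && minor >= 8);
}

static_assert(has_cxx11(), "C++11 or later is required");

// Clang also defines __GNUC__, so it is excluded from the GCC version check.
#if defined(__GNUC__) && !defined(__clang__)
static_assert(gcc_version_ok(__GNUC__, __GNUC_MINOR__),
              "GCC 4.8 or later is required");
#endif
```

Placing such a check in a commonly included header makes an unsupported toolchain fail early with a readable message instead of with an unrelated syntax error.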
The project is published under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0