Here is the second pre-release of Juman++V2.
The main focus of this release was making analysis of non-core corpora (e.g. web blog text) more stable.
There should be no more major features or modifications before the next non-RC release, but we want to fix some dictionary inconsistencies before making the final release.
- Windows support! Big thanks to @DoumanAsh! Windows Vista and later are supported; XP is NOT supported. Juman++ builds with MSVC 2017 and gcc-mingw64 (we test both platforms on our internal CI). It should probably also build with MSVC 2015, but I haven't tried. There are no binaries yet, but you can help us by creating an installer.
- Can now output to a file with
- `--segment` now outputs a space-delimited segmentation result without any other information. You can also change the delimiter with
- `--partial-input` treats the input as partially annotated and tries to produce an analysis that respects the restrictions specified by the partial annotation.
- `--auto-nbest` automatically adjusts the beam widths (local, global left) and the lattice output size depending on the input length.
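The sketch below calls the analyzer in `--segment` mode from Python and splits the space-delimited result into tokens. It makes several assumptions that this release does not confirm: the binary is on `PATH` as `jumanpp`, the input is passed on stdin, and each input line produces exactly one output line. `segment` and `parse_segmentation` are hypothetical helper names. If you have changed the delimiter, pass the new one through the `delimiter` argument.

```python
import subprocess
from typing import List


def parse_segmentation(output: str, delimiter: str = " ") -> List[List[str]]:
    """Split `--segment` output into one token list per non-empty line.

    Assumes one output line per input sentence; empty tokens caused by
    repeated delimiters are dropped.
    """
    return [
        [tok for tok in line.split(delimiter) if tok]
        for line in output.splitlines()
        if line.strip()
    ]


def segment(text: str, binary: str = "jumanpp", delimiter: str = " ") -> List[List[str]]:
    """Run the analyzer with --segment and parse its output.

    Hypothetical wrapper: assumes the binary reads stdin and writes stdout.
    """
    result = subprocess.run(
        [binary, "--segment"],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if the analyzer exits with an error
    )
    return parse_segmentation(result.stdout, delimiter)
```

The parsing is kept separate from the subprocess call, so you can test it without an installed binary, for example `parse_segmentation("外国 人 参政 権\n")`.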
Models should be significantly more robust than before when analyzing arbitrary web text.