@eiennohito eiennohito released this Mar 14, 2018 · 77 commits to master since this release

This is the second pre-release of Juman++ V2.
The main focus was making analysis of non-core corpora (e.g. web blog text) more stable.

We do not expect any more major features or modifications before the next non-rc release, but we want to fix some dictionary inconsistencies before making the final release.

New Features

  • Windows support! Big thanks to @DoumanAsh! Vista and later are supported; XP is NOT supported. It builds with MSVC 2017 and gcc-mingw64 (we test both on our internal CI). It should probably build with MSVC 2015 as well, but I haven't tried. There are no binaries yet, but you can help us by creating an installer.
  • Output can now be written to a file with -o or --output.
  • --segment now outputs a space-delimited segmentation result without other information. You can also change the delimiter with the --segment-separator flag.
  • --partial-input treats the input as partially annotated and tries to produce an analysis result that respects the restrictions specified by the partial annotation.
  • --auto-nbest automatically changes beam widths (local, global left) and lattice output size depending on the input length.
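
As a rough illustration of how the new output and segmentation flags combine, here is a minimal Python sketch that assembles a command line from them. Several things here are assumptions rather than documented behavior: that the executable is called `jumanpp` and is on your PATH, that a model is already configured, that it reads text from standard input, and that the separator is passed as the argument following `--segment-separator`. The helper names `build_jumanpp_args` and `run_jumanpp` are hypothetical.

```python
import shutil
import subprocess


def build_jumanpp_args(output_path=None, segment=False, separator=None,
                       executable="jumanpp"):
    """Build an argument list using only flags named in these release notes.

    The executable name is an assumption; adjust it for your installation.
    """
    args = [executable]
    if segment:
        args.append("--segment")
        if separator is not None:
            # Assumption: the separator value follows the flag as its own argument.
            args += ["--segment-separator", separator]
    if output_path is not None:
        args += ["--output", output_path]
    return args


def run_jumanpp(text, **options):
    """Send text to the analyzer on stdin and return whatever it prints.

    Assumes the binary reads standard input and has a model configured.
    """
    args = build_jumanpp_args(**options)
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} was not found on PATH")
    result = subprocess.run(args, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, `build_jumanpp_args(segment=True, separator="|", output_path="out.txt")` produces an argument list that requests segmentation only, with `|` as the delimiter, written to `out.txt`.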

Model Stability

Models should be significantly more robust when analyzing arbitrary web text than in earlier releases.
