@hiroshi-matsuda-rit released this Jan 19, 2020 · 32 commits to develop since this release

ginza-3.1.1

  • 2020-01-19
  • API Changes
    • Extension fields
      • The values of the Token._.sudachi field are set only after calling SudachipyTokenizer.set_enable_ex_sudachi(True), to avoid serialization errors
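# Default behaviour: Token._.sudachi is not populated, so the Doc serializes cleanly.
import pickle

import spacy

nlp = spacy.load('ja_ginza')
doc1 = nlp('This example will be serialized correctly.')
doc1.to_bytes()
with open('sample1.pickle', 'wb') as f:
    pickle.dump(doc1, f)

# Once the extended field is enabled, Token._.sudachi holds SudachiPy morpheme
# instances, and serialization is expected to fail from here on.
nlp.tokenizer.set_enable_ex_sudachi(True)
doc2 = nlp('This example will cause a serialization error.')
doc2.to_bytes()
with open('sample2.pickle', 'wb') as f:
    pickle.dump(doc2, f)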
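
The failure mode behind this change can be reproduced without GiNZA installed. The following is only a minimal sketch of the general problem: `FakeMorpheme` and `can_pickle` are hypothetical names invented for illustration, and a `threading.Lock` stands in for whatever makes the real morpheme instances unserializable, which the release notes do not specify.

```python
import pickle
import threading


class FakeMorpheme:
    """Hypothetical stand-in for a tokenizer-internal object (not a SudachiPy class)."""

    def __init__(self, surface):
        self.surface = surface
        # A lock is a simple stdlib example of an object that pickle rejects.
        self.handle = threading.Lock()


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# Extension field left unset: only plain data is attached to the token.
plain_fields = {"surface": "銀座", "sudachi": None}
# Extension field populated: a live object is attached to the token.
rich_fields = {"surface": "銀座", "sudachi": FakeMorpheme("銀座")}

print(can_pickle(plain_fields))
print(can_pickle(rich_fields))
```

Keeping the field empty by default means documents stay serializable unless the caller explicitly opts in to the richer data.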

v3.1.0 · 832b6e2

@hiroshi-matsuda-rit released this Jan 15, 2020 · 47 commits to develop since this release

ginza-3.1.0

  • 2020-01-16
  • Important changes
    • Distribute ja_ginza_dict from PyPI
  • API Changes
    • commands
      • ginza and ginzame
        • add -i option to initialize the files of ja_ginza_dict

@hiroshi-matsuda-rit released this Jan 14, 2020 · 51 commits to develop since this release

ginza-3.0.0

  • 2020-01-15
  • Important changes
    • Distribute ginza and ja_ginza from PyPI
      • Simple installation: run pip install ginza, then run ginza
      • The model package, ja_ginza, is also available from PyPI.
    • Model improvements
      • Change NER training data-set to GSK2014-A (2019) BCCWJ edition
        • Improved accuracy of NER
        • token.ent_type_ values now use the labels of Sekine's Extended Named Entity Hierarchy
          • Add ENE7 attribute to the last field of the output of ginza
        • Move the OntoNotes5-based labels to token._.ne
          • We extended the OntoNotes5 named entity labels with PHONE, EMAIL, URL, and PET_NAME
      • Overall accuracy is improved by executing spacy pretrain over 100 epochs
        • Multi-task learning in spacy train works effectively on UD Japanese BCCWJ
      • Use the newest dictionary, SudachiDict_core-20191224
    • ginzame
      • Run sudachipy via multiprocessing.Pool and output the results in a mecab-like format
      • The sudachipy command now requires a SudachiDict package to be installed separately
  • Breaking API Changes
    • commands
      • ginza (ginza.command_line.main_ginza)
        • change option mode to sudachipy_mode
        • drop options: disable_pipes and recreate_corrector
        • add options: hash_comment, parallel, files
        • add mecab to the choices for the argument of -f option
        • add parallel NUM_PROCESS option (EXPERIMENTAL)
        • add ENE7 attribute to conllu miscellaneous field
          • ginza.ent_type_mapping.ENE_NE_MAPPING is used to convert ENE7 label to NE
      • add ginzame (ginza.command_line.main_ginzame)
        • a multi-process tokenizer providing a mecab-like output format
    • spaCy field extensions
      • add token._.ne for the NER label
    • ginza/sudachipy_tokenizer.py
      • change SudachiTokenizer to SudachipyTokenizer
      • use SUDACHI_DEFAULT_SPLIT_MODE instead of SUDACHI_DEFAULT_SPLITMODE or SUDACHI_DEFAULT_MODE
  • Dependencies
    • upgrade spacy to v2.2.3
    • upgrade sudachipy to v0.4.2
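
The notes above describe two label sets: fine-grained ENE labels in token.ent_type_ and coarser OntoNotes5-based labels in token._.ne, with ginza.ent_type_mapping.ENE_NE_MAPPING converting the former to the latter. The sketch below shows the general shape of such a lookup. The dictionary entries, the name `EXAMPLE_ENE_NE_MAPPING`, and the helper `ene_to_ne` are assumptions made for illustration and are not taken from GiNZA itself.

```python
# Illustrative subset only. These pairs are assumptions made for this sketch,
# not the actual contents of ginza.ent_type_mapping.ENE_NE_MAPPING.
EXAMPLE_ENE_NE_MAPPING = {
    "Person": "PERSON",
    "City": "GPE",
    "Company": "ORG",
    "Phone_Number": "PHONE",
    "Email": "EMAIL",
    "URL": "URL",
}


def ene_to_ne(ene_label, mapping=EXAMPLE_ENE_NE_MAPPING, default=""):
    """Map a fine-grained ENE label to a coarse NE label, or default if unknown."""
    return mapping.get(ene_label, default)


# Hypothetical (surface, ENE label) pairs standing in for
# (token.text, token.ent_type_) read from a parsed document.
tagged = [("田中", "Person"), ("東京", "City"), ("です", "")]
for surface, ene in tagged:
    print(surface, ene, ene_to_ne(ene), sep="\t")
```

With the real packages installed, the same idea would presumably apply by reading token.ent_type_ and token._.ne directly from each token rather than converting by hand.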

v2.2.1 · b39b98b

@hiroshi-matsuda-rit released this Oct 27, 2019 · 78 commits to develop since this release

ginza-2.2.1

  • 2019-10-28
  • Improvements
    • JapaneseCorrector can now completely merge the as_* type dependencies
  • Bug fixes
    • The command line tool failed in certain situations

v2.2.0 · fe6b9cc

@hiroshi-matsuda-rit released this Oct 4, 2019 · 94 commits to develop since this release

ginza-2.2.0

  • 2019-10-04, Ametrine
  • Important changes
    • split_mode has been set incorrectly for sudachipy.tokenizer since v2.0.0 (#43)
      • This bug caused a split_mode incompatibility between the training phase and the ginza command.
      • split_mode was set to 'B' for the training phase and the Python APIs, but to 'C' for the ginza command.
      • We fixed this bug by setting the default split_mode to 'C' everywhere.
      • This fix may cause word segmentation incompatibilities when upgrading GiNZA from v2.0.0 to v2.2.0.
  • New features
    • Add -f and --output-format option to ginza command:
    • Add custom token fields:
      • bunsetu_index: bunsetu index starting from 0
      • reading: reading of the token (not a pronunciation)
      • sudachi: SudachiPy's morpheme instance (or a list of them when the tokens are gathered by JapaneseCorrector)
  • Performance improvements
    • Tokenizer
      • Use latest SudachiDict (SudachiDict_core-20190927.tar.gz)
      • Use Cythonized SudachiPy (v0.4.0)
    • Dependency parser
      • Apply spacy pretrain command to capture the language model from UD-Japanese BCCWJ, UD_Japanese-PUD and KWDLC.
      • Apply multitask objectives by using -pt 'tag,dep' option of spacy train
    • New model file
      • ja_ginza-2.2.0.tar.gz
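
To see why a split_mode mismatch between training and the ginza command matters, the sketch below compares token boundaries for one compound under modes A, B and C. The segmentations are hard-coded, following the well-known example from the Sudachi documentation, so the snippet runs without SudachiPy. The helper names `boundaries` and `mismatched_boundaries` are hypothetical.

```python
# Segmentations of one compound under split modes A, B and C, hard-coded here
# (following the example in the Sudachi documentation) instead of calling SudachiPy.
SEGMENTATIONS = {
    "A": ["選挙", "管理", "委員", "会"],
    "B": ["選挙", "管理", "委員会"],
    "C": ["選挙管理委員会"],
}


def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets, pos = set(), 0
    for token in tokens:
        pos += len(token)
        offsets.add(pos)
    return offsets


def mismatched_boundaries(train_mode, run_mode):
    """Offsets that are token boundaries in one mode but not in the other."""
    train = boundaries(SEGMENTATIONS[train_mode])
    run = boundaries(SEGMENTATIONS[run_mode])
    return sorted(train ^ run)


print(mismatched_boundaries("B", "C"))  # → [2, 4]
print(mismatched_boundaries("C", "C"))  # → []
```

A model trained on mode B tokens would see three tokens for this compound, while a command running in mode C hands it one, which is the kind of incompatibility the fix removes by using 'C' everywhere.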

@hiroshi-matsuda-rit released this Jul 7, 2019 · 123 commits to develop since this release

ginza-2.0.0 (2019-07-08)

  • Add ginza command
    • run ginza from the console
  • Change package structure
    • module package as ginza
    • language model package as ja_ginza
    • spacy.lang.ja is overridden by ginza
  • Remove sudachipy related directories
    • SudachiPy and its dictionary are installed via pip during ginza installation
  • User dictionary available
  • Token extension fields
    • Added
      • token._.bunsetu_bi_label, token._.bunsetu_position_type
    • Retained
      • token._.inf
    • Removed
      • pos_detail (same value is set to token.tag_)

@hiroshi-matsuda-rit released this Jul 7, 2019 · 123 commits to develop since this release

v2.2.1


v1.0.2 · dd2a1ea

@hiroshi-matsuda-rit released this Apr 7, 2019 · 178 commits to master since this release

v1.0.2

v1.0.1 · b8be4fb

@hiroshi-matsuda-rit released this Apr 2, 2019 · 191 commits to master since this release

Merge pull request #10 from megagonlabs/develop

v1.0.1

v1.0.0 · ea0cfd8

@hiroshi-matsuda-rit released this Apr 1, 2019 · 197 commits to master since this release

Merge pull request #7 from megagonlabs/develop

refine MDs