arekit-0.22.1

@nicolay-r nicolay-r released this 06 Sep 08:36

Release Notes 🎉

arekit-21-1-0-s

Full Changelog

WHAT'S NEW:

  • 📓 Provide a BRAT-based reader (refactoring) of documents and the entities mentioned in them! 🥳
  • 🔧 Provide verbose treatment of values for SynonymsCollection (#327)
  • 🔧 Fixed embedding issues for Entity type for neural networks (#308)
  • 🔧 Refactored the RuSentRel reader, which is now built on top of BRAT. (#287)
  • 🔧 Attitude annotation is performed on the fly within a pipeline! (#281)
  • 🔧 Opinion annotation does not depend on the experiment (#250)
  • 🔧 #347
  • 🆕 Added the utils contrib part, into which the following were moved 🥳
    - evaluation (2-3 scale)
    - cv-splittings (#324)
    - entity formatters
    - synonyms collections templates: stemmer-based
    - experiment handlers (#325)
    - np_utils -- utils to interact with np-serialized data (#348)
    - pipelines ➿ for opinion extraction, data serialization, and text processing: a custom pipeline can now be declared and serialization adapted to a variety of RE tasks (#322, #326, #351)
  • 🆕 API for conversion of external text_opinions into parsed_news (#338)
  • 🆕 API for a variety of pipelines for data preparation, depending on DataType (#343)
  • 🆕 DataType now includes Dev and Etalon by default (#345)
  • 🆕 Evaluation refactoring, and support TextOpinion level results evaluation (#355)
  • 🗑️ experimential_rusentrel contrib part removed (#321)
  • 🗑️ OpinionRowsProvider should be removed [ARElight backlog] (#282)
  • fixed: #356
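
The notes above mention declaring custom pipelines and choosing a pipeline depending on DataType (#343, #345). The following is a minimal sketch of that idea only, not AREkit's actual API: the `DataType` enum here is a stand-in that borrows the member names from these notes, and `run_pipeline`, `split_terms`, `lowercase_terms`, and `drop_short_terms` are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class DataType(Enum):
    # Hypothetical stand-in, NOT the AREkit class; member names follow
    # the release notes, the values are arbitrary.
    Train = "train"
    Test = "test"
    Dev = "dev"
    Etalon = "etalon"


# Assumption: a pipeline item is modelled as a plain function that
# takes the previous item's output and returns its own.
PipelineItem = Callable[[object], object]


def run_pipeline(items: Iterable[PipelineItem], data: object) -> object:
    """Apply each item in order, feeding each output into the next item."""
    for item in items:
        data = item(data)
    return data


def split_terms(text: str) -> List[str]:
    return text.split()


def lowercase_terms(terms: List[str]) -> List[str]:
    return [t.lower() for t in terms]


def drop_short_terms(terms: List[str]) -> List[str]:
    # Keep only terms longer than two characters.
    return [t for t in terms if len(t) > 2]


# One pipeline per data type: training data gets an extra filtering step.
pipelines: Dict[DataType, List[PipelineItem]] = {
    DataType.Train: [split_terms, lowercase_terms, drop_short_terms],
    DataType.Test: [split_terms, lowercase_terms],
}

text = "Moscow and Washington met in Geneva"
result = {dt: run_pipeline(items, text) for dt, items in pipelines.items()}
```

Keeping the mapping from data type to pipeline in a plain dictionary means each data type can be processed differently without the items themselves knowing which data type they serve.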

Implemented enhancements:

  • RuSentiFrames stat -- move script from source to the related UnitTest dir #391
  • Vocabulary for Embedding -- save it in .txt format. #388
  • BratSentence -- entities should be initialized via parameter #383
  • ModelIO -- move vocab and embedding related API to EmbeddingIO #382
  • BERT -- formatter differs only in TextB. #381
  • Provide JSON writer for OpenNRE library #378
  • ExperimentSerializationContext -- some parameters might be optional [Remove them] #369
  • ExperimentSerializationContext -- Annotator property is not used. #368
  • DocumentOperations -- iter_doc_ids actually wraps the ExperimentContext functionality #367
  • iter_tagget_doc_ids -- this might be treated as iter_doc_ids of another instance #366
  • ExperimentIterationHandler -- switch to the PipelineItem for NN and BERT serialization [Remove ExperimentEngine and ExperimentHandler] #365
  • FixedFolding -- intersected parts are not supported [NIVTS project backlog] #364
  • InputDataSerializationHelper -- refactoring #362
  • exp_io.balance_samples -- remove dependency on DataType.Train #360
  • NeuralNetwork -- for fine-tuning it is impossible to pick a default embedding/vocabulary. #359
  • Evaluation -- support results evaluation for TextOpinion #355
  • DefaultOpinionAnnotator -- etalon_opinion logic might be moved outside [Remove DataType dependency, backlog] #354
  • StatesCount, StateIndex and iter_states of BaseDataFolding -- this is a part of CV-based method #353
  • Evaluator refactoring #352
  • Processing module -- Multiple Languages Scaling [Eng/Rus] [Contents Relocation] #351
  • ExperimentContext -- remove Evaluator from the base class. #349
  • np_utils -- move from networks to utils contrib part #348
  • StringWithEmbeddingNetworkTermMapping -- has hard-coded algorithms for tokens and terms embedding creation. #347
  • Existed in Embedding -- log (remove print) #346
  • DataType -- provide Dev and Etalon default types [QUICK fix] #345
  • Data Serialization -- update the API to allow providing a particular pipeline processor for each DataType [Backlog] #343
  • Model io utils -- move into contrib part #342
  • Engine -- provide states iterator as a parameter instead of DataFolding #341
  • Brat -- provide stability #340
  • BaseParsedNewsServiceProvider -- support conversion from Entity to DocumentEntity #338
  • OpinionEntityType -- this should be generalized #335
  • BratTextEntitiesParser and StringPartitioning -- nested entities are not supported. [Temp fix] #334
  • RuAttitudesLabelConverter -- required only for conversion (not for parsing) #332
  • SentenceOpinion -- no need to store entity values #331
  • Utils -- provide opinion converters from brat #330
  • RuAttitudes -- move SentenceOpinion to brat #329
  • BratEntityCollectionHelper -- extract_entities considering for rows prefixed with T #328
  • SynonymsCollection -- value_to_group_id_func does not support expansion by default. #327
  • BERT and Network Serialization -- refactoring duplicated serialization implementations #322
  • exp_joined -- removed such experiment at experiment_rusentrel contrib #321
  • rusentrel_experiment -- organize a separated python project #320
  • "Uknown}" -- specific to RuSentRel entity case #319
  • BertExperimentInputSerializerIterationHandler -- Simplify API [Blog example backlog] #318
  • BaseRowsStorage -- consider rows shuffling [ARElight backlog] #316
  • EntityIds -- expected to be a part of the BaseSampleRowProvider [ARElight backlog] #312
  • iter_synonym_groups [Sources] -- refactor to common method [ARElight backlog] #310
  • term-embedding-pairs -- refactor chain of the parameter dependencies. #304
  • Move EntityFormatters outside #302
  • Sources -- RusentRel collection based on brat toolkit serialization format #287
  • BaseOpinionsRowProvider -- useless class and hence should be removed [refactoring IOUtils] #282
  • IOUtils -- replace experiment instance (and dependency) with string provider. #252
  • Annotator and algorithm are not related to the experiment. #250
  • DocumentOperations -- parsed docs related API is not related to the experiment concepts. #249
  • Remove sep_doc_id variable #131
  • Update Framework Description #74

Fixed bugs:

  • StringWithEmbeddingNetworkTermMapping -- map_token expects a particular type of embedding which returns the embedding only #395
  • NetworksTrainingPipelineItem -- pass labels count #379
  • BertDefaultStringTextTermsMapper -- non-masked entity values might contain separators between words #377
  • iter_rows_linked_by_text_opinions -- fixed bug with incorrect check. Removed doc-related check. #356
  • TextOpinion should be a part of a single sentence -- this limitation is not enforced by any exceptions or assertions #339
  • BaseParsedNewsServiceProvider -- incorrect IDs assignation #337
  • Example -- Documents become mixed [RuAttitudes Affection] #292
  • RuAttitudes -- extract_text_opinions_linkages utilizes a different approach which is not covered by the common implementation. #232

Closed issues:

  • SamplesIO -- view is always initialized from tsv #397
  • SamplesIO -- make the writer optional #396
  • NoLabel -- allow customizing it for annotators. #393
  • Source -- remove common labels #392
  • Tutorials #390
  • Embed SentiNEREL collection #389
  • RuSentRel and RuAttitudes data pipelines -- provide at utils contrib #387
  • Serialization pipelines -- move them to utils contrib [pipeline part] #386
  • Lexicons -- move to the utils contrib project #385
  • Remove Gensim dependency #384
  • Evaluation -- ability to extract errors [Backlog] #375
  • BaseSampleRowProvider -- has BERT dependencies from contrib #374
  • BaseIOUtils -- remove write_opinion_collection #373
  • BaseExperiment -- remove this class. #372
  • ExperimentTrainingContext -- this could be removed. #371
  • BaseTensorflowModel -- provide DataType parameter for fitting #370
  • ExperimentSerializationContext -- remove EntityFormatter [Backlog] #361
  • TextOpinion -- id may be a variety of types #358
  • TextOpinion -- remove owner field #357
  • Experiment pipelines to contrib.utils #326
  • Experiment handlers to contrib.utils #325
  • Experiment cv to contrib.utils #324
  • RuSentRelOpinionCollectionWriter -- provide encoding parameter [ARElight backlog] #317
  • LabelsFormatter for TextB [BERT] -- labels might be not supported [ARElight backlog] #315
  • RuSentRel experiment -- TextParser could not be customized [ARElight backlog] #314
  • InputSerializers (BERT/Networks) -- __init__ should not depend on data-related information [ARElight backlog] #313
  • StringEntitiesFormatter -- rename EntityType to OpinionEntityType [QUICK] #307
  • Annotation -- Opinion annotation should be implemented at OpinionOperations.iter_opinions_for_extraction #281
  • SampleView -- adopt multiple views provider [Refactoring] #269

v0.22.0-rc-p1 (2022-04-02)

Full Changelog

Implemented enhancements:

  • Remove non utilized flags in IterationHandlers [ARElight backlog] #309

Fixed bugs:

  • BertExperimentInputSerializerIterationHandler -- missed value_to_group_id_func parameter #311

v0.22.0-rc-p0 (2022-03-29)

Full Changelog

Fixed bugs:

  • Remove , presence assertion from Opinion __init__ class method #306
  • ModuleNotFoundError: No module named 'arekit.common.data.input.providers.instances' #301

Closed issues:

  • What's New -- Release 0.22.0 #227