arekit-0.22.1
Release Notes 🎉
WHAT'S NEW:
- 📓 Provide a BRAT-based reader (refactoring) of documents and the entities mentioned in them! 🥳
- 🔧 Provide verbose treatment of values for SynonymsCollection (#327)
- 🔧 Fixed embedding issues for the Entity type in neural networks (#308)
- 🔧 Refactoring of the RuSentRel reader, which is now built on top of BRAT. (#287)
- 🔧 Attitude annotation is performed on the fly within a pipeline! (#281)
- 🔧 Opinion annotation does not depend on the experiment (#250)
- 🔧 #347
- 🆕 Added the utils contrib part, to which the following were moved 🥳
  - evaluation (2-3 scale)
  - cv-splittings (#324)
  - entity formatters
  - synonyms collections templates: stemmer-based
  - experiment handlers (#325)
  - np_utils -- utils to interact with np-serialized data (#348)
  - pipelines ➿ for opinion extraction, data serialization and text processing: it is now possible to declare a custom pipeline and adapt serialization for a variety of RE tasks (#322), (#326), (#351)
- 🆕 API for conversion of external text_opinions into parsed_news (#338)
- 🆕 API for a variety of data preparation pipelines, depending on DataType (#343)
- 🆕 DataType now includes Dev and Etalon by default (#345)
- 🆕 Evaluation refactoring, and support for TextOpinion-level results evaluation (#355)
- 🗑️ experiment_rusentrel contrib part removed (#321)
- 🗑️ OpinionRowsProvider removed [ARElight backlog] (#282)
- Fixed: #356
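The BRAT-based reader above works with BRAT standoff annotation (.ann) files, where entity rows start with "T" and hold an ID, the entity type with character offsets, and the covered text, separated by tabs. The following is a minimal sketch of reading such rows, not AREkit's actual reader; BratEntity and parse_brat_entities are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BratEntity:
    entity_id: str                # e.g. "T1"
    entity_type: str              # e.g. "ORG"
    spans: List[Tuple[int, int]]  # character offsets [start, end); several for discontinuous entities
    value: str                    # covered text as stored in the .ann file


def parse_brat_entities(ann_text: str) -> List[BratEntity]:
    """Collect text-bound annotations (rows prefixed with "T") from .ann content."""
    entities = []
    for line in ann_text.splitlines():
        # Relations ("R"), events ("E"), attributes ("A"), notes ("#") etc. are skipped.
        if not line.startswith("T"):
            continue
        entity_id, type_and_offsets, value = line.split("\t", 2)
        entity_type, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous entities list several "start end" pairs separated by ";".
        spans = [(int(s), int(e)) for s, e in (part.split() for part in offsets.split(";"))]
        entities.append(BratEntity(entity_id, entity_type, spans, value))
    return entities


ann = "T1\tORG 0 4\tSony\nT2\tPERSON 10 15;20 25\tJohn Smith\nR1\tWORKS_FOR Arg1:T2 Arg2:T1"
for entity in parse_brat_entities(ann):
    print(entity)
```

Filtering on the "T" prefix keeps relation rows out of the entity collection, which is the same idea as the extract_entities change in #328.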
Implemented enhancements:
- RuSentiFrames stat -- move script from source to the related UnitTest dir #391
- Vocabulary for Embedding -- save it in .txt format. #388
- BratSentence -- entities should be initialized via parameter #383
- ModelIO -- move vocab and embedding related API to EmbeddingIO #382
- BERT -- formatter differs only in TextB. #381
- Provide JSON writer for OpenNRE library #378
- ExperimentSerializationContext -- some parameters might be optional [Remove them] #369
- ExperimentSerializationContext -- Annotator property is not used. #368
- DocumentOperations -- iter_doc_ids actually wraps the ExperimentContext functionality #367
- iter_tagget_doc_ids -- this might be treated as iter_doc_ids of another instance #366
- ExperimentIterationHandler -- switch to the PipelineItem for NN and BERT serialization [Remove ExperimentEngine and ExperimentHandler] #365
- FixedFolding -- intersected parts are not supported [NIVTS project backlog] #364
- InputDataSerializationHelper -- refactoring #362
- exp_io.balance_samples -- remove dependency on DataType.Train #360
- NeuralNetwork -- for fine-tuning it is impossible to pick a default embedding/vocabulary. #359
- Evaluation -- support results evaluation for TextOpinion #355
- DefaultOpinionAnnotator -- etalon_opinion logic might be moved outside [Remove DataType dependency, backlog] #354
- StatesCount, StateIndex and iter_states of BaseDataFolding -- this is a part of CV-based method #353
- Evaluator refactoring #352
- Processing module -- Multiple Languages Scaling [Eng/Rus] [Contents Relocation] #351
- ExperimentContext -- remove Evaluator from the base class. #349
- np_utils -- move from networks to utils contrib part #348
- StringWithEmbeddingNetworkTermMapping -- has hard-coded algorithms for tokens and terms embedding creation. #347
- Existed in Embedding -- log (remove print) #346
- DataType -- provide Dev and Etalon default types [QUICK fix] #345
- Data Serialization -- update API to allow providing a particular pipeline processor for each DataType [Backlog] #343
- Model io utils -- move into contrib part #342
- Engine -- provide states iterator as a parameter instead of DataFolding #341
- Brat -- provide stability #340
- BaseParsedNewsServiceProvider -- support conversion from Entity to DocumentEntity #338
- OpinionEntityType -- this should be generalized #335
- BratTextEntitiesParser and StringPartitioning -- nested entities are not supported. [Temp fix] #334
- RuAttitudesLabelConverter -- required only for conversion (not for parsing) #332
- SentenceOpinion -- no need to store entity values #331
- Utils -- provide opinion converters from brat #330
- RuAttitudes -- move SentenceOpinion to brat #329
- BratEntityCollectionHelper -- extract_entities should consider rows prefixed with T #328
- SynonymsCollection -- value_to_group_id_func does not support expansion by default. #327
- BERT and Network Serialization -- refactoring duplicated serialization implementations #322
- exp_joined -- removed this experiment from experiment_rusentrel contrib #321
- rusentrel_experiment -- organize a separate Python project #320
- "Uknown}" -- specific to RuSentRel entity case #319
- BertExperimentInputSerializerIterationHandler -- simplify API [Blog example backlog] #318
- BaseRowsStorage -- consider rows shuffling [ARElight backlog] #316
- EntityIds -- expected to be a part of the BaseSampleRowProvider [ARElight backlog] #312
- iter_synonym_groups [Sources] -- refactor to common method [ARElight backlog] #310
- term-embedding-pairs -- refactor chain of the parameter dependencies. #304
- Move EntityFormatters outside #302
- Sources -- RuSentRel collection based on the brat toolkit serialization format #287
- BaseOpinionsRowProvider -- useless class and hence should be removed [refactoring IOUtils] #282
- IOUtils -- replace experiment instance (and dependency) with string provider. #252
- Annotator and algorithm are not related to the experiment. #250
- DocumentOperations -- parsed docs related API is not related to the experiment concepts. #249
- Remove sep_doc_id variable #131
- Update Framework Description #74
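Issue #378 asks for a JSON writer for the OpenNRE library. A minimal sketch, assuming the JSON-lines layout of OpenNRE's bundled benchmark files (one object per line with token, h, t and relation keys, where pos is a [start, end) token span); to_opennre_line, write_opennre_jsonl and the "pos" relation label are hypothetical and not part of AREkit.

```python
import json
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # [start, end) token indices


def to_opennre_line(tokens: List[str], head: Span, tail: Span, relation: str) -> str:
    """Serialize one relation sample as a single OpenNRE-style JSON line (assumed layout)."""
    record = {
        "token": tokens,
        "h": {"name": " ".join(tokens[head[0]:head[1]]), "pos": [head[0], head[1]]},
        "t": {"name": " ".join(tokens[tail[0]:tail[1]]), "pos": [tail[0], tail[1]]},
        "relation": relation,
    }
    # ensure_ascii=False keeps Cyrillic text (e.g. from RuSentRel) readable in the output.
    return json.dumps(record, ensure_ascii=False)


def write_opennre_jsonl(samples: Iterable[Tuple[List[str], Span, Span, str]], path: str) -> None:
    """Write samples to a UTF-8 file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens, head, tail, relation in samples:
            f.write(to_opennre_line(tokens, head, tail, relation) + "\n")


print(to_opennre_line(["Россия", "поддерживает", "Сирию"], (0, 1), (2, 3), "pos"))
```

Keeping serialization of a single sample separate from file writing lets the same function feed either a file or an in-memory pipeline.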
Fixed bugs:
- StringWithEmbeddingNetworkTermMapping -- map_token expects a particular type of embedding, one that returns the embedding only #395
- NetworksTrainingPipelineItem -- pass labels count #379
- BertDefaultStringTextTermsMapper -- non masked entity values might be with
- iter_rows_linked_by_text_opinions -- fixed a bug with an incorrect check. Removed doc-related check. #356
- TextOpinion should be a part of a single sentence -- this limitation is not enforced in any way by exceptions or assertions #339
- BaseParsedNewsServiceProvider -- incorrect IDs assignation #337
- Example -- Documents become mixed [RuAttitudes Affection] #292
- RuAttitudes -- extract_text_opinions_linkages utilizes a different approach, which is not covered by the common implementation. #232
Closed issues:
- SamplesIO -- view always initialized from tsv #397
- SamplesIO -- make writer optional #396
- NoLabel -- allow customizing it for annotators. #393
- Source -- remove common labels #392
- Tutorials #390
- Embed SentiNEREL collection #389
- RuSentRel and RuAttitudes data pipelines -- provide at utils contrib #387
- Serialization pipelines -- move them to utils contrib [pipeline part] #386
- Lexicons -- move to the utils contrib project #385
- Remove Gensim dependency #384
- Evaluation -- ability to extract errors [Backlog] #375
- BaseSampleRowProvider -- has BERT dependencies from contrib #374
- BaseIOUtils -- remove write_opinion_collection #373
- BaseExperiment -- remove this class. #372
- ExperimentTrainingContext -- this could be removed. #371
- BaseTensorflowModel -- provide DataType parameter for fitting #370
- ExperimentSerializationContext -- remove EntityFormatter [Backlog] #361
- TextOpinion -- id may be a variety of types #358
- TextOpinion -- remove owner field #357
- Experiment pipelines to contrib.utils #326
- Experiment handlers to contrib.utils #325
- Experiment cv to contrib.utils #324
- RuSentRelOpinionCollectionWriter -- provide encoding parameter [ARElight backlog] #317
- LabelsFormatter for TextB [BERT] -- labels might be not supported [ARElight backlog] #315
- RuSentRel experiment -- TextParser could not be customized [ARElight backlog] #314
- InputSerializers (BERT/Networks) -- __init__ should not depend on data-related information [ARElight backlog] #313
- StringEntitiesFormatter -- rename EntityType to OpinionEntityType [QUICK] #307
- Annotation -- Opinion annotation should be implemented at OpinionOperations.iter_opinions_for_extraction #281
- SampleView -- adopt multiple views provider [Refactoring] #269
v0.22.0-rc-p1 (2022-04-02)
Implemented enhancements:
- Remove non utilized flags in IterationHandlers [ARElight backlog] #309
Fixed bugs:
- BertExperimentInputSerializerIterationHandler -- missed value_to_group_id_func parameter #311
v0.22.0-rc-p0 (2022-03-29)
Fixed bugs:
- Remove the "," presence assertion from the Opinion __init__ method #306
- ModuleNotFoundError: No module named 'arekit.common.data.input.providers.instances' #301
Closed issues:
- What's New -- Release 0.22.0 #227