We explored two types of summarization, abstractive and extractive, in order to compare the pros and cons of both approaches.
As source material for the non-ML extractive approaches we used articles parsed from the Russian Foreign Economic Bulletin. For sentence segmentation we used the sentence tokenizer developed by the DeepPavlov lab at MIPT.
We started our research with non-ML approaches, using TextRank and LexRank for summarization.
We relied on the methods described in the following papers:
- LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
- TextRank: Bringing Order into Texts
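The shared idea behind both methods can be illustrated in plain Python: sentences become graph nodes, sufficiently similar sentences are connected, PageRank-style power iteration scores the nodes, and the top-k sentences are returned in document order. This is a minimal sketch under simplifying assumptions, not our pipeline: it uses raw term-frequency cosine similarity (LexRank proper uses an idf-modified cosine, and TextRank weights edges by word overlap), an unweighted graph with an arbitrary threshold of 0.1, and a naive regex tokenizer in place of the DeepPavlov tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Naive lowercase word tokenizer (assumption); \w also matches Cyrillic letters.
    return re.findall(r"\w+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences: list[str], damping: float = 0.85,
                   threshold: float = 0.1, iters: int = 50) -> list[float]:
    """Score sentences by PageRank over a thresholded similarity graph."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Unweighted edge between sentences i and j if their similarity exceeds the threshold.
    adj = [[i != j and cosine(vecs[i], vecs[j]) > threshold for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence shares its score equally among its neighbours;
        # isolated sentences only receive the (1 - damping) / n teleport term.
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] / degree[j] for j in range(n) if adj[j][i])
                  for i in range(n)]
    return scores


def summarize(sentences: list[str], k: int = 2) -> list[str]:
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no vocabulary with the rest of the document stays isolated in the graph and therefore ends up with the lowest score, which is the intended "centrality as salience" behaviour.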
For the ML-based extractive approach we used BERTSUM:
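BERTSUM adapts BERT to sentence-level scoring by inserting a [CLS] token before every sentence (and [SEP] after it), alternating interval segment ids between sentences, and using the encoder output at each [CLS] position as that sentence's representation. The sketch below shows only this input construction plus a simple sigmoid classifier over the [CLS] vectors; the BERT encoder itself is omitted, and the `cls_vectors`, `weights` and `bias` values are hypothetical stand-ins for encoder outputs and learned parameters.

```python
import math


def build_bertsum_input(sentences: list[list[str]]) -> tuple[list[str], list[int], list[int]]:
    """Flatten a document of pre-tokenized sentences into BERTSUM-style input."""
    tokens: list[str] = []
    segment_ids: list[int] = []
    cls_positions: list[int] = []
    for idx, sent in enumerate(sentences):
        # The encoder output at this [CLS] position will represent sentence idx.
        cls_positions.append(len(tokens))
        piece = ["[CLS]", *sent, "[SEP]"]
        tokens.extend(piece)
        # Interval segment embeddings: alternate A/B (0/1) by sentence parity.
        segment_ids.extend([idx % 2] * len(piece))
    return tokens, segment_ids, cls_positions


def select_sentences(cls_vectors: list[list[float]], weights: list[float],
                     bias: float, k: int) -> list[int]:
    """Score each sentence with sigmoid(w . h + b) and keep the top-k indices."""
    scores = [1 / (1 + math.exp(-(sum(w * h for w, h in zip(weights, vec)) + bias)))
              for vec in cls_vectors]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # indices in document order
```

For example, `build_bertsum_input([["a", "b"], ["c"]])` yields segment ids `[0, 0, 0, 0, 1, 1, 1]` and [CLS] positions `[0, 4]`. The original BERTSUM work also places Transformer or LSTM layers on top of the [CLS] vectors instead of a plain linear classifier.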
For the abstractive approach, in the first R&D iteration we used a pointer-generator network (PGN) architecture built on the AllenNLP framework. We relied on the approach described in the following paper:
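The key component of a pointer-generator network is its output distribution, which mixes generating a word from the fixed vocabulary with copying a word from the source via attention: $P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i: w_i = w} a_i$. The plain-Python sketch below computes only this mixture for a single decoding step; it does not use the AllenNLP API, and all input values are hypothetical. Source words missing from the vocabulary (such as names) can still receive probability through the copy term.

```python
def final_distribution(p_vocab: dict[str, float], attention: list[float],
                       source_tokens: list[str], p_gen: float) -> dict[str, float]:
    """Mix generation and copying for one decoder step, pointer-generator style."""
    # Generation part: vocabulary distribution scaled by p_gen.
    dist = {w: p_gen * p for w, p in p_vocab.items()}
    # Copy part: add attention mass for every source position holding word w,
    # extending the vocabulary with out-of-vocabulary source words.
    for a, w in zip(attention, source_tokens):
        dist[w] = dist.get(w, 0.0) + (1 - p_gen) * a
    return dist


# Hypothetical single step: "Q3" is not in p_vocab but can still be copied.
dist = final_distribution({"the": 0.5, "export": 0.3, "grew": 0.2},
                          [0.1, 0.6, 0.3], ["exports", "Q3", "grew"], p_gen=0.5)
```

Because both components are probability distributions, the result still sums to 1 for any `p_gen` in [0, 1].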
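We are currently working on improving our results in both directions:
- Extractive approach: language model (LM) fine-tuning on the specific domain and domain adaptation
- Abstractive approach: since ROUGE is computed on discrete token sequences, it is not differentiable and cannot be optimized directly by gradient descent, so we are studying a combined reinforcement learning and maximum-likelihood (RL+ML) objective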
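One common way to combine the two objectives is the mixed loss of Paulus et al. (2017): a self-critical policy-gradient term in which ROUGE of a sampled summary, minus ROUGE of the greedy summary, acts as a scalar weight on the sampled sequence's log-probabilities, blended with the usual negative log-likelihood of the reference. The sketch below assumes that "ML" means maximum likelihood, uses a simplified unigram ROUGE-1 F1 (no stemming), and works on plain floats; in training, the log-probabilities would be autograd tensors, so gradients flow through them while ROUGE stays a constant. `gamma` is a hypothetical mixing weight.

```python
from collections import Counter


def rouge1_f1(candidate: list[str], reference: list[str]) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def mixed_loss(logp_reference: list[float], logp_sampled: list[float],
               sampled: list[str], greedy: list[str], reference: list[str],
               gamma: float) -> float:
    # ML term: negative log-likelihood of the reference summary (teacher forcing).
    loss_ml = -sum(logp_reference)
    # RL term: reward of the sampled summary minus the greedy baseline's reward.
    # ROUGE only scales the log-probabilities, so it never needs a gradient.
    advantage = rouge1_f1(sampled, reference) - rouge1_f1(greedy, reference)
    loss_rl = -advantage * sum(logp_sampled)
    return gamma * loss_rl + (1 - gamma) * loss_ml
```

When the sampled summary scores higher than the greedy one, minimizing `loss_rl` raises the probability of the sampled tokens; when it scores lower, their probability is pushed down.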