
Text summarization for Chinese dialogues with seq2seq and attention


haosida123/textsum


Textsum for Chinese dialogues

An abstractive Chinese text summarization model.

Uses seq2seq with Bahdanau attention, implemented in TensorFlow 2.0.

Implementations:

  • fasttext embedding
  • attention coverage loss
  • sequence-length-based loss: loss_ *= 1.0 / exp(sequence_length / 100.0)
  • beam search
  • bucket batching for longer/varying sequences: slow training caused by retracing can be solved by adding an input_signature. When the maximum length is very long, OOM happens often; this can be avoided by decreasing batch_size, but that actually slows down training. Sequence trimming does not resolve it either, for unknown reasons.
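The attention coverage loss above can be illustrated with a minimal pure-Python/numpy sketch of the usual formulation (penalize re-attending to source positions already covered, summing min(attention, coverage) per decoder step). The function name and array shapes here are illustrative assumptions, not the repo's actual TF implementation:

```python
import numpy as np

def coverage_loss(attentions):
    """Coverage loss sketch.
    attentions: array of shape [T_dec, T_src], one attention
    distribution over source positions per decoder step."""
    coverage = np.zeros(attentions.shape[1])  # running sum of past attention
    total = 0.0
    for a in attentions:
        # penalize overlap between current attention and accumulated coverage
        total += np.minimum(a, coverage).sum()
        coverage += a
    return total
```

A decoder that attends uniformly to the same positions at every step accrues loss quickly, while one that moves its attention across the source accrues little.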
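The sequence-length-based loss above scales each sequence's loss by 1/exp(length/100), down-weighting long targets. A minimal sketch under assumed inputs (a per-token loss list for a single sequence; the real model presumably works on batched tensors):

```python
import math

def length_scaled_loss(token_losses, sequence_length):
    """Sum per-token losses, then scale by 1/exp(len/100)
    as in: loss_ *= 1.0 / exp(sequence_length / 100.0)."""
    loss = sum(token_losses[:sequence_length])
    return loss * (1.0 / math.exp(sequence_length / 100.0))
```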
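For beam search, the following is a generic pure-Python sketch, not the repo's decoder: `step_fn` is a hypothetical stand-in that maps a token prefix to log-probabilities for the next token.

```python
def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Keep the beam_width best prefixes by cumulative log-prob;
    move finished hypotheses to `completed`. step_fn(prefix) -> {token: logp}."""
    beams = [([start_token], 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == end_token:
                completed.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[1])
```

In the real model the beam entries would also carry decoder/attention state, omitted here for brevity.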

TODO:

  • scheduled sampling
  • embedding finetuning
  • tf-idf based loss
  • implementation in PaddlePaddle
  • Pgen
