
Text summarization for Chinese dialogues with seq2seq and attention


haosida123/textsum


Textsum for Chinese dialogues

An abstractive Chinese text summarization model.

Uses seq2seq with Bahdanau attention, implemented in TensorFlow 2.0.

Implementations:

  • fasttext embedding
  • attention coverage loss
  • sequence-length-based loss: loss_ *= 1.0 / exp(sequence_length / 100.0)
  • beam search
  • bucket batching for longer/varying sequences: slow training caused by retracing can be solved by adding an input_signature. When the maximum length is very long, OOM happens often; this can be avoided by decreasing batch_size, but that actually slows down training. Sequence trimming does not resolve it either, for unknown reasons.
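The attention coverage loss above can be illustrated with a minimal pure-Python/numpy sketch of the usual formulation (penalize re-attending to source positions already covered, summing min(attention, coverage) per decoder step). The function name and array shapes here are illustrative assumptions, not the repo's actual TF implementation:

```python
import numpy as np

def coverage_loss(attentions):
    """Coverage loss sketch.
    attentions: array of shape [T_dec, T_src], one attention
    distribution over source positions per decoder step."""
    coverage = np.zeros(attentions.shape[1])  # running sum of past attention
    total = 0.0
    for a in attentions:
        # penalize overlap between current attention and accumulated coverage
        total += np.minimum(a, coverage).sum()
        coverage += a
    return total
```

A decoder that attends uniformly to the same positions at every step accrues loss quickly, while one that moves its attention across the source accrues little.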
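The sequence-length-based loss above scales each sequence's loss by 1/exp(length/100), down-weighting long targets. A minimal sketch under assumed inputs (a per-token loss list for a single sequence; the real model presumably works on batched tensors):

```python
import math

def length_scaled_loss(token_losses, sequence_length):
    """Sum per-token losses, then scale by 1/exp(len/100)
    as in: loss_ *= 1.0 / exp(sequence_length / 100.0)."""
    loss = sum(token_losses[:sequence_length])
    return loss * (1.0 / math.exp(sequence_length / 100.0))
```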
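For beam search, the following is a generic pure-Python sketch, not the repo's decoder: `step_fn` is a hypothetical stand-in that maps a token prefix to log-probabilities for the next token.

```python
def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Keep the beam_width best prefixes by cumulative log-prob;
    move finished hypotheses to `completed`. step_fn(prefix) -> {token: logp}."""
    beams = [([start_token], 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == end_token:
                completed.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[1])
```

In the real model the beam entries would also carry decoder/attention state, omitted here for brevity.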

TODO:

  • scheduled sampling
  • embedding finetuning
  • tf-idf based loss
  • implementation in PaddlePaddle
  • Pgen
