Sentence Rewriting with Language Model

An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, text simplification, paraphrase generation, style transfer, and grammatical error correction. The following figure shows an overview of the architecture. The model receives an input in which the original sentence and the rewritten (e.g. simplified) sentence are joined by the special token <SEP>, which acts as a delimiter. The model then generates the target sentence. The architecture is very simple, but it has shown strong results on text summarization and text simplification tasks.
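As a rough illustration of the input format described above, the sketch below joins a whitespace-tokenized source and target with the <SEP> token and splits them apart again. The helper names `join_pair` and `split_pair` are hypothetical and are not taken from this repository; how datasets.py actually builds its inputs may differ.

```python
SEP = "<SEP>"  # delimiter token named in this README


def join_pair(source: str, target: str, sep: str = SEP) -> str:
    """Join a whitespace-tokenized source and target into one sequence."""
    return " ".join(source.split() + [sep] + target.split())


def split_pair(sequence: str, sep: str = SEP):
    """Recover (source, target) from a joined sequence (hypothetical helper)."""
    tokens = sequence.split()
    idx = tokens.index(sep)  # position of the first delimiter
    return " ".join(tokens[:idx]), " ".join(tokens[idx + 1:])


joined = join_pair("the cat sat on the mat .", "a cat sat .")
print(joined)
```

During training the model would see the whole joined sequence; at generation time it would presumably be given the source followed by <SEP> and asked to continue.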

Installation

This code depends on the following:

python==3.6.5
pytorch==1.1.0
torchtext==0.3.1

git clone https://github.com/t080/pytorch-translm.git
cd ./pytorch-translm
pip install -r requirements.txt

Usage

Pre-training

The dataset for pre-training must be a plain text file. Each input sentence must be segmented into words by whitespace. To use a GPU, set the --gpu option.

python train.py pretrain \
    --train ./path/to/train.txt \
    --savedir ./checkpoints/pre-trained \
    --gpu
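One possible way to produce a whitespace-segmented text file is sketched below. It assumes one sentence per line, which this README does not state, and uses a naive regex tokenizer that only separates punctuation from words. The function names are hypothetical, and a proper tokenizer may be preferable for real data.

```python
import re

# Naive tokenizer: runs of word characters, or single punctuation marks.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> str:
    """Return the text with tokens separated by single spaces."""
    return " ".join(_TOKEN_RE.findall(text))


def prepare_corpus(sentences):
    """Tokenize each sentence and drop empty lines."""
    lines = (tokenize(s) for s in sentences)
    return [line for line in lines if line]


def write_corpus(sentences, path):
    """Write one tokenized sentence per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in prepare_corpus(sentences):
            f.write(line + "\n")
```

For example, `write_corpus(raw_sentences, "./path/to/train.txt")` would create a file that could then be passed to --train.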

Fine-tuning

The dataset for fine-tuning must be in TSV format. The source and target sentences must be segmented into words by whitespace. To use a GPU, set the --gpu option.

python train.py finetune \
    --model ./checkpoints/pre-trained/checkpoint_best.pt \
    --train ./path/to/train.tsv \
    --valid ./path/valid.tsv \
    --savedir ./checkpoints/fine-tuned \
    --gpu
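The sketch below shows one way to write sentence pairs as TSV. It assumes two columns with the source first and the target second; the column order is a guess, since this README does not specify it. The helper names are hypothetical.

```python
def format_tsv_row(source: str, target: str) -> str:
    """Format one (source, target) pair as a tab-separated line."""
    for field in (source, target):
        # A tab or newline inside a field would corrupt the TSV layout.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    # Collapse repeated spaces so tokens are separated by single spaces.
    return " ".join(source.split()) + "\t" + " ".join(target.split())


def write_tsv(pairs, path):
    """Write an iterable of (source, target) pairs, one pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            f.write(format_tsv_row(source, target) + "\n")
```

Calling `write_tsv(pairs, "./path/to/train.tsv")` would produce a file suitable for --train under the column-order assumption above.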

Translation

In the translation step, you must set the --model and --input options. You can set the maximum length of the model's output with the --maxlen option (default: 100 tokens).

python generate.py \
    --model ./checkpoints/fine-tuned/checkpoint_best.pt \
    --input ./path/to/test.txt \
    --gpu
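To illustrate what a maximum output length means in practice, here is a toy decoding loop that stops at an end-of-sequence token or after `maxlen` tokens, whichever comes first. It is only a sketch: the `<EOS>` token, the `next_token` callback, and the greedy token-by-token strategy are assumptions, and generate.py may decode differently.

```python
from typing import Callable, List

EOS = "<EOS>"  # hypothetical end-of-sequence token


def generate(next_token: Callable[[List[str]], str],
             prefix: List[str],
             maxlen: int = 100) -> List[str]:
    """Generate tokens until EOS is produced or maxlen tokens are emitted."""
    output: List[str] = []
    while len(output) < maxlen:
        token = next_token(prefix + output)  # stand-in for a model call
        if token == EOS:
            break
        output.append(token)
    return output


def never_stops(context: List[str]) -> str:
    """Stub model that never emits EOS, so only maxlen ends generation."""
    return "a"


def stops_after_two(context: List[str]) -> str:
    """Stub model that emits EOS once two tokens follow the 2-token prefix."""
    return EOS if len(context) >= 4 else "b"
```

With the stubs above, `generate(never_stops, ["x", "<SEP>"], maxlen=5)` is cut off by the length cap, while `generate(stops_after_two, ["x", "<SEP>"])` ends early at the end-of-sequence token.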
