text2text: Implementations of Sequence to Sequence model variants

Authors:

Sascha Rothe (rothe@google.com),
Mostafa Dehghani (github:mostafadehghani)

Introduction

The code contains different implementations of sequence to sequence models:

Original Sequence to Sequence model with attention mechanism: Neural Machine Translation by Jointly Learning to Align and Translate
Bag of Words to Sequence model: Inspired by Order Matters: Sequence to sequence for sets
Incorporating copy mechanism with Sequence to Sequence model: Inspired by Incorporating Copying Mechanism in Sequence-to-Sequence Learning

DataSet

To prepare the dataset, see ExampleGen in data.py for the data format. data/data contains a toy example, and data/vocab shows the vocabulary format. data/data_convert_example.py contains an example of converting between the binary and text formats.
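As a rough illustration of what converting between binary and text can involve, here is a minimal sketch of length-prefixed record framing, a common layout for binary example files. This is an assumption made for illustration only: the actual format is whatever ExampleGen in data.py and data/data_convert_example.py define, and the helper names below (write_records, read_records, text_to_binary, binary_to_text) are hypothetical and not part of this repository.

```python
import io
import struct


def write_records(stream, records):
    """Write each bytes record preceded by its length as an 8-byte integer."""
    for record in records:
        stream.write(struct.pack("<q", len(record)))
        stream.write(record)


def read_records(stream):
    """Yield bytes records until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        if len(header) < 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<q", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield payload


def text_to_binary(lines):
    """Hypothetical helper: frame each text line as one binary record."""
    buffer = io.BytesIO()
    write_records(buffer, (line.encode("utf-8") for line in lines))
    return buffer.getvalue()


def binary_to_text(blob):
    """Hypothetical helper: recover the text lines from framed records."""
    return [record.decode("utf-8") for record in read_records(io.BytesIO(blob))]
```

A round trip through text_to_binary and binary_to_text returns the original lines, which is a quick way to sanity-check a converter before preparing a full dataset.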

How To Run

Prerequisite:

Install TensorFlow and Bazel.

# cd to your workspace
# 1. Clone the text2text code to your workspace 'text2text' directory.
# 2. Create an empty 'WORKSPACE' file in your workspace.
# 3. Prepare the config file for the model you wish to run and put it in the
#    config directory.

ls -R
.:
text2text  WORKSPACE

./text2text:
batch_reader  beam_search.py  BUILD  config  data  data.py  decode.py  
__init__.py  library.py  main.py  metrics.py  model  README.md

./text2text/batch_reader:
copynet_batcher.py  __init__.py  vocab_batcher.py

./text2text/config:
cfg_copynet.py  cfg_seq2seq.py  cfg_bow2seq.py  __init__.py

./text2text/model:
copynet.py  __init__.py  seq2seq.py  bow2seq.py

./text2text/data:
data  data_convert_example.py  text_data  vocab


bazel build -c opt --copt=-mavx --config=cuda text2text:main

# Run the training.
bazel-bin/text2text/main \
    --mode=train \
    --config="cfg_seq2seq" \
    --log_root="text2text/log_root" \
    --override="eval_interval_secs=0" \
    --logtostderr

# Run the eval. Try to avoid running on the same machine as training.
bazel-bin/text2text/main \
    --mode=eval \
    --config="cfg_seq2seq" \
    --log_root="text2text/log_root" \
    --logtostderr

# Run the decode. Run it when the model is mostly converged.
bazel-bin/text2text/main \
    --mode=decode \
    --config="cfg_seq2seq" \
    --log_root="text2text/log_root" \
    --logtostderr

The --config="config_file_name" flag selects a config file from the config directory. That file specifies the model you wish to run, the paths to the data, and the hyperparameters of the model. The config directory contains a sample config file for each model. The output of the code and the summaries will be written to a text2text/config_file_name directory.
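The training command passes --override="eval_interval_secs=0", which suggests that individual config values can be replaced from the command line. The sketch below shows one way such a string could be merged into a set of defaults. It is an assumption for illustration, not the code in main.py: the comma-separated syntax, the DEFAULTS values, and the functions parse_override and apply_override are all hypothetical. Only the key eval_interval_secs comes from the commands in this README.

```python
import ast

# Hypothetical defaults; the real values live in the files under config/.
DEFAULTS = {"eval_interval_secs": 60, "batch_size": 32, "optimizer": "adam"}


def parse_override(override):
    """Parse 'key=value,key=value' into a dict.

    Values that look like Python literals (ints, floats, booleans) are
    converted; anything else is kept as a plain string.
    """
    result = {}
    for item in override.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, raw = item.partition("=")
        key, raw = key.strip(), raw.strip()
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % item)
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a literal, keep the raw string
        result[key] = value
    return result


def apply_override(defaults, override):
    """Return a copy of defaults updated with the parsed override string."""
    updates = parse_override(override)
    unknown = sorted(set(updates) - set(defaults))
    if unknown:
        raise KeyError("unknown config keys: %s" % ", ".join(unknown))
    merged = dict(defaults)
    merged.update(updates)
    return merged
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled hyperparameter name fails immediately instead of being silently ignored during a long training run.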
