<a href="https://colab.research.google.com/github/mihapetri/Chen-s-heartbeat-detection/blob/main/Machine_Translator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Machine translation with OpenNMT.**

**1. Setting up the environment**

In [None]:
# install OpenNMT
!pip3 install git+https://github.com/OpenNMT/OpenNMT-py.git

In [None]:
# Restart the runtime after this cell finishes so that the pinned torch version is the one that gets loaded
!pip3 install --ignore-installed torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

In [None]:
# Install gdown for downloading files from Google Drive
!pip install gdown

**2. Downloading data**

Download all the files needed for training and testing.

In [None]:
# download all the files

# TC3 train, validation and test files
!gdown https://drive.google.com/uc?id=1-t1CPbmi7GIVQxqKdYIxDPhmilBmOPaj
!unzip NLP_general.zip


# Military corpora for the domain translator
!gdown https://drive.google.com/uc?id=1BnbGF_hAWsHMbeQ5Xuz9yXaW-qdHbgii
!unzip NLP_domain.zip


# Sentences for testing
#!gdown https://drive.google.com/uc?id=1NoRuGY9GJcvUXJxz0Cw4VtSzyU4MljiH
#!unzip testset.zip
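
Before moving on, it can help to confirm that the archives actually produced the files the later cells rely on. The cell below is a minimal sketch: the file names are the ones referenced in the configuration further down, the helper `missing_files` is a hypothetical name introduced here, and it assumes the archives extract directly into the working directory rather than into a subfolder.

```python
from pathlib import Path

# File names taken from the configuration used later in this notebook.
EXPECTED_FILES = [
    "tc3_train_en.txt", "tc3_train_sl.txt",
    "tc3_val_en.txt", "tc3_val_sl.txt",
    "mil_train_en.txt", "mil_train_sl.txt",
    "mil_val_en.txt", "mil_val_sl.txt",
]


def missing_files(names, directory="."):
    """Return the names that do not exist as regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


# Assumption: the archives were extracted into the current working directory.
missing = missing_files(EXPECTED_FILES)
print("Missing files:", missing if missing else "none")
```

If the list is not empty, check whether `unzip` placed the files in a subfolder and adjust the paths in the configuration accordingly.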

If you do not want to train a model yourself, you can use one of the pretrained models instead: uncomment the downloads you need in the next cell, then skip to **4. Translating sentences**.

In [None]:
# Download the source and target vocabulary

#!gdown https://drive.google.com/uc?id=1s40dbwpNfu7QO7HkXcP61pyPnm1ZCjeY
#!gdown https://drive.google.com/uc?id=1YdKH3AXPl0MKOBaOxcV4cg4SpJ7EM01v


# Pretrained general model
#!gdown https://drive.google.com/uc?id=1lNLgBY7LHveaUdd8_v0hKK07I58V9sJZ
# General model trained with the Adam optimizer
#!gdown https://drive.google.com/uc?id=1tzeFIYh4cCk4GppO4ybrzwOPX7pOoV5o


# Pretrained domain model
#!gdown https://drive.google.com/uc?id=1-0-ar-kjLg2ArXYV8BCHd2uW9s1VUBHb

**3. Building the vocabulary and training the model**

In [None]:
# Create a configuration file for building the vocabulary
config = '''# military_config.yaml

## Where the samples will be written
save_data: run/example

## Where the vocab(s) will be written
src_vocab: example.vocab.src
tgt_vocab: example.vocab.tgt

## Where the model will be saved
save_model: model/model

# Prevent overwriting existing files in the folder
overwrite: False

# Corpus opts:
data:
    corpus_1:
        path_src: tc3_train_en.txt
        path_tgt: tc3_train_sl.txt

    corpus_2:
        path_src: mil_train_en.txt
        path_tgt: mil_train_sl.txt

    # YAML keys must be unique, so only one `valid` entry can be active.
    # To validate on the military set instead, use mil_val_en.txt / mil_val_sl.txt here.
    valid:
        path_src: tc3_val_en.txt
        path_tgt: tc3_val_sl.txt

world_size: 1
gpu_ranks: [0]
'''

with open("military_config.yaml", "w") as config_yaml:
    config_yaml.write(config)

!cat military_config.yaml

In [None]:
# Build vocabulary

!onmt_build_vocab -config military_config.yaml -n_sample -1
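
Once the vocabulary has been built, a quick look at the first entries is a cheap sanity check. The sketch below assumes each line of the vocabulary file holds a token and its count separated by a tab; that format is an assumption about the installed OpenNMT-py version, and `read_vocab` is a hypothetical helper name, not part of the library.

```python
import os


def read_vocab(path, limit=10):
    """Read up to `limit` (token, count) pairs from a vocabulary file.

    Assumes one "token<TAB>count" entry per line; other lines are skipped.
    """
    entries = []
    with open(path, encoding="utf-8") as vocab_file:
        for line in vocab_file:
            if len(entries) >= limit:
                break
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2 or not parts[1].isdigit():
                continue  # line does not match the assumed format
            entries.append((parts[0], int(parts[1])))
    return entries


# Only runs if the vocabulary file from the config above already exists.
if os.path.exists("example.vocab.src"):
    for token, count in read_vocab("example.vocab.src"):
        print(f"{count:>8}  {token}")
```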

The available training options are described in the [OpenNMT-py training documentation](https://opennmt.net/OpenNMT-py/options/train.html).

In [None]:
# Update config for training

config = '''# military_config.yaml

## Where the samples will be written
save_data: run/example

## Where the vocab(s) will be written
src_vocab: example.vocab.src
tgt_vocab: example.vocab.tgt

## Where the model will be saved
save_model: model/model

# Prevent overwriting existing files in the folder
overwrite: False

# Corpus opts:
data:
    corpus_1:
        path_src: tc3_train_en.txt
        path_tgt: tc3_train_sl.txt
    valid:
        path_src: tc3_val_en.txt
        path_tgt: tc3_val_sl.txt


skip_empty_level: silent

train_steps: 10000
valid_steps: 2000

src_vocab_size: 150000
tgt_vocab_size: 150000

world_size: 1
gpu_ranks: [0]

## Model architecture and optimization settings

encoder_type: "rnn"   # Possible choices: rnn, brnn, ggnn, mean, transformer, cnn, transformer_lm
decoder_type: "rnn"   # Possible choices: rnn, transformer, cnn, transformer_lm
enc_layers: 2
dec_layers: 2
n_node: 4             # Graph encoder (ggnn) option; not used by the rnn encoder
n_steps: 4            # Graph encoder (ggnn) option; not used by the rnn encoder
rnn_type: "LSTM"      # Possible choices: LSTM, GRU, SRU

optim: "sgd"          # Possible choices: sgd, adagrad, adadelta, adam, sparseadam, adafactor, fusedadam
dropout: 0.3
attention_dropout: 0.1   # Used by transformer attention layers
'''

with open("military_config.yaml", "w") as config_yaml:
    config_yaml.write(config)

!cat military_config.yaml
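
To get a feel for what `train_steps` and `valid_steps` mean in practice, the sketch below lists the steps at which validation and checkpointing would be expected. The value of `save_checkpoint_steps` is an assumption (it is not set in the config above and is taken here to be the library default), and `step_schedule` is a hypothetical helper introduced only for this illustration.

```python
def step_schedule(total_steps, every):
    """Steps at which an event that fires every `every` steps occurs."""
    return list(range(every, total_steps + 1, every))


train_steps = 10000
valid_steps = 2000
# Assumption: not set in the config above; taken to be the OpenNMT-py default.
save_checkpoint_steps = 5000

print("Validation at steps:", step_schedule(train_steps, valid_steps))
print("Checkpoints at steps:", step_schedule(train_steps, save_checkpoint_steps))
```

Under these assumptions the run validates five times and writes two checkpoints, so lowering `save_checkpoint_steps` is worth considering if the Colab session might be interrupted.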

In [None]:
# Train the model
!onmt_train -config military_config.yaml


# Or continue training from an existing checkpoint
#!onmt_train -train_from general_model.pt -config military_config.yaml
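
When resuming from your own run rather than from `general_model.pt`, you first need the most recent checkpoint. The sketch below assumes checkpoints are written as `<save_model>_step_<N>.pt` (so `model/model_step_5000.pt` for the config above); `latest_checkpoint` is a hypothetical helper, not an OpenNMT-py function.

```python
import re
from pathlib import Path


def latest_checkpoint(prefix="model/model"):
    """Return the checkpoint with the highest step number, or None if absent.

    Assumes files are named "<prefix>_step_<N>.pt".
    """
    prefix_path = Path(prefix)
    folder = prefix_path.parent
    if not folder.is_dir():
        return None
    pattern = re.compile(re.escape(prefix_path.name) + r"_step_(\d+)\.pt")
    best_step, best_path = -1, None
    for candidate in folder.iterdir():
        match = pattern.fullmatch(candidate.name)
        # Compare steps numerically: as strings, "5000" would sort after "10000".
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), candidate
    return best_path


print("Latest checkpoint:", latest_checkpoint())
```

The returned path can then be passed to `-train_from` in place of `general_model.pt`.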

**4. Translating sentences**

In [None]:
# Translate the test sentences with the general model; the output is written to translation.txt

!onmt_translate -model general_model.pt -src tc_test_en.txt -output translation.txt -gpu 0 -verbose
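
To review the result, it is convenient to print each source sentence next to its translation. The sketch below assumes the output file contains exactly one translation per source line, in the same order; `pair_lines` is a hypothetical helper, and the file names are the ones used in the command above.

```python
import os


def pair_lines(src_path, out_path):
    """Pair each source line with the translation on the same line number."""
    with open(src_path, encoding="utf-8") as src, \
            open(out_path, encoding="utf-8") as out:
        return [(s.rstrip("\n"), t.rstrip("\n")) for s, t in zip(src, out)]


# Only runs once both files exist, i.e. after the translation cell has finished.
if os.path.exists("tc_test_en.txt") and os.path.exists("translation.txt"):
    for source, translation in pair_lines("tc_test_en.txt", "translation.txt")[:5]:
        print("EN:", source)
        print("SL:", translation)
        print()
```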