# IBM Model 1, Variational Bayes
#### Authors: Adriaan de Vries, Féliciën Veldema, Verna Dankers

This notebook implements the variational Bayes (VB) training algorithm for IBM Model 1. Run the cells in order to train and evaluate the model.
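
For orientation, the update that distinguishes this VB variant from standard EM can be sketched on its own. Instead of normalising expected counts, the translation weights are set to `exp(digamma(lambda_f) - digamma(sum of lambdas))`, where each `lambda_f` is the prior `alpha` plus the expected count. The function name `vb_translation_weights` and the toy counts are assumptions for illustration only.

```python
import numpy as np
from scipy.special import digamma

def vb_translation_weights(expected_counts, alpha):
    """Return exp(E[log theta]) under a Dirichlet with parameters alpha + counts."""
    lambdas = alpha + np.asarray(expected_counts, dtype=float)
    return np.exp(digamma(lambdas) - digamma(lambdas.sum()))

# Toy expected counts for three French words aligned to one English word
counts = np.array([8.0, 1.5, 0.5])
mle = counts / counts.sum()                      # what plain EM would use
vb = vb_translation_weights(counts, alpha=0.0005)

print("EM weights:", mle)
print("VB weights:", vb, "sum:", vb.sum())
```

The VB weights do not sum to one, and low-count words are discounted more strongly than under the EM estimate, which is the sparsity effect a small `alpha` is meant to encourage.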

### 1. Requirements

In [6]:
# Python libraries (tqdm, scipy and numpy need to be installed)
from __future__ import print_function, division
from collections import defaultdict, Counter
from tqdm import tqdm
from random import random
from scipy.special import digamma, loggamma, gammaln

import numpy as np
import pickle
import math
import os

# Custom requirements
from aer import read_naacl_alignments, AERSufficientStatistics, test
import data

### 2. Read in the data

Set the paths to the data and run the cell below. The functions for reading the data live outside this notebook, in the `data` module, because other notebooks reuse them.
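
The `data` module is not shown here, so the following is only a sketch of the shape the rest of the notebook appears to expect: a list of `(english, french)` pairs of token lists, with a NULL token at English position 0 and rare words mapped to `-UNK-`. The helper name `replace_rare_with_unk`, the `-NULL-` string and the frequency threshold are assumptions for illustration, not the actual behaviour of `data.read_data`.

```python
from collections import Counter

UNK = "-UNK-"    # token name as it appears in the log output of the data cell
NULL = "-NULL-"  # assumed name for the NULL token at English position 0

def replace_rare_with_unk(sentences, min_count=2):
    """Hypothetical helper: map words seen fewer than min_count times to UNK."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return [[w if counts[w] >= min_count else UNK for w in s] for s in sentences]

english = [["the", "house"], ["the", "garden"]]
french = [["la", "maison"], ["le", "jardin"]]

# Prepend NULL to every English sentence and pair the two sides up
english = [[NULL] + s for s in replace_rare_with_unk(english)]
french = replace_rare_with_unk(french)
toy_training_data = list(zip(english, french))

for e, f in toy_training_data:
    print(e, f)
```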

In [7]:
english_train = 'training/hansards.36.2.e'
french_train = 'training/hansards.36.2.f'
english_val = 'validation/dev.e'
french_val = 'validation/dev.f'
fname = 'naacltest.txt'

training_data = data.read_data(english_train, french_train, True)
ext_data = list(zip(*training_data))
validation_data = data.read_data(english_val, french_val, True, ttype='validation', eng_data=ext_data[0], fre_data=ext_data[1])

Adding the -UNK- token to the data.
English data complete.
French data complete
Adding the -UNK- token to the data.
English data complete.
French data complete


### 3. Implementation of IBM1 VB

First, we implement the training algorithm and the functions that compute the alignments, the log likelihood and the ELBO.
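
Before the full implementation, the expectation step can be illustrated on a single toy sentence pair. For every French word, the posterior probability of aligning to each English word is its translation weight divided by the sum over the English sentence, and these posteriors are added to the prior `alpha`. The function name `expected_counts`, the `-NULL-` token and the toy sentence are assumptions made for this sketch.

```python
from collections import defaultdict

def expected_counts(sentence_pairs, t, alpha):
    """Accumulate alpha plus posterior alignment counts for every (e, f) pair."""
    lambdas = defaultdict(lambda: defaultdict(lambda: alpha))
    for e_sent, f_sent in sentence_pairs:
        for f in f_sent:
            total = sum(t[e][f] for e in e_sent)
            for e in e_sent:
                lambdas[e][f] += t[e][f] / total
    return lambdas

alpha = 0.0005
pairs = [(["-NULL-", "the", "house"], ["la", "maison"])]
# Uniform initialisation: every English word spreads its mass over both French words
t = {e: {"la": 0.5, "maison": 0.5} for e in pairs[0][0]}

lambdas = expected_counts(pairs, t, alpha)
for e in pairs[0][0]:
    print(e, dict(lambdas[e]))
```

With a uniform start, each of the three English positions receives one third of every French word, so each entry equals `alpha + 1/3`.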

In [8]:
def elbo(data, t, f_vocab, alpha, lambdas):
    """Calculate the ELBO for the training data.

    Args:
        data: zipped object with pairs of e and f sentences
        t: dictionary with translation probabilities e to f
        f_vocab: set of French words
        alpha: concentration parameter of the symmetric Dirichlet prior
        lambdas: Dirichlet posterior parameters (alpha plus expected counts) from the last iteration

    Returns:
        float: ELBO
    """
    # Start by calculating log likelihood
    ll = log_likelihood(data, t)
    
    # Add -KL to the log likelihood
    elbo = ll
    gammaln_alpha = gammaln(alpha)
    c = gammaln(alpha * len(f_vocab))
    for e in tqdm(t):
        a = sum([(math.log(t[e][f]) if t[e][f] != 0 else 0) * (alpha - lambdas[e][f])
                 +  gammaln(lambdas[e][f]) - gammaln_alpha for f in lambdas[e] if f != "-REST-"])
        b = gammaln(sum([(lambdas[e][f] if f in lambdas[e] else alpha) for f in f_vocab]))
        elbo += a - b + c
    return elbo

def log_likelihood(data, translate_dict, add_constant=False):
    """Calculate the log likelihood of the data under the best alignment of each sentence pair.

    Args:
        data: zipped object with pairs of e and f sentences
        translate_dict: dictionary with translation probabilities e to f
        add_constant: whether to add the length normalisation constant

    Returns:
        float: log likelihood
    """
    log_likelihood = 0
    for e, f in data:
        # f_vocab is unused by VB_align; pass add_null by keyword so it is not mistaken for it
        alignment = VB_align(e, f, translate_dict, f_vocab=None, add_null=True)
        prob = 0
        for j, i in alignment:
            prob += math.log(translate_dict[e[j]][f[i-1]])
        log_likelihood += prob

        # Length normalisation constant
        if add_constant:
            log_likelihood += -len(f) * np.log(len(e) + 1)
    return log_likelihood

def initialize_t(data, uniform=True):
    """Initialise the translation probabilities.
    
    Args:
        data: list of tuples, english and french sentences
        uniform: True for uniform initialisation, False for random initialisation

    Returns:
        defaultdict(Counter)
    """
    # Initialise randomly or uniformly
    t = defaultdict(Counter)
    for e, f in tqdm(data):
        for e_word in e:
            for f_word in f:
                if uniform:
                    t[e_word][f_word] = 1
                else:
                    t[e_word][f_word] = random()

    # Normalise counts for every English word
    for e_word in t:
        normalization_factor = sum(list(t[e_word].values()))
        for f_word in t[e_word]:
            t[e_word][f_word] = t[e_word][f_word] / normalization_factor
    return t


def VB_align(english_words, french_words, translate_dict, f_vocab, add_null=True):
    """Align one sentence pair, either with or without the NULL alignments.
    
    Args:
        english_words: list of english words
        french_words: list of french words
        translate_dict: dictionary with translation probabilities e to f
        f_vocab: set of French words (currently unused)
        add_null: boolean to indicate whether NULL alignments should be included

    Returns:
        list of tuples (English position, 1-based French position)
    """
    alignment = []
    for j, fword in enumerate(french_words):
        prior = 0.0
        alignment_j = 0
        for i, eword in enumerate(english_words):
            # Only include terms that are in the dictionary
            if eword in translate_dict:
                if fword in translate_dict[eword]:
                    prob = translate_dict[eword][fword]
                else:
                    prob = translate_dict[eword]["-REST-"]
                if prob > prior:
                    prior = prob
                    alignment_j = i
        # Keep NULL alignments (English position 0) only if add_null is set
        if alignment_j != 0 or add_null:
            alignment.append((alignment_j, j + 1))
    return alignment

def VB_align_all(data, translate_dict, f_vocab, fname=None):
    """Create alignments for pairs of English and French sentences.
    Return them as one set per sentence pair and write them to file.
    
    Args:
        data: zipped object with pairs of e and f sentences
        translate_dict: dictionary with translation probabilities e to f
        f_vocab: set of French words
        fname: filename to save alignments in, in NAACL format

    Returns:
        list of sets
    """
    alignments = []
    # The context manager closes (and flushes) the file once all alignments are written
    with open(fname, 'w') as file:
        for k, (english_words, french_words) in enumerate(data):
            alignment = VB_align(english_words, french_words, translate_dict, f_vocab, False)
            for pos1, pos2 in alignment:
                file.write("{} {} {}\n".format(k + 1, pos1, pos2))
            alignments.append(set(alignment))
    return alignments

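def VB_IBM1(data, validation, alpha, max_steps=20, translate_dict=None):
    """Train IBM Model 1 with variational Bayes.

    Args:
        data: list of tuples, english and french training sentences
        validation: pairs of e and f validation sentences
        alpha: value for the symmetric Dirichlet prior
        max_steps: number of iterations to run
        translate_dict: translation probabilities to continue training from;
            initialised uniformly if None

    Returns:
        defaultdict(Counter): translation probabilities e to f
    """
    print("Initializing translation dictionary.")
    if translate_dict is None:
        translate_dict = initialize_t(data)
    e_vocab = translate_dict.keys()
    f_vocab = {f for e in translate_dict for f in translate_dict[e]}
    for iteration in range(max_steps):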
        fname = 'iteration' + str(iteration) + '.txt'
        lambdas = defaultdict(lambda : defaultdict(lambda : alpha))

        print("Expectation step {}".format(iteration + 1))
        for e_s, f_s in tqdm(data):
            for f in f_s:
                sum_of_probs = sum([translate_dict[e2][f] for e2 in e_s])
                for e in e_s:
                    lambdas[e][f] += translate_dict[e][f] / sum_of_probs

        print("Maximisation step {}".format(iteration + 1))
        for e in tqdm(e_vocab):
            summation = 0
            for f2 in f_vocab:
                if f2 in lambdas[e]:
                    summation += lambdas[e][f2]
                else:
                    summation += alpha
            summation = digamma(summation)
            for f in translate_dict[e]:
                translate_dict[e][f] = np.exp(digamma(lambdas[e][f]) - summation)
            translate_dict[e]["-REST-"] = np.exp(digamma(alpha) - summation)

        alignments = VB_align_all(validation, translate_dict, f_vocab, fname)
        eb = elbo(data, translate_dict, f_vocab, alpha, lambdas)
        print("Elbo: {}".format(eb))
        aer = test("", alignments)
        print("AER: {}".format(aer))
        # pickle.dump(translate_dict, open("translate_dicts/ibm1_vb_epoch_{}.pickle".format(iteration + 1), 'wb'))
    return translate_dict
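
The maximisation step above loops over the whole French vocabulary for every English word to build the digamma normaliser. A possible shortcut, sketched here under the assumption that every key of `lambdas[e]` is also in `f_vocab`, is to sum the stored values once and add `alpha` for each French word that never co-occurred with `e`. The function names `digamma_normaliser` and `digamma_normaliser_loop` are illustrative and not part of the notebook.

```python
from scipy.special import digamma

def digamma_normaliser(lambdas_e, vocab_size, alpha):
    """Digamma of the Dirichlet parameter total, without looping over the vocabulary."""
    seen_total = sum(lambdas_e.values())
    unseen_total = alpha * (vocab_size - len(lambdas_e))
    return digamma(seen_total + unseen_total)

def digamma_normaliser_loop(lambdas_e, f_vocab, alpha):
    """Reference version that mirrors the loop in the maximisation step."""
    summation = 0
    for f in f_vocab:
        summation += lambdas_e[f] if f in lambdas_e else alpha
    return digamma(summation)

alpha = 0.0005
f_vocab = {"la", "maison", "le", "jardin", "chat"}
lambdas_e = {"la": 3.2005, "maison": 1.0005}  # toy posterior parameters

fast = digamma_normaliser(lambdas_e, len(f_vocab), alpha)
slow = digamma_normaliser_loop(lambdas_e, f_vocab, alpha)
print(fast, slow)
```

Both versions agree up to floating-point rounding, while the first one costs time proportional to the number of French words actually seen with `e`.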

### 4. Train a model

In [4]:
alpha = 0.0005
translate_dict = VB_IBM1(training_data, validation_data, alpha, 5)

Initializing translation dictionary.


100%|████████████████████████████████| 231164/231164 [00:50<00:00, 4561.52it/s]


Expectation step 1


100%|████████████████████████████████| 231164/231164 [02:31<00:00, 1523.80it/s]


Maximisation step 1


100%|███████████████████████████████████| 25593/25593 [04:12<00:00, 101.21it/s]


-17573799.8262083


100%|███████████████████████████████████| 25593/25593 [04:05<00:00, 104.36it/s]


Elbo: -62703128.3598518
AER: 0.3697718631178707
Expectation step 2


100%|████████████████████████████████| 231164/231164 [03:37<00:00, 1063.92it/s]


Maximisation step 2


100%|████████████████████████████████████| 25593/25593 [05:38<00:00, 75.61it/s]


-12053928.908525312


100%|████████████████████████████████████| 25593/25593 [05:30<00:00, 77.34it/s]


Elbo: -25999655.36870337
AER: 0.3378509196515005
Expectation step 3


100%|████████████████████████████████| 231164/231164 [03:27<00:00, 1113.06it/s]


Maximisation step 3


100%|████████████████████████████████████| 25593/25593 [05:48<00:00, 73.40it/s]


-10373936.71996149


100%|████████████████████████████████████| 25593/25593 [06:02<00:00, 70.62it/s]


Elbo: -15908872.020047462
AER: 0.32748538011695905
Expectation step 4


100%|█████████████████████████████████| 231164/231164 [04:01<00:00, 957.23it/s]


Maximisation step 4


100%|████████████████████████████████████| 25593/25593 [06:38<00:00, 64.17it/s]


-9551558.112916136


100%|████████████████████████████████████| 25593/25593 [05:29<00:00, 77.72it/s]


Elbo: -13279577.700044598
AER: 0.3195266272189349
Expectation step 5


100%|████████████████████████████████| 231164/231164 [03:19<00:00, 1159.78it/s]


Maximisation step 5


100%|████████████████████████████████████| 25593/25593 [05:47<00:00, 73.58it/s]


-9069730.165113095


100%|████████████████████████████████████| 25593/25593 [06:08<00:00, 69.46it/s]


Elbo: -12195409.232732926
AER: 0.3168805528134254


In [5]:
translate_dict2 = VB_IBM1(training_data, validation_data, alpha, 10, translate_dict)

Initializing translation dictionary.
Expectation step 1


100%|████████████████████████████████| 231164/231164 [03:25<00:00, 1125.73it/s]


Maximisation step 1


100%|████████████████████████████████████| 25593/25593 [05:57<00:00, 71.65it/s]


-8756675.05166398


100%|████████████████████████████████████| 25593/25593 [05:49<00:00, 73.30it/s]


Elbo: -11604593.752000997
AER: 0.3165182987141444
Expectation step 2


100%|████████████████████████████████| 231164/231164 [03:43<00:00, 1034.19it/s]


Maximisation step 2


100%|████████████████████████████████████| 25593/25593 [06:00<00:00, 70.94it/s]


-8538643.967889402


100%|████████████████████████████████████| 25593/25593 [06:12<00:00, 68.71it/s]


Elbo: -11233624.917231584
AER: 0.3214638971315529
Expectation step 3


100%|████████████████████████████████| 231164/231164 [03:27<00:00, 1114.63it/s]


Maximisation step 3


100%|████████████████████████████████████| 25593/25593 [05:53<00:00, 72.33it/s]


-8379600.277170806


100%|████████████████████████████████████| 25593/25593 [06:22<00:00, 66.94it/s]


Elbo: -10979227.661674405
AER: 0.3204747774480712
Expectation step 4


100%|████████████████████████████████| 231164/231164 [03:20<00:00, 1155.59it/s]


Maximisation step 4


100%|████████████████████████████████████| 25593/25593 [05:42<00:00, 74.80it/s]


-8260401.581531495


100%|████████████████████████████████████| 25593/25593 [05:31<00:00, 77.21it/s]


Elbo: -10795209.766810227
AER: 0.32015810276679846
Expectation step 5


100%|████████████████████████████████| 231164/231164 [03:13<00:00, 1195.28it/s]


Maximisation step 5


100%|████████████████████████████████████| 25593/25593 [05:41<00:00, 75.01it/s]


-8168587.427075919


100%|████████████████████████████████████| 25593/25593 [05:46<00:00, 73.82it/s]


Elbo: -10657075.245848164
AER: 0.3184965380811078
Expectation step 6


100%|████████████████████████████████| 231164/231164 [03:11<00:00, 1205.55it/s]


Maximisation step 6


100%|████████████████████████████████████| 25593/25593 [05:38<00:00, 75.59it/s]


-8096092.05062316


100%|████████████████████████████████████| 25593/25593 [05:32<00:00, 76.97it/s]


Elbo: -10550704.063032232
AER: 0.3175074183976261
Expectation step 7


100%|████████████████████████████████| 231164/231164 [03:13<00:00, 1197.03it/s]


Maximisation step 7


100%|████████████████████████████████████| 25593/25593 [05:45<00:00, 74.15it/s]


-8037669.674119085


100%|████████████████████████████████████| 25593/25593 [06:17<00:00, 67.77it/s]


Elbo: -10465662.038439553
AER: 0.3177290836653387
Expectation step 8


100%|████████████████████████████████| 231164/231164 [03:12<00:00, 1202.68it/s]


Maximisation step 8


100%|████████████████████████████████████| 25593/25593 [05:37<00:00, 75.78it/s]


-7989495.226116224


100%|████████████████████████████████████| 25593/25593 [05:54<00:00, 72.28it/s]


Elbo: -10396671.125520742
AER: 0.31200000000000006
Expectation step 9


100%|████████████████████████████████| 231164/231164 [03:11<00:00, 1209.26it/s]


Maximisation step 9


100%|████████████████████████████████████| 25593/25593 [05:36<00:00, 76.10it/s]


-7949260.621093685


100%|████████████████████████████████████| 25593/25593 [06:04<00:00, 70.29it/s]


Elbo: -10339764.700858975
AER: 0.31299999999999994
Expectation step 10


100%|████████████████████████████████| 231164/231164 [03:11<00:00, 1205.42it/s]


Maximisation step 10


100%|████████████████████████████████████| 25593/25593 [05:38<00:00, 75.66it/s]


-7915413.05314147


100%|████████████████████████████████████| 25593/25593 [05:31<00:00, 77.30it/s]


Elbo: -10292240.937078912
AER: 0.31299999999999994
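
The AER values reported above come from the external `aer` module, which is not shown. As a reminder of what the metric measures, here is a standalone sketch of the usual definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted alignment, S the sure gold links and P the possible gold links (with S contained in P). The function name `alignment_error_rate` and the toy links are illustrative and are not claimed to match the `aer` module's API.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER for one set of predicted links against sure and possible gold links."""
    possible = possible | sure  # sure links always count as possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

# Links are (English position, French position) tuples
predicted = {(1, 1), (2, 2), (3, 3)}
sure = {(1, 1), (2, 2)}
possible = {(3, 4)}

print(alignment_error_rate(predicted, sure, possible))
```

A corpus-level score would sum the numerator and denominator terms over all sentence pairs before dividing, rather than averaging per-sentence values.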
