# Various experiments

**NOTE:** Unless explicitly stated otherwise, all of these experiments were run on the faulty Hungarian data (see "Filtering incorrect Hungarian examples").

## Data modification

### POS moved

The POS tag is moved to the end of the lemma.

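### Bigram

The lemma and the inflected form are split into overlapping character bigrams instead of single characters.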

~~~
mé éz zé éd de es       mé éz zé éd de es se ek ki ig   N;TERM;PL
le ep pé én ny yh ha al le ep pé én ny yh ha al ln na ak        N;DAT;SG
ag gy yo on nv vá ág    ag gy yo on nv vá ág gn né ék   V;COND;PRS;INDF;1;SG
~~~
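The bigram conversion can be sketched as follows. This is a minimal illustration rather than the preprocessing script actually used; the function names `to_bigrams` and `from_bigrams` are made up for this example.

```python
def to_bigrams(word):
    """Return the overlapping character bigrams of `word`, space-separated."""
    return " ".join(word[i:i + 2] for i in range(len(word) - 1))


def from_bigrams(bigrams):
    """Invert to_bigrams: first character of each bigram plus the rest of the last one."""
    tokens = bigrams.split()
    if not tokens:
        return ""
    return "".join(token[0] for token in tokens) + tokens[-1][1:]


print(to_bigrams("mézédes"))
print(from_bigrams(to_bigrams("mézédesekig")))
```

Note that a one-character word has no bigrams, so this sketch maps it to an empty string and cannot recover it.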

### Data augmentation - symmetric pairs

Generate every possible (source, target) pair from the lemma and the inflected forms of the same word, including the identity pairs.

~~~
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> LEMMA </T>  a b i o g é n
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> N IN+ALL SG </T>    a b i o g é n b e
<W> a b i o g é n </W> <S> LEMMA </S> <T> LEMMA </T>    a b i o g é n
<W> a b i o g é n </W> <S> LEMMA </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~
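A possible way to generate these lines is sketched below. It assumes the inflections arrive as `(form, tags)` tuples with semicolon-separated tags (for example `N;IN+ALL;SG`); the function name `symmetric_pairs` is hypothetical and the real augmentation script may differ.

```python
from itertools import product


def symmetric_pairs(lemma, inflections):
    """Build one line for every ordered (source, target) pair of forms.

    `inflections` is a list of (form, tags) tuples; the lemma is added
    with the special tag LEMMA.
    """
    forms = [(lemma, "LEMMA")]
    forms += [(form, tags.replace(";", " ")) for form, tags in inflections]
    lines = []
    for (src, src_tags), (tgt, tgt_tags) in product(forms, repeat=2):
        source = "<W> {} </W> <S> {} </S> <T> {} </T>".format(
            " ".join(src), src_tags, tgt_tags)
        lines.append("{}\t{}".format(source, " ".join(tgt)))
    return lines


for line in symmetric_pairs("abiogén", [("abiogénbe", "N;IN+ALL;SG")]):
    print(line)
```

With n inflected forms per lemma this yields (n + 1)² lines, so the augmented training set grows quadratically in the paradigm size.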

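### Reverse target sequence

The inflected (target) form is reversed character by character; the lemma and the tags are left unchanged.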

~~~
borotva zohkávtorob     N;AT+ALL;PL
kigúnyol        avloynúgik      V.CVB
földcsuszamlás  lóbsálmazsuscdlöf       N;ON+ABL;SG
hírlap  kanpalríh       N;DAT;SG
~~~
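As a rough sketch, the reversal could be done per line like this. It assumes tab-separated `lemma`, `form`, `tags` columns and precomposed (NFC) characters; `reverse_target` is a name chosen for this example.

```python
def reverse_target(line):
    """Reverse the inflected form (second column) of a tab-separated line."""
    lemma, form, tags = line.rstrip("\n").split("\t")
    return "\t".join((lemma, form[::-1], tags))


print(reverse_target("hírlap\thírlapnak\tN;DAT;SG"))
```

Predictions of a model trained this way have to be reversed again (`prediction[::-1]`) before they are compared with the gold forms.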

### Mix with other languages

Merge and shuffle the training data of two or more languages.

I tried mixing Hungarian with Finnish and with Welsh.
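Merging and shuffling can be sketched as below. The function name `mix_languages` and the fixed seed are assumptions, the Finnish line is an illustrative example that is not taken from the dataset, and the sketch adds no language marker to the lines.

```python
import random


def mix_languages(datasets, seed=0):
    """Concatenate several lists of training lines and shuffle the result."""
    merged = [line for lines in datasets for line in lines]
    # A private Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(merged)
    return merged


hungarian = ["hírlap\thírlapnak\tN;DAT;SG"]
finnish = ["talo\ttalossa\tN;IN+ESS;SG"]  # illustrative line, not from the data
mixed = mix_languages([hungarian, finnish])
print(len(mixed))
```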

### Filtering incorrect Hungarian examples

About 10% of the Hungarian train and dev data are incorrect due to Wiktionary parse errors. I filtered these examples out and trained some of the models on the smaller, corrected dataset.

## Models

### Luong attention

Vanilla seq2seq + Luong attention.

Differences from the 2016 winner:

* Luong attention instead of Bahdanau attention. Reason: I have not implemented Bahdanau attention yet.
* LSTMs instead of GRUs. Reason: in all my other experiments LSTMs outperformed GRUs, and I am not sure why the 2016 winner uses GRUs.

The input data is converted to:

~~~
<S> a b i o g é n </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~
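The conversion to this format might look like the sketch below. It assumes the raw data has `lemma`, `form` and semicolon-separated `tags` fields; `to_seq2seq_line` is a hypothetical helper name.

```python
def to_seq2seq_line(lemma, form, tags):
    """Format one example as '<S> lemma chars </S> <T> tags </T><TAB>form chars'."""
    source = "<S> {} </S> <T> {} </T>".format(
        " ".join(lemma), " ".join(tags.split(";")))
    return "{}\t{}".format(source, " ".join(form))


print(to_seq2seq_line("abiogén", "abiogénbe", "N;IN+ALL;SG"))
```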

### Two-headed attention

The lemma and the tags are encoded separately, and two attention mechanisms attend to them separately during decoding.
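The idea can be illustrated with a small pure-Python sketch of one decoding step. It uses dot-product scoring and concatenates the two context vectors; both choices are assumptions made for illustration, and the real model may score and combine the contexts differently. All names are hypothetical.

```python
import math


def dot_attention(query, memory):
    """Dot-product attention: return (weights, context) over `memory`."""
    scores = [sum(q * m for q, m in zip(query, state)) for state in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    context = [sum(w * state[i] for w, state in zip(weights, memory))
               for i in range(dim)]
    return weights, context


def two_headed_context(query, lemma_states, tag_states):
    """Attend to the lemma and tag encodings separately, then concatenate."""
    lemma_weights, lemma_context = dot_attention(query, lemma_states)
    tag_weights, tag_context = dot_attention(query, tag_states)
    return lemma_context + tag_context, lemma_weights, tag_weights


decoder_state = [1.0, 0.0]
lemma_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tag_states = [[0.5, 0.5], [2.0, 0.0]]
context, lemma_w, tag_w = two_headed_context(decoder_state, lemma_states, tag_states)
print(len(context), lemma_w, tag_w)
```

Each head produces its own distribution, so the decoder can focus on one lemma character and one tag at the same time.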

### Misc variations

#### L1 regularization

I ran a few experiments with it but have no further plans for it right now.

# Common code

In [1]:
import os

import numpy as np
import pandas as pd
import torch
import yaml


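def compute_sparsity(model_fn, threshold=10e-3):
    """Count the weights of a saved model whose magnitude is at most `threshold`.

    Returns (near_zero_count, other_count, near_zero_ratio).
    Assumes `model_fn` stores a dict of tensors. Note that 10e-3 is 0.01.
    """
    is_zero = 0
    non_zero = 0
    # map_location lets models saved on a GPU load on a CPU-only machine.
    for name, tensor in torch.load(model_fn, map_location='cpu').items():
        m = tensor.cpu().numpy()
        close = int(np.count_nonzero(np.abs(m) <= threshold))
        is_zero += close
        non_zero += (m.size - close)
    return is_zero, non_zero, is_zero / (is_zero + non_zero)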
    
    
def get_min_loss(row):
    min_idx, min_dev_loss = min(enumerate(row['dev_loss']), key=lambda x: x[1])
    min_train_loss = row['train_loss'][min_idx]
    row['min_dev_loss'] = min_dev_loss
    row['min_train_loss'] = min_train_loss
    return row
    
    
def extract_language_name(field):
    fn = field.split('/')[-1]
    if 'dev' in fn:
        return '-'.join(fn.split('-')[:-1])
    return '-'.join(fn.split('-')[:-2])
    

def load_res_dir(basedir, include_sparsity=False):
    """Load config, results and dev accuracy of every experiment under `basedir`."""
    experiments = []
    for subdir in os.scandir(basedir):
        exp_d = {}
        # yaml.load without an explicit Loader fails on PyYAML >= 6;
        # safe_load assumes the files contain plain YAML only.
        with open(os.path.join(subdir.path, "config.yaml")) as f:
            exp_d.update(yaml.safe_load(f))
        res_fn = os.path.join(subdir.path, "result.yaml")
        # Skip experiments that have no result file.
        if not os.path.exists(res_fn):
            continue
        with open(res_fn) as f:
            exp_d.update(yaml.safe_load(f))
        dev_acc_path = os.path.join(subdir.path, "dev.word_accuracy")
        if os.path.exists(dev_acc_path):
            with open(dev_acc_path) as f:
                exp_d['dev_acc'] = float(f.read())
        else:
            print("Dev accuracy file does not exist in dir: {}".format(subdir.path))
        if include_sparsity:
            exp_d['sparsity'] = compute_sparsity(os.path.join(subdir.path, "model"), 10e-4)
        experiments.append(exp_d)
    experiments = pd.DataFrame(experiments)
    if include_sparsity:
        experiments['sparsity_ratio'] = experiments['sparsity'].apply(lambda x: x[2])
    experiments['language'] = experiments.dev_file.apply(extract_language_name)
    # Drop incomplete rows first so that get_min_loss never sees a missing loss list.
    experiments = experiments[experiments['dev_acc'].notnull()]
    experiments = experiments[experiments['dev_loss'].notnull()]
    experiments = experiments.apply(get_min_loss, axis=1)
    experiments['train_size'] = experiments['train_file'].apply(lambda fn: fn.split('-')[-1])
    return experiments

## Data modification

### POS moved

The POS tag is moved to the end of the lemma.

In [2]:
exp_dir = "../../exps/sigmorphon_2018/pos_moved/"

all_experiments = load_res_dir(exp_dir)
all_experiments['exp_type'] = 'pos_moved'
all_experiments.dev_acc.max()

0.854

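### Bigram

The lemma and the inflected form are split into overlapping character bigrams instead of single characters.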

~~~
mé éz zé éd de es       mé éz zé éd de es se ek ki ig   N;TERM;PL
le ep pé én ny yh ha al le ep pé én ny yh ha al ln na ak        N;DAT;SG
ag gy yo on nv vá ág    ag gy yo on nv vá ág gn né ék   V;COND;PRS;INDF;1;SG
~~~

In [3]:
exp_dir = "../../exps/sigmorphon_2018/bigram/"

experiments = load_res_dir(exp_dir)
experiments['exp_type'] = 'bigram'
all_experiments = pd.concat((all_experiments, experiments))
experiments.dev_acc.max()

0.569

### Data augmentation - symmetric pairs

Generate every possible (source, target) pair from the lemma and the inflected forms of the same word, including the identity pairs.

~~~
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> LEMMA </T>  a b i o g é n
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> N IN+ALL SG </T>    a b i o g é n b e
<W> a b i o g é n </W> <S> LEMMA </S> <T> LEMMA </T>    a b i o g é n
<W> a b i o g é n </W> <S> LEMMA </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~

In [4]:
exp_dir = "../../exps/sigmorphon_2018/hun_enhanced/"

experiments = load_res_dir(exp_dir)
print("Max dev accuracy on the enhanced data: {}".format(experiments.dev_acc.max()))

# Overwrite dev_acc with the accuracy stored in real_dev.word_accuracy.
for idx, row in experiments.iterrows():
    with open(os.path.join(row.experiment_dir, 'real_dev.word_accuracy')) as f:
        experiments.loc[idx, 'dev_acc'] = float(f.read())
        
experiments['exp_type'] = 'symmetric_augmented'
all_experiments = pd.concat((all_experiments, experiments))

Max dev accuracy on the enhanced data: 0.97668


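### Reverse target sequence

The inflected (target) form is reversed character by character; the lemma and the tags are left unchanged.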

~~~
borotva zohkávtorob     N;AT+ALL;PL
kigúnyol        avloynúgik      V.CVB
földcsuszamlás  lóbsálmazsuscdlöf       N;ON+ABL;SG
hírlap  kanpalríh       N;DAT;SG
~~~

In [5]:
exp_dir = "../../exps/sigmorphon_2018/hun_rev/"

experiments = load_res_dir(exp_dir)
experiments['exp_type'] = 'reverse_target'
all_experiments = pd.concat((all_experiments, experiments))
experiments.dev_acc.max()

0.859

### Mix with other languages

Merge and shuffle the training data of two or more languages.

#### Hungarian and Finnish

In [6]:
exp_dir = "../../exps/hun_fin/"

hun_fin = load_res_dir(exp_dir)
len(hun_fin), hun_fin.dev_acc.max()

(5, 0.792)

#### Hungarian and Welsh

In [7]:
exp_dir = "../../exps/hun_welsh/"

hun_welsh = load_res_dir(exp_dir)
len(hun_welsh), hun_welsh.dev_acc.max()

(5, 0.812727)

### Filtering incorrect Hungarian examples

About 10% of the Hungarian train and dev data are incorrect due to Wiktionary parse errors. I filtered these examples out and trained some of the models on the smaller, corrected dataset.

## Models

### Hard monotonic attention

TODO