# Various experiments

**NOTE:** All of these experiments were run on the faulty Hungarian data unless explicitly stated otherwise.

## Data modification

### POS moved

The POS tag is moved to the end of the lemma.

### Bigram

~~~
mé éz zé éd de es       mé éz zé éd de es se ek ki ig   N;TERM;PL
le ep pé én ny yh ha al le ep pé én ny yh ha al ln na ak        N;DAT;SG
ag gy yo on nv vá ág    ag gy yo on nv vá ág gn né ék   V;COND;PRS;INDF;1;SG
~~~
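The rows above appear to replace each word with its overlapping character bigrams. A minimal sketch of that conversion follows. The helper names `to_bigrams` and `convert_line` are hypothetical, and the tab-separated `lemma, inflected form, tags` column order is an assumption based on the sample.

```python
import unicodedata


def to_bigrams(word):
    """Return the overlapping character bigrams of `word`, space-separated."""
    # Normalize so that accented letters such as "é" are single code points.
    word = unicodedata.normalize("NFC", word)
    return " ".join(word[i:i + 2] for i in range(len(word) - 1))


def convert_line(line):
    """Convert a `lemma<TAB>inflected<TAB>tags` line (assumed column order)."""
    lemma, inflected, tags = line.rstrip("\n").split("\t")
    return "\t".join((to_bigrams(lemma), to_bigrams(inflected), tags))


print(convert_line("mézédes\tmézédesekig\tN;TERM;PL"))
```

With this encoding every character except the first and last appears in two consecutive symbols, and the vocabulary is much larger than the plain character vocabulary.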

### Data augmentation - symmetric pairs

Generate every possible ordered pair among the lemma and the inflected forms of the same word, including each form paired with itself.

~~~
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> LEMMA </T>  a b i o g é n
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> N IN+ALL SG </T>    a b i o g é n b e
<W> a b i o g é n </W> <S> LEMMA </S> <T> LEMMA </T>    a b i o g é n
<W> a b i o g é n </W> <S> LEMMA </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~
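The sample above can be reproduced with a short sketch like the one below. `symmetric_pairs` is a hypothetical name, the tab between source and target is an assumption, and the real preprocessing script may differ in details.

```python
from itertools import product


def symmetric_pairs(lemma, inflections):
    """Yield one training line per ordered (source, target) pair of forms.

    `inflections` maps a tag string such as "N;IN+ALL;SG" to the inflected form.
    """
    # The lemma is treated as one more form, tagged with the pseudo-tag LEMMA.
    forms = [("LEMMA", lemma)]
    forms += [(tags.replace(";", " "), form) for tags, form in inflections.items()]
    # product(..., repeat=2) also yields identity pairs, as in the sample above.
    for (src_tags, src), (tgt_tags, tgt) in product(forms, repeat=2):
        source = "<W> {} </W> <S> {} </S> <T> {} </T>".format(
            " ".join(src), src_tags, tgt_tags)
        yield source + "\t" + " ".join(tgt)


for line in symmetric_pairs("abiogén", {"N;IN+ALL;SG": "abiogénbe"}):
    print(line)
```

A word with n known forms (lemma included) yields n² lines, so the augmented training set grows quadratically with the size of each paradigm.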

### Reverse target sequence

~~~
borotva zohkávtorob     N;AT+ALL;PL
kigúnyol        avloynúgik      V.CVB
földcsuszamlás  lóbsálmazsuscdlöf       N;ON+ABL;SG
hírlap  kanpalríh       N;DAT;SG
~~~
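A minimal sketch of the transformation shown above, assuming tab-separated `lemma, inflected form, tags` columns. `reverse_target` is a hypothetical helper name.

```python
import unicodedata


def reverse_target(line):
    """Reverse the inflected form in a `lemma<TAB>inflected<TAB>tags` line."""
    lemma, inflected, tags = line.rstrip("\n").split("\t")
    # NFC keeps each accented Hungarian letter as one code point, so a plain
    # slice reversal does not separate a letter from its accent.
    inflected = unicodedata.normalize("NFC", inflected)
    return "\t".join((lemma, inflected[::-1], tags))


print(reverse_target("hírlap\thírlapnak\tN;DAT;SG"))
```

Predictions of a model trained this way would presumably have to be reversed again before comparing them with the unreversed gold forms.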

### Mix with other languages

Merge and shuffle data in two or more languages.

I tried Finnish and Welsh.
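Merging and shuffling can be sketched as follows. `mix_languages` is a hypothetical helper, and the second list holds placeholder strings rather than real data. Whether the original runs marked each example with its language is not stated, so no language tag is added here.

```python
import random


def mix_languages(datasets, seed=0):
    """Concatenate the example lists of several languages and shuffle them."""
    merged = [line for lines in datasets for line in lines]
    # A fixed seed keeps the shuffled order reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged


hungarian = ["hírlap\thírlapnak\tN;DAT;SG", "mézédes\tmézédesekig\tN;TERM;PL"]
other = ["other-language example 1", "other-language example 2"]  # placeholders
mixed = mix_languages([hungarian, other])
print(len(mixed))
```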

### Filtering incorrect Hungarian examples

About 10% of the Hungarian train and dev data are incorrect due to Wiktionary parse errors. I filtered these out and trained some of the models on the smaller, corrected dataset.

## Models

### Luong attention

Vanilla seq2seq + Luong attention.

Differences from the 2016 winner:

* Luong attention instead of Bahdanau attention. Reason: I haven't implemented Bahdanau attention yet.
* LSTMs instead of GRUs. Reason: in all my other experiments LSTMs outperformed GRUs, and I'm not sure why they use GRUs.

The input data is converted to:

~~~
<S> a b i o g é n </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~
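The conversion can be sketched like this. `format_example` is a hypothetical name and the tab separator between source and target is an assumption.

```python
def format_example(lemma, inflected, tags):
    """Build one `source<TAB>target` line in the format shown above."""
    # The lemma is split into characters and the tag string into single tags.
    source = "<S> {} </S> <T> {} </T>".format(
        " ".join(lemma), " ".join(tags.split(";")))
    target = " ".join(inflected)
    return source + "\t" + target


print(format_example("abiogén", "abiogénbe", "N;IN+ALL;SG"))
```

Lemma characters and tags end up in a single input sequence, so one encoder reads both.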

### Two-headed attention

The lemma and the tags are encoded separately, and two attention mechanisms attend to them separately while decoding.
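A pure-Python sketch of the idea, not the actual `ReinflectionSeq2seq` code. It assumes dot-product scoring and that the two context vectors are concatenated; the source confirms neither. All names and toy vectors are hypothetical.

```python
import math


def attend(query, encoder_states):
    """Dot-product attention: softmax-weighted average of `encoder_states`."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # Softmax over the scores (shifted by the maximum for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states))
            for i in range(dim)]


def two_headed_context(query, lemma_states, tag_states):
    """Attend to the two encoders separately and concatenate the contexts."""
    return attend(query, lemma_states) + attend(query, tag_states)


decoder_state = [1.0, 0.0]
lemma_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy lemma encoder outputs
tag_states = [[0.5, 0.5], [2.0, 0.0]]                # toy tag encoder outputs
context = two_headed_context(decoder_state, lemma_states, tag_states)
print(len(context))
```

Each head has its own softmax, so the decoder gets one weighted summary of the lemma characters and a separate one of the tags at every step.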

### Misc variations

#### L1 regularization

Ran a few experiments, not planning anything with it right now.

# Common code

In [1]:
import os
import numpy as np
import pandas as pd
import torch
import yaml
pd.options.display.max_rows = 999

In [2]:
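def compute_sparsity(model_fn, threshold=10e-3):
    """Count the parameters whose absolute value is at most `threshold`.

    Returns (near_zero, non_zero, near_zero / total). Note that 10e-3 == 0.01.
    """
    is_zero = 0
    non_zero = 0
    # The model file is expected to hold a state dict of name -> tensor
    for name, tensor in torch.load(model_fn).items():
        m = tensor.cpu().numpy()
        close = int(np.count_nonzero(np.abs(m) <= threshold))
        is_zero += close
        non_zero += (m.size - close)
    return is_zero, non_zero, is_zero / (is_zero + non_zero)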
    
    
def get_min_loss(row):
    min_idx, min_dev_loss = min(enumerate(row['dev_loss']), key=lambda x: x[1])
    min_train_loss = row['train_loss'][min_idx]
    row['min_dev_loss'] = min_dev_loss
    row['min_train_loss'] = min_train_loss
    return row
    
    
def extract_language_name(field):
    if "hun" in field:
        return "hungarian"
    fn = field.split('/')[-1]
    if 'dev' in fn:
        return '-'.join(fn.split('-')[:-1])
    return '-'.join(fn.split('-')[:-2])
    

def extract_train_file_size(field):
    if 'train' in field:
        return field.split('-')[-1]
    return 'high'


def load_res_dir(basedir, include_sparsity=False):
    experiments = []
    for subdir in os.scandir(basedir):
        exp_d = {}
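        with open(os.path.join(subdir.path, "config.yaml")) as f:
            # yaml.load(f) without a Loader fails on PyYAML >= 6; safe_load
            # assumes the files contain only plain YAML types
            exp_d.update(yaml.safe_load(f))
        res_fn = os.path.join(subdir.path, "result.yaml")
        # Skip experiments that have no result file
        if not os.path.exists(res_fn):
            continue
        with open(res_fn) as f:
            exp_d.update(yaml.safe_load(f))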
        dev_acc_path = os.path.join(subdir.path, "dev.word_accuracy")
        if os.path.exists(dev_acc_path):
            with open(dev_acc_path) as f:
                exp_d['dev_acc'] = float(f.read())
        else:
            print("Dev accuracy file does not exist in dir: {}".format(subdir.path))
        train_acc_path = os.path.join(subdir.path, "train.word_accuracy")
        if os.path.exists(train_acc_path):
            with open(train_acc_path) as f:
                exp_d['train_acc'] = float(f.read())
        else:
            print("Train accuracy file does not exist in dir: {}".format(subdir.path))
        if include_sparsity:
            exp_d['sparsity'] = compute_sparsity(os.path.join(subdir.path, "model"), 10e-4)
        experiments.append(exp_d)
    experiments = pd.DataFrame(experiments)
    if include_sparsity:
        experiments['sparsity_ratio'] = experiments['sparsity'].apply(lambda x: x[2])
    experiments['language'] = experiments.dev_file.apply(extract_language_name)
    experiments = experiments.apply(get_min_loss, axis=1)
    experiments = experiments[experiments['dev_acc'].notnull()]
    experiments = experiments[experiments['dev_loss'].notnull()]
    experiments['train_size'] = experiments['train_file'].apply(extract_train_file_size)
    return experiments

## Data modification

### POS moved

The POS tag is moved to the end of the lemma.

In [3]:
exp_dir = "../../exps/sigmorphon_2018/pos_moved/"

all_experiments = load_res_dir(exp_dir)
all_experiments['exp_type'] = 'pos_moved'
all_experiments['data_corrected'] = False
all_experiments.dev_acc.max()

0.854

### Bigram

~~~
mé éz zé éd de es       mé éz zé éd de es se ek ki ig   N;TERM;PL
le ep pé én ny yh ha al le ep pé én ny yh ha al ln na ak        N;DAT;SG
ag gy yo on nv vá ág    ag gy yo on nv vá ág gn né ék   V;COND;PRS;INDF;1;SG
~~~

In [4]:
exp_dir = "../../exps/sigmorphon_2018/bigram/"

experiments = load_res_dir(exp_dir)
experiments['exp_type'] = 'bigram'
experiments['data_corrected'] = False
all_experiments = pd.concat((all_experiments, experiments))
experiments.dev_acc.max()

0.569

### Data augmentation - symmetric pairs

This is done on **corrected pairs**.

Generate every possible ordered pair among the lemma and the inflected forms of the same word, including each form paired with itself.

~~~
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> LEMMA </T>  a b i o g é n
<W> a b i o g é n b e </W> <S> N IN+ALL SG </S> <T> N IN+ALL SG </T>    a b i o g é n b e
<W> a b i o g é n </W> <S> LEMMA </S> <T> LEMMA </T>    a b i o g é n
<W> a b i o g é n </W> <S> LEMMA </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~

In [5]:
exp_dir = "../../exps/sigmorphon_2018/hun_enhanced/"

experiments = load_res_dir(exp_dir)
print("Max dev accuracy on the enhanced data: {}".format(experiments.dev_acc.max()))

for idx, row in experiments.iterrows():
    # Overwrite dev_acc with the value stored in real_dev.word_accuracy
    with open(os.path.join(row.experiment_dir, 'real_dev.word_accuracy')) as f:
        experiments.loc[idx, 'dev_acc'] = float(f.read())
        
experiments['exp_type'] = 'symmetric_augmented'
experiments['data_corrected'] = True
all_experiments = pd.concat((all_experiments, experiments))

Max dev accuracy on the enhanced data: 0.97668


In [6]:
all_experiments.groupby(['exp_type', 'data_corrected']).dev_acc.max().to_frame()

| exp_type | data_corrected | dev_acc |
|---|---|---|
| bigram | False | 0.569 |
| pos_moved | False | 0.854 |
| symmetric_augmented | True | 0.940716 |


### Reverse target sequence

~~~
borotva zohkávtorob     N;AT+ALL;PL
kigúnyol        avloynúgik      V.CVB
földcsuszamlás  lóbsálmazsuscdlöf       N;ON+ABL;SG
hírlap  kanpalríh       N;DAT;SG
~~~

In [7]:
exp_dir = "../../exps/sigmorphon_2018/hun_rev/"

experiments = load_res_dir(exp_dir)
experiments['exp_type'] = 'reverse_target'
experiments['data_corrected'] = False
all_experiments = pd.concat((all_experiments, experiments))
experiments.dev_acc.max()

0.859

### Mix with other languages

Merge and shuffle data in two or more languages.

#### Hungarian and Finnish

In [8]:
exp_dir = "../../exps/hun_fin/"

hun_fin = load_res_dir(exp_dir)
hun_fin["language"] = "hungarian+finnish"
len(hun_fin), hun_fin.dev_acc.max()

(5, 0.792)

#### Hungarian and Welsh

In [9]:
exp_dir = "../../exps/hun_welsh/"

hun_welsh = load_res_dir(exp_dir)
hun_welsh["language"] = "hungarian+welsh"
len(hun_welsh), hun_welsh.dev_acc.max()

(5, 0.812727)

In [10]:
all_experiments = pd.concat((all_experiments, hun_fin, hun_welsh)).reset_index(drop=True)

### Filtering incorrect Hungarian examples

About 10% of the Hungarian train and dev data are incorrect due to Wiktionary parse errors. I filtered these out and trained some of the models on the smaller, corrected dataset.

In [11]:
exp_dir = "../../exps/sigmorphon_2018/hun_correct/"
exps = load_res_dir(exp_dir)

exp_dir = "../../exps/sigmorphon_2018/hun_correct_luong/"
exps = pd.concat((exps, load_res_dir(exp_dir)))

exps['exp_type'] = 'basic'
exps['data_corrected'] = True

all_experiments = pd.concat((all_experiments, exps))
all_experiments = all_experiments.reset_index(drop=True)

In [12]:
all_experiments.language.value_counts()

hungarian            222
hungarian+finnish      5
hungarian+welsh        5
Name: language, dtype: int64

## General experiments with two basic models


### `LuongAttentionSeq2seq`

Vanilla seq2seq + Luong attention.

Differences from the 2016 winner:

* Luong attention instead of Bahdanau attention. Reason: I haven't implemented Bahdanau attention yet.
* LSTMs instead of GRUs. Reason: in all my other experiments LSTMs outperformed GRUs, and I'm not sure why they use GRUs.

The input data is converted to:

~~~
<S> a b i o g é n </S> <T> N IN+ALL SG </T>      a b i o g é n b e
~~~

### `ReinflectionSeq2seq`: Two-headed attention

The lemma and the tags are encoded separately, and two attention mechanisms attend to them separately while decoding.

In [13]:
exp_dir = "../../exps/sigmorphon_2018/luong_hyperparam_search/"

exps = load_res_dir(exp_dir)

exp_dir = "../../exps/reinflection/"
exps = pd.concat((exps, load_res_dir(exp_dir)))
exp_dir = "../../exps/reinflection_ron/"
exps = pd.concat((exps, load_res_dir(exp_dir)))

exps['data_corrected'] = False
exps['exp_type'] = 'basic'
all_experiments = pd.concat((all_experiments, exps))

all_experiments = all_experiments.reset_index(drop=True)

print("Number of experiments-per-experiment type")
all_experiments.groupby(['model', 'exp_type']).size().to_frame()

Number of experiments-per-experiment type


| model | exp_type | 0 |
|---|---|---|
| LuongAttentionSeq2seq | basic | 418 |
| LuongAttentionSeq2seq | pos_moved | 30 |
| LuongAttentionSeq2seq | reverse_target | 55 |
| LuongAttentionSeq2seq | symmetric_augmented | 35 |
| ReinflectionSeq2seq | basic | 822 |
| ReinflectionSeq2seq | bigram | 5 |
| ReinflectionSeq2seq | pos_moved | 31 |


## Highest and average Hungarian dev accuracy by experiment and data type

(size is the number of entries in that group)

In [14]:
hun = all_experiments[all_experiments.language=='hungarian']
hun.groupby(['exp_type', 'data_corrected', 'model']).dev_acc.agg(['max', 'mean', 'size'])

| exp_type | data_corrected | model | max | mean | size |
|---|---|---|---|---|---|
| basic | False | LuongAttentionSeq2seq | 0.856 | 0.697031 | 327 |
| basic | False | ReinflectionSeq2seq | 0.85 | 0.559888 | 412 |
| basic | True | LuongAttentionSeq2seq | 0.939597 | 0.668775 | 61 |
| basic | True | ReinflectionSeq2seq | 0.9217 | 0.581879 | 5 |
| bigram | False | ReinflectionSeq2seq | 0.569 | 0.2318 | 5 |
| pos_moved | False | LuongAttentionSeq2seq | 0.846 | 0.670133 | 30 |
| pos_moved | False | ReinflectionSeq2seq | 0.854 | 0.632581 | 31 |
| reverse_target | False | LuongAttentionSeq2seq | 0.859 | 0.783309 | 55 |
| symmetric_augmented | True | LuongAttentionSeq2seq | 0.940716 | 0.807734 | 35 |


# 10 best Hungarian experiments

In [15]:
hun.loc[hun.dev_acc.sort_values(ascending=False)[:10].index][['model', 'exp_type', 'data_corrected', 'dev_acc', 'train_acc']]

|  | model | exp_type | data_corrected | dev_acc | train_acc |
|---|---|---|---|---|---|
| 100 | LuongAttentionSeq2seq | symmetric_augmented | True | 0.940716 | 0.997445 |
| 192 | LuongAttentionSeq2seq | basic | True | 0.939597 | 0.981328 |
| 75 | LuongAttentionSeq2seq | symmetric_augmented | True | 0.931767 | 0.995542 |
| 193 | LuongAttentionSeq2seq | basic | True | 0.92953 | 0.994633 |
| 215 | LuongAttentionSeq2seq | basic | True | 0.928412 | 0.984571 |
| 84 | LuongAttentionSeq2seq | symmetric_augmented | True | 0.927293 | 0.987501 |
| 67 | LuongAttentionSeq2seq | symmetric_augmented | True | 0.927293 | 0.990958 |
| 196 | LuongAttentionSeq2seq | basic | True | 0.927293 | 0.963104 |
| 189 | LuongAttentionSeq2seq | basic | True | 0.927293 | 0.963886 |
| 96 | LuongAttentionSeq2seq | symmetric_augmented | True | 0.925056 | 0.981264 |


# Other languages

I ran the two basic models on all languages and data sizes at least once.

## 100% languages

In [16]:
m = all_experiments.groupby('language').dev_acc.max().to_frame().reset_index()
m[m.dev_acc == 1]

|  | language | dev_acc |
|---|---|---|
| 20 | friulian | 1.0 |
| 35 | kabardian | 1.0 |
| 60 | occitan | 1.0 |
| 66 | pashto | 1.0 |
| 78 | swahili | 1.0 |
| 87 | uzbek | 1.0 |


## 0 accuracy languages :(

In [19]:
highest[highest.dev_acc==0][['language', 'train_size']]

|  | language | train_size |
|---|---|---|
| 1020 | greenlandic | low |
| 1037 | ingrian | low |
| 1048 | karelian | low |
| 1050 | kashubian | low |
| 1052 | kazakh | low |
| 1054 | khakas | low |
| 1086 | mapudungun | low |
| 1091 | middle-high-german | low |
| 1093 | murrinhpatha | low |
| 1101 | norman | low |


### Two-headed attention

The lemma and the tags are encoded separately, and two attention mechanisms attend to them separately while decoding.

### Hard monotonic attention

TODO