# Experiments
We investigate where in the pipeline typological grouping is most effective,
and which type of typological grouping works best.

1. Grouping in function aggregation
    - Phylogeny- or typology-inspired stacks
    - Larger dataset for adapter training
2. Parameter aggregation
    - Arithmetic: typology-informed weights for aggregation
3. Representation aggregation
    - Similar to what EMEA does; not efficient at inference time
        - EMEA is even less efficient, as it "learns" at inference time

## 1. Stacks
Train a joint language adapter on a group of languages with masked language modeling (MLM).
Here, a distinction could still be made between:
- training jointly, without a stack
    - equal presence of all languages
    - weighted presence of all languages
- training jointly in a stack with the target language adapter on top
    - e.g. given an existing "Romance" adapter, train an "Asturian" adapter on top of it
- training jointly with a *changing stack*, activating the adapter that matches the language of each batch
    - Is this what Faisal does?
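For the "weighted presence" variant, one common choice in multilingual pretraining (e.g. XLM/XLM-R; not necessarily what will be used here) is exponentiated sampling: a language with $n_i$ training sentences is sampled with probability $p_i \propto n_i^\alpha$. With $\alpha = 1$ the natural data proportions are kept, with $\alpha = 0$ every language is equally present. A minimal sketch; the corpus sizes and the helper names are hypothetical:

```python
import random


def language_sampling_probs(sizes, alpha=0.3):
    """Exponentiated sampling: p_i is proportional to n_i ** alpha.

    alpha=1.0 keeps the natural data proportions, alpha=0.0 gives every
    language the same probability ("equal presence").
    """
    scaled = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}


def sample_batch_languages(probs, num_batches, seed=0):
    """Draw the language of each training batch according to probs."""
    rng = random.Random(seed)
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=num_batches)


# hypothetical corpus sizes (number of sentences) for a Romance group
sizes = {"es": 1_000_000, "it": 500_000, "ast": 10_000}
probs = language_sampling_probs(sizes, alpha=0.3)
batch_langs = sample_batch_languages(probs, num_batches=8)
print(probs, batch_langs)
```

With a changing stack, each entry of `batch_langs` would decide which language adapter is activated for that batch; lowering `alpha` upsamples the low-resource language (here Asturian) relative to its share of the data.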
## 2. Parameter aggregation
Arithmetic operations on adapters:
- adding existing adapters and comparing the result with jointly trained family adapters
    - is the "average" of adapters equivalent to a jointly trained adapter? (cf. linear mode connectivity)
- re-creating the typological profile of a language
    - as a preparation step before "fine-tuning" on little data (typologically inspired initialization)
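Before comparing an "average" adapter with a jointly trained one, note that averaging parameters (point 2) only equals averaging outputs (representation aggregation, point 3) when the output is linear in the parameters, as for a single linear layer. A bottleneck adapter multiplies two learned projections with a ReLU in between, so the two strategies generally give different results. A minimal NumPy sketch with random toy weights; the adapter is simplified (no residual connection, no layer norm), which is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 4


def relu(h):
    return np.maximum(h, 0.0)


def make_adapter():
    """Toy bottleneck adapter: down-projection, ReLU, up-projection."""
    return {
        "down_w": rng.normal(size=(d_bottleneck, d_model)),
        "down_b": rng.normal(size=d_bottleneck),
        "up_w": rng.normal(size=(d_model, d_bottleneck)),
        "up_b": rng.normal(size=d_model),
    }


def adapter_forward(p, x):
    # x has shape (batch, d_model)
    return relu(x @ p["down_w"].T + p["down_b"]) @ p["up_w"].T + p["up_b"]


def linear_forward(p, x):
    # a single affine layer, reusing the down-projection parameters
    return x @ p["down_w"].T + p["down_b"]


def average(p, q):
    return {k: (p[k] + q[k]) / 2 for k in p}


a, b = make_adapter(), make_adapter()
x = rng.normal(size=(5, d_model))
merged = average(a, b)

# single affine layer: parameter averaging == output averaging (up to float error)
linear_gap = np.abs(linear_forward(merged, x) - (linear_forward(a, x) + linear_forward(b, x)) / 2).max()
# bottleneck adapter: the two aggregation strategies disagree
adapter_gap = np.abs(adapter_forward(merged, x) - (adapter_forward(a, x) + adapter_forward(b, x)) / 2).max()
print(linear_gap, adapter_gap)
```

This is why an averaged adapter is an empirical question rather than a shortcut for output ensembling: whether it behaves well depends on how the trained adapters relate to each other in parameter space.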



In [4]:
from adapters import AutoAdapterModel, Stack


model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
# we load three monolingual adapters (de, en, eu)
model.load_adapter("./trained_adapters/mono/de", load_as="de")
model.load_adapter("./trained_adapters/mono/en", load_as="en")
model.load_adapter("./trained_adapters/mono/eus", load_as="eu")
# model.load_adapter("./trained_adapters/family/en-de-nl-af/mlm", load_as="fam")

model.active_adapters = Stack("de", "eu", "en")

Some weights of XLMRobertaAdapterModel were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
import re

sd = model.state_dict()
organized_layers = {}
# for each layer:
# group 1: layer number
# group 2: adapter name
# group 3: projection
# group 4: projection weight/bias
pattern = r"roberta\.encoder\.layer\.([\d\w]+)\.output\.adapters\.(\w+)\.(\w+)(?:\.0)?\.(\w+)"

inv_adapters = {}
# For invertible adapters
# group 1: adapter name
# group 2: F/G identifier
# group 3: 0/2 layer number
# group 4: projection weight/bias
inv_pattern = r"roberta\.invertible_adapters\.(\w+)\.(\w+)\.(\d)\.(\w+)"
for key in model.state_dict().keys():
    match = re.search(pattern, key)
    if match:
        layer_num = str(match.group(1))
        if layer_num not in organized_layers:
            organized_layers[layer_num] = {}
        adapter_name = match.group(2)
        projection = match.group(3)
        projection_type = match.group(4)
        # print(f"Layer: {layer_num}, Adapter: {adapter_name}, Projection: {projection}, Type: {projection_type}")
        if projection not in organized_layers[layer_num]:
            organized_layers[layer_num][projection] = {}
        if projection_type not in organized_layers[layer_num][projection]:
            organized_layers[layer_num][projection][projection_type] = []
        organized_layers[layer_num][projection][projection_type].append(key)
    inv_match = re.search(inv_pattern, key)
    if inv_match:
        adapter_name = inv_match.group(1)
        identifier = inv_match.group(2)
        layer_num = inv_match.group(3)
        projection_type = inv_match.group(4)
        if identifier not in inv_adapters:
            inv_adapters[identifier] = {}
        if layer_num not in inv_adapters[identifier]:
            inv_adapters[identifier][layer_num] = {}
        if projection_type not in inv_adapters[identifier][layer_num]:
            inv_adapters[identifier][layer_num][projection_type] = []
        inv_adapters[identifier][layer_num][projection_type].append(key)
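The two regexes and the nested `if ... not in` bookkeeping above can be checked in isolation on a few hand-written keys. The keys below are illustrative and follow the naming shown in this notebook (`...adapters.<name>.adapter_down.0.weight`, `invertible_adapters.<name>.F.0.weight`); `group_keys` is a hypothetical helper, and `dict.setdefault` replaces the explicit membership checks:

```python
import re

# same patterns as above, written as raw strings
LAYER_PATTERN = re.compile(r"roberta\.encoder\.layer\.([\d\w]+)\.output\.adapters\.(\w+)\.(\w+)(?:\.0)?\.(\w+)")
INV_PATTERN = re.compile(r"roberta\.invertible_adapters\.(\w+)\.(\w+)\.(\d)\.(\w+)")


def group_keys(keys):
    """Group state-dict keys by position in the network, ignoring the adapter name."""
    layers, inv = {}, {}
    for key in keys:
        m = LAYER_PATTERN.search(key)
        if m:
            layer, _adapter, proj, kind = m.groups()
            layers.setdefault(layer, {}).setdefault(proj, {}).setdefault(kind, []).append(key)
        m = INV_PATTERN.search(key)
        if m:
            _adapter, ident, sub, kind = m.groups()
            inv.setdefault(ident, {}).setdefault(sub, {}).setdefault(kind, []).append(key)
    return layers, inv


keys = [
    "roberta.encoder.layer.0.output.adapters.de.adapter_down.0.weight",
    "roberta.encoder.layer.0.output.adapters.en.adapter_down.0.weight",
    "roberta.encoder.layer.0.output.adapters.de.adapter_up.bias",
    "roberta.invertible_adapters.de.F.0.weight",
    "roberta.invertible_adapters.en.F.0.weight",
]
layers, inv = group_keys(keys)
print(layers)
print(inv)
```

Keys of the same projection from different adapters end up in one list, which is exactly what the averaging step iterates over.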

In [6]:
from collections import OrderedDict

# we now average the weights and biases of all layers over all adapters
new_state_dict = OrderedDict()
# to ensure we don't get problems, we check the config of all adapters
all_adapters = list(model.adapters_config.adapters.keys())
config_id = model.adapters_config.adapters[all_adapters[0]]
config = model.adapters_config.config_map[config_id]
for i in range(1, len(all_adapters)):
    config_id = model.adapters_config.adapters[all_adapters[i]]
    config_i = model.adapters_config.config_map[config_id]
    assert config == config_i, (
        f"Config mismatch: {config} vs {config_i}\nCurrent methodology only works for same config"
    )

# if no problem, we go to the next step
for layer_num, projections in organized_layers.items():
    for projection, types in projections.items():
        for projection_type, keys in types.items():
            if projection_type == "weight":
                # average the weights
                avg_weight = sum([sd[key] for key in keys]) / len(keys)
                # test: 2/3 "en", 1/3 "de"
                # avg_weight = (2/3) * sd[keys[0]] + (1/3) * sd[keys[1]]
                # print(f"Layer: {layer_num}, Projection: {projection}, Type: {projection_type}, Avg. Weight Shape: {avg_weight.shape}")
                if projection == "adapter_down":
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.joined_adapter.{projection}.0.{projection_type}"
                    ] = avg_weight
                else:
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.joined_adapter.{projection}.{projection_type}"
                    ] = avg_weight

            if projection_type == "bias":
                # average the biases
                avg_bias = sum([sd[key] for key in keys]) / len(keys)
                # test: 2/3 "en", 1/3 "de"
                # avg_bias = (2/3) * sd[keys[0]] + (1/3) * sd[keys[1]]
                # print(f"Layer: {layer_num}, Projection: {projection}, Type: {projection_type}, Avg. Bias Shape: {avg_bias.shape}")
                if projection == "adapter_down":
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.joined_adapter.{projection}.0.{projection_type}"
                    ] = avg_bias
                else:
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.joined_adapter.{projection}.{projection_type}"
                    ] = avg_bias
for identifier, layer_num in inv_adapters.items():
    for layer_num, projections in layer_num.items():
        for projection_type, keys in projections.items():
            if projection_type == "weight":
                # average the weights
                avg_weight = sum([sd[key] for key in keys]) / len(keys)
                # test: 2/3 "en", 1/3 "de"
                # avg_weight = (2/3) * sd[keys[0]] + (1/3) * sd[keys[1]]
                # print(f"Layer: {layer_num}, Projection: {projection}, Type: {projection_type}, Avg. Weight Shape: {avg_weight.shape}")
                new_state_dict[
                    f"roberta.invertible_adapters.joined_adapter.{identifier}.{layer_num}.{projection_type}"
                ] = avg_weight
            if projection_type == "bias":
                # average the biases
                avg_bias = sum([sd[key] for key in keys]) / len(keys)
                # test: 2/3 "en", 1/3 "de"
                # avg_bias = (2/3) * sd[keys[0]] + (1/3) * sd[keys[1]]
                # print(f"Layer: {layer_num}, Projection: {projection}, Type: {projection_type}, Avg. Bias Shape: {avg_bias.shape}")
                new_state_dict[
                    f"roberta.invertible_adapters.joined_adapter.{identifier}.{layer_num}.{projection_type}"
                ] = avg_bias
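The cell above gives every loaded adapter the same weight, and the commented 2/3 vs. 1/3 test has to index `keys` by position. A sketch of an alternative that keys the weights by adapter name and only rewrites the adapter name inside each key, so `adapter_down.0` and the invertible-adapter keys need no special-casing. `merge_adapter_weights` is a hypothetical helper, not part of the `adapters` library, and it assumes adapter names consist of word characters only:

```python
import re

# matches ".adapters.<name>." and ".invertible_adapters.<name>." in state-dict keys
ADAPTER_NAME = re.compile(r"\.(?:adapters|invertible_adapters)\.(\w+)\.")


def merge_adapter_weights(state_dict, weights, new_name="joined_adapter"):
    """Weighted sum of adapter parameters; weights maps adapter name -> coefficient.

    Works for torch tensors, NumPy arrays or plain floats. Non-adapter keys and
    adapters missing from `weights` are ignored.
    """
    merged = {}
    for key, value in state_dict.items():
        match = ADAPTER_NAME.search(key)
        if match is None or match.group(1) not in weights:
            continue
        # replace only the adapter name, keeping the rest of the key intact
        new_key = key[: match.start(1)] + new_name + key[match.end(1):]
        merged[new_key] = merged.get(new_key, 0) + weights[match.group(1)] * value
    return merged


# plain floats stand in for tensors (the 2/3 "en", 1/3 "de" test)
toy_sd = {
    "roberta.encoder.layer.0.output.adapters.en.adapter_up.bias": 1.0,
    "roberta.encoder.layer.0.output.adapters.de.adapter_up.bias": 4.0,
    "roberta.invertible_adapters.en.F.0.weight": 3.0,
    "roberta.invertible_adapters.de.F.0.weight": 6.0,
    "roberta.embeddings.word_embeddings.weight": 100.0,
}
merged = merge_adapter_weights(toy_sd, {"en": 2 / 3, "de": 1 / 3})
print(merged)
```

In the notebook this would be called as `merge_adapter_weights(sd, {"de": 0.5, "en": 0.5})`; the result can be copied into a freshly added `joined_adapter` exactly as in the following cells.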

In [46]:
# reuse the config checked in the previous cell so the new adapter has the same architecture
if "joined_adapter" in model.adapters_config.adapters.keys():
    # remove the old one
    model.delete_adapter("joined_adapter")
model.add_adapter("joined_adapter", config=config)

In [47]:
for name, param in model.named_parameters():
    # e.g. "roberta.encoder.layer.0.output.adapters.joined_adapter.adapter_down.0.weight"
    if "joined_adapter" in name and name in new_state_dict:
        param.data.copy_(new_state_dict[name])

In [48]:
for key in list(model.adapters_config.adapters.keys()):
    if key != "joined_adapter":
        model.delete_adapter(key)
model.roberta.invertible_adapters

ModuleDict(
  (joined_adapter): NICECouplingBlock(
    (F): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): Activation_Function_Class(
        (f): ReLU()
      )
      (2): Linear(in_features=192, out_features=384, bias=True)
    )
    (G): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): Activation_Function_Class(
        (f): ReLU()
      )
      (2): Linear(in_features=192, out_features=384, bias=True)
    )
  )
)

In [35]:
model.save_adapter("./trained_adapters/mono/huge_avg_adapter", "huge_avg_adapter")

In [29]:
# we evaluated the adapter (along with de and en) on NER in another script
import json

with open("methods/eval_dict_joined.json") as f:
    results = json.load(f)

In [30]:
for (name, de), (_, en), (_, joined) in zip(
    results["de"].items(), results["en"].items(), results["joined_adapter"].items()
):
    print(f"{name}, avg en/de: {(en + de) / 2}, joined: {joined}")

eval_loss, avg en/de: 0.467184379696846, joined: 0.4572905898094177
eval_model_preparation_time, avg en/de: 0.0086, joined: 0.006
eval_precision, avg en/de: 0.5284809848704373, joined: 0.5575268817204301
eval_recall, avg en/de: 0.7186234817813766, joined: 0.6997300944669366
eval_f1, avg en/de: 0.6085899656003557, joined: 0.6205864751645721
eval_accuracy, avg en/de: 0.8567322573513155, joined: 0.8606566438204731
eval_runtime, avg en/de: 4.626099999999999, joined: 4.6817
eval_samples_per_second, avg en/de: 216.16500000000002, joined: 213.599
eval_steps_per_second, avg en/de: 27.0205, joined: 26.7
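The loop above pairs metrics by position via `zip`, which works as long as all three result dicts list their metrics in the same order. A sketch that looks metrics up by name instead and also reports the difference; `compare_to_average` is a hypothetical helper, and the numbers below are made up for illustration (they are not the results above):

```python
def compare_to_average(results, baselines, candidate):
    """Per metric: (mean over baseline adapters, candidate value, candidate - mean)."""
    rows = {}
    for metric, value in results[candidate].items():
        baseline_avg = sum(results[b][metric] for b in baselines) / len(baselines)
        rows[metric] = (baseline_avg, value, value - baseline_avg)
    return rows


# hypothetical results, deliberately listed in different key orders
toy_results = {
    "de": {"eval_f1": 0.60, "eval_loss": 0.50},
    "en": {"eval_f1": 0.62, "eval_loss": 0.44},
    "joined_adapter": {"eval_loss": 0.45, "eval_f1": 0.63},
}
rows = compare_to_average(toy_results, ["de", "en"], "joined_adapter")
for metric, (avg, joined, delta) in rows.items():
    print(f"{metric}: avg de/en {avg:.4f}, joined {joined:.4f}, delta {delta:+.4f}")
```

Applied to the notebook, the call would be `compare_to_average(results, ["de", "en"], "joined_adapter")`.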


In [18]:
from huggingface_hub import HfApi

api = HfApi()
# Fetch all AdapterHub xlm-roberta-base adapters
models = api.list_models(author="AdapterHub", library="adapter-transformers", search="xlm-roberta-base-")
# keep only the wiki Pfeiffer language adapters, mapped to their language code

to_load = {
    m.modelId: m.modelId.split("xlm-roberta-base-")[-1].rsplit("-wiki_pfeiffer", 1)[0]
    for m in models
    if m.modelId.startswith("AdapterHub/xlm-roberta-base-") and m.modelId.endswith("-wiki_pfeiffer")
}
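The comprehension above filters with `startswith`/`endswith` and then extracts the language code with `split`/`rsplit`. The same can be done in one step with a fully anchored regex, which also rejects degenerate IDs with an empty language part. A sketch; `adapter_language` is a hypothetical helper and the last ID is a made-up non-matching example:

```python
import re

# AdapterHub naming used above: AdapterHub/xlm-roberta-base-<lang>-wiki_pfeiffer
ADAPTER_ID = re.compile(r"AdapterHub/xlm-roberta-base-(.+)-wiki_pfeiffer")


def adapter_language(model_id):
    """Language code of a wiki Pfeiffer adapter ID, or None if the ID does not match."""
    match = ADAPTER_ID.fullmatch(model_id)
    return match.group(1) if match else None


ids = [
    "AdapterHub/xlm-roberta-base-de-wiki_pfeiffer",
    "AdapterHub/xlm-roberta-base-mhr-wiki_pfeiffer",
    "AdapterHub/xlm-roberta-base-pos-ud_en",  # hypothetical non-matching ID
]
to_load_sketch = {i: adapter_language(i) for i in ids if adapter_language(i)}
print(to_load_sketch)
```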

In [39]:
from adapters import AutoAdapterModel, Stack

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
didnt_load = []
for link, lang_id in to_load.items():
    try:
        model.load_adapter(link, load_as=lang_id)
    except OSError:
        print(f"Could not load {link}")
        didnt_load.append(link)

Some weights of XLMRobertaAdapterModel were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [52]:
import re
from collections import OrderedDict


def merge_loaded_adapters(model, merge_adapter_name="joined_adapter"):
    def check_compatibility():
        # to ensure we don't get problems, we check the config of all adapters
        all_adapters = list(model.adapters_config.adapters.keys())
        config_id = model.adapters_config.adapters[all_adapters[0]]
        config = model.adapters_config.config_map[config_id]

        for i in range(1, len(all_adapters)):
            config_id = model.adapters_config.adapters[all_adapters[i]]
            config_i = model.adapters_config.config_map[config_id]
            assert config == config_i, (
                f"Config mismatch: {config} vs {config_i}\nCurrent methodology only works for same config"
            )

    check_compatibility()

    organized_layers = {}
    # for each layer:
    # group 1: layer number
    # group 2: adapter name
    # group 3: projection
    # group 4: projection weight/bias
    pattern = r"roberta\.encoder\.layer\.([\d\w]+)\.output\.adapters\.(\w+)\.(\w+)(?:\.0)?\.(\w+)"

    inv_adapters = {}
    # For invertible adapters
    # group 1: adapter name
    # group 2: F/G identifier
    # group 3: 0/2 layer number
    # group 4: projection weight/bias
    inv_pattern = r"roberta\.invertible_adapters\.(\w+)\.(\w+)\.(\d)\.(\w+)"

    for key in model.state_dict().keys():
        layer_match = re.search(pattern, key)
        if layer_match:
            layer_num = layer_match.group(1)
            # adapter_name = layer_match.group(2)
            projection = layer_match.group(3)
            projection_type = layer_match.group(4)
            # print(f"Layer: {layer_num}, Adapter: {adapter_name}, Projection: {projection}, Type: {projection_type}")
            if layer_num not in organized_layers:
                organized_layers[layer_num] = {}
            if projection not in organized_layers[layer_num]:
                organized_layers[layer_num][projection] = {}
            if projection_type not in organized_layers[layer_num][projection]:
                organized_layers[layer_num][projection][projection_type] = []
            organized_layers[layer_num][projection][projection_type].append(key)

        inv_match = re.search(inv_pattern, key)
        if inv_match:
            # adapter_name = inv_match.group(1)
            identifier = inv_match.group(2)
            layer_num = inv_match.group(3)
            projection_type = inv_match.group(4)
            if identifier not in inv_adapters:
                inv_adapters[identifier] = {}
            if layer_num not in inv_adapters[identifier]:
                inv_adapters[identifier][layer_num] = {}
            if projection_type not in inv_adapters[identifier][layer_num]:
                inv_adapters[identifier][layer_num][projection_type] = []
            inv_adapters[identifier][layer_num][projection_type].append(key)

    new_state_dict = OrderedDict()
    sd = model.state_dict()

    # if no problem, we go to the next step
    for layer_num, projections in organized_layers.items():
        for projection, types in projections.items():
            for projection_type, keys in types.items():
                result = sum([sd[key] for key in keys]) / len(keys)
                if projection == "adapter_down":
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.{merge_adapter_name}.{projection}.0.{projection_type}"
                    ] = result
                else:
                    new_state_dict[
                        f"roberta.encoder.layer.{layer_num}.output.adapters.{merge_adapter_name}.{projection}.{projection_type}"
                    ] = result

    for identifier, layers in inv_adapters.items():
        for layer_num, projections in layers.items():
            for projection_type, keys in projections.items():
                result = sum([sd[key] for key in keys]) / len(keys)
                new_state_dict[
                    f"roberta.invertible_adapters.{merge_adapter_name}.{identifier}.{layer_num}.{projection_type}"
                ] = result
    return new_state_dict

In [209]:
import re
from collections import OrderedDict


def improved_merge_loaded_adapters(
    model, merge_adapter_name="joined_adapter", weights=None, delete_other=False, patterns=False, model_type="roberta"
):
    # to ensure we don't get problems, we check the config of all adapters
    all_adapters = list(model.adapters_config.adapters.keys())
    config_id = model.adapters_config.adapters[all_adapters[0]]
    config = model.adapters_config.config_map[config_id]

    for i in range(1, len(all_adapters)):
        config_id = model.adapters_config.adapters[all_adapters[i]]
        config_i = model.adapters_config.config_map[config_id]
        assert config == config_i, (
            f"Config mismatch: {config} vs {config_i}\nCurrent methodology only works for same config"
        )

    if weights is None:
        weights = [1 / len(all_adapters)] * len(all_adapters)
    if len(weights) != len(all_adapters):
        raise ValueError(f"Weights length {len(weights)} does not match number of adapters {len(all_adapters)}")

    if not patterns:
        patterns = [
            rf"{model_type}\.encoder\.layer\.([\d\w]+)\.output\.adapters\.(?:\w+)\.(\w+)(?:\.0)?\.(\w+)",
            rf"{model_type}\.invertible_adapters\.(?:\w+)\.(\w+)\.(\d)\.(\w+)",
        ]
    comp_patterns = [re.compile(pattern) for pattern in patterns]
    organized_layers = {}
    for i, pattern in enumerate(patterns):
        # we make a dictionary for each pattern
        organized_layers[i] = {}

    for key in model.state_dict().keys():
        for i, pattern in enumerate(comp_patterns):
            match = re.search(pattern, key)
            if match:
                one = match.group(1)
                two = match.group(2)
                three = match.group(3)
                if one not in organized_layers[i]:
                    organized_layers[i][one] = {}
                if two not in organized_layers[i][one]:
                    organized_layers[i][one][two] = {}
                if three not in organized_layers[i][one][two]:
                    organized_layers[i][one][two][three] = []
                organized_layers[i][one][two][three].append(key)

    new_state_dict = OrderedDict()
    sd = model.state_dict()

    for i, layers in organized_layers.items():
        for one, seconds in layers.items():
            for two, thirds in seconds.items():
                for three, keys in thirds.items():
                    # assumes keys[j] belongs to the j-th adapter (state-dict order == adapter order)
                    result = sum([sd[key] * weights[j] for j, key in enumerate(keys)])
                    if two == "adapter_down":
                        new_state_dict[
                            f"{model_type}.encoder.layer.{one}.output.adapters.{merge_adapter_name}.{two}.0.{three}"
                        ] = result
                    elif two == "adapter_up":
                        new_state_dict[
                            f"{model_type}.encoder.layer.{one}.output.adapters.{merge_adapter_name}.{two}.{three}"
                        ] = result
                    else:
                        # we are in the second pattern
                        new_state_dict[f"{model_type}.invertible_adapters.{merge_adapter_name}.{one}.{two}.{three}"] = (
                            result
                        )

    # we now load in the new model
    if merge_adapter_name in model.adapters_config.adapters.keys():
        # remove the old one
        model.delete_adapter(merge_adapter_name)
    model.add_adapter(merge_adapter_name, config=config)
    for name, param in model.named_parameters():
        # e.g. "roberta.encoder.layer.0.output.adapters.joined_adapter.adapter_down.0.weight"
        if merge_adapter_name in name and name in new_state_dict:
            param.data.copy_(new_state_dict[name])
    if delete_other:
        for key in list(model.adapters_config.adapters.keys()):
            if key != merge_adapter_name:
                model.delete_adapter(key)

    # no need to return anything as the model is changed in place

In [210]:
import copy

model1 = copy.deepcopy(model.cpu())
improved_merge_loaded_adapters(model1, delete_other=True)
new_state_dict = merge_loaded_adapters(model)
# reuse the global `config` from the averaging cell so the new adapter has the same architecture
if "joined_adapter" in model.adapters_config.adapters.keys():
    # remove the old one
    model.delete_adapter("joined_adapter")
model.add_adapter("joined_adapter", config=config)
for name, param in model.named_parameters():
    # e.g. "roberta.encoder.layer.0.output.adapters.joined_adapter.adapter_down.0.weight"
    if "joined_adapter" in name and name in new_state_dict:
        param.data.copy_(new_state_dict[name])
for key in list(model.adapters_config.adapters.keys()):
    if key != "joined_adapter":
        model.delete_adapter(key)

2025-04-25 16:43:58,490 - adapters.configuration.model_adapters_config - INFO - Adding adapter 'joined_adapter'.
2025-04-25 16:43:58,713 - adapters.configuration.model_adapters_config - INFO - Adding adapter 'joined_adapter'.


In [211]:
import torch

sd1 = model1.state_dict()
sd2 = model.state_dict()
assert sd1.keys() == sd2.keys(), "Models have different parameters"
for key in sd1.keys():
    assert torch.equal(sd1[key], sd2[key]), f"Key {key} is not equal"
print("Models are equal")

Models are equal


In [117]:
import re
from qq import LanguageData, TagType
from typdiv_sampling.evaluation import Evaluator
from pprint import pprint
from bs4 import BeautifulSoup
import requests

ld = LanguageData.from_db()

evaluator = Evaluator()
distances = evaluator.distances

all_english = {}
not_found = []
for key in distances.keys():
    try:
        all_english[ld.get(key, tag_type=TagType.Glottocode).english_name] = key
    except KeyError:
        not_found.append(key)
        continue

for glot in not_found:
    url = f"https://glottolog.org/resource/languoid/id/{glot}"
    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.content, "html.parser")
    # the language name is in a <span> inside the first <h3> tag
    h3 = soup.find("h3")
    if h3:
        span = h3.find("span")
        if span:
            # print(span.text)  # This will print only the language name
            all_english[span.text] = glot
        else:
            print("No <span> found inside the <h3> tag.")
            print(h3.text)  # This will print the entire h3 text
    else:
        print("No <h3> tag found.")


def lookup_lang(snippet):
    results = []
    for lang in all_english.keys():
        if re.search(re.escape(snippet), lang):
            results.append((lang, all_english[lang]))
    return results

KeyboardInterrupt: 

In [118]:
manuals = {
    "Arabic": "arab1267",
    "Swahili": "swah1253",
    "Bengali": "beng1282",
    "Chinese": "mand1415",
    "Persian": "west2369",
    "Yoruba": "ilaa1246",
    "Nepali": "nepa1254",
    "Quechua": "cusc1236",
    "Estonian": "esto1258",
    "Guarani": "east2555",
}

glots = {}
probs = []

for lang in to_load.values():
    eng = ld.get(lang, tag_type=TagType.BCP_47_CODE).english_name
    glot = ld.get(lang, tag_type=TagType.BCP_47_CODE).glottocode
    # we need to find if glot is in distances
    if not glot:
        options = lookup_lang(eng[:-1])
        # print(options)
        if options:
            verb_name, glot = options[0]
            print(f"Found {verb_name, glot} for {eng}")
            glots[lang] = glot
        else:
            print(f"Error for {lang}: {eng} - {glot}")
            probs.append((lang, eng, glot))
            continue
    if glot not in distances.keys():
        print(f"Error: {lang} - {eng} - {glot}")
        probs.append((lang, eng, glot))
        continue
    if eng and glot:
        glots[lang] = glot

pprint(glots)
print(probs)

Found ('Congo Swahili', 'cong1236') for Swahili
Found ('Ayacucho Quechua', 'ayac1239') for Quechua
Error: de - German - stan1295
Found ('Standard Estonian', 'esto1258') for Estonian
Found ('Eastern Bolivian Guaraní', 'east2555') for Guarani
Found ('Baharna Arabic', 'baha1259') for Arabic
Error: es - Spanish - stan1288
Found ('Hakka Chinese', 'hakk1236') for Chinese
Error: mhr - Eastern Mari - east2328
Error: cdo - Min Dong Chinese - mind1253
Error: xmf - Mingrelian - ming1252
{'ar': 'baha1259',
 'el': 'mode1248',
 'en': 'stan1293',
 'et': 'esto1258',
 'gn': 'east2555',
 'hi': 'hind1269',
 'ht': 'hait1244',
 'id': 'indo1316',
 'ilo': 'ilok1237',
 'is': 'icel1247',
 'it': 'ital1282',
 'ja': 'nucl1643',
 'jv': 'java1254',
 'mi': 'maor1246',
 'my': 'nucl1310',
 'qu': 'ayac1239',
 'ru': 'russ1263',
 'sw': 'cong1236',
 'ta': 'tami1289',
 'th': 'thai1261',
 'tk': 'turk1304',
 'tr': 'nucl1301',
 'vi': 'viet1252',
 'zh': 'hakk1236'}
[('de', 'German', 'stan1295'), ('es', 'Spanish', 'stan1288'), 
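The lookup takes the first regex hit for a truncated English name, which in the output above maps Chinese to Hakka Chinese (`hakk1236`) and Arabic to Baharna Arabic (`baha1259`), while the `manuals` dictionary (with e.g. `mand1415` for Chinese) is defined but not used. A sketch of a stricter resolution order: manual overrides first, then an exact name match, and only then the closest fuzzy match via `difflib.get_close_matches`. `resolve_glottocode` is a hypothetical helper and the small name table is illustrative:

```python
import difflib


def resolve_glottocode(english_name, name_to_glot, manual=None, cutoff=0.8):
    """Return (matched_name, glottocode) or None.

    Order: manual overrides, exact name match, closest fuzzy match above `cutoff`.
    """
    manual = manual or {}
    if english_name in manual:
        return english_name, manual[english_name]
    if english_name in name_to_glot:
        return english_name, name_to_glot[english_name]
    close = difflib.get_close_matches(english_name, list(name_to_glot), n=1, cutoff=cutoff)
    if close:
        return close[0], name_to_glot[close[0]]
    return None


names = {"Hakka Chinese": "hakk1236", "Standard Estonian": "esto1258", "Icelandic": "icel1247"}
manual = {"Chinese": "mand1415"}
print(resolve_glottocode("Chinese", names, manual))
print(resolve_glottocode("Estonian", names, cutoff=0.6))
```

A strict `cutoff` trades coverage for fewer wrong dialect matches; any fuzzy match should still be checked by hand before it is used for distances.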

In [200]:
from urielplus import urielplus

u = urielplus.URIELPlus()
u.reset()
u.set_cache(True)
u.integrate_databases()
u.set_glottocodes()

Resetting to URIEL involves copying the files ['family_features.npz', 'features.npz', 'geocoord_features.npz'] into the data directory. Any files with the same name will be replaced. Continue? [Y/n] 

2025-04-25 16:35:20,332 - root - INFO - Importing all databases....
2025-04-25 16:35:20,335 - root - INFO - Importing updated SAPHON from "saphon_data.csv"....
2025-04-25 16:35:21,840 - root - INFO - Updated SAPHON integration complete..
2025-04-25 16:35:21,842 - root - INFO - Importing BDPROTO from "bdproto_data.csv"....
2025-04-25 16:35:21,843 - root - INFO - Converting ISO 639-3 codes to Glottocodes....
2025-04-25 16:35:22,236 - root - INFO - Conversion to Glottocodes complete.


KeyboardInterrupt: 

In [217]:
def typological_approximation_dev(target, languages):
    """
    Weight the given languages by their typological closeness to a target language.

    target: Glottocode of the target language (e.g. "afri1274")
    languages: BCP-47 codes that are keys of `glots`
    Returns a list of softmax-normalised weights, in the order of `languages`.
    """
    # 1. check if all languages are in the distances
    for lang in languages:
        if glots[lang] not in distances.keys():
            print(f"Language {lang}, {glots[lang]} not in distances")

    # 2. retrieve closeness score (1 - featural distance) of all languages to the target language
    weights = []
    for lang in languages:
        try:
            dist = 1 - u.new_distance("featural", [glots[lang], target])
            print(f"Distance {lang} to {target}: {dist}")
        except SystemExit:
            # new_distance exits e.g. when two languages share no featural features
            print(f"Error: {lang} - {glots[lang]} - {target}")
            dist = 0
        weights.append(dist)

    # 3. softmax over weights
    print(f"Weights before softmax: {weights}")
    weights = torch.softmax(torch.tensor(weights), dim=0)
    # we need to convert to list
    weights = weights.tolist()
    print(f"Weights after softmax: {weights}")
    return weights

In [219]:
weights = typological_approximation_dev("afri1274", list(glots.keys()))

2025-04-25 16:48:23,105 - root - INFO - In new_distance, calculated angular distance for featural with thai1261 and afri1274: 0.011420488357543945 seconds
2025-04-25 16:48:23,119 - root - INFO - In new_distance, calculated angular distance for featural with nucl1310 and afri1274: 0.010793685913085938 seconds
2025-04-25 16:48:23,129 - root - INFO - In new_distance, calculated angular distance for featural with hind1269 and afri1274: 0.008425235748291016 seconds
2025-04-25 16:48:23,141 - root - INFO - In new_distance, calculated angular distance for featural with ilok1237 and afri1274: 0.009675741195678711 seconds
2025-04-25 16:48:23,150 - root - INFO - In new_distance, calculated angular distance for featural with hait1244 and afri1274: 0.00613713264465332 seconds
2025-04-25 16:48:23,160 - root - INFO - In new_distance, calculated angular distance for featural with nucl1301 and afri1274: 0.008888959884643555 seconds
2025-04-25 16:48:23,172 - root - INFO - In new_distance, calculated ang

Distance th to afri1274: 0.5305
Distance my to afri1274: 0.5578
Distance hi to afri1274: 0.5374
Distance ilo to afri1274: 0.5095
Distance ht to afri1274: 0.5662
Distance tr to afri1274: 0.4901
Distance mi to afri1274: 0.5587
Distance vi to afri1274: 0.5451
Distance is to afri1274: 0.4266
Distance it to afri1274: 0.4516
Distance ta to afri1274: 0.5581
Distance jv to afri1274: 0.4661
Distance ja to afri1274: 0.5726
Error: sw - cong1236 - afri1274
Distance qu to afri1274: 0.6357
Distance el to afri1274: 0.5302
Distance et to afri1274: 0.2952


2025-04-25 16:48:23,305 - root - INFO - In new_distance, calculated angular distance for featural with east2555 and afri1274: 0.009303569793701172 seconds
2025-04-25 16:48:23,316 - root - INFO - In new_distance, calculated angular distance for featural with indo1316 and afri1274: 0.008577585220336914 seconds
2025-04-25 16:48:23,328 - root - INFO - In new_distance, calculated angular distance for featural with stan1293 and afri1274: 0.008578062057495117 seconds
2025-04-25 16:48:23,333 - root - ERROR - No shared featural features between baha1259 and afri1274 for which the two languages have information.
Unable to calculate featural distance.
2025-04-25 16:48:23,343 - root - INFO - In new_distance, calculated angular distance for featural with turk1304 and afri1274: 0.007827520370483398 seconds
2025-04-25 16:48:23,354 - root - INFO - In new_distance, calculated angular distance for featural with hakk1236 and afri1274: 0.007961750030517578 seconds


Distance ru to afri1274: 0.4596
Distance gn to afri1274: 0.5552
Distance id to afri1274: 0.5453
Distance en to afri1274: 0.4907
Error: ar - baha1259 - afri1274
Distance tk to afri1274: 0.3918
Distance zh to afri1274: 0.5288
Weights before softmax: [np.float64(0.5305), np.float64(0.5578), np.float64(0.5374), np.float64(0.5095), np.float64(0.5662), np.float64(0.4901), np.float64(0.5587), np.float64(0.5451), np.float64(0.4266), np.float64(0.4516), np.float64(0.5581), np.float64(0.4661), np.float64(0.5726), 0, np.float64(0.6357), np.float64(0.5302), np.float64(0.2952), np.float64(0.4596), np.float64(0.5552), np.float64(0.5453), np.float64(0.4907), 0, np.float64(0.3918), np.float64(0.5288)]
Weights after softmax: [0.043923663689886254, 0.0451392976126194, 0.04422778498120124, 0.04301088447850758, 0.045520064695405295, 0.0421845150209015, 0.04517994126737195, 0.04456965342997089, 0.03958907558236617, 0.04059127780218449, 0.045152841433374724, 0.041184139187974006, 0.04581232735236898, 0.0258
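The closeness scores above all lie between roughly 0.3 and 0.64, so a plain softmax yields nearly uniform weights (around 0.04 for each of the 24 languages), and the two languages whose distance could not be computed (`sw`, `ar`) still receive a non-zero weight because they enter with a score of 0. A sketch that drops such languages and adds a temperature $T$ (an assumption, not part of the original setup), $w_i = \exp(s_i / T) / \sum_j \exp(s_j / T)$; `closeness_weights` is a hypothetical helper:

```python
import math


def closeness_weights(scores, temperature=1.0):
    """Softmax over closeness scores, skipping languages without a score.

    scores maps language -> closeness (1 - distance), or None when the distance
    could not be computed. Lower temperatures make the weights sharper.
    """
    valid = {lang: s for lang, s in scores.items() if s is not None}
    if not valid:
        raise ValueError("No language has a closeness score")
    # subtract the maximum before exponentiating for numerical stability
    top = max(valid.values())
    exp = {lang: math.exp((s - top) / temperature) for lang, s in valid.items()}
    total = sum(exp.values())
    return {lang: e / total for lang, e in exp.items()}


scores = {"qu": 0.6357, "ja": 0.5726, "et": 0.2952, "sw": None}  # subset of the scores above
flat = closeness_weights(scores, temperature=1.0)
sharp = closeness_weights(scores, temperature=0.05)
print(flat)
print(sharp)
```

In `typological_approximation_dev`, this would mean storing `None` instead of `0` in the `except SystemExit` branch; the temperature then controls how strongly the merge favours the closest languages.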

# Limiting the activated adapters

In [140]:
import re
from collections import OrderedDict
from pprint import pprint

import torch


def merge_loaded_adapters(
    model, merge_adapter_name="joined_adapter", weights=None, delete_other=False, patterns=None, model_type="roberta"
):
    """Create `merge_adapter_name` as a weighted sum of the parameters of the loaded adapters.

    Adapters missing from `weights` are skipped; if no weights are given, all loaded adapters
    are averaged uniformly.
    """
    # all adapters must share one config, otherwise their parameters cannot be summed
    all_adapters = list(model.adapters_config.adapters.keys())
    config_id = model.adapters_config.adapters[all_adapters[0]]
    config = model.adapters_config.config_map[config_id]

    for adapter in all_adapters[1:]:
        config_id = model.adapters_config.adapters[adapter]
        config_i = model.adapters_config.config_map[config_id]
        assert config == config_i, (
            f"Config mismatch: {config} vs {config_i}\nCurrent methodology only works for same config"
        )

    if not weights:
        weights = {adapter: 1 / len(all_adapters) for adapter in all_adapters}
    print("weights:", weights)
    if not patterns:
        # raw f-strings, so that \. \d \w reach the regex engine unchanged
        patterns = [
            rf"{model_type}\.encoder\.layer\.(?P<one>[\d\w]+)\.output\.adapters\.(?P<adapter>\w+)\.(?P<two>\w+)(?:\.0)?\.(?P<three>\w+)",
            rf"{model_type}\.invertible_adapters\.(?P<adapter>\w+)\.(?P<one>\w+)\.(?P<two>\d)\.(?P<three>\w+)",
        ]
    comp_patterns = [re.compile(pattern) for pattern in patterns]
    # organized_layers[pattern index][one][two][three] -> [(state_dict key, adapter name), ...]
    organized_layers = {i: {} for i in range(len(comp_patterns))}

    for key in model.state_dict().keys():
        for i, pattern in enumerate(comp_patterns):
            match = pattern.search(key)
            if match:
                adapter_name = match.group("adapter")
                if adapter_name not in weights:
                    print(f"Adapter {adapter_name} not in weights")
                    continue
                one, two, three = match.group("one", "two", "three")
                organized_layers[i].setdefault(one, {}).setdefault(two, {}).setdefault(three, []).append(
                    (key, adapter_name)
                )
    pprint(organized_layers)
    new_state_dict = OrderedDict()
    sd = model.state_dict()

    for i, layers in organized_layers.items():
        for one, modules in layers.items():
            for two, params in modules.items():
                for three, keys in params.items():
                    # weighted sum of the same parameter across all selected adapters
                    result = 0
                    for layer, adapter_name in keys:
                        print(layer, adapter_name, weights[adapter_name])
                        result += sd[layer] * weights[adapter_name]

                    if two == "adapter_down":
                        new_state_dict[
                            f"{model_type}.encoder.layer.{one}.output.adapters.{merge_adapter_name}.{two}.0.{three}"
                        ] = result
                    elif two == "adapter_up":
                        new_state_dict[
                            f"{model_type}.encoder.layer.{one}.output.adapters.{merge_adapter_name}.{two}.{three}"
                        ] = result
                    else:
                        # we are in the second (invertible adapter) pattern
                        new_state_dict[f"{model_type}.invertible_adapters.{merge_adapter_name}.{one}.{two}.{three}"] = (
                            result
                        )

    # replace any previous merge result with a fresh adapter of the same config
    if merge_adapter_name in model.adapters_config.adapters.keys():
        model.delete_adapter(merge_adapter_name)
    model.add_adapter(merge_adapter_name, config=config)
    for name, param in model.named_parameters():
        # e.g. "roberta.encoder.layer.0.output.adapters.joined_adapter.adapter_down.0.weight"
        if merge_adapter_name in name and name in new_state_dict:
            param.data.copy_(new_state_dict[name])
    if delete_other:
        for key in list(model.adapters_config.adapters.keys()):
            if key != merge_adapter_name:
                model.delete_adapter(key)


def typological_approximation(target, glots, limit=None):
    """
    Weight candidate languages by their typological similarity to `target` (a glottocode).

    Similarity is 1 - featural distance. Basque ("eu") is always added as a candidate.
    If limit is specified and is < 1, languages with a similarity below `limit` are removed.
    If limit is specified and is >= 1, only the `limit` most similar languages are kept (top-k).
    The remaining similarities are turned into weights with a softmax.
    """

    # 1. retrieve the similarity of every candidate language to the target language
    weights = {}
    for lang, glot in glots.items():
        try:
            dist = 1 - u.new_distance("featural", [glot, target])
            print(f"Distance {lang} to {target}: {dist}")
        except SystemExit:
            # the lookup raises SystemExit when it cannot compare the pair;
            # note that a similarity of 0 still receives exp(0) mass in the softmax below
            print(f"Error: {lang} - {glot} - {target}")
            dist = 0
        weights[lang] = dist
    # Basque is always added as an extra candidate
    eu_glot = ld.get("eu", tag_type=TagType.BCP_47_CODE).glottocode
    dist = 1 - u.new_distance("featural", [eu_glot, target])
    print(f"Distance Basque to {target}: {dist}")
    weights["eu"] = dist
    if limit:
        if limit < 1:
            for lang, dist in list(weights.items()):
                if dist < limit:
                    print(f"Removing {lang} with similarity {dist}")
                    del weights[lang]
        else:  # keep the `limit` most similar languages
            n = min(int(limit), len(weights))
            sorted_weights = sorted(weights.items(), key=lambda x: x[1], reverse=True)
            weights = dict(sorted_weights[:n])

    # 2. softmax over the remaining similarities
    print(f"Weights before softmax: {weights}")
    soft_weights = torch.softmax(torch.tensor(list(weights.values())), dim=0).tolist()
    # map each language back to its softmax weight
    weights = dict(zip(weights.keys(), soft_weights))
    print(f"Weights after softmax: {weights}")
    return weights


def get_glots(to_load):
    """Map the BCP-47 codes in `to_load` ({English name: code}) to glottocodes.

    Languages without a glottocode are removed from `to_load` in place.
    """
    # manual glottocodes for languages where the automatic lookup returns none
    manuals = {
        "Arabic": "arab1267",
        "Swahili": "swah1253",
        "Bengali": "beng1282",
        "Chinese": "mand1415",
        "Persian": "west2369",
        "Yoruba": "ilaa1246",
        "Nepali": "nepa1254",
        "Quechua": "cusc1236",
        "Estonian": "esto1258",
        "Guarani": "east2555",
    }

    glots = {}
    probs = []

    for name, lang in to_load.items():
        info = ld.get(lang, tag_type=TagType.BCP_47_CODE)
        eng = info.english_name
        glot = info.glottocode
        # fall back to the manual table when no glottocode was found
        if not glot and eng in manuals:
            glot = manuals[eng]
        if eng and glot:
            glots[lang] = glot
        else:
            probs.append(name)

    print("no glottocodes found for these languages: ", probs)
    print("removing them from further consideration")
    for prob in probs:
        # to_load is keyed by English name, so delete by name (in place)
        del to_load[prob]
    return glots
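The core of `merge_loaded_adapters` is: parse each state-dict key with a regex, group the same parameter across adapters, and take a weighted sum. The sketch below reproduces that logic on a plain dict of Python lists, which stand in for tensors, so it runs without `adapters` or `torch`. The function name `merge_state_dict` and the simplified pattern are hypothetical. The pattern covers only the bottleneck `adapter_down`/`adapter_up` layers and omits invertible adapters. The key layout follows the one used in the function above.

```python
import re
from collections import defaultdict

# Hypothetical, simplified version of the first key pattern used in merge_loaded_adapters
ADAPTER_KEY = re.compile(
    r"roberta\.encoder\.layer\.(?P<layer>\d+)\.output\.adapters\."
    r"(?P<adapter>\w+)\.(?P<module>adapter_down|adapter_up)(?P<idx>\.0)?\.(?P<param>\w+)$"
)


def merge_state_dict(state_dict, weights, merged_name="joined_adapter"):
    """Weighted sum of per-adapter parameters, grouped by (layer, module, param).

    `state_dict` maps key -> list of floats (a stand-in for tensors).
    Keys that do not match, or belong to adapters absent from `weights`, are ignored.
    """
    groups = defaultdict(list)
    for key, value in state_dict.items():
        m = ADAPTER_KEY.search(key)
        if m is None or m.group("adapter") not in weights:
            continue
        slot = (m.group("layer"), m.group("module"), m.group("idx") or "", m.group("param"))
        groups[slot].append((m.group("adapter"), value))

    merged = {}
    for (layer, module, idx, param), members in groups.items():
        total = [0.0] * len(members[0][1])
        for adapter, value in members:
            for j, x in enumerate(value):
                total[j] += weights[adapter] * x
        # rebuild the key under the merged adapter's name, keeping the ".0" for adapter_down
        key = f"roberta.encoder.layer.{layer}.output.adapters.{merged_name}.{module}{idx}.{param}"
        merged[key] = total
    return merged
```

Every adapter contributes with its own weight. The merge is therefore a convex combination only when the weights of the adapters that are actually present sum to 1.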

In [141]:
to_load = {
    "English": "en",
    "German": "de",
    "Spanish": "es",
}
glots = get_glots(to_load)

no glottocodes found for these languages:  []
removing them from further consideration


In [146]:
weights = typological_approximation("dutc1256", glots, limit=3)
closest_adapter = max(weights, key=weights.get)
print(f"Best adapter: {closest_adapter}")

2025-05-01 16:08:44,078 - root - INFO - In new_distance, calculated angular distance for featural with stan1293 and dutc1256: 0.009697198867797852 seconds
2025-05-01 16:08:44,086 - root - INFO - In new_distance, calculated angular distance for featural with stan1295 and dutc1256: 0.006561279296875 seconds
2025-05-01 16:08:44,094 - root - INFO - In new_distance, calculated angular distance for featural with stan1288 and dutc1256: 0.007330417633056641 seconds
2025-05-01 16:08:44,104 - root - INFO - In new_distance, calculated angular distance for featural with basq1248 and dutc1256: 0.008893966674804688 seconds


Distance en to dutc1256: 0.5473
Distance de to dutc1256: 0.6954
Distance es to dutc1256: 0.46630000000000005
Distance Basque to dutc1256: 0.393
Weights before softmax: {'de': np.float64(0.6954), 'en': np.float64(0.5473), 'es': np.float64(0.46630000000000005)}
Weights after softmax: {'de': 0.3762802161893739, 'en': 0.3244833164247225, 'es': 0.29923646738590365}
Best adapter: de
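The printed similarities alone are enough to reproduce the weights above. The hypothetical `filter_and_softmax` below mirrors the filtering and softmax steps of `typological_approximation` using only the standard library. It leaves out the distance lookup (`u.new_distance`), so the similarities are passed in directly. These include Basque, which the function always adds as a candidate.

```python
import math


def filter_and_softmax(similarities, limit=None):
    """Keep candidate languages according to `limit`, then softmax their similarities.

    limit < 1  -> drop languages whose similarity is below `limit`
    limit >= 1 -> keep only the `limit` most similar languages (top-k)
    """
    sims = dict(similarities)
    if limit:
        if limit < 1:
            sims = {k: v for k, v in sims.items() if v >= limit}
        else:
            top = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[: int(limit)]
            sims = dict(top)
    if not sims:
        return {}
    # subtracting the max does not change the softmax, but avoids overflow
    m = max(sims.values())
    exps = {k: math.exp(v - m) for k, v in sims.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}
```

Called with the similarities printed above (`en` 0.5473, `de` 0.6954, `es` 0.4663, `eu` 0.393) and `limit=3`, this keeps `de`, `en` and `es`. It returns the same weights as the cell. Because the similarities fall in a narrow band (0.47 to 0.70), the softmax stays close to uniform: German gets about 0.376, against about 0.299 for Spanish.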


In [143]:
merge_loaded_adapters(model, merge_adapter_name="joined_adapter", weights=weights)
model.delete_adapter("joined_adapter")
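`merge_loaded_adapters` skips loaded adapters that have no weight, and it does not renormalize the weights it uses. The weights returned by `typological_approximation("dutc1256", ...)` above are keyed by `de`, `en` and `es`. The model in this notebook has `de`, `en` and `eu` loaded. Calling the merge with those weights would therefore scale the merged parameters by w_de + w_en ≈ 0.70 instead of 1. A minimal sketch of a hypothetical helper, `restrict_weights`, keeps only the loaded adapters and rescales their weights. In the notebook it could be given `loaded = model.adapters_config.adapters.keys()`.

```python
def restrict_weights(weights, loaded):
    """Keep only the weights of adapters that are actually loaded, rescaled to sum to 1."""
    kept = {name: w for name, w in weights.items() if name in loaded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("none of the weighted adapters are loaded")
    return {name: w / total for name, w in kept.items()}
```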

2025-05-01 15:33:09,644 - adapters.configuration.model_adapters_config - INFO - Adding adapter 'joined_adapter'.


weights: {'de': 0.3333333333333333, 'en': 0.3333333333333333, 'eu': 0.3333333333333333}
{0: {'0': {'adapter_down': {'bias': [('roberta.encoder.layer.0.output.adapters.de.adapter_down.0.bias',
                                      'de'),
                                     ('roberta.encoder.layer.0.output.adapters.en.adapter_down.0.bias',
                                      'en'),
                                     ('roberta.encoder.layer.0.output.adapters.eu.adapter_down.0.bias',
                                      'eu')],
                            'weight': [('roberta.encoder.layer.0.output.adapters.de.adapter_down.0.weight',
                                        'de'),
                                       ('roberta.encoder.layer.0.output.adapters.en.adapter_down.0.weight',
                                        'en'),
                                       ('roberta.encoder.layer.0.output.adapters.eu.adapter_down.0.weight',
                                        'eu')]