# Merge Large Language Models with mergekit

Reference: [mergekit](https://github.com/cg123/mergekit)

Note: merging only requires a CPU; no GPU is needed.

### Merge configurations

#### SLERP

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
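
SLERP (spherical linear interpolation) blends two weight tensors along the arc between them instead of the straight line, which better preserves their magnitude. The NumPy sketch below is an illustration, not mergekit's own code. It flattens each tensor, falls back to linear interpolation when the two vectors are nearly parallel, and uses `t = 0` for the base model and `t = 1` for the other model. It also shows one plausible reading of the `t` lists above: the hypothetical helper `layer_gradient` assumes the five anchor values are spread evenly over the 32 layers and linearly interpolated between them. Tensors matching `self_attn` follow one gradient, `mlp` tensors follow the mirrored one, and everything else uses 0.5.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a, b = v0.ravel(), v1.ravel()
    # Angle between the two weight vectors, computed from their unit-normalized versions
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    if abs(dot) > 0.9995:
        # Nearly colinear: the SLERP formula becomes unstable, so interpolate linearly
        out = (1 - t) * a + t * b
    else:
        theta = np.arccos(dot)
        sin_theta = np.sin(theta)
        out = (np.sin((1 - t) * theta) / sin_theta) * a + (np.sin(t * theta) / sin_theta) * b
    return out.reshape(v0.shape)


def layer_gradient(anchors, num_layers: int) -> np.ndarray:
    """Assumed behavior: place anchors evenly over the layers and interpolate linearly."""
    anchor_pos = np.linspace(0, num_layers - 1, len(anchors))
    return np.interp(np.arange(num_layers), anchor_pos, anchors)


attn_t = layer_gradient([0, 0.5, 0.3, 0.7, 1], num_layers=32)
mlp_t = layer_gradient([1, 0.5, 0.7, 0.3, 0], num_layers=32)
print(attn_t[0], attn_t[-1], mlp_t[0], mlp_t[-1])  # 0.0 1.0 1.0 0.0
```

Under this reading, the first attention layers come entirely from the base model and the last ones from the other model, while the MLP layers do the opposite.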

#### [TIES-Merging](https://arxiv.org/abs/2306.01708)

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: OpenPipe/mistral-ft-optimized-1218
    parameters:
      density: 0.5
      weight: 0.5
  - model: mlabonne/NeuralHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: float16
```
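
TIES-Merging works on task vectors (fine-tuned weights minus base weights) in three steps: trim each task vector to its largest-magnitude entries (`density` is the fraction kept), elect a sign for every parameter, and average only the entries that agree with that sign. The following is a simplified NumPy sketch for a single tensor, not mergekit's implementation. It assumes `weight` scales each task vector and that `normalize: true` divides each merged entry by the total weight of the models that contributed to it; details such as tie handling at the trim threshold may differ in mergekit.

```python
import numpy as np


def trim(delta: np.ndarray, density: float) -> np.ndarray:
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * delta.size))
    if k <= 0:
        return np.zeros_like(delta)
    # Magnitude threshold: the k-th largest absolute value
    threshold = np.sort(np.abs(delta).ravel())[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def ties_merge(base, finetuned, densities, weights, normalize=True):
    """Simplified TIES: trim task vectors, elect a sign per parameter, merge agreeing entries."""
    # 1. Trimmed, weighted task vectors, stacked along a new leading axis
    deltas = np.stack([w * trim(ft - base, d) for ft, d, w in zip(finetuned, densities, weights)])
    # 2. Sign election: the sign of the summed deltas wins for each parameter
    elected = np.sign(deltas.sum(axis=0))
    # 3. Disjoint merge: only nonzero entries that agree with the elected sign contribute
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    merged = np.where(agree, deltas, 0.0).sum(axis=0)
    if normalize:
        # Assumed normalization: divide by the total weight of the contributing models
        w = np.asarray(weights, dtype=float).reshape(-1, *([1] * base.ndim))
        total = np.where(agree, w, 0.0).sum(axis=0)
        merged = np.divide(merged, total, out=np.zeros_like(merged), where=total > 0)
    return base + merged


base = np.zeros(4)
ft_a = np.array([1.0, -2.0, 0.5, 0.0])  # hypothetical fine-tuned weights
ft_b = np.array([2.0, 1.0, -0.1, 3.0])
print(ties_merge(base, [ft_a, ft_b], densities=[0.5, 0.5], weights=[0.5, 0.3]))
```

The sign election is what lets TIES avoid averaging opposing updates into near-zero values.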

#### [DARE](https://arxiv.org/abs/2311.03099)

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # No parameters necessary for base model
  - model: samir-fama/SamirGPT-v1
    parameters:
      density: 0.53
      weight: 0.4
  - model: abacusai/Slerp-CM-mist-dpo
    parameters:
      density: 0.53
      weight: 0.3
  - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.2
    parameters:
      density: 0.53
      weight: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```
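
`dare_ties` combines DARE's random pruning with TIES-style sign election. The DARE half (Drop And REscale) can be sketched in NumPy as below, assuming `density` is the probability of keeping each delta parameter. Rescaling the survivors by `1 / density` keeps the expected value of the task vector unchanged; the TIES sign election and merge would then run on these pruned deltas. The constant task vector is made up for illustration.

```python
import numpy as np


def dare(delta: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """Randomly keep a `density` fraction of delta entries and rescale them by 1 / density."""
    keep = rng.random(delta.shape) < density
    return np.where(keep, delta / density, 0.0)


rng = np.random.default_rng(0)
delta = np.full(100_000, 0.02)  # hypothetical task vector: fine-tuned minus base weights
pruned = dare(delta, density=0.53, rng=rng)

# Roughly 53% of entries survive, and the mean stays close to the original 0.02
print(f"kept fraction: {np.mean(pruned != 0):.3f}")
print(f"mean before: {delta.mean():.4f}, after: {pruned.mean():.4f}")
```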

#### Passthrough

```yaml
slices:
  - sources:
    - model: OpenPipe/mistral-ft-optimized-1218
      layer_range: [0, 32]
  - sources:
    - model: mlabonne/NeuralHermes-2.5-Mistral-7B
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
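
Passthrough does not average anything: it copies the listed layers unchanged and stacks the slices in order, so this configuration yields a deeper model than either input (often called a "frankenmerge"). The plain-Python sketch below lists the resulting layer order, assuming `layer_range` is half-open (`[start, end)`), which is consistent with `[0, 32]` covering all 32 layers of a Mistral-7B model. `stacked_layers` is a hypothetical helper, not part of mergekit.

```python
def stacked_layers(slices):
    """List (model, layer_index) pairs in the order a passthrough merge stacks them.

    Assumes layer_range is half-open: [start, end) copies layers start..end-1.
    """
    layers = []
    for s in slices:
        for src in s["sources"]:
            start, end = src["layer_range"]
            layers.extend((src["model"], i) for i in range(start, end))
    return layers


config_slices = [
    {"sources": [{"model": "OpenPipe/mistral-ft-optimized-1218", "layer_range": [0, 32]}]},
    {"sources": [{"model": "mlabonne/NeuralHermes-2.5-Mistral-7B", "layer_range": [24, 32]}]},
]
layers = stacked_layers(config_slices)
print(len(layers))  # 40
print(layers[31], layers[32])  # last OpenPipe layer, then NeuralHermes layer 24
```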


In [1]:
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .

!pip install -qU huggingface_hub


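Before merging, it can help to confirm that the editable install worked and that the `mergekit-yaml` command used below is on the `PATH`. A minimal check using only the standard library (`check_mergekit_install` is a hypothetical helper). If `package` is `False` right after installing, restarting the runtime may be needed before an editable install becomes importable.

```python
import importlib.util
import shutil


def check_mergekit_install() -> dict:
    """Report whether the mergekit package and its mergekit-yaml CLI are visible."""
    return {
        # Importable from the current Python environment?
        "package": importlib.util.find_spec("mergekit") is not None,
        # Console script found on the PATH?
        "cli": shutil.which("mergekit-yaml") is not None,
    }


print(check_mergekit_install())
```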
In [6]:
import yaml

MODEL_NAME = "neural-chat-7b-v3-1-slerp"
yaml_config = """
slices:
  - sources:
      - model: meta-math/MetaMath-Mistral-7B
        layer_range: [0, 32]
      - model: Intel/neural-chat-7b-v3-1
        layer_range: [0, 32]
merge_method: slerp
base_model: meta-math/MetaMath-Mistral-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Save config as yaml file
with open("config.yaml", "w", encoding="utf-8") as f:
    f.write(yaml_config)

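A typo in the config only surfaces once mergekit starts downloading several gigabytes, so a quick sanity check on the parsed config can save time. The hypothetical helper below runs a few checks that match this notebook's SLERP setup: SLERP blends exactly two models, both sources cover the same layers, and `base_model` is one of them. These are assumptions, not rules taken from mergekit's own validation. In the notebook, `yaml.safe_load(yaml_config)` would produce an equivalent dict (plus the `parameters` and `dtype` keys); a literal is used here so the example runs on its own.

```python
def check_slerp_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed SLERP config (empty list = looks OK)."""
    problems = []
    if cfg.get("merge_method") != "slerp":
        problems.append("merge_method is not 'slerp'")
    for i, sl in enumerate(cfg.get("slices", [])):
        sources = sl.get("sources", [])
        if len(sources) != 2:
            problems.append(f"slice {i}: SLERP needs exactly 2 sources, found {len(sources)}")
        # All sources in a slice should cover the same layers
        ranges = {tuple(s.get("layer_range", ())) for s in sources}
        if len(ranges) > 1:
            problems.append(f"slice {i}: sources use different layer ranges {sorted(ranges)}")
        if cfg.get("base_model") not in {s.get("model") for s in sources}:
            problems.append(f"slice {i}: base_model is not one of the sources")
    return problems


cfg = {
    "slices": [{"sources": [
        {"model": "meta-math/MetaMath-Mistral-7B", "layer_range": [0, 32]},
        {"model": "Intel/neural-chat-7b-v3-1", "layer_range": [0, 32]},
    ]}],
    "merge_method": "slerp",
    "base_model": "meta-math/MetaMath-Mistral-7B",
}
print(check_slerp_config(cfg))  # []
```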
In [7]:
# Merge models
!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle


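`--out-shard-size 1B` asks mergekit to split the output into shards of roughly one billion parameters each, which in `bfloat16` (2 bytes per parameter) comes to about 2 GB per file. The hypothetical helper below inspects the `merge` output directory and estimates the parameter count from the total shard size. It assumes 2 bytes per parameter; the small headers inside each safetensors file make the estimate slightly high.

```python
from pathlib import Path


def summarize_shards(out_dir: str, bytes_per_param: int = 2) -> tuple[int, float]:
    """Count .safetensors shards and estimate parameters (in billions) from their size.

    Assumes every parameter takes `bytes_per_param` bytes (2 for bfloat16).
    """
    root = Path(out_dir)
    if not root.is_dir():
        return 0, 0.0
    shards = sorted(root.glob("*.safetensors"))
    total_bytes = sum(p.stat().st_size for p in shards)
    return len(shards), total_bytes / bytes_per_param / 1e9


n_shards, billions = summarize_shards("merge")
print(f"{n_shards} shards, ~{billions:.2f}B parameters")
```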
In [8]:
from huggingface_hub import ModelCard
from jinja2 import Template

username = "kesamet"

template_text = """
---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## 🧩 Configuration

```yaml
{{- yaml_config -}}
```
"""

# Get list of models from config
# (branch order matters: TIES/DARE configs also have a top-level "parameters" key)
data = yaml.safe_load(yaml_config)
if "models" in data:
    # TIES/DARE-style config: every model except the base model, which has no parameters
    models = [x["model"] for x in data["models"] if "parameters" in x]
elif "parameters" in data:
    # SLERP-style config: all sources of the first slice
    models = [x["model"] for x in data["slices"][0]["sources"]]
elif "slices" in data:
    # Passthrough-style config: the first source of each slice
    models = [x["sources"][0]["model"] for x in data["slices"]]
else:
    raise ValueError("No models or slices found in yaml config")

# Fill the template
content = Template(template_text.strip()).render(
    model_name=MODEL_NAME,
    models=models,
    yaml_config=yaml_config,
    username=username,
)

# Save the model card
card = ModelCard(content)
card.save("merge/README.md")

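To see which branch of the model-list logic handles each of the four configuration styles at the top of this notebook, the same branching can be wrapped in a function and run on parsed configs. `models_from_config` is a hypothetical name, and the dict literals stand in for `yaml.safe_load` output. Because TIES and DARE configs also carry a top-level `parameters` key, the `models` check has to come first, otherwise they would be misread as SLERP configs.

```python
def models_from_config(data: dict) -> list[str]:
    """Return the models to list on the model card, using the same branches as the cell above."""
    if "models" in data:
        # TIES / DARE style: skip the base model, which has no `parameters`
        return [x["model"] for x in data["models"] if "parameters" in x]
    if "parameters" in data:
        # SLERP style: the sources of the first slice are the interpolated models
        return [x["model"] for x in data["slices"][0]["sources"]]
    if "slices" in data:
        # Passthrough style: the first source of each slice
        return [x["sources"][0]["model"] for x in data["slices"]]
    raise ValueError("No models or slices found in yaml config")


ties = {
    "models": [
        {"model": "mistralai/Mistral-7B-v0.1"},
        {"model": "OpenPipe/mistral-ft-optimized-1218", "parameters": {"density": 0.5, "weight": 0.5}},
    ],
    "merge_method": "ties",
    "parameters": {"normalize": True},
}
passthrough = {
    "slices": [
        {"sources": [{"model": "OpenPipe/mistral-ft-optimized-1218", "layer_range": [0, 32]}]},
        {"sources": [{"model": "mlabonne/NeuralHermes-2.5-Mistral-7B", "layer_range": [24, 32]}]},
    ],
    "merge_method": "passthrough",
}
print(models_from_config(ties))  # ['OpenPipe/mistral-ft-optimized-1218']
print(models_from_config(passthrough))
```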
In [9]:
from google.colab import userdata
from huggingface_hub import HfApi

# Defined in the secrets tab in Google Colab
api = HfApi(token=userdata.get("HF_TOKEN"))

api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model",
    exist_ok=True,  # don't fail when re-running the cell on an existing repo
)
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="merge",
)


CommitInfo(commit_url='https://huggingface.co/kesamet/neural-chat-7b-v3-1-slerp/commit/b2ee0a73486df2aebc42689fd08b357c13618fc5', commit_message='Upload folder using huggingface_hub', commit_description='', oid='b2ee0a73486df2aebc42689fd08b357c13618fc5', pr_url=None, pr_revision=None, pr_num=None)
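
To double-check that every file in `merge` reached the Hub, the local folder can be compared with the repository's file listing. The helper below is a hypothetical sketch; the commented lines show how the remote listing might be fetched with `HfApi.list_repo_files` in this notebook. Extra files on the Hub side (such as `.gitattributes`) are ignored, since only missing files matter here.

```python
from pathlib import Path


def missing_from_remote(local_dir: str, remote_files: list[str]) -> list[str]:
    """Return local files (relative POSIX paths) that do not appear in the remote listing."""
    root = Path(local_dir)
    local = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(local - set(remote_files))


# In the notebook, the remote listing could come from the Hub, e.g.:
# remote = api.list_repo_files(repo_id=f"{username}/{MODEL_NAME}")
# print(missing_from_remote("merge", remote))  # an empty list means every file arrived
```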