# Large Language Models MergeKit

[![Python](https://img.shields.io/pypi/pyversions/tensorflow.svg)](https://badge.fury.io/py/tensorflow)
![Maintainer](https://img.shields.io/badge/maintainer-@louisbrulenaudet-blue)

Mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. mergekit generates a README.md for your merge with some basic information for a model card. You can edit it to add details about your merge, such as a good name or a description of what it is good at; rewrite it entirely; or use the generated README.md as-is. You can also edit your README.md online once it has been uploaded to the Hub.

## Citing this project

If you use this code in your research, please use the following BibTeX entry.

```BibTeX
@misc{louisbrulenaudet2023,
  author =       {Louis Brulé Naudet},
  title =        {Large Language Models MergeKit},
  year =         {2023}
}
```

## Feedback

If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).


## Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model.
Below are the primary elements of a configuration file:

- `merge_method`: Specifies the method to use for merging models. See [Merge Methods](#merge-methods) for a list.
- `slices`: Defines slices of layers from different models to be used. This field is mutually exclusive with `models`.
- `models`: Defines entire models to be used for merging. This field is mutually exclusive with `slices`.
- `base_model`: Specifies the base model used in some merging methods.
- `parameters`: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.
- `dtype`: Specifies the data type used for the merging operation.
- `tokenizer_source`: Determines how to construct a tokenizer for the merged model.
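As a rough illustration of how these fields fit together, the sketch below checks a configuration that has already been parsed into a Python dictionary (for example with `yaml.safe_load`). The helper name `check_merge_config` is hypothetical and not part of Mergekit. The rule that exactly one of `slices` or `models` must be present is an assumption that extends the mutual-exclusivity rule stated above.

```python
def check_merge_config(config: dict) -> list:
    """Return a list of problems found in a parsed merge configuration.

    Hypothetical helper for illustration only; Mergekit performs its own
    validation when it loads a configuration file.
    """
    problems = []

    # `merge_method` names the algorithm, so it cannot be omitted.
    if "merge_method" not in config:
        problems.append("missing `merge_method`")

    # `slices` and `models` are mutually exclusive.
    has_slices = "slices" in config
    has_models = "models" in config
    if has_slices and has_models:
        problems.append("`slices` and `models` are mutually exclusive")
    elif not (has_slices or has_models):
        # Assumption: a merge needs at least one source of weights.
        problems.append("one of `slices` or `models` is required")

    return problems


# A small configuration, written as the dictionary a YAML parser would produce.
config = {
    "merge_method": "ties",
    "base_model": "OpenPipe/mistral-ft-optimized-1227",
    "models": [
        {"model": "OpenPipe/mistral-ft-optimized-1227"},
        {
            "model": "WizardLM/WizardMath-7B-V1.1",
            "parameters": {"density": 0.55, "weight": 0.2},
        },
    ],
    "parameters": {"normalize": True},
    "dtype": "float16",
}

print(check_merge_config(config))  # an empty list means no problem was found
```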

## Merge Methods

### Spherical Linear Interpolation

Spherical Linear Interpolation (SLERP) interpolates smoothly between two vectors, moving at a constant rate along the arc that joins them and preserving the geometric properties of the spherical space in which they lie.

Opting for SLERP over traditional linear interpolation is motivated by various considerations. Linear interpolation in high-dimensional spaces may result in a reduction in the magnitude of the interpolated vector, diminishing the scale of weights. Additionally, in many cases, the alteration in the weights' direction conveys more meaningful information, such as feature learning and representation, compared to the magnitude of change.

The implementation of SLERP involves the following steps:

- Normalize the input vectors to unit length, ensuring they signify directions rather than magnitudes.
- Calculate the angle between these vectors using their dot product.
- If the vectors are nearly collinear, fall back to linear interpolation, because the sine of the angle is then close to zero and the SLERP scale factors become numerically unstable. Otherwise, compute scale factors from the interpolation factor `t` (where `t=0` corresponds to 100% of the first vector and `t=1` to 100% of the second vector) and the angle between the vectors.
- Use these factors to weight the original vectors, then sum them to obtain the interpolated vector.
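The steps above can be sketched in plain Python on flat lists of floats. This is a minimal illustration, not Mergekit's implementation: the function name `slerp`, the collinearity threshold `dot_threshold=0.9995`, and the small `eps` guard against division by zero are all assumptions chosen for the example.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherically interpolate between two equal-length lists of floats."""
    # Normalize copies of the inputs so that they only describe directions.
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    u0 = [x / (norm0 + eps) for x in v0]
    u1 = [x / (norm1 + eps) for x in v1]

    # The dot product of two unit vectors is the cosine of their angle.
    dot = sum(a * b for a, b in zip(u0, u1))
    dot = max(-1.0, min(1.0, dot))  # keep acos within its domain

    # Nearly collinear vectors: fall back to plain linear interpolation.
    if abs(dot) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    theta = math.acos(dot)
    sin_theta = math.sin(theta)

    # Scale factors derived from the interpolation factor t and the angle.
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta

    # Weight the original vectors and sum them.
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


a, b = [1.0, 0.0], [0.0, 1.0]
print(slerp(0.5, a, b))                      # approximately [0.7071, 0.7071]
print([(x + y) / 2 for x, y in zip(a, b)])   # [0.5, 0.5]
```

For these two orthogonal unit vectors, the SLERP midpoint keeps unit length, whereas the linear midpoint shrinks to a length of about 0.707. This is the magnitude reduction that motivates SLERP.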

### TIES merging

TIES-Merging is a method for efficiently merging multiple task-specific models into a single multitask model. It addresses two main challenges in model merging: redundant parameters and sign conflicts between models.

One key challenge tackled by TIES-Merging involves addressing redundancy in model parameters. This is achieved by identifying and eliminating redundant parameters within task-specific models, emphasizing the changes made during fine-tuning and selectively retaining the top-k% most significant changes while discarding the rest.

Another challenge pertains to conflicts arising from disagreements between parameter signs across different models. TIES-Merging resolves these conflicts by creating a unified sign vector representing the most dominant direction of change across all models.

The TIES-Merging process consists of three steps:

- Trim: Reduces redundancy in task-specific models by retaining a fraction of the most significant parameters (density parameter) and resetting the remaining parameters to zero.
- Elect Sign: Resolves sign conflicts across different models by creating a unified sign vector based on the most dominant direction (positive or negative) in terms of cumulative magnitude.
- Disjoint Merge: Averages parameter values aligned with the unified sign vector, excluding zero values.
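The three steps can be sketched on flat lists of parameters as follows. This is a simplified illustration under stated assumptions: the function name `ties_merge` is hypothetical, every model uses the same `density`, and the per-model `weight` and any final scaling of the merged update are left out.

```python
def ties_merge(base, finetuned, density=0.5):
    """Merge several fine-tuned parameter lists into `base` (all flat lists)."""
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]

    # Trim: keep the top `density` fraction of each task vector by magnitude
    # and reset every other entry to zero.
    trimmed = []
    for delta in deltas:
        keep = max(1, round(density * len(delta)))
        ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
        kept = set(ranked[:keep])
        trimmed.append([d if i in kept else 0.0 for i, d in enumerate(delta)])

    merged = []
    for i, b in enumerate(base):
        column = [delta[i] for delta in trimmed]

        # Elect sign: the sign of the sum tells which direction has the
        # larger cumulative magnitude across models.
        elected = 1.0 if sum(column) >= 0 else -1.0

        # Disjoint merge: average the non-zero values that agree with the
        # elected sign, ignoring zeros and conflicting values.
        agreeing = [d for d in column if d != 0.0 and d * elected > 0]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + update)

    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, 0.1, -2.0, 0.0]
model_b = [3.0, -0.2, 1.0, 0.05]
print(ties_merge(base, [model_a, model_b], density=0.5))  # [2.0, 0.0, -2.0, 0.0]
```

In the third position the two models disagree on the sign. The negative change has the larger magnitude, so it wins the election and the conflicting positive value is excluded from the average.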

# Hugging Face login

In [None]:
!git config --global credential.helper store
!huggingface-cli login

# Model configuration

In [1]:
model_name = "Pearl-7B-0212-ties"
yaml_config = """
models:
  - model: OpenPipe/mistral-ft-optimized-1227
  - model: louisbrulenaudet/Pearl-7B-slerp
    parameters:
      density: 0.6
      weight: 0.3
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.55
      weight: 0.2
  - model: macadeliccc/WestLake-7B-v2-laser-truthy-dpo
    parameters:
      density: 0.55
      weight: 0.25
  - model: CultriX/NeuralTrix-7B-dpo
    parameters:
      density: 0.6
      weight: 0.25
merge_method: ties
base_model: OpenPipe/mistral-ft-optimized-1227
parameters:
  normalize: true
  int8_mask: true
dtype: float16
"""

In [None]:
def model_merging(runtime: str = "CPU", branch: str = "main", trust_remote_code: bool = False) -> None:
    """
    Merges models using Mergekit with specified configurations.

    Parameters
    ----------
    runtime : str, optional
        Selects the runtime type for merging. Default is "CPU".

    branch : str, optional
        Selects the branch to use for Mergekit. Default is "main".

    trust_remote_code : bool, optional
        Specifies whether to trust remote code during merging. Default is False.

    Returns
    -------
    None
        This function does not return anything. It merges models according to the specified configurations.
    """
    try:
        # Clone and install Mergekit from the selected upstream branch

        if branch == "main":
            !git clone https://github.com/cg123/mergekit.git
            !cd mergekit && pip install -qqq -e . --progress-bar off

        elif branch == "mixtral":
            !git clone -b mixtral https://github.com/cg123/mergekit.git
            !cd mergekit && pip install -qqq -e . --progress-bar off
            !pip install -qqq -U transformers --progress-bar off

        # Save config as yaml file
        with open("config.yaml", "w", encoding="utf-8") as f:
            f.write(yaml_config)

        # Base CLI
        if branch == "main":
            cli = "mergekit-yaml config.yaml merge --copy-tokenizer"

        elif branch == "mixtral":
            cli = "mergekit-moe config.yaml merge --copy-tokenizer"

        # Additional arguments
        if runtime == "CPU":
            cli += " --allow-crimes --out-shard-size 1B --lazy-unpickle"

        elif runtime == "GPU":
            cli += " --cuda --low-cpu-memory"

        if trust_remote_code:
            cli += " --trust-remote-code"

        print(cli)

        # Merge models
        !{cli}

    except Exception as e:
        print("An error occurred during model merging:", e)

    return None


# @title ## Model merging

# @markdown ### Runtime type
# @markdown Select your runtime (CPU, High RAM, GPU)

runtime = "CPU" # @param ["CPU", "CPU + High-RAM", "GPU"]

# @markdown ### Mergekit arguments
# @markdown Use the `main` branch by default, [`mixtral`](https://github.com/cg123/mergekit/blob/mixtral/moe.md) if you want to create a Mixture of Experts.

branch = "main" # @param ["main", "mixtral"]
trust_remote_code = False # @param {type:"boolean"}

model_merging(
    runtime=runtime,
    branch=branch,
    trust_remote_code=trust_remote_code
)
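The command assembly inside `model_merging` can also be written as a small pure function, which makes it easy to inspect the resulting command before anything is run. The sketch below reuses only the commands and flags that appear in the cell above. The function name `build_cli` and the explicit error for an unknown branch are assumptions added for the example.

```python
def build_cli(runtime="CPU", branch="main", trust_remote_code=False):
    """Return the Mergekit command line for the given notebook settings."""
    # Each branch ships its own entry point.
    commands = {"main": "mergekit-yaml", "mixtral": "mergekit-moe"}
    if branch not in commands:
        # Assumption: fail early instead of leaving the command undefined.
        raise ValueError(f"Unsupported branch: {branch!r}")

    parts = [commands[branch], "config.yaml", "merge", "--copy-tokenizer"]

    # Runtime-specific arguments; other runtimes get no extra flags.
    if runtime == "CPU":
        parts += ["--allow-crimes", "--out-shard-size", "1B", "--lazy-unpickle"]
    elif runtime == "GPU":
        parts += ["--cuda", "--low-cpu-memory"]

    if trust_remote_code:
        parts.append("--trust-remote-code")

    return " ".join(parts)


print(build_cli(runtime="GPU", branch="main"))
# mergekit-yaml config.yaml merge --copy-tokenizer --cuda --low-cpu-memory
```

Note that the "CPU + High-RAM" runtime matches neither condition, so it runs the base command without additional flags, as in the original cell.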

In [None]:
!pip install -qU huggingface_hub
import yaml
from huggingface_hub import ModelCard, HfApi
from google.colab import userdata
from jinja2 import Template

main_template = """
---
license: {{ license }}
language:
- {{ language }}
library_name: transformers
base_model:
{%- for model in models %}
- {{ model }}
{%- endfor %}
tags:
- merge
- mergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models:

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## Configuration

```yaml
{{- yaml_config -}}
```

## Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "{{ username }}/{{ model_name }}"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
## Feedback

If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).
"""

moe_template = """
---
license: {{ license }}
language:
- {{ language }}
library_name: transformers
base_model:
{%- for model in models %}
  - {{ model }}
{%- endfor %}
tags:
- moe
- merge
- mergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a Mixture of Experts (MoE) made with the following models:

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## Configuration

```yaml
{{- yaml_config -}}
```

## Usage

```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "{{ username }}/{{ model_name }}"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
## Feedback

If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).
"""


class Card:
    def __init__(self, template_text:str, yaml_config:str, model_name:str, username:str, license:str, path:str="merge") -> None:
        """
        Initialize a Card object.

        Parameters
        ----------
        template_text : str
            The template text for creating the model card.

        yaml_config : str
            The YAML configuration containing model information.

        model_name : str
            The name of the model.

        username : str
            The Hugging Face username.

        license : str
            The license for the model.

        path : str, optional
            The path to save the model files. Default is "merge".

        Returns
        -------
        None
        """
        self.template_text = template_text
        self.yaml_config = yaml_config
        self.model_name = model_name
        self.username = username
        self.license = license
        self.path = path


    def create(self, models:list) -> None:
        """
        Create a model card.

        Parameters
        ----------
        models : list
            List of models to merge.

        Returns
        -------
        None
            No return value.
        """
        jinja_template = Template(self.template_text.strip())

        content = jinja_template.render(
            model_name=self.model_name,
            models=models,
            yaml_config=self.yaml_config,
            username=self.username,
            license=self.license,
            language=language  # notebook-level form variable, read as a global like `token`
        )

        card = ModelCard(content)
        card.save("merge/README.md")

        return None


    def upload(self, path:str="./merge") -> None:
        """
        Upload the model to Hugging Face.

        Parameters
        ----------
        path : str, optional
            The path to the folder containing model files. Default is "./merge".

        Returns
        -------
        None
            No return value.
        """
        api = HfApi(
            token=userdata.get(token)
        )

        api.create_repo(
            repo_id=f"{self.username}/{self.model_name}",
            repo_type="model",
            exist_ok=True
        )

        api.upload_folder(
            repo_id=f"{self.username}/{self.model_name}",
            folder_path=path
        )

        return None


class Main(Card):
    def __init__(self, template_text:str, yaml_config:str, model_name:str, username:str, license:str) -> None:
        """
        Initialize a Main object.

        Parameters
        ----------
        template_text : str
            The template text for creating the model card.

        yaml_config : str
            The YAML configuration containing model information.

        model_name : str
            The name of the model.

        username : str
            The Hugging Face username.

        license : str
            The license for the model.

        Returns
        -------
        None
        """
        super().__init__(
            template_text=template_text,
            yaml_config=yaml_config,
            model_name=model_name,
            username=username,
            license=license
        )


    def create_card(self) -> None:
        """
        Create a model card and upload it to Hugging Face.

        Parameters
        ----------
        None

        Returns
        -------
        None
            No return value.
        """
        data = yaml.safe_load(
            self.yaml_config
        )

        models = [model["model"] for model in data.get("models", [])]

        super().create(
            models=models
        )

        super().upload(
            path="./merge"
        )

        return None


class Moe(Card):
    def __init__(self, template_text:str, yaml_config:str, model_name:str, username:str, license:str) -> None:
        """
        Initialize a Moe object.

        Parameters
        ----------
        template_text : str
            The template text for creating the model card.

        yaml_config : str
            The YAML configuration containing model information.

        model_name : str
            The name of the model.

        username : str
            The Hugging Face username.

        license : str
            The license for the model.

        Returns
        -------
        None
        """
        super().__init__(
            template_text=template_text,
            yaml_config=yaml_config,
            model_name=model_name,
            username=username,
            license=license
        )


    def create_card(self) -> None:
        """
        Create a MoE model card and upload it to Hugging Face.

        Parameters
        ----------
        None

        Returns
        -------
        None
            No return value.
        """
        data = yaml.safe_load(
            self.yaml_config
        )

        models = [model["source_model"] for model in data.get("experts", [])]

        super().create(
            models=models
        )

        super().upload(
            path="./merge"
        )

        return None


# @title ## Upload model to Hugging Face { display-mode: "form" }
# @markdown Enter your HF username and the name of the Colab secret that stores your [Hugging Face access token](https://huggingface.co/settings/tokens).
username = "louisbrulenaudet" # @param {type:"string"}
token = "hf" # @param {type:"string"}
license = "apache-2.0" # @param ["apache-2.0", "cc-by-nc-4.0", "mit", "openrail"] {allow-input: true}
language = "en" # @param {type:"string"}

if branch == "main":
    instance = Main(
        template_text=main_template,
        yaml_config=yaml_config,
        model_name=model_name,
        username=username,
        license=license
    )

elif branch == "mixtral":
    instance = Moe(
        template_text=moe_template,
        yaml_config=yaml_config,
        model_name=model_name,
        username=username,
        license=license
    )

instance.create_card()