Introduction

MatMul-Free LM

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

This repo is adapted from flash-linear-attention.

Introduction

MatMul-Free LM is a language model architecture that eliminates the need for Matrix Multiplication (MatMul) operations. This repository provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library.

Scaling Law

We evaluate how the scaling law fits to the 370M, 1.3B and 2.7B parameter models in both Transformer++ and our model. For a fair comparison, each operation is treated identically, though our model uses more efficient ternary weights in some layers. Interestingly, the scaling projection for our model exhibits a steeper descent compared to Transformer++, suggesting our architecture is more efficient in leveraging additional compute to improve performance.

Installation

The following requirements should be satisfied

PyTorch >= 2.0
Triton >=2.2
einops

pip install -U git+https://github.com/ridgerchu/matmulfreellm

Usage

Pre-trained Model Zoo

Model Size	Layer	Hidden dimension	Trained tokens
370M	24	1024	15B
1.3B	24	2048	100B
2.7B	32	2560	100B

Model

We provide the implementations of models that are compatible with 🤗 Transformers library. Here's an example of how to initialize a model from the default configs in matmulfreelm: This is a huggingface-compatible library that you can use such command to initialize the model with huggingface AutoModel:

>>> from mmfreelm.models import HGRNBitConfig
>>> from transformers import AutoModel
>>> config = HGRNBitConfig()
>>> AutoModel.from_config(config)
HGRNBitModel(
  (embeddings): Embedding(32000, 2048)
  (layers): ModuleList(
    (0): HGRNBitBlock(
      (attn_norm): RMSNorm(2048, eps=1e-06)
      (attn): HGRNBitAttention(
        (i_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (f_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_norm): FusedRMSNormSwishGate()
        (o_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
      )
      (mlp_norm): RMSNorm(2048, eps=1e-06)
      (mlp): HGRNBitMLP(
        (gate_proj): FusedBitLinear(
          in_features=2048, out_features=11264, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (down_proj): FusedBitLinear(
          in_features=5632, out_features=2048, bias=False
          (norm): RMSNorm(5632, eps=1e-08)
        )
        (act_fn): SiLU()
      )
    )
    
)
>>>

Generation

Upon successfully pretraining a model, it becomes accessible for generating text using the 🤗 text generation APIs. In the following, we give a generation example in generate.py:

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm
from transformers import AutoModelForCausalLM, AutoTokenizer
#Change here to our open-sourced model
name = ''
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
input_prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, "
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=32,  do_sample=True, top_p=0.4, temperature=0.6)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Citation

If you use this repo in your work, please cite our preprint:

@article{zhu2024scalable,
title={Scalable MatMul-free Language Modeling},
author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},
journal={arXiv preprint arXiv:2406.02528},
year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
__assets__		__assets__
mmfreelm		mmfreelm
LICENSE		LICENSE
README.md		README.md
generate.py		generate.py
mmfreelm_DockerFile		mmfreelm_DockerFile
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MatMul-Free LM

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

This repo is adapted from flash-linear-attention.

Introduction

Scaling Law

Installation

Usage

Pre-trained Model Zoo

Model

Generation

Citation

About

Releases

Packages

Contributors 7

Languages

License

ridgerchu/matmulfreellm

Folders and files

Latest commit

History

Repository files navigation

MatMul-Free LM

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

This repo is adapted from flash-linear-attention.

Introduction

Scaling Law

Installation

Usage

Pre-trained Model Zoo

Model

Generation

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 7

Languages

Packages