MPT

Overview

The MPT model was proposed by the MosaicML team and released in multiple sizes and fine-tuned variants. The MPT models are a series of open-source, commercially usable LLMs pre-trained on 1T tokens.

MPT models are GPT-style decoder-only transformers with several improvements: performance-optimized layer implementations, architecture changes that provide greater training stability, and the elimination of context length limits by replacing positional embeddings with ALiBi.
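To make the ALiBi idea concrete, here is a minimal pure-Python sketch of the linear attention bias that stands in for positional embeddings. It follows the general ALiBi formulation (a per-head slope multiplied by the query-key distance) and is not taken from the MPT source code. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two number of heads.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8*k/num_heads) for k = 1..num_heads.

    Assumption: num_heads is a power of two (the simple case of ALiBi).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two head count")
    return [2.0 ** (-8.0 * k / num_heads) for k in range(1, num_heads + 1)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Build bias[h][i][j], added to the attention scores before softmax.

    For a query at position i and a key at position j <= i, the bias is
    -slope_h * (i - j). Future positions (j > i) get -inf, which acts as
    the causal mask.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [
            [-m * (i - j) if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]
```

Because the bias depends only on the distance `i - j`, the same rule applies at any sequence length. This is why removing learned positional embeddings also removes the fixed context-length limit.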

MPT base: base models pre-trained on next-token prediction.
MPT instruct: MPT base models fine-tuned on instruction-based tasks.
MPT storywriter: MPT base models fine-tuned for 2500 steps on 65k-token excerpts of fiction books from the books3 corpus, which enables the model to handle very long sequences.

The original code is available at the llm-foundry repository.

Read more about it in the release blog post.

Usage tips

Learn more about the techniques behind training the model in this section of the llm-foundry repository.
If you want to use the advanced version of the model (Triton kernels, direct FlashAttention integration), you can still use the original model implementation by adding trust_remote_code=True when calling from_pretrained.
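The tip about `trust_remote_code=True` can be sketched as follows. The helper names `mpt_load_kwargs` and `load_mpt` are hypothetical, and the checkpoint id `mosaicml/mpt-7b` is an assumption made for illustration, so substitute the checkpoint you actually use. The `transformers` import sits inside the function so the snippet can be read and imported without the library installed.

```python
def mpt_load_kwargs(use_original_implementation: bool = False) -> dict:
    """Build keyword arguments for from_pretrained (hypothetical helper).

    With use_original_implementation=True, the modeling code shipped with the
    checkpoint is used instead of the built-in Transformers implementation.
    """
    kwargs = {}
    if use_original_implementation:
        # Executes Python code from the checkpoint repository, so only
        # enable this for sources you trust.
        kwargs["trust_remote_code"] = True
    return kwargs


def load_mpt(checkpoint: str = "mosaicml/mpt-7b",
             use_original_implementation: bool = False):
    """Load a tokenizer and causal LM; the checkpoint id is an assumption."""
    # Imported lazily so defining this function does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, **mpt_load_kwargs(use_original_implementation)
    )
    return tokenizer, model
```

Calling `load_mpt(use_original_implementation=True)` downloads the checkpoint and runs its remote code. Leaving the flag at its default keeps the built-in `MptForCausalLM` implementation.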

Resources

Fine-tuning notebook showing how to fine-tune MPT-7B on a free Google Colab instance to turn the model into a chatbot.

MptConfig

[[autodoc]] MptConfig - all

MptModel

[[autodoc]] MptModel - forward

MptForCausalLM

[[autodoc]] MptForCausalLM - forward

MptForSequenceClassification

[[autodoc]] MptForSequenceClassification - forward

MptForTokenClassification

[[autodoc]] MptForTokenClassification - forward

MptForQuestionAnswering

[[autodoc]] MptForQuestionAnswering - forward
