xTuring provides fast, efficient and simple fine-tuning of LLMs, such as LLaMA, GPT-J, Galactica, and more.
By providing an easy-to-use interface for fine-tuning LLMs on your own data and application, xTuring makes it
simple to build, customize and control LLMs. The entire process can run on your own computer or in your
private cloud, ensuring data privacy and security.
With xTuring you can:
- Ingest data from different sources and preprocess it into a format LLMs can understand
- Scale from single to multiple GPUs for faster fine-tuning
- Leverage memory-efficient methods (e.g. INT4, LoRA fine-tuning) to reduce hardware costs by up to 90%
- Explore different fine-tuning methods and benchmark them to find the best performing model
- Evaluate fine-tuned models on well-defined metrics for in-depth analysis
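To give a rough sense of why low-precision weights and LoRA reduce hardware requirements, here is a back-of-the-envelope sketch. It is not xTuring code: the 7-billion parameter count, the 4096 x 4096 layer shape and the LoRA rank of 8 are illustrative assumptions, and real memory use also includes activations, gradients and optimizer state.

```python
# Back-of-the-envelope arithmetic only; not part of the xTuring API.
# Assumed values: 7e9 parameters, a 4096 x 4096 weight matrix, LoRA rank 8.


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA trains two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


NUM_PARAMS = 7_000_000_000  # assumed model size
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB of weights")

full_layer = 4096 * 4096  # trainable parameters when the whole matrix is updated
lora_layer = lora_trainable_params(4096, 4096, rank=8)
print(f"Full layer: {full_layer:,} trainable parameters")
print(f"LoRA rank 8: {lora_layer:,} trainable parameters "
      f"({lora_layer / full_layer:.2%} of the full layer)")
```

The point of the sketch is the trend rather than the exact figures: halving the bits per weight halves weight storage, and LoRA shrinks the number of trainable parameters per layer by orders of magnitude.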
We are excited to announce the latest enhancements to our xTuring library:
- Falcon LLM integration - You can use and fine-tune the Falcon-7B model in different configurations: off-the-shelf, off-the-shelf with INT8 precision, LoRA fine-tuning, and LoRA fine-tuning with INT8 precision.
- GenericModel wrapper - This new integration allows you to test and fine-tune any new model on xTuring without waiting for it to be integrated, using the class GenericModel.
To see how these work, check the Falcon LoRA INT8 working example and the GenericModel working example in the repository.
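The four Falcon configurations described above can be thought of as combinations of two switches, LoRA and INT8. The helper below is a hypothetical convenience function, not part of xTuring. It assumes model keys are built as `<base>`, `<base>_int8`, `<base>_lora` and `<base>_lora_int8`, which is consistent with the `llama_lora` key used in this README, but you should verify the exact keys against the xTuring documentation before relying on them.

```python
# Hypothetical helper (not part of xTuring): builds a model key from a base
# name and two flags, ASSUMING the "<base>[_lora][_int8]" naming pattern.


def model_key(base: str, lora: bool = False, int8: bool = False) -> str:
    """Return a key such as 'falcon', 'falcon_int8', 'falcon_lora' or 'falcon_lora_int8'."""
    parts = [base]
    if lora:
        parts.append("lora")
    if int8:
        parts.append("int8")
    return "_".join(parts)


# Enumerate the four configurations: off-the-shelf, INT8, LoRA, LoRA + INT8.
for use_lora in (False, True):
    for use_int8 in (False, True):
        print(model_key("falcon", lora=use_lora, int8=use_int8))

# Assumed usage with the BaseModel.create call shown in this README
# (requires xturing to be installed, so it is left commented out here):
# from xturing.models import BaseModel
# model = BaseModel.create(model_key("falcon", lora=True, int8=True))
```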
pip install xturing
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load the dataset
instruction_dataset = InstructionDataset("./alpaca_data")
# Initialize the model
model = BaseModel.create("llama_lora")
# Finetune the model
model.finetune(dataset=instruction_dataset)
# Perform inference
output = model.generate(texts=["Why are LLM models becoming so important?"])
print("Generated output by the model: {}".format(output))
You can find the data folder here.
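The quickstart loads an already prepared dataset folder. If you start from raw Alpaca-style JSON records instead, a minimal sketch of reshaping them might look like the following. The source field names (`instruction`, `input`, `output`) are those of the Alpaca dataset; the target field names (`instruction`, `text`, `target`) are an assumption about what `InstructionDataset` expects, so check the dataset preparation tutorial before using this. The example records are made up for illustration.

```python
import json

# ASSUMPTION: the prepared dataset uses the columns "instruction", "text" and
# "target". Verify this against the xTuring dataset preparation tutorial.


def to_instruction_record(example: dict) -> dict:
    """Map one Alpaca-style record to the assumed instruction-dataset layout."""
    return {
        "instruction": example["instruction"],
        "text": example.get("input", ""),  # optional context; empty if absent
        "target": example["output"],
    }


# Made-up sample records in the Alpaca JSON layout.
raw_examples = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red."},
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
]

records = [to_instruction_record(example) for example in raw_examples]
print(json.dumps(records, indent=2))
```

Keeping the mapping in one small function makes it easy to adjust if the expected column names differ from the assumption above.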
$ xturing chat -m "<path-to-model-folder>"
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground
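# Load the dataset and create the model
dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("<model_name>")
# Fine-tune the model, then save the weights to a local folder
model.finetune(dataset=dataset)
model.save("llama_lora_finetuned")
# Launch the playground UI on localhost
Playground().launch()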
- Preparing your dataset
- Cerebras-GPT fine-tuning with LoRA and INT8
- Cerebras-GPT fine-tuning with LoRA
- LLaMA fine-tuning with LoRA and INT8
- LLaMA fine-tuning with LoRA
- LLaMA fine-tuning
- GPT-J fine-tuning with LoRA and INT8
- GPT-J fine-tuning with LoRA
- GPT-2 fine-tuning with LoRA
Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the Alpaca dataset, which contains 52K instructions, for fine-tuning.
Hardware:
4xA100 40GB GPU, 335GB CPU RAM
Fine-tuning parameters:
{
'maximum sequence length': 512,
'batch size': 1,
}
LLaMA-7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
---|---|---|---|
GPU | 33.5 GB | 23.7 GB | 21.9 GB |
CPU | 190 GB | 10.2 GB | 14.9 GB |
Time/epoch | 21 hours | 20 mins | 20 mins |
Contribute to this comparison by creating an issue with your performance results on other GPUs, including your hardware specifications, memory consumption and time per epoch.
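To make the comparison table easier to read, the figures can be turned into relative savings. The sketch below only does arithmetic on the numbers reported in the table; the dictionary keys and helper name are illustrative choices, not part of xTuring.

```python
# Figures copied from the comparison table (LLaMA-7B, 4xA100 40GB).
# Dictionary keys and the helper name are illustrative, not an xTuring API.
results = {
    "DeepSpeed + CPU Offloading": {"gpu_gb": 33.5, "cpu_gb": 190.0, "minutes": 21 * 60},
    "LoRA + DeepSpeed": {"gpu_gb": 23.7, "cpu_gb": 10.2, "minutes": 20},
    "LoRA + DeepSpeed + CPU Offloading": {"gpu_gb": 21.9, "cpu_gb": 14.9, "minutes": 20},
}


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after` (0.25 means 25% less)."""
    return 1 - after / before


baseline = results["DeepSpeed + CPU Offloading"]
for name, row in results.items():
    if row is baseline:
        continue  # skip comparing the baseline with itself
    speedup = baseline["minutes"] / row["minutes"]
    print(f"{name}:")
    print(f"  GPU memory reduced by {reduction(baseline['gpu_gb'], row['gpu_gb']):.1%}")
    print(f"  CPU memory reduced by {reduction(baseline['cpu_gb'], row['cpu_gb']):.1%}")
    print(f"  Time per epoch is {speedup:.0f}x shorter")
```

If you submit results for other hardware, adding a row to `results` gives the same ratios for your setup.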
We have already fine-tuned some models that you can use as your base or start playing with. Here is how you would load them:
from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
Model | Dataset | Path |
---|---|---|
DistilGPT-2 LoRA | alpaca | x/distilgpt2_lora_finetuned_alpaca |
LLaMA LoRA | alpaca | x/llama_lora_finetuned_alpaca |
- Support for LLaMA, GPT-J, GPT-2, OPT, Cerebras-GPT, Galactica and Bloom models
- Dataset generation using self-instruction
- Low-precision LoRA fine-tuning and unsupervised fine-tuning
- INT8 low-precision fine-tuning support
- OpenAI, Cohere and AI21 Studio model APIs for dataset generation
- Added fine-tuned checkpoints for some models to the hub
- INT4 LLaMA LoRA fine-tuning demo
- INT4 LLaMA LoRA fine-tuning with INT4 generation
- Support for a Generic model wrapper
- Support for Falcon-7B model
- INT4 low-precision fine-tuning support
- Evaluation of LLM models
- INT3, INT2, INT1 low-precision fine-tuning support
- Support for Stable Diffusion
If you have any questions, you can create an issue on this repository.
You can also join our Discord server and start a discussion in the #xturing channel.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.