Selectus2/trainers-rb

trainers-rb

Fine-tune transformer models in Ruby.

trainers-rb provides a training loop, LoRA (Low-Rank Adaptation), learning rate scheduling, and model serialization for HuggingFace transformer models loaded via transformers-rb. It builds on torch-rb for autograd, optimizers, and tensor operations.

All the heavy lifting happens in LibTorch C++ kernels. Ruby is the conductor.

Installation

Add to your Gemfile:

gem "trainers-rb"

Or install directly:

gem install trainers-rb

Prerequisites

trainers-rb depends on torch-rb, which requires LibTorch:

# macOS arm64
curl -L -o /tmp/libtorch.zip https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.4.0.zip
unzip /tmp/libtorch.zip -d ~/libtorch
bundle config set build.torch-rb --with-torch-dir=$HOME/libtorch/libtorch

Quick Start

require "trainers-rb"

# Load a pre-trained model and tokenizer
model, tokenizer = Trainers.from_pretrained(
  "distilbert-base-uncased",
  task: :sequence_classification,
  num_labels: 2
)

# Prepare your dataset (texts and labels are parallel arrays that you supply)
train_data = texts.map.with_index do |text, i|
  encoded = tokenizer.(text, truncation: true, max_length: 128)
  {
    input_ids:      encoded["input_ids"],
    attention_mask: encoded["attention_mask"],
    labels:         labels[i]
  }
end
train_dataset = Trainers::Dataset.new(train_data)

# Configure and train
args = Trainers::TrainingArguments.new(
  output_dir:        "./output",
  num_train_epochs:  3,
  learning_rate:     2e-5,
  eval_strategy:     :epoch
)

trainer = Trainers::Trainer.new(
  model:         model,
  args:          args,
  train_dataset: train_dataset,
  eval_dataset:  val_dataset,   # a Trainers::Dataset built the same way as train_dataset
  tokenizer:     tokenizer,
  data_collator: Trainers::DataCollatorWithPadding.new(tokenizer: tokenizer),
  compute_metrics: ->(eval_pred) {
    preds   = eval_pred.predictions.argmax(1)
    correct = preds.eq(eval_pred.label_ids).sum.item
    { accuracy: correct.to_f / eval_pred.label_ids.size(0) }
  }
)

trainer.train
trainer.save_model("./my-model")
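
The compute_metrics lambda above boils down to an argmax over each row of logits followed by a comparison with the labels. Here is a minimal plain-Ruby sketch of that arithmetic, using nested arrays in place of tensors. The accuracy helper is hypothetical and is not part of trainers-rb.

```ruby
# Hypothetical helper: accuracy from raw logits, using plain arrays instead of tensors.
# logits: one row of class scores per example; labels: the true class index per example.
def accuracy(logits, labels)
  raise ArgumentError, "logits and labels differ in length" unless logits.size == labels.size
  return 0.0 if labels.empty?

  # argmax along the class dimension, analogous to predictions.argmax(1)
  preds = logits.map { |row| row.each_with_index.max_by { |score, _| score }.last }
  correct = preds.zip(labels).count { |pred, label| pred == label }
  correct.to_f / labels.size
end

logits = [[0.1, 2.3], [1.7, -0.4], [0.2, 0.9], [3.0, 0.5]]
labels = [1, 0, 0, 0]
puts accuracy(logits, labels)
```

Working through the metric on a handful of hand-written rows like this is a cheap way to check it before wiring it into a trainer.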

LoRA (Parameter-Efficient Fine-Tuning)

Freeze the base model's weights and train only small low-rank adapter matrices, typically well under 1% of all parameters:

# Apply LoRA to specific layers
config = Trainers::LoraConfig.new(
  r:              8,         # rank
  lora_alpha:     16,        # scaling factor
  lora_dropout:   0.1,
  target_modules: ["query", "value"],  # which Linear layers to adapt
  bias:           :none      # :none, :all, or :lora_only
)

Trainers::LoraModel.apply(model, config)
# => LoRA applied to 12 modules: ...
# => trainable params: 294,912 || all params: 66,955,010 || trainable%: 0.4404%

# Train as usual
trainer.train

# Save just the adapters (tiny files)
Trainers::LoraModel.save_adapters(model, "./lora-adapters")

# Or merge into base model for inference
Trainers::LoraModel.merge(model)
trainer.save_model("./merged-model")
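
To make the adapter and merge steps concrete, here is a plain-array sketch of the standard LoRA formulation: the adapted layer computes W x + (alpha / r) * B (A x), and merging folds the update into the weight as W' = W + (alpha / r) * B A. All helper names are hypothetical, and this is not the gem's implementation (dropout and bias handling are left out).

```ruby
# Plain-array sketch of the LoRA update; not the gem's implementation.
# W is the frozen (out x in) weight, A is (r x in), B is (out x r).

def matmul(a, b)
  cols = b.transpose
  a.map { |row| cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

def matvec(m, v)
  m.map { |row| row.zip(v).sum { |x, y| x * y } }
end

# Forward pass with a separate adapter: W x + (alpha / r) * B (A x)
def lora_forward(w, a, b, x, alpha:, r:)
  scale = alpha.to_f / r
  base  = matvec(w, x)
  delta = matvec(b, matvec(a, x))
  base.zip(delta).map { |y, d| y + scale * d }
end

# Merging folds the adapter into the base weight: W' = W + (alpha / r) * B A
def merge(w, a, b, alpha:, r:)
  scale = alpha.to_f / r
  ba = matmul(b, a)
  w.each_with_index.map { |row, i| row.each_with_index.map { |v, j| v + scale * ba[i][j] } }
end

# Trainable parameters added per adapted Linear layer: r * (in + out)
def lora_param_count(in_features, out_features, r)
  r * (in_features + out_features)
end

w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # out = 2, in = 3
a = [[1.0, 2.0, 0.0]]                     # r = 1
b = [[0.5], [1.0]]
x = [1.0, 1.0, 1.0]

adapter_out = lora_forward(w, a, b, x, alpha: 2, r: 1)
merged_out  = matvec(merge(w, a, b, alpha: 2, r: 1), x)
puts adapter_out == merged_out
```

Because the merged weight gives the same output as the separate adapter path, a merged model needs no LoRA code at inference time. The adapter itself stays small because each adapted layer adds only r * (in + out) parameters.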

Loading saved LoRA adapters

model, tokenizer = Trainers.from_pretrained("distilbert-base-uncased", num_labels: 2)
Trainers::LoraModel.apply(model, config)   # config: the same LoraConfig used during training
Trainers::LoraModel.load_adapters(model, "./lora-adapters")

Training Arguments

| Argument | Default | Description |
|---|---|---|
| output_dir | "./output" | Directory for checkpoints and saved models |
| num_train_epochs | 3 | Number of training epochs |
| per_device_train_batch_size | 8 | Training batch size |
| per_device_eval_batch_size | 8 | Evaluation batch size |
| learning_rate | 5e-5 | Peak learning rate for AdamW |
| weight_decay | 0.0 | Weight decay (applied to non-bias, non-norm params) |
| max_grad_norm | 1.0 | Max gradient norm for clipping |
| gradient_accumulation_steps | 1 | Accumulate gradients over N steps |
| warmup_steps | 0 | Linear warmup steps |
| warmup_ratio | 0.0 | Warmup as fraction of total steps (alternative to warmup_steps) |
| lr_scheduler_type | :linear | :linear, :cosine, or :constant |
| eval_strategy | :no | When to evaluate: :no, :epoch, or :steps |
| eval_steps | nil | Evaluate every N steps (when eval_strategy: :steps) |
| save_strategy | :epoch | When to save: :no, :epoch, or :steps |
| save_total_limit | nil | Keep only the last N checkpoints |
| logging_steps | 500 | Log every N steps |
| seed | 42 | Random seed |
| no_mps | false | Force CPU even if MPS is available |
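
Several of these arguments interact: the batch size and gradient_accumulation_steps set the effective batch size, which in turn determines how many optimizer steps a run takes and how many steps a warmup_ratio covers. The sketch below works through that arithmetic. The training_plan helper is hypothetical, and its rounding (ceiling division, with warmup_steps taking precedence over warmup_ratio) is an assumption, not behavior read from the gem.

```ruby
# Hypothetical helper; the rounding rules are an assumption, not read from the gem.
def training_plan(num_examples:, batch_size: 8, grad_accum: 1, epochs: 3,
                  warmup_steps: 0, warmup_ratio: 0.0)
  batches_per_epoch = (num_examples.to_f / batch_size).ceil
  updates_per_epoch = (batches_per_epoch.to_f / grad_accum).ceil
  total_steps       = updates_per_epoch * epochs

  # Assumed precedence: an explicit warmup_steps wins over warmup_ratio.
  warmup = warmup_steps.positive? ? warmup_steps : (total_steps * warmup_ratio).ceil

  {
    effective_batch_size: batch_size * grad_accum,
    total_steps:          total_steps,
    warmup_steps:         warmup
  }
end

plan = training_plan(num_examples: 1000, batch_size: 8, grad_accum: 4, epochs: 3, warmup_ratio: 0.1)
puts plan.inspect
```

With 1,000 examples, a batch size of 8, and 4 accumulation steps, each optimizer update sees 32 examples, so logging_steps and eval_steps are best chosen against the resulting step count, not the number of batches.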

Callbacks

Built-in callbacks:

# Early stopping
early_stop = Trainers::EarlyStoppingCallback.new(
  patience:    3,
  threshold:   0.01,
  metric_name: "eval_loss"
)

trainer = Trainers::Trainer.new(
  model: model,
  args: args,
  callbacks: [early_stop],
  # ...
)
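
The patience and threshold options are easiest to understand by tracing a sequence of metric values. The sketch below is a hypothetical stand-in for that bookkeeping, not the gem's callback. It assumes a lower-is-better metric such as eval_loss and counts an evaluation as an improvement only when it beats the best value by more than the threshold.

```ruby
# Hypothetical stand-in for the patience logic; assumes a lower metric is better.
class PatienceTracker
  attr_reader :best, :bad_evals

  def initialize(patience:, threshold: 0.0)
    @patience  = patience
    @threshold = threshold
    @best      = nil
    @bad_evals = 0
  end

  # Record one evaluation; returns true when training should stop.
  def update(value)
    if @best.nil? || value < @best - @threshold
      @best = value        # improvement large enough to reset the counter
      @bad_evals = 0
    else
      @bad_evals += 1      # no sufficient improvement this round
    end
    @bad_evals >= @patience
  end
end

tracker = PatienceTracker.new(patience: 3, threshold: 0.01)
decisions = [0.90, 0.80, 0.795, 0.80, 0.796].map { |loss| tracker.update(loss) }
puts decisions.inspect
```

In this trace the loss stops improving by more than 0.01 after the second evaluation, so the third consecutive evaluation without enough improvement triggers the stop.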

Custom callbacks:

class WandbCallback < Trainers::TrainerCallback
  def on_log(args, state, control, logs: nil, **)
    # send logs to Weights & Biases, MLflow, etc.
  end

  def on_evaluate(args, state, control, metrics: nil, **)
    # log evaluation metrics
  end
end

Callback hooks

| Hook | When it fires |
|---|---|
| on_train_begin | Before the first step |
| on_train_end | After the last step |
| on_epoch_begin | Start of each epoch |
| on_epoch_end | End of each epoch |
| on_step_begin | Before each training step |
| on_step_end | After each training step |
| on_log | When metrics are logged |
| on_evaluate | After evaluation |
| on_save | After saving a checkpoint |
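
The nesting of the train, epoch, and step hooks can be illustrated with a small simulation. Both the recorder class and the loop below are hypothetical and only mirror the table above. The sketch deliberately leaves out on_log, on_evaluate, and on_save, because their timing depends on logging_steps, eval_strategy, and save_strategy.

```ruby
# Hypothetical recorder and loop; they only illustrate how the hooks nest.
class RecordingCallback
  attr_reader :events

  def initialize
    @events = []
  end

  %i[on_train_begin on_train_end on_epoch_begin on_epoch_end on_step_begin on_step_end].each do |hook|
    define_method(hook) { |**_opts| @events << hook }
  end
end

def simulate_training(callback, epochs:, steps_per_epoch:)
  callback.on_train_begin
  epochs.times do
    callback.on_epoch_begin
    steps_per_epoch.times do
      callback.on_step_begin
      # forward pass, backward pass, and optimizer step would run here
      callback.on_step_end
    end
    callback.on_epoch_end
  end
  callback.on_train_end
  callback.events
end

puts simulate_training(RecordingCallback.new, epochs: 1, steps_per_epoch: 2).inspect
```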

Learning Rate Schedulers

Three schedules are available, all with optional linear warmup:

# Linear warmup then linear decay to 0 (default)
args = Trainers::TrainingArguments.new(lr_scheduler_type: :linear, warmup_steps: 100)

# Linear warmup then cosine decay to 0
args = Trainers::TrainingArguments.new(lr_scheduler_type: :cosine, warmup_steps: 100)

# Linear warmup then constant
args = Trainers::TrainingArguments.new(lr_scheduler_type: :constant, warmup_steps: 100)
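
The shape of each schedule can be expressed as a multiplier applied to the peak learning rate. The sketch below uses the common definitions of linear warmup, linear decay, and cosine decay. The lr_multiplier helper is hypothetical, and the exact formulas inside the gem are assumed to match these definitions, not confirmed.

```ruby
# Hypothetical helper: multiplier applied to the peak learning rate at a given step.
# The formulas follow the common warmup/decay definitions and are an assumption here.
def lr_multiplier(type, step, warmup_steps:, total_steps:)
  # Linear warmup from 0 up to the peak learning rate
  return step.to_f / warmup_steps if warmup_steps.positive? && step < warmup_steps

  decay_steps = [total_steps - warmup_steps, 1].max
  progress    = [(step - warmup_steps).to_f / decay_steps, 1.0].min

  case type
  when :linear   then 1.0 - progress
  when :cosine   then 0.5 * (1.0 + Math.cos(Math::PI * progress))
  when :constant then 1.0
  else raise ArgumentError, "unknown scheduler type: #{type.inspect}"
  end
end

peak_lr = 2e-5
[0, 50, 100, 550, 1000].each do |step|
  lr = peak_lr * lr_multiplier(:linear, step, warmup_steps: 100, total_steps: 1000)
  puts format("step %4d  lr %.3e", step, lr)
end
```

All three schedules share the same warmup phase and differ only in what happens after the peak: linear and cosine both reach 0 at the final step, while constant holds the peak value.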

Data Utilities

Dataset

Wrap an array of hashes:

data = [
  { input_ids: [101, 2023, 2003], attention_mask: [1, 1, 1], labels: 1 },
  { input_ids: [101, 2919, 2143], attention_mask: [1, 1, 1], labels: 0 },
]
dataset = Trainers::Dataset.new(data)

Data Collators

Dynamic padding collator (pads each batch to the longest sequence in that batch):

collator = Trainers::DataCollatorWithPadding.new(tokenizer: tokenizer)

Default collator (no padding, expects uniform-length inputs):

collator = Trainers::DefaultDataCollator.new
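
Dynamic padding means every batch is padded only as far as its own longest sequence, which wastes less compute than padding the whole dataset to one global length. Here is a minimal plain-array sketch of the idea. The pad_batch helper is hypothetical, and the pad token id of 0 is an assumption (a real collator would take it from the tokenizer).

```ruby
# Hypothetical helper showing dynamic padding with plain arrays.
# pad_token_id: 0 is an assumption; a real collator would read it from the tokenizer.
def pad_batch(examples, pad_token_id: 0)
  max_len = examples.map { |ex| ex[:input_ids].size }.max || 0

  {
    # Pad token ids on the right up to the longest sequence in this batch
    input_ids: examples.map { |ex| ex[:input_ids] + [pad_token_id] * (max_len - ex[:input_ids].size) },
    # Padded positions get attention_mask 0 so the model ignores them
    attention_mask: examples.map { |ex| ex[:attention_mask] + [0] * (max_len - ex[:attention_mask].size) },
    labels: examples.map { |ex| ex[:labels] }
  }
end

batch = pad_batch([
  { input_ids: [101, 2023, 2003], attention_mask: [1, 1, 1], labels: 1 },
  { input_ids: [101, 2919],       attention_mask: [1, 1],    labels: 0 }
])
puts batch.inspect
```

A trainer would then convert each of the three arrays into a tensor, which only works once every row in the batch has the same length.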

Supported Tasks

trainers-rb works with any Torch::NN::Module. The Trainers.from_pretrained convenience method supports these transformers-rb model classes:

| Task | Model class |
|---|---|
| :sequence_classification | AutoModelForSequenceClassification |
| :token_classification | AutoModelForTokenClassification |
| :question_answering | AutoModelForQuestionAnswering |

You can also use any custom model:

trainer = Trainers::Trainer.new(model: my_custom_model, args: args, ...)

Device Support

trainers-rb auto-detects the best available device:

  • CPU: always available
  • MPS: Apple Silicon GPU, used automatically when available

# Force CPU
args = Trainers::TrainingArguments.new(no_mps: true)

# Or set explicitly
args = Trainers::TrainingArguments.new(device: Torch.device("mps"))

Architecture

trainers-rb
  -> transformers-rb    (model loading, tokenizers, HF Hub)
    -> torch-rb         (autograd, nn modules, optimizers)
    -> tokenizers        (HuggingFace Rust tokenizers via FFI)
    -> safetensors       (weight file I/O)

trainers-rb adds the training layer that transformers-rb intentionally omits. Both gems call into the same LibTorch C++ kernels for the actual computation.

Roadmap

  • More model architectures in transformers-rb (GPT-2, Llama for text generation)
  • Mixed precision training (fp16/bf16)
  • Gradient checkpointing for memory efficiency
  • Dataset streaming for large datasets
  • Distributed training
  • Integration with ONNX export for deployment
  • QLoRA (quantized base model + LoRA)

Contributing

Bug reports and pull requests are welcome on GitHub.

License

MIT
