v1.4.0-rc0

Pre-release

imezx released this 16 Nov 11:59

197994e

# What's new?

Gradien is now available on Wally!
(Added) Prebuilt models at Gradien.Models.
- MLP, ResMLP, ConvNet, TransformerEncoder, SequenceClassifier, AutoEncoder.
(Added) Multi-head self-attention at Gradien.NN.Attention.
- Attention.new(embedDim, numHeads, opts) with support for dropout, custom initializers, causal masking, and getLastAttention() inspection.
- Backed by a numerically stable softmax in Gradien.Ops.Softmax.
(Added) Multi-agent RL wrapper at Gradien.RL.MultiAgent.
- Wraps a list of agents with:
  - getAgent(i), size()
  - act(i, state, step) and actAll(states, step)
  - parameters() and zeroGrad() that fan out across agents.
(Added) Buffer utilities at Gradien.Util.Buffer.
- High-level encode/decode of Luau values and tensors into buffer objects.
(Added) Profiler at Gradien.Util.Profiler.
- API: .start/stop, .scope, .wrap, .instrument, .snapshot, .report, .withEnabled, .get, .reset/flush.
(Added) Stable Softmax at Gradien.Ops.Softmax.
- SoftmaxOps.forward(logits) implements a max-shifted, numerically stable softmax used by nn.Softmax and nn.Attention.
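
To illustrate what "max-shifted" means, here is a minimal sketch in Python rather than Gradien's Luau. The function name `stable_softmax` and the flat-list input are assumptions made for the example; this is not the library's implementation.

```python
import math

def stable_softmax(logits):
    """Softmax over a flat list of floats, shifted by the max for stability."""
    m = max(logits)
    # Subtracting the max does not change the result mathematically,
    # but the largest exponent becomes exp(0) = 1, so nothing overflows.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
```

The shift costs one extra pass over the logits and leaves the output distribution unchanged.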
(Added) Classification-oriented trainer constructor at Gradien.Trainer.
- Trainer.newClassification(cfg, opts?):
  - Default loss: nn.Losses.cross_entropy_backward (with optional label smoothing).
  - Default metric: Metrics.accuracy.
  - Returns a regular Trainer wired for supervised classification.
(Added) Cosine schedule with warmup in Gradien.Optim.Schedulers.
- S.linearWarmupThenCosine(lr, warmupSteps, totalSteps, lrMin):
  - Linear warmup phase followed by cosine decay toward lrMin.
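
The shape of such a schedule can be sketched as follows (Python, for illustration only). The 1-based step counter, the exact interpolation formula, and the clamping after `total_steps` are assumptions; Gradien's scheduler may differ in these details.

```python
import math

def linear_warmup_then_cosine(lr, warmup_steps, total_steps, lr_min):
    """Return a function mapping a 1-based step to a learning rate (hypothetical sketch)."""
    def schedule(step):
        if step <= warmup_steps:
            # Linear ramp that reaches the base lr at the last warmup step.
            return lr * step / warmup_steps
        # Fraction of the decay phase completed, clamped to [0, 1].
        decay_steps = max(1, total_steps - warmup_steps)
        progress = min(1.0, (step - warmup_steps) / decay_steps)
        # Half-cosine from lr (progress = 0) down to lr_min (progress = 1).
        return lr_min + 0.5 * (lr - lr_min) * (1.0 + math.cos(math.pi * progress))
    return schedule

sched = linear_warmup_then_cosine(0.1, 10, 100, 0.001)
```

With these values, the rate climbs to 0.1 over the first 10 steps and then decays to 0.001 by step 100.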
(Added) Snapshot ↔ buffer helpers at Gradien.State.
- State.toBuffer(snap): buffer – serializes a Types.Snapshot via Util.Buffer.
- State.fromBuffer(buf): Snapshot? – reconstructs snapshots from a buffer.
(Added) New initializers at Gradien.Init.
- heNormal(W), heUniform(W), lecunNormal(W), lecunUniform(W) – fan-in/out aware weight initializers.
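
For reference, the commonly used scale factors for these initializer families are sketched below in Python. The `(out_features, in_features)` weight layout and all function names are assumptions for the example; the formulas are the standard fan-in variants and may not match Gradien's exact choices.

```python
import math
import random

def fan_in_of(shape):
    """Assumes a 2D weight laid out as (out_features, in_features)."""
    _fan_out, fan_in = shape
    return fan_in

def he_normal_std(shape):
    return math.sqrt(2.0 / fan_in_of(shape))     # N(0, 2 / fan_in)

def he_uniform_limit(shape):
    return math.sqrt(6.0 / fan_in_of(shape))     # U(-limit, limit)

def lecun_normal_std(shape):
    return math.sqrt(1.0 / fan_in_of(shape))     # N(0, 1 / fan_in)

def lecun_uniform_limit(shape):
    return math.sqrt(3.0 / fan_in_of(shape))     # U(-limit, limit)

def sample_uniform(shape, limit, rng):
    """Fill a nested list of the given 2D shape with values in [-limit, limit]."""
    rows, cols = shape
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]

weights = sample_uniform((4, 3), lecun_uniform_limit((4, 3)), random.Random(0))
```

The uniform limits are chosen so that the variance of U(-limit, limit), which is limit²/3, matches the variance of the corresponding normal variant.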
(Added) Tensor view & shape helpers at Gradien.Tensor.
- Tensor.reshape(t, newShape) – view-style reshape (shared storage) with size checking.
- Tensor.slice(t, dim, startIdx, endIdx?, step?) – strided slices with view-based implementation.
- Tensor.transpose(t, dim1?, dim2?) – generic axis swap; defaults to 2D transpose when dims are omitted.
- Tensor.narrow(t, dim, startIdx, length) – thin wrapper around slice for PyTorch-style narrowing.
- Tensor.noGrad(t) – in-place: marks a tensor as non-differentiable and clears _grad.
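
The idea behind view-style reshape, slice, and transpose can be sketched with a small strided view over a flat storage list. This is a Python illustration with 0-based indices and a hypothetical `View` class, not Gradien's Tensor; it only shows how shape, strides, and offset can change while the storage stays shared.

```python
import math

def compute_strides(shape):
    """Row-major strides: the last dimension is contiguous."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

class View:
    def __init__(self, storage, shape, strides=None, offset=0):
        self.storage = storage          # shared, never copied
        self.shape = list(shape)
        self.strides = list(strides) if strides is not None else compute_strides(shape)
        self.offset = offset

    def get(self, *index):
        pos = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.storage[pos]

    def reshape(self, new_shape):
        if math.prod(new_shape) != math.prod(self.shape):
            raise ValueError("reshape must preserve the element count")
        if self.strides != compute_strides(self.shape):
            raise ValueError("this sketch only reshapes contiguous views")
        return View(self.storage, new_shape, None, self.offset)

    def transpose(self, dim1=0, dim2=1):
        shape, strides = list(self.shape), list(self.strides)
        shape[dim1], shape[dim2] = shape[dim2], shape[dim1]
        strides[dim1], strides[dim2] = strides[dim2], strides[dim1]
        return View(self.storage, shape, strides, self.offset)

    def slice(self, dim, start, end, step=1):
        shape, strides = list(self.shape), list(self.strides)
        shape[dim] = len(range(start, end, step))
        offset = self.offset + start * self.strides[dim]
        strides[dim] = self.strides[dim] * step
        return View(self.storage, shape, strides, offset)

    def narrow(self, dim, start, length):
        return self.slice(dim, start, start + length)
```

Because every view points at the same storage, a write through one view is visible through all the others, which is what makes these operations cheap.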

# What's changed?

(Changed) Tensor & Autograd
- Tensor now uses explicit computeStrides and view objects internally to implement reshape, slice, and transpose without copying storage, while still propagating gradients.
- Tensor.detach() still returns a detached view, but Tensor.noGrad() was added for in-place disabling of gradients.
- autograd.Tape.matmul:
  - Allocates A._grad / B._grad with the correct dtype (Tensor.zeros(..., x._dtype)).
  - Accumulates gradients into existing ._grad buffers instead of overwriting them.
- Tape.noGrad(fn) no longer wraps fn in pcall.
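
Why accumulation matters for the matmul change: when a tensor feeds into more than one operation, each backward call must add to its gradient rather than replace it. A minimal Python sketch with nested lists follows; the helper names are hypothetical and this is not the Tape code itself.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def accumulate(grad, delta):
    """Add delta into grad in place, allocating a zero buffer on first use."""
    if grad is None:
        grad = [[0.0] * len(delta[0]) for _ in delta]
    for i, row in enumerate(delta):
        for j, value in enumerate(row):
            grad[i][j] += value
    return grad

def matmul_backward(a, b, grad_out, grad_a=None, grad_b=None):
    """For C = A @ B: dA = dC @ B^T and dB = A^T @ dC, both accumulated."""
    grad_a = accumulate(grad_a, matmul(grad_out, transpose(b)))
    grad_b = accumulate(grad_b, matmul(transpose(a), grad_out))
    return grad_a, grad_b
```

Calling `matmul_backward` twice with the same gradient buffers sums both contributions, whereas overwriting would silently drop the first one.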
(Changed) Initializers
- All initializer functions now operate on Types.Tensor and share the internal _randn() normal sampler.
- Existing initializers (e.g. xavierUniform) are updated to use fan-in/fan-out computations consistent with the new He/LeCun variants.
(Changed) BatchNorm & Metrics
- nn.BatchNorm1d:
  - Running statistics now have shape {D, 1} instead of {1, 1} and are tracked per-feature.
  - Training mode computes per-channel means and variances and updates runningMean / runningVar with the configured momentum.
  - Eval mode uses the stored per-channel statistics for normalization.
- Metrics:
  - Multi-class precision/recall/F1 now pre-initialize the tpC, fpC, and fnC arrays with zeros to avoid nil indexing on unseen classes.
  - Confusion matrix allocation uses table.create(C, 0) and fills with zeros, fixing edge cases when some classes never appear.
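
The metrics fix can be pictured with a short Python sketch: per-class counters are allocated up front for all C classes, so a class that never appears simply keeps a count of zero. Function names are hypothetical, labels are 0-based here, and scoring an unseen class as 0.0 is an assumption of this example.

```python
def per_class_counts(preds, targets, num_classes):
    """Return (tp, fp, fn) lists, each pre-filled with zeros for every class."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(preds, targets):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed
    return tp, fp, fn

def macro_precision(tp, fp):
    """Unweighted mean of per-class precision; empty classes count as 0.0."""
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) > 0 else 0.0
        for c in range(len(tp))
    ]
    return sum(per_class) / len(per_class)

# Class 2 never appears in predictions or targets, yet nothing breaks.
tp, fp, fn = per_class_counts([0, 1, 1], [0, 1, 0], 3)
```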
(Changed) Convolutions & Softmax
- ops.Conv2d:
  - Reimplemented using helper routines (copyShape, makeMatrixView, im2col, col2im, reshapeInPlace, transposeMatrix, addInto) and BLAS.matmul.
  - Keeps the public signature the same (Conv2d(X, W)), but the forward pass now uses an im2col + GEMM approach for better performance.
- nn.Conv2d:
  - Continues to delegate to ops.Conv2d, inheriting the new, more efficient kernel without changing its module API.
- nn.Softmax:
  - Simplified to delegate to Ops.Softmax.forward, consolidating the softmax implementation in ops/Softmax.luau.
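
As background on the im2col + GEMM approach mentioned for ops.Conv2d: every receptive field is unrolled into a column, so the whole convolution becomes one matrix product. The Python sketch below assumes a single image in [C][H][W] layout, stride 1, and no padding; these choices and the function names are assumptions, not Gradien's kernel.

```python
def im2col(x, kh, kw):
    """x is a nested list [C][H][W]. Returns (cols, out_h, out_w), where cols has
    C*kh*kw rows and out_h*out_w columns (stride 1, no padding)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per kernel element, one column per output position.
                cols.append([x[ch][oy + i][ox + j]
                             for oy in range(out_h) for ox in range(out_w)])
    return cols, out_h, out_w

def conv2d_im2col(x, weight):
    """weight is [F][C][kH][kW]. Returns [F][out_h][out_w] (cross-correlation)."""
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    cols, out_h, out_w = im2col(x, kh, kw)
    # Flatten each filter in the same (channel, row, column) order as im2col.
    w_mat = [[v for ch in f for row in ch for v in row] for f in weight]
    out = []
    for w_row in w_mat:
        # One GEMM row: dot the flattened filter with every column.
        flat = [sum(wv * col[p] for wv, col in zip(w_row, cols))
                for p in range(out_h * out_w)]
        out.append([flat[r * out_w:(r + 1) * out_w] for r in range(out_h)])
    return out
```

The trade-off is extra memory for the unrolled matrix in exchange for a single, cache-friendly matrix multiplication.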
(Changed) RL Replay Buffers
- Gradien.RL.Replay:
  - Now requires t.state and t.nextState to be tensors and asserts their presence.
  - Infers stateDim and dtype on first push and stores state vectors as dense arrays in Util.Buffer buffers instead of raw tables.
  - sample(batchSize) reconstructs batched state / next-state tensors from the underlying buffers.
- Gradien.RL.UniformReplay & Gradien.RL.PrioritizedReplay:
  - Similarly updated to serialize state vectors into buffers via Util.Buffer and to rebuild batched S / NS tensors on sampling.
  - Insert logic and bookkeeping (head, size_) are clarified and wrapped in explicit conditionals.
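
The storage scheme described above can be sketched as a ring buffer over one dense array. This Python illustration uses the standard `array` module in place of Luau buffers; the class name, the fields, and the uniform sampling are assumptions made for the example.

```python
import random
from array import array

class FlatReplay:
    """Ring buffer storing state vectors in dense arrays (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.state_dim = None       # inferred on first push
        self.states = None
        self.next_states = None
        self.head = 0               # next slot to write
        self.size = 0               # number of valid slots

    def push(self, state, next_state):
        assert state is not None and next_state is not None, "state and nextState are required"
        if self.state_dim is None:
            self.state_dim = len(state)
            n = self.capacity * self.state_dim
            self.states = array("d", [0.0]) * n
            self.next_states = array("d", [0.0]) * n
        assert len(state) == self.state_dim and len(next_state) == self.state_dim
        base = self.head * self.state_dim
        self.states[base:base + self.state_dim] = array("d", state)
        self.next_states[base:base + self.state_dim] = array("d", next_state)
        self.head = (self.head + 1) % self.capacity    # wrap around when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng=random):
        """Rebuild batched state / next-state rows from the flat storage."""
        d = self.state_dim
        idxs = [rng.randrange(self.size) for _ in range(batch_size)]
        s = [list(self.states[i * d:(i + 1) * d]) for i in idxs]
        ns = [list(self.next_states[i * d:(i + 1) * d]) for i in idxs]
        return s, ns
```

Keeping all states in one contiguous array avoids a table allocation per transition and makes batch reconstruction a matter of slicing.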
(Changed) Schedulers & Trainer
- Trainer.fit now works through a typed FitOptions table (epochs, stepsPerEpoch, onMetric), assigning defaults via a local fitOpts but remaining backwards compatible with previous usage.
(Changed) Small fixes
- nn.BatchNorm1d, replay buffers, and metrics all gained more explicit shape checks, zero-initialization, and assert messages to catch configuration errors earlier.
(Changed) Several small improvements
- Slight performance improvements, especially on heavy ops, compared to previous versions.
- Small Types fix for Tensor.
