Alucard

A small (32M-parameter) text-to-sprite generative model trained with flow matching. It generates 128x128 RGBA sprites from text prompts and can optionally take a reference frame as input to generate animations.

Architecture

Text Prompt --> [Frozen CLIP ViT-B/32] --> 512-dim embedding
                                               |
                                          AdaLN-Zero (every ResBlock)
                                               |
Noise (128x128x4) --cat--> [UNet 32M params] --> Sprite (128x128x4 RGBA)
                    |
Previous Frame (128x128x4) or zeros

UNet with 8-channel input (4 noisy RGBA + 4 reference RGBA), base channels 64, multipliers [1, 2, 4, 4]
AdaLN-Zero conditioning from CLIP text embeddings + sinusoidal timestep embedding
Flow matching (rectified flow) training objective
Dual classifier-free guidance for independent text and reference frame control
Self-attention at 32x32 and 16x16 resolutions
Gradient checkpointing for 16GB GPU training
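
To make the flow matching objective concrete, here is a minimal sketch of how one rectified-flow training pair could be built. It assumes the common convention where t = 0 is pure noise and t = 1 is the clean sprite; the function names are illustrative and not taken from the Alucard code base, and the actual training code may differ.

```python
import random

def rectified_flow_pair(data, noise, t):
    """Build one training pair (x_t, target velocity) on a straight noise-to-data path.

    Assumed convention: t = 0 is pure noise and t = 1 is the clean sprite.
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    target = [d - n for n, d in zip(noise, data)]  # constant along the path
    return x_t, target

def mse(pred, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

# Toy example on a flattened 4-value "sprite"
rng = random.Random(0)
data = [0.5, -0.25, 1.0, 0.0]
noise = [rng.gauss(0.0, 1.0) for _ in data]
t = rng.random()
x_t, target = rectified_flow_pair(data, noise, t)

# A real training step would compare `target` with the network's prediction
# for (x_t, t, text embedding); here we only check the perfect-prediction case.
loss_if_perfect = mse(target, target)
```

The network is trained to predict the velocity `target` from the interpolated sample `x_t`, which is why sampling later reduces to integrating an ODE from noise to image.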

Two Modes

Text to Sprite - generate a sprite from a text prompt alone
Text + Reference to Sprite - generate the next animation frame conditioned on a previous frame and text describing the change
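
Both modes feed the same 8-channel input to the UNet; per the architecture diagram, text-only generation fills the reference half with zeros. A rough sketch of that concatenation using plain nested lists (height x width x channels); `zeros_frame` and `build_input` are hypothetical helpers for illustration only.

```python
SIZE = 128
CHANNELS = 4  # RGBA

def zeros_frame(size=SIZE):
    """An all-zero RGBA frame, used when no reference frame is given."""
    return [[[0.0] * CHANNELS for _ in range(size)] for _ in range(size)]

def build_input(noisy, ref=None):
    """Concatenate noisy and reference frames per pixel: H x W x 4 -> H x W x 8."""
    if ref is None:
        ref = zeros_frame(len(noisy))  # text-to-sprite mode
    return [
        [n_px + r_px for n_px, r_px in zip(n_row, r_row)]
        for n_row, r_row in zip(noisy, ref)
    ]

text_only = build_input(zeros_frame())               # reference channels are zeros
with_ref = build_input(zeros_frame(), zeros_frame())  # reference channels come from a frame
```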

Installation

pip install git+https://github.com/evilsocket/alucard.git

Quick Start

from alucard import Alucard

# Load model (weights are downloaded automatically from Hugging Face)
model = Alucard.from_pretrained("evilsocket/alucard")

# Generate a sprite from text
sprite = model("a pixel art knight sprite, idle pose")
sprite.save("knight.png")

# Generate multiple variations
sprites = model("a pixel art dragon enemy sprite", num_samples=4, seed=42)
for i, s in enumerate(sprites):
    s.save(f"dragon_{i}.png")

Animation: generate frame by frame

# Generate the first frame
frame_1 = model("a pixel art knight sprite, walking right, frame 1")
frame_1.save("walk_01.png")

# Generate subsequent frames using the previous frame as reference
frame_2 = model("a pixel art knight sprite, walking right, frame 2", ref=frame_1)
frame_2.save("walk_02.png")

frame_3 = model("a pixel art knight sprite, walking right, frame 3", ref=frame_2)
frame_3.save("walk_03.png")
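
The frame-by-frame pattern above can be wrapped in a loop. This is a minimal sketch that relies only on the `model(prompt, ref=...)` call shown above; `generate_animation` and the ", frame N" prompt suffix are illustrative choices, not part of the package.

```python
def generate_animation(model, base_prompt, num_frames):
    """Generate frames one at a time, feeding each frame back as the next reference."""
    frames = []
    prev = None
    for i in range(1, num_frames + 1):
        prompt = f"{base_prompt}, frame {i}"
        if prev is None:
            frame = model(prompt)            # first frame: text only
        else:
            frame = model(prompt, ref=prev)  # later frames: text + previous frame
        frames.append(frame)
        prev = frame
    return frames
```

With a loaded model, `generate_animation(model, "a pixel art knight sprite, walking right", 3)` would issue the same three calls as the example above. Because each frame depends on the previous one, errors can accumulate over long sequences.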

Dataset Preparation

1. Prepare sprite images

Place 128x128 RGBA PNG sprites in a directory with .txt caption files:

data/processed/
    sprite_000001.png      # 128x128 RGBA
    sprite_000001.txt      # "a pixel art knight sprite, idle pose"
    sprite_000001.prev.png # optional: previous animation frame
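
Before pre-computing embeddings it can help to verify that the directory matches this layout. The sketch below is not part of the Alucard tooling; it reads only the PNG header with the standard library, and it assumes that `.prev.png` reference frames do not need captions of their own.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_info(path):
    """Return (width, height, color_type) read from the PNG IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return width, height, color_type

def check_dataset(data_dir, size=128):
    """Return (filename, problem) tuples for sprites that break the expected layout."""
    problems = []
    for png in sorted(Path(data_dir).glob("*.png")):
        if png.name.endswith(".prev.png"):
            continue  # assumption: reference frames carry no caption of their own
        if not png.with_suffix(".txt").exists():
            problems.append((png.name, "missing caption"))
        width, height, color_type = png_info(png)
        if (width, height) != (size, size):
            problems.append((png.name, f"size is {width}x{height}"))
        if color_type != 6:  # 6 = truecolor with alpha (RGBA)
            problems.append((png.name, "not RGBA"))
    return problems
```

Calling `check_dataset("data/processed")` returns an empty list when every sprite is a captioned 128x128 RGBA PNG.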

2. Pre-compute CLIP embeddings

alucard-precompute --data-dir data/processed

This creates a .clip.pt file next to each caption, containing its 512-dim CLIP text embedding.

3. Build dataset from public sources (optional)

python scripts/build_dataset.py
python scripts/process_extra_sources.py
python scripts/fix_captions_and_embed.py

Training

alucard-train \
    --data-dir data/processed \
    --output-dir checkpoints \
    --epochs 200 \
    --batch-size 64 \
    --lr 1e-4 \
    --grad-accum 2 \
    --save-every 10 \
    --sample-every 10

Training with Docker

docker build -t alucard .
docker run --gpus all -v "$(pwd)/data:/app/data" -v "$(pwd)/checkpoints:/app/checkpoints" alucard \
    alucard-train --data-dir data/processed --output-dir checkpoints --epochs 200 --batch-size 64

Resume training

alucard-train --data-dir data/processed --resume checkpoints/checkpoint_0050.pt

VRAM usage (with gradient checkpointing)

Batch Size	Peak VRAM
32	~3.7 GB
64	~7.3 GB
96	~10.9 GB
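
The three measurements are close to linear, at roughly 0.11 GB per sample. As a rough aid for picking a batch size on a given GPU, the sketch below fits a line to the table and inverts it. Treat the result as an estimate only: it extrapolates beyond the measured range and ignores allocator fragmentation and other processes on the GPU. The helper names are illustrative.

```python
# (batch size, peak VRAM in GB) pairs from the table above
MEASUREMENTS = [(32, 3.7), (64, 7.3), (96, 10.9)]

def fit_line(points):
    """Ordinary least-squares fit: y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def estimate_max_batch_size(vram_gb, points=MEASUREMENTS, step=8):
    """Largest batch size (rounded down to a multiple of `step`) predicted to fit."""
    slope, intercept = fit_line(points)
    largest = int((vram_gb - intercept) / slope)
    return largest - largest % step

slope, intercept = fit_line(MEASUREMENTS)
print(f"~{slope:.4f} GB per sample, ~{intercept:.2f} GB fixed overhead")
```

In practice it is safer to leave a margin below the estimate than to run at the predicted limit.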

Sampling

# Text-only generation
alucard-sample \
    --checkpoint checkpoints/checkpoint_0200.pt \
    --prompt "a pixel art knight sprite, idle pose" \
    --output knight.png

# Animation: generate next frame from reference
alucard-sample \
    --checkpoint checkpoints/checkpoint_0200.pt \
    --prompt "a pixel art knight sprite, walking, next frame" \
    --ref knight.png \
    --output knight_frame2.png

# Multiple samples
alucard-sample \
    --checkpoint checkpoints/checkpoint_0200.pt \
    --prompt "a pixel art dragon enemy sprite" \
    --num-samples 4 \
    --seed 42

Sampling parameters

Parameter	Default	Description
`--num-steps`	20	Euler ODE integration steps
`--cfg-text`	5.0	Text guidance scale
`--cfg-ref`	2.0	Reference image guidance scale
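
To show how these three parameters interact, here is a simplified sketch of Euler integration combined with two guidance scales. The combination formula follows a common two-condition guidance scheme and is an assumption; the README does not spell out the exact formula Alucard uses, and the function names are hypothetical.

```python
def dual_cfg(v_uncond, v_ref, v_full, cfg_text=5.0, cfg_ref=2.0):
    """Blend three velocity predictions using two guidance scales.

    Assumed inputs: v_uncond (no text, no reference), v_ref (reference only),
    v_full (text + reference).
    """
    return [
        u + cfg_ref * (r - u) + cfg_text * (f - r)
        for u, r, f in zip(v_uncond, v_ref, v_full)
    ]

def euler_sample(velocity_fn, x, num_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 (noise) to t = 1 (sprite)."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy run: a constant velocity field moves the sample along a straight line.
start = [0.0, 0.0]
result = euler_sample(lambda x, t: [1.0, -2.0], start, num_steps=20)
```

In a real sampler, `velocity_fn` would run the network three times per step and pass the results through `dual_cfg`. More steps give a finer integration at proportionally higher cost.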

Model Export

Convert a training checkpoint to safetensors for distribution:

alucard-convert \
    --checkpoint checkpoints/best.pt \
    --output alucard_model.safetensors \
    --half
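
As a back-of-the-envelope check on the exported file size, assuming `--half` stores weights as 16-bit floats (the flag's exact behavior is not documented here) and ignoring file format overhead:

```python
def weights_size_mb(num_params, bytes_per_param):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

NUM_PARAMS = 32_000_000  # "32M parameter" figure from the introduction

full = weights_size_mb(NUM_PARAMS, 4)  # float32
half = weights_size_mb(NUM_PARAMS, 2)  # float16, the assumed meaning of --half
print(f"float32: ~{full:.0f} MB, float16: ~{half:.0f} MB")
```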

License

Released under the FAIR License (Free for Attribution and Individual Rights) v1.0.0.

Non-commercial use (personal, educational, research, non-profit) is freely permitted under the terms of the license.
Commercial use (SaaS, paid apps, any monetization) requires visible attribution to the project and its author. See the license for details.
Business use (any use by or on behalf of a business entity) requires a signed commercial agreement with the author. Contact evilsocket@gmail.com for inquiries.
