This paper introduces a novel methodology for generating fast and memory-efficient video continuations. Our method, dubbed FlowC2S, fine-tunes a pre-trained text-to-video flow model to learn a vector field between the current and succeeding video chunks. Two design choices are key. First, we introduce inherent optimal couplings, using temporally adjacent video chunks during training as a practical proxy for true optimal couplings, which yields straighter flows. Second, we incorporate target inversion, injecting the inverted latent of the target chunk into the input representation to strengthen correspondences and improve visual fidelity. By flowing directly from current to succeeding frames, rather than concatenating the current frames with noise as is common for video continuation, we halve the dimensionality of the model input. The proposed method, fine-tuned from LTXV and Wan, surpasses state-of-the-art methods in quantitative evaluations with FID and FVD, with as few as five neural function evaluations.
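Conceptually, the training objective is a standard flow-matching (rectified-flow) loss in which the coupling is (current-chunk latent, succeeding-chunk latent) rather than (noise, data). The sketch below illustrates this idea with hypothetical names; it is not the repository's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(x0, x1, t, predict_velocity):
    """Flow-matching loss between the current-chunk latent x0 and the
    succeeding-chunk latent x1 (an adjacent-chunk coupling, not noise)."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over latent dims
    xt = (1.0 - t) * x0 + t * x1               # linear interpolant at time t
    target = x1 - x0                           # straight-line velocity target
    pred = predict_velocity(xt, t)
    return np.mean((pred - target) ** 2)

# Toy latents with shape (batch, frames, channels, height, width).
x0 = rng.standard_normal((2, 4, 8, 8, 8))
x1 = rng.standard_normal((2, 4, 8, 8, 8))
t = rng.uniform(size=2)

# A stand-in "model" that outputs the exact straight-line velocity, so the
# loss is zero; a real network would be trained toward this target.
loss = flow_matching_loss(x0, x1, t, lambda xt, t: x1 - x0)
print(loss)  # 0.0
```

Because adjacent chunks are already close in latent space, the learned flow between them is nearly straight, which is what allows sampling with very few function evaluations.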
Clone the repo and create a conda environment from the provided environment.yml:
git clone https://github.com/your-username/flowc2s.git
cd flowc2s
conda env create -f environment.yml -n flowc2s
conda activate flowc2s
export PYTHONPATH=.
python scripts/infer.py \
--transformer_path "./models/FlowC2S-ltxv" \
--logging_dir "./video_continuation_results" \
--exp_name "FlowC2S" \
--num_inference_steps 10 \
--cfg 3.5 \
--num_frames 41 \
--height 256 \
--width 384 \
--downsample_factor 1 \
--pretrained_model_name_or_path "Lightricks/LTX-Video-0.9.5" \
--device "cuda" \
--inference_pipeline_type RawDataInferencePipelineFlowC2S \
--max_num_of_generated_videos 2001 \
--data_path "./datasets/openvid-2k-validation.json" \
--starting_idx 0 \
--individual_videos \
--save_grid

Use the LTXV Trainer for dataset preparation and precomputation.
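During dataset preparation, the inherent optimal couplings come from pairing each video chunk with its temporal successor. A minimal sketch of that pairing, with illustrative names rather than the trainer's actual API:

```python
import numpy as np

def adjacent_chunk_pairs(video, chunk_len):
    """Split a video array (frames, H, W, C) into non-overlapping chunks of
    chunk_len frames and pair each chunk with its temporal successor,
    producing (current, succeeding) training couples."""
    n_chunks = video.shape[0] // chunk_len
    chunks = [video[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]
    return list(zip(chunks[:-1], chunks[1:]))

# A toy 120-frame "video" split into 40-frame chunks yields 3 chunks and
# therefore 2 (current, succeeding) pairs.
video = np.arange(120 * 2 * 2 * 3, dtype=np.float32).reshape(120, 2, 2, 3)
pairs = adjacent_chunk_pairs(video, chunk_len=40)
print(len(pairs))  # 2
```

Each pair serves as one coupling sample: the first element is the source of the flow and the second is its target.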
export PYTHONPATH=.
accelerate launch --mixed_precision bf16 scripts/<training_script>.py \
--pretrained_model_name_or_path "Lightricks/LTX-Video-0.9.5" \
--text_encoder_model_name_or_path "PixArt-alpha/PixArt-XL-2-1024-MS" \
--logging_dir "logs" \
--video_init_dataset_root "path/to/precomputed/initial/data/distribution" \
--video_data_dataset_root "path/to/precomputed/data/data/distribution" \
--validation_init_video_dataset_root "path/to/validation/precomputed/initial/data/distribution" \
--validation_data_video_dataset_root "path/to/validation/precomputed/data/data/distribution" \
--train_dataset_path "" \
--validation_dataset_path "path/to/validation/val.json" \
--video_reshape_mode "center" \
--output_dir "./experiments/FlowC2S" \
--caption_column "caption" \
--video_column "video" \
--tracker_name "FlowC2S" \
--seed 779878798 \
--seed_x1 4324421 \
--mixed_precision bf16 \
--transformer_dtype f32 \
--validation_height 256 --validation_width 384 --fps 25 --validation_num_frames 45 --skip_frames_start 0 --skip_frames_end 0 \
--max_num_frames 145 \
--height_buckets 256 \
--width_buckets 384 \
--frame_buckets 45 \
--frame_rate 25 \
--train_batch_size 64 \
--validation_batch_size 1 \
--max_train_steps 1450 \
--checkpointing_steps 38 \
--validation_steps 38 \
--gradient_accumulation_steps 8 \
--gradient_checkpointing \
--learning_rate 2e-4 \
--lr_scheduler linear \
--lr_warmup_steps 30 \
--lr_num_cycles 1 \
--optimizer adamw \
--adam_beta1 0.9 \
--adam_beta2 0.99 \
--max_grad_norm 1.0 \
--validation_negative_prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
--validation_num_inference_steps 50 \
--validation_guidance_scale 3.5 \
--validation_strength 2.0 \
--report_to wandb \
--dataset_type "precomputed" \
--sigma_sampler_type "ShiftedLogitNormalTimestepSampler" \
--dataloader_num_workers 20 \
--validation_only_caption \
--dist_regularization_prob 0.7 \
--offload

If FlowC2S contributes to your research, please cite the paper:
@article{margaryan2026flowc2s,
  title   = {FlowC2S: Flowing from Current to Succeeding Frames for Fast and Memory-Efficient Video Continuation},
  author  = {Hovhannes Margaryan and Quentin Bammey and Christian Sandor},
  journal = {arXiv preprint arXiv:2604.17625},
  year    = {2026},
}
We thank the authors of Diffusers and LTX-Video-Trainer for their valuable open-source contributions. We also acknowledge the broader open-source ecosystem (e.g., PyTorch, Hugging Face, etc.) that made our research possible.
This repository is under development. If you encounter any issues or have questions, please open a GitHub issue or reach out via email at marg.hovo@gmail.com.