This project implements a video frame prediction and editing model based on InstructPix2Pix and ControlNet. It is designed to work with the Something-Something V2 (SSv2) dataset.
The model takes a sequence of history frames (e.g., 20 frames) and a text prompt (e.g., "Moving something down") to predict the next frame (Target). It utilizes a Temporal Adapter to aggregate historical context and feeds it into a ControlNet that conditions a frozen InstructPix2Pix UNet.
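At a high level, the Temporal Adapter collapses the history clip into a conditioning signal that the ControlNet consumes. The sketch below is illustrative only; the class name, layer choices, and shapes are assumptions, not the actual `model_control_v2.py` implementation:

```python
import torch
import torch.nn as nn

class TemporalAdapterSketch(nn.Module):
    """Illustrative sketch: collapse a (B, T, C, H, W) history clip into a
    single conditioning map that a ControlNet-style branch can consume.
    The real adapter in model_control_v2.py may differ substantially."""
    def __init__(self, in_channels=3, hidden=64, out_channels=3, num_frames=20):
        super().__init__()
        self.proj = nn.Conv2d(in_channels * num_frames, hidden, 3, padding=1)
        self.out = nn.Conv2d(hidden, out_channels, 3, padding=1)

    def forward(self, history):              # history: (B, T, C, H, W)
        b, t, c, h, w = history.shape
        x = history.reshape(b, t * c, h, w)  # stack frames along the channel axis
        return self.out(torch.relu(self.proj(x)))  # conditioning map for the ControlNet
```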
Project layout:

```
.
├── config.yaml          # Main configuration file (create this from the template below)
├── data.py              # SSv2 dataset loader & transforms
├── model_control_v2.py  # Architecture: TemporalAdapter + IP2P_ControlNet
├── preprocess.py        # Data extraction: video to frames + metadata CSV
├── sample.py            # Selects tasks/IDs from the raw SSv2 JSON labels
├── train_control.py     # Main training script (ControlNet + Adapter + LoRA)
├── test_control_v2.py   # Quantitative evaluation (PSNR/SSIM)
├── test_vis.py          # Qualitative evaluation (visual generation)
└── utils_config.py      # Helper to load config.yaml
```
Install the required Python packages. It is recommended to use a virtual environment (Conda or venv).
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers peft accelerate
pip install pandas numpy pillow decord scikit-image matplotlib tqdm pyyaml
```

This project is built on timbrooks/instruct-pix2pix (mirrored as AI-ModelScope/instruct-pix2pix on ModelScope). First, download the pretrained InstructPix2Pix weights from Hugging Face, then place them in a local folder (e.g., `models/instruct-pix2pix`) and update your `config.yaml`:
```yaml
base_model_path: "./models/instruct-pix2pix"
```
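If you prefer to script the download, here is a minimal sketch using the `diffusers` pipeline class (it assumes network access to the Hugging Face Hub; the local output path is just an example):

```python
from diffusers import StableDiffusionInstructPix2PixPipeline

# Download the pretrained InstructPix2Pix weights and save them locally so that
# base_model_path in config.yaml can point to this folder.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix")
pipe.save_pretrained("./models/instruct-pix2pix")
```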
This project uses the Something-Something V2 (SSv2) dataset. The first step filters video IDs from the raw SSv2 JSON labels according to the selected tasks (e.g., `move_object`, `drop_object`).
- Modify the `task_plan` dictionary in `sample.py` to define the tasks (see the illustrative sketch after this list).
- Run: `python sample.py`
- Generated file: `selected_samples_train_test_small.json`
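The exact schema of `task_plan` is defined in `sample.py`; the snippet below is a hypothetical illustration only (the keys, sample counts, and label templates are assumptions and may not match the script):

```python
# Hypothetical example only; match the actual structure expected by sample.py.
task_plan = {
    "drop_object": {
        # SSv2 label templates to keep for this task (illustrative selection).
        "templates": [
            "Dropping something onto something",
            "Dropping something in front of something",
        ],
        # How many clips to sample for training and testing (illustrative values).
        "num_train": 500,
        "num_test": 50,
    },
}
```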
Next, segment each video into history frames and a target frame, and generate the metadata CSVs.

- Ensure that `config.yaml` (or `VIDEO_DIR` in the script) points to your SSv2 video folder.
- Run: `python preprocess.py`
- Output directory structure:

```
processed_dataset/
├── train/
│   └── drop_object/
│       ├── history_images/   (history frames)
│       └── target_images/    (ground-truth frames)
├── metadata_train.csv
└── metadata_test.csv
```
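`preprocess.py` implements the real extraction logic. As a rough illustration of how frames can be pulled from an SSv2 video with `decord`, consider the sketch below; the function name, file layout, and frame-selection rule are assumptions, not the script's actual behavior:

```python
import os
import decord
from PIL import Image

def extract_clip(video_path, out_dir, num_history=20):
    """Illustrative sketch: save the num_history frames preceding the final
    frame as history images, and the final frame as the target image."""
    os.makedirs(out_dir, exist_ok=True)
    vr = decord.VideoReader(video_path)

    target_idx = len(vr) - 1
    history_idx = range(max(0, target_idx - num_history), target_idx)

    for i, idx in enumerate(history_idx):
        Image.fromarray(vr[idx].asnumpy()).save(f"{out_dir}/history_{i:02d}.png")
    Image.fromarray(vr[target_idx].asnumpy()).save(f"{out_dir}/target.png")
```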
All parameters are managed in `config.yaml`. Before running, check that `task_name` and the paths are set correctly.
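`utils_config.py` provides the config loader used by the scripts. For a quick sanity check of your settings you can also read the file directly with PyYAML; the keys printed below are just the ones referenced in this README:

```python
import yaml

# Quick sanity check of config.yaml before training or testing.
with open("config.yaml", "r") as f:
    cfg = yaml.safe_load(f)

# Keys mentioned elsewhere in this README; adjust to your actual config schema.
for key in ("task_name", "base_model_path", "checkpoint_folder"):
    print(key, "=", cfg.get(key))
```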
Start training with `train_control.py`; the script automatically loads its parameters from `config.yaml`.

```bash
python train_control.py
```

Training process overview:
- Load the IP2P UNet and VAE; the VAE and text encoder are kept frozen.
- UNet: fine-tuned with LoRA.
- ControlNet: initialized from a copy of the UNet weights and fully fine-tuned (see the setup sketch after this list).
- Temporal Adapter: processes the past 20 frames and outputs conditioning features to the ControlNet.
- Weights are stored in the `experiments/{task_name}_{resolution}/` directory.
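A condensed sketch of this trainable/frozen split using `diffusers` and `peft` is shown below; the LoRA hyperparameters are assumptions, and the wiring of the Temporal Adapter output into the ControlNet (handled by `model_control_v2.py`) is omitted:

```python
from diffusers import ControlNetModel, StableDiffusionInstructPix2PixPipeline
from peft import LoraConfig

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained("./models/instruct-pix2pix")
unet, vae, text_encoder = pipe.unet, pipe.vae, pipe.text_encoder

# Freeze the VAE, text encoder, and the UNet base weights.
for module in (vae, text_encoder, unet):
    module.requires_grad_(False)

# UNet: attach LoRA adapters; only the newly added LoRA weights are trainable.
unet.add_adapter(
    LoraConfig(r=8, lora_alpha=8, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
)

# ControlNet: initialized from a copy of the UNet weights and fully fine-tuned.
controlnet = ControlNetModel.from_unet(unet)
controlnet.requires_grad_(True)
```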
Use `test_control_v2.py` or `test_vis.py` for inference and evaluation.

```bash
# Metrics only (PSNR/SSIM)
python test_control_v2.py

# Metrics + visualization (recommended)
python test_vis.py
```

Note:
- Ensure that the `checkpoint_folder` (e.g., `checkpoint_epoch_25`) set in `config.yaml` exists in the experiment directory.
- The test script generates a side-by-side comparison of `input` vs `prediction` vs `ground_truth`.
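For reference, PSNR and SSIM can be computed with scikit-image, which is already in the dependency list; the standalone example below uses placeholder image paths and may differ in detail from what `test_control_v2.py` actually does:

```python
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder paths; in practice these come from the test script's outputs.
pred = np.array(Image.open("prediction.png").convert("RGB"))
gt = np.array(Image.open("ground_truth.png").convert("RGB"))

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```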