# 🎨 USO: Unified Style and Subject-Driven Generation

[![GitHub](https://img.shields.io/static/v1?label=GitHub&message=Code&color=green&logo=github)](https://github.com/bytedance/USO)
[![Project Page](https://img.shields.io/badge/Project%20Page-USO-yellow)](https://bytedance.github.io/USO/)
[![arXiv](https://img.shields.io/badge/arXiv%20paper-USO-b31b1b.svg)](https://arxiv.org/abs/2508.18966)
[![Hugging Face Model](https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange)](https://huggingface.co/bytedance-research/USO)

This notebook runs the **USO Gradio Web App** in Google Colab with GPU acceleration.

**USO** is a unified framework for style-driven and subject-driven image generation that can combine any subject with any style in any scenario.

## Features
- **Subject/Identity-driven generation**: Place subjects into new scenes
- **Style-driven generation**: Generate images matching a given style
- **Style-subject driven generation**: Combine content and style references
- **Multi-style generation**: Blend multiple style references

---
⚠️ **Important**: This notebook requires a GPU runtime. Go to **Runtime → Change runtime type → GPU (T4 or higher)**.

## 1. Setup Environment

First, we'll clone the repository and install the required dependencies.

In [None]:
# Clone the USO repository
!git clone https://github.com/bytedance/USO.git
%cd USO

In [None]:
# Install PyTorch with CUDA support
!pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# Install project dependencies
!pip install -r requirements.txt

## 2. Configure Hugging Face Token

You need a Hugging Face token to download the FLUX.1-dev model weights.

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) and create a token
2. Accept the [FLUX.1-dev license](https://huggingface.co/black-forest-labs/FLUX.1-dev)
3. Enter your token below
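
A pasted token sometimes picks up stray whitespace or gets truncated. The sketch below shows one way to catch that before the token is written to disk. It assumes Hugging Face user access tokens begin with `hf_`; the helper name `looks_like_hf_token` is hypothetical, and the check is only a format heuristic, not a validation against the Hugging Face API.

```python
def looks_like_hf_token(token: str) -> bool:
    """Heuristic format check (assumption: tokens start with 'hf_')."""
    cleaned = token.strip()
    # Reject empty input, a missing prefix, and a bare prefix with nothing after it
    if not cleaned.startswith("hf_") or len(cleaned) <= len("hf_"):
        return False
    # Reject tokens with whitespace in the middle (a common paste error)
    return not any(ch.isspace() for ch in cleaned)


# Placeholder value for illustration, not a real token
print(looks_like_hf_token("hf_exampleTokenValue"))
```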

In [None]:
from getpass import getpass
import os

# Securely input your Hugging Face token
HF_TOKEN = getpass("Enter your Hugging Face token: ")
os.environ["HF_TOKEN"] = HF_TOKEN

# Write the .env file
env_content = f"""# Hugging face token
HF_TOKEN={HF_TOKEN}

# Core Flux weights
FLUX_DEV=./weights/FLUX.1-dev/flux1-dev.safetensors
FLUX_DEV_FP8=./weights/FLUX.1-dev/flux1-dev.safetensors
AE=./weights/FLUX.1-dev/ae.safetensors

# Text + vision encoders
T5=./weights/t5-xxl
CLIP=./weights/clip-vit-l14
SIGLIP_PATH=./weights/siglip

# USO LoRA + projector
LORA=./weights/USO/uso_flux_v1.0/dit_lora.safetensors
PROJECTION_MODEL=./weights/USO/uso_flux_v1.0/projector.safetensors
"""

with open(".env", "w") as f:
    f.write(env_content)

print("✅ .env file created successfully!")
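
Because `.env` stores the token in plain text, it may be worth limiting who can read it. This is a minimal sketch, assuming a POSIX file system (as on Colab); `restrict_to_owner` is a hypothetical helper and not part of the USO repository.

```python
import os
import stat


def restrict_to_owner(path: str) -> int:
    """Make the file readable and writable by its owner only; return the new mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to mode 0o600
    return stat.S_IMODE(os.stat(path).st_mode)


# Only touch the file if the previous cell actually created it
if os.path.exists(".env"):
    print(oct(restrict_to_owner(".env")))
```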

## 3. Download Model Weights

This step downloads all required model weights from Hugging Face. It may take several minutes, depending on your connection speed.

**Models being downloaded:**
- FLUX.1-dev (main diffusion model)
- USO LoRA weights and projector
- T5-XXL (text encoder)
- CLIP ViT-L/14 (text encoder)
- SigLIP (vision encoder)

In [None]:
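# Download all required model weights
!python ./weights/downloader.py

# This message prints even if the download failed, so check the log above for errors
print("\n✅ Download step finished. Review the output above for any errors.")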

## 4. Launch the Gradio Web App

Now we'll launch the USO Gradio interface. The app will be accessible via a public URL.

**Options:**
- `--offload`: Sequentially offloads models to the CPU when they are not in use, which reduces VRAM usage
- `--name flux-dev-fp8`: Uses the FP8-quantized model for lower memory usage (~16-18 GB VRAM)

A Colab T4 GPU has 16 GB of VRAM, which is at the low end of that range, so this notebook combines both flags for the most memory-efficient configuration.

In [None]:
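# Launch the Gradio app and expose it through a public localtunnel URL
# Using offload mode for lower VRAM usage on Colab
import subprocess
import sys
import time

# Run the app in the background as a child process
# (a trailing `&` in a `!` command is not reliably supported in notebooks)
app_process = subprocess.Popen(
    [sys.executable, "app.py", "--offload", "--name", "flux-dev-fp8", "--port", "7860"]
)

# Wait for the server to start (model loading may need longer than this)
time.sleep(10)

# Create a public URL using localtunnel
!npm install -g localtunnel
!npx localtunnel --port 7860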
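
A fixed `time.sleep(10)` may be too short when the models take longer to load. One alternative is to poll the port until it accepts connections. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and its default timeout and polling interval are arbitrary choices.

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 600.0,
                  interval: float = 2.0) -> bool:
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait and retry
            time.sleep(interval)
    return False
```

Calling `wait_for_port(7860)` could then stand in for the fixed sleep before the tunnel is started. An open port only shows that the server is listening, so the first request may still be slow.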

### Alternative: Launch with Gradio's Built-in Sharing

If localtunnel doesn't work, you can modify `app.py` to enable Gradio's built-in sharing.

In [None]:
# Alternative method: Enable Gradio sharing directly
# This modifies app.py to use share=True

old_launch = "demo.launch(server_port=args.port)"
new_launch = "demo.launch(server_port=args.port, share=True)"

# Read the original app.py
with open("app.py", "r") as f:
    content = f.read()

if old_launch in content:
    # Modify the launch line to include share=True and write the file back
    with open("app.py", "w") as f:
        f.write(content.replace(old_launch, new_launch))
    print("✅ Modified app.py to enable public sharing")
    print("\n🚀 Now run the cell below to start the app with a public Gradio URL")
elif new_launch in content:
    print("✅ app.py already has public sharing enabled")
else:
    # The expected launch line was not found, so nothing was changed
    print("⚠️ Could not find the expected demo.launch(...) line; edit app.py manually")

In [None]:
# Run the app with Gradio sharing enabled
!python app.py --offload --name flux-dev-fp8

## 📝 Usage Tips

### Model Supports 3 Types of Usage:

**1. Only Content Image (Subject/Identity-driven)**
- Use natural prompts like "A dog on the beach" or "The woman near the sea"
- For style editing: "Transform the image into Ghibli style"

**2. Only Style Image (Style-driven)**
- Upload one or more style reference images
- Use any prompt to generate content in that style

**3. Content + Style Images (Style-subject driven)**
- Layout-preserved: Set prompt to **empty**
- Layout-shifted: Use a natural prompt
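
The three usage types above amount to a small decision rule over which references are supplied and whether the prompt is empty. The sketch below spells that rule out; `usage_mode` and its return strings are hypothetical labels for illustration and are not part of the USO code.

```python
def usage_mode(has_content: bool, num_styles: int, prompt: str) -> str:
    """Name the usage type implied by the supplied references and prompt."""
    has_prompt = bool(prompt.strip())
    if has_content and num_styles > 0:
        # Content + style: an empty prompt keeps the layout, a prompt changes it
        if has_prompt:
            return "style-subject (layout-shifted)"
        return "style-subject (layout-preserved)"
    if has_content:
        return "subject-driven"
    if num_styles > 0:
        return "style-driven"
    # No reference images at all: outside the three types listed above
    return "text-only"


print(usage_mode(has_content=True, num_styles=1, prompt=""))
```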

### Best Practices:
- For portraits, match the framing of the reference image to the prompt: use a half-body close-up for half-body prompts
- Use full-body images when the pose changes significantly
- The model was trained at 1024x1024 resolution

---

⭐ If you find USO helpful, please star the [GitHub repository](https://github.com/bytedance/USO)!

## 🔧 Troubleshooting

### Out of Memory (OOM) Errors
- Make sure you're using the `--offload --name flux-dev-fp8` flags
- Reduce the image resolution (e.g., 768x768 instead of 1024x1024)
- Restart the runtime to clear GPU memory
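
To see why a lower resolution helps, compare pixel counts: a 768x768 image has (768/1024)^2 of the pixels of a 1024x1024 image. The snippet below is a rough sketch of that arithmetic; `pixel_ratio` is a hypothetical helper, and actual memory savings depend on the model and are not measured here.

```python
def pixel_ratio(side: int, reference_side: int = 1024) -> float:
    """Fraction of pixels in a square image relative to a reference square."""
    return (side * side) / (reference_side * reference_side)


print(f"{pixel_ratio(768):.4f}")  # 0.5625
```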

### Model Download Issues
- Verify your Hugging Face token is correct
- Make sure you've accepted the FLUX.1-dev license on Hugging Face
- Check your internet connection
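
After a failed or interrupted download, it can help to check which paths named in `.env` are missing. This is a minimal sketch: it assumes `.env` uses plain `KEY=path` lines as written in section 2 and treats only values starting with `./weights/` as weight paths. `missing_weight_paths` is a hypothetical helper, not part of the USO repository.

```python
from pathlib import Path


def missing_weight_paths(env_text: str, root: str = ".") -> dict[str, str]:
    """Map each KEY to its path for entries under ./weights/ that do not exist."""
    missing = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value.startswith("./weights/") and not (Path(root) / value).exists():
            missing[key.strip()] = value
    return missing


env_file = Path(".env")
if env_file.exists():
    for key, path in missing_weight_paths(env_file.read_text()).items():
        print(f"Missing {key}: {path}")
```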

### GPU Not Detected
- Go to **Runtime → Change runtime type → GPU**
- If no GPU is available, wait and try again later
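
To confirm that the runtime actually exposes a GPU, one option is to call the `nvidia-smi` tool that ships with NVIDIA drivers. This is a minimal sketch, assuming an NVIDIA GPU runtime as on Colab; `gpu_visible` is a hypothetical helper.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one line per detected GPU
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU detected" if gpu_visible() else "No GPU detected")
```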