# Latent Vandalism: The Joy of Productive Damage to Text-to-Image Synthesis Pipelines

**Workshop by Laura Wagner**

🔗 [laurajul.github.io](https://laurajul.github.io/)  
📦 [Workshop Repository](https://github.com/laurajul/latent-vandalism-workshop)

---

## Abstract

Text-to-image models have evolved into sophisticated engines of template culture (Grund and Scherffig): systems trained to reproduce standardized aesthetics. Fatigued by the constant flood of polished results and by the arms race for images benchmarked on visual coherence, commercial value, and consumer-friendliness, this workshop revisits the charm of AI weirdness (Shane): the failures of generative AI and the epistemic value of productive damage.

Drawing inspiration from glitch studies (Menkman), we embrace glitches and artifacts as revelatory moments. By gently violating the consumer-friendly, polished norms meant to please, we surface the model's implicit assumptions about how things are supposed to look. Participants will work directly with Diffusion Transformers (DiT), focusing on the role of **embeddings** in image-text correlation as a prompt travels from embedding space through latent space and into pixel space. Through hands-on meddling with the pipeline, we'll systematically **damage** and **reconfigure** the **semantic substrate** that guides image generation, deliberately perturbing inputs to understand the system's sensitivity and dynamics.

This counterfactual, gently adversarial approach positions productive damage as a research method. Through **iatrogenic techniques** performed on text-to-image models, we probe the layers of technological inscription (Latour) embedded in these systems. The values, design choices, and visual norms inscribed in them become legible where the system breaks down. By deliberately coaxing the models into failure, we trace the contours of what has been encoded into them.

---

## Very High-Level Diagram of the Text-to-Image Pipeline



```mermaid
graph LR
    A[Text Prompt] --> B[Embedding Space]
    B --> C[Latent Space]
    C --> D[Pixel Space]
    
    style A stroke:#e1f5ff
    style B stroke:#fff4e1
    style C stroke:#ffe1f5
    style D stroke:#d4edda
```
---

1. **Embedding Space** (High-dimensional semantic vectors)
   - Where text meaning is encoded numerically from tokens

2. **Latent Space** (Compressed image representation)
   - Where diffusion actually happens
   - Much smaller than pixel space (e.g., 128×128×16 instead of 1024×1024×3)
   - Embeddings guide the denoising process here

3. **Pixel Space** (Final RGB image)
   - The inference result
   - Decoded from latent space by the VAE

---
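
To make the size difference between latent space and pixel space concrete, here is a minimal sketch of the arithmetic. It assumes a VAE that shrinks each spatial side by a factor of 8 and uses 16 latent channels; both numbers are assumptions to check against the model you actually load. `latent_shape` and `compression_ratio` are illustrative helper names, not part of any library.

```python
def latent_shape(height, width, downscale=8, channels=16):
    """Return (height, width, channels) of the latent for a given image size.

    downscale and channels are assumed VAE properties, not values read
    from a real model.
    """
    if height % downscale or width % downscale:
        raise ValueError("image sides must be divisible by the downscale factor")
    return (height // downscale, width // downscale, channels)


def compression_ratio(height, width, downscale=8, channels=16):
    """How many times fewer values the latent holds than the RGB image."""
    lat_h, lat_w, lat_c = latent_shape(height, width, downscale, channels)
    return (height * width * 3) / (lat_h * lat_w * lat_c)


for side in (512, 1024):
    print(side, latent_shape(side, side), compression_ratio(side, side))
```

Under these assumptions a 512×512 image maps to a 64×64×16 latent and a 1024×1024 image to a 128×128×16 latent, so the denoiser works on 12 times fewer values than the final image contains.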

### Workshop Focus:

```mermaid
graph TD
    A[Text Prompt] --> B[Text Encoders]
    B --> C[Embedding Space]
    
    C -.->|WE INTERVENE HERE| D[Modified Embeddings]
    
    D --> E[Diffusion in Latent Space]
    E --> F[VAE Decoder]
    F --> G[Pixel Space / Image]
    
    H[Random Noise] --> E
    
    style C stroke:#fff4e1
    style D stroke:#ffcccc
    style E stroke:#ffe1f5
    style G stroke:#d4edda
```

---


## Key Differences: SD 3.5 vs FLUX-Schnell

| Aspect | SD 3.5 | FLUX-Schnell |
|--------|--------|-------------|
| **Text Encoders** | T5-XXL, CLIP-L, CLIP-G | T5-XXL, CLIP-L |
| **T5 Sequence Length** | 77 tokens | 512 tokens (longer context!) |
| **Pooled Embeddings** | CLIP-L + CLIP-G (2048 dims) | CLIP-L only (768 dims) |
| **Architecture** | MMDiT (Multimodal Diffusion Transformer) | FLUX Transformer |
| **Denoising Process** | Rectified flow (typically 20-50 steps) | Rectified flow, step-distilled (about 4 steps) |
| **Embedding Usage** | Cross-attention + AdaLN modulation | Attention + guidance embedding |
| **Speed** | Slower (more steps) | Faster (fewer steps) |



Also note: FLUX-Schnell does not use negative embeddings.


```mermaid
graph TD
    subgraph "Embedding Vandalism"
    
        A[Original Prompt] --> B[Generate Embeddings]
        B --> C[Save to JSON]
        C --> D{Vandalism Techniques}
        
        D --> E[Scaling]
        D --> F[Inversion]
        D --> G[Contradictory T5 and CLIP]
        D --> H[Replace values with zeros]
        
        E --> J[Damaged Embeddings]
        F --> J
        G --> J
        H --> J
  
    end
    
    subgraph "Effect on Latent Diffusion: Productive Failures"
        J --> K[Shortcut Pipeline]
        
        K --> L[Transformer Attention<br/>with corrupted guidance]
    end
    
    
    style J stroke:#ffcccc
    style L stroke:#ffe1f5
```
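The four techniques in the diagram can be sketched as plain array operations. This is a minimal, dependency-free illustration and not the workshop's own code: embeddings are nested lists instead of tensors, "inversion" is read as a sign flip, and the dictionary keys `"t5"` and `"clip"` as well as all function names are hypothetical.

```python
import json

# An embedding here is a nested list: one row of floats per token.
# In a real pipeline these would be tensors; lists keep the sketch simple.


def scale(embedding, factor):
    """Scaling: multiply every value by a constant factor."""
    return [[value * factor for value in row] for row in embedding]


def invert(embedding):
    """Inversion, read here as a sign flip (an assumption about the technique)."""
    return scale(embedding, -1.0)


def zero_out(embedding):
    """Replace every value with zero while keeping the shape."""
    return [[0.0] * len(row) for row in embedding]


def contradict(bundle_a, bundle_b):
    """Contradictory T5 and CLIP: T5 from one prompt, CLIP from another."""
    return {"t5": bundle_a["t5"], "clip": bundle_b["clip"]}


# Toy embeddings standing in for real encoder output of two prompts.
cat = {"t5": [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]], "clip": [[1.0, 2.0, 3.0]]}
dog = {"t5": [[9.0, 9.0, 9.0], [9.0, 9.0, 9.0]], "clip": [[-1.0, -2.0, -3.0]]}

# Round trip through JSON, mirroring the "Save to JSON" step in the diagram.
restored = json.loads(json.dumps(cat))

damaged = {
    "scaled": scale(restored["t5"], 2.0),
    "inverted": invert(restored["t5"]),
    "zeroed": zero_out(restored["t5"]),
    "contradictory": contradict(cat, dog),
}
```

Each function returns a new structure with the same shape as its input, so a damaged embedding can stand in wherever the original was expected.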

## Summary

### Embedding Dimensions:
- **T5-XXL**: 77 tokens × 4096 dimensions (SD 3.5) / 512 tokens × 4096 (FLUX)
- **CLIP-L**: 77 tokens × 768 dimensions + 768 pooled
- **CLIP-G**: 77 tokens × 1280 dimensions + 1280 pooled (SD 3.5 only)

### How They're Used:
1. **Text embeddings** (concatenated) → Cross-attention in transformer
2. **Pooled embeddings** (concatenated) → Global conditioning (AdaLN/guidance)
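
The concatenation can be checked with a small shape calculation. The sketch below uses the dimensions listed above and makes two assumptions that should be verified against the pipeline code you run: in SD 3.5 the joined CLIP features are zero-padded to the T5 width before being stacked along the token axis, and in FLUX only the T5 sequence reaches attention while CLIP-L contributes its pooled vector alone. `sd35_shapes` and `flux_shapes` are illustrative names.

```python
def sd35_shapes(t5_tokens=77, clip_tokens=77, t5_dim=4096,
                clip_l_dim=768, clip_g_dim=1280):
    """Shapes of the text and pooled conditioning for SD 3.5 (sketch)."""
    # CLIP-L and CLIP-G are joined along the feature axis: 768 + 1280 = 2048.
    clip_dim = clip_l_dim + clip_g_dim
    # Assumption: the joined CLIP features are zero-padded up to the T5 width
    # so both sequences can be stacked along the token axis.
    padded_dim = max(clip_dim, t5_dim)
    return {
        "text": (clip_tokens + t5_tokens, padded_dim),
        "pooled": (clip_dim,),
    }


def flux_shapes(t5_tokens=512, t5_dim=4096, clip_l_dim=768):
    """Shapes of the text and pooled conditioning for FLUX (sketch)."""
    # Assumption: only the T5 sequence is attended to; CLIP-L is pooled only.
    return {"text": (t5_tokens, t5_dim), "pooled": (clip_l_dim,)}


print("SD 3.5:", sd35_shapes())
print("FLUX:  ", flux_shapes())
```

With the defaults, SD 3.5 ends up with a 154×4096 text sequence and a 2048-dimensional pooled vector, while FLUX has a 512×4096 text sequence and a 768-dimensional pooled vector. These shapes are what any damaged embedding has to match before it can be fed back into the pipeline.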

### Workshop Method:
- **Productive damage** as epistemological tool
- **Iatrogenic techniques** to probe system boundaries
- **Glitch aesthetics** as revelatory moments
- **Counterfactual experiments** to understand inscription

### What Vandalism Reveals:
- Direct manipulation bypasses text encoding limitations
- Systematic damage exposes training data biases
- Failures make visible the inscribed norms and assumptions
- AI weirdness provides epistemic value beyond polish
- Each encoder contributes distinct, separable semantic information
- Template culture's boundaries become legible where it breaks

---

**Remember:** The goal isn't to make "better" images; it's to understand what "better" means to these systems, and to find creative freedom in the spaces where that definition breaks down.

## Theoretical Framework: References

This workshop draws on several theoretical traditions:

### Glitch Studies
- **Menkman, Rosa.** *The Glitch Moment(um)*. Network Notebooks, 2011.
  - Glitches as revelatory moments that expose normally invisible structures
  - Productive failures as aesthetic and epistemic resources

### Template Culture
- **Grund, Katja and Scherffig, Lasse.** Work on template culture and standardized aesthetics in generative AI
  - How models reproduce homogeneous visual languages
  - The political economy of aesthetic standardization

### AI Weirdness
- **Shane, Janelle.** Research on AI failures and unexpected behaviors
  - The epistemic value of AI mistakes
  - How failures reveal system structure

### Science and Technology Studies
- **Latour, Bruno.** "Technology is society made durable." *Sociological Review*, 1990.
  - Technological inscription: How values and choices become embedded in systems
  - Making visible the social and political dimensions of technical artifacts

### Iatrogenic Methods
- Medical concept of harm caused by treatment itself, repurposed as deliberate intervention
  - Systematic damage as research methodology
  - Counterfactual reasoning through controlled failures

---

### About This Workshop

🔗 Website: [laurajul.github.io](https://laurajul.github.io/)  
📦 Repository: [github.com/laurajul/latent-vandalism-workshop](https://github.com/laurajul/latent-vandalism-workshop)

For questions, feedback, or collaborations on productive damage to generative AI systems, please reach out via the website or repository.

---

*"The charm of AI weirdness is not just in the strange outputs, but in what those outputs reveal about the system that produced them."*