In [None]:
from experiments.viz.viz_grids import *
# from experiments.viz.guidance_shedulers_shape_visualization import *
import PIL

In [None]:
plot_taxonomy_v2()

# Experiments

## A Core sweep

**Goal**: Broad schedule sweep (standard ranges)

**Setup**: multiple prompts × multiple seeds × monotone/non-monotone schedule families at typical CFG ceilings.

**Hypothesis**:

# Simple schedules with the prompt: "A man riding a horse on mars"

The guidance range here is small: $w_{min} = 1.0$ and $w_{max} = 7.5$, with the baseline being the model's output under constant CFG at $w = 7.5$. Even over such a small range, the schedule has a significant effect on the output: with decreasing schedules the image becomes confused and loses detail, while with increasing schedules the output is clearer.

In [None]:
decreasing_schedule = PIL.Image.open("grids_to_present/man_riding_decrease_schedule_grid.png")
display(decreasing_schedule)

In [None]:
increasing_schedule = PIL.Image.open("grids_to_present/man_riding_increase_schedule_grid.png")
display(increasing_schedule)

We also observe similar outputs across different schedule families when the direction and range are the same, suggesting that curvature is a second-order effect compared to direction and range. The exception is the parabolic schedule, which becomes confused even in the increasing direction.
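
To make "family", "direction", and "range" concrete, here is a minimal sketch of how such schedules could be generated. The shape formulas, the steepness constants, and the names `_shape` and `guidance_schedule` are assumptions for illustration only; they are not taken from `SCHEDULER_PROPERTIES` or from the code imported above, and the parabolic family is omitted because its definition is not shown here.

```python
import math

def _shape(kind: str, s: float) -> float:
    """Map sampling progress s in [0, 1] to a normalized weight in [0, 1].

    Hypothetical shapes; the real schedule families may be defined differently.
    """
    if kind == "linear":
        return s
    if kind == "cosine":
        return 0.5 * (1.0 - math.cos(math.pi * s))
    if kind == "exponential":
        k = 5.0  # assumed steepness
        return math.expm1(k * s) / math.expm1(k)
    if kind == "sigmoid":
        k = 10.0  # assumed steepness
        lo = 1.0 / (1.0 + math.exp(k * 0.5))
        hi = 1.0 / (1.0 + math.exp(-k * 0.5))
        raw = 1.0 / (1.0 + math.exp(-k * (s - 0.5)))
        return (raw - lo) / (hi - lo)  # rescale so the endpoints hit 0 and 1
    raise ValueError(f"unknown schedule kind: {kind}")

def guidance_schedule(kind, direction, w_min, w_max, num_steps):
    """Return one guidance weight per sampling step (first step first)."""
    weights = []
    for i in range(num_steps):
        s = i / (num_steps - 1) if num_steps > 1 else 0.0
        u = _shape(kind, s)
        if direction == "decreasing":
            u = 1.0 - u  # start at w_max and end at w_min
        weights.append(w_min + (w_max - w_min) * u)
    return weights

# Same direction and range, different curvature.
for kind in ("linear", "cosine", "exponential", "sigmoid"):
    w = guidance_schedule(kind, "increasing", 1.0, 7.5, 25)
    print(kind, [round(x, 2) for x in w[::6]])
```

Under these assumptions every family shares the same endpoints and direction and differs only in how quickly it moves between them, which is the "curvature" referred to above.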

In [None]:
EXPERIMENT = "EXP_A_CORE_SWEEP"

kinds = list(SCHEDULER_PROPERTIES.keys())
directions = ["increasing", "decreasing"]
weight_ranges = [(1.0, 7.5)]
seeds = [42, 1337, 2000]
num_steps = [25]


prompts = {
    "P_DETAIL": "A close-up portrait of a cyberpunk robot with intricate clockwork gears and neon eyes, macro photography",
    "P_COMP":   "A symmetrical wide shot of a lone tree in a snowy field, Wes Anderson style, pastel colors",
    "P_TEXT":   "A wooden sign hanging on a door that says 'CLOSED' clearly carved into the wood"
}

spec = ExperimentSpec(
    experiment_group=EXPERIMENT,
    kinds=kinds,
    directions=directions,
    weight_ranges=weight_ranges,
    prompts=prompts,
    seeds=seeds,
    num_steps=num_steps,
    metadata={"notes": "baseline sweep", "model": "SD1.5"},
    num_workers=3
)

df_a = build_and_write(spec, f"{EXPERIMENT}_plan.csv")
df_a

Observations:
1. For monotone schedules, outputs are largely similar across families when direction and range are matched, suggesting that schedule curvature is a second-order effect compared to direction and range.
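
The sweep above is a full Cartesian product of its axes. The internals of `ExperimentSpec` and `build_and_write` are not shown in this notebook, so the following is only a sketch of how such a plan could be enumerated; `enumerate_plan`, the row keys, the two-entry `kinds` list, and the placeholder prompt texts are all hypothetical.

```python
from itertools import product

def enumerate_plan(kinds, directions, weight_ranges, prompts, seeds, num_steps):
    """Return one row (dict) per combination of the sweep axes."""
    rows = []
    for kind, direction, (w_min, w_max), (prompt_id, prompt), seed, steps in product(
        kinds, directions, weight_ranges, prompts.items(), seeds, num_steps
    ):
        rows.append({
            "kind": kind,
            "direction": direction,
            "w_min": w_min,
            "w_max": w_max,
            "prompt_id": prompt_id,
            "prompt": prompt,
            "seed": seed,
            "num_steps": steps,
        })
    return rows

rows = enumerate_plan(
    kinds=["linear", "cosine"],  # placeholder; the notebook uses SCHEDULER_PROPERTIES.keys()
    directions=["increasing", "decreasing"],
    weight_ranges=[(1.0, 7.5)],
    prompts={"P_DETAIL": "prompt text A", "P_COMP": "prompt text B", "P_TEXT": "prompt text C"},
    seeds=[42, 1337, 2000],
    num_steps=[25],
)
print(len(rows))  # 2 kinds * 2 directions * 1 range * 3 prompts * 3 seeds * 1 step count = 36
```

The row count grows multiplicatively with each axis, which is why the core sweep keeps a single weight range and a single step count.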

## B Fidelity-Alignment Tradeoff: on visually verifiable features

## Numerical


---

### Experiment 2: Constraint Satisfaction and Fidelity Trade-offs

**1. Experimental Setup**
To evaluate the impact of time-dependent guidance on constraint satisfaction (specifically numeric counting) and overall image fidelity, we tested two prompts demanding exact quantities:

* "masterpiece, best quality, 3 red apples on a wooden table, studio lighting, sharp focus"


* "masterpiece, best quality, 5 ceramic vases on a minimalist shelf, distinct shapes, clean shadows"



We evaluated two random seeds (42, 1337) across five schedules: baseline (constant), cosine, exponential, linear, and sigmoid. We tested both increasing and decreasing $w(t)$ functions across four guidance ranges: Low (1.0–2.0), Standard (1.0–12.0), High (7.0–50.0), and Extreme (7.0–100.0).
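
For reference, classifier-free guidance combines the unconditional and conditional noise predictions as $\hat{\epsilon} = \epsilon_u + w\,(\epsilon_c - \epsilon_u)$, and a time-dependent schedule replaces the constant $w$ with $w(t)$. The sketch below illustrates this with plain Python lists standing in for latent tensors. The names `GUIDANCE_RANGES` and `guided_noise` are hypothetical, and the linear ramp is just one of the schedules listed above.

```python
# The four guidance ranges (w_min, w_max) used in this experiment.
GUIDANCE_RANGES = {
    "low": (1.0, 2.0),
    "standard": (1.0, 12.0),
    "high": (7.0, 50.0),
    "extreme": (7.0, 100.0),
}

def guided_noise(eps_uncond, eps_cond, w):
    """Standard CFG combination: eps_u + w * (eps_c - eps_u), element-wise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy predictions standing in for the denoiser outputs at one step.
eps_u = [0.0, 1.0]
eps_c = [1.0, 3.0]

w_min, w_max = GUIDANCE_RANGES["standard"]
num_steps = 5
for i in range(num_steps):
    s = i / (num_steps - 1)          # progress from first to last sampling step
    w_t = w_min + (w_max - w_min) * s  # linear increasing schedule
    print(i, round(w_t, 2), guided_noise(eps_u, eps_c, w_t))
```

With $w = 1$ the result equals the conditional prediction, so a range starting at 1.0 begins with no extra guidance, while a range starting at 7.0 is strongly guided from the first step.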


In [None]:
import PIL
num_app_dec_sch_42 = PIL.Image.open("grids_to_present/Numeric_grids/APPLES_COUNT_decreasing_42.png")
display(num_app_dec_sch_42)

In [None]:
num_app_dec_sch_1337 = PIL.Image.open("grids_to_present/Numeric_grids/APPLES_COUNT_decreasing_1337.png")
display(num_app_dec_sch_1337)

In [None]:
num_app_inc_sch_42 = PIL.Image.open("grids_to_present/Numeric_grids/APPLES_COUNT_increasing_42.png")
display(num_app_inc_sch_42)

In [None]:
num_app_inc_sch_1337 = PIL.Image.open("grids_to_present/Numeric_grids/APPLES_COUNT_increasing_1337.png")
display(num_app_inc_sch_1337)

In [None]:
num_vase_dec_sch_42 = PIL.Image.open("grids_to_present/Numeric_grids/VASE_COUNT_decreasing_42.png")
display(num_vase_dec_sch_42)

In [None]:
num_vase_dec_sch_1337 = PIL.Image.open("grids_to_present/Numeric_grids/VASE_COUNT_decreasing_1337.png")
display(num_vase_dec_sch_1337)

In [None]:
num_vase_inc_sch_42 = PIL.Image.open("grids_to_present/Numeric_grids/VASE_COUNT_increasing_42.png")
display(num_vase_inc_sch_42)

In [None]:
num_vase_inc_sch_1337 = PIL.Image.open("grids_to_present/Numeric_grids/VASE_COUNT_increasing_1337.png")
display(num_vase_inc_sch_1337)


**2. Results by Guidance Range**

* **Low Guidance Range (1.0 – 2.0):**

**Baseline:** Images exhibited poor fidelity, appearing blurry or washed out with gray tones. Prompt alignment failed completely; the model over-generated objects, producing 5 apples instead of 3, and 6 to 9 vases instead of 5.



**Decreasing:** Results closely mirrored the low-quality baseline.



**Increasing:** These schedules drastically reduced object counts, often dropping to just 1 apple or 4 vases, and produced weird, mixed shapes or giant objects while remaining blurry.




* **Standard Guidance Range (1.0 – 12.0):**

**Baseline (CFG = 12):** This range produced the best fidelity, yielding sharp, high-quality images. However, strict counting constraints still failed; the model generated 3 to 5 apples and 7 to 8 vases.



**Decreasing:** Handled the prompt very similarly to the baseline, maintaining good quality.



**Increasing:** Exhibited delayed fidelity convergence. The exponential and sigmoid schedules remained very blurry. The linear and cosine schedules began to achieve sharpness and were better at forming distinct vase shapes than in the lower range.




* **High Guidance Range (7.0 – 50.0):**

**Baseline (CFG = 50):** Pushing CFG this high caused significant fidelity degradation. Images featured glowing colors, cyan artifacts, extremely blurry backgrounds (e.g., a "big sun"), and distorted vases. Object counts remained inaccurate (e.g., 4 apples, 6 to 7 vases).



**Decreasing:** Largely inherited the severe distortions, color mixing, and artifacts of the CFG 50 baseline. The exponential and linear schedules occasionally mitigated some of the background blur or color mixing.



**Increasing:** **Notably, these schedules acted as a strong regularizer.** The results looked surprisingly similar to the much lower CFG 12 baseline, producing nice arrangements of 5 apples or 7 to 8 vases. The primary trade-off was minor background distortion, such as the introduction of water drops or "brownie blackie" artifacts, rather than total color collapse.




* **Extreme Guidance Range (7.0 – 100.0):**

Across the board, a $w_{max}$ of 100 resulted in severe fidelity degradation. However, it was only at this extreme baseline (CFG = 100) and its corresponding decreasing schedules that the model finally satisfied the strict numeric constraint of exactly "3 red apples" for one of our seeds. This constraint satisfaction came at the absolute cost of image quality, producing heavily distorted shapes, deep color bleeding (e.g., images turning entirely red, white background logos), and strange artifacts. Increasing schedules similarly failed to rescue the generation, yielding highly distorted images with water drop artifacts.




**3. The Fidelity vs. Alignment Trade-off** <br>
The empirical results clearly illustrate the severe trade-off between constraint satisfaction and image fidelity. While moderate guidance ranges (like CFG 12) produced the highest quality and sharpest images, they consistently failed strict numeric constraints, over-generating or under-generating objects.

We observed that time-dependent guidance modulates this trade-off curve. Decreasing schedules heavily anchor to the initial high guidance; if $w_{max}$ is pushed to extremes (e.g., 100), the model can occasionally force the correct object count, but the early structural distortions become baked in, and the subsequent lower guidance cannot recover the fidelity.

Conversely, increasing schedules show that easing into higher guidance weights can bypass the severe color burn and structural collapse seen in constant high CFG, effectively mirroring lower CFG layouts (like CFG 12) even when reaching a $w_{max}$ of 50. However, this smoother convergence fails to enforce the strict numeric counts that the high CFG was intended to achieve. Ultimately, achieving absolute constraint alignment (e.g., exactly 3 apples) forces a total sacrifice of realism, suggesting that numeric counting remains a fundamental architectural limitation that time-dependent CFG alone cannot seamlessly fix.

---


## Experiment 3: Constraint Satisfaction (Color Binding) vs. Fidelity

**Objective:** To evaluate how dynamically changing the Classifier-Free Guidance scale, $w(t)$, over different intervals affects the diffusion model's ability to strictly bind colors to distinct objects (prompt alignment) without introducing visual artifacts, color bleeding, or structural collapse.

### Prompts

* "A green sports car parked next to a red fire hydrant."
* "A solid yellow coffee mug resting on a bright blue book."
* "A man wearing a bright purple beanie and a neon orange winter jacket."
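
The grids for each experiment are stored as PNG files in one directory per experiment, so they can also be shown in a loop rather than one variable per file. This is a minimal sketch, assuming Pillow and IPython are available as in the rest of this notebook; `list_grids` and `show_grids` are hypothetical helper names, not part of `experiments.viz`.

```python
from pathlib import Path

def list_grids(directory, pattern="*.png"):
    """Return the grid image paths in `directory`, sorted by file name."""
    return sorted(Path(directory).glob(pattern))

def show_grids(directory, pattern="*.png"):
    """Display every matching grid inline (notebook use only)."""
    # Imported lazily so that list_grids works without Pillow or IPython.
    from PIL import Image
    from IPython.display import display

    for path in list_grids(directory, pattern):
        print(path.name)
        display(Image.open(path))

# Example (inside the notebook):
# show_grids("grids_to_present/Color_grids")
```

Sorting by file name keeps the decreasing and increasing grids of the same prompt next to each other, provided the files follow a consistent naming pattern.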

In [None]:
### Present all the grids in the "grids_to_present/Color_grids" directory

In [None]:
col_car_dec_sch_42 = PIL.Image.open("grids_to_present/Color_grids/CAR_HYDRANT_decreasing_42.png")
display(col_car_dec_sch_42)
col_car_dec_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/CAR_HYDRANT_decreasing_1337.png")
display(col_car_dec_sch_1337)
col_car_inc_sch_42 = PIL.Image.open("grids_to_present/Color_grids/CAR_HYDRANT_increasing_42.png")
display(col_car_inc_sch_42)
col_car_inc_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/CAR_HYDRANT_increasing_1337.png")
display(col_car_inc_sch_1337)


In [None]:
col_mug_dec_sch_42 = PIL.Image.open("grids_to_present/Color_grids/MUG_BOOK_grid_decreasing_vs_weight_range_seed_42.png")
display(col_mug_dec_sch_42)
col_mug_dec_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/MUG_BOOK_grid_decreasing_vs_weight_range_seed_1337.png")
display(col_mug_dec_sch_1337)
col_mug_inc_sch_42 = PIL.Image.open("grids_to_present/Color_grids/MUG_BOOK_grid_increasing_vs_weight_range_seed_42.png")
display(col_mug_inc_sch_42)
col_mug_inc_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/MUG_BOOK_grid_increasing_vs_weight_range_seed_1337.png")
display(col_mug_inc_sch_1337)


In [None]:
col_jacket_dec_sch_42 = PIL.Image.open("grids_to_present/Color_grids/HAT_JACKET_decreasing_vs_weight_range_seed_42.png")
display(col_jacket_dec_sch_42)
col_jacket_dec_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/HAT_JACKET_decreasing_1337.png")
display(col_jacket_dec_sch_1337)
col_jacket_inc_sch_42 = PIL.Image.open("grids_to_present/Color_grids/HAT_JACKET_increasing_42.png")
display(col_jacket_inc_sch_42)
col_jacket_inc_sch_1337 = PIL.Image.open("grids_to_present/Color_grids/HAT_JACKET_increasing_1337.png")
display(col_jacket_inc_sch_1337)



#### 1. Low Guidance Range: $w(t) \in [1.0, 2.0]$

At consistently low guidance scales, the model struggles to establish clear shapes or distinct colors, prioritizing an overly smooth but semantically weak generation.

* **Decreasing Schedules:** Results across all seeds are blurry, with objects either merging or remaining abstract.

* **Color Binding:** Colors fail to bind to specific subjects, resulting in split, blurry color patches in the background, such as blurry orange shapes replacing the jacket and beanie.

* **Increasing Schedules:** The outputs are similarly distorted and very blurry, often failing to form coherent objects like the fire hydrant or distinct facial features.



#### 2. Standard Guidance Range: $w(t) \in [1.0, 12.0]$

This range highlights the stark difference in generation quality between the directionality of the schedules.

* **Baseline ($w = 7.5$):** The baseline achieves sharp imagery but frequently fails the color binding constraint, mixing colors (e.g., generating a closed purple jacket instead of orange) or hallucinating split books.

* **Decreasing Schedules:** These schedules effectively delay color collapse and often yield better structural fidelity than the baseline, creating a completely yellow mug and more sport-like cars.

* **Increasing Schedules:** Conversely, increasing schedules in this range fail to resolve the latent effectively, resulting in blurry, distorted shapes that strongly resemble the poor outputs of the $[1.0, 2.0]$ range.



#### 3. High Guidance Range: $w(t) \in [7.0, 50.0]$

Pushing $w_{max}$ to 50.0 forces the model to heavily prioritize the prompt, but it introduces severe tension with image fidelity.

* **Decreasing Schedules:** This direction causes massive color bleeding and unphotorealistic artifacts.

* **Color Bleed:** The prompt constraint overpowers the spatial boundaries, causing purple to bleed onto lips, the beanie to shift to orange-yellow, and backgrounds to glow with intense yellow or distort into bright blue shapes.

* **Increasing Schedules:** Interestingly, increasing schedules handle the extreme upper bound much better, maintaining nice results similar to the baseline but with increased sharpness.



#### 4. Extreme Guidance Range: $w(t) \in [7.0, 100.0]$

At this extreme, the CFG weight entirely shatters image fidelity, resulting in deep structural collapse across nearly all configurations.

* **Constant High Bounds ($w_{max} = 100.0$):** Results are heavily distorted, reducing objects to abstract cyan books or abstract car shapes with police-style red lights.

* **Decreasing Schedules:** The color binding fails completely, generating surreal artifacts such as glowing cyan eyes, black mugs, and glowing yellow backgrounds.

* **Increasing Schedules:** While slightly retaining the structure of the $[7.0, 50.0]$ range, these outputs suffer from messy, glowing backgrounds and unprompted artifacts like water drops covering the image.



**Summary:** Decreasing schedules in standard ranges $[1.0, 12.0]$ offer the best strict color binding while preserving fidelity. However, when forcing high constraint satisfaction at elevated ranges ($w_{max} \ge 50.0$), increasing schedules significantly delay color collapse and bleeding compared to decreasing counterparts.

---



## Experiment 4: Constraint Satisfaction (Spatial Relations) vs. Fidelity

This experiment evaluates the capacity of time-dependent guidance schedules to enforce complex spatial constraints—specifically vertical stacking and inversion—across three distinct prompts and two seeds ($42, 1337$). The primary focus is the trade-off between **spatial adherence** (correctly placing objects in relation to one another) and **image fidelity** (photorealism and structural integrity).

---
### Prompts

* "A modern sports car parked perfectly balancing on the pitched roof of a suburban house."
* "A small wooden chair placed directly on top of a dining table."
* "A large, heavy couch suspended completely upside down from a living room ceiling."


In [None]:
### Present all the grids in the "grids_to_present/Spatial_grids" directory
import PIL

In [None]:
spatial_car_dec_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/CAR_ROOF_decreasing_42.png")
display(spatial_car_dec_sch_42)
spatial_car_dec_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/CAR_ROOF_decreasing_1337.png")
display(spatial_car_dec_sch_1337)
spatial_car_inc_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/CAR_ROOF_increasing_42.png")
display(spatial_car_inc_sch_42)
spatial_car_inc_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/CAR_ROOF_increasing_1337.png")
display(spatial_car_inc_sch_1337)


In [None]:
spatial_chair_dec_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/CHAIR_TABLE_decreasing_42.png")
display(spatial_chair_dec_sch_42)
spatial_chair_dec_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/CHAIR_TABLE_decreasing_1337.png")
display(spatial_chair_dec_sch_1337)
spatial_chair_inc_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/CHAIR_TABLE_increasing_42.png")
display(spatial_chair_inc_sch_42)
spatial_chair_inc_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/CHAIR_TABLE_increasing_1337.png")
display(spatial_chair_inc_sch_1337)


In [None]:
spatial_couch_dec_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/COUCH_CEILING_decreasing_42.png")
display(spatial_couch_dec_sch_42)
spatial_couch_dec_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/COUCH_CEILING_decreasing_1337.png")
display(spatial_couch_dec_sch_1337)
spatial_couch_inc_sch_42 = PIL.Image.open("grids_to_present/Spatial_grids/COUCH_CEILING_increasing_42.png")
display(spatial_couch_inc_sch_42)
spatial_couch_inc_sch_1337 = PIL.Image.open("grids_to_present/Spatial_grids/COUCH_CEILING_increasing_1337.png")
display(spatial_couch_inc_sch_1337)



### 1. Low Guidance Range: CFG [1.0, 2.0]

At this near-baseline range, the model consistently fails to satisfy spatial constraints, prioritizing global coherence over prompt-specific positioning.

* **Spatial Composition:** Objects remain in their conventional, "gravity-compliant" positions. For example, the sports car is parked in the yard and the chair is placed alongside the table.


* **Fidelity:** Results are universally **blurry and indistinct** across all schedules (Decreasing, Increasing, and Constant).

* **Schedule Impact:** Increasing schedules show slightly more distortion in ceiling structures compared to decreasing ones, but neither succeeds in triggering the requested spatial "anomaly".



### 2. Standard Guidance Range: CFG [1.0, 12.0]

This range approximates the baseline ($7.5$) and represents the threshold where the model begins to attempt spatial repositioning, though often unsuccessfully.

* **Spatial Composition:** The model struggles with "merging." In the chair-on-table prompt, "middle versions" of objects appear, where a stool attempts to complete itself into a chair but remains ground-bound. For the couch prompt, the model often generates a sofa hanging like a hammock rather than being "completely upside down".


* **Fidelity:** Sharpness improves significantly compared to the Low range.

* **Schedule Impact:** **Increasing schedules** (Linear and Triangular) show the first signs of successful spatial shifting, such as a blue car appearing on a roof (Seed 1337), whereas **Decreasing schedules** tend to mirror the baseline's failure to move objects out of their standard positions.



### 3. High Guidance Range: CFG [7.0, 50.0]

As guidance increases, a sharp divergence emerges between spatial intent and aesthetic quality.

* **Spatial Composition:** This range shows the most "effort" toward constraint satisfaction. In the couch prompt, Linear and Triangular increasing schedules produce results that look "more upside-down" than the baseline. However, this often leads to **spatial merging**, where the table and chair fuse into a single distorted, weird shape.


* **Fidelity:** Photorealism begins to degrade. Images adopt a "90s brown style" or a "painty" aesthetic.


* **Schedule Impact:** **Increasing schedules** preserve image structure better, yielding "sharp and sunny" lighting. Conversely, **Decreasing schedules** and high **Max-Constants** ($50.0$) result in significant artifacts, such as cars becoming "silver blobs" or furniture becoming unidentifiable.



### 4. Extreme Guidance Range: CFG [7.0, 100.0]

At the highest guidance levels, the pressure to satisfy spatial constraints completely overwhelms the model’s ability to maintain a coherent image.

* **Spatial Composition:** While there are more aggressive "attempts" to flip the couch or stack the chair, the objects lose their semantic identity. A lamp might morph into a "floating backrest" or a "duck-shape" rather than staying a separate object.

* **Fidelity:** Images are described as **unphotorealistic, distorted, and "wired"**. Extreme guidance often introduces "drops of light dots" or "color splashing" that obliterates fine detail.

* **Schedule Impact:** **Decreasing schedules** at this range are catastrophic for fidelity, producing abstract, "painty" results. **Increasing schedules** act as a minor buffer, maintaining more sharpness and "attempting" the spatial rule longer, but ultimately fail to produce a usable, realistic image.



---

### Summary of Trade-offs

| Range | Spatial Success | Image Fidelity | Observation |
| --- | --- | --- | --- |
| **Low** | None | Very Low (Blurry) | Insufficient guidance for any structure. |
| **Standard** | Minimal | High (Sharp) | Objects remain in default, "logical" positions. |
| **High** | Moderate | Moderate (Painty) | Spatial rules are attempted but cause object merging. |
| **Extreme** | High (Attempted) | Very Low (Distorted) | Constraint satisfaction destroys semantic integrity. |

**Conclusion:** **Increasing schedules** (Linear/Triangular) are superior for preserving structural clarity while attempting spatial constraints. However, even with optimized schedules, the model exhibits a hard "spatial-fidelity ceiling": it would rather distort the physical form of an object (merging a chair into a table) than violate the learned probabilistic "grounding" of that object in a scene.


In [None]:
##

### Spatial

### Color

In [None]:
...

# C