# Image Stereoscopic Reconstruction (Pipeline)

In this notebook, we explore image translation: obtaining a frontal view of an architectural object from its lateral view, with optional image enhancements (adding new details, inpainting, etc.).
To achieve this, we use an attention-based, Chain-of-Thought (CoT) driven generative process that couples an LLM with a conditional Latent Diffusion Model (LDM; in our example, Qwen 2.5 Image Edit).

## Setup

In [None]:
%pip install -r requirements.txt

In [None]:
import base64
import json
import ollama
import chromadb
from utility.vllm_utils import VLLMUtils
from utility.imagellm_utils import ImageLLMUtils
from utility.vector_store_utils import VectorStoreUtils

### Ollama

We use [Ollama](https://ollama.com/), a local LLM orchestrator.
Feel free to experiment with other vision models of your choice ([list of available ones](https://ollama.com/search?c=vision)).

In [None]:
vllmUtils = VLLMUtils()
vllmUtils.ollama_utils.ollama_init()

### Vector Store

We use [ChromaDB](https://www.trychroma.com/), a lightweight, easy-to-set-up in-memory vector store.
Documentation can be found [here](https://docs.trychroma.com/docs/overview/getting-started).

In [None]:
chromaUtils = VectorStoreUtils()

We add two images to the vector store; they are used later to enrich the details of the final image.

In [None]:
documents = [
    {
        "caption":"A statue of St. Eustace, patron saint of Matera, suited in armor with a golden plume, holding a spear upright in its right hand.",
        "base64":"assets/stEustace.jpg"
    },
    {
        "caption":"A statue of St. Vitus, suited in light armor and a red cape, holding a silver cross in his left hand, followed by two dogs of the same breed, one brown and one black.",
        "base64":"assets/stVitus.jpg"
    }
]

chromaUtils.addToCollection(documents)

#### Example of querying

In [None]:
result = chromaUtils.query("This is a query document about a saint followed by dogs")
print(json.loads(result))
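
To build intuition for what a caption query does, here is a minimal, dependency-free sketch of caption retrieval. It is a toy stand-in, not how ChromaDB or `VectorStoreUtils` work internally: ChromaDB compares learned embeddings, whereas this sketch compares bag-of-words vectors. The captions are abridged from the `documents` list, and `STOPWORDS`, `bag_of_words`, `cosine` and `toy_query` are hypothetical names introduced only for illustration.

```python
import math
import re
from collections import Counter

# Abridged captions keyed by image path (hypothetical toy data mirroring `documents`).
CAPTIONS = {
    "assets/stEustace.jpg": "A statue of St. Eustace in armor with a golden plume, holding a spear",
    "assets/stVitus.jpg": "A statue of St. Vitus in light armor and a red cape, followed by two dogs",
}

# Very small stopword list so that filler words do not dominate the score.
STOPWORDS = {"a", "of", "in", "and", "with", "the", "is", "this", "by"}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens, skipping stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_query(query, captions=CAPTIONS):
    """Return the image path whose caption is most similar to the query."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(caption)), path) for path, caption in captions.items()]
    return max(scored)[1]


print(toy_query("This is a query document about a saint followed by dogs"))
```

A real embedding model also matches paraphrases (for example "saint" against "St."), which this word-overlap sketch cannot do.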

## Phase 1: Perspective Change

For this step we use Qwen 2.5 Image Edit, an **instruction-driven** T2I model capable of image generation, image editing and in-context image editing, on top of a CoT-LLM infrastructure.

### Setup

We clone the public Qwen-Image repository; the model itself is published on the [Qwen Hugging Face page](https://huggingface.co/Qwen/Qwen-Image-Edit).

In [None]:
!git clone https://github.com/QwenLM/Qwen-Image.git ./models/

In [None]:
%pip install git+https://github.com/huggingface/diffusers

In [None]:
import os
from PIL import Image

In [None]:
imageLLMUtils = ImageLLMUtils()
imageLLMUtils.pipeline_init()

### Inference

#### Solid Rotation

We first crop a region of interest (ROI) containing only the monument, which gives better results for the rotation task.

In [None]:
image_full = Image.open("assets/eustachian_monument.jpeg").convert("RGB")
image_roi = image_full.crop((200, 500, 800, 1100))
image_roi 
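
The crop box passed to `Image.crop` is a `(left, upper, right, lower)` tuple in pixel coordinates. To our knowledge, Pillow does not raise an error when the box extends past the image borders, so a wrong box can go unnoticed. The sketch below shows one way to validate a box before cropping; `clamp_box` is a hypothetical helper, not part of Pillow or of the `utility` package.

```python
def clamp_box(box, size):
    """Clamp a (left, upper, right, lower) box to an image of size (width, height).

    Raises ValueError when the box does not overlap the image at all.
    """
    left, upper, right, lower = box
    width, height = size
    left, right = max(0, left), min(width, right)
    upper, lower = max(0, upper), min(height, lower)
    if left >= right or upper >= lower:
        raise ValueError(f"box {box} does not overlap an image of size {size}")
    return (left, upper, right, lower)


# The 1024x1024 size is only an example; in the notebook it would be `image_full.size`.
print(clamp_box((200, 500, 800, 1100), (1024, 1024)))
```

In the notebook this could be used as `image_full.crop(clamp_box((200, 500, 800, 1100), image_full.size))`.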

In [None]:
rotated = imageLLMUtils.generate(
    prompt="Rotate the main subject, so it looks in front view.",
    images=[image_roi],  # use the cropped ROI prepared in the previous cell
    output_filename="examples/rotated"
)

In [None]:
rotated

## Phase 2: Inpainting

In this phase we use a negative prompt, because the task is complex and needs stable, predictable outputs.
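
As background, many diffusion pipelines apply a negative prompt through classifier-free guidance: at each denoising step the model makes one prediction conditioned on the prompt and one conditioned on the negative prompt, and the result is pushed away from the latter. The sketch below illustrates that combination on plain lists of numbers. It is a simplified illustration, and whether `ImageLLMUtils.generate` forwards `neg_prompt` in exactly this way is an assumption; `guided_prediction` is a hypothetical name.

```python
def guided_prediction(cond, neg, scale):
    """Combine two noise predictions as neg + scale * (cond - neg).

    cond:  prediction conditioned on the prompt
    neg:   prediction conditioned on the negative prompt
    scale: guidance scale; 1.0 returns `cond` unchanged, larger values
           move the result further away from `neg`
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


print(guided_prediction([1.0, 2.0], [0.0, 1.0], 3.0))
```

With a scale above 1.0 the output overshoots the prompt-conditioned prediction in the direction opposite to the negative one, which is why a negative prompt can suppress unwanted edits.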

In [None]:
if rotated is None:
    rotated = Image.open("examples/rotated_0.jpeg")
rotated_roi = rotated.crop((350, 350, 690, 800))
rotated_roi
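
The fallback path above appears to follow a naming convention: `output_filename="examples/rotated"` is reloaded as `examples/rotated_0.jpeg`. Assuming that convention (it is inferred from this notebook, not from the code of `ImageLLMUtils`), a small helper can build the fallback path from the same string passed to `generate`, which avoids typos when the path is retyped by hand. `output_path` is a hypothetical helper.

```python
def output_path(output_filename, index=0, ext="jpeg"):
    """Build the path of a saved output, assuming the `<name>_<index>.<ext>` convention."""
    return f"{output_filename}_{index}.{ext}"


print(output_path("examples/rotated"))
```

For example, the fallback could then read `Image.open(output_path("examples/rotated"))`.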

#### Generation of a known concept

This cell generates an iconography of Holy Mary, showing that a widespread concept can be reproduced by the LDM in a detailed manner, without the need for Image RAG.

In [None]:
output_image = imageLLMUtils.generate(
    prompt="""Put a little statue of Holy Mary
              standing and holding a blessing Baby Jesus, inside the central niche. Be sure to match proportions.
              **Be sure to put Holy Mary below the little arc.**
              Leave the background untouched.
              Keep the features of both Holy Mary and Baby Jesus' faces well defined.
              Make the image black and white.""",
    images=[rotated_roi],
    neg_prompt="Do not enlarge nor shrink the niche. Scale the size of Holy Mary instead. Do not put Holy Mary below the big arc.",
    output_filename="examples/inpaint_holymary"
)

In [None]:
output_image

#### Image RAG

In [None]:
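# The prompt refers to a reference image of the saint, so it is passed alongside the scene.
images = [Image.open("examples/inpaint_holymary_0.jpeg"), Image.open("assets/stEustace.jpg")]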

# TODO: Add the RAG step (i.e. generation -> evaluation)

output_image = imageLLMUtils.generate(
    prompt=""" Add a statue of saint Eustace to the right of the central niche, using the image of the saint as a reference.
               Match the style of the image.
               Leave the central niche untouched. Keep face features well defined. Make the image black and white. """,
    images=images,
    neg_prompt="Do not enlarge nor shrink the niche. Do not edit the background. Do not crop the image. Do not make the statue too big or too small.",
    output_filename="examples/inpaint_eustace"
)
output_image
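
The TODO in the cell above mentions a generation followed by an evaluation. One possible shape for such a loop is sketched below, purely as an illustration: the notebook does not define how a result would be scored, so `generate` and `evaluate` are hypothetical callables supplied by the caller, and `generate_until_accepted`, `threshold` and `max_attempts` are assumptions as well.

```python
def generate_until_accepted(generate, evaluate, threshold=0.5, max_attempts=3):
    """Generate candidates until one scores at least `threshold`.

    generate: callable taking the attempt index and returning a candidate
    evaluate: callable taking a candidate and returning a numeric score
    Returns the best candidate seen and its score.
    """
    best_candidate, best_score = None, float("-inf")
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        score = evaluate(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
        if score >= threshold:
            break  # good enough, stop generating
    return best_candidate, best_score


# Stand-in callables: strings play the role of images, a dict plays the role of a scorer.
fake_scores = {"img0": 0.2, "img1": 0.7, "img2": 0.9}
print(generate_until_accepted(lambda i: f"img{i}", fake_scores.get))
```

In the notebook, `generate` could wrap `imageLLMUtils.generate` and `evaluate` could compare the output against the caption retrieved from the vector store, but both choices are left open here.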

In [None]:
images = [output_image, Image.open("assets/stVitus.jpeg")]

# TODO: Add the RAG step (i.e. generation -> evaluation)

inpainted = imageLLMUtils.generate(
    prompt=""" Add a statue of saint Vitus to the left of the central niche, using the image of the saint as a reference. 
               Match the image style.
               Be sure to match proportions.
               Leave the central niche untouched.
               Keep face features well defined. Make the image black and white.""",
    images=images,
    neg_prompt="Do not put saint Vitus higher or lower with respect to other statues. Do not put the statue outside the monument. Do not put the statue on the side pillars, nor on the ground.",
    output_filename="examples/inpaint_vitus"
)
inpainted

### Environment Embedding

In [None]:
# Fall back to the saved files when generation returned nothing
if inpainted is None:
    inpainted = Image.open("examples/inpaint_vitus_0.jpeg")
if rotated is None:
    rotated = Image.open("examples/rotated_0.jpeg")

inpainted_roi = inpainted.crop((120, 250, 690, 1000))
inpainted_roi

In [None]:
images = [rotated, inpainted_roi]

final = imageLLMUtils.generate(
    prompt="""Put the three saints inside the frontal view of the monument.
    Generate a scene similar to the first image, including the monument and the details in it.
    Keep the image in black and white.
    Be sure to match proportions.
    Be sure to generate consistent face features.
    """,
    images = images,
    neg_prompt="Do not edit the monument. Do not change the statues features or look. Do not remove any of the statues. Do not put statues out of the monument.",
    output_filename="examples/env_refinement"
)

## Final Result

In [None]:
if final is None:
    final = Image.open("examples/env_refinement_0.jpeg")
final