## Part4: Assignment: From Image to Text — Understanding Visual Content with Language Models

### Objective

The goal of this assignment is to explore how visual information can be represented in textual form and how large language models (LLMs) can reason about such representations.  
You will convert a personal image into a text-based (ASCII-style) image and investigate whether a language model can understand and describe the visual content from the textual representation alone.

---

### Task Description

#### Step 1: Image Acquisition

- Capture or generate an image of yourself.
  - You may use a photograph, a self-portrait, or an AI-generated image that represents you.
- The image should clearly show a human face or upper body.

---

#### Step 2: Image-to-Text Conversion

- Convert the image into a **text-based image representation** (e.g., ASCII art).
- The conversion should:
  - Use a fixed-width (monospace) font.
  - Preserve basic visual structure such as contours, shading, or facial features.
- You may use Python with any library (e.g., PIL or OpenCV).

Below is an example:

![Input Image](imgs/image.png)
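
The brightness-to-character mapping behind this step can be sketched in pure Python. This is a minimal illustration under stated assumptions, not a reference solution: the function name `rows_to_ascii` and the short `RAMP` string are hypothetical choices made for this sketch. The ramp runs from dark to light, and each integer grayscale value in 0-255 is scaled to an index into it.

```python
# Hypothetical dark-to-light character ramp; any ordered ramp works.
RAMP = "@#O=~. "


def rows_to_ascii(gray_rows, ramp=RAMP):
    """Map rows of integer grayscale values (0-255) to lines of ramp characters."""
    last = len(ramp) - 1
    lines = []
    for row in gray_rows:
        # Scale each brightness value to an index in [0, last].
        lines.append("".join(ramp[value * last // 255] for value in row))
    return "\n".join(lines)


# A tiny 2x3 "image": dark on the left, bright on the right, then mirrored.
tiny = [
    [0, 128, 255],
    [255, 128, 0],
]
art = rows_to_ascii(tiny)
print(art)
```

A real image would first be converted to grayscale and resized to a manageable number of rows and columns before being passed through a mapping like this one.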

---

#### Step 3: Language Model Interpretation

- Provide the text-based image as input to a language model.
- Ask the model to:
  - Describe what it sees in the text representation.
  - Infer high-level attributes (e.g., whether it looks like a face, a person, or an object).
- Record the model’s response.
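
One possible way to record the model's response is to store it next to the text image it was given, so the two can be compared later. The sketch below uses only the standard library. The helper name `record_response`, the JSON layout, and the `PERSON_KEYWORDS` list are assumptions made for illustration, and the keyword flag is a crude substring check, not a real evaluation of the answer.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical keywords used to tag whether the reply mentions a person.
PERSON_KEYWORDS = ("face", "person", "portrait", "human", "head")


def record_response(ascii_art, response_text, path="llm_response.json"):
    """Save the text image and the model's reply, plus a crude keyword flag."""
    lowered = response_text.lower()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ascii_art": ascii_art,
        "response": response_text,
        # Substring match only; it cannot tell "a face" from "not a face".
        "mentions_person": any(word in lowered for word in PERSON_KEYWORDS),
    }
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record


# Usage with placeholder strings, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "response.json"
    rec = record_response("@@..\n@@..", "It resembles a wave.", path=out)
    saved = json.loads(out.read_text(encoding="utf-8"))
    print(saved["mentions_person"])
```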

In [16]:
from io import BytesIO

import cv2
import numpy as np
import requests
from PIL import Image

CHARS = '@W#$OEXC[(/?=^~_.` '


def mono(input_img, target_h=100, target_w=100):
    """
    Convert a grayscale image into a text-based (ASCII-style) image.

    input_img: a grayscale image of shape [h, w], given as a NumPy uint8 array.
    target_h: maximum height (rows) of the generated text image.
    target_w: maximum width (columns) of the generated text image.

    Note:
        1) Avoid generating a text image of an extremely large size.
        2) To implement this, you may need to resize the image using the resize function from the cv2 library.
        3) You may need to keep the aspect ratio in mind when choosing target_h and target_w.

    return:
        a string representing the image, one line per row
    """
    # Todo 1: implement your code here
    # Get original dimensions
    h, w = input_img.shape
    
    # Calculate aspect ratio and resize while maintaining aspect ratio
    aspect_ratio = w / h
    if aspect_ratio > target_w / target_h:
        # Width is the constraint
        new_w = target_w
        new_h = max(1, int(target_w / aspect_ratio))  # never collapse to 0 rows
    else:
        # Height is the constraint
        new_h = target_h
        new_w = max(1, int(target_h * aspect_ratio))  # never collapse to 0 columns
    
    # Resize the image
    resized_img = cv2.resize(input_img, (new_w, new_h))

    # Equalize histogram to improve contrast
    resized_img = cv2.equalizeHist(resized_img)
    
    # Normalize pixel values to 0-255 range
    resized_img = (resized_img - resized_img.min()) / (resized_img.max() - resized_img.min() + 1e-5) * 255
    resized_img = resized_img.astype(np.uint8)
    
    # Map pixel values to characters
    ascii_art = ""
    for row in resized_img:
        for pixel in row:
            # Map pixel brightness (0-255) to character index (0 to len(CHARS)-1)
            char_idx = int((pixel / 255) * (len(CHARS) - 1))
            ascii_art += CHARS[char_idx]
        ascii_art += "\n"
    
    return ascii_art


# Todo 2: read your image here
url = "https://github.com/hacker-is-undefeatable/ISDN3150_Week2_Assignment/blob/main/imgs/459838642_879823690765993_7978852413075077763_n.jpg?raw=true"

# Download and read the image from URL
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
im = Image.open(BytesIO(response.content))

# Convert PIL image to numpy array
im = np.array(im)

# Convert to grayscale if it is an RGB/RGBA image
if len(im.shape) == 3:
    im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)

# Convert this to string art
ascii_art = mono(im, target_h=50, target_w=100)
print(ascii_art)

@@@@@WW$_____.....__~~~~^~~(~^^C`~#E[XOO$@@@@@@@@@
@@@@@WW(__......___~~^~?==??~_^/=~_^XEOEWW@@@@@@@@
@@@@@WW~~__...._____~~?=~~^^~/~~X~~_=(OOCC#@@@@@@@
@@@@@WW~~~__._.____~.=^^_~^~=X~^~CX^~[CO$OOX@@@@@@
@@@@@WW~~~___.__...____~^==^/E_[[^CX(~XXECE=@@@@@@
@@@@@WO~^^~__/__..__.__~_~/^==^C(=#[?(^EEEXO@@@@@@
@@@@@@W~~^~~~^^....__~__~(/?[(=$[X=EE(~/EEWE#@@@@@
@@@@@WW[~^[^~(______^^==~^/(?C?XXC(##X[?/E$O$#@@@@
@@@@@WW#^=[~~~___~COE~^=~^^^=[?XOCX[O#X^/XXE#W#@@@
@@@@@WWW=^/^~~~~~^[^=$W$C=?(XCC?$EXE$#$C^(EEW$##@@
@@@@@WWW~^^^^=~~~^=^OWW/[==/X(XX#XEXC$#X^/[[XEW#@@
@@@@WW@O^^=/[??^^=/[=^~=__=CCO(X#$CXXC##/~CX#WOWW@
@@@@W@WO^^?EO/[=~~==^^~___=[EXEEO$EC[E##$^(CCW$WW@
@@@@@W@C~^[(=^C(~_~__~~___~[CEOO$WOXC/C$$/~XXW#@@#
@@@WWWWC~?[E#W@~~_~______~~/OXE#[[#XC/?(EX^CXC@WW[
WWW@WW#C=^?#O(=~~_~~_____~~=CO#$E#W$C??=/E=(XC$O#(
WWW@WWW[=?/O/=^~~~~^~____~^?XX#$$WWOOC?==C?^EXO$OC
WWW@W#W(??[X=^~~~~^^=~~~~^=?[(OW$$$$CC==^(O=XEC$X@
WWWWWWE=//C[?~~~~~^=X~~~~^=([X$WWCW$C(=^^/CEEEEE/@
WWWWWW$C((X(E~~~=CO$#~~~^=?[CO#
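
The rows above are 50 characters wide for `target_h=50`, which suggests the resize treats each character cell as square. In most monospace fonts a character cell is taller than it is wide, so the printed art can look vertically stretched. The sketch below shows one way to compensate when choosing the output size. The helper name `fit_ascii_size` and the `char_aspect` parameter are hypothetical, and the default of 0.5 (a cell about twice as tall as wide) is an assumption that depends on the font.

```python
def fit_ascii_size(img_h, img_w, max_rows=50, max_cols=100, char_aspect=0.5):
    """Return (rows, cols) that fit the limits and compensate for tall character cells.

    char_aspect is the assumed width/height ratio of one character cell.
    """
    # Rows needed per column so the rendered art keeps the image's proportions.
    rows_per_col = (img_h / img_w) * char_aspect
    cols = max_cols
    rows = round(cols * rows_per_col)
    if rows > max_rows:
        # Too tall: let the row limit decide and shrink the width to match.
        rows = max_rows
        cols = round(rows / rows_per_col)
    return max(1, rows), max(1, cols)


print(fit_ascii_size(400, 400))  # square image   -> (50, 100)
print(fit_ascii_size(800, 400))  # portrait image -> (50, 50)
```

The result could be passed to `cv2.resize(input_img, (cols, rows))`, which expects the size as (width, height), in place of the purely pixel-based dimensions.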

In [18]:
from openai import AzureOpenAI
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-10-21",
    azure_endpoint="https://hkust.azure-api.net"
)

# Todo 3: Put your text-based image here to test whether the LLM can understand your string art or not.
prompt_content = f"""Please analyze the following text-based image representation and describe what you see:

{ascii_art}

Based on this ASCII art representation, can you:
1. Describe what you observe in the text-based image
2. Infer what type of object or subject this represents (e.g., face, person, landscape)
3. Identify any visible features or characteristics"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can analyze text-based image representations and understand visual content from ASCII art."},
        {
            "role": "user",
            "content": prompt_content
        }
    ]
)
print(response.choices[0].message.content)

1. **Description of the Observation**:
   The ASCII art is quite complex and appears to be a dense arrangement of various characters, predominantly "@" and "W", with other symbols including $, #, C, O, E, and X. The overall composition has a structured form, implying a recognizable pattern or image. There are sections that seem to flow or wave with a certain rhythm, creating a dynamic visual.

2. **Inference of the Object or Subject**:
   This text-based image appears to represent a vaguely organic or abstract form, possibly resembling a plant-like structure, landscape, or a stylized depiction of a wave or movement of water. The characters seem to create a sense of fluidity and variation in shape, indicating a natural element rather than something mechanical or geometric.

3. **Visible Features or Characteristics**:
   - The presence of various symbols suggests different shades or depths, resembling a textured surface.
   - The use of "@" could indicate a solid or foundational aspect o