## Part 4: Assignment: From Image to Text - Understanding Visual Content with Language Models

### Objective

The goal of this assignment is to explore how visual information can be represented in textual form and how large language models (LLMs) can reason about such representations.  
You will convert a personal image into a text-based (ASCII-style) image and investigate whether a language model can understand and describe the visual content from the textual representation alone.

---

### Task Description

#### Step 1: Image Acquisition

- Capture or generate an image of yourself.
  - You may use a photograph, a self-portrait, or an AI-generated image that represents you.
- The image should clearly show a human face or upper body.

---

#### Step 2: Image-to-Text Conversion

- Convert the image into a **text-based image representation** (e.g., ASCII art).
- The conversion should:
  - Be intended for display in a fixed-width (monospace) font.
  - Preserve basic visual structure such as contours, shading, or facial features.
- You may use Python with any library (e.g., PIL or OpenCV).

Below is an example:

![Input Image](imgs/image.png)
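The core idea behind such a conversion, mapping each pixel's brightness to a character, can be sketched on a tiny hand-written grid. This is only an illustration under stated assumptions, not the expected solution: `rows_to_ascii` and the `toy` grid are hypothetical names, the input is a plain list of lists rather than a NumPy array, and resizing is deliberately left out. The character ramp is the `CHARS` string from the assignment's code cell.

```python
# Character ramp from the assignment's code cell: dark pixels map to dense glyphs.
CHARS = '@W#$OEXC[(/?=^~_.` '


def rows_to_ascii(gray_rows, chars=CHARS):
    """Map rows of 0-255 grayscale values to one text line per row.

    Hypothetical helper for illustration only; it does no resizing.
    """
    n = len(chars)
    lines = []
    for row in gray_rows:
        # Scale 0..255 into 0..n-1, so 0 picks chars[0] and 255 picks chars[-1].
        lines.append(''.join(chars[int(v) * n // 256] for v in row))
    return '\n'.join(lines)


# A 2x4 toy "image": a dark-to-light gradient on top, the reverse below.
toy = [
    [0, 85, 170, 255],
    [255, 170, 85, 0],
]
print(rows_to_ascii(toy))
```

With this ramp, darker pixels become visually heavier characters such as `@`, while the brightest pixels become spaces, so the result reads best as dark text on a light background.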

---

#### Step 3: Language Model Interpretation

- Provide the text-based image as input to a language model.
- Ask the model to:
  - Describe what it sees in the text representation.
  - Infer high-level attributes (e.g., whether it looks like a face, a person, or an object).
- Record the model’s response.
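One possible way to phrase the request is sketched below. The helper name `build_prompt`, the `<ascii>` markers, and the exact wording are assumptions made for illustration; any clear phrasing that covers the two questions above should work.

```python
def build_prompt(ascii_art):
    """Wrap ASCII art in a question for the model (hypothetical helper)."""
    return (
        "The text between the markers below is an image rendered as ASCII art, "
        "meant to be viewed in a monospace font.\n"
        "1. Describe what you see.\n"
        "2. Say whether it looks like a face, a person, or an object.\n"
        "<ascii>\n"
        f"{ascii_art}\n"
        "</ascii>"
    )


# Example with a two-line placeholder instead of a real picture.
print(build_prompt("@X= \n =X@"))
```

Delimiting the art with explicit markers keeps the line breaks intact and makes it easier for the model to tell the instructions apart from the image.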

In [None]:
import numpy as np
import cv2
from pathlib import Path
import os
import requests
from PIL import Image
from io import BytesIO

CHARS = '@W#$OEXC[(/?=^~_.` '


def mono(input_img, target_h=100, target_w=100):
    """
    input_img: a grayscale image of shape [h, w], given as a NumPy array.
    target_h: target height of the generated text image.
    target_w: target width of the generated text image.
    
    Note:
        1) Avoid generating an extremely large text image.
        2) To implement this, you may need to resize the image with cv2.resize.
        3) You may need to preserve the aspect ratio when choosing target_h and target_w.

    return:
        a string representing the image
    """
    # Todo 1: implement your code here


# Todo 2: read your image here
url = "https://www.ruanyifeng.com/blogimg/asset/2017/bg2017121301.jpg"
im = None  # Replace with your image, read from a file or downloaded from the Internet.

# Note: before calling mono(), you may need to:
# 1) convert your image into a NumPy array:
# im = np.array(im)
# 2) convert it to grayscale if it is an RGB image:
# im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)

# Convert the image to ASCII art.
ascii_art = mono(im, target_h=100, target_w=100)
print(ascii_art)
# After printing, you should see text that resembles your input image.
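The docstring's note about keeping the aspect ratio can be made concrete with a small size calculation. The sketch below is an assumption-laden illustration: `fit_size` is a hypothetical helper, and `char_aspect=0.5` assumes that a monospace character cell is about half as wide as it is tall, which varies by font.

```python
def fit_size(src_h, src_w, target_w=100, char_aspect=0.5):
    """Return (rows, cols) for a text image that keeps the source proportions.

    Hypothetical helper. char_aspect is the assumed width/height ratio of one
    character cell; 0.5 is a rough guess for typical monospace fonts.
    """
    # Fewer rows than a plain h/w scaling would give, because each row is tall.
    rows = max(1, round(src_h / src_w * target_w * char_aspect))
    return rows, target_w


# A 400x800 (height x width) source rendered 100 characters wide.
print(fit_size(400, 800))
```

If the resulting pair is passed on to `cv2.resize`, note that its size argument is ordered as (width, height), so the tuple would need to be swapped.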

In [None]:
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="!!! put your api keys here",
    api_version="2025-02-01-preview",
    azure_endpoint="https://hkust.azure-api.net"
)

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "" # Todo 3: Put your text-based image here to test whether the LLM can understand your string art or not.
        }
    ]
)
print(response.choices[0].message.content)