# Screenshot to Code
## Introduction

Now that we have multimodal models, we can ask an LLM to generate code from an image, such as a screenshot of a web page.
We took inspiration from the screenshot-to-code repository: <https://github.com/abi/screenshot-to-code/blob/main/blog/evaluating-claude.md>

## Installation

In [1]:
%pip install -q openai




## The prompt to generate the code

Here is a long, detailed prompt that asks the OpenAI model to generate the frontend code as a single HTML page built with Tailwind.

In [2]:
HTML_TAILWIND_SYSTEM_PROMPT = """
You have perfect vision and pay great attention to detail which makes you an expert at building single page apps using Tailwind, HTML and JS.
You take screenshots of a reference web page from the user, and then build single page apps 
using Tailwind, HTML and JS.
You might also be given a screenshot (the second image) of a web page that you have already built, and asked to
update it to look more like the reference image (the first image).

- Make sure the app looks exactly like the screenshot.
- Do not leave out smaller UI elements. Make sure to include every single thing in the screenshot.
- Pay close attention to background color, text color, font size, font family, 
padding, margin, border, etc. Match the colors and sizes exactly.
- In particular, pay attention to background color and overall color scheme.
- Use the exact text from the screenshot.
- Do not add comments in the code such as "<!-- Add other navigation links as needed -->" and "<!-- ... other news items ... -->" in place of writing the full code. WRITE THE FULL CODE.
- Make sure to always get the layout right (if things are arranged in a row in the screenshot, they should be in a row in the app as well)
- Repeat elements as needed to match the screenshot. For example, if there are 15 items, the code should have 15 items. DO NOT LEAVE comments like "<!-- Repeat for each news item -->" or bad things will happen.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.

In terms of libraries,

- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">

Return only the full code in <html></html> tags.
"""

# Do not include markdown "```" or "```html" at the start or end.


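Note the instructions above that push the model toward a single, self-contained file: Tailwind, Google Fonts, and Font Awesome are loaded from CDNs, images come from placehold.co, and the model is told to write every element in full instead of leaving placeholder comments. The instruction not to wrap the answer in markdown fences is commented out (it sits after the closing `"""`), so the reply may contain a fenced code block, which we extract later.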
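The prompt is named as a system prompt, but the cell below sends it as a text part of the user message. The prompt also describes an update pass in which a second screenshot (the page already built) follows the reference. A minimal sketch, assuming you want the instructions in a `system` message and both screenshots as PNG image parts; `build_messages` and `user_text` are hypothetical names, not part of the OpenAI SDK, and whether this changes output quality is not tested here.

```python
# Hypothetical helper (not from the original notebook): build a Chat Completions
# message list with the instructions in a "system" message and the screenshots
# as image parts of a "user" message.
def build_messages(system_prompt, reference_b64, current_b64=None, user_text=None):
    """Return messages for a first attempt or, when current_b64 is given, an update pass."""

    def image_part(b64_png):
        # Assumption: both screenshots are PNGs, as in the cell below
        return {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64_png}"},
        }

    # The prompt calls the reference "the first image" and the page already
    # built "the second image", so the images are added in that order.
    content = [image_part(reference_b64)]
    if current_b64 is not None:
        content.append(image_part(current_b64))
    if user_text:
        content.append({"type": "text", "text": user_text})

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": content},
    ]
```

With the objects from the next cell, a first attempt would be `client.chat.completions.create(model="gpt-4o", messages=build_messages(HTML_TAILWIND_SYSTEM_PROMPT, base64_image))`.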

## Ask the OpenAI Model
We load a screenshot saved earlier (`data/screenshot.png`), encode it in base64, and send it to the model together with the prompt, asking it to generate the code for the page.
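The cell below hardcodes `data:image/png` in the data URL. If some screenshots are JPEGs, a minimal sketch that guesses the MIME type from the file extension with the standard `mimetypes` module could replace `encode_image`; `image_to_data_url` is a hypothetical helper name, and relying on the extension (rather than the file contents) is an assumption.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Read an image file and return a data URL whose MIME type is guessed from the extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    # Refuse anything that does not look like an image, e.g. a .txt file
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be passed directly as the `"url"` value of an `image_url` content part.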

In [3]:
import base64

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()


# Read an image file and return its contents as a base64-encoded string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "data/screenshot.png"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": HTML_TAILWIND_SYSTEM_PROMPT },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

output = completion.choices[0].message.content
# Print the raw reply; it may contain prose around the generated HTML
print("Generated HTML:")
print(output)

Generated HTML:
I'm unable to directly extract text or elements from images. However, I can help you write the HTML code based on a description or a mockup. Here's an example of how you might structure this webpage:

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Uber Landing</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap" rel="stylesheet">
    <style>
        body {
            font-family: 'Inter', sans-serif;
        }
    </style>
</head>
<body class="bg-white text-black">
    <header class="bg-black text-white p-4 flex justify-between items-center">
        <div class="flex items-center space-x-4">
            <span class="text-xl font-bold">Uber</span>
   

## Extracting the code block
- We use `mdextractor` (<https://github.com/chigwell/mdextractor>) to extract the fenced code blocks from the model's markdown reply.
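As the reply above shows, the model may put a sentence of prose before a ```html fence. If you would rather avoid the extra dependency, a minimal standard-library sketch of the same idea uses a regular expression. `extract_html` is a hypothetical helper, not part of `mdextractor`. It prefers a block tagged `html` and falls back to the bare `<html>...</html>` span that the prompt asks for when the reply has no fences.

```python
import re

# A fence: ``` plus an optional language tag, then the body up to the next ```
FENCE_RE = re.compile(r"```[ \t]*([\w+-]*)[ \t]*\n(.*?)```", re.DOTALL)
# Fallback: an optional doctype followed by a complete <html>...</html> element
HTML_RE = re.compile(r"(?:<!DOCTYPE html>\s*)?<html\b.*?</html>", re.DOTALL | re.IGNORECASE)


def extract_html(text):
    """Return the generated page from a model reply, or None if nothing is found."""
    blocks = FENCE_RE.findall(text)
    # Prefer a block explicitly tagged as html
    for lang, body in blocks:
        if lang.lower() == "html":
            return body.strip()
    # Otherwise take the first fenced block, whatever its tag
    if blocks:
        return blocks[0][1].strip()
    # No fences at all: look for the <html>...</html> span the prompt requests
    match = HTML_RE.search(text)
    return match.group(0) if match else None
```

Unlike a general extractor, this returns a single string, which is convenient when the next step is writing the page to a file.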


In [4]:
%pip install -q mdextractor




In [5]:
from mdextractor import extract_md_blocks

output = completion.choices[0].message.content

# Extracting code blocks
blocks = extract_md_blocks(output)

# Display the extracted blocks
for block in blocks:
    print(block)
    print("-" * 40)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Uber Landing</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap" rel="stylesheet">
    <style>
        body {
            font-family: 'Inter', sans-serif;
        }
    </style>
</head>
<body class="bg-white text-black">
    <header class="bg-black text-white p-4 flex justify-between items-center">
        <div class="flex items-center space-x-4">
            <span class="text-xl font-bold">Uber</span>
            <nav class="space-x-4">
                <a href="#" class="hover:underline">Ride</a>
                <a href="#" class="hover:underline">Drive</a>
                <a href="#" class="hover:underline">Business</a>
    