Develop a model that generates Scalable Vector Graphics (SVG) code based on a given text prompt, rendering the described image as accurately as possible.
Large Language Models (LLMs) often struggle to generate precise image-rendering code. This project aims to bridge that gap by generating SVG, a vector image format that uses XML to define two-dimensional graphics. Because SVG is resolution-independent, the resulting images can be scaled to any size without quality loss.
The dataset consists of 500 text descriptions of everyday objects and scenes across diverse domains.
- Common objects and generic subjects (no brand names, trademarks, or personal names).
- Covers approximately a dozen categories, including landscapes, abstract art, and fashion.
- Descriptions are capped at 200 characters, with an average length of 50 characters.
- train.csv includes data from the landscape, abstract, and fashion categories.
- Public and private test sets follow a similar category distribution.
Participants must submit a Model class with a predict() function that takes a text description as input and returns SVG code.
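As an illustration of the required interface, here is a minimal sketch of a Model class whose predict() method returns SVG markup. The template, the colors, and the keyword rule are placeholders invented for this example and are not part of the project; a real submission would generate the drawing from the description.

```python
class Model:
    """Minimal stand-in for the required interface (not a real generator)."""

    # Hypothetical fixed drawing: a background rectangle and a centered circle.
    _TEMPLATE = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="256" height="256" '
        'viewBox="0 0 256 256">'
        '<rect width="256" height="256" fill="{background}"/>'
        '<circle cx="128" cy="128" r="64" fill="{foreground}"/>'
        '</svg>'
    )

    def predict(self, description: str) -> str:
        # Toy rule so the output depends on the input: darker palette for "night".
        is_night = "night" in description.lower()
        background = "#0b1d3a" if is_night else "#87ceeb"
        foreground = "#f5f5dc" if is_night else "#ffd700"
        return self._TEMPLATE.format(background=background, foreground=foreground)
```

The sketch uses only basic shapes and presentation attributes (no CSS), which keeps the output small and simple to validate.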
Performance is measured using Mean CLIP Similarity between the text description and the generated SVG image:
- Each SVG is converted into a PNG using the cairosvg Python library.
- The PNG image is encoded into feature embeddings using a SigLIP SoVIT-400m model.
- The final score is the average cosine similarity between the text description embeddings and the corresponding image embeddings.
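The scoring steps above can be sketched roughly as follows. This is an approximation for local experimentation, not the official evaluation code: the function names are invented here, the embeddings are assumed to be plain lists of floats already produced by the text and image encoders, and loading the SigLIP model itself is left out.

```python
import math
from typing import Sequence


def svg_to_png_bytes(svg_code: str) -> bytes:
    """Rasterize SVG markup to PNG bytes with cairosvg."""
    import cairosvg  # imported lazily so the helpers below work without it

    return cairosvg.svg2png(bytestring=svg_code.encode("utf-8"))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(
    text_embeddings: Sequence[Sequence[float]],
    image_embeddings: Sequence[Sequence[float]],
) -> float:
    """Average the cosine similarity of each (text, image) embedding pair."""
    scores = [
        cosine_similarity(text, image)
        for text, image in zip(text_embeddings, image_embeddings)
    ]
    return sum(scores) / len(scores)
```

For example, with toy two-dimensional embeddings, mean_similarity([[1, 0], [0, 1]], [[1, 0], [1, 0]]) averages a perfect match (1.0) and an orthogonal pair (0.0).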
- The SVG file must not exceed 10,000 bytes.
- Only elements and attributes from a predefined allowlist are permitted.
- CSS styles are not allowed.
- No rasterized images or external data sources are allowed in the SVG.
- The model must return SVG output within 5 minutes of receiving a description.
- All SVGs must be generated within 9 hours.
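A pre-submission check for the size and content constraints above might look like the sketch below. The allowlist shown is hypothetical and only for illustration, since the actual predefined allowlist is not reproduced here; the check_svg name and its messages are likewise assumptions. The time limits are not covered.

```python
import xml.etree.ElementTree as ET
from typing import List

MAX_SVG_BYTES = 10_000

# Hypothetical allowlist for illustration; substitute the official one.
ALLOWED_ELEMENTS = {
    "svg", "g", "defs", "rect", "circle", "ellipse", "line",
    "polyline", "polygon", "path", "linearGradient", "radialGradient", "stop",
}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def check_svg(svg_code: str) -> List[str]:
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    if len(svg_code.encode("utf-8")) > MAX_SVG_BYTES:
        problems.append("exceeds 10,000 bytes")
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    for element in root.iter():
        name = local_name(element.tag)
        # <style> and <image> are rejected here because they are not allowlisted.
        if name not in ALLOWED_ELEMENTS:
            problems.append(f"element not allowed: {name}")
        for attr, value in element.attrib.items():
            attr_name = local_name(attr)
            if attr_name == "style":
                problems.append(f"style attribute on <{name}>")
            if attr_name == "href" and not value.startswith("#"):
                problems.append(f"external reference in href on <{name}>")
    return problems
```

Running check_svg on each generated SVG before returning it from predict() makes it possible to fall back to a simpler drawing whenever the list of problems is non-empty.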
To run the model, install the required dependencies:
pip install cairosvg torch torchvision
Ensure you have the dataset and then run the model:
from model import Model
model = Model()
text_description = "A simple mountain landscape with a rising sun."
svg_output = model.predict(text_description)
print(svg_output)  # Outputs SVG code