# Genre-Driven Storytelling from Images Using the PyTorch XPU Backend
## Overview
This sample generates creative, genre-specific stories from images and runs on Intel hardware through the PyTorch XPU backend.
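
Because the sample targets the XPU backend, it can help to confirm that an XPU device is visible before running the rest of the notebook. The following is a minimal sketch, not part of the original sample: `pick_device` is a hypothetical helper name, and falling back to `"cpu"` is an assumption about how you might want the check to behave.

```python
def pick_device() -> str:
    """Return "xpu" when an Intel XPU is usable from PyTorch, else "cpu"."""
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return "cpu"
    # torch.xpu is only present in PyTorch builds that include XPU support.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu`, the hard-coded `"xpu"` device strings used later in the notebook will likely fail, so check the driver and PyTorch installation first.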

## Workflow
The sample takes an image and a user-defined genre (e.g., fantasy, horror, romance, sci-fi) as input and uses a Vision Language Model (VLM) to write a narrative that is inspired by the image and matches the chosen genre.

<img width="600" alt="image" src="./assets/story-generation.png">

## Import Necessary Packages

In [2]:
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
import torch
import ipywidgets as widgets
from IPython.display import display, Image as IPImage 
from PIL import Image as PILImage 
from qwen_vl_utils import process_vision_info
import io
import os

In [3]:
from huggingface_hub import login
login()


## Story Generation Module

With the Qwen2.5-VL model, only the genre word in the prompt needs to change to produce a story in a different genre.
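
To make the role of the genre in the prompt concrete, here is a small sketch that builds the chat message on its own. `build_messages` is a hypothetical helper that is not in the original notebook; the prompt text and message layout are copied from `story_generation`, and the string `"placeholder-image"` stands in for a real PIL image.

```python
def build_messages(image, genre: str) -> list:
    """Build a single-turn chat message containing one image and one text prompt."""
    prompt = (
        f"Generate a creative {genre} story inspired by this image. "
        "Focus on the characters (if any are visible, describe them briefly), "
        "the atmosphere of the scene, and the potential narrative that could unfold. "
        "Craft an engaging plot and ensure the story conveys a suitable moral."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


fantasy = build_messages("placeholder-image", "Fantasy")
horror = build_messages("placeholder-image", "Horror")

# Only the genre word differs between the two prompts.
print(fantasy[0]["content"][1]["text"][:40])
print(horror[0]["content"][1]["text"][:40])
```

Keeping the prompt in one place like this would also make it easier to experiment with wording without touching the model-loading code.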

In [5]:
def story_generation(image, genre):
    """
        Generates a creative story using Qwen 2.5 VL 3B Instruct model
        Args:
            image(PIL image): User uploaded image
            genre(str): User selected genre(eg. Fantasy, Horror, Sci-fi, etc.)
        Returns: 
            story(str): Model generated story
    """
    try:
        model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
        model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_id,
            torch_dtype=torch.bfloat16,
        )
        model = model.to("xpu")
        model.eval()
        compiled_model = torch.compile(model)
        min_pixels = 256*28*28
        max_pixels = 1280*28*28
        processor = AutoProcessor.from_pretrained(model_id,
                                                  use_fast=True,
                                                  min_pixels=min_pixels, 
                                                  max_pixels=max_pixels)
    
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "image": image,
                    },
                    {
                        "type": "text",
                        "text": f"Generate a creative {genre} story inspired by this image. Focus on the characters (if any are visible, describe them briefly), the atmosphere of the scene, and the potential narrative that could unfold. Craft an engaging plot and ensure the story conveys a suitable moral."
                    },
                ],
            }
        ]
        # Preparation for inference
        text = processor.apply_chat_template(
            messages, 
            tokenize=False, 
            add_generation_prompt=True
        )
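        # Extract the image/video inputs referenced in `messages` and pass them to the processor
        image_inputs, video_inputs = process_vision_info(messages)
        inputs = processor(text=[text], 
                           images=image_inputs, 
                           videos=video_inputs, 
                           padding=True, 
                           return_tensors="pt")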
        inputs = inputs.to("xpu")
    
        torch.xpu.empty_cache()
        # Generation of the output
        with torch.no_grad():
            generated_ids = compiled_model.generate(**inputs, 
                                           temperature=0.9,
                                           top_p=0.99,
                                           top_k=40,
                                           do_sample=True,
                                           max_new_tokens=1024)
            generated_ids_trimmed = [
                out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
            ]
            output_text = processor.batch_decode(
                generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
            )
            torch.xpu.synchronize()
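        # Drop references to the model and tensors so XPU memory can be reclaimed
        del model, compiled_model, processor, inputs, generated_ids, generated_ids_trimmed
        return output_text[0]
    except Exception as e:
        print("Error generating story: ", e)
        return None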
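
The `min_pixels` and `max_pixels` values passed to the processor bound how large the resized image can be, and therefore how many visual tokens it contributes. The sketch below works through the arithmetic, assuming (as the Qwen2.5-VL model card describes) that each 28x28 pixel area corresponds to roughly one visual token. `visual_token_range` is a hypothetical helper for illustration only.

```python
PATCH_SIDE = 28  # assumed pixel side length of the area covered by one visual token


def visual_token_range(min_pixels: int, max_pixels: int, patch_side: int = PATCH_SIDE):
    """Convert a pixel budget into an approximate (min, max) visual token count."""
    area = patch_side * patch_side
    return min_pixels // area, max_pixels // area


min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
low, high = visual_token_range(min_pixels, max_pixels)

print(min_pixels, max_pixels)  # 200704 1003520
print(low, high)               # 256 1280
```

Under that assumption, the settings in `story_generation` keep each image between about 256 and 1280 visual tokens. Lowering `max_pixels` would reduce memory use and latency at the cost of visual detail.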

## User-provided inputs

### Upload image

You can also use the sample input images in the [sample-inputs](./assets/sample-inputs) folder.

In [6]:
default_image_path = "./assets/sample-inputs/input1.jpg" 

image = None

uploader = widgets.FileUpload(
    accept='image/*', 
    multiple=False    
)

output_area = widgets.Output()

def _load_and_store_pil_image(source_type, data, filename_for_error_msg=None):
    """
    Loads image data into a PIL.Image object and stores it globally.
    source_type: 'path' (data is a file path) or 'bytes' (data is image byte content).
    Returns PIL.Image object on success, None on failure.
    """
    global image
    try:
        if source_type == 'path':
            pil_img = PILImage.open(data)
        elif source_type == 'bytes':
            pil_img = PILImage.open(io.BytesIO(data))
        else: 
            image = None
            return None 
        
        image = pil_img
        return pil_img
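    except Exception as e:
        image = None
        # Name the source in the message so a failed upload is easy to identify
        source = filename_for_error_msg or (data if source_type == 'path' else 'uploaded data')
        print(f"Error loading image from {source}: {e}")
        return None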

def display_default_image_handler():
    """Loads the default image as a PIL object, stores it, and displays it."""
    with output_area:
        output_area.clear_output() 
        if not os.path.exists(default_image_path):
            global image
            image = None
            print(f"Default image not found: {default_image_path}. Please check the path.")
            return

        pil_img = _load_and_store_pil_image('path', default_image_path)
        if pil_img:
            display(IPImage(filename=default_image_path))
            print(f"Displaying default image: {default_image_path}. Stored as PIL.")
        else:
            print(f"Error loading default PIL image from '{default_image_path}'.")

def on_upload_event_handler(change):
    """Handles file upload/clear, stores as PIL Image, and displays."""
    with output_area:
        output_area.clear_output()

        if not uploader.value: 
            print("No file uploaded. Displaying default image.")
            display_default_image_handler() 
            return

        
        uploaded_file_info = uploader.value[0]
        file_content_bytes = uploaded_file_info['content']
        file_name = uploaded_file_info['name']

        pil_img = _load_and_store_pil_image('bytes', file_content_bytes, filename_for_error_msg=file_name)
        if pil_img:
            display(IPImage(data=file_content_bytes)) 
            print(f"Displayed uploaded image: {file_name}. Stored as PIL.")
        else:
            print(f"Error processing uploaded image '{file_name}' into PIL format.")

uploader.observe(on_upload_event_handler, names='value')

print("Please upload an image file (e.g., jpg, png, gif).")
print(f"If no image is uploaded, a default image will be attempted from: {default_image_path}")
display(uploader)
display(output_area)

with output_area:
    output_area.clear_output(wait=True) 
    display_default_image_handler() 


Please upload an image file (e.g., jpg, png, gif).
If no image is uploaded, a default image will be attempted from: ./assets/sample-inputs/input1.jpg



### Select Genre for the story

Select the genre in which the VLM should write the story.

In [7]:
# Top genres are listed using RadioButtons function from ipywidgets
genre = widgets.RadioButtons(
    options=['Fantasy', 'Horror', 'Science Fiction', 'Thriller and Suspense', 'Romance', 'Historical fiction'],
    value='Fantasy',  # Default selection
    description='Genre',
    disabled=False
)
display(genre)


In [9]:
torch.xpu.empty_cache()

## Story Generation

The cell below runs the story generation: it passes the loaded image and the selected genre to `story_generation` and prints the resulting narrative.
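
Note that `image` stays `None` when no image could be loaded, and `story_generation` prints the error and returns `None` when generation fails. A thin wrapper can turn both cases into explicit errors. This is only a sketch: `generate_or_fail` is a hypothetical name, and `fake_generation` is a stand-in used here so the example runs without loading the model.

```python
def generate_or_fail(generate_fn, image, genre: str) -> str:
    """Call generate_fn(image, genre) and raise instead of silently returning None."""
    if image is None:
        raise ValueError("No image loaded; upload one or check the default image path.")
    story = generate_fn(image, genre)
    if story is None:
        raise RuntimeError("Story generation failed; see the error printed above.")
    return story


def fake_generation(image, genre):
    # Stand-in for story_generation so this sketch does not need the model.
    return f"A {genre} story."


print(generate_or_fail(fake_generation, "placeholder-image", "Fantasy"))
```

In the notebook itself, the call would look like `generate_or_fail(story_generation, image, genre.value)`.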

In [10]:
print(genre.value)
story = story_generation(image, genre.value)
print(story)

Thriller and Suspense



The sun dipped low in the western sky, casting long shadows across the golden expanse of the Great Divide. Here and there, the hills and valleys were dotted with ancient oaks and sagebrush, their rusty hues blending into the landscape. In the foreground, a solitary white horse stood majestically, its coat gleaming under the fading light. The horse had eyes half-shut from the heat, ears pricked as it listened intently to the rustling grass and distant howls.

Sitting atop his steed, a figure in a wide-brimmed hat and rugged jacket surveyed the scene with a serene yet watchful gaze. He was known as Jack, a man born of the west, whose life was dedicated to the land and the animals he loved so dearly. He wore gloves, suggesting readiness for the day's tasks ahead, but he was not alone in this vast, almost surreal wilderness.

Standing just a few paces away from the horse was a dog, a sturdy, tan-furred collie with sharp, alert eyes. It sat regally in the tall, golden grass, its tail waggin

## Sample Outputs

### Fantasy Genre

In [11]:
print(genre.value)
story = story_generation(image, genre.value)
print(story)

Fantasy



---

In the heart of the vast, golden plains, where the sky stretches as high as the earthen hills, lived a lone cowboy named Eli, who patrolled the land with his loyal white horse, Silver. Eli was no ordinary cowboy; he was the guardian of this untamed frontier, ensuring peace between the man and the wild.

One crisp autumn morning, as the first rays of sunlight bathed the plains in a warm glow, Eli saddled up Silver and set out on his patrol. The horse, with its gleaming white coat and the soft green grass swaying beneath its hooves, felt the earth's warmth and responded with a quiet sigh.

As Eli rode, his mind wandered to the past and the future he would face today. A month ago, he had stumbled upon an old, abandoned cabin hidden deep within the woods. Inside, he found not treasure, but a collection of enchanted maps and letters, each one promising a new adventure and a piece of the world’s mysteries.

Determined to unravel the secrets these maps held, Eli left his home a few days 

### Science Fiction

In [17]:
print(genre.value)
story = story_generation(image, genre.value)
print(story)

Science Fiction



In the year 2150, the vast, untamed wilderness of the American West was once again reclaimed from the encroaching human settlements. The landscape was vast and stark, a testament to the enduring power of nature and the resilience of those who dared to live there.

Eleanor, a seasoned livestock rancher, ventured deep into the wilderness with her white horse, Lightning. Her life on the range had been a journey fraught with challenges, but she had come to love the freedom and the sense of belonging it provided. Today, she carried a different burden—a mysterious letter that hinted at a treasure hidden deep within the forest's ancient trees. This treasure was rumored to possess mystical powers, capable of reversing environmental damage and restoring balance to the land. The letter also mentioned the presence of a legendary wolf, a creature of legend said to be wise beyond its years, whose eyes knew everything and could guide Eleanor to the treasure's location.

As Lightning galloped across 