Text to Image Generator using Diffusion Models

Author: Karan Sahu


Domain: Generative AI


Tech Stack: Python, Stable Diffusion, Gradio

Project Overview

This notebook implements a Text-to-Image Generator using a diffusion-based model. Users provide a natural language prompt, and the system generates a corresponding image through an interactive Gradio interface.

The project focuses on:

Understanding diffusion models

Applying prompt engineering

Building modular, readable ML code

Creating an end-to-end AI application

In [30]:
# imports and environment setup
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
import gradio as gr
from datetime import datetime

In [31]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


DEFAULT_NEGATIVE_PROMPT = (
    "blurry, low quality, distorted, extra fingers, watermark"
)


KARAN_STYLE_SUFFIX = "ultra detailed, high resolution, cinematic lighting"

In [32]:
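def load_text_to_image_model():
    # Download (or load from cache) the Stable Diffusion v1.5 weights.
    pipeline = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"
    )
    # Use the device detected above rather than hard-coding "cpu".
    pipeline.to(DEVICE)
    # Attention slicing lowers peak memory use at a small cost in speed.
    pipeline.enable_attention_slicing()
    return pipeline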


In [33]:
PROMPT_TEMPLATES = {
    "Portrait": "A high quality portrait of {}",
    "Landscape": "A wide cinematic landscape of {}",
    "Fantasy": "A fantasy digital art of {}",
    "Cyberpunk": "A cyberpunk futuristic scene of {}",
}

def build_prompt(user_prompt, style, karan_mode):
    """Fill the chosen style template with the user's prompt; optionally append the style suffix."""
    base_prompt = PROMPT_TEMPLATES[style].format(user_prompt)
    if karan_mode:
        base_prompt += f", {KARAN_STYLE_SUFFIX}"
    return base_prompt
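
To see how the templates and the optional suffix combine, the sketch below repeats the definitions from the cells above so that it runs on its own, then builds the prompt with and without Karan Mode. The subject "a lighthouse at dusk" is an arbitrary sample input, not something from the notebook.

```python
# Copies of the notebook's definitions, repeated so this block runs standalone.
KARAN_STYLE_SUFFIX = "ultra detailed, high resolution, cinematic lighting"

PROMPT_TEMPLATES = {
    "Portrait": "A high quality portrait of {}",
    "Landscape": "A wide cinematic landscape of {}",
    "Fantasy": "A fantasy digital art of {}",
    "Cyberpunk": "A cyberpunk futuristic scene of {}",
}


def build_prompt(user_prompt, style, karan_mode):
    base_prompt = PROMPT_TEMPLATES[style].format(user_prompt)
    if karan_mode:
        base_prompt += f", {KARAN_STYLE_SUFFIX}"
    return base_prompt


# Sample input chosen for illustration only.
plain = build_prompt("a lighthouse at dusk", "Landscape", False)
styled = build_prompt("a lighthouse at dusk", "Landscape", True)

print(plain)   # A wide cinematic landscape of a lighthouse at dusk
print(styled)  # same text followed by ", ultra detailed, high resolution, cinematic lighting"
```

Note that a style name missing from PROMPT_TEMPLATES raises KeyError; the Gradio dropdown prevents this in the UI, but direct callers of build_prompt are not protected.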

In [34]:
model = load_text_to_image_model()


You are using a model of type clip_text_model to instantiate a model of type clip. This is not supported for all configurations of models and can yield errors.


In [35]:
@torch.no_grad()
def generate_ai_image(
    user_prompt,
    style,
    steps,
    guidance_scale,
    karan_mode
):
    
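    # Combine the style template, the user's prompt, and the optional suffix.
    final_prompt = build_prompt(user_prompt, style, karan_mode)
    # Slider values may arrive as floats, so cast them before sampling.
    image = model(
        prompt=final_prompt,
        negative_prompt=DEFAULT_NEGATIVE_PROMPT,
        num_inference_steps=int(steps),
        guidance_scale=float(guidance_scale),
    ).images[0]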

    return image
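
The imports cell brings in `datetime`, but no cell in the notebook uses it. One plausible use, assumed here rather than taken from the source, is saving each generated image under a timestamped filename. In this sketch `make_output_path` and `save_image` are hypothetical helpers, and the `outputs` directory name is an assumption.

```python
import re
from datetime import datetime
from pathlib import Path


def make_output_path(prompt, out_dir="outputs", now=None):
    """Hypothetical helper: build a timestamped PNG path from a prompt."""
    now = now or datetime.now()
    # Collapse everything except lowercase ASCII letters and digits into
    # hyphens so the prompt is safe to use inside a filename.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())[:40].strip("-") or "image"
    return Path(out_dir) / f"{now:%Y%m%d-%H%M%S}_{slug}.png"


def save_image(image, prompt, out_dir="outputs"):
    """Hypothetical helper: save any object exposing a PIL-style save(path)."""
    path = make_output_path(prompt, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    image.save(path)
    return path
```

Inside generate_ai_image, a call such as `save_image(image, final_prompt)` just before `return image` would keep a copy of every result on disk.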

In [None]:
interface = gr.Interface(
    fn=generate_ai_image,
    inputs=[
        gr.Textbox(label="Enter your prompt"),
        gr.Dropdown(
            choices=list(PROMPT_TEMPLATES.keys()),
            label="Select Art Style",
            value="Portrait"
        ),
        gr.Slider(5, 15, value=8, step=1, label="Inference Steps"),
        gr.Slider(1, 15, value=7.5, label="Guidance Scale"),
        gr.Checkbox(label="Enable Karan Mode 😎")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="AI Tool - Text to Image Generator",
    description="AI-powered image generation using diffusion models"
)

interface.launch(share=True)




* Running on local URL:  http://127.0.0.1:7864
* Running on public URL: https://2b8a09ddd12a32db03.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)






  0%|          | 0/8 [00:00<?, ?it/s]

Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
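
The message above suggests retrying with a different seed, but the notebook does not expose one. A minimal sketch of seed handling follows, assuming the seed would be passed to the pipeline through its `generator` argument. `resolve_seed` and `make_generator` are hypothetical helpers that do not appear in the notebook.

```python
import random


def resolve_seed(seed=None):
    """Hypothetical helper: return the given seed, or draw a random 32-bit one."""
    if seed is None:
        return random.randint(0, 2**32 - 1)
    return int(seed)


def make_generator(seed, device="cpu"):
    """Hypothetical helper: build a seeded torch.Generator for the pipeline."""
    # Imported lazily so resolve_seed can be used without torch installed.
    import torch

    return torch.Generator(device=device).manual_seed(seed)
```

With these in place, the pipeline call could take `generator=make_generator(resolve_seed())`. Recording the resolved seed alongside the prompt lets the same settings be rerun later, and drawing a new seed changes the starting noise for a retry.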

