
In [None]:
!pip install getdaft --pre --extra-index-url https://pypi.anaconda.org/daft-nightly/simple
!pip install min-dalle torch Pillow



In [None]:
CI = False

In [None]:
import daft

# Set USE_GPU to False to compare CPU performance against GPU performance.
# CI runs have no GPU, so they always fall back to CPU.
USE_GPU = not CI
IO_CONFIG = daft.io.IOConfig(
    s3=daft.io.S3Config(anonymous=True, region_name="us-west-2")
)  # Use anonymous-mode for accessing AWS S3
PARQUET_PATH = "s3://daft-public-data/tutorials/laion-parquet/train-00000-of-00001-6f24a7497df494ae.parquet"

```{hint}
✨✨✨ **Run this notebook on Google Colab** ✨✨✨

You can [run this notebook yourself with Google Colab](https://colab.research.google.com/github/Eventual-Inc/Daft/blob/main/tutorials/text_to_image/text_to_image_generation.ipynb)!
```

# Generating Images from Text with DALL-E

In this tutorial, we will use the Mini DALL-E model (through the `min-dalle` package) to generate images from text. Along the way, we will explore how to use GPUs with Daft to accelerate computations.

To run this tutorial:

1. You will need access to a GPU. If you are on Google Colab, you may switch to a GPU runtime by going to the menu `Runtime -> Change runtime type -> Hardware accelerator -> GPU -> Save`.
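
Before going further, it can help to confirm that the runtime actually exposes a GPU. The snippet below is a minimal sketch that uses only the standard library; the helper name `nvidia_gpu_visible` is hypothetical, and it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` command-line tool.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling found, so assume there is no usable GPU.
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", nvidia_gpu_visible())
```

Once `torch` is installed, `torch.cuda.is_available()` answers the same question from inside PyTorch.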

Let's get started!

## Setting Up

First, let's load a Parquet file into Daft. This particular file is hosted in a public AWS S3 bucket, which we read anonymously using the `IO_CONFIG` defined above.
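
To see what `PARQUET_PATH` points at, the sketch below splits it with the standard library. The names `scheme`, `bucket`, and `key` are illustrative only; Daft does its own path handling internally.

```python
from urllib.parse import urlparse

PARQUET_PATH = "s3://daft-public-data/tutorials/laion-parquet/train-00000-of-00001-6f24a7497df494ae.parquet"

parts = urlparse(PARQUET_PATH)
scheme = parts.scheme         # storage protocol
bucket = parts.netloc         # S3 bucket name
key = parts.path.lstrip("/")  # object key inside the bucket

print(f"scheme={scheme} bucket={bucket}")
print(f"key={key}")
```

The `s3` scheme is why the read needs an `S3Config`; `anonymous=True` works here because the bucket is public.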

In [None]:
import daft

daft.context.set_runner_py(use_thread_pool=False)

parquet_df = daft.read_parquet(PARQUET_PATH, io_config=IO_CONFIG)

Let's go ahead and `.collect()` this DataFrame. This downloads the Parquet file and materializes the data in memory, so subsequent operations reuse the materialized data instead of reading the file again.

In [None]:
parquet_df.collect()

URL Utf8,TEXT Utf8,WIDTH Float64,HEIGHT Float64,similarity Float64,punsafe Float32,pwatermark Float32,AESTHETIC_SCORE Float32,hash Int64,__index_level_0__ Int64
https://images.assetsdelivery.com/thumbnails/torsakarin/torsakarin1507/torsakarin150700324.jpg,Photo pour Japanese pagoda and old house in Kyoto at twilight - image libre de droit,450,297,0.3459470868110657,0.00054100156,0.034098506,6.5262036,1777707726169138033,396
https://images.fineartamerica.com/images/artworkimages/mediumlarge/1/soaring-peter-eades.jpg,Soaring by Peter Eades,675,900,0.3104045391082763,5.7670622e-06,0.0764483,6.636003,4231601394502896160,7984
https://assets.vg247.com/current/2014/12/far-cry-4-concept-art-5.jpg,far cry 4 concept art is the reason why it 39 s a beautiful game vg247. Black Bedroom Furniture Sets. Home Design Ideas,1600,754,0.3163999617099762,4.2637777e-05,0.49650294,6.690522,6867480114123960364,21440
http://img.scoop.it/ttXSVXiLOZzyjJ2rG9AUijl72eJkfbmt4t8yenImKBVvK0kTmF0xjctABnaLJIm9,San Pedro: One Of Mother Nature's Most Powerful Psychedelics | Ayahuasca アヤワスカ | Scoop.it,467,369,0.3088734149932861,0.00061166286,0.10270452,6.749783,2573977429828778516,30627
https://www.stocktrekimages.com/pix/simg/misc/yzv200025s_p.jpg,"YZV200025S © Stocktrek Images, Inc. 360 panorama of the Milky Way over Lago-Naki plateau, Russia.",650,308,0.332099974155426,1.2845917e-05,0.057178423,6.5293713,-2870447088762390972,31548
https://i.pinimg.com/originals/d2/72/f3/d272f3250515ae0ae4317c34afac60b7.jpg,Grace Kelly Outfits,1024,1437,0.316754013299942,0.021755934,0.1605802,6.592924,5832306745679284925,31816
https://armenianart.am/wp-content/uploads/2019/06/portrait-anush-by-artur-mkhitaryan-1t-800x1066.jpg,"Portrait - Anush, by Artur Mkhitaryan",800,1066,0.3170162737369537,1.840563e-05,0.13537237,6.8501353,-3896567211749752908,33458
https://render.fineartamerica.com/images/rendered/search/poster/images/artworkimages/medium/1/children-listen-to-a-shepherd-playing-a-flute-j-alsina.jpg,Children Listen To A Shepherd Playing A Flute Poster by J Alsina,400,341,0.3169437646865845,2.8656173e-06,0.040163245,6.833819,2734666252886784419,34136


In [None]:
parquet_df = parquet_df.select(parquet_df["URL"], parquet_df["TEXT"], parquet_df["AESTHETIC_SCORE"])

## Downloading Images

Like many datasets, this one stores a URL for each image rather than the image bytes themselves.

Let's use Daft's built-in functionality to download the images and decode them into Daft's image type, all in just a few lines of code!

In [None]:
# Filter for images with longer descriptions
parquet_df_with_long_strings = parquet_df.where(parquet_df["TEXT"].str.length() > 50)

# Download images
images_df = parquet_df_with_long_strings.with_column(
    "image",
    parquet_df["URL"].url.download().image.decode(),
)

In [None]:
images_df.show(5)

URL Utf8,TEXT Utf8,AESTHETIC_SCORE Float32,image Image[MIXED]
https://images.assetsdelivery.com/thumbnails/torsakarin/torsakarin1507/torsakarin150700324.jpg,Photo pour Japanese pagoda and old house in Kyoto at twilight - image libre de droit,6.5262036,
https://assets.vg247.com/current/2014/12/far-cry-4-concept-art-5.jpg,far cry 4 concept art is the reason why it 39 s a beautiful game vg247. Black Bedroom Furniture Sets. Home Design Ideas,6.690522,
http://img.scoop.it/ttXSVXiLOZzyjJ2rG9AUijl72eJkfbmt4t8yenImKBVvK0kTmF0xjctABnaLJIm9,San Pedro: One Of Mother Nature's Most Powerful Psychedelics | Ayahuasca アヤワスカ | Scoop.it,6.749783,
https://www.stocktrekimages.com/pix/simg/misc/yzv200025s_p.jpg,"YZV200025S © Stocktrek Images, Inc. 360 panorama of the Milky Way over Lago-Naki plateau, Russia.",6.5293713,
https://render.fineartamerica.com/images/rendered/search/poster/images/artworkimages/medium/1/children-listen-to-a-shepherd-playing-a-flute-j-alsina.jpg,Children Listen To A Shepherd Playing A Flute Poster by J Alsina,6.833819,


Great! Now we have a pretty good idea of what our dataset looks like.
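
The rows that disappeared between the two outputs above are the ones removed by the `> 50` length filter. The following plain-Python sketch mirrors that filter on a few captions copied from the first output. It assumes that Python's `len` and Daft's `.str.length()` agree for these ASCII captions; `MIN_CAPTION_LENGTH` is a name introduced here for illustration.

```python
MIN_CAPTION_LENGTH = 50  # same threshold as the `> 50` filter in the cell above

captions = [
    "Photo pour Japanese pagoda and old house in Kyoto at twilight - image libre de droit",
    "Soaring by Peter Eades",
    "Grace Kelly Outfits",
    "Portrait - Anush, by Artur Mkhitaryan",
    "Children Listen To A Shepherd Playing A Flute Poster by J Alsina",
]

# Keep only captions strictly longer than the threshold.
kept = [c for c in captions if len(c) > MIN_CAPTION_LENGTH]
dropped = [c for c in captions if len(c) <= MIN_CAPTION_LENGTH]

for caption in dropped:
    print(f"dropped ({len(caption)} chars): {caption}")
```

Short captions such as "Grace Kelly Outfits" carry little detail for a text-to-image model, which is why only the longer descriptions are kept.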

## Running the Mini DALL-E model on a GPU using Daft UDFs

Let's now run the Mini DALL-E model over the `"TEXT"` column, and generate images for those texts!

Using GPUs with Daft UDFs is simple. Just specify `num_gpus=N`, where `N` is the number of GPUs that your UDF is going to use.
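
The UDF in the next cell is written as a class so that the expensive model load lives in `__init__` and the per-batch work lives in `__call__`. The plain-Python sketch below illustrates that pattern with a hypothetical `FakeModel` stand-in. It does not use Daft or min-dalle, and how often Daft itself constructs the class may depend on the runner and version.

```python
class FakeModel:
    """Hypothetical stand-in for an expensive model such as MinDalle."""

    loads = 0  # counts how many times the "weights" were loaded

    def __init__(self):
        FakeModel.loads += 1

    def generate(self, text: str) -> str:
        return f"<image for: {text}>"


class GenerateImageSketch:
    def __init__(self):
        # Expensive setup happens once per instance.
        self.model = FakeModel()

    def __call__(self, texts):
        # Each call handles a whole batch and reuses the loaded model.
        return [self.model.generate(t) for t in texts]


udf = GenerateImageSketch()
batch_one = udf(["a pagoda at twilight", "the Milky Way"])
batch_two = udf(["a shepherd playing a flute"])

print("model loads:", FakeModel.loads)
```

Calling the instance twice reuses the same model, which matters when loading weights onto a GPU takes far longer than generating a single image.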

In [None]:
import torch
from min_dalle import MinDalle


@daft.udf(return_dtype=daft.DataType.python())
class GenerateImageFromText:
    def __init__(self):
        self.model = MinDalle(
            models_root="./pretrained",
            dtype=torch.float32,
            # Tell the min-dalle library to load the model on GPU or CPU
            device="cuda" if USE_GPU else "cpu",
            is_mega=False,
            is_reusable=True,
        )

    def __call__(self, text_col):
        return [
            self.model.generate_image(
                t,
                seed=-1,
                grid_size=1,
                is_seamless=False,
                temperature=1,
                top_k=256,
                supercondition_factor=32,
            )
            for t in text_col.to_pylist()
        ]


if USE_GPU:
    GenerateImageFromText = GenerateImageFromText.override_options(num_gpus=1)

images_df.with_column(
    "generated_image",
    GenerateImageFromText(images_df["TEXT"]),
).show(1)

URL Utf8,TEXT Utf8,AESTHETIC_SCORE Float32,image Image[MIXED],generated_image Python
https://images.assetsdelivery.com/thumbnails/torsakarin/torsakarin1507/torsakarin150700324.jpg,Photo pour Japanese pagoda and old house in Kyoto at twilight - image libre de droit,6.5262036,,


In [None]:
import torch
from min_dalle import MinDalle
import daft

# Set GPU Usage
USE_GPU = torch.cuda.is_available()

# Define UDF class
@daft.udf(return_dtype=daft.DataType.python())
class GenerateImageFromText:
    def __init__(self):
        self.model = MinDalle(
            models_root="./pretrained",
            dtype=torch.float32,
            # Load model on GPU or CPU based on availability
            device="cuda" if USE_GPU else "cpu",
            is_mega=False,  # Smaller model for quicker execution
            is_reusable=True,
        )

    def __call__(self, text_col):
        return [
            self.model.generate_image(
                text=t,
                seed=-1,
                grid_size=1,
                is_seamless=False,
                temperature=1,
                top_k=256,
                supercondition_factor=32,
            )
            for t in text_col.to_pylist()
        ]


# If GPU is available, specify GPU options for the UDF
if USE_GPU:
    GenerateImageFromText = GenerateImageFromText.override_options(num_gpus=1)

# Build a one-row DataFrame as a mock input (this replaces the earlier images_df)
images_df = daft.from_pydict({"TEXT": ["A Female Patient sitting in hospital"]})

# Generate images and add to the DataFrame
generated_df = images_df.with_column(
    "generated_image",
    GenerateImageFromText(images_df["TEXT"]),
)

# Show a single generated image as proof of concept
generated_df.show(1)


TEXT Utf8,generated_image Python
A Female Patient sitting in hospital,
