# Playing around with the GLIDE image model
> OpenAI tackles text to image

- toc: false
- branch: master
- badges: true
- comments: true
- author: Jesse van Elteren
- image: images/glideimg.png
- categories: []

You probably know that a computer can describe an image. For example, an image of a dog playing with your kids may be translated into 'dog and children in garden'.

But did you know the other way around is now also possible? You come up with a text and the computer renders a **new** image. Completely new, not like a Google search which searches existing images. 

OpenAI has been one of the premier organisations publishing spectacular results in recent years. They train their models on huge datasets of text and images. They released [a paper](https://arxiv.org/pdf/2112.10741.pdf) on their GLIDE image model, trained on several hundred million images. It outperforms their previous 'DALL-E' model in terms of photorealism.

They also open-sourced [a slimmed down version of their model](https://github.com/openai/glide-text2im). I played around with it by coming up with text prompts and letting the model generate 10 images for each prompt.

Below are the results. Zoom in on a PC with Ctrl + mouse wheel, or on mobile by pinching with your fingers. I repeat the text above the images to keep it readable while zoomed in.
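
For the curious, repeating the text across a row takes only a few lines of plain Python. This is a minimal sketch rather than the exact code behind the image: the function name `repeat_label` and the defaults for `row_chars` and `gap` are made up for illustration.

```python
def repeat_label(prompt: str, row_chars: int = 350, gap: int = 36) -> str:
    """Repeat `prompt` across one row, with `gap` spaces between copies.

    `row_chars` is an assumed rough number of characters that fit on a row.
    """
    spacer = " " * gap
    # how many copies (prompt plus spacer) fit on the row; always at least one
    copies = max(1, row_chars // (len(prompt) + gap))
    return spacer.join([prompt] * copies)


label = repeat_label("a map of a city")
print(label.count("a map of a city"))  # → 6
```

Short prompts are repeated more often than long ones, so wherever you zoom in there should be a copy of the prompt nearby.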

In [60]:
# hide
from pathlib import Path
import imageio as iio
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from functools import reduce


def longer(name):
    # repeat the prompt across the row, separated by a wide gap (postfix),
    # so a copy of it stays in view while zoomed in
    total = 350
    postfix = '                                    '
    l = len(name) + len(postfix)
    # always keep at least one copy, even for very long prompts
    return postfix.join([name for _ in range(max(1, total//l))])
    
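p = Path('summary')
# iterdir() gives no ordering guarantee; sort so the 10 images of each
# prompt stay together and '<prompt>_1.png' comes first in its group
files = sorted(p.iterdir())
# every 10th file is '<prompt>_1.png'; strip the suffix to recover the prompt
filenames = [longer(f.name.replace('_1.png', '')) for f in files[::10]]

# load all images as arrays, in the same order as the labels
images = [iio.imread(f) for f in files]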

In [63]:
# hide
# Windows-specific font path; point this at any TrueType font on other systems
font = ImageFont.truetype(r'C:\Windows\Fonts\arial.ttf', 20)
allimgs = []
for i in range(0, len(images), 10):
    # place the 10 images of one prompt side by side
    imgs = np.hstack(images[i:i+10])
    # white strip for the label, as wide as the row of images
    white = np.full((30, imgs.shape[1], 3), 255, dtype='uint8')
    # stack the strip on top of the row and convert to a PIL image
    text_and_imgs = Image.fromarray(np.concatenate([white, imgs]))
    # write the repeated prompt into the strip
    draw = ImageDraw.Draw(text_and_imgs)
    draw.text((3, 5), filenames[i//10], fill='blue', font=font)
    allimgs.append(text_and_imgs)

In [65]:
# hide_input
def get_concat_v(im1, im2):
    # paste im2 below im1 on a new canvas (assumes both have the same width)
    dst = Image.new('RGB', (im1.width, im1.height + im2.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (0, im1.height))
    return dst


# stack all labelled rows into one tall image
full_img = reduce(get_concat_v, allimgs)
full_img.save('glide/summary.png')

![](glide/summary.png)
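
If you want to build a similar overview yourself, the size of the final image follows from simple arithmetic. Here is a rough sketch, assuming square 256-pixel tiles (10 of them give the 2560-pixel-wide rows used here) and a 30-pixel label strip above each row. The name `summary_size` and its parameters are hypothetical.

```python
def summary_size(n_prompts: int, per_row: int = 10,
                 tile: int = 256, label_h: int = 30) -> tuple[int, int]:
    """Return (width, height) in pixels of the stacked overview image.

    Assumes every tile is `tile` x `tile` pixels and each prompt gets one
    row of `per_row` tiles with a `label_h`-pixel strip on top.
    """
    width = per_row * tile
    height = n_prompts * (label_h + tile)
    return width, height


print(summary_size(3))  # → (2560, 858)
```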

What do you think? Some things I noticed:
* The more complex prompts are sometimes only partially fulfilled. For example, for a monkey looking at itself in the mirror, the mirror is often missing.
* The representation is sometimes off, for example the second rubber ducky from the right.
* The model can be quite broad in its approach. When you think of a 'map of a city', you probably have one type of map in mind. The model generates all sorts of maps, all of them believable.

I also had a culinary adventure: I tried 'spagetthi on a plate' but got results that didn't look like something I'd like to consume... It turned out I had misspelled it (it should be spaghetti), and the results for the corrected text looked much better. To finish it off, I tried to make it "delicious", which worked out pretty nicely: often the spaghetti gets some vegetables on top. So next time you order spaghetti in a restaurant, make sure to spell it right!

![](glide/spag.png)

The full GLIDE model is larger and is also trained on images of people. See these impressive examples from the paper:

![](glide/paper.png)

This is clearly a disruption to the stock photo business, and it has a wide variety of other use cases.

At this point, AI can generate believable news articles, including images, that are completely false. Still, many experts feel that we are a long way from Artificial General Intelligence (AGI) and that the current deep learning architectures may not get us there.

To me, that doesn't make this 'narrow' intelligence any less impressive. Hope you enjoyed it!

> Tip: There are even more examples in [the paper](https://arxiv.org/pdf/2112.10741), check it out! And in case you can't get enough, I've got [even more examples](https://github.com/jvanelteren/blog/blob/master/_notebooks/glide/full.png).