# Playing around with the GLIDE image model
> Text to image now almost tackled?

- toc: false
- branch: master
- badges: true
- comments: true
- author: Jesse van Elteren
- image: images/glideimg.png
- categories: []

[THIS POST IS WORK IN PROGRESS]

You probably know that a computer can come up with a description of an image. For example, an image of a dog playing with your kids in a garden may be converted into 'dog and children in garden'.

But did you know the other way around is now also possible? You come up with a textual description and the computer renders a **new** image. As in completely new: it's not like a Google search, which only finds existing images. Let's check it out!

OpenAI has been one of the premier organisations publishing spectacular results in the past years. They mainly take huge datasets of texts and images and train their models on them. They released [a paper](https://arxiv.org/pdf/2112.10741.pdf) on their GLIDE image model, trained on several hundred million images. It outperforms their previous, already amazing DALL-E model in terms of photorealism.

They also open-sourced [a slimmed down version of their model](https://github.com/openai/glide-text2im). I played around with it by coming up with some text prompts and letting the model generate 10 images for each prompt. I selected a range of prompts that gives you insight into the power, but also the limitations, of the released model. The results are below. I repeat the prompt above the images, which makes scrolling on mobile easier.

In [60]:
# hide
from pathlib import Path
import imageio as iio
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from functools import reduce


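def longer(name):
    # repeat the prompt across the strip so it stays readable while scrolling sideways
    total = 350  # rough character budget for one label
    postfix = '                                    '
    l = len(name) + len(postfix)
    # max(1, ...) keeps a very long prompt from producing an empty label
    return postfix.join([name for _ in range(max(1, total//l))])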
    
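p = Path('summary')
# sort explicitly: iterdir() does not guarantee an order, and labels and images must line up
files = sorted(p.iterdir())
# every prompt has 10 images; the first file of each group provides the label
filenames = [longer(files[i].name.replace('_1.png', '')) for i in range(0, len(files), 10)]

# read every image as a numpy array, in the same order as the labels
images = [iio.imread(file) for file in files]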
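
The cell above relies on a fixed stride of 10, so a single missing or extra file shifts every label that follows. A minimal sketch of a more forgiving grouping, assuming the files are named `<prompt>_<index>.png`. That naming pattern, `group_by_prompt`, `repeated_label` and the sample filenames are illustrative assumptions, not part of the original notebook:

```python
import re
from collections import defaultdict

# Assumed naming scheme (hypothetical): "<prompt>_<index>.png", e.g. "a cat_3.png"
NAME_PATTERN = re.compile(r"^(?P<prompt>.+)_(?P<index>\d+)\.png$")


def group_by_prompt(filenames):
    """Map each prompt to its filenames, ordered by numeric index."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        groups[match["prompt"]].append((int(match["index"]), name))
    # sort prompts alphabetically and files numerically (so _2 comes before _10)
    return {
        prompt: [name for _, name in sorted(items)]
        for prompt, items in sorted(groups.items())
    }


def repeated_label(prompt, total_chars=350, gap=36):
    """Repeat the prompt with spaces in between; always return it at least once."""
    repeats = max(1, total_chars // (len(prompt) + gap))
    return (" " * gap).join([prompt] * repeats)


if __name__ == "__main__":
    # made-up filenames, only to show the grouping
    names = ["a cat_10.png", "a cat_2.png", "a cat_1.png",
             "map of a city_1.png", "notes.txt"]
    for prompt, files in group_by_prompt(names).items():
        print(repeated_label(prompt, total_chars=60, gap=4), files)
```

Grouping by the parsed prompt rather than by position means each row of images keeps its own label, whatever the number of files per prompt.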

In [63]:
# hide
# Windows-specific font path; point this at another .ttf file on other systems
font = ImageFont.truetype(r'C:\Windows\Fonts\arial.ttf', 20)
allimgs = []
for i in range(0,len(images),10):
    # concat every 10 images side by side
    imgs = np.hstack(images[i:i+10])
    # add a white box above, as wide as the strip (2560 px for ten 256 px images)
    white = np.full((30, imgs.shape[1], 3), 255, dtype='uint8')
    # convert to image
    text_and_imgs = Image.fromarray(np.concatenate([white, imgs]), 'RGB')
    # add the prompt text in the white box
    draw = ImageDraw.Draw(text_and_imgs)
    draw.text((3, 5),filenames[i//10], fill='blue', font=font)
    allimgs.append(text_and_imgs)
    

In [65]:
# hide_input
# concat all the text_img files
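def get_concat_v(im1, im2):
    # paste im2 below im1 on a new canvas
    dst = Image.new('RGB', (im1.width, im1.height + im2.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (0, im1.height))
    return dst
full_img = reduce(get_concat_v, allimgs)
# make sure the output folder exists before saving
Path('glide').mkdir(exist_ok=True)
full_img.save('glide/summary.png')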

![](glide/summary.png)

Did you also notice the following?
* More complex prompts are sometimes only partially fulfilled. For example, 'a painting of a cat playing checkers' does generate paintings of cats, but they are not playing checkers.
* The representation is sometimes off. With certain animals you can clearly see that something is not correct.
* The model can be quite broad in its approach. When you think of a map of a city, you probably have one type of map in your head. The model generates all sorts of maps, all of them believable.

I also had a culinary adventure. I tried out 'spagetthi on a plate' but got stuff that honestly looked... awful... It turned out I had misspelled it (it should be 'spaghetti'), and the corrected dish looked much better. Then I tried to make it delicious, and that worked out pretty nicely!

![](glide/spag.png)

Their full model is larger and is also trained on images of people. See these impressive examples from the paper:

![](glide/paper.png)

There are even more examples in [the paper](https://arxiv.org/pdf/2112.10741), check it out! And in case you can't get enough, I've got more results here.

This clearly disrupts the stock photo business and marks the next step forward for image generation.

At this point, AI can generate believable news articles, including images, that are completely false. Still, many experts feel that we are a long way from Artificial General Intelligence and that current deep learning architectures may not get us to AGI. But that doesn't make these narrower applications any less impressive. Hope you enjoyed it!