# Stable Diffusion generation

In [1]:
%load_ext autoreload
%autoreload 2
from matplotlib import pylab
from smlai import *

# Stable Diffusion
Stable Diffusion generates photo-realistic images from text input. We provide a prompt: a text description of the subject we want the diffusion model to create. A prompt can be a single line of vague text or several detailed lines, depending on how much control we want over the output.


## Let's paint some landscapes 

In [2]:
# loading a trained model
model = load_model('jzli/DreamShaper-3.3-baked-vae')

unet/diffusion_pytorch_model.safetensors not found


In [3]:
generate(
    model, 
    prompt="a man standing on top of a lush green field next to a mountain covered \
    in clouds and a giant mountain in the background, landscape", 
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

### Negative Prompt
Stable Diffusion accepts a negative prompt that tells the model which elements to keep out of the final output. For instance, to remove anime and cartoon effects and avoid low-resolution results, the negative prompt can be as simple as "cartoon, anime, sketches, lowres".
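
To make the mechanism concrete, here is a minimal sketch of classifier-free guidance with a negative prompt. In common Stable Diffusion pipelines (for example the one in diffusers), the noise prediction for the negative prompt takes the place of the prediction for an empty prompt, and the final prediction is pushed away from it. Whether `generate` from `smlai` works exactly this way is an assumption; the function name `guided_noise` and the numbers are hypothetical.

```python
def guided_noise(cond, neg, guidance_scale=7.5):
    """Classifier-free guidance: start at `neg` and move toward `cond`."""
    return [n + guidance_scale * (c - n) for c, n in zip(cond, neg)]


# Toy noise predictions for three latent values (made-up numbers).
cond_pred = [0.5, -0.25, 1.0]  # conditioned on the prompt
neg_pred = [0.25, -0.25, 0.0]  # conditioned on the negative prompt

result = guided_noise(cond_pred, neg_pred, guidance_scale=2.0)
print(result)
```

With a guidance scale of 1 the result equals the prompt-conditioned prediction; larger scales move it further from whatever the negative prompt describes.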

In [4]:
generate(
    model, 
    prompt="a man standing on top of a lush green field next to a mountain covered \
    in clouds and a giant mountain in the background, landscape", 
    negative_prompt="cartoon, anime, sketches, lowres" ,
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

## Fantasy architecture

In [5]:
model = load_model("stablediffusionapi/deliberate-v2")

unet/diffusion_pytorch_model.safetensors not found


### Detailed prompts

In [6]:
generate(
    model, 
    prompt="building, gothic in gothbuilding style, goth, horror, creepy, no humans, tree, scenery, \
            outdoors, fog, window, sky, forest, nature, cloud, house, bare tree",
    negative_prompt="[ng_deepnegative_v1_75t], (easynegative:1.1), NSFW, text, error, cropped, \
            worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, \
            username, blurry, out of focus, censorship, [out of frame], artist name, sketch, \
            comic, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), \
            ((grayscale)), out of the image",
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

Wait! These don't look much like goth buildings. Stable Diffusion should do better than this.

### Load a textual inversion model
Textual inversion teaches Stable Diffusion to reproduce the distinctive features of a set of images by learning an embedding for a brand-new word token. Using that token in a prompt then evokes those features.
We use a [Goth Building textual inversion model](https://civitai.com/models/27912/goth-building-style-lord-of-the-rings-style-house-castle-or-landscape-gothbuilding?modelVersionId=33450) from civitai.

To load a textual inversion checkpoint from [civitai](https://civitai.com/), such as this one or the hobbithouse checkpoint used later:
- Step 1: Download the textual inversion checkpoint
- Step 2: Search and find the base model on [huggingface-hub models](https://huggingface.co/models). In this case, the base model is ```stablediffusionapi/deliberate-v2```
- Step 3: Load with the mentioned base model
- Step 4: Load textual inversion
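
Conceptually, loading a textual inversion checkpoint (Step 4) registers one new token and appends its learned vector to the text encoder's embedding table, leaving existing tokens untouched. The toy below sketches that idea with plain Python containers; `add_learned_token`, the tiny vocabulary, and the two-dimensional vectors are all hypothetical stand-ins, not the `smlai` or diffusers implementation.

```python
def add_learned_token(vocab, embeddings, token, vector):
    """Register `token` with its learned embedding; existing rows are untouched."""
    if token in vocab:
        raise ValueError(f"token {token!r} already exists")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("embedding width mismatch")
    vocab[token] = len(embeddings)  # the new token gets the next free id
    embeddings.append(list(vector))
    return vocab[token]


# Toy vocabulary and embedding table (made-up values).
vocab = {"building": 0, "gothic": 1}
embeddings = [[0.1, 0.2], [0.3, 0.4]]

new_id = add_learned_token(vocab, embeddings, "gothbuilding", [0.9, -0.5])
print(new_id, embeddings[new_id])
```

After this step, a prompt containing `gothbuilding` maps to the learned vector instead of being split into unrelated sub-word tokens.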

In [16]:
model = load_model("stablediffusionapi/deliberate-v2")

text_encoder/model.safetensors not found


In [7]:
# load pre-downloaded goth textual inversion checkpoint
model.load_textual_inversion('textual_inversion/deliberate_v2/gothbuilding.pt')

In [8]:
generate(
    model, 
    prompt="building, gothic in gothbuilding style, goth, horror, creepy, no humans, tree, scenery, outdoors, \
        fog, window, sky, forest, nature, cloud, house, bare tree",
    negative_prompt="[ng_deepnegative_v1_75t], (easynegative:1.1), NSFW, text, error, cropped, worst quality, \
        low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, out of focus, \
        censorship, [out of frame], artist name, sketch, comic, (worst quality:2), (low quality:2), \
        (normal quality:2), lowres, ((monochrome)), ((grayscale)), out of the image",
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

### A hobbit house textual inversion model
A fantasy cottage in the style of The Lord of the Rings, or a [Hobbithouse](https://civitai.com/models/18978/better-hobbit-house-fantasy-cottage-in-the-style-of-lord-of-the-rings).
Let's add hobbithouse to our Stable Diffusion model.

In [9]:
# load pre-downloaded hobbithouse textual inversion model
model.load_textual_inversion('textual_inversion/deliberate_v2/hobbithouse.pt')

In [10]:
generate(model, 
         prompt="(Vector image:1.3) of (Award-winning photograph), (one hobbithouse-37500 in the middle of the \
             forest), (Low-angle perspective), (natural lighting), (Wide-angle lens capturing scenery), \
             hidden objects games, video game concept art, (8K Unity wallpaper), fine details, award-winning \
             image, highly detailed, 16k, cinematic perspective, ((video game environment concept art style)), \
             pretty colors, cinematic environment,(Flat style:1.3), Illustration, Behance", 
         negative_prompt="ng_deepnegative_v1_75t, ugly, duplication, duplicates, mutilation, deformed, \
             out of frame, grainy, blurred, blurry, writing, calligraphy, signature, text, watermark, bad art, \
             neg_facelift512, worst quality, low quality, medium quality, deleted, lowres, comic,(Watermark:1.5),\
             (Text:1.3), watermark, signature, frame,watermark,signature",
         N=3)

Token indices sequence length is longer than the specified maximum sequence length for this model (121 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['- winning image, highly detailed, 1 6 k, cinematic perspective, ( ( video game environment concept art style ) ), pretty colors, cinematic environment, ( flat style : 1. 3 ), illustration, behance']


  0%|          | 0/50 [00:00<?, ?it/s]
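
The warning above means the CLIP text encoder only sees the first 77 tokens (start and end tokens included), so the tail of a long prompt is silently dropped. The sketch below estimates which part of a prompt would survive. It uses a crude word-and-punctuation splitter as a stand-in for the real CLIP tokenizer, which uses byte-pair encoding and therefore produces different counts (note how `16k` became `1 6 k` in the warning). The helper names are hypothetical.

```python
import re

MAX_LEN = 77  # CLIP text encoder context length, including start and end tokens


def rough_tokens(prompt):
    """Crude stand-in for the CLIP tokenizer: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", prompt.lower())


def split_for_clip(prompt, max_len=MAX_LEN):
    """Return (kept, dropped) token lists under the length limit."""
    tokens = rough_tokens(prompt)
    budget = max_len - 2  # reserve room for the start and end tokens
    return tokens[:budget], tokens[budget:]


kept, dropped = split_for_clip("a hobbit house, " * 30)
print(len(kept), len(dropped))
```

If `dropped` is not empty, moving the most important terms to the front of the prompt keeps them inside the window.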

## Fantasy Art
Painting some fantasy art with pre-trained models.
The model is loaded from civitai: [VinteProtogenMix](https://civitai.com/models/5657/vinteprotogenmix?modelVersionId=23690)

In [None]:
model = load_civit_ckpt('./vinteprotogenmix_V20.safetensors', model_name='vinteprotogenmix_V20')

In [11]:
generate(
    model,
    prompt='a bird sitting on a branch with pink flowers, cgi art, red-yellow colors, hyperrealistic sparrows, pinterest, photorealistic artstyle, in rich color, an ai generated image, very detailed portrait, singing for you, breathtaking render',
    negative_prompt='ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, \
        extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, \
        signature, cut off, draft,',
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

In [12]:
generate(
    model,
    prompt='a living room filled with lots of furniture and decor, inspired by Josephine Wall, instagram contest \
        winner, vibrant autumn colors, string lights, underground room, light - brown wall, cushions, arab inspired, \
        cramped new york apartment, by Daniel Ljunggren, cottagecore hippie, thisset style',
    negative_prompt='ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, \
        extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, \
        signature, cut off, draft,',
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

In [15]:
generate(
    model,
    prompt='a man dressed as a bull with a machine gun, litrpg novel cover, character art. sci-fi. cover art, \
        male soldier in the forest, discord profile picture, ultra realistic photography, zulu, fractal ceramic \
        armor, battle ready, 2 k aesthetic, 2077, biopunk, photoreal details',
    negative_prompt='ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, \
        extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, \
        signature, cut off, draft,',
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

In [16]:
# fixing the seed to reproduce results
generate(
    model,
    prompt='a man dressed as a bull with a machine gun, litrpg novel cover, character art. sci-fi. cover art, \
        male soldier in the forest, discord profile picture, ultra realistic photography, zulu, fractal ceramic \
        armor, battle ready, 2 k aesthetic, 2077, biopunk, photoreal details',
    negative_prompt='ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, \
        extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, \
        signature, cut off, draft,',
    seed=932813817,
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]
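
Why a fixed seed reproduces results: the sampler starts from random noise, and a seeded generator yields the same noise every time. The sketch below illustrates this with Python's standard `random` module as a stand-in for the real latent-noise generator; `initial_noise` is a hypothetical helper, and how `generate` consumes `seed` internally is an assumption.

```python
import random


def initial_noise(seed, size=4):
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(932813817)
same_b = initial_noise(932813817)
other = initial_noise(4288483519)

print(same_a == same_b)  # identical seed, identical starting noise
print(same_a == other)   # a different seed gives different noise
```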

In [19]:
# fixing the seed to reproduce results
generate(
    model,
    prompt='a dog with glasses and a chain around its neck, radiant nebula colors, vfx movie, marvel art, \
        highly photographic render, pitbull, illustration iridescent, heavy metal artwork, his head covered in \
        jewels, similar to the bifrost, conceptart. com, avatar image, genie, love death and robots',
    negative_prompt='ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, \
        extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, \
        signature, cut off, draft,',
    seed=4288483519,
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

## Cars
Some aesthetic cars, using another checkpoint from civitai: [Battle Cars](https://civitai.com/models/35438/battle-cars)

Steps to convert a ```lora``` checkpoint:
- Download the checkpoint from civitai.
- Find the name of the base model mentioned in the image generation data. Search for it and download it from [huggingface-hub models](https://huggingface.co/models).
- Convert to diffusers format and load using the following cell.

In [None]:
# convert a civit ai lora safetensor model
# convert_civit_lora_safetensors_to_diffusers(
#     'abyssorangemix3AOM3_aom3a1b.safetensors', # base model
#     'battleCars_v2.safetensors',  # checkpoint
#     'battleCars_v2'  # name of the dir
# )


# load your model from the dir you saved in
# model = load_model('battleCars_v2')

In [20]:
# We have already converted and saved the battle-cars checkpoint
model = load_model("makaveli10/battleCars_v2")

unet/diffusion_pytorch_model.safetensors not found


In [22]:
generate(model, 
         "(masterpiece, best quality:1.1), ultra-detailed, (battlecar:1.1), (humvee:1.08), (painted glossy red:1.05), vehicle focus, no humans, car, wheel, tire, debris, splash, sparks, electricity, glowing, water, dirty", 
         negative_prompt="(worst quality, low quality:1.3), monochrome, blurry, license plate, english text, lowres, low detail, artist name, signature, watermark",
         N=3, 
)

  0%|          | 0/50 [00:00<?, ?it/s]

In [23]:
generate(model, 
         "(masterpiece, best quality:1.1), ultra-detailed, (battlecar:1.1), (limousine:1.08), (painted turquoise, colorful:1.05), vehicle focus, no humans, car, wheel, tire, debris, splash, water, dirty", 
         negative_prompt="(worst quality, low quality:1.3), monochrome, blurry, license plate, english text, lowres, low detail, artist name, signature, watermark",
         N=3, 
)

  0%|          | 0/50 [00:00<?, ?it/s]

## Lowkey and smooth paintings
We load the ```lora``` checkpoint for the civitai [LowRA model](https://civitai.com/models/48139/lowra?modelVersionId=52753). According to the civitai model information, this checkpoint uses ```stablediffusionapi/deliberate-v2``` as its base model. So, let's convert this checkpoint to diffusers and generate some cool images:
- Download the ```lora``` checkpoint from civitai.
- Convert the civitai ```lora``` checkpoint to diffusers.
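
As background for the conversion step, a LoRA checkpoint stores a pair of small low-rank matrices per adapted layer, and merging it into a base model typically means adding their scaled product to the base weight. The sketch below shows that arithmetic in plain Python. It is only an illustration: whether `convert_civit_lora_safetensors_to_diffusers` merges weights exactly like this is an assumption, any additional alpha/rank scaling is omitted, and `merge_lora` and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, up, down, scale=1.0):
    """Return weight + scale * (up @ down) without modifying the inputs."""
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
up = [[1.0], [2.0]]              # shape (2, 1): rank-1 "up" matrix
down = [[0.5, 0.5]]              # shape (1, 2): rank-1 "down" matrix

merged = merge_lora(base, up, down, scale=0.5)
print(merged)
```

A scale of 0 leaves the base model unchanged, which is why lower scales give a subtler style effect.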

In [None]:
# converting the checkpoint to diffusers
model = convert_civit_lora_safetensors_to_diffusers(
    'stablediffusionapi/deliberate-v2',  # base model, a huggingface model
    'lowra_v10.safetensors',             # lora checkpoint from civit ai
    'lowra',                             # name of the dir to save the converted diffuser model
    from_ckpt=False)                     # true if the base model is also a checkpoint

In [28]:
# loading the converted model
model = load_model('lowra')

In [29]:
generate(
    model,
    "<lora:lowra_v10:0.8> inconceivable and spectacular a scene of emergence of a figure from the glowing cloud, fractal nebula threads, cosmic entities, celestial, cosmic, vibrant and vivid, swirls, twirling, unrealistic, high contrast, symbolism, magical, mystical, mystifying, hyperrealistic, oversaturate, mashrooms",
    negative_prompt="(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation",
    N=3)

Token indices sequence length is longer than the specified maximum sequence length for this model (80 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [', mashrooms']


  0%|          | 0/50 [00:00<?, ?it/s]

In [30]:
generate(
    model,
    "<lora:lowra_v10:0.8> 250 gto, motion shot, dark theme, (hdr:1.2)",
    negative_prompt="(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation",
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.


In [31]:
generate(
    model,
    "<lora:lowra_v10:0.6> dark an old chieftainman, (astronaut outfit:1.2), feathers headdress, medium shot",
    negative_prompt="(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, \
        missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, \
        ugly, disgusting, blurry, amputation",
    N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

## Animated Art
Using a trained checkpoint from civitai: [526mix-animated](https://civitai.com/models/35893/526mix-animated)
- Download the checkpoint.
- Load it using ```load_civit_ckpt```.

In [8]:
model = load_civit_ckpt('526mixAnimated_v1.safetensors', '526Mix-Animated')

global_step key not found in model


Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.weight', 'vision_model.encoder.layers.12.layer_norm2.bias', 'vision_model.encoder.layers.19.layer_norm2.bias', 'vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_model.encoder.layers.2.mlp.fc1.weight', 'vision_model.encoder.layers.16.mlp.fc2.bias', 'vision_model.encoder.layers.2.layer_norm1.weight', 'vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_model.encoder.layers.17.layer_norm1.weight', 'vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_model.encoder.layers.6.self_attn.k_proj.weight', 'vision_model.encoder.layers.18.mlp.fc1.weight', 'vision_model.encoder.layers.22.layer_norm1.bias', 'vision_model.encoder.layers.6.self_attn.out_proj.bias', 'vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_model.encoder.layers.21.self_attn.q_proj.weigh

In [10]:
generate(model,
        "closeup (1980's dark sci-fi anime style)+++ futuristic black ops soldier aiming an energy rifle in battle, background smoke, combat stance, epic composition, particles, action scene",
        negative_prompt="desaturated---, pixelated---",
        N=3)

  0%|          | 0/50 [00:00<?, ?it/s]

In [12]:
generate(model,
        "2000's realistic anime style closeup shot of a middle-aged gruff detective eating ramen at a street \
        vendor in the rain, dark, after hours, cyberpunk city raining in background, vivid colors, neon signs",
        negative_prompt="umbrella, umbrellas",
        N=3)

  0%|          | 0/50 [00:00<?, ?it/s]