
# Figment Fusion

`v1.0.0-beta.2`

*This is an early access version of Figment Fusion. You are welcome to provide feedback and contribute to the project on [GitHub](https://github.com/rlaneth/figment-fusion).*

---

Hello there! 👋

You've arrived at Figment Fusion, a notebook powered by the [Diffusers](https://github.com/huggingface/diffusers) library and pre-configured for the [Stable Diffusion](https://stability.ai/blog/stable-diffusion-announcement) and [Waifu Diffusion](https://huggingface.co/hakurei/waifu-diffusion) models. If you are a technology enthusiast and eager to get started with AI image generation, then Figment Fusion is for you.

Most of the steps should be simple and fun to explore. If you encounter any issues, please consult the documentation on the [wiki](https://github.com/rlaneth/figment-fusion/wiki) or [open an issue](https://github.com/rlaneth/figment-fusion/issues).

## 🌱 Getting Started

The steps in this notebook should be completed in the order they appear. Executing them in the wrong order might result in errors or other unexpected behavior.

The following code snippet lists the GPUs available on your current runtime, which helps you determine whether you are working in a suitable environment.

In [None]:
!nvidia-smi -L

If you are running this notebook on [Google Colaboratory](https://colab.research.google.com/), you might be provided with either an NVIDIA Tesla T4 or a P100, two of the GPU models most frequently made available through the service. According to the [Google Colab FAQ](https://research.google.com/colaboratory/faq.html), the available GPU types may vary over time and access to resources is never guaranteed.

**Note:** previous versions of this notebook incorrectly stated that the Tesla P100 was faster and offered more VRAM than the T4. Both provide the same amount of VRAM, and data suggests that the T4 may be slightly faster than the P100 when generating images with Figment Fusion.
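
If you would rather inspect the GPU list from Python, the text printed by `nvidia-smi -L` can be parsed with a few lines of code. The sketch below is only an illustration: `parse_gpu_list` and the sample string are hypothetical and not part of Figment Fusion, and it assumes the usual `GPU <index>: <name> (UUID: ...)` line format.

```python
import re

def parse_gpu_list(output):
  """Extract (index, name) pairs from the text printed by `nvidia-smi -L`."""
  # Assumed line format: "GPU 0: Tesla T4 (UUID: GPU-...)"
  pattern = re.compile(r'^GPU (\d+): (.+?) \(UUID: .+\)$')
  gpus = []
  for line in output.splitlines():
    match = pattern.match(line.strip())
    if match:
      gpus.append((int(match.group(1)), match.group(2)))
  return gpus

# Hypothetical sample output; the UUID is a placeholder, not a real device.
sample_output = 'GPU 0: Tesla T4 (UUID: GPU-00000000-0000-0000-0000-000000000000)'
print(parse_gpu_list(sample_output))
```

An empty result would suggest that no NVIDIA GPU is attached to the runtime.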

## ⚙️ Settings

In [None]:
#@title Model Selection


MODEL_MAP = {
    'Stable Diffusion v1.1': ['CompVis/stable-diffusion-v1-1', 'main'],
    'Stable Diffusion v1.2': ['CompVis/stable-diffusion-v1-2', 'main'],
    'Stable Diffusion v1.3': ['CompVis/stable-diffusion-v1-3', 'main'],
    'Stable Diffusion v1.4': ['CompVis/stable-diffusion-v1-4', 'main'],
    'Stable Diffusion v1.1 | FP16': ['CompVis/stable-diffusion-v1-1', 'fp16'],
    'Stable Diffusion v1.2 | FP16': ['CompVis/stable-diffusion-v1-2', 'fp16'],
    'Stable Diffusion v1.3 | FP16': ['CompVis/stable-diffusion-v1-3', 'fp16'],
    'Stable Diffusion v1.4 | FP16': ['CompVis/stable-diffusion-v1-4', 'fp16'],
    'Waifu Diffusion v1.2': ['hakurei/waifu-diffusion', '1692f03d5c0ab460f036adfc99a2442e8b046c12'],
    'Waifu Diffusion v1.3': ['hakurei/waifu-diffusion', '2dff9dab944d77470b545a1bbe86fffb08d54912'],
    'Waifu Diffusion v1.2 | FP16': ['hakurei/waifu-diffusion', '44d0d4d72a028ff9ac4e97aed4af966e4e6d90d8'],
    'Waifu Diffusion v1.3 | FP16': ['hakurei/waifu-diffusion', '342d18da939534326d8571b7a8b5195f43800db6']
}

#@markdown You can choose which model to use for image generation here. Consider the following
#@markdown points for making your choice.

#@markdown - Half precision (`fp16`) variants of each model require less storage space (around
#@markdown   2.6 GB) compared to the full precision versions (around 5.2 GB), run faster and consume
#@markdown   less [VRAM](https://www.techtarget.com/searchstorage/definition/video-RAM).
#@markdown - Older versions of Stable Diffusion are supported, but the latest one is generally
#@markdown    better.
#@markdown - [Waifu Diffusion](https://huggingface.co/hakurei/waifu-diffusion) is a model based on
#@markdown   Stable Diffusion v1.4 and fine-tuned for anime style art.

#@markdown To download each model, you must have an account on the Hugging Face website and accept
#@markdown the terms on the page of the repository that corresponds to the model you've chosen. For
#@markdown example, in order to retrieve the Stable Diffusion v1.4 model (and its `fp16` variant),
#@markdown you are required to accept the terms on
#@markdown [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4).

use_model = "Stable Diffusion v1.4 | FP16" #@param ["Stable Diffusion v1.1", "Stable Diffusion v1.2", "Stable Diffusion v1.3", "Stable Diffusion v1.4", "Stable Diffusion v1.1 | FP16", "Stable Diffusion v1.2 | FP16", "Stable Diffusion v1.3 | FP16", "Stable Diffusion v1.4 | FP16", "Waifu Diffusion v1.2", "Waifu Diffusion v1.3",  "Waifu Diffusion v1.2 | FP16", "Waifu Diffusion v1.3 | FP16"]

model_select = MODEL_MAP[use_model]
model_fp16 = 'FP16' in use_model
model_repository = model_select[0]
model_revision = model_select[1]
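
To make the lookup above easier to follow, here is a minimal standalone sketch of how a display name resolves to a repository, a revision and a precision flag. The helper `resolve_model` and the reduced `example_map` are hypothetical names used only for illustration; the two entries are copied from `MODEL_MAP`.

```python
def resolve_model(name, model_map):
  """Return (repository, revision, is_fp16) for a model display name."""
  repository, revision = model_map[name]
  # Precision is inferred from the display name, not from the revision
  return repository, revision, 'FP16' in name

# Two entries copied from MODEL_MAP
example_map = {
  'Stable Diffusion v1.4': ['CompVis/stable-diffusion-v1-4', 'main'],
  'Stable Diffusion v1.4 | FP16': ['CompVis/stable-diffusion-v1-4', 'fp16'],
}

print(resolve_model('Stable Diffusion v1.4 | FP16', example_map))
```

Inferring precision from the name matters for the Waifu Diffusion entries, whose revisions are commit hashes rather than an `fp16` branch name.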

In [None]:
#@title Concepts { vertical-output: true }

#@markdown Here you can specify [textual inversion](https://huggingface.co/docs/diffusers/training/text_inversion)
#@markdown concepts available on Hugging Face to be loaded. Please note that if you do so, the
#@markdown _Hugging Face_ step under _Permissions_ will be required.

from IPython.display import Markdown

#@markdown If you wish to load more than one concept, you may either run this step once providing a
#@markdown comma-separated list of repository IDs to `concepts_repo_id`, or run it more than once
#@markdown providing one or more IDs at each time to build up a list.
concepts_repo_id = '' #@param { "type": "string" }

#@markdown To clear the existing list of concepts to be loaded, run with the
#@markdown `concepts_reset_repo_id_list` option enabled.
concepts_reset_repo_id_list = False #@param { "type": "boolean" }

#@markdown **Hint:** to find concepts, try the
#@markdown [Stable Diffusion concepts library](https://huggingface.co/sd-concepts-library)
#@markdown community.

if 'concepts_repo_id_list' not in locals() or concepts_reset_repo_id_list:
  concepts_repo_id_list = []

if concepts_repo_id != '':
  concepts_repo_id_list_create = [item.strip() for item in concepts_repo_id.split(',')]
  concepts_repo_id_list.extend(concepts_repo_id_list_create)

if len(concepts_repo_id_list):
  display(Markdown('**The concepts from the repositories with the following IDs will be loaded**'))
  for concept_repo_id in concepts_repo_id_list:
    print(f'- {concept_repo_id}')
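
The way the concept list builds up across runs can be sketched as a pure function. This is an illustration under assumptions rather than the notebook's own code: `add_concepts` is a hypothetical helper, the repository IDs are made up, and unlike the step above it also drops empty entries (for example, those produced by a trailing comma).

```python
def add_concepts(existing, raw_ids, reset=False):
  """Return the concept list after one run of the Concepts step."""
  result = [] if reset else list(existing)
  # Split on commas, trim whitespace and skip empty entries
  result.extend(item.strip() for item in raw_ids.split(',') if item.strip())
  return result

# Hypothetical repository IDs, used only for illustration
first_run = add_concepts([], 'sd-concepts-library/example-a, sd-concepts-library/example-b')
second_run = add_concepts(first_run, 'sd-concepts-library/example-c')
print(second_run)

# Running with the reset option and an empty field clears the list
print(add_concepts(second_run, '', reset=True))
```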

In [None]:
#@title Safety Checker

#@markdown By default, if the generation model creates content which is determined to be unsafe
#@markdown (e.g. sexually explicit), it will not be displayed or saved, being replaced with a black
#@markdown image instead. Use this parameter to toggle this feature.

enable_safety_checker = True #@param { "type": "boolean" }

In [None]:
#@title Storage

#@markdown The value of `storage_mount_path` is set to the mount point of Google Drive on the
#@markdown Google Colab environment. If you are running on a different environment, you should
#@markdown change this parameter to specify a suitable location for data storage (e.g. the root
#@markdown of a persistent volume mounted to a virtual machine).
storage_mount_path = '/content/drive/MyDrive' #@param { "type": "string" }

#@markdown The `storage_data_path` parameter specifies a directory to store all data related to
#@markdown Figment Fusion, including the model cache and generated images. It is relative to
#@markdown `storage_mount_path`, meaning those two values are concatenated to obtain the full path.
storage_data_path = '/Colab Data/Figment Fusion' #@param { "type": "string" }

storage_output_path = '/output'
storage_cache_path = '/cache'

storage_data_full_path = storage_mount_path + storage_data_path
storage_output_full_path = storage_data_full_path + storage_output_path
storage_cache_full_path = storage_data_full_path + storage_cache_path
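
As a quick check of the concatenation described above, the following sketch prints the paths that result from the default values. `build_storage_paths` is a hypothetical helper written for this example; it simply mirrors the string concatenation used in the step.

```python
def build_storage_paths(mount_path, data_path):
  """Concatenate the mount point and data directory into the full paths."""
  data_full_path = mount_path + data_path
  return {
    'data': data_full_path,
    'output': data_full_path + '/output',
    'cache': data_full_path + '/cache',
  }

paths = build_storage_paths('/content/drive/MyDrive', '/Colab Data/Figment Fusion')
for name, path in paths.items():
  print(f'{name}: {path}')
```

Because the values are joined as plain strings, `storage_data_path` needs its leading slash and `storage_mount_path` should not end with one.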

## 👮 Permissions

In [None]:
#@title Google Drive { vertical-output: true }

#@markdown If you are running on Google Colab, the notebook requires permission to access your
#@markdown Google Drive. For other environments, you should set `storage_mount_path` accordingly and
#@markdown ignore this step.

try:
  from pathlib import Path
  from google.colab import drive
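  # Drop the trailing 'MyDrive' component (str.rstrip removes characters, not a suffix)
  drive_mount_path = storage_mount_path
  if drive_mount_path.endswith('MyDrive'):
    drive_mount_path = drive_mount_path[:-len('MyDrive')]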
  Path(drive_mount_path).mkdir(parents=True, exist_ok=True)
  drive.mount(drive_mount_path, force_remount=True)
except ImportError:
  print('Not running on Google Colab')

In [None]:
#@title Hugging Face { vertical-output: true }

#@markdown This step requires an access token to your Hugging Face account, which will be used to
#@markdown retrieve the selected generation model. You can safely ignore it if you have already
#@markdown downloaded the model before and wish for it to be loaded from the cache.

!pip install --quiet huggingface_hub
!git config --global credential.helper store

try:
  from google.colab import output
  output.enable_custom_widget_manager()
except ImportError:
  pass

from huggingface_hub import notebook_login
notebook_login()

## 📦 Requirements

The cells under the _Requirements_ section should not require your interaction (unless something goes wrong). You can simply run all of these steps and, if no errors occur, you will be ready to continue.

In [None]:
#@title Initialization { vertical-output: true }
#@markdown Installs packages, imports modules and defines constants and variables required by the
#@markdown following steps.

# Packages
!pip install --quiet diffusers==0.4.1 transformers scipy ftfy 'ipywidgets>=7,<8'

# Imports
import json
import pprint
import time
import torch
from diffusers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler, StableDiffusionPipeline
from huggingface_hub import hf_hub_download, notebook_login, whoami
from pathlib import Path
from PIL import Image
from transformers import CLIPTextModel, CLIPTokenizer

# Constants
STORAGE_META_PATH = '/meta'
FF_DISPLAY_NAME = 'Figment Fusion'
FF_VERSION = 'v1.0.0-beta.2'

# Create paths
Path(storage_output_full_path).mkdir(parents=True, exist_ok=True)
Path(storage_cache_full_path).mkdir(parents=True, exist_ok=True)

# Variables
if model_fp16:
  torch_dtype = torch.float16
else:
  torch_dtype = torch.float32

try:
  whoami()
  local_files_only = False
except Exception:
  # No usable token: fall back to files already in the local cache
  local_files_only = True

# Output
display(
  Markdown('- **Model precision:** {}'.format('fp16' if torch_dtype == torch.float16 else 'fp32')),
  Markdown('- **Hugging Face token:** {}'.format('not present' if local_files_only else 'present'))
)

In [None]:
#@title Concepts { vertical-output: true }

#@markdown Downloads the concept files from the repositories specified in the _Concepts_ step under
#@markdown _Settings_.

concepts_required_files = ['learned_embeds.bin', 'token_identifier.txt']
concepts_meta = []

for repo_id in concepts_repo_id_list:
  meta_files = {}
  for filename in concepts_required_files:
    meta_files[filename] = hf_hub_download(
      repo_id=repo_id,
      filename=filename,
      cache_dir=storage_cache_full_path,
      local_files_only=local_files_only,
      # Use the stored Hugging Face token when one is available
      use_auth_token=not local_files_only
    )
  concepts_meta.append({
    'name': repo_id,
    'files': meta_files
  })

In [None]:
#@title Helpers
#@markdown Defines auxiliary functions and classes required by the following steps.

def load_concept(tokenizer, text_encoder, path):
  concept = torch.load(path, map_location='cpu')

  trained_token = list(concept.keys())[0]
  embeds = concept[trained_token]

  dtype = text_encoder.get_input_embeddings().weight.dtype
  # Tensor.to returns a new tensor, so the result must be assigned
  embeds = embeds.to(dtype)

  added_tokens = tokenizer.add_tokens(trained_token)
  if added_tokens == 0:
    raise ValueError(f'The tokenizer already contains the token {trained_token}')

  text_encoder.resize_token_embeddings(len(tokenizer))

  token_id = tokenizer.convert_tokens_to_ids(trained_token)
  text_encoder.get_input_embeddings().weight.data[token_id] = embeds

  return trained_token

def get_output_meta(batch_meta):
  return {
    'app': FF_DISPLAY_NAME,
    'version': FF_VERSION,
    'model': use_model,
    'batch': batch_meta
  }

def save_images(base_path, images, current_batch, meta):
  # STORAGE_META_PATH already starts with a slash
  meta_path = '{}{}/{}.json'.format(base_path, STORAGE_META_PATH, current_batch)
  meta_detailed = get_output_meta(meta)
  meta_serialized = json.dumps(meta_detailed, indent=2)
  with open(meta_path, 'w') as f:
    f.write(meta_serialized)  
  for i, image in enumerate(images):
    image_path = '{}/{}-{}.png'.format(base_path, current_batch, i + 1)
    image.save(image_path)

def display_images_as_grid(images):
  columns = len(images)
  width, height = images[0].size
  grid = Image.new('RGB', size=(columns * width, height))
  grid_width, grid_height = grid.size
  for i, image in enumerate(images):
    grid.paste(image, box=(i % columns * width, i // columns * height))
  display(grid)

def display_images(images, as_grid = False):
  if as_grid:
    display_images_as_grid(images)
    return
  for image in images:
    display(image)

def display_meta(meta):
  pprint.pprint(meta)

class Scheduler():
  @property
  def scheduler(self):
    return self.__scheduler

  def __init__(self, name, *args, **kwargs):
    self.name = name
    self.beta_start = kwargs.get('beta_start', 0.00085)
    self.beta_end = kwargs.get('beta_end', 0.012)
    self.beta_schedule = kwargs.get('beta_schedule', 'scaled_linear')
    self.ddim_clip_sample = kwargs.get('ddim_clip_sample', False)
    self.ddim_set_alpha_to_one = kwargs.get('ddim_set_alpha_to_one', False)
    self.ddim_eta = kwargs.get('ddim_eta', 0)

    scheduler_class = PNDMScheduler

    if name == 'DDIM':
      self.__scheduler = DDIMScheduler(
        beta_start=self.beta_start,
        beta_end=self.beta_end,
        beta_schedule=self.beta_schedule,
        clip_sample=self.ddim_clip_sample,
        set_alpha_to_one=self.ddim_set_alpha_to_one
      )
      return

    if name == 'LMS':
      scheduler_class = LMSDiscreteScheduler

    self.__scheduler = scheduler_class(
      beta_start=self.beta_start,
      beta_end=self.beta_end,
      beta_schedule=self.beta_schedule
    )

class Generator():
  def __init__(self, pipe, *args, **kwargs):
    self.pipe = pipe
    self.width = kwargs.get('width', 512)
    self.height = kwargs.get('height', 512)
    self.num_inference_steps = kwargs.get('num_inference_steps', 50)
    self.guidance_scale = kwargs.get('guidance_scale', 7.5)
    self.manual_seed = kwargs.get('manual_seed', None)
    self.__generator = torch.Generator('cuda')
    self.__scheduler = kwargs.get('scheduler', None)
    if self.__scheduler is None:
      self.__scheduler = Scheduler('PNDM')
  
  @property
  def scheduler(self):
    return self.__scheduler

  # Without the setter decorator, this definition would replace the property above
  @scheduler.setter
  def scheduler(self, value):
    self.pipe.register_modules(scheduler=value)
    self.__scheduler = value

  def generate_latents(self, batch_size, offset):
    size_offset = batch_size + offset
    latents_shape = (size_offset, self.pipe.unet.in_channels, self.height // 8, self.width // 8)
    latents = torch.randn(
      latents_shape,
      generator=self.__generator,
      dtype=self.pipe.text_encoder.get_input_embeddings().weight.dtype,
      device='cuda'
    )
    return latents[offset:, ...]
  
  def run(self, text_prompt, negative_text_prompt, images_per_batch = 1, offset = 0):
    pipe = self.pipe
    generator = self.__generator
    generator_seed = self.manual_seed
    width = self.width
    height = self.height
    num_inference_steps = self.num_inference_steps
    guidance_scale = self.guidance_scale
    scheduler = self.__scheduler
    ddim_eta = scheduler.ddim_eta

    if generator_seed is None:
      generator_seed = generator.seed()
    else:
      generator.manual_seed(generator_seed)
    
    latents = self.generate_latents(images_per_batch, offset)

    # Treat an empty negative prompt as no negative prompt
    if negative_text_prompt == '':
      negative_text_prompt = None

    meta = {
      'prompt': text_prompt,
      'negative_prompt': negative_text_prompt,
      'width': width,
      'height': height,
      'num_inference_steps': num_inference_steps,
      'guidance_scale': guidance_scale,
      'scheduler': scheduler.name,
      'seed': generator_seed,
      'num_images': images_per_batch
    }

    images = pipe(
      text_prompt,
      negative_prompt=negative_text_prompt,
      num_images_per_prompt=images_per_batch,
      width=width,
      height=height,
      num_inference_steps=num_inference_steps,
      guidance_scale=guidance_scale,
      latents=latents,
      eta=ddim_eta
    ).images

    return (images, meta)
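
The `offset` argument of `generate_latents` is easier to follow with a small analogy. The helper draws `batch_size + offset` noise samples from a seeded generator and discards the first `offset` of them, so, assuming this reading is right, a run with the same seed and an offset of 2 should reuse the noise of the third image onward from a larger batch. The sketch below illustrates the idea with Python's `random` module standing in for `torch.randn`; `draw_noise` is a hypothetical name and each float stands in for one latent tensor.

```python
import random

def draw_noise(seed, batch_size, offset=0):
  """Draw offset + batch_size values from a seeded generator and drop the first offset."""
  rng = random.Random(seed)
  values = [rng.gauss(0.0, 1.0) for _ in range(batch_size + offset)]
  return values[offset:]

full_batch = draw_noise(seed=1234, batch_size=4)
last_two = draw_noise(seed=1234, batch_size=2, offset=2)

# The offset run reproduces the tail of the larger batch
print(last_two == full_batch[2:])  # → True
```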

In [None]:
#@title Pipeline { vertical-output: true }

#@markdown Creates the Stable Diffusion pipeline. If the generation model is not in the cache,
#@markdown downloading will take place in this step as well.

tokenizer = CLIPTokenizer.from_pretrained(
  model_repository,
  revision=model_revision,
  subfolder='tokenizer',
  cache_dir=storage_cache_full_path,
  local_files_only=local_files_only
) 

text_encoder = CLIPTextModel.from_pretrained(
  model_repository,
  revision=model_revision,
  subfolder='text_encoder',
  torch_dtype=torch_dtype,
  cache_dir=storage_cache_full_path,
  local_files_only=local_files_only
)

for meta in concepts_meta:
  concepts_name = meta['name']
  concepts_token = load_concept(
    tokenizer,
    text_encoder,
    meta['files']['learned_embeds.bin']
  )
  print(f'Loaded {concepts_name} as {concepts_token}')

pipe = StableDiffusionPipeline.from_pretrained(
  model_repository,
  revision=model_revision,
  torch_dtype=torch_dtype,
  text_encoder=text_encoder,
  tokenizer=tokenizer,
  cache_dir=storage_cache_full_path,
  local_files_only=local_files_only
).to('cuda')

if not enable_safety_checker:
  def safety_checker(images, **kwargs):
    return images, False
  pipe.safety_checker = safety_checker

## 🎨 Usage

In [None]:
#@markdown ### General

text_prompt = '' #@param { "type": "string" }
negative_text_prompt = '' #@param { "type": "string" }
width = 512 #@param {type:"slider", min:128, max:1024, step:128}
height = 512 #@param {type:"slider", min:128, max:1024, step:128}
num_inference_steps = 50 #@param {type:"slider", min:1, max:100, step:1}
guidance_scale = 7.5 #@param {type:"slider", min:0, max:30, step:0.5}

#@markdown ### Seed

#@markdown **Note:** the `manual_seed` parameter is ignored if `use_random_seed` is enabled.
use_random_seed = True #@param { "type": "boolean" }
manual_seed = 0 #@param {type:"integer"}

#@markdown ### Scheduler

use_scheduler = 'PNDM' #@param ["PNDM", "DDIM", "LMS"]

#@markdown ### Batch

num_batches = 1 #@param { "type": "number" }
images_per_batch = 1 #@param { type: "slider", min: 1, max: 4, step: 1 }
offset_batch = 0 #@param { type: "slider", min: 0, max: 4, step: 1 }

#@markdown ### Mutation

mutate_seed = True #@param { type: "boolean" }
mutate_guidance_scale = False #@param { type: "boolean" }
mutate_guidance_scale_min = 7.5 #@param {type:"slider", min:0, max:30, step:0.5}
mutate_guidance_scale_max = 21.5 #@param {type:"slider", min:0, max:30, step:0.5}
mutate_guidance_scale_interval = 1 #@param {type:"slider", min:0, max:5, step:0.5}

#@markdown ### Output

display_output = True #@param { "type": "boolean" }
display_batch_as_grid = True #@param { "type": "boolean" }
display_output_meta = True #@param { "type": "boolean" }
save_output = True #@param { "type": "boolean" }
save_output_path = '/run_{timestamp}' #@param { "type": "string" }

timestamp_start = time.time()
print('Starting up ...')

if save_output:
  timestamp_output = str(int(timestamp_start))
  storage_batch_path = save_output_path.replace('{timestamp}', timestamp_output)
  storage_batch_full_path = storage_output_full_path + storage_batch_path
  storage_meta_full_path = storage_batch_full_path + STORAGE_META_PATH
  Path(storage_meta_full_path).mkdir(parents=True, exist_ok=False)

if use_random_seed:
  manual_seed = None

scheduler = Scheduler(use_scheduler)

generator = Generator(
  pipe,
  width=width,
  height=height,
  num_inference_steps=num_inference_steps,
  guidance_scale=guidance_scale,
  manual_seed=manual_seed,
  scheduler=scheduler
)

for i in range(num_batches):
  if i > 0:
    if not mutate_seed:
      generator.manual_seed = meta['seed']
    if mutate_guidance_scale:
      guidance_scale = meta['guidance_scale'] + mutate_guidance_scale_interval
      if guidance_scale > mutate_guidance_scale_max:
        guidance_scale = mutate_guidance_scale_min
      generator.guidance_scale = guidance_scale
  
  current_batch = i + 1
  print('Batch {} of {}'.format(current_batch, num_batches))

  images, meta = generator.run(text_prompt, negative_text_prompt, images_per_batch, offset_batch)
  
  if display_output:
    display_images(images, display_batch_as_grid)
  if display_output_meta:
    display_meta(meta)
  if save_output:
    save_images(storage_batch_full_path, images, current_batch, meta)

timestamp_end = time.time()
execution_time = timestamp_end - timestamp_start

print('Finished ({} seconds)'.format(execution_time))
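
The guidance scale mutation in the loop above can be previewed without generating any images. The following sketch mirrors that logic as a pure function; `guidance_schedule` is a hypothetical helper written for this illustration, and the sample values were chosen only to keep the output short.

```python
def guidance_schedule(start, minimum, maximum, interval, num_batches):
  """List the guidance scale used for each batch when mutation is enabled."""
  scales = []
  current = start
  for batch in range(num_batches):
    if batch > 0:
      # Step up by the interval and wrap around to the minimum past the maximum
      current += interval
      if current > maximum:
        current = minimum
    scales.append(current)
  return scales

print(guidance_schedule(7.5, 7.5, 9.5, 1, 5))  # → [7.5, 8.5, 9.5, 7.5, 8.5]
```

Note that the first batch always uses the `guidance_scale` value from the _General_ section, even if it lies outside the mutation range.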