# Huggingface Transformers & Stable Diffusion
> 郑问迪
> zhengwd23@mails.tsinghua.edu.cn

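- Huggingface Transformers: https://huggingface.co/

    Huggingface🤗 (HF) is a platform that hosts open-source models and datasets for data science and machine learning.
    
    With the `transformers` library provided by HF, you can load open-source models for inference, training, and other custom tasks in just a few lines of code.
- Stable Diffusion: https://github.com/CompVis/stable-diffusion

    A representative open-source work among text-to-image diffusion models.
    
    Stable Diffusion (SD) is currently the mainstream base model in the text-to-image generation community.
    
    Compared with the official repository, [webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui), an all-in-one repository built on top of the model by community developers, seems to be more popular.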

## Huggingface Transformers

> For more, see the official HF tutorial: https://huggingface.co/docs/transformers/

In [None]:
# Works with either torch or tensorflow as the backend
# This document uses torch for its demos
# Install
%pip install transformers datasets
%pip install accelerate

`transformers.pipeline`: the API for running inference with a model

Usage for some common tasks:

| Task | Usage |
| :-- | :-- |
| Text classification       | `pipeline(task="sentiment-analysis")`   |
| Text generation           | `pipeline(task="text-generation")`      |
| Image classification      | `pipeline(task="image-classification")` |
| Object detection          | `pipeline(task="object-detection")`     |
| Visual question answering | `pipeline(task="vqa")`                  |
| Image captioning          | `pipeline(task="image-to-text")`        |


In [None]:
# Sentiment classification as an example
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
results = classifier([
    "We are very happy to introduce python and transformers to you.",
    "So sad you didn't attend the class."
])
for result in results:
    print(result)
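Roughly, for a single-label classifier like this one, the `sentiment-analysis` pipeline tokenizes the text, runs the model to get one logit per class, applies softmax, and reports the most probable class with its probability. A minimal pure-Python sketch of that last step, assuming made-up logits and a hypothetical `id2label` mapping (the real mapping comes from `model.config.id2label`):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    # Pick the most probable class and report it in the same shape the pipeline prints
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits and label mapping, for illustration only
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-1.2, 2.3], id2label))
```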

In [None]:
# Use explicitly specified modules (e.g. model and tokenizer) in the pipeline instead of the defaults
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
results = classifier([
    "We are very happy to introduce python and transformers to you.",
    "So sad you didn't attend the class."
])
for result in results:
    print(result)

`transformers.Trainer`: the API for training models

In [None]:
# Again, we use training a sentiment classification model as the example

# Load a pretrained model
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

In [None]:
# Load the dataset
# Load the matching tokenizer and define a preprocessing function
from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = load_dataset("rotten_tomatoes")

def tokenize_dataset(examples):
    # With batched=True, examples["text"] is a list of strings
    return tokenizer(examples["text"])
dataset = dataset.map(tokenize_dataset, batched=True)  # apply the tokenize function to the whole dataset

# Collate examples into training batches, padding each batch to its longest sequence
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
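`DataCollatorWithPadding` pads each batch only to the length of its longest sequence (dynamic padding) rather than to one fixed global length, and pads the `attention_mask` to match. A pure-Python sketch of the idea, assuming a pad id of 0 and right-side padding (the real collator takes the pad token and padding side from the tokenizer); the token ids below are made up:

```python
def pad_batch(batch_ids, pad_id=0):
    # Pad only up to the longest sequence in *this* batch (dynamic padding)
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # right-side padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token ids for a batch of two sentences of different lengths
batch = pad_batch([[101, 2057, 2024, 102], [101, 6517, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because padding is decided per batch, batches of short sentences stay short, which saves compute compared with padding everything to the model's maximum length.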

In [None]:
# Configure the training arguments
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="path/to/save/folder/",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)
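With these arguments, the number of optimizer steps follows from the train-set size, the batch size, and the number of epochs. A rough sketch, assuming a single device, no gradient accumulation, and that the last partial batch is kept; 8530 is assumed here to be the size of the `rotten_tomatoes` train split, so check `len(dataset["train"])` in your environment:

```python
import math

def total_train_steps(num_examples, batch_size, num_epochs):
    # Single device, no gradient accumulation: one optimizer step per batch.
    # The last, smaller batch still counts as a step, hence ceil.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs

# 8530 is an assumed train-split size; verify it locally
print(total_train_steps(8530, batch_size=8, num_epochs=2))
```

With the default linear scheduler, the learning rate decays from `learning_rate` to 0 over these steps.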

In [None]:
# Initialize the trainer
from transformers import Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # newer transformers releases rename this argument to processing_class
    data_collator=data_collator,
)

# Train with the trainer
trainer.train()
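As configured above, `trainer.evaluate()` reports the evaluation loss but no task metric. To also get accuracy, you can pass a `compute_metrics` function via `Trainer(..., compute_metrics=compute_metrics)`; it receives the evaluation predictions (logits) and labels. A minimal stdlib sketch, assuming single-label classification; the `accuracy` key is our own choice:

```python
def compute_metrics(eval_pred):
    # eval_pred unpacks into (logits, labels); each logits row has one score per class
    logits, labels = eval_pred
    # argmax per row, written so it works on plain lists as well as NumPy rows
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}

# Toy check with made-up logits for three examples
print(compute_metrics(([[0.1, 2.0], [1.5, -0.3], [0.2, 0.1]], [1, 0, 1])))
```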

## Stable Diffusion

> We mainly use diffusers, Hugging Face's library for diffusion models.
> https://huggingface.co/docs/diffusers/index
> WebUI offers a more convenient LoRA interface and a richer LoRA community.

In [None]:
# Install dependencies
%pip install diffusers
%pip install safetensors

> Text-to-image generation, image-to-image generation, inpainting, etc.

In [None]:
# Load the model
import os
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

# Text-to-image generation: generate an image from a text prompt
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
os.makedirs("generations", exist_ok=True)  # make sure the output folder exists
image.save("generations/t2i.png")
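Inside `pipe(prompt=...)`, Stable Diffusion pipelines use classifier-free guidance: at each denoising step the UNet predicts the noise both with and without the prompt, and the two predictions are blended with `guidance_scale`. A pure-Python sketch of just that blending formula, using short made-up lists in place of the real latent-shaped tensors:

```python
def cfg_combine(noise_uncond, noise_text, guidance_scale):
    # Classifier-free guidance: start from the unconditional prediction and move
    # further in the direction suggested by the prompt-conditioned prediction
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Toy "noise predictions" (real ones are latent-shaped tensors)
uncond = [0.0, 1.0, -0.5]
text = [0.2, 0.8, -0.5]
print(cfg_combine(uncond, text, guidance_scale=7.5))
```

A larger `guidance_scale` follows the prompt more closely at the cost of diversity; a scale of 1 reduces to the prompt-conditioned prediction alone.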

In [None]:
# Inpainting: fill in the masked region of an image
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

# The text-to-image pipeline above does not support inpainting, so load a dedicated inpainting pipeline
pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

prompt = "A majestic tiger sitting on a bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.80).images[0]
image.save("generations/inpaint.png")
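In the inpainting and image-to-image pipelines, `strength` decides how much of the noise schedule is actually run: the input image is noised part of the way and only the remaining steps are denoised (in the mask, white pixels are repainted and black pixels are kept). A simplified sketch of that arithmetic, modeled on how the diffusers Stable Diffusion pipelines compute it and assuming `denoising_start`/`denoising_end` are not used:

```python
def effective_denoising_steps(num_inference_steps, strength):
    # strength in [0, 1]: 1.0 starts from (almost) pure noise, small values stay close to the input
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the first (num_inference_steps - init_timestep) steps of the schedule
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

print(effective_denoising_steps(50, 0.80))
```

So with `num_inference_steps=50, strength=0.80` as above, only about 40 denoising steps actually run.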

In [None]:
# Image-to-image generation: generate a new image from an input image
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

# Use a model suited to this task (the SDXL refiner)
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe = pipe.to("cuda")
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"

init_image = load_image(url).convert("RGB")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, image=init_image).images[0]
image.save("generations/i2i.png")

In [None]:
# Add a LoRA for stylized generation
# https://huggingface.co/spaces/multimodalart/LoraTheExplorer
# https://civitai.com
from diffusers import StableDiffusionXLPipeline
import torch

# Load the base model
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

In [None]:
# LoRA: Pixel Art XL
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors")
prompt = "pixel art..."
lora_scale = 0.9
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5, cross_attention_kwargs={"scale": lora_scale}).images[0]
image.save("image_lora.png")
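A LoRA adds a low-rank update to selected weight matrices: instead of retraining a weight W of shape d_out × d_in, it learns B (d_out × r) and A (r × d_in) with a small rank r, and the effective weight becomes W + scale · (B A). The `"scale"` passed through `cross_attention_kwargs` acts as that multiplier (0 turns the LoRA off, 1 applies it fully). A pure-Python sketch with tiny made-up matrices; real LoRA checkpoints may also fold an alpha/r factor into the scale, which this sketch omits:

```python
def matmul(X, Y):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply_lora(W, B, A, scale):
    # Effective weight = W + scale * (B @ A); the base weight W itself is not modified
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Made-up 2x2 base weight with a rank-1 update (B: 2x1, A: 1x2)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]
print(apply_lora(W, B, A, scale=0.9))
```

Only B and A are stored in the LoRA file, which is why LoRA checkpoints are much smaller than the base model.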

## Gradio

> A library for quickly building simple demo interfaces.
> Official docs: https://www.gradio.app/guides

In [None]:
# Install
%pip install gradio

In [None]:
# Hello world
# When saved as a Python script (e.g. app.py), run it with `gradio app.py`.
import gradio as gr

def greet(name):
    return f"Hello world, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()

In [1]:
# Multiple inputs of different types
import gradio as gr

def greet(name, is_morning, temperature):
    salutation = "Good morning" if is_morning else "Good evening"
    greeting = f"{salutation} {name}. It is {temperature} degrees today"
    celsius = (temperature - 32) * 5 / 9
    return greeting, round(celsius, 2)

demo = gr.Interface(
    fn=greet,
    inputs=["text", "checkbox", gr.Slider(0, 100)],
    outputs=["text", "number"],
)
demo.launch()

Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.




In [None]:
# Example with image input and output
import numpy as np
import gradio as gr

def sepia(input_img):
    sepia_filter = np.array([
        [0.393, 0.769, 0.189], 
        [0.349, 0.686, 0.168], 
        [0.272, 0.534, 0.131]
    ])
    sepia_img = input_img.dot(sepia_filter.T)
    sepia_img /= sepia_img.max()
    return sepia_img

demo = gr.Interface(sepia, gr.Image(), "image")  # gr.Image's shape argument was removed in Gradio 4
demo.launch()
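`input_img.dot(sepia_filter.T)` applies the 3×3 sepia matrix to every RGB pixel: each output channel is a weighted sum of the input R, G, B values using one row of the matrix, and the result is then divided by its maximum so values land in [0, 1]. A pure-Python sketch on a short list of pixels, using the same matrix:

```python
SEPIA = [
    [0.393, 0.769, 0.189],  # weights producing the output R channel
    [0.349, 0.686, 0.168],  # weights producing the output G channel
    [0.272, 0.534, 0.131],  # weights producing the output B channel
]

def sepia_pixels(pixels):
    # Same as pixels @ SEPIA.T: output channel c = dot(SEPIA[c], (r, g, b))
    out = [[sum(w * v for w, v in zip(row, px)) for row in SEPIA] for px in pixels]
    # Normalise by the global maximum, as the Gradio example does
    # (an all-black input would make peak 0 here)
    peak = max(max(px) for px in out)
    return [[v / peak for v in px] for px in out]

print(sepia_pixels([(255, 255, 255), (0, 0, 0)]))
```

Because every row of the matrix puts most weight on green and red, a white pixel comes out warm (R > G > B), which is what gives the sepia tint.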

In [None]:
# gradio.Blocks(): build more complex Gradio layouts from modular components
import numpy as np
import gradio as gr

def flip_text(x):
    return x[::-1]

def flip_image(x):
    return np.fliplr(x)

with gr.Blocks() as demo:
    gr.Markdown("Flip text or image files using this demo.")
    with gr.Tab("Flip Text"):
        text_input = gr.Textbox()
        text_output = gr.Textbox()
        text_button = gr.Button("Flip")
    with gr.Tab("Flip Image"):
        with gr.Row():
            image_input = gr.Image()
            image_output = gr.Image()
        image_button = gr.Button("Flip")

    text_button.click(flip_text, inputs=text_input, outputs=text_output)
    image_button.click(flip_image, inputs=image_input, outputs=image_output)
    
demo.launch()
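The handlers wired up with `.click` are ordinary Python functions, so they can be checked without launching the UI. `np.fliplr` reverses the column order (axis 1) of an H × W (× C) array, which mirrors the image left to right. An equivalent sketch on nested lists; `flip_image_lists` is a hypothetical helper for illustration only:

```python
def flip_image_lists(img):
    # img is a list of H rows, each a list of W pixels; reversing each row
    # mirrors the image left to right, like np.fliplr on an H x W (x C) array
    return [row[::-1] for row in img]

# A made-up 1 x 2 image: a red pixel followed by a blue pixel
print(flip_image_lists([[(255, 0, 0), (0, 0, 255)]]))
```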

## The End

Have fun with these Stable Diffusion Gradio demos!
- https://huggingface.co/spaces/Manjushri/SDXL-1.0
- https://stablediffusion.fr/sdxl
- https://huggingface.co/spaces/multimodalart/LoraTheExplorer
- ...

Example prompts:
- breathtaking night street of Tokyo, neon lights. award-winning, professional, highly detailed
- anime artwork an empty classroom. anime style, key visual, vibrant, studio anime, highly detailed
- concept art of dragon flying over town, clouds. digital artwork, illustrative, painterly, matte painting, highly detailed, cinematic composition
- ...