GANs, Diffusion Models, Generative Tasks (txt2img, img2img, inpainting) #28

sarthak247 · 2023-10-10T13:03:54Z

Greetings everyone,
Inspired by #19, my collaborators and I have also outlined a course curriculum for our section, but we would like some input and feedback from the HF team before we finalise it and start working on it. This is our chosen structure so far.

INTRODUCTION

What are generative vision models and how do they differ from other models?
Different types of generative models/tasks?
Prerequisites and resources to help

GANs & VAEs

VAE theory (Theory)
Idea behind GANs, generator and discriminator (Theory & code)
- DCGAN as the main implementation
Simple explanation, showcase and external resources:
- StyleGAN
- CycleGAN
- VQGAN
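
As a rough illustration of the generator/discriminator idea in the outline above, the two competing objectives can be sketched without any deep learning framework. This is only a sketch under assumptions: the function names are hypothetical, the discriminator outputs are plain probabilities rather than network outputs, and the generator uses the commonly taught non-saturating loss rather than whatever the final DCGAN notebook ends up using.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for the discriminator.

    d_real: probability D assigns to a real image being real.
    d_fake: probability D assigns to a generated image being real.
    D is rewarded for pushing d_real towards 1 and d_fake towards 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss (an assumption, see lead-in).

    G is rewarded when D is fooled, i.e. when d_fake moves towards 1.
    """
    return -math.log(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 for both.
d_loss = discriminator_loss(d_real=0.5, d_fake=0.5)
g_loss = generator_loss(d_fake=0.5)
print(f"D loss: {d_loss:.4f}, G loss: {g_loss:.4f}")
```

The two losses pull in opposite directions on `d_fake`, which is the adversarial part: lowering one raises the other.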

Diffusion models

Theory of diffusion models and how they differ from GANs (limitations of GANs)
Evolution of diffusion models and what made them work: DDPM, latent diffusion
Using Stable Diffusion
- Basic structure of SD
- How to use txt2img, img2img, inpainting
Simple explanation, showcase and external resources
- Dreambooth
- LoRA, show how to use, link to fine-tuning yourself
- ControlNet, show how to use
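
For the DDPM theory item above, the forward (noising) process is a natural candidate for a small worked example. The sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which are the values usually quoted for DDPM, and samples x_t directly from x_0 with the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. All function names are hypothetical and a flat list of floats stands in for an image.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1 .. beta_T."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in one step using the closed form."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" with four pixels
x_early = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
```

Because alpha_bar_t shrinks towards zero as t grows, the signal term fades and the noise term dominates, which is the property the reverse (denoising) model is later trained to undo.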

PRACTICAL APPLICATIONS & CHALLENGES

Real-time Constraints and Privacy Concerns
Bias concerns

CC: @hwaseem04 , @mattmdjaga, @charchit7
BCC: @johko , @lunarflu , @merveenoyan

We would like to know whether this course structure is suitable or whether it needs changes. In particular, we would like to know:

Do we need to create separate files for each topic or squish it all together into one big Jupyter notebook? (essentially structural help)
How much code is too much? For GANs, for example, we plan to stick to DCGAN for code from scratch and, for the rest, just refer to the pre-trained models and showcase how to use them directly.
Is the chosen curriculum sufficient or do we need to add more to it? Also, if it's too much, then what can be truncated or removed from it considering that we also have an existing diffusers course out there.

Thanks,
Sarthak (and @hwaseem04 , @mattmdjaga, @charchit7)

lunarflu · 2023-10-10T13:25:33Z

Great job! 🤗 A few comments:

I like the emphasis on theory always being first, always ensuring we know WHY a certain thing is important to learn. And then building on past ideas and emphasizing what has evolved - very nice!
As for "amount of code", I don't think we have a hard ceiling. Use what you think is helpful, exercise common sense, and it should be fine 😌💪

jere357 · 2023-10-10T20:59:49Z

Hello, I would like to help on this section; my Discord username is cropinky. My introduction to adversarial learning was ESRGAN, and I think it would be a cool part of this section.

johko · 2023-10-11T20:55:34Z

hey @sarthak247

thanks for coming up with this curriculum. I think it covers everything we need for this course. And I don't have much to add.

With our new folder structure, where .mdx and notebook files live in separate repos, you also have a bit more flexibility in dividing the coding-heavy parts from the theoretical ones. Definitely feel free to not squish it all into one file ;)

charchit7 · 2023-10-12T09:57:38Z

Hey @merveenoyan, @sayakpaul, we would love your input on this one :)

arkajyotimitra · 2023-10-14T04:44:34Z

Great outline covering GANs and diffusion models.
Since diffusion models have become such a vast topic in themselves, and there is already a diffusers course that explores their length and breadth, the simple introduction to the concept is apt.
One thing that might be helpful to look at is the connection between diffusion models and physics (the field they originated from), along with some of the differences between score-based and energy-based diffusion models. To that end, these resources may come in handy:

a video lecture by Jascha Sohl-Dickstein
a video explanation of the diffusion models evolution by Yang Song
an insightful blog on different perspectives and their associations around diffusion models by Sander Dieleman

This might get too deep, so I will leave it to your discretion whether to use them or just have fun reading/listening to these sources 🤗. I just wanted to share them because I enjoyed them and they gave me more insight into diffusion models as a whole.


alperenunlu added the Chapter Content label (Discuss and track the content of a chapter) Oct 10, 2023

pedrogengo mentioned this issue Oct 10, 2023

Multimodal Models - CLIP and relatives #29

Closed

snehilsanyal mentioned this issue Nov 1, 2023

Unit 4, Chapter 1 Fusion of Text and Vision: Draft Outline #54

Closed

sarthak247 mentioned this issue Dec 17, 2023

Add Variational Autoencoder and GANs #131

Merged

This was referenced Dec 24, 2023

This PR Introduces simple explaination for CycleGAN as part of Unit 5 : Generative Models #158

Merged

Adds basic structure of Stable diffusion and it's usecases. Part of Unit 5 #164

Merged

charchit7 mentioned this issue Jan 10, 2024

[WIP] added cycleGAN notebook! #175

Merged

johko closed this as completed Apr 21, 2024

merveenoyan pushed a commit that referenced this issue Apr 30, 2024

Merge pull request #28 from sezan92/ShamieCC-Resnet

9020fbb

Shamie cc resnet
