Zero-shot Computer Vision - Draft Outline #43

mmhamdy · 2023-10-22T21:14:07Z

This is an early draft outlining the Zero-shot Computer Vision chapter. Below I'll give a brief overview of the chapter content.
There is also a set of presentation slides covering the main concepts, with a few (little 😃) details.

🔹 Introduction

This section lays the groundwork for the rest of the chapter. Each subsection was initially a section of its own, but we decided it would be better to merge them under one heading.

On Generalization
Zero-shot Learning (ZSL), History and Definitions
Comparison With Other Techniques: This part aims to differentiate between zero-shot learning and some other methods such as Open Set Recognition (OSR), Domain Adaptation, and Out of Distribution (OOD) Detection.
Relationship with Transfer Learning: This part discusses how zero-shot learning is related to transfer learning, and differentiates between homogeneous and heterogeneous transfer learning. It will only cover the parts related to ZSL, as there is already a transfer learning chapter.

🔹 Side Box: How Humans Recognize New Objects

This is a collapsible box for the interested reader about how humans are good at identifying new, unseen objects and why this is not the same for machines.

🔹 Zero-shot Learning methods

Attributes and Descriptors: Discusses what attributes are and why they are needed in ZSL.
Visual and Semantic Spaces: Discusses different embedding spaces for the visual and semantic data.
ZSL Baselines: Discusses the Direct Attribute Prediction (DAP) algorithm as a baseline for ZSL.
ZSL Algorithms: Discusses some selected zero-shot learning algorithms such as Embarrassingly Simple Zero-Shot Learning (ESZSL), Deep Visual-Semantic Embedding (DeViSE), Attribute Label Embedding (ALE), Structured Joint Embedding (SJE), and Semantic Autoencoder (SAE).
ZSL Benchmarks: Discusses some ZSL benchmarks such as Animals with Attributes 2 (AWA2), Caltech-UCSD Birds-200-2011 (CUB), Attribute Pascal and Yahoo (aPY), and the SUN Attribute database (SUN).
Evaluation: How zero-shot learning is evaluated, and the ZSLGBU framework.
ZSL vs. Generalized ZSL: What Generalized Zero-shot Learning (GZSL) is, and why it is a more realistic version of plain zero-shot learning.
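
To make the Evaluation and GZSL items a bit more concrete, here is a minimal sketch of two metrics commonly reported in this setting: per-class averaged top-1 accuracy, and the harmonic mean of seen-class and unseen-class accuracy. The function names, the toy labels, and the seen/unseen split are all made up for illustration; this is not taken from any particular benchmark code.

```python
import numpy as np

def per_class_top1(y_true, y_pred):
    """Average of per-class top-1 accuracies, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    accs = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """GZSL summary score H = 2 * s * u / (s + u)."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Toy labels (hypothetical): classes 0 and 1 are "seen", 2 and 3 are "unseen".
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 0, 0, 3])

seen_mask = np.isin(y_true, [0, 1])
acc_seen = per_class_top1(y_true[seen_mask], y_pred[seen_mask])
acc_unseen = per_class_top1(y_true[~seen_mask], y_pred[~seen_mask])
h_score = harmonic_mean(acc_seen, acc_unseen)
print(acc_seen, acc_unseen, h_score)
```

The harmonic mean is low whenever either accuracy is low, which is why it is a natural fit for GZSL, where a model can look good on seen classes while failing on unseen ones.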

🔹 Zero-shot Learning with CLIP and friends

How is CLIP Different From Previous Approaches: CLIP has been introduced in previous chapters. Here, we will discuss briefly (I hope) the parts related to zero-shot learning.
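
As a rough illustration of the zero-shot part, the sketch below classifies an image by comparing its embedding to one text embedding per candidate class. It assumes both embeddings live in a shared space, as in CLIP, but it does not load any real model: the vectors are hand-written placeholders, and `zero_shot_classify` and the `temperature` value are assumptions for this example.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Softmax over scaled cosine similarities between one image embedding
    and one text embedding per candidate class."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (text_embs @ image_emb)
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Placeholder vectors standing in for encoder outputs (hypothetical values).
# With a real model, each text embedding would come from a prompt such as
# "a photo of a {class name}".
class_names = ["cat", "dog", "zebra"]
text_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
image_emb = np.array([0.1, 0.2, 0.9])

probs = zero_shot_classify(image_emb, text_embs)
predicted = class_names[int(np.argmax(probs))]
print(predicted, probs)
```

Adding a new class here only means adding a new text embedding; no retraining is involved, which is the main contrast with the attribute-based methods above.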

🔹 Zero-shot Learning in Computer Vision

This part illustrates how zero-shot learning can be used in the context of many different computer vision tasks. CV tasks were introduced in previous chapters.

Zero-shot Object Recognition/ Image Classification

Methods
Code Example

Zero-shot Object Detection

Methods
Code Example

Zero-shot Instance Segmentation

Methods
Code Example

Other CV Tasks

Besides the three most common CV tasks mentioned above, in this section, we may discuss other interesting CV tasks in the ZSL context.

🔹 Advantages of Zero-shot Learning

This section discusses why zero-shot learning is important. This will span a paragraph or two at most.

🔹 Applications of Zero-shot Learning

This section aims to provide some real-world applications of zero-shot learning in a computer vision context. There are no specific ones yet.

🔹 Challenges and Limitations of Zero-shot Learning

Bias
Domain Shift
Hubness
Semantic Loss

🔹 Frontiers

This is a paragraph or a little more mentioning the current state of the art and/or recent experimental approaches in zero-shot learning.

🔹 Chapter Summary

This is another paragraph (or two 😅) that aims to condense the main ideas discussed in the chapter and Key Takeaways.

👩‍💻 Hands-on Notebook

This is a hands-on notebook that shows two things:

The implementation of a classic ZSL algorithm, for example, ESZSL from scratch.
The implementation of a ZSL pipeline using CLIP or another friend from scratch.
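
For the first item, a from-scratch ESZSL could be as small as the sketch below. It uses the closed-form solution as we recall it from the ESZSL paper, V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹, and should be checked against the paper before it goes into the notebook. The toy data (features that are noisy copies of the class attribute vectors) and the names `eszsl_fit` and `eszsl_predict` are assumptions for illustration only.

```python
import numpy as np

def eszsl_fit(X, Y, S, gamma=1.0, lam=1.0):
    """Closed-form ESZSL weights.
    X: (d, m) training features, Y: (m, z) labels in {-1, +1},
    S: (a, z) attribute signatures of the z seen classes. Returns V: (d, a)."""
    d, a = X.shape[0], S.shape[0]
    left = np.linalg.inv(X @ X.T + gamma * np.eye(d))
    right = np.linalg.inv(S @ S.T + lam * np.eye(a))
    return left @ X @ Y @ S.T @ right

def eszsl_predict(X, V, S_new):
    """Score each sample (column of X) against each class signature
    (column of S_new) and return the index of the best-scoring class."""
    scores = X.T @ V @ S_new
    return scores.argmax(axis=1)

# Toy setup (hypothetical): 3 attributes, 3 seen classes, 2 unseen classes.
rng = np.random.default_rng(0)
S_seen = np.eye(3)                              # (a=3, z=3)
labels = np.repeat(np.arange(3), 10)            # 10 samples per seen class
X_train = S_seen[:, labels] + 0.01 * rng.normal(size=(3, 30))
Y_train = 2.0 * np.eye(3)[labels] - 1.0         # (m=30, z=3), entries in {-1, +1}

V = eszsl_fit(X_train, Y_train, S_seen)

# Unseen classes are described only by their attribute signatures.
S_unseen = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [0.0, 1.0]])               # columns: [1,1,0] and [0,1,1]
X_test = S_unseen + 0.01 * rng.normal(size=S_unseen.shape)  # one sample per class
pred = eszsl_predict(X_test, V, S_unseen)
print(pred)
```

No unseen-class images are used during fitting; the unseen classes enter only through `S_unseen` at prediction time.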

Notes

The chapter contents may seem overwhelming, but we hope to arrive at a much shorter and denser version once we start working on the details.
The Algorithms part is the most volatile (high probability of change) one. There are a lot of ZSL algorithms out there and we are trying to choose a representative sample showing different approaches.
We will make sure that the ratio of plain ZSL to CV ZSL stays within a reasonable range.

Other Resources:

ATaylorAerospace · 2023-10-23T00:21:38Z

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

alperenunlu · 2023-10-24T00:07:16Z

Comprehensive content. This is just terrific.

Looking forward to this.
🚀 🚀 🚀 🚀 🚀

mmhamdy · 2023-10-24T11:39:50Z

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

Thanks, @ATaylorAerospace. We're still not sure which algorithm to use in the notebook, but we will surely keep that in mind.

mmhamdy · 2023-10-24T11:41:33Z

Comprehensive content. This is just terrific.

Looking forward to this. 🚀 🚀 🚀 🚀 🚀

Thanks, @alperenunlu for taking the time to read it.

lunarflu · 2023-10-24T12:21:49Z

Looks awesome! My guess is for a first try we want to shorten a bit + condense, and then in follow-ups we can go more in depth (could be faster to iterate that way and not get hung up releasing everything completed)

What do you think?

mmhamdy · 2023-10-24T13:18:13Z

Looks awesome! My guess is for a first try we want to shorten a bit + condense, and then in follow-ups we can go more in depth (could be faster to iterate that way and not get hung up releasing everything completed)

What do you think?

Yeah, of course. This is going to get much shorter bit by bit. The plan is to grow the chapter (async) and then work on pruning. Releasing section by section will also make the process much easier.

johko · 2023-10-25T19:55:01Z

This really is an extensive outline, thanks for all the work, it looks awesome. I thought I knew some stuff about ZSL, but I could not have come up with that much material 😄

Do you have any prioritization on which parts you want to work on first? In my opinion the Zero-shot Learning methods part can have a little less priority for now as I think it is more important for people to see some actual applications as you will do in Zero-shot Learning in Computer Vision

mmhamdy · 2023-10-25T20:53:37Z

Do you have any prioritization on which parts you want to work on first? In my opinion the Zero-shot Learning methods part can have a little less priority for now as I think it is more important for people to see some actual applications as you will do in Zero-shot Learning in Computer Vision

Yeah, we will take an inside-out approach and start from Zero-shot Learning in Computer Vision (which still needs an outline of its own, I think 😅) and then work on the rest of the sections once it is finished. The ZSL section will have its own share but we will keep it brief, kind of like introducing Q-Learning before talking about Deep Q-Learning in Reinforcement learning. The outermost chapters (Introduction, Advantages, Applications, and Frontiers) are mostly a couple of paragraphs long and won't take much of the chapter.

merveenoyan · 2023-10-31T13:19:50Z

@mmhamdy great outline! my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

mmhamdy · 2023-11-01T21:08:58Z

my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

It's too broad, I agree. But a lot of the non-CV-specific content will be pruned and condensed to just provide a smooth transition from plain ZSL to CV ZSL. We are starting with the Zero-shot Learning in Computer Vision section in order not to get overwhelmed by the other sections. Once finished, we will branch out from there and start working on the rest of the sections.

alperenunlu added the Chapter Content label Oct 23, 2023

johko mentioned this issue Nov 10, 2023

Multimodal Transfer Learning: Draft outline #56

Closed

mmhamdy self-assigned this Dec 2, 2023

mmhamdy mentioned this issue Dec 18, 2023

Zero-shot Computer Vision - Draft Introduction #119

Merged

johko closed this as completed Apr 21, 2024
