Zero-shot Computer Vision - Draft Outline #43

Closed
4 of 22 tasks
mmhamdy opened this issue Oct 22, 2023 · 10 comments

@mmhamdy
Collaborator

mmhamdy commented Oct 22, 2023

This is an early draft outlining the Zero-shot Computer Vision chapter. Below I'll give a brief overview of the chapter content.
There are also presentation slides covering the main concepts, with a few (little 😃) details.

🔹 Introduction

This section lays the groundwork for the rest of the chapter. Each subsection was initially a section of its own, but we decided it would be better to merge them under one heading.

  • On Generalization
  • Zero-shot Learning (ZSL), History and Definitions
  • Comparison With Other Techniques: This part differentiates zero-shot learning from related methods such as Open Set Recognition (OSR), Domain Adaptation, and Out-of-Distribution (OOD) Detection.
  • Relationship with Transfer Learning: This part discusses how zero-shot learning relates to transfer learning and differentiates between homogeneous and heterogeneous transfer learning. It will only cover the parts related to ZSL, as there is already a transfer learning chapter.

🔹 Side Box: How Humans Recognize New Objects

This is a collapsible box for the interested reader about how humans are good at identifying new, unseen objects and why the same is not true for machines.

🔹 Zero-shot Learning methods

🔹 Zero-shot Learning with CLIP and friends

  • How is CLIP Different From Previous Approaches: CLIP has been introduced in previous chapters. Here, we will discuss briefly (I hope) the parts related to zero-shot learning.

🔹 Zero-shot Learning in Computer Vision

This part illustrates how zero-shot learning can be used in the context of many different computer vision tasks. CV tasks were introduced in previous chapters.

Zero-shot Object Recognition / Image Classification

  • Methods
  • Code Example
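
As a rough idea of what the classification code example could look like, here is a minimal sketch of CLIP-style zero-shot classification: embed one text prompt per candidate class, embed the image, and pick the prompt with the highest cosine similarity. No real model is loaded here. The 3-d embeddings, the prompt wording, the `zero_shot_classify` helper, and the temperature value are all made up for illustration; in practice the embeddings would come from pretrained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Score an image embedding against one text embedding per class.

    text_embs maps a prompt (e.g. "a photo of a cat") to its embedding.
    Returns the best prompt and a softmax distribution over all prompts.
    The temperature is an illustrative assumption, not a tuned value.
    """
    logits = {p: cosine(image_emb, e) / temperature for p, e in text_embs.items()}
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits.values())
    exps = {p: math.exp(v - top) for p, v in logits.items()}
    total = sum(exps.values())
    probs = {p: e / total for p, e in exps.items()}
    return max(probs, key=probs.get), probs


# Hypothetical embeddings standing in for real encoder outputs.
text_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, text_embs)
print(label, probs)
```

The class set is just the keys of `text_embs`, so adding a new class means adding a new prompt, with no retraining. That is the property the chapter would want this example to highlight.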

Zero-shot Object Detection

  • Methods
  • Code Example

Zero-shot Instance Segmentation

  • Methods
  • Code Example

Other CV Tasks

Besides the three most common CV tasks above, this section may cover other interesting CV tasks in the ZSL context.

🔹 Advantages of Zero-shot Learning

This section discusses why zero-shot learning is important. This will span a paragraph or two at most.

🔹 Applications of Zero-shot Learning

This section aims to provide some real-world applications of zero-shot learning in a computer vision context. There are no specific ones yet.

🔹 Challenges and Limitations of Zero-shot Learning

  • Bias
  • Domain Shift
  • Hubness
  • Semantic Loss

🔹 Frontiers

This is a paragraph, or a little more, covering the current state of the art and/or recent experimental approaches in zero-shot learning.

🔹 Chapter Summary

This is another paragraph (or two 😅) that condenses the main ideas discussed in the chapter and lists the key takeaways.

👩‍💻 Hands-on Notebook

This is a hands-on notebook that shows two things:

  1. An implementation of a classic ZSL algorithm (for example, ESZSL) from scratch.
  2. An implementation of a ZSL pipeline from scratch, using CLIP or one of its friends.
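
For the first item, here is a minimal sketch of what a from-scratch ESZSL could look like, assuming the closed-form solution V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹, where X holds the training features, Y the ±1 labels over seen classes, and S the class signatures (e.g. attributes). The function names, the toy data, and the hyperparameter values are made up for illustration; this is not the notebook's final code.

```python
import numpy as np


def eszsl_fit(X, Y, S, gamma=1.0, lam=1.0):
    """Closed-form ESZSL weights.

    X: (d, m) features, one column per training instance
    Y: (m, z) labels in {-1, +1}, one column per seen class
    S: (a, z) class signatures, one column per seen class
    Returns V with shape (d, a), mapping features to signature space.
    """
    d, a = X.shape[0], S.shape[0]
    A = X @ X.T + gamma * np.eye(d)  # regularized feature Gram matrix
    B = S @ S.T + lam * np.eye(a)    # regularized signature Gram matrix
    N = np.linalg.solve(A, X @ Y @ S.T)  # A^{-1} X Y S^T
    # Right-multiply by B^{-1}; B is symmetric, so solve against N^T.
    return np.linalg.solve(B, N.T).T


def eszsl_predict(V, X_new, S_unseen):
    """Return the index of the best-scoring unseen class per instance."""
    scores = X_new.T @ V @ S_unseen  # (n, z_unseen)
    return scores.argmax(axis=1)


# Toy data (hypothetical): two seen classes, two attributes, two features.
X = np.eye(2)                            # one training instance per seen class
Y = np.array([[1.0, -1.0], [-1.0, 1.0]])
S = np.eye(2)                            # seen-class signatures

V = eszsl_fit(X, Y, S, gamma=1.0, lam=1.0)

# Unseen classes are described only through their signatures.
S_unseen = np.array([[0.9, 0.1], [0.1, 0.9]])
X_new = np.eye(2)                        # two test instances
pred = eszsl_predict(V, X_new, S_unseen)
print(V, pred)
```

Using `np.linalg.solve` instead of explicit inverses is a design choice for numerical stability. The result is the same as the formula above.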

Notes

  1. The chapter contents may seem overwhelming, but we hope to arrive at a much shorter and denser version once we start working on the details.
  2. The Algorithms part is the most volatile (most likely to change). There are a lot of ZSL algorithms out there, and we are trying to choose a representative sample showing different approaches.
  3. We will make sure that the ratio of plain ZSL to CV ZSL stays within a reasonable range.

Other Resources:

@ATaylorAerospace
Collaborator

ATaylorAerospace commented Oct 23, 2023

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL method (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

@alperenunlu added the "Chapter Content" label (Discuss and track the content of a chapter) on Oct 23, 2023
@alperenunlu
Collaborator

Comprehensive content. This is just terrific.

Looking forward to this.
🚀 🚀 🚀 🚀 🚀

@mmhamdy
Collaborator Author

mmhamdy commented Oct 24, 2023

> This looks to be a great chapter and has an incredible amount of content! :-)
>
> On the notebook for the chapter: since a generalized ZSL method (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

Thanks, @ATaylorAerospace. We're still not sure which algorithm to use in the notebook, but we will surely keep that in mind.

@mmhamdy
Collaborator Author

mmhamdy commented Oct 24, 2023

> Comprehensive content. This is just terrific.
>
> Looking forward to this. 🚀 🚀 🚀 🚀 🚀

Thanks, @alperenunlu, for taking the time to read it.

@lunarflu
Collaborator

Looks awesome! My guess is that for a first try we want to shorten and condense a bit, and then in follow-ups we can go more in depth (it could be faster to iterate that way and not get hung up on releasing everything completed).

What do you think?

@mmhamdy
Collaborator Author

mmhamdy commented Oct 24, 2023

> Looks awesome! My guess is that for a first try we want to shorten and condense a bit, and then in follow-ups we can go more in depth (it could be faster to iterate that way and not get hung up on releasing everything completed).
>
> What do you think?

Yeah, of course. This is going to get much shorter bit by bit. The plan is to grow the chapter (async) and then work on pruning. Releasing section by section will also make the process much easier.

@johko
Owner

johko commented Oct 25, 2023

This really is an extensive outline. Thanks for all the work, it looks awesome. I thought I knew some stuff about ZSL, but I could not have come up with that much material 😄

Do you have any prioritization on which parts you want to work on first? In my opinion, the Zero-shot Learning methods part can have a little less priority for now, as I think it is more important for people to see some actual applications, as you will do in Zero-shot Learning in Computer Vision.

@mmhamdy
Collaborator Author

mmhamdy commented Oct 25, 2023

> Do you have any prioritization on which parts you want to work on first? In my opinion, the Zero-shot Learning methods part can have a little less priority for now, as I think it is more important for people to see some actual applications, as you will do in Zero-shot Learning in Computer Vision.

Yeah, we will take an inside-out approach and start from Zero-shot Learning in Computer Vision (which still needs an outline of its own, I think 😅) and then work on the rest of the sections once it is finished. The ZSL section will have its own share, but we will keep it brief, kind of like introducing Q-Learning before talking about Deep Q-Learning in reinforcement learning. The outermost sections (Introduction, Advantages, Applications, and Frontiers) are mostly a couple of paragraphs long and won't take up much of the chapter.

@merveenoyan
Collaborator

@mmhamdy great outline! my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

@mmhamdy
Collaborator Author

mmhamdy commented Nov 1, 2023

> my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

It's too broad, I agree. But a lot of the non-CV-specific content will be pruned and condensed to just provide a smooth transition from plain ZSL to CV ZSL. We are starting with the Zero-shot Learning in Computer Vision section in order not to get overwhelmed by the other sections. Once finished, we will branch out from there and start working on the rest of the sections.

6 participants