Implementation of Image-to-Text (Captioning) #6

bryanwong17 · 2024-05-02T08:13:12Z

Hi, I was wondering if CONCH is able to directly convert an image to text? From the code, it seems like CONCH is only available for "image-to-text retrieval," meaning that given an image and several texts, it will check which text is most similar to the given image. However, in the paper, there is also an example of CONCH doing captioning and a comparison between predicted and corrected captions. If so, could you please provide the code for doing captioning? Thanks!
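For readers unfamiliar with the retrieval setup described above, here is a minimal sketch of the idea: score each candidate text against the image by cosine similarity and pick the highest. It is not CONCH's API. The embeddings are made-up toy vectors, and `rank_texts` and the `temperature` value are hypothetical names and settings chosen for illustration. In practice the vectors would come from the model's image and text encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_texts(image_embedding, text_embeddings, temperature=0.07):
    """Return (index of best-matching text, softmax probabilities).

    `temperature` is an assumed scaling factor, not a value taken from CONCH.
    """
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    logits = [s / temperature for s in sims]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Toy data (hypothetical): one image embedding and three candidate texts.
candidate_texts = ["text A", "text B", "text C"]
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = [
    [0.8, 0.2, 0.1],  # points in nearly the same direction as the image
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 0.9],
]

best, probs = rank_texts(image_embedding, text_embeddings)
print(candidate_texts[best], probs)
```

Retrieval of this kind can only choose among the texts it is given, which is why it does not cover captioning.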

Weiqin-Zhao · 2024-07-03T08:37:55Z

I am also looking for this feature of this excellent work. I hope the authors can release the corresponding code and weights in the future.

fedshyvana · 2024-09-10T16:21:49Z

We don't have the code for image captioning at this time, since the repo is meant to provide only inference capabilities to keep the code base clean. But feel free to use open_clip (https://github.com/mlfoundations/open_clip) as a reference if you want to implement this functionality!
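For anyone attempting this, captioning with a decoder-based model usually comes down to an autoregressive decoding loop. The sketch below shows greedy decoding in plain Python, assuming a model that returns next-token scores given image features and the tokens generated so far. It is not the CONCH or open_clip implementation: `greedy_caption`, `toy_scores`, and the five-token vocabulary are all hypothetical stand-ins, and the toy scorer ignores the image entirely.

```python
def greedy_caption(next_token_scores, image_features, bos_id, eos_id, max_len=20):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    `next_token_scores(image_features, tokens)` is an assumed callable that
    returns one score per vocabulary entry. A real model would compute these
    with a text decoder conditioned on the image features.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(image_features, tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break  # the model signalled the end of the caption
    return tokens

# Hypothetical toy vocabulary and scorer, for illustration only.
VOCAB = ["<bos>", "<eos>", "a", "tissue", "image"]
BOS_ID, EOS_ID = 0, 1

def toy_scores(image_features, tokens):
    """Stand-in for a decoder: always prefers a fixed token sequence."""
    target = [2, 3, 4, EOS_ID]
    step = len(tokens) - 1  # number of tokens generated after <bos>
    want = target[step] if step < len(target) else EOS_ID
    return [1.0 if i == want else 0.0 for i in range(len(VOCAB))]

tokens = greedy_caption(toy_scores, image_features=None, bos_id=BOS_ID, eos_id=EOS_ID)
# Drop the special tokens before joining the words into a caption.
caption = " ".join(VOCAB[t] for t in tokens if t not in (BOS_ID, EOS_ID))
print(caption)
```

Beam search or sampling can replace the `max` step in the loop. Greedy decoding is just the simplest variant to start from.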

bryanwong17 changed the title ~~Implementation of Image-to-Text~~ Implementation of Image-to-Text (Captioning) May 2, 2024

fedshyvana added the enhancement (New feature or request) label Sep 10, 2024

fedshyvana closed this as completed Sep 10, 2024
