Hi, I was wondering whether CONCH can directly convert an image to text. From the code, it seems that CONCH only supports "image-to-text retrieval": given an image and several candidate texts, it checks which text is most similar to the image. However, the paper also shows an example of CONCH performing captioning, including a comparison between predicted and corrected captions. If CONCH does support captioning, could you please provide the code for it? Thanks!
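To make the distinction concrete: retrieval only ranks a fixed set of candidate texts and never generates new text. The sketch below assumes the image and text embeddings have already been produced by the model. The vectors are made up for illustration, and `rank_texts` and `scale` are hypothetical names, not part of the CONCH API. It scores each candidate by cosine similarity and picks the highest-scoring one.

```python
import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_texts(
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    scale: float = 1.0,  # assumed stand-in for a learned temperature / logit scale
) -> tuple[int, list[float], list[float]]:
    img = l2_normalize(image_emb)
    # Cosine similarity between the image and each candidate text.
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]
    # Numerically stable softmax over the scaled similarities.
    logits = [s * scale for s in sims]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Retrieval picks an existing candidate; it cannot produce a new caption.
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims, probs


# Toy 2-D embeddings: one image and three candidate texts.
best, sims, probs = rank_texts([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
print(best)
```

Whatever the scores, the output is always the index of one of the supplied texts, which is why retrieval alone cannot caption an image.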
bryanwong17 changed the title from "Implementation of Image-to-Text" to "Implementation of Image-to-Text (Captioning)" on May 2, 2024.
We don't have code for image captioning at this time, since the repo is meant to provide only inference capabilities and keep the code base clean. But feel free to use open_clip (https://github.com/mlfoundations/open_clip) as a reference if you want to implement this functionality!
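Unlike retrieval, captioning generates text token by token, conditioned on the image. open_clip's CoCa models expose a `generate()` method for this. The sketch below shows only the core idea using plain greedy decoding, which is simpler than the beam search open_clip uses by default. It does not assume anything about whether the released CONCH weights include a usable text decoder. `greedy_caption`, `toy_decoder`, and the tiny vocabulary are all hypothetical stand-ins; a real implementation would call the model's multimodal decoder and tokenizer instead.

```python
from typing import Callable, Sequence

# Hypothetical toy vocabulary; a real model would use its own tokenizer.
VOCAB = ["<start_of_text>", "<end_of_text>", "tumor", "cells", "stroma"]
SOT, EOT = 0, 1


def greedy_caption(
    next_token_logits: Callable[[Sequence[float], list[int]], list[float]],
    image_features: Sequence[float],
    sot_id: int,
    eot_id: int,
    max_len: int = 20,
) -> list[int]:
    # Start from the start-of-text token and repeatedly append the
    # highest-scoring next token until end-of-text or max_len new tokens.
    tokens = [sot_id]
    for _ in range(max_len):
        logits = next_token_logits(image_features, tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eot_id:
            break
    return tokens


def toy_decoder(image_features: Sequence[float], tokens: list[int]) -> list[float]:
    # Hypothetical stand-in for an image-conditioned text decoder:
    # it "describes" the image differently depending on its first feature.
    script = [2, 3] if image_features[0] > 0 else [4]
    step = len(tokens) - 1  # tokens generated so far, excluding <start_of_text>
    target = script[step] if step < len(script) else EOT
    logits = [0.0] * len(VOCAB)
    logits[target] = 1.0
    return logits


def decode(tokens: list[int]) -> str:
    # Drop the special tokens and join the rest into a caption string.
    return " ".join(VOCAB[t] for t in tokens if t not in (SOT, EOT))


print(decode(greedy_caption(toy_decoder, [0.5], SOT, EOT)))
```

Swapping greedy selection for beam search or top-k/top-p sampling changes only how `nxt` is chosen; the loop structure stays the same.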