You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Are there plans to evaluate the vision modality of GPT-4? I am interested to know how GPT-4 could perform on classification tasks with 0- and few-shot-learning and how it compares to vision-only models. If the few-shot-learning capabilities of LLMs translate to other modalities, this would be a real game changer.
Question out of curiosity: How was the vision-modality incorperated? Maybe similar approaches can be taken for other modalities, such as audio or video? Would be an interessting Open-Source project for sure :)
The text was updated successfully, but these errors were encountered:
I have an engineering exam bank of about 1000 questions with simple illustrations. I have the questions already in JSONL format but some of them rely on the image to answer correctly.
Are there plans to evaluate the vision modality of GPT-4? I am interested to know how GPT-4 could perform on classification tasks with 0- and few-shot-learning and how it compares to vision-only models. If the few-shot-learning capabilities of LLMs translate to other modalities, this would be a real game changer.
Question out of curiosity: How was the vision-modality incorperated? Maybe similar approaches can be taken for other modalities, such as audio or video? Would be an interessting Open-Source project for sure :)
The text was updated successfully, but these errors were encountered: