
How do we perform pre-training/fine-tuning for the visual question answering task on a custom dataset? #23

Closed
kirito-0512 opened this issue Feb 26, 2024 · 2 comments

Comments

@kirito-0512

I would greatly appreciate guidance on getting started with pre-training/fine-tuning for the Visual Question Answering (VQA) task, specifically on the following points:

  1. The necessary format for the required dataset.
  2. Minimum hardware requirements for its execution.

Please note that while this question might be straightforward and potentially addressed by reviewing the model documentation, I am seeking an expert opinion on this matter.

Thank you sincerely.

@lorenmt
Collaborator

lorenmt commented Feb 27, 2024

Hello, we have released VQA checkpoints in this repo; you can try them out first to see whether they meet your needs. Otherwise, just follow the instructions in the documentation, i.e. get the expert labels ready and modify the training config scripts.

@kirito-0512
Author

Thank you! I will work on it. It would be highly appreciated if you could also provide any additional resources.
