
Add visual-question-answering / multimodal support to gradio notebook tasks #1392

Open
Bedrovelsen opened this issue Mar 3, 2024 · 4 comments · Fixed by #1396

Comments

@Bedrovelsen

Enjoying the recent gradio notebook stuff!

Was curious whether (and when) support for an additional Hugging Face task option of "visual question answering" is planned?

If it's not currently planned, could you give a quick overview of how to add a new task category to the gradio notebook codebase? I can of course read over the current gradio notebook code and figure it out on my own, but guidance from the team on best practices for contributing would be preferred.

@saqadri
Contributor

saqadri commented Mar 3, 2024

Thanks @Bedrovelsen! Would love your help adding that, and I've messaged you on Discord so our team can work with you to make sure you can get this set up!

@Bedrovelsen
Author

Sounds good

@rholinshead
Contributor

Just copying over the quick implementation overview from discord here:

  1. Add a new HuggingFaceVisualQuestionAnsweringRemoteInference ModelParser under the https://github.com/lastmile-ai/aiconfig/tree/main/extensions/HuggingFace/python/src/aiconfig_extension_hugging_face/remote_inference_client folder.
    This parser should look very similar to the existing HuggingFaceImage2TextRemoteInference model parser, with the following changes:
  • serialize handles the image/attachment data the same way, but the constructed PromptInput will also need a data string holding the 'question' value passed to serialize
  • refine_completion_params can be the same, but should have a comment pointing to the visual_question_answering API code: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L1785
  • deserialize can be mostly the same, except we need to add 'question' to the completion_data from the prompt data: completion_data["question"] = prompt["data"]
  • run is similar as well; it just needs to call client.visual_question_answering with the completion_data and handle the response as desired. The response appears to be a list of VisualQuestionAnsweringOutputElement objects; we'll want to serialize those as ExecuteResult outputs in whatever format seems best. For example, data could hold the answer and the score could be stored in metadata

I believe the helpers about validating/retrieving the image from attachments can just be kept the same.
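To make the deserialize/run changes above concrete, here is a minimal, self-contained sketch. It is not the real aiconfig or huggingface_hub API: VQAOutputElement stands in for huggingface_hub's VisualQuestionAnsweringOutputElement, and the two helper functions are hypothetical names illustrating the 'question' wiring and the ExecuteResult-style output mapping.

```python
# Hypothetical sketch of the deserialize/run output handling described above.
# VQAOutputElement is a stand-in for huggingface_hub's
# VisualQuestionAnsweringOutputElement; both helpers are invented names.
from dataclasses import dataclass


@dataclass
class VQAOutputElement:
    answer: str
    score: float


def build_completion_data(prompt: dict, settings: dict) -> dict:
    """Mirror the deserialize step: start from the model settings, then
    add the 'question' string from the prompt's data field."""
    completion_data = dict(settings)
    completion_data["question"] = prompt["data"]
    return completion_data


def to_execute_results(elements: list[VQAOutputElement]) -> list[dict]:
    """Serialize response elements as ExecuteResult-style outputs:
    data holds the answer, and the score is kept in metadata."""
    return [
        {
            "output_type": "execute_result",
            "execution_count": i,
            "data": el.answer,
            "metadata": {"score": el.score},
        }
        for i, el in enumerate(elements)
    ]


# Example: a fake response shaped like client.visual_question_answering's.
response = [VQAOutputElement(answer="a cat", score=0.98)]
outputs = to_execute_results(response)
print(outputs[0]["data"])  # a cat
```

The mapping keeps the answer as the primary output payload while preserving the model's confidence score, which the UI can surface or ignore as needed.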

With the parser implemented, we can expose it in the extension here: https://github.com/lastmile-ai/aiconfig/blob/main/extensions/HuggingFace/python/src/aiconfig_extension_hugging_face/__init__.py

For testing the extension, please see README instructions - https://github.com/lastmile-ai/aiconfig/blob/main/extensions/HuggingFace/python/README.md

Then, I would recommend importing and registering the new parser in https://github.com/lastmile-ai/aiconfig/blob/main/cookbooks/Gradio/aiconfig_model_registry.py with id "Visual Question Answering", and then following the Getting Started instructions in https://github.com/lastmile-ai/aiconfig/blob/main/cookbooks/Gradio/README.md to open the huggingface.aiconfig.json file with the new parser registered.
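As a rough, self-contained illustration of the registration step: the dict registry and register_model_parser function below are stand-ins invented for this example; the real aiconfig_model_registry.py uses aiconfig's own registration APIs.

```python
# Hypothetical sketch of registering the new parser under its task id.
# MODEL_PARSERS and register_model_parser are stand-ins for aiconfig's
# real registry machinery, shown only to illustrate the id mapping.

class HuggingFaceVisualQuestionAnsweringRemoteInference:
    """Placeholder for the parser sketched earlier in this thread."""

    def id(self) -> str:
        # The id the Gradio notebook UI would surface for this task.
        return "Visual Question Answering"


MODEL_PARSERS: dict[str, object] = {}


def register_model_parser(parser) -> None:
    """Register a parser under its id, mirroring how the existing
    HuggingFace remote-inference parsers are wired up."""
    MODEL_PARSERS[parser.id()] = parser


register_model_parser(HuggingFaceVisualQuestionAnsweringRemoteInference())
print(sorted(MODEL_PARSERS))  # ['Visual Question Answering']
```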

On the UI side, we will need to add a new PromptSchema to the client for rendering the parser's input and settings nicely. I can implement that shortly.

rholinshead pushed a commit that referenced this issue Mar 4, 2024
# Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema

For #1392

This will add the prompt schema so that visual question answering prompts have the nice UI for input and settings
rholinshead added a commit that referenced this issue Mar 4, 2024
…ma (#1396)

Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema

# Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema

For #1392

This will add the prompt schema so that visual question answering
prompts have the nice UI for input and settings

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1396).
* #1397
* __->__ #1396
@rholinshead rholinshead reopened this Mar 4, 2024
@rholinshead
Contributor

Whoops, I linked #1396, which has the schema changes, and it auto-closed this issue. This issue is still open.
