
DocumentQuestionAnswering widget #306

Merged
mishig25 merged 4 commits into main from dqa-widget on Sep 16, 2022

Collaborator

mishig25 commented Sep 16, 2022

DocumentQuestionAnswering by @ankrgyl

First of all, amazing work @ankrgyl!

I'm opening a PR here from @ankrgyl's branch.

Demo

https://63243249748e56079f18d750--huggingface-widgets.netlify.app/impira/layoutlm-document-qa

API shape

input (identical to VisualQuestionAnsweringWidget):

const requestBody = {
    inputs: { 
        question: string, 
        image: string, // base64 str of img 
    },
};

output (similar to VisualQuestionAnsweringWidget):

Array<{ answer: string; score: number }>
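
The input and output shapes above can be written out as TypeScript types. This is only a minimal sketch: the names `DocumentQARequest`, `DocumentQAResponse`, `buildRequestBody`, and `bestAnswer` are hypothetical and are not taken from the widget code in this PR.

```typescript
// Request body as described above: a question plus a base64-encoded image.
interface DocumentQARequest {
  inputs: {
    question: string;
    image: string; // base64 string of the image
  };
}

// Response as described above: candidate answers with a score each.
type DocumentQAAnswer = { answer: string; score: number };
type DocumentQAResponse = DocumentQAAnswer[];

// Hypothetical helper: assembles the request body from its two parts.
function buildRequestBody(question: string, imageBase64: string): DocumentQARequest {
  return { inputs: { question, image: imageBase64 } };
}

// Hypothetical helper: returns the highest-scoring answer,
// or undefined when the response is empty.
function bestAnswer(response: DocumentQAResponse): DocumentQAAnswer | undefined {
  let best: DocumentQAAnswer | undefined = undefined;
  for (const candidate of response) {
    if (best === undefined || candidate.score > best.score) {
      best = candidate;
    }
  }
  return best;
}
```

Typing the response as an array keeps the widget free to show either the top answer or all candidates.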

todos:

  • test when api-inference dqa is up
  • test widget input samples
  • document widget input sample

cc: @NielsRogge @osanseviero

Questions:

@ankrgyl @Narsil Is the API shape for the visual-question-answering and document-question-answering pipelines identical? If so, we can just reuse the VisualQA widget for the DocumentQA pipeline as well. Please let me know.

@mishig25 mishig25 requested review from Narsil and ankrgyl September 16, 2022 08:30

Narsil commented Sep 16, 2022

> @ankrgyl @Narsil Is the API shape for the visual-question-answering and document-question-answering pipelines identical? If so, we can just reuse the VisualQA widget for the DocumentQA pipeline as well. Please let me know.

It is in the default case. There was actually a debate about whether they should be different or not.
What settled my opinion is that DocumentQA should be able to work on PDFs without treating them as images (by inspecting the content itself directly). This is not implemented yet, but it means that the Input/Output is different (a PDF is a document, not an image, so it is not the same Python object).

But for the widget, because it's only using defaults, then yes, it's the same input/output.

Collaborator Author

mishig25 commented Sep 16, 2022

Oh I see

@Narsil Does it mean that the widget has to support PDF upload?


Narsil commented Sep 16, 2022

Should be transparent for the widget, no?

In any case, neither the API nor the actual pipeline deals with it right now, but it's something that we could/should do.

@mishig25
Collaborator Author

I see. In this case, I will just reuse the VisualQA widget for the DocumentQA pipeline until more features are needed.
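
One way to picture the reuse is a lookup from pipeline type to widget component in which both entries point at the same widget. This is a rough sketch, not the actual change in this PR: the `PipelineType` strings, the `WIDGET_COMPONENTS` table, and `widgetFor` are assumptions made for illustration. Only the name `VisualQuestionAnsweringWidget` comes from this thread.

```typescript
// Assumed pipeline type tags; the exact strings used by the widgets repo
// are not shown in this thread.
type PipelineType = "visual-question-answering" | "document-question-answering";

// Hypothetical registry: both pipelines resolve to the same widget,
// because with default parameters their input/output shapes match.
const WIDGET_COMPONENTS: Record<PipelineType, string> = {
  "visual-question-answering": "VisualQuestionAnsweringWidget",
  // Reused for now; a dedicated widget could replace this entry later
  // (for example, if PDF input is ever supported).
  "document-question-answering": "VisualQuestionAnsweringWidget",
};

// Hypothetical lookup helper.
function widgetFor(pipeline: PipelineType): string {
  return WIDGET_COMPONENTS[pipeline];
}
```

Keeping the mapping in one table means that swapping in a dedicated DocumentQA widget later would be a one-line change.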

@mishig25
Collaborator Author

@Narsil does api-inference support document-question-answering atm?

* Reuse VisualQA for DocQA (for now)

* fix
@mishig25 mishig25 merged commit 82e81ca into main Sep 16, 2022
@mishig25 mishig25 deleted the dqa-widget branch September 16, 2022 11:53
Collaborator

ankrgyl commented Sep 16, 2022

It looks like this has already been merged, but one thing I didn't do before pushing the initial PR is fix the examples to be more relevant to document question answering. Is it still possible to change those? Happy to send a PR.

Collaborator

ankrgyl commented Sep 16, 2022

			widgetData: [
				{
					text: "What animal is it?",
					src: "https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
				},
				{
					text: "Where is it?",
					src: "https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg",
				},
			],

should probably be

https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg

			widgetData: [
				{
					text: "What is the invoice number?",
					src: "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
				},
				{
					text: "What is the purchase amount?",
					src: "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
				},
			],
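
For reference, the suggested samples could be given a type and a basic sanity check, roughly as below. This is a sketch under assumptions: `WidgetExample`, `documentQAExamples`, and `invalidExamples` are hypothetical names, and the check only looks at the shape of each entry (non-empty question, http(s) image URL), not at whether the image actually loads.

```typescript
// Hypothetical type for one widget sample: a question and an image URL.
interface WidgetExample {
  text: string;
  src: string;
}

// The two document-oriented samples suggested above.
const documentQAExamples: WidgetExample[] = [
  {
    text: "What is the invoice number?",
    src: "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
  },
  {
    text: "What is the purchase amount?",
    src: "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
  },
];

// Hypothetical check: returns the samples that have an empty question
// or a src that is not an http(s) URL without whitespace.
function invalidExamples(examples: WidgetExample[]): WidgetExample[] {
  return examples.filter(
    (example) => example.text.trim() === "" || !/^https?:\/\/\S+$/.test(example.src),
  );
}
```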
