Follow ups to DocumentQuestionAnswering Pipeline #18926

Open
4 of 8 tasks
ankrgyl opened this issue Sep 7, 2022 · 16 comments · Fixed by #19027

Comments

@ankrgyl
Contributor

ankrgyl commented Sep 7, 2022

Feature request

PR #18414 has a number of TODOs left over which we'd like to track as follow up tasks.

Pipeline

  • Add support for documents with more words than the tokenizer's maximum span (e.g. 512)
  • Add support for multi-page documents (e.g. for Donut, we need to present one image per page)
  • Rework use of tokenizer to avoid the need for add_prefix_space=True
  • Re-add support for Donut
  • Refactor Donut usage in the pipeline or move logic into the tokenizer, so that pipeline does not have as much Donut-specific code
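For the first item above, one common approach is to split a long document into overlapping windows and run the model on each window. The following is only a minimal sketch of that idea in plain Python, not the pipeline's actual implementation: the function name `window_words` and the parameters `max_words` and `overlap` are hypothetical, and a real implementation would count tokens rather than words.

```python
from typing import Iterator, List, Tuple


def window_words(
    words: List[str], max_words: int = 512, overlap: int = 128
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_offset, chunk) pairs that cover `words` with overlapping windows.

    Hypothetical helper for illustration only; names are not from the library.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    step = max_words - overlap  # how far each new window advances
    start = 0
    while True:
        yield start, words[start:start + max_words]
        # Stop once the current window reaches the end of the document.
        if start + max_words >= len(words):
            break
        start += step


if __name__ == "__main__":
    document = [f"w{i}" for i in range(1000)]
    for offset, chunk in window_words(document):
        print(offset, len(chunk))
```

The overlap means an answer that straddles a window boundary still appears whole in at least one window, as long as it is shorter than the overlap.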

Testing

  • Enable test_small_model_pt_donut once hf-internal-testing/tiny-random-donut is implemented

Documentation / Website

Motivation

These are follow ups that we cut from the initial scope of PR #18414.

Your contribution

Happy to contribute many or all of these.

@NielsRogge
Contributor

cc'ing @Narsil for enabling the model on the inference API, cc'ing @stevhliu for adding tutorial documentation to the task summary

@ankrgyl
Contributor Author

ankrgyl commented Sep 9, 2022

@NielsRogge because we removed donut-swin from AutoModelForDocumentQuestionAnswering, you can no longer create a pipeline with donut, i.e.

In [2]: p = pipeline('document-question-answering', model='naver-clova-ix/donut-base-finetuned-docvqa')
/Users/ankur/projects/transformers/venv/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
The model 'VisionEncoderDecoderModel' is not supported for document-question-answering. Supported models are ['LayoutLMForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering'].

Should we add it back to that list? Or what is the best way to support that?

@ankrgyl
Contributor Author

ankrgyl commented Sep 26, 2022

Could we re-open this (I don't think I have permissions to)? There are still a few changes necessary to complete all of the checkboxes.

@sgugger sgugger reopened this Sep 26, 2022
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@JuheonChu
Contributor

JuheonChu commented Mar 23, 2023

@ankrgyl May I work on this?
If I take on adding support for multi-page documents (e.g. for Donut, we need to present one image per page), where should I start?

@ankrgyl
Contributor Author

ankrgyl commented Mar 23, 2023

Absolutely!

Feel free to start looking here: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/document_question_answering.py

@JuheonChu
Contributor

JuheonChu commented Mar 24, 2023

  • Add support for multi-page documents (e.g. for Donut, we need to present one image per page)

Thank you! I read it carefully. To add support for multi-page documents in document_question_answering.py, should I modify methods in that file such as preprocess()? May I then open a pull request with those changes?
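One possible shape for multi-page support, sketched independently of the actual pipeline code: run question answering once per page and keep the highest-scoring candidate. Everything here is an assumption for illustration. `answer_over_pages` and `answer_page` are hypothetical names, and each candidate is assumed to be a dict with an "answer" and a numeric "score" key.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Sequence


def answer_over_pages(
    pages: Sequence[Any],
    question: str,
    answer_page: Callable[[Any, str], Iterable[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Return the best-scoring candidate across all pages, tagged with its page index.

    `answer_page` is a hypothetical callable that answers the question for a
    single page and returns candidate dicts containing at least a "score" key.
    """
    best: Optional[Dict[str, Any]] = None
    for page_number, page in enumerate(pages):
        for candidate in answer_page(page, question):
            # Copy the candidate so the caller's dicts are not mutated.
            scored = {**candidate, "page": page_number}
            if best is None or scored["score"] > best["score"]:
                best = scored
    return best  # None when there are no pages or no candidates


if __name__ == "__main__":
    # Stand-in for a real per-page model call, used only to exercise the helper.
    def fake_answer_page(page: str, question: str):
        return [{"answer": page.upper(), "score": len(page) / 10}]

    print(answer_over_pages(["ab", "abcd", "a"], "What is the total?", fake_answer_page))
```

Comparing scores across pages assumes they are on a comparable scale, which would need to be checked for the specific model.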

@elabongaatuo
Contributor

@ankrgyl Hello. I would love to contribute to this task: Add tutorial documentation to Task Summary. Is it still open, and could I get pointers on how to begin working on it?
Thank you.

@y3sar
Contributor

y3sar commented May 10, 2023

@elabongaatuo It seems the "Add tutorial documentation to Task Summary" task is still open. Are you working on it? It seems the changes need to start from here

@elabongaatuo
Contributor

Hello @y3sar, no, I am not working on it at the moment.

@y3sar
Contributor

y3sar commented May 10, 2023

@elabongaatuo Then I would like to take it up, if that is no problem for you.


@elabongaatuo
Contributor


@y3sar, sure thing. 😊 No problem.

@rajveer43
Contributor

rajveer43 commented Jul 26, 2023

@ankrgyl I would like to work on "Add tutorial documentation to Task Summary" and also on "Add support for multi-page documents (e.g. for Donut, we need to present one image per page)".

@hackpk
Contributor

hackpk commented Aug 9, 2023

@ankrgyl Can I work on "Refactor Donut usage"?

@dhivyeshrk

Hey @ankrgyl! I would be happy to contribute to this issue by adding support for multi-page documents.
Could you assign this to me?

@ArthurZucker
Collaborator

Hey! For anyone wanting to contribute, the best way is to just open a PR and link it here! We don't usually assign issues as they can be taken over in case of inactivity for example! 🤗

10 participants