
Prevent Skipping? #67

Closed
Ryandonofrio3 opened this issue Sep 14, 2023 · 6 comments

@Ryandonofrio3

Hello. I want to process every page of my PDF, but nougat skips about 50% of the pages due to "repetitions". I have checked manually and the pages are not repeats. For instance, for a 6-page PDF of an academic text, only the methods section was output. Is there a way to disable this and "force" the entire output?

(.venv) PS C:\Users\--\Desktop\Nougat> nougat .\t3.pdf -o .\output\
c:\Users\---\Desktop\Nougat\.venv\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
  0%| | 0/2 [00:00<?, ?it/s]WARNING:root:Found repetitions in sample 0
INFO:root:Processing file t3.pdf with 6 pages
WARNING:root:Skipping page 1 due to repetitions.
 50%|███████████████████████████████████████████████████████▌ | 1/2 [00:21<00:21, 21.34s/it]WARNING:root:Found repetitions in sample 1
WARNING:root:Skipping page 5 due to repetitions.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:32<00:00, 16.44s/it]
(.venv) PS C:\Users\---\Desktop\Nougat>
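
To see which pages were dropped in a log like the one above, the warnings can be scanned with a short script. This is only a minimal sketch: it assumes the warning text keeps the form "Skipping page N due to repetitions.", and `skipped_pages` is a hypothetical helper, not part of nougat.

```python
import re

# Matches the warning lines seen in the log above.
# Assumption: the message format is stable across nougat versions.
SKIP_PATTERN = re.compile(r"Skipping page (\d+) due to repetitions")


def skipped_pages(log_text):
    """Return the page numbers reported as skipped, in log order."""
    return [int(n) for n in SKIP_PATTERN.findall(log_text)]


# Sample lines copied from the log in this issue.
sample_log = (
    "WARNING:root:Found repetitions in sample 0\n"
    "INFO:root:Processing file t3.pdf with 6 pages\n"
    "WARNING:root:Skipping page 1 due to repetitions.\n"
    "WARNING:root:Found repetitions in sample 1\n"
    "WARNING:root:Skipping page 5 due to repetitions.\n"
)

print(skipped_pages(sample_log))  # [1, 5]
```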

@lukas-blecher
Contributor

Ok makes sense. I'll add support next week

@LHFO94

LHFO94 commented Sep 15, 2023

That would be great, I am running into the same issue as well

@marwinsteiner

lukas-blecher wrote: "Ok makes sense. I'll add support next week"

That would be very useful!!

@lukas-blecher
Contributor

Done in 8ad92cc
Will update pypi shortly

@jordantgh

lukas-blecher wrote: "Done in 8ad92cc. Will update pypi shortly"

@lukas-blecher How do you set this behaviour? I'm on the latest commit. That commit is just a moved line AFAICT.

@lukas-blecher
Contributor

OK, the commit is weird. Add --no-skipping when calling nougat.
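
For anyone calling nougat from a script, the invocation from the original report with the flag appended might look like the sketch below. The `--no-skipping` flag and the `-o` option come from this thread. The helper names `build_nougat_command` and `run_nougat` are hypothetical, and the sketch assumes the `nougat` executable is on the PATH.

```python
import subprocess


def build_nougat_command(pdf, out_dir, no_skipping=True):
    """Build the argument list for the nougat CLI (hypothetical helper)."""
    cmd = ["nougat", str(pdf), "-o", str(out_dir)]
    if no_skipping:
        # Flag named by the maintainer in this thread; it turns off
        # the skipping of pages flagged for repetitions.
        cmd.append("--no-skipping")
    return cmd


def run_nougat(pdf, out_dir, no_skipping=True):
    """Run nougat and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(
        build_nougat_command(pdf, out_dir, no_skipping), check=True
    )


# Print the command line instead of running it.
print(" ".join(build_nougat_command("t3.pdf", "output")))
```

On the command line this corresponds to appending `--no-skipping` to the call in the original report. `run_nougat("t3.pdf", "output")` would execute it when nougat is installed.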
