heading_search is reporting the incorrect line_num #27

Open
mdietz3 opened this issue Feb 9, 2022 · 2 comments

Comments


mdietz3 commented Feb 9, 2022

I have tested this with multiple PDFs that were loaded into R as character vectors. One PDF in particular (as a character vector) has a "CONTENTS" page on page 6. When previewing the text with head(text), the 6th element (the 6th page of the text) is the contents page. Searching for it with

heading_search(text, "CONTENTS")

returns

keyword   page_num
CONTENTS  7

I also tried the function directly on the source PDF, and the same result occurs.

lebebr01 self-assigned this Feb 11, 2022
lebebr01 added the bug label Feb 11, 2022
lebebr01 added this to the v0.4 milestone Feb 11, 2022

lebebr01 (Owner) commented

Thanks for submitting this. It is a holdover from an earlier modification to the code. I'll fix it in the dev version soon.


mdietz3 commented Feb 11, 2022

@lebebr01 great, thanks for fixing it. To add some context, the issue seems to be related to a blank page.
The blank page shows as "" when looking at the document with head(document) in R. In the document with the issue, the first 3 pages have text, the 4th is blank, and the next 2 have text (the 6th page is the table of contents). heading_search reports the correct page for headings that appear before the blank page. Removing only the blank page does not fix the error, but removing all pages up to and including the blank page does. I suspect the blank page is being counted twice or otherwise shifts the page numbering.
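
A minimal sketch of the situation described above may help with reproducing it. It uses base R only, and the six-page `document` vector is made up for illustration (it is not the real PDF text). It computes the page number one would expect for "CONTENTS" in each of the three variants mentioned, so the values can be compared against what heading_search returns. The heading_search call is left commented out, and its result is not asserted here.

```r
# Hypothetical six-page document: page 4 is blank, page 6 is the contents page.
document <- c(
  "Title page",
  "Preface text",
  "Acknowledgements text",
  "",
  "Foreword text",
  "CONTENTS\nChapter 1\nChapter 2"
)

# Expected page number, computed independently of pdfsearch:
# the index of the element that contains the heading.
expected_page <- which(grepl("CONTENTS", document, fixed = TRUE))

# Variant 1: only the blank page removed.
without_blank <- document[document != ""]
expected_without_blank <- which(grepl("CONTENTS", without_blank, fixed = TRUE))

# Variant 2: every page up to and including the blank page removed.
first_blank <- which(document == "")[1]
after_blank <- document[-seq_len(first_blank)]
expected_after_blank <- which(grepl("CONTENTS", after_blank, fixed = TRUE))

print(c(full = expected_page,
        without_blank = expected_without_blank,
        after_blank = expected_after_blank))

# For comparison (requires the pdfsearch package; not run here):
# pdfsearch::heading_search(document, "CONTENTS")
```

If heading_search disagrees with `expected_page` on the full vector but agrees with `expected_after_blank` on the truncated one, that would be consistent with the blank page shifting the count.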
