Refextract fails to extract from two-columned layout pdf #85
Comments
I don't think the issue is related to the layout. A two-column layout should usually work just fine. Refextract is not meant to be a general-purpose reference extraction tool; it has been tuned to work well for High-Energy Physics and related fields. If citation styles are very different, it will run into trouble. In this case, I believe it's due to what the heading is called; see refextract/refextract/references/regexs.py, lines 696 to 710 in 24418cd.
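To illustrate why a heading name alone could produce an empty result, here is a minimal sketch of heading-based section detection. It is not refextract's code: the patterns, the function names `find_reference_heading` and `extract_reference_lines`, and the sample lines are all hypothetical, and the real patterns live in the file referenced above.

```python
import re

# Hypothetical heading patterns for illustration only; these are NOT the
# patterns refextract actually uses.
HEADING_PATTERNS = [
    re.compile(r"^\s*(\d+\.?\s+)?references\s*$", re.IGNORECASE),
    re.compile(r"^\s*(\d+\.?\s+)?bibliography\s*$", re.IGNORECASE),
]


def find_reference_heading(lines):
    """Return the index of the first line that matches a known heading, or None."""
    for index, line in enumerate(lines):
        if any(pattern.match(line) for pattern in HEADING_PATTERNS):
            return index
    return None


def extract_reference_lines(lines):
    """Return the non-empty lines after the heading, or [] if no heading matches."""
    start = find_reference_heading(lines)
    if start is None:
        # An unrecognized heading means nothing is extracted,
        # regardless of how many columns the page has.
        return []
    return [line for line in lines[start + 1:] if line.strip()]
```

Under this assumption, a document whose reference section is titled differently from every known pattern yields an empty list, whatever its page layout.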
Thanks for the prompt and detailed response. It also answers the other doubt I had. :)
When the input PDF has a two-column layout, refextract outputs an empty array of references.
When the input PDF has a one-column layout, refextract works fine.
How can I get refextract to parse both types of layout?
Thank you.