Not able to detect tables from ocr to text converted pdf #988

Closed Answered by samkit-jain
kathimohan asked this question in Q&A
Thanks for the PDF. For this case, the best approach is to use explicit vertical lines. The table settings you can use are:

{
    "vertical_strategy": "explicit",
    "horizontal_strategy": "text",
    "snap_tolerance": 5,
    "explicit_vertical_lines": [87, 125, 335, 402, 525]
}
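
As a minimal sketch of how these settings might be applied, the snippet below passes them to pdfplumber's `page.extract_table(...)`. The function name `extract_first_table`, the file path, and the page index are assumptions made for illustration; the original PDF is not reproduced here. The x-coordinates in `explicit_vertical_lines` are specific to that PDF and would need adjusting for any other document.

```python
# Table settings copied from the answer above.
TABLE_SETTINGS = {
    "vertical_strategy": "explicit",
    "horizontal_strategy": "text",
    "snap_tolerance": 5,
    "explicit_vertical_lines": [87, 125, 335, 402, 525],
}


def extract_first_table(pdf_path, page_index=0, settings=TABLE_SETTINGS):
    """Return the table found on one page as a list of rows (hypothetical helper)."""
    # Imported here so the settings can be inspected without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_table(table_settings=settings)


# Example usage (the path is a placeholder, not the file from this discussion):
# for row in extract_first_table("calendar.pdf") or []:
#     print(row)
```

If the column boundaries look off, drawing the detected lines with `page.to_image().debug_tablefinder(TABLE_SETTINGS)` is one way to check where the explicit lines fall on the page.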

The result will be

['JA', 'WAHARLAL NEHRU TECHNOLOGICAL', 'UNIVERSIT', 'Y HYDERABAD']
['', '', '', '']
['', 'Academic Calendar 2', '021-22', '']
['', '', '', '']
['', 'B. TECH./B.PHARM. Il & IV YEAR', 'S I & I! SEM', 'ESTERS']
['', '', '', '']
['I SEM', '', '', '']
['', '', '', '']
['', 'sue', '', 'Duration']
['S. No', 'Description', 'From', '__| To']
['', '', '', '']
['1', 'Commencement of I Semester cla…
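
The split cells in the output (for example `'JA'` followed by `'WAHARLAL NEHRU TECHNOLOGICAL'`) are consistent with text being assigned to columns purely by horizontal position relative to the explicit lines. The sketch below illustrates that idea with the standard library only. It is a simplified model, not pdfplumber's actual implementation, and the character coordinates are made up for the example.

```python
from bisect import bisect_right

# Column boundaries from the settings above: 5 lines define 4 columns.
EXPLICIT_VERTICAL_LINES = [87, 125, 335, 402, 525]


def split_into_columns(chars, lines):
    """Assign each (text, x0, x1) character to the column containing its midpoint."""
    columns = [""] * (len(lines) - 1)
    for text, x0, x1 in chars:
        midpoint = (x0 + x1) / 2
        index = bisect_right(lines, midpoint) - 1
        # Characters left of the first line or right of the last are dropped.
        if 0 <= index < len(columns):
            columns[index] += text
    return columns


# Hypothetical layout: each letter is 10 units wide, starting at x = 105,
# so the word straddles the boundary at x = 125.
word = "JAWAHARLAL"
chars = [(letter, 105 + 10 * i, 115 + 10 * i) for i, letter in enumerate(word)]

print(split_into_columns(chars, EXPLICIT_VERTICAL_LINES))
# ['JA', 'WAHARLAL', '', '']
```

Under this model, moving the first boundary so that it does not cut through a word would keep the heading in a single cell, though the table body may then need different boundaries than the heading.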
