-
Using the example above:
In [1]: import fitz
In [2]: doc=fitz.open("Table1.pdf")
In [3]: page=doc[0]
In [4]: tabs = page.find_tables() # detect the tables
In [5]: len(tabs.tables)
Out[5]: 1
In [6]: tab = tabs[0]
In [7]: for e in tab.extract():
   ...:     print(e)
   ...:
['大撒大撒 1', 'we are1', '大丈夫です 1', '큰 스프레드 1', '特色 1']
['大撒大撒 2', 'we are2', '大丈夫です 2', None, '特色 2']
['大撒大撒 3', 'we are3', '大丈夫です 3', '큰 스프레드 3', '特色 3']
['大撒大撒 4', 'we are4', '大丈夫です 4', '큰 스프레드 4', '特色 4']
['大撒大撒 5', 'we are5', '大丈夫です 5', '큰 스프레드 5', None]
['大撒大撒 6', 'we are6', '大丈夫です 6', '큰 스프레드 6', '特色 6']
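If you want the structure as HTML rather than nested lists, here is a minimal sketch of one way to do it. It assumes the rows come from `tab.extract()` as shown above (lists of strings, with `None` for empty cells). The helper names `rows_to_html` and `pdf_tables_to_html` are made up for illustration and are not part of PyMuPDF.

```python
from html import escape


def rows_to_html(rows):
    """Render rows (lists of cell values, None for empty cells) as an HTML table."""
    lines = ["<table>"]
    for row in rows:
        # Empty cells (None) become empty <td> elements; text is HTML-escaped.
        cells = "".join(
            f"<td>{'' if cell is None else escape(str(cell))}</td>" for cell in row
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def pdf_tables_to_html(path, page_number=0):
    """Hypothetical wrapper: one HTML string per table detected on the page."""
    import fitz  # PyMuPDF, imported here so rows_to_html works without it

    with fitz.open(path) as doc:
        page = doc[page_number]
        return [rows_to_html(tab.extract()) for tab in page.find_tables().tables]
```

With the file from this thread, `pdf_tables_to_html("Table1.pdf")` should give a list with one HTML string, since one table was detected on the first page. Merged cells and header rows are not handled here; every cell is written as a plain `<td>`.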
-
Closing this because, on Discord, it was found to be a problem with an old release of PyMuPDF.
-
When extracting text line by line, how can I also extract table structure and content?
Table1.pdf
For example, I would like to get the table structure as something like HTML code.