Error #636

Closed
Puneet0353 opened this issue Apr 5, 2022 · 2 comments
@Puneet0353

Describe the bug

A clear and concise description of what the bug is.
ValueError: not enough values to unpack (expected 2, got 1)

The complete details of the error are:

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14976/3193579313.py in <module>
5 #Get Basic Data and convert them to Dictionary
6 page = pdf.pages[1]
----> 7 Page1_Tables = page.extract_tables()
8 input(Page1_Tables)
9 B1 = pd.DataFrame(Page1_Tables[0])

~\anaconda3\lib\site-packages\pdfplumber\page.py in extract_tables(self, table_settings)
223 def extract_tables(self, table_settings={}):
224 table_settings = TableFinder.resolve_table_settings(table_settings)
--> 225 tables = self.find_tables(table_settings)
226
227 extract_kwargs = dict(

~\anaconda3\lib\site-packages\pdfplumber\page.py in find_tables(self, table_settings)
219
220 def find_tables(self, table_settings={}):
--> 221 return TableFinder(self, table_settings).tables
222
223 def extract_tables(self, table_settings={}):

~\anaconda3\lib\site-packages\pdfplumber\table.py in __init__(self, page, settings)
472 self.page = page
473 self.settings = self.resolve_table_settings(settings)
--> 474 self.edges = self.get_edges()
475 self.intersections = edges_to_intersections(
476 self.edges,

~\anaconda3\lib\site-packages\pdfplumber\table.py in get_edges(self)
568
569 if v_strat == "lines":
--> 570 v_base = utils.filter_edges(self.page.edges, "v")
571 elif v_strat == "lines_strict":
572 v_base = utils.filter_edges(self.page.edges, "v", edge_type="line")

~\anaconda3\lib\site-packages\pdfplumber\container.py in edges(self)
77 if hasattr(self, "_edges"):
78 return self._edges
---> 79 line_edges = list(map(utils.line_to_edge, self.lines))
80 self._edges = self.rect_edges + line_edges
81 return self._edges

~\anaconda3\lib\site-packages\pdfplumber\container.py in lines(self)
35 @property
36 def lines(self):
---> 37 return self.objects.get("line", [])
38
39 @property

~\anaconda3\lib\site-packages\pdfplumber\page.py in objects(self)
150 if hasattr(self, "_objects"):
151 return self._objects
--> 152 self._objects = self.parse_objects()
153 return self._objects
154

~\anaconda3\lib\site-packages\pdfplumber\page.py in parse_objects(self)
206 def parse_objects(self):
207 objects = {}
--> 208 for obj in self.iter_layout_objects(self.layout._objs):
209 kind = obj["object_type"]
210 if kind in ["anno"]:

~\anaconda3\lib\site-packages\pdfplumber\page.py in layout(self)
96 )
97 interpreter = PDFPageInterpreter(self.pdf.rsrcmgr, device)
---> 98 interpreter.process_page(self.page_obj)
99 self._layout = device.get_result()
100 return self._layout

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in process_page(self, page)
1003 ctm = (1, 0, 0, 1, -x0, -y0)
1004 self.device.begin_page(page, ctm)
-> 1005 self.render_contents(page.resources, page.contents, ctm=ctm)
1006 self.device.end_page(page)
1007 return

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in render_contents(self, resources, streams, ctm)
1021 self.init_resources(resources)
1022 self.init_state(ctm)
-> 1023 self.execute(list_value(streams))
1024 return
1025

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in execute(self, streams)
1049 else:
1050 log.debug('exec: %s', name)
-> 1051 func()
1052 else:
1053 if settings.STRICT:

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in do_s(self)
584 """Close and stroke path"""
585 self.do_h()
--> 586 self.do_S()
587 return
588

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in do_S(self)
576 def do_S(self) -> None:
577 """Stroke path"""
--> 578 self.device.paint_path(self.graphicstate, True, False, False,
579 self.curpath)
580 self.curpath = []

~\anaconda3\lib\site-packages\pdfminer\converter.py in paint_path(self, gstate, stroke, fill, evenodd, path)
119 raw_pts = [cast(Point, p[-2:] if p[0] != 'h' else path[0][-2:])
120 for p in path]
--> 121 pts = [apply_matrix_pt(self.ctm, pt) for pt in raw_pts]
122
123 if shape in {'mlh', 'ml'}:

~\anaconda3\lib\site-packages\pdfminer\converter.py in <listcomp>(.0)
119 raw_pts = [cast(Point, p[-2:] if p[0] != 'h' else path[0][-2:])
120 for p in path]
--> 121 pts = [apply_matrix_pt(self.ctm, pt) for pt in raw_pts]
122
123 if shape in {'mlh', 'ml'}:

~\anaconda3\lib\site-packages\pdfminer\utils.py in apply_matrix_pt(m, v)
251 def apply_matrix_pt(m: Matrix, v: Point) -> Point:
252 (a, b, c, d, e, f) = m
--> 253 (x, y) = v
254 """Applies a matrix to a point."""
255 return a * x + c * y + e, b * x + d * y + f

ValueError: not enough values to unpack (expected 2, got 1)
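
The last frame of the traceback can be reproduced in isolation. The sketch below copies the body of apply_matrix_pt as it appears in the traceback and calls it with a one-element tuple. That tuple is a hypothetical stand-in: the actual value pdfminer.six derived from this PDF is not shown in the report.

```python
def apply_matrix_pt(m, v):
    """Apply a 6-element affine matrix to a point (body as shown in the traceback)."""
    (a, b, c, d, e, f) = m
    (x, y) = v  # fails if v does not hold exactly two values
    return a * x + c * y + e, b * x + d * y + f


identity = (1, 0, 0, 1, 0, 0)

# A well-formed (x, y) point passes through unchanged under the identity matrix.
ok_point = apply_matrix_pt(identity, (3, 4))

# Hypothetical malformed "point" with a single element, standing in for
# whatever the unusual path in the PDF produced.
try:
    apply_matrix_pt(identity, ("h",))
except ValueError as exc:
    message = str(exc)
    print(message)
```

This only shows that any path element yielding fewer than two coordinates triggers the same message. It does not identify which graphics command in the PDF is responsible.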

Code to reproduce the problem

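import pdfplumber

# Name must be defined beforehand; its value is not shown in the report.
FILE = "D:\\Astro\\Charts\\" + Name + ".pdf"  # backslashes escaped so the string literal is valid
pdf = pdfplumber.open(FILE)
# Get basic data and convert it to a dictionary
page = pdf.pages[1]  # second page (pages are zero-indexed)
Page1_Tables = page.extract_tables()  # raises the ValueError shown above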
Paste it here, or attach a Python file.

PDF file

Please attach any PDFs necessary to reproduce the problem.

If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.

Expected behavior

What did you expect the result should have been?
It should have extracted the tables. It was working fine before; however, after I reinstalled Anaconda with Python 3.9, this problem started occurring.

Actual behavior

What actually happened, instead?

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

  • pdfplumber version: [e.g., 0.5.22]
  • Python version: [e.g., 3.8.1]
  • OS: [e.g., Mac, Linux, etc.]

Additional context

Ajay VR Detailed.pdf

Add any other context/notes about the problem here.

Puneet0353 added the bug label Apr 5, 2022
@jsvine
Owner

jsvine commented Apr 11, 2022

Hi @Puneet0353, and thanks for sharing this interesting example. I have examined the file and the error, and have come to the following conclusions:

  • Per the traceback you've pasted above (and which I've confirmed), the error is raised by pdfminer.six, the library we use to extract the raw object information from the PDFs. So this isn't an issue that can be resolved directly through pdfplumber.

  • pdfminer.six appears to raise the error due to an unusual graphics command in the PDF. I'm not entirely sure whether the PDF is malformed or whether it's just unusual. In either case, the PDF appears to parse cleanly if you first repair it with GhostScript:

 gs \
  -o "Ajay VR Detailed-repaired.pdf" \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  "Ajay VR Detailed.pdf"

I hope that helps. In the meantime, I plan to investigate whether there's a way to improve pdfminer.six's handling of the graphics command in your PDF, and will submit a PR on that repository if I find a solution.
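
For anyone who wants to script the repair step rather than run it by hand, here is a minimal sketch that wraps the same Ghostscript invocation in Python. The helper names build_gs_command and repair_pdf are hypothetical, and it assumes the Ghostscript executable is on PATH as "gs" (on Windows the console binary is usually named differently, so pass its name via gs_binary).

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src, dst, gs_binary="gs"):
    """Build the argument list for the Ghostscript repair command shown above."""
    return [
        gs_binary,
        "-o", str(dst),              # output file
        "-sDEVICE=pdfwrite",         # rewrite the input as a fresh PDF
        "-dPDFSETTINGS=/prepress",
        str(src),                    # input file
    ]


def repair_pdf(src, dst, gs_binary="gs"):
    """Run Ghostscript to write a repaired copy of src to dst and return its path."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_binary}")
    # check=True raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(build_gs_command(src, dst, gs_binary), check=True)
    return Path(dst)


command = build_gs_command("Ajay VR Detailed.pdf", "Ajay VR Detailed-repaired.pdf")
print(command)
```

After calling repair_pdf, the repaired file can be opened with pdfplumber.open() and page.extract_tables() called as in the original report.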

jsvine closed this as completed Apr 11, 2022
@Puneet0353
Author

Puneet0353 commented Apr 11, 2022 via email
