pdfdevice.TagExtractor: fix TypeError in render_string #665

Closed
0xabu wants to merge 1 commit

0xabu commented Sep 2, 2021

Fixes:

$ tools/pdf2txt.py --output_type tag samples/simple1.pdf
<page id="0" bbox="0.000,0.000,612.000,792.000" rotate="0">Traceback (most recent call last):
  File "tools/pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "tools/pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "tools/pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/mnt/c/src/pdfminer.six/pdfminer/high_level.py", line 85, in extract_text_to_fp
    interpreter.process_page(page)
  File "/mnt/c/src/pdfminer.six/pdfminer/pdfinterp.py", line 925, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/mnt/c/src/pdfminer.six/pdfminer/pdfinterp.py", line 939, in render_contents
    self.execute(list_value(streams))
  File "/mnt/c/src/pdfminer.six/pdfminer/pdfinterp.py", line 964, in execute
    func(*args)
  File "/mnt/c/src/pdfminer.six/pdfminer/pdfinterp.py", line 832, in do_TJ
    self.device.render_string(self.textstate, seq, self.ncs,
  File "/mnt/c/src/pdfminer.six/pdfminer/pdfdevice.py", line 191, in render_string
    self.outfp.write(utils.enc(text))
TypeError: a bytes-like object is required, not 'str'

Found via mypy (PR #661)
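
For readers unfamiliar with this class of error, here is a minimal sketch of the mismatch the traceback points at: a `str` written to a stream opened in binary mode. The helper `enc` below is a hypothetical stand-in for `utils.enc` (assumed here to escape markup and return `str`), and `render_broken` / `render_fixed` are illustrative names, not pdfminer.six functions. The actual fix (see #610) may differ in detail.

```python
import io
from html import escape
from typing import BinaryIO


def enc(text: str) -> str:
    """Hypothetical stand-in for utils.enc: escapes markup, returns str."""
    return escape(text)


def render_broken(outfp: BinaryIO, text: str) -> None:
    # Passing str to a binary stream: a type checker such as mypy
    # reports this, and at runtime it raises TypeError.
    outfp.write(enc(text))  # type: ignore[arg-type]


def render_fixed(outfp: BinaryIO, text: str, codec: str = "utf-8") -> None:
    # Encode the escaped text to bytes before writing.
    outfp.write(enc(text).encode(codec))


buf = io.BytesIO()

try:
    render_broken(buf, "Hello <World>")
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)

render_fixed(buf, "Hello <World>")
result = buf.getvalue()
```

Annotating the output stream as `BinaryIO` is what lets a static checker catch the unencoded write before it fails at runtime.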

0xabu commented Sep 2, 2021

This was already fixed via #610

0xabu closed this Sep 2, 2021
0xabu deleted the TagExtractor_bytes_not_str branch September 2, 2021 03:48