fix: Illegal seek on macOS #28

Closed
metaist opened this issue Jul 13, 2023 · 1 comment

metaist commented Jul 13, 2023

OS: macOS 13.4.1 (22F82)
Python: 3.11

When running pdfmerge on python 3.11, the following error occurs:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/pdfmerge/__init__.py", line 282, in pdfmerge
    with PdfWriter(stream) as writer:
  File "/opt/homebrew/lib/python3.11/site-packages/pypdf/_writer.py", line 227, in __exit__
    self.write(self.fileobj)
  File "/opt/homebrew/lib/python3.11/site-packages/pypdf/_writer.py", line 1173, in write
    self.write_stream(stream)
  File "/opt/homebrew/lib/python3.11/site-packages/pypdf/_writer.py", line 1146, in write_stream
    object_positions = self._write_pdf_structure(stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pypdf/_writer.py", line 1200, in _write_pdf_structure
    obj.write_to_stream(stream, key)
  File "/opt/homebrew/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 797, in write_to_stream
    stream.write(data)
OSError: [Errno 29] Illegal seek

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/pdfmerge", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pdfmerge/__main__.py", line 83, in main
    pdfmerge(
  File "/opt/homebrew/lib/python3.11/site-packages/pdfmerge/__init__.py", line 281, in pdfmerge
    with Path(output).open("wb") as stream:
OSError: [Errno 29] Illegal seek

Looks like something is wrong with the way the stream is constructed and passed to pypdf.

metaist commented Jul 24, 2023

Using ChatGPT 4.0 / Code Interpreter

ChatGPT: The error message indicates that there is an issue with the file object being used. The OSError: [Errno 29] Illegal seek typically occurs when a seek operation is attempted on a file object that doesn't support seeking, such as a pipe, a socket, or a network stream.
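To illustrate where this error number comes from: on POSIX systems such as macOS and Linux, errno 29 is ESPIPE, and the operating system reports it when a seek is attempted on a pipe. The following is a minimal sketch of that behavior, not a reproduction of the pdfmerge failure itself; pipe_seek_errno is a hypothetical helper name.

```python
import errno
import os


def pipe_seek_errno():
    """Try to seek on the write end of a pipe and return the errno raised."""
    read_fd, write_fd = os.pipe()
    try:
        # Pipes have no file position, so lseek() fails on them.
        os.lseek(write_fd, 0, os.SEEK_CUR)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None


# On POSIX systems this compares equal to errno.ESPIPE ("Illegal seek").
print(pipe_seek_errno() == errno.ESPIPE)
```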

The pdfmerge function opens the output file in write mode (with Path(output).open("wb") as stream:) and passes that stream to pypdf, which later tries to seek in it.

One possible solution to this problem is to make sure that the file object supports seeking. Whether a stream is seekable depends on what the path points to rather than on the open mode, so this usually means using a different kind of file object that supports seeking, such as a local temporary file or an in-memory buffer.
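One way to get a file object that always supports seeking is to build the output in memory first. The sketch below assumes the whole output fits in RAM; write_via_memory and its produce parameter are hypothetical names, where produce stands in for whatever code fills the stream (for example, the step that writes the merged PDF).

```python
import io
from pathlib import Path


def write_via_memory(produce, output):
    """Run produce(stream) on an in-memory buffer, then save it to output.

    produce is a hypothetical callable that writes to the stream it is
    given and may seek freely while doing so.
    """
    buffer = io.BytesIO()  # seekable regardless of where output lives
    produce(buffer)
    # The destination only receives one forward, sequential write.
    Path(output).write_bytes(buffer.getvalue())
```

The trade-off is memory use: the entire output is held in the buffer until it is written out.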

However, this is more likely a bug in the pdfmerge or pypdf library. If you can't find a workaround, it may be worth reporting this issue to the maintainers of the library.

Also, as Python 3.11 is relatively new at the time of your issue, it could be that the library has not yet been fully updated to support this version of Python. You might want to try using an older version of Python to see if the issue persists.

Lastly, it's worth noting that the Python standard library has a shutil module which includes a copyfileobj function that can be used to copy the content of one file object to another. It copies bytes verbatim, so it cannot merge PDF files by itself, but it can be useful for copying a finished PDF from one file object to another.


metaist: I'm the author of pdfmerge and I'm trying to fix this error. The comment about a network stream seems interesting because this seems to happen when merging PDFs that are mounted using rclone. First, how can I detect if a stream is a network stream? Second, what are workarounds for dealing with a network stream? Should I just make a temp file and then move it when I'm done?


ChatGPT: Detecting whether a file object is a network stream can be challenging because Python's file objects don't have a built-in way to check this. What you can check is whether the stream supports seeking: attempt a seek() operation and catch the OSError if it is raised. Here's an example of how you could do this:

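def is_seekable(fileobj):
    """Return True if fileobj supports seeking, False otherwise."""
    try:
        # Seek to the current position: a no-op on seekable streams.
        fileobj.seek(fileobj.tell())
        return True
    except OSError:
        # Raised by non-seekable streams such as pipes or sockets.
        return False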

In this function, we're checking whether we can seek to the current position in the file (which should always be a no-op if seeking is supported). If this raises an OSError, we know that seeking is not supported, and we return False.

For dealing with network streams, writing to a temporary file is a common workaround. In fact, Python has a built-in library for creating temporary files and directories that you could use: tempfile. Here's an example of how you could do this:

import shutil
import tempfile

def write_to_tempfile(fileobj):
    # delete=False keeps the file on disk after the with block exits.
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        shutil.copyfileobj(fileobj, temp)
        return temp.name

In this function, we're creating a temporary file, copying the contents of fileobj into the temporary file, and then returning the name of the temporary file.

Once you have the temporary file, you can perform whatever operations you need on it (like seeking), and then move it to its final location when you're done. If you copy the file instead of moving it, or if an error occurs before the move, be sure to delete the temporary file to avoid wasting disk space. Python's os.remove() function can be used to delete files.
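The temp-file-then-move approach could be sketched roughly as follows. This is an illustration under assumptions, not pdfmerge's actual fix: write_then_move and produce are hypothetical names, and produce stands in for the code that writes the merged PDF to the stream it receives.

```python
import os
import shutil
import tempfile


def write_then_move(produce, output):
    """Write to a local temporary file, then move it to output."""
    # mkstemp creates the file in the local temp directory, which is
    # assumed here to be on a filesystem that supports seeking.
    fd, temp_path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as stream:
            produce(stream)  # free to seek within the local file
        # shutil.move falls back to copy-and-delete across filesystems.
        shutil.move(temp_path, os.fspath(output))
    finally:
        # Clean up if produce() or the move failed before completing.
        if os.path.exists(temp_path):
            os.remove(temp_path)
```

Compared with buffering in memory, this keeps memory use low for large outputs at the cost of an extra copy when the destination is on a different filesystem.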

Of course, this is just one possible solution. Depending on your specific needs and the characteristics of your network stream, there might be other, more efficient ways to handle this situation.
