SplFileObject in spatie/fork #14587

Closed
remco-pc opened this issue Jun 16, 2024 · 5 comments

@remco-pc

Description

Doing the following:

Opening file pointers with SplFileObject in the main thread and then using them multithreaded, via the spatie/fork package, in a child process causes current() to return an empty string after seek(). This leads to some records going missing in my framework r3m_io/framework with the package r3m_io/node.

I have moved the initialization of the file pointers into the child process, and now I don't have missing records.

Probably because the processes used to seek simultaneously to different lines on the same object.

Should SplFileObject be usable simultaneously, i.e. initialized once and then used in parallel in child processes?

PHP Version

8.3.6

Operating System

Debian 12

@Girgias
Member

Girgias commented Jun 16, 2024

SplFileObject is not designed to work in a parallel environment, and there is nothing that can be done about that. You cannot just start multithreading things without careful consideration.

@remco-pc
Author

remco-pc commented Jun 17, 2024

I am careful; I only use it for read operations. Should there be a note in the documentation about best practices?

@Girgias Also, do you know a safe way to do this in PHP?

@Girgias
Member

Girgias commented Jun 17, 2024

> I am careful; I only use it for read operations. Should there be a note in the documentation about best practices?

> @Girgias Also, do you know a safe way to do this in PHP?

You are sharing an object that holds a pointer to a FILE across multiple threads; even for reading, this is not parallel-safe. You don't have two independent pointers that can each move at their own pace.

"Reading" something does not automatically make it safe for parallelization; see this Stack Overflow answer: https://stackoverflow.com/a/25411265

If you want to process the contents of a file in parallel, you need to load the contents into memory and split them into chunks to dispatch to the various threads.
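A minimal sketch of that approach, assuming the file fits in memory. It is written in Python only because its os.fork() is a thin wrapper over the same fork() call that PHP's pcntl_fork() makes, and it uses processes rather than threads to match spatie/fork. The function names and the per-record work (upper-casing) are hypothetical placeholders.

```python
import json
import os


def split_into_chunks(items: list, n: int) -> list:
    """Split `items` into at most `n` contiguous chunks of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_file_in_parallel(path: str, workers: int) -> list:
    """Load every record in the parent, then hand each forked child a chunk."""
    # Read everything up front and close the file: no handle is shared after fork().
    with open(path, encoding="utf-8") as f:
        records = f.read().splitlines()

    children = []
    for chunk in split_into_chunks(records, workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: works on its own (copy-on-write) copy of `chunk`.
            os.close(r)
            try:
                result = [rec.upper() for rec in chunk]  # hypothetical per-record work
                with os.fdopen(w, "w", encoding="utf-8") as out:
                    json.dump(result, out)
            finally:
                os._exit(0)
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))

    results = []
    for pid, r in children:  # collect in chunk order so record order is preserved
        with os.fdopen(r, encoding="utf-8") as inp:
            results.extend(json.load(inp))
        os.waitpid(pid, 0)
    return results
```

Because the children only ever see in-memory data, there is no file offset left for them to fight over; results come back through one pipe per child.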

@realFlowControl
Contributor

realFlowControl commented Jun 17, 2024

Hey @remco-pc 👋

A quick look reveals that spatie/fork does not use multithreading; it uses pcntl_fork() to create child processes. That is something different from creating threads.

After a fork(), the parent's and the child's file descriptors refer to the same open file description, so all processes share it. This includes all of its attributes, for example the file offset. So when one process moves the offset, it moves for all other processes that share that descriptor as well, which causes your unexpected behaviour.
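The shared offset is easy to observe outside PHP. Below is a minimal sketch in Python, chosen only because its os module maps directly onto the POSIX calls; the function name is hypothetical. The file is opened once before fork(), only the child reads, and the parent then finds that its own offset has moved. The sketch does not model any extra buffering that PHP's stream layer adds on top.

```python
import os
import tempfile


def offset_seen_by_parent_after_child_read(path: str, nbytes: int) -> int:
    """Open `path` once, fork, let only the child read `nbytes`, then return
    the parent's current file offset."""
    fd = os.open(path, os.O_RDONLY)  # opened BEFORE fork(): one shared open file description
    try:
        pid = os.fork()
        if pid == 0:
            # Child: read (which advances the shared offset), then exit immediately.
            try:
                os.read(fd, nbytes)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)  # parent: wait until the child has finished reading
        # The parent never called read(), yet its offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("line 1\nline 2\nline 3\n")
    print(offset_seen_by_parent_after_child_read(tmp.name, 7))  # prints 7
    os.unlink(tmp.name)
```

If the file were instead opened inside the child after os.fork(), the child would get its own open file description and the parent's offset would stay at 0.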

I see two options for you moving forward:

  • As @Girgias mentioned: read in the data and assign chunks to the processes.
  • Analyse the file to determine chunk boundaries, then open the file in each process after fork()-ing and pass each process its chunk start and end.
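A minimal sketch of the second option, again in Python for brevity; the helper names and the counting work are hypothetical. The parent only computes line-aligned byte ranges and closes the file before forking. Each child then opens the file itself, so every child has a private offset and no seek in one process can disturb another.

```python
import os


def line_aligned_boundaries(path: str, parts: int) -> list:
    """Split the file into at most `parts` (start, end) byte ranges, each starting at a line start."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:  # opened and closed in the parent, before any fork()
        for i in range(1, parts):
            f.seek(max(size * i // parts, cuts[-1]))
            f.readline()  # move forward to the start of the next line
            cuts.append(f.tell())
    cuts.append(size)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if a < b]


def count_records_in_range(path: str, start: int, end: int) -> int:
    """Runs in a child: opens the file itself, so its offset is private."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end and f.readline():
            count += 1  # hypothetical per-record work: just count lines
    return count


def count_records_in_parallel(path: str, parts: int) -> int:
    children = []
    for start, end in line_aligned_boundaries(path, parts):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: only the chunk start/end are passed in; the file is opened here.
            os.close(r)
            try:
                os.write(w, str(count_records_in_range(path, start, end)).encode())
            finally:
                os._exit(0)
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))

    total = 0
    for pid, r in children:
        with os.fdopen(r, "rb") as inp:
            total += int(inp.read().decode())
        os.waitpid(pid, 0)
    return total
```

This is essentially what moving the file pointer initialization into the child process achieved: the ranges cover the file without overlap, and each process walks its own range with its own offset.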

Hope this helps.

EDIT: Just an FYI: this is not a PHP thing but simply how operating systems behave on a fork, so you'd have the same problem doing the same thing in C or Rust.


github-actions bot commented Jul 2, 2024

No feedback was provided. The issue is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so. Thank you.
