file_put_contents() is racy #20108

@mikkorantalainen

Description

php_stream_truncate_set_size(stream, 0);

Imagine the following sequence, with two processes A and B writing serialized data to a file called "cache.dat" using file_put_contents():

  • Process A: open the cache file for writing
  • Process B: open the same cache file for writing
  • Process A: truncate the file to zero length
  • Process B: truncate the file to zero length
  • Process A: write the cached data (e.g. 1002 byte long serialized data)
  • Process B: write the cached data (e.g. 1000 byte long serialized data)
  • Process A: close the file
  • Process B: close the file

The result is a 1002-byte file: 1000 bytes from process B followed by the last 2 bytes from process A. A later attempt to unserialize the file fails with "unserialize(): Extra data starting at offset 1000 of 1002 bytes".
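The interleaving above can be reproduced deterministically in a single process by using two file handles in place of the two processes. This is only a sketch: the payloads are placeholder strings rather than real serialized data, and the 'cb' open mode plus explicit ftruncate() calls are used here to mirror the open-then-truncate order in the list, not to mirror PHP's internals exactly.

```php
<?php
// Two handles on the same file stand in for processes A and B.
$path = tempnam(sys_get_temp_dir(), 'race');

$dataA = str_repeat('A', 1002); // placeholder for A's 1002-byte payload
$dataB = str_repeat('B', 1000); // placeholder for B's 1000-byte payload

// 'c' opens for writing without truncating; each handle starts at offset 0.
$a = fopen($path, 'cb');
$b = fopen($path, 'cb');

// Both writers truncate before either one writes.
ftruncate($a, 0);
ftruncate($b, 0);

// A writes first, then B overwrites the first 1000 bytes from offset 0.
fwrite($a, $dataA);
fflush($a);
fwrite($b, $dataB);
fflush($b);

fclose($a);
fclose($b);

$result = file_get_contents($path);
unlink($path);

var_dump(strlen($result));                   // int(1002)
var_dump(substr($result, 0, 1000) === $dataB); // bool(true)
var_dump(substr($result, 1000));             // string(2) "AA"
```

The two trailing bytes from A are exactly the "extra data" that unserialize() complains about.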

Many developers seem to believe, incorrectly, that a function called file_put_contents() always leaves the file holding exactly the given contents. I therefore suggest that, unless FILE_APPEND or LOCK_EX is set in flags, the internal PHP implementation should instead do the following:

  1. Create a new temporary file in the same directory as the target file.
  2. Write the given data (the argument to file_put_contents()) into that temporary file.
  3. Close the temporary file.
  4. Rename the temporary file to the final target filename.
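Until something like this exists in PHP itself, the four steps can be sketched in userland. The function name atomic_put_contents() is hypothetical and not part of PHP. The sketch assumes the target directory is writable: if it is not, tempnam() may fall back to the system temporary directory, in which case the rename could cross filesystems and lose its atomicity. It also ignores file permissions, since tempnam() creates the file with restrictive permissions.

```php
<?php
// Hypothetical helper (not a PHP built-in): write to a temp file, then rename.
function atomic_put_contents(string $target, string $data): bool
{
    // Step 1: create a uniquely named temporary file next to the target.
    $tmp = tempnam(dirname($target), basename($target) . '.tmp');
    if ($tmp === false) {
        return false;
    }

    // Steps 2 and 3: write the data and close the file. No other writer
    // knows this temporary name, so the plain call is safe here.
    $written = file_put_contents($tmp, $data);

    // Step 4: rename over the target; readers see the old or the new file.
    if ($written !== strlen($data) || !rename($tmp, $target)) {
        @unlink($tmp); // best-effort cleanup of the temporary file
        return false;
    }
    return true;
}

// Usage: the later of two writers wins, with no leftover bytes from the other.
$target = sys_get_temp_dir() . '/cache_' . getmypid() . '.dat';
atomic_put_contents($target, str_repeat('A', 1002));
atomic_put_contents($target, str_repeat('B', 1000));
var_dump(strlen(file_get_contents($target))); // int(1000)
unlink($target);
```

With this approach, concurrent writers each fill their own temporary file, so whichever rename happens last determines the complete contents of the target.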

On POSIX-compatible systems, rename() is the only one of these operations guaranteed to be atomic (and even that requires the rename to stay within the same directory), so this is the only way to make sure you don't end up with a mixture of two files when multiple processes call file_put_contents() at nearly the same time. And if NFS is used, you cannot assume that LOCK_EX actually works, so you must rely on rename() semantics.

The current behavior could perhaps remain available behind a new flag (FILE_ALLOW_RACY?), documented as "Reduce syscalls to improve performance, but the caller must guarantee that two processes are not trying to write into the same file concurrently."
