Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating manifest.mpd truncates the file before writing it again, letting the web server serve empty file #186

Closed
hariszukanovic opened this issue Jan 4, 2017 · 8 comments
Labels
status: archived Archived and locked; will not be updated type: bug Something isn't working correctly

Comments

@hariszukanovic
Copy link

It seems that shaka packager uses following sequence of file operations when it updates the manifest.mpd

  1. open file
  2. truncate file
  3. write file (in chunks)
  4. close file

This seems to cause manifest.mpd (right after it is truncated) to have no content (size 0) for a brief period of time. If during this short period it happens to actually get served by the web server, it arrives to the player as an empty response.
Some players have great issues with this and stop playing when such empty response is received.

While I am uncertain as to why updating the manifest.mpd is done in this particular way, and if this should be resolved otherwise and/or elsewhere... I would expect the packager to ensure that the this update is done in any kind of transactional manner to avoid serving empty and/or incompletely written files.

Please let me have your comments and any suggestions on how to fix this...

Below the output of ab test which shows that sometimes the response is actually empty. By watching closely I could verify that this happens approx every 10 seconds, more or less each time and exactly when the manifest.mpd is written to.

ab -v3 -c 8 -n 900000 http://192.168.0.103/CT_1_001/manifest.mpd | grep --line-buffered "Length: 0" | ts
Completed 100000 requests
Jan 04 18:32:38 Content-Length: 0
Jan 04 18:32:38 Content-Length: 0
Jan 04 18:32:38 Content-Length: 0
Jan 04 18:32:38 Content-Length: 0
Jan 04 18:32:39 Content-Length: 0
Jan 04 18:32:39 Content-Length: 0
Jan 04 18:32:39 Content-Length: 0
Jan 04 18:32:39 Content-Length: 0
Completed 200000 requests
Completed 300000 requests
Completed 400000 requests
Jan 04 18:32:48 Content-Length: 0
Jan 04 18:32:48 Content-Length: 0
Jan 04 18:32:48 Content-Length: 0
Jan 04 18:32:48 Content-Length: 0
Jan 04 18:32:48 Content-Length: 0
Jan 04 18:32:49 Content-Length: 0
Jan 04 18:32:49 Content-Length: 0
Jan 04 18:32:49 Content-Length: 0
Jan 04 18:32:49 Content-Length: 0
Completed 500000 requests
Completed 600000 requests
Completed 700000 requests
Jan 04 18:32:58 Content-Length: 0
Jan 04 18:32:58 Content-Length: 0
Jan 04 18:32:58 Content-Length: 0
Jan 04 18:32:59 Content-Length: 0
Jan 04 18:32:59 Content-Length: 0
Completed 800000 requests
Completed 900000 requests

@kqyang
Copy link
Collaborator

kqyang commented Jan 5, 2017

@hariszukanovic Thanks for bringing up this problem. That is definitely something should be improved.

Right now shaka-packager writes to the target output mpd directly, so it needs to truncate the file first before overwriting the mpd with new contents every time mpd is updated.

One possible solution to the problem is to write to a temporary file first. Here is the process:

  1. Write to a temp file.
  2. Unlink the target file.
  3. Rename the temp file to the target file.

The web-server will be able to serve the file successfully most of the time (before step 2 and after step 3). There will still be a tiny window that the file is not accessible (between step 2 and step 3); but step 3 should be very fast on most operating systems. This minimizes the unavailability window. It is the smallest possible gap we could achieve.

For the time being, you may achieve the same result by using cron jobs:

  1. Packager writes mpd to temp file 1
  2. Cron job runs the below operations periodically
    2.1 Copy temp file 1 to temp file 2
    2.2 Check temp file 2, if it is valid (not empty), unlink the target file and rename temp file 2 to the target file

@kqyang kqyang added the type: enhancement New feature or request label Jan 5, 2017
@zcalusic
Copy link

zcalusic commented Jan 5, 2017

This is a bad bad race condition and you label it as "enhancement"?

The proposed solution also has a race condition (between steps 2. & 3. the web server serving the segments would serve 404 File not found). The proper solution is (and has been since 1970.):

  1. Write to a temp file
  2. (Atomically) rename the temp file to the target file

man 2 rename clearly states: "If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing."

So, NO unlink(), rename() will take care of it.

@hariszukanovic
Copy link
Author

hariszukanovic commented Jan 5, 2017

@zcalusic thank you for the solution. It works like a charm :)
My ab test as specified above does not receive any 0-byte responses anymore.

I'll paste here my (simplest) version of the solution. It will, of course, need adjustment to properly incorporate into shaka packager. Hope you can use it as a reference...

bool MediaPlaylist::WriteToFile(const char *output_path) {
....
std::string tmp_str = output_path;
tmp_str += ".tmp";
const char * tmp_output_path = tmp_str.c_str();

int tmp_fd = open(tmp_output_path, O_CREAT|O_WRONLY|O_TRUNC, 0666);
LOG(INFO) << "open:" << tmp_fd;
if (errno) {
LOG(ERROR) << "Error:" << strerror(errno);
}
ssize_t written = 0;
written = write(tmp_fd, content.data(), content.size());
LOG(INFO) << "Written:" << written;
if (written != (ssize_t)content.size()){
LOG(ERROR) << "written != content.size(): " << written << "!=" << content.size();
}
if (errno) {
LOG(ERROR) << "Error:" << strerror(errno);
}

close(tmp_fd);
// ...and
rename(tmp_output_path,output_path);

@zcalusic
Copy link

zcalusic commented Jan 5, 2017

Yeah @hariszukanovic, that should fix it.

Just watch out for that fsync() call, it is not strictly necessary in that place and it changes semantics additionaly, beyond fixing race condition.

fsync() requests that all writes to a file descriptor are securely on the disk plates before returning, and that can be quite slow, so it can affect performance severely on moderately busy machines.

Someone might notice that manifests are quite small, and that it shouldn't matter much, but even modern OS-es still have issues with fsync() and in some cases decide to flush ALL dirty buffers to disk even when fsync() of a single file descriptor is attempted. And that can take a long time to finish...

@kqyang kqyang added type: bug Something isn't working correctly and removed type: enhancement New feature or request labels Jan 5, 2017
@kqyang
Copy link
Collaborator

kqyang commented Jan 5, 2017

Hey @zcalusic, thanks for pointing out that rename in Linux is atomic. That will fix the issue in my proposed solution. And yes, I should have included "bug" label as well. (I labeled it as "enhancement" as I wasn't aware that the rename was atomic and thought a race was inevitable and the problem had to be addressed in the player - apparently I was wrong)

@hariszukanovic Are you generating DASH or HLS streams? For DASH, you should update bool WriteMpdToFile(const std::string& output_path, MpdBuilder* mpd_builder) in mpd_notifier_util.cc instead. We'll work on a solution for this problem.

@hariszukanovic
Copy link
Author

@zcalusic, thanx.
Indeed, it works flawlessly without the fsync() call.

@hariszukanovic
Copy link
Author

Please note that the error handling in the above example is quite incorrect. I will try to post an update once I have a final version of it

@kqyang
Copy link
Collaborator

kqyang commented Jan 9, 2017

@hariszukanovic You are welcomed to submit a pull request to address this issue! You can add a utility function WriteFileAtomically in packager/media/file/file_util.h and use the function when writing atomically is needed, e.g. during MPD update and HLS manifest update.

@shaka-bot shaka-bot added the status: archived Archived and locked; will not be updated label Apr 19, 2018
@shaka-project shaka-project locked and limited conversation to collaborators Apr 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Projects
None yet
Development

No branches or pull requests

4 participants