-
Notifications
You must be signed in to change notification settings - Fork 64
Use more interesting videos for README experiment #351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer median because it makes the default pytorch.utils.benchmark table results look identical to the charts. Mean may cause user confusion.
Also see my comment that it's not really the arith mean fps -- it's the harmonic mean
Using the higher res nasa video looks good to me
|
||
# Set the title for the subplot | ||
base_video = os.path.basename(video) | ||
base_video = os.path.basename(video).removesuffix(".mp4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should stop using os.path
, I had tried to migrate awa from those already in #249
I know this wasn't introduced in this PR but do you mind relyin on pathlib
for this?
|
||
# These are the number of uniform seeks we do in the seek+decode benchmark. | ||
num_samples = 10 | ||
video_files_paths = glob.glob(f"{videos_dir_path}/*.mp4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use pathlib.glob
here too instead of glob.glob
 | ||
|
||
The top row is a [Mandelbrot](https://ffmpeg.org/ffmpeg-filters.html#mandelbrot) video | ||
generated from FFmpeg that has a resolution of 1280x720 at 60 fps and is 120 seconds long. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the codec and pixel format too if you can
|
||
The top row is a [Mandelbrot](https://ffmpeg.org/ffmpeg-filters.html#mandelbrot) video | ||
generated from FFmpeg that has a resolution of 1280x720 at 60 fps and is 120 seconds long. | ||
The bottom row is [promotional video from NASA](https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two charts look very similar. I would remove the fractal and just use the real video
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to merge as-is. Let's discuss later what we want long-term. I have a slight preference to show two videos, to give some indication that the NASA video isn't a fluke and to include a higher resolution video. Put another way, I want someone to look at both videos and think "Huh, they're about the same performance."
Before this PR, our README experiments were using a video that was just a single blue frame for 30 seconds. This PR changes the README experiment to use two different videos:
I also experimented with a third video, using FFmpeg's rgbtestsrc option. The results were roughly the same as Mandelbrot - interesting for us, and some evidence that our existing videos are not outliers, but it doesn't tell our users much new, and it adds 30% to the size of the charts. I think even the current size is maybe too big.
The other big change is that our old charts were using the median with error bars representing p25 and p75 percentile. I changed our charts to use the mean and the error bars are the standard deviation. In my experience reporting performance in computer systems, I have always used mean and standard deviation rather than the median. I think including the outliers is important. You can have a great median and terrible p90 performance.After discussing, it's easier to go with the median. We have a challenge that we're turning the times into a rate, and average that properly will take more thinking.