Skip to content

Conversation

scotts
Copy link
Contributor

@scotts scotts commented Apr 1, 2025

Additional test for file-like behavior; see test comments for more.

@scotts scotts requested a review from NicolasHug April 1, 2025 17:20
@scotts scotts marked this pull request as ready for review April 1, 2025 17:20
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 1, 2025
Comment on lines +937 to +938
# amount of data per frame. We also can't know the amount of data per
# frame from first principles, because it is data-depenent.
Copy link
Contributor

@NicolasHug NicolasHug Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could store the number of samples of each frames in the .json files that we check in. Not sure that would be enough if the number of reads still depends on the internal FFmpeg buffers, as you mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure that would be enough to calculate the number of bytes decoded for each frame; I think we'd be going down the path of tracking the encoded data itself, which I don't think we should do. And that still wouldn't get us the determinism we need, as there's the internal buffers.


def seek(self, offset: int, whence: int) -> bytes:
self.num_seeks += 1
return self._file.seek(offset, whence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting this so we don't forget, we should test for proper errors if the user provides a malformed object that e.g. doesn't support read() or seek() correctly. E.g. if read() doesn't exist, right now we only error:

    def create_from_file_like(
        file_like: Union[io.RawIOBase, io.BytesIO], seek_mode: Optional[str] = None
    ) -> torch.Tensor:
        assert _pybind_ops is not None
>       return _convert_to_tensor(_pybind_ops.create_from_file_like(file_like, seek_mode))
E       NotImplementedError

which isn't super informative. I guess the check can be done on the public Pythons side rather than on the ops?

Also, we should try to hit some of the error checks, e.g. catching in the tests that we properly raise https://github.com/pytorch/torchcodec/blob/f416dcf8cb709d679cd739d2a56a761d8658468f/src/torchcodec/decoders/_core/AVIOFileLikeContext.cpp#L53-L59 when doing something bad like

return self._file.read(size + 10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point. I'll put up another PR with such a test.

@scotts scotts merged commit 0a8df45 into meta-pytorch:main Apr 2, 2025
46 checks passed
@scotts scotts deleted the test_file_like branch April 2, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants