Skip to content

Conversation

NicolasHug
Copy link
Contributor

@NicolasHug NicolasHug commented Oct 23, 2024

This PR:

  • Adds a bunch of input checks when creating a Frame or a FrameBatch. We enforce Frame data to be 3D, and FrameBatch data to be >= 4D. We also ensure consistency of leading dimensions between data, pts_seconds and duration_seconds
  • Adds indexing support for FrameBatch. This supports pytorch fancy indexing naturally and intuitively. Note that indexing a 4D FrameBatch returns a Frame.
  • Adds iteration support for FrameBatch. This removes the "tuple unpacking" behavior that we had for FrameBatch, but luckily this is not something we have been using at all. The unpacking behavior of Frame is preserved.

These changes are mostly necessary in order for us to change the output of the samples from List[FrameBatch(4D)] to FrameBatch (5D), as done in #284

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 23, 2024
@NicolasHug NicolasHug marked this pull request as ready for review October 23, 2024 13:38
Comment on lines +97 to +108
if self.data.ndim == 4:
return Frame(
data=data,
pts_seconds=float(pts_seconds.item()),
duration_seconds=float(duration_seconds.item()),
)
else:
return FrameBatch(
data=data,
pts_seconds=pts_seconds,
duration_seconds=duration_seconds,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tensor has a .item() method for returning the underlying dtype.

Should we have something like that here? i.e. always return a FrameBatch but return a Frame if .item() is called?

Copy link
Contributor Author

@NicolasHug NicolasHug Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always return a FrameBatch but return a Frame if .item() is called?

I feel like this is what we're already doing, but perhaps I'm misunderstanding?

BTW, this quirk is only needed for mypy (sigh). Originally the code was simpler:

        cls = Frame if self.data.ndim == 4 else FrameBatch
        return cls(
            self.data[key],
            self.pts_seconds[key],
            self.duration_seconds[key],
        )

and everything was fine, and the Frame would get proper float value because of what we do in its post_init. But mypy was complaining so I had to go for this in 4661237 (#283)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is what we're already doing, but perhaps I'm misunderstanding?

Don't we return a Frame for the special case of dimensions=4?

What I am saying is we should return a FrameBatch even in that case (of size 1). So we are consistent with Tensor

I'll leave it to you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a super strong preference on this - let me open an issue so we can discuss during one of the meetings

@NicolasHug NicolasHug merged commit b841eb3 into meta-pytorch:main Oct 24, 2024
24 checks passed
@NicolasHug NicolasHug deleted the frame_improvements branch October 24, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants