[video-based training] Define and implement a basic architecture #77

Open
bhigy opened this issue Feb 12, 2021 · 2 comments

Comments

bhigy commented Mar 3, 2021

For the visual part:

  • [1] and [2] use features from 2D and 3D CNNs, aggregated with temporal max-pooling.
  • [3] uses I3D/S3D features followed by global mean-pooling [+ linear transformation].
    -> I would go for something similar to [3], at least for a first attempt, as it is easy to implement. Once it works, we can consider more complex approaches.
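
To make the proposed first attempt concrete, here is a minimal sketch of mean-pooling followed by a linear transformation. It assumes the I3D/S3D features are already extracted and stored as a (T, D) array with one vector per video segment. The function name `pool_and_project`, the dimensions, and the random weights are all hypothetical placeholders, not taken from [3]; in practice the projection would be learned.

```python
import numpy as np

def pool_and_project(clip_features, weight, bias):
    """Mean-pool segment features over time, then apply a linear map.

    clip_features: (T, D) array of precomputed features (assumed input format).
    weight: (D, E) projection matrix, bias: (E,) offset (hypothetical, would be learned).
    """
    pooled = clip_features.mean(axis=0)  # global mean-pooling over time -> (D,)
    return pooled @ weight + bias        # linear transformation -> (E,)

# Illustrative sizes only: 8 segments, 1024-dim features, 512-dim embedding.
rng = np.random.default_rng(0)
T, D, E = 8, 1024, 512
features = rng.standard_normal((T, D))
weight = rng.standard_normal((D, E)) * 0.01
bias = np.zeros(E)

video_embedding = pool_and_project(features, weight, bias)
print(video_embedding.shape)  # (512,)
```

Swapping `mean` for `max` along the same axis would give the temporal max-pooling variant mentioned for [1] and [2].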

In both [1] and [2], a non-linear gating mechanism is applied to the vectors obtained from each modality.
[3] uses a special training loss that compensates for misalignments.
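
As a rough illustration of what a non-linear gating step on a modality vector could look like, the sketch below projects the vector and then rescales each dimension by a sigmoid gate computed from the projection itself. This is one common form of gating and an assumption on my part; the exact formulation in [1] and [2] may differ. The names `gated_embedding`, `w_proj`, `w_gate` and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_embedding(x, w_proj, b_proj, w_gate, b_gate):
    """Project x, then scale each dimension by a gate in (0, 1).

    x: (D,) pooled vector for one modality.
    w_proj: (D, E), b_proj: (E,), w_gate: (E, E), b_gate: (E,)
    (all hypothetical parameters that would be learned).
    """
    h = x @ w_proj + b_proj              # linear projection -> (E,)
    gate = sigmoid(h @ w_gate + b_gate)  # per-dimension gate -> (E,)
    return h * gate                      # element-wise gating

# Illustrative sizes only.
rng = np.random.default_rng(0)
D, E = 512, 256
x = rng.standard_normal(D)
embedding = gated_embedding(
    x,
    rng.standard_normal((D, E)) * 0.01, np.zeros(E),
    rng.standard_normal((E, E)) * 0.01, np.zeros(E),
)
print(embedding.shape)  # (256,)
```

The same function would be applied separately to the audio and the video vector, each with its own parameters.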

bhigy commented Jul 28, 2021

You can find the code from [1] here: https://github.com/roudimit/AVLnet.
