Add Video Swin Transformer #2369
Conversation
Amazing work here 🎉 Thanks @innat! I still need to test presets.
The model looks very good overall, just some nits about exporting some layers and models.
```python
return {
    "videoswin_base_kinetics400": copy.deepcopy(
        backbone_presets["videoswin_base_kinetics400"]
    ),
```
The backbone base model has more than one checkpoint:
- kinetics-400-base (current)
- kinetics-400-base-imagenet22k
- kinetics-600-base-imagenet22k
- something-something-v2

How do we facilitate the `presets` method for all of these?
```python
def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        "videoswin_base_kinetics400": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400"]
        ),
        "videoswin_base_kinetics400_imagenet22k": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400_imagenet22k"]
        ),
        ...
    }
```
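One way to avoid repeating this boilerplate for every checkpoint would be to build the dictionary in a loop (a sketch only; the extra preset keys are hypothetical and would have to match the actual entries registered in `backbone_presets`, and `classproperty` is assumed to be the decorator already used for presets in this codebase):

```python
import copy

# Hypothetical preset keys mirroring the checkpoints listed above;
# they must match the real entries in backbone_presets.
BASE_PRESET_NAMES = (
    "videoswin_base_kinetics400",
    "videoswin_base_kinetics400_imagenet22k",
    "videoswin_base_kinetics600_imagenet22k",
    "videoswin_base_something_something_v2",
)

@classproperty
def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        name: copy.deepcopy(backbone_presets[name])
        for name in BASE_PRESET_NAMES
    }
```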
Summarizing the weight check:
- Backbones (tolerance 1e-4)
- Classifier (tolerance 1e-5)
- notebook-1 for kinetics-400 (tiny, small, base, base-imagenet22k)

@tirthasheshpatel @divyashreepathihalli Note: in notebook-1, the torchvision lib is used to load the video-swin API and the PyTorch weights they offer, whereas in notebook-2, the raw official code and weights are loaded.
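For reference, the core of such a numerical check might look like this (a minimal sketch; `keras_model` and `torch_model` stand in for the models loaded in the notebooks, and the tensor layout conversion may differ depending on which API is used):

```python
import numpy as np
import torch

# Dummy input clip: (batch, frames, height, width, channels).
video = np.random.rand(1, 32, 224, 224, 3).astype("float32")

keras_out = keras_model.predict(video)
with torch.no_grad():
    # torchvision's video models typically expect (batch, channels,
    # frames, height, width), so the input is transposed here; the
    # exact layout depends on the loaded model.
    torch_out = torch_model(
        torch.from_numpy(video).permute(0, 4, 1, 2, 3)
    ).numpy()

# Tolerances from the summary: 1e-4 for backbones, 1e-5 for the classifier.
np.testing.assert_allclose(keras_out, torch_out, atol=1e-4)
```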
ONNX

I noticed others also tried to export this model to ONNX format but failed and reported it to the official repo (see the tickets there). So I tried with this implementation using the torch backend, and it works as expected.

```python
model = VideoClassifier(
    backbone=backbone,
    num_classes=num_classes,
    activation=None,
    pooling='avg',
)
model.eval()

batch_size = 1
# Input to the model
x = torch.randn(batch_size, 32, 224, 224, 3, requires_grad=True)
torch_out = model(x)
```

Using the torch official guideline:

```python
torch.onnx.export(
    model,                        # model being run
    x,                            # model input (or a tuple for multiple inputs)
    "vswin.onnx",                 # where to save the exported model
    export_params=True,
    opset_version=10,
    do_constant_folding=True,
    input_names=['input'],        # the model's input names
    output_names=['output'],      # the model's output names
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'},
    },
)
```

```python
import onnx
import onnxruntime

def to_numpy(tensor):
    if tensor.requires_grad:
        tensor = tensor.detach()
    tensor = tensor.cpu()
    return tensor.numpy()

onnx_model = onnx.load("vswin.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(
    "vswin.onnx", providers=["CPUExecutionProvider"]
)

# Compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
```

Logit checking:

```python
np.testing.assert_allclose(
    to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05
)
```
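Since `dynamic_axes` marks the batch dimension as dynamic, the exported graph can also be sanity-checked with a different batch size (a quick sketch reusing the session above; `num_classes` is whatever the classifier was built with):

```python
# Re-run with a larger batch to confirm the dynamic batch axis works.
x4 = torch.randn(4, 32, 224, 224, 3)
ort_outs_4 = ort_session.run(
    None, {ort_session.get_inputs()[0].name: to_numpy(x4)}
)
print(ort_outs_4[0].shape)  # expected: (4, num_classes)
```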
Let's move the video_swin layers into the model folder itself. Everything else LGTM!
Sorry, could you please elaborate?
All model-specific layers should be inside the model folder. Only generic layers go under the layers folder.
I think the test is failing due to an unrelated issue.
Thank you for this awesome contribution!!!
What does this PR do?
Fixes #2262
Who can review?
Anyone in the community is free to review the PR once the tests have passed.