
regression #3

Open
raijinspecial opened this issue Mar 31, 2021 · 17 comments

@raijinspecial

Beautiful work as usual, thanks for this implementation.

I'm curious whether you have tried using this for a regression task? I have tried using TimeSformer without success so far. I know the signal is there, because I can learn it with a small 3D CNN trained from scratch, so I suspect my understanding of how and where to modify the transformer is the culprit. The output is a 1D vector with len == num_frames. Any suggestions are very appreciated!

@tcapelle

tcapelle commented May 23, 2021

This is a pure code implementation; there are no experiments, training code, or tests.
I am currently using this and TimeSformer for regression. You don't need to modify anything, just set num_classes to the number of regressors and use MSELoss.
The output of these types of models comes from the cls_token attending to the other inputs. You can see that the head is super simple:

self.mlp_head = nn.Linear(dim, num_classes)
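
For anyone reading along, a minimal sketch of that regression setup (the dim value, batch size, and feature tensor below are placeholders, not the repo's actual defaults):

    import torch
    import torch.nn as nn

    dim = 512                              # placeholder embedding size
    mlp_head = nn.Linear(dim, 1)           # 1 output = one regressed value; use N outputs for N regressors

    features = torch.randn(8, dim)         # (batch, dim) - stand-in for the cls_token features
    targets = torch.randn(8, 1)            # one real-valued target per clip

    preds = mlp_head(features)             # (8, 1)
    loss = nn.MSELoss()(preds, targets)    # regression loss instead of cross entropy
    loss.backward()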

@monajalal

@tcapelle

What do you mean by "number of regressors"?

I initially had classification-based transformer code and then converted it to a regressor.

I am not sure whether the following is correct. Is 1 correct here, or what should I set it to?

        self.mlp_head = nn.Sequential(
            nn.LayerNorm(emb_dim),
            nn.Linear(emb_dim, 1) # is this 1 correct for regression? 
        )

Previously, it was:
nn.Linear(emb_dim, num_classes)

@Taimoor-R

@tcapelle

What do you mean by "number of regressors"?

I initially had classification-based transformer code and then converted it to a regressor.

I am not sure whether the following is correct. Is 1 correct here, or what should I set it to?

        self.mlp_head = nn.Sequential(
            nn.LayerNorm(emb_dim),
            nn.Linear(emb_dim, 1) # is this 1 correct for regression? 
        )

Previously, it was: nn.Linear(emb_dim, num_classes)

Hi, did you figure out how to use TimeSformer for regression tasks? I am trying to do the same but have had no luck.

@tcapelle

tcapelle commented Jan 4, 2023

Yeah, that's it!
You put as many outputs as there are variables to regress. If you only have one-dimensional regression, then 1 is it.
My only takeaway is that most regression problems can be converted to classification problems by binning the outputs. Instead of predicting the price of a good in, let's say, a range of [0, 100], you predict the probability of the value falling in the bins [0, 10], [10, 20], ..., [90, 100]. This way you get a probabilistic model that can be trained with a standard cross-entropy loss. It's a very useful trick.
The tricky part is creating a data pipeline to train this model; good luck 👍.
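
A rough sketch of that binning trick (the range, number of bins, and batch values here are arbitrary placeholders):

    import torch
    import torch.nn as nn

    num_bins = 10
    bin_edges = torch.linspace(0, 100, num_bins + 1)         # [0, 10, 20, ..., 100]

    def to_bin_index(y):
        # map a continuous target in [0, 100] to a class index in 0..num_bins-1
        return torch.clamp(torch.bucketize(y, bin_edges[1:-1]), max=num_bins - 1)

    logits = torch.randn(4, num_bins)                        # stand-in for the model's output over bins
    targets = to_bin_index(torch.tensor([3.0, 55.0, 99.9, 12.5]))
    loss = nn.CrossEntropyLoss()(logits, targets)            # standard cross entropy on the bin index

    # the expected value of the predicted distribution gives a point estimate back in the original range
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    point_estimate = (logits.softmax(dim=-1) * bin_centers).sum(dim=-1)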

@Taimoor-R

Yeah, that's it! You put as many outputs as there are variables to regress. If you only have one-dimensional regression, then 1 is it. My only takeaway is that most regression problems can be converted to classification problems by binning the outputs. Instead of predicting the price of a good in, let's say, a range of [0, 100], you predict the probability of the value falling in the bins [0, 10], [10, 20], ..., [90, 100]. This way you get a probabilistic model that can be trained with a standard cross-entropy loss. It's a very useful trick. The tricky part is creating a data pipeline to train this model; good luck 👍.

Thank you for the quick response. So let's say that I am hoping to use the pretrained TimeSformer model for regression instead of classification, for example using negative Pearson loss, with each frame of the video having a unique numeric label/ground truth. So essentially the training data would be a 60-second video broken into frames with corresponding values/labels for each frame. So in this case we will only have a one-dimensional regression, am I right?

@tcapelle

tcapelle commented Jan 5, 2023

Thank you for the quick response. So let's say that I am hoping to use the pretrained TimeSformer model for regression instead of classification, for example using negative Pearson loss, with each frame of the video having a unique numeric label/ground truth. So essentially the training data would be a 60-second video broken into frames with corresponding values/labels for each frame. So in this case we will only have a one-dimensional regression, am I right?

I think that TimeSformer expects a 5D tensor of the form:

frames = torch.randn(2, 5, 3, 256, 256) # (batch x frames x channels x height x width)

So you have to construct a dataloader that generates this. When I used these models I trained from scratch, so I was not carefully checking what input the model expects; I used the model as an architecture.

For training, construct a dataloader that, for each batch of videos, gives you a batch of values. How you label these snippets of video is up to you (you will have to subsample or reduce the input size, as the model cannot ingest inputs that are too long).
I was training using 10 frames of video that came from a camera with one image per minute, so a 10-minute sequence, and estimating the average movement speed. So I predicted one value for this 10-frame tensor (bs, 10, 128, 128).

I hope that clarifies the strategy to follow.

Another quick tip: you can create a super simple dataloader by stacking the full video together and then just slicing it randomly; here you have an example (a rough sketch of the idea is below).
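
The linked example isn't reproduced here, but a minimal sketch of the slicing idea could look like this (shapes, names, and clip length are all illustrative, not taken from the repo):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class VideoSliceDataset(Dataset):
        """Stacks the full video into one tensor and serves fixed-length slices by start index."""
        def __init__(self, frames, targets, clip_len=10):
            # frames: (total_frames, channels, height, width), targets: (total_frames,)
            self.frames, self.targets, self.clip_len = frames, targets, clip_len

        def __len__(self):
            return self.frames.shape[0] - self.clip_len

        def __getitem__(self, idx):
            clip = self.frames[idx : idx + self.clip_len]            # (clip_len, C, H, W)
            target = self.targets[idx : idx + self.clip_len].mean()  # one value per clip, e.g. average speed
            return clip, target

    frames = torch.randn(600, 3, 128, 128)    # a dummy "full video" stacked along the time axis
    targets = torch.randn(600)                # one value per frame
    loader = DataLoader(VideoSliceDataset(frames, targets), batch_size=2, shuffle=True)
    clips, ys = next(iter(loader))            # clips: (2, 10, 3, 128, 128), ys: (2,)

With shuffle=True the start indices come out in random order, which gives the random slicing described above.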

@Taimoor-R

Thank you so much for the quick and detailed response. I am sorry for asking so many questions; I am new to the whole video transformer domain. I just have a follow-up question: my dataloader looks something like this.

It contains video frames and the pulse signal corresponding to them.
Frames are put in a 4D tensor with size [c x d x w x h].

    train_loader = torch.utils.data.DataLoader(
        pulse,  # dataset whose samples are (frames, labels)
        batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True, sampler=sampler)
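
One thing to watch with that layout: if the dataset returns frames as [c x d x w x h], the batches from this loader come out as (batch, channels, frames, height, width), while the models discussed above expect (batch, frames, channels, height, width), so a permute is likely needed somewhere. A minimal sketch, assuming those shapes:

    import torch

    frames = torch.randn(4, 3, 32, 128, 128)          # (batch, channels, frames/depth, height, width)
    frames_for_model = frames.permute(0, 2, 1, 3, 4)  # -> (batch, frames, channels, height, width)
    print(frames_for_model.shape)                     # torch.Size([4, 32, 3, 128, 128])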

@tcapelle

tcapelle commented Jan 5, 2023

Hope this clarifies my idea:
[image attachment]

@Taimoor-R

@tcapelle hi, thanks for all the help regarding the dataloader. I am sorry to bother you yet again. I was having some trouble understanding where this issue arises from and why, as the only thing I changed is the dataloaders.
[screenshot attachment: 2023-01-09 at 12 38 36]

@Taimoor-R

I have pinpointed where the issue is: it seems like my train dataloader doesn't provide the values in bold in for cur_iter, (inputs, labels, _, meta) in enumerate(train_loader). I don't understand how to resolve this though, as I am not using their dataloaders. The dataloader I am using works in the following way, where pulse_3d returns: sample = (frames, labels)
[screenshot attachment: 2023-01-09 at 13 49 51]
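
If you want to keep reusing that training loop as-is, one simple workaround is to make the Dataset return the two extra values the loop unpacks; in that codebase the third item is typically a sample index and meta a dict of extra info. A minimal sketch (names like samples are hypothetical stand-ins for whatever pulse_3d currently does):

    from torch.utils.data import Dataset

    class PulseDataset(Dataset):
        def __init__(self, samples):
            self.samples = samples             # list of (frames, labels) pairs

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            frames, labels = self.samples[idx]
            return frames, labels, idx, {}     # pad with a sample index and an empty meta dict

The other direction works too: edit the loop to unpack only (inputs, labels) if you never use the index or meta.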

@tcapelle

tcapelle commented Jan 9, 2023

Sorry, I can't help you with this. Maybe ask on the PyTorch forums?

@Taimoor-R

I will try asking there, but I don't think it's a PyTorch issue, is it? I believe it comes from the dataloader; apparently the dataloader should yield inputs, labels, _, meta, as seen in the following snippet from train_net.py (TimeSformer).

[screenshot attachment: 2023-01-09 at 14 07 41]

@tcapelle

tcapelle commented Jan 9, 2023

sorry, don't know.

@Taimoor-R

sorry, don't know.

Thank you for all the help. Just a tiny follow-up: for the TimeSformer, did you use the code provided by Facebook, or did you manage to find some other script?

@tcapelle

tcapelle commented Jan 9, 2023

I used @lucidrains' implementation

@Taimoor-R

But @lucidrains' implementation doesn't have trainer code, does it?

@Taimoor-R

Taimoor-R commented Jan 29, 2023

hi @tcapelle, using TimeSformer (orange line) for regression compared to a 3D CNN (pink line), my results are quite weird. I am adding a screenshot of the loss (MSE) vs. epoch graph for training and validation. Note: each video is broken into chunks of 32 consecutive frames, each with their corresponding ground-truth values. The model predicts one value per frame fed in, so for 32 frames it outputs 32 values.
[screenshot attachment: 2023-01-26 at 01 05 48]
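
For reference, a minimal sketch of the per-frame setup described there (dimensions are placeholders, and whether the head reads a pooled CLS token or per-frame tokens depends on the implementation), together with a common formulation of the negative Pearson loss used for pulse signals:

    import torch
    import torch.nn as nn

    clip_len, dim = 32, 512
    head = nn.Linear(dim, clip_len)        # one regressed value per frame in the 32-frame chunk

    pooled = torch.randn(8, dim)           # (batch, dim) - stand-in for the transformer's pooled features
    preds = head(pooled)                   # (8, 32)
    targets = torch.randn(8, clip_len)     # ground-truth value for each of the 32 frames

    def neg_pearson(pred, target, eps=1e-8):
        # 1 - Pearson correlation per sequence, averaged over the batch
        pred = pred - pred.mean(dim=1, keepdim=True)
        target = target - target.mean(dim=1, keepdim=True)
        corr = (pred * target).sum(dim=1) / (pred.norm(dim=1) * target.norm(dim=1) + eps)
        return (1 - corr).mean()

    loss = neg_pearson(preds, targets)     # or nn.MSELoss()(preds, targets)
    loss.backward()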
