RL Tuning Tips #67

Open
smorad opened this issue Sep 7, 2022 · 5 comments

@smorad commented Sep 7, 2022

I'm currently writing a recurrent reinforcement learning library, with LSTMs, linear attention, etc., that I would like to add S4 to.
Unfortunately, I find S4D unable to learn even simple RL tasks (e.g. output the input from 4 timesteps ago).

Do you know of any configurations or tips for making S4/S4D models smaller/more robust? The LR is already set very low (1e-5) and I'm using enormous batch sizes (65536 transitions per batch). Other recurrent models are able to learn, but S4 is not. This is what I have thus far:

import torch.nn as nn
from models.s4.s4 import S4  # the full S4 block from this repo (not the class below)

class S4Model(nn.Module):
    def __init__(self, ...):
        ...
        self.core = S4(
            d_model=self.h,  # This is usually on the order of ~10-20
            d_state=self.h,
            mode="diag",
            measure="diag-lin",
            bidirectional=False,
            disc="zoh",
            real_type="exp",
            transposed=False,
        )
        self.core.setup_step()

    ...

    def forward(self, z, state):
        batch, time, feature = z.shape
        if time == 1:
            # Rollout/inference (recurrent mode): one timestep, carrying the state
            y, state = self.core.step(z.squeeze(1), state)
            z = y.unsqueeze(1)
        else:
            # Train (batch mode): convolution over the full sequence
            z, _ = self.core(z)
        return z, state
@albertfgu (Contributor)

It's hard for me to say more without understanding the setting more completely. At a glance, the model seems to be set up correctly. Some things I notice are:

  • Did you intend to use only 1 layer? Generally we always use deeper models with residual connections (see the sketch after this list). Or is your code just a snippet of 1 layer from a bigger model?
  • This model looks very tiny, at most a few thousand parameters. The model dimensions are very small compared to what we normally use (e.g. d_model=256, d_state=64). Are the baseline models of a similar size? Can LSTM and linear attention solve these tasks with ~1k parameters and 1 layer?
  • Is training done with convolution or recurrent mode? As the code is currently written, setup_step is meant to be called after training and only for inference; I've never called it in the init call (I believe it shouldn't affect anything, but I'm not 100% sure right now). Are training and evaluation both done with the same mode, for consistency?
  • Why does the LR need to be set so low? Is training unstable with a higher LR? I've almost always found that S4 can get away with a higher LR than LSTM and attention (I can't think of any exceptions, actually), so I'm not sure what would be causing instabilities.
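
A minimal sketch of what a deeper core might look like, assuming the full S4 module from models/s4/s4.py; the ResidualS4Stack wrapper and its pre-norm residual layout are illustrative, not this repo's API:

import torch.nn as nn
from models.s4.s4 import S4  # full module from this repo

class ResidualS4Stack(nn.Module):
    # Illustrative wrapper: several S4 layers with pre-norm residual
    # connections, at sizes closer to what is normally used.
    def __init__(self, d_model=256, d_state=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [S4(d_model=d_model, d_state=d_state, mode="diag",
                measure="diag-lin", transposed=False)
             for _ in range(n_layers)]
        )
        self.norms = nn.ModuleList(
            [nn.LayerNorm(d_model) for _ in range(n_layers)]
        )

    def forward(self, z):
        # z: (batch, time, d_model)
        for norm, layer in zip(self.norms, self.layers):
            y, _ = layer(norm(z))  # S4's forward returns (output, state)
            z = z + y              # residual connection
        return z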

@mrsamsami

Based on your comments, I trained S4 models with more layers and a larger d_model and d_state. I used convolution mode to update the parameters and recurrent mode to generate and collect trajectories (sketched below). So far there has been no significant improvement, and it is still worse than an MLP. Do you have any other suggestions?
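
For concreteness, that split between modes might look like the following sketch. Here env, encode, and policy_head are hypothetical placeholders (env follows the classic gym API), while setup_step, step, and default_state are assumed to come from the full S4 module:

import torch

def collect_trajectory(core, env, encode, policy_head):
    # Recurrent mode for rollouts: one observation at a time.
    core.eval()
    core.setup_step()                       # prepare the step (RNN) kernel
    state = core.default_state(1)           # recurrent state for batch size 1
    obs, done = env.reset(), False
    with torch.no_grad():
        while not done:
            z = encode(obs).unsqueeze(0)    # (batch=1, d_model)
            y, state = core.step(z, state)  # one recurrent step
            action = policy_head(y).argmax(-1)
            obs, reward, done, info = env.step(action.item())

def train_step(core, policy_head, batch_obs):
    # Convolution mode for parameter updates, over full sequences.
    core.train()
    z, _ = core(batch_obs)                  # batch_obs: (batch, time, d_model)
    return policy_head(z)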

@albertfgu (Contributor)

It's very hard for me to say without knowing more details about the problem (and I'm not an expert in RL). If MLP is doing well, perhaps the problem is Markovian and doesn't need a sequence model at all. Another sanity check could be to try other recurrent baselines such as an LSTM/GRU core, which should have a similar interface to S4(D); a minimal sketch of such a core follows. I know of another RL project using S4(D)/S5 where they found it was consistently much better than an LSTM, so this type of baseline could reveal whether the discrepancy is in the problem setup (e.g. if you find MLP is better than any sequence model) or in the S4 usage specifically (e.g. if you find LSTM is better than S4).
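
One way to set up such a baseline is a thin wrapper around nn.LSTM with the same (z, state) interface as the S4 core; the LSTMCore name and interface here are illustrative:

import torch.nn as nn

class LSTMCore(nn.Module):
    # Drop-in recurrent baseline with the same (z, state) interface
    # as the S4 core above, for isolating S4-specific issues.
    def __init__(self, d_model):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, z, state=None):
        # z: (batch, time, d_model). nn.LSTM is itself recurrent, so the
        # same call handles both rollout (time == 1) and batched training.
        z, state = self.lstm(z, state)
        return z, state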

@MichaelFYang

Hi Albert, given that people are trying S4D in RL settings, would it be possible to provide or extend the current minimal s4d.py to also have a step function, so it can run (and train) in RNN mode, in case people are doing it wrong?

@albertfgu (Contributor)

The minimal file is purposefully minimal. The documentation explains the additional features available in the main module and how to use them. For example, the step code: https://github.com/HazyResearch/state-spaces/blob/main/models/s4/s4.py#L1197
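
A quick way to validate step usage is to check that recurrent mode reproduces convolution mode on the same sequence. A sketch, assuming the full module's setup_step/step/default_state interface; the sizes and tolerance are illustrative:

import torch
from models.s4.s4 import S4

d_model = 64
core = S4(d_model=d_model, d_state=64, mode="diag",
          measure="diag-lin", transposed=False)
core.eval()                                  # dropout would make the two modes differ
core.setup_step()

x = torch.randn(1, 32, d_model)              # (batch, time, d_model)
y_conv, _ = core(x)                          # convolution mode

state = core.default_state(1)
ys = []
for t in range(x.shape[1]):
    y_t, state = core.step(x[:, t], state)   # recurrent mode, one timestep
    ys.append(y_t)
y_rec = torch.stack(ys, dim=1)

print(torch.allclose(y_conv, y_rec, atol=1e-4))  # should print True if usage is correct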
