## RNN Problems: Long-Term Dependencies & Gradient Issues

Based on the referenced video, this note summarises the core problems affecting standard Recurrent Neural Networks (RNNs) and practical mitigations.

**Summary:** Standard RNNs struggle with long sequences due to vanishing and exploding gradients, which cause long-term dependencies to be lost or training to become unstable.

### 1. Long-Term Dependencies
RNNs can fail when relevant information occurs many time steps before it is needed. Example: to predict "Marathi" from the sentence "I grew up in Maharashtra ... I speak fluent ...", the network must retain "Maharashtra" across the sequence.

### 2. Vanishing Gradient
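- Cause: During Backpropagation Through Time (BPTT), the gradient is multiplied at every time step by a factor made of the recurrent weight and the activation derivative. When these factors are smaller than 1 in magnitude, the gradient shrinks exponentially with the number of steps.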
- Effect: Contributions from the distant past vanish; the network learns mainly short-term dependencies.
- Remedies:
  - Use ReLU (or ReLU variants) instead of tanh/sigmoid where appropriate.
  - Use careful weight initialization (identity-like, Xavier/Glorot) and scaling.
  - Add skip connections or use architectures designed for long-term memory.
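The shrinking effect can be illustrated with a minimal sketch, assuming a toy scalar RNN `h_t = tanh(w * h_{t-1})` with no input term. The function name `gradient_through_time`, the weight `w = 0.9`, and the starting state `h0 = 0.5` are hypothetical choices made for illustration only; they do not come from the video.

```python
import math

def gradient_through_time(w, steps, h0=0.5):
    """Return |d h_t / d h_0| for t = 1..steps in the toy RNN h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    magnitudes = []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2), since tanh'(z) = 1 - tanh(z)^2.
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes

# Illustrative values (assumed): every per-step factor is below 0.9,
# so the product decays at least as fast as 0.9 ** t.
mags = gradient_through_time(w=0.9, steps=50)
print(f"after 1 step:   {mags[0]:.3e}")
print(f"after 50 steps: {mags[-1]:.3e}")
```

Because each step contributes one more factor below 1, the influence of the state at step 0 on the state at step 50 is far smaller than its influence on step 1, which is why the distant past contributes so little to the weight update.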

### 3. Exploding Gradient
- Cause: The same repeated multiplication during BPTT, but with per-step factors greater than 1 in magnitude, makes gradients grow exponentially with the number of steps.
- Effect: Gradients become extremely large (or NaN), destabilising training.
- Remedies:
  - Gradient clipping (clip-by-norm or clip-by-value).
  - Truncated BPTT: limit backpropagation to a fixed window of steps.
  - Use smaller learning rates and proper initialization.
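The two clipping variants named above can be sketched as follows. This is a plain-Python illustration, not the API of any particular framework: the helper names `clip_by_norm` and `clip_by_value`, the toy linear recurrence, and the thresholds are all assumptions chosen for the example.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def clip_by_value(grads, limit):
    """Clamp every component independently to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

# Toy exploding gradient (assumed setup): in a linear scalar recurrence
# h_t = w * h_{t-1}, d h_T / d h_0 = w ** T, which grows exponentially when |w| > 1.
w, steps = 1.5, 50
exploding = w ** steps

raw = [exploding, -2.0, 0.1]
print("raw:          ", raw)
print("clip by norm: ", clip_by_norm(raw, max_norm=5.0))
print("clip by value:", clip_by_value(raw, limit=5.0))
```

Clip-by-norm scales all components by the same factor, so the direction of the update is preserved; clip-by-value clamps each component separately and can therefore change the direction.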

### Conclusion
Because of these issues, standard RNNs are often replaced by LSTM or GRU architectures that are designed to handle long-term dependencies more robustly.




### Problem of Long-Term Dependency
- Recent (short-term) time steps dominate the gradient, so long-term dependencies contribute very little to learning.
- ![image.png](attachment:image.png)
- ![image-2.png](attachment:image-2.png)
- ![image-3.png](attachment:image-3.png)