I have trained the model for around 4 days. It is now at episode 14278, but the score is 40 to 50. What could be the problem?