1.5B coder model reproduction #130

@yesiam-png

Hi, thanks for your excellent work! When I trained Qwen-2.5-Coder 1.5B using your 14b_16k recipe, I found that the minimum response length is very small, and that both the response length and the LCB score drop significantly after a certain number of steps. Could you help me understand the possible reasons? If possible, could you also share the recipe for your 1.5B deepseek-qwen coder model?
