How to continue my training #96

@bfhaha

My model was training successfully and generated checkpoint files:

ckpt-1.data-00000-of-00001
ckpt-1.index
ckpt-2.data-00000-of-00001
ckpt-2.index
ckpt-3.data-00000-of-00001
ckpt-3.index
......

Since I am using Google Colab, I cannot finish the training in one session. So before interrupting, I downloaded the latest checkpoint files, say,

ckpt-8.data-00000-of-00001 
ckpt-8.index

Then I uploaded them in the next session. However, the training did not resume from the checkpoint; it started from 0 again (and generated checkpoints 1, 2, ... again). I had already edited the value of fine_tune_checkpoint in pipeline.config. Some posts on the Internet say that it actually does train from the checkpoint even though its numbering starts from 0. But this raises another question: if it starts from 0 every time, the training will never end. Does anyone know the standard way to continue the training?
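
For reference, here is a minimal sketch of one thing that may be worth checking. It is an assumption, not a confirmed diagnosis. TensorFlow's `tf.train.latest_checkpoint` finds the newest checkpoint by reading a small text file named `checkpoint` in the directory, and that file is not among the files listed above. If the training script resumes through that lookup on its output directory (assumed here), uploading only the `.data` and `.index` files would not be enough. The helper names below are hypothetical, and the script uses only the standard library to rebuild that state file from the uploaded `ckpt-N.index` files.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(ckpt_dir):
    """Return the highest-numbered 'ckpt-N' prefix that has an .index file, or None."""
    pattern = re.compile(r"^ckpt-(\d+)\.index$")
    numbers = []
    for path in Path(ckpt_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            # Compare numerically so that ckpt-10 ranks above ckpt-8.
            numbers.append(int(match.group(1)))
    return f"ckpt-{max(numbers)}" if numbers else None


def write_checkpoint_state(ckpt_dir):
    """Write the 'checkpoint' state file pointing at the newest checkpoint.

    Assumption: the training script looks up the newest checkpoint in
    ckpt_dir via tf.train.latest_checkpoint, which reads this file.
    Returns the prefix written, or None if no checkpoint was found.
    """
    prefix = latest_checkpoint_prefix(ckpt_dir)
    if prefix is None:
        return None
    state_file = Path(ckpt_dir) / "checkpoint"
    state_file.write_text(
        f'model_checkpoint_path: "{prefix}"\n'
        f'all_model_checkpoint_paths: "{prefix}"\n'
    )
    return prefix
```

Calling `write_checkpoint_state` on the directory that holds the uploaded `ckpt-8.*` files would produce a `checkpoint` file whose first line is `model_checkpoint_path: "ckpt-8"`. Whether the training script then continues from that checkpoint depends on how it restores state, which the post does not specify.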
