
Feat support continue training #668

Merged — 9 commits merged into main from feat-support-continue-training, Jan 30, 2023
Conversation

guenthermi
Member

@guenthermi guenthermi commented Jan 26, 2023

Add support for continuing training

This PR makes it possible to continue training from an existing artifact of a fine-tuned model:

```python
train_data = 'path/to/another/data.csv'

run2 = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    model_artifact=run.artifact_id,
)
print(f'Run name: {run2.name}')
print(f'Run status: {run2.status()}')
```

Since we need the model name to detect the task and construct the training dataset correctly, users still need to pass the `model` property to the `fit` function.

  • This PR references an open issue
  • I have added a line about this change to CHANGELOG

@guenthermi
Member Author

Tests are failing because I have not committed the requirements changes for commons and stubs yet.

@guenthermi guenthermi marked this pull request as ready for review January 26, 2023 16:56
@github-actions github-actions bot added the area/testing This issue/PR affects testing label Jan 27, 2023
@guenthermi
Member Author

I added a pre-release to see if the tests are running. Everything seems to work as expected. One test is failing because loss-optimizers are not added yet. This should work when #664 is merged.

@guenthermi guenthermi requested review from gmastrapas, LMMilliken and bwanglzu and removed request for gmastrapas and LMMilliken January 27, 2023 08:26
Member

@bwanglzu bwanglzu left a comment


LGTM! But why is CI failing?

Comment on the usage example:

```python
train_data = 'path/to/another/data.csv'

run2 = finetuner.fit(
    model='efficientnet_b0',
```
Member


Do we still need the `model` parameter here?

Member


Okay, now I see the hint. Do I understand correctly that the model type must be identical to the artifact's model type?

Member Author


We need the model name to detect the task, so the tasks they are used for must be equal; so basically yes. If you use a DocArray dataset, it probably does not matter.
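To illustrate the constraint discussed here, a minimal sketch of the task-compatibility check (the mapping and helper below are hypothetical, not Finetuner's actual model registry):

```python
# Hypothetical sketch: the model name is used to detect the task, so the
# new `model` must resolve to the same task as the model that produced the
# artifact. This mapping is illustrative only.
MODEL_TO_TASK = {
    'efficientnet_b0': 'image-to-image',
    'resnet50': 'image-to-image',
    'bert-base-cased': 'text-to-text',
}


def same_task(model: str, artifact_model: str) -> bool:
    """Return True if both model names resolve to the same task."""
    return MODEL_TO_TASK[model] == MODEL_TO_TASK[artifact_model]
```

Under this sketch, continuing an `efficientnet_b0` artifact with a text model would be rejected, while another image model of the same task would pass.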

Contributor

@LMMilliken LMMilliken left a comment


Once the PR for the new pooling and loss options is merged, everything should pass.
LGTM!

Comment on lines 59 to 61:

```
:class: hint
When you want to continue training, you still need to provide the `model` parameter
beside the `model_artifact` parameter for Finetuner to correctly configure the new run.
```

Suggested change:

```
:class: hint
When you want to continue training, you still need to provide the `model` parameter
as well as the `model_artifact` parameter for Finetuner to correctly configure the new run.
```

Comment on lines -240 to +242:

```diff
-            name=model,
+            name=model if not kwargs.get(MODEL_ARTIFACT) else None,
+            artifact=kwargs.get(MODEL_ARTIFACT),
```
Contributor

@LMMilliken LMMilliken Jan 27, 2023


Just to make sure I understand, this means that if there is a MODEL_ARTIFACT, then name is set to None?

Member Author


Yes, because George implemented the JSON schema so that both arguments are mutually exclusive, which makes sense: otherwise it would be a bit hard to see what will happen, e.g., whether a new model is constructed or the artifact is used. This way, the final config sent to the API is clearer about what a run should do.
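The mutual exclusivity discussed here can be sketched as follows; this is a minimal illustration of the one-liner in the diff above, and the function name and dict shape are hypothetical, not the actual Finetuner internals:

```python
MODEL_ARTIFACT = 'model_artifact'


def build_model_description(model, **kwargs):
    # Set `name` only when no artifact is given: `name` and `artifact` are
    # mutually exclusive in the config sent to the API, so the run config
    # states unambiguously whether a new model is constructed or an
    # existing artifact is continued.
    artifact = kwargs.get(MODEL_ARTIFACT)
    return {
        'name': model if not artifact else None,
        'artifact': artifact,
    }
```

With an artifact present the `name` field is dropped to `None`, so a consumer of the config never has to guess which of the two sources wins.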

@guenthermi
Member Author

> LGTM! But why is CI failing?

As said in the comment above, the stubs version already includes the ArcFace changes, specifically the loss optimizers, etc. The tests are therefore failing, but this will be addressed by Louis' PR. So I need to merge main into this branch after Louis finishes his PR; then it should not fail anymore.

@github-actions

📝 Docs are deployed on https://ft-feat-support-continue-training--jina-docs.netlify.app 🎉

@guenthermi guenthermi merged commit fea1263 into main Jan 30, 2023
@guenthermi guenthermi deleted the feat-support-continue-training branch January 30, 2023 14:24
Successfully merging this pull request may close these issues.

continuous training using finetuner
4 participants