Rewrite guides for fine-tuning with Datasets #13923

Merged: 5 commits, Nov 9, 2021

@stevhliu (Member) commented Oct 7, 2021

What does this PR do?

This PR updates the current documentation for Fine-tuning with custom datasets. It replaces the custom dataset-loading code with the Datasets library, which handles both loading and preprocessing. The new guide also introduces the Keras method for compiling and fitting a model instead of using TFTrainingArguments and TFTrainer.

⚠️ The TensorFlow example for Question Answering still needs a little work: calling model.fit raises ValueError: No gradients provided for any variable. As a temporary workaround, @Rocketknight1 suggested setting dummy_labels=True in to_tf_dataset.

This notebook contains all the code examples shown in the guide.

To do:

  • Update the TensorFlow code example for question answering once we have a better solution.

@sgugger (Collaborator) left a comment

Thanks a lot for working on this! I have a general comment on the code samples. The >>> prompts are usually there for the doctest package, which also requires ... at the start of every continuation line of a multi-line statement, and that is not the case here.

We should decide whether we want doctests enabled for this guide (in which case we should add the ...) or not (in which case the >>> just hurt readability and should be removed).
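To make the point about >>> and ... concrete, here is a minimal sketch using only Python's standard doctest module. The add_one function and the run_sample helper are hypothetical and do not come from the guide; they only illustrate how doctest treats a continuation line with and without the ... prefix.

```python
import doctest

# A sample in valid doctest format: the continuation line starts with "...".
GOOD = """
>>> def add_one(batch):
...     return [x + 1 for x in batch]
>>> add_one([1, 2, 3])
[2, 3, 4]
"""

# The same sample without "...": doctest reads the indented line as the
# expected output of the first statement, so the function is never defined.
BAD = """
>>> def add_one(batch):
    return [x + 1 for x in batch]
>>> add_one([1, 2, 3])
[2, 3, 4]
"""


def run_sample(source):
    """Run a doctest-formatted string and return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(source, {}, "sample", "<sample>", 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter for this sketch.
    result = runner.run(test, out=lambda text: None)
    return result.failed, result.attempted


print(run_sample(GOOD))  # (0, 2): both examples pass
print(run_sample(BAD))   # at least one failure is reported
```

If the guide is not meant to run under doctest, the same snippet reads more cleanly with the prompts removed entirely.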

docs/source/custom_datasets.rst: 4 review threads (outdated, resolved)
@LysandreJik (Member) left a comment

Nice! The content looks great to me. I think there's a good opportunity to do something about doctests here, which would tremendously help in maintaining this guide.

We have set up doctests for a few files, but we'll need to enable them for more. @stevhliu, let us know if you'd like us to walk you through how they work and how they are set up so that your work can be tested, which should greatly reduce the maintenance cost down the road.

@stevhliu marked this pull request as ready for review November 9, 2021 18:56
@LysandreJik merged commit e4d8f51 into huggingface:master Nov 9, 2021
@LysandreJik (Member) commented

Thank you, @stevhliu!

@stevhliu deleted the finetune-datasets-guide branch November 9, 2021 19:13