Rewrite guides for fine-tuning with Datasets #13923

stevhliu · 2021-10-07T20:23:27Z

What does this PR do?

This PR updates the current documentation for Fine-tuning with custom datasets. It removes the custom code for loading a dataset in favor of using the Datasets library for loading and preprocessing a dataset. The new guide also introduces the Keras method for compiling and fitting a model instead of using TFTrainingArguments and TFTrainer.

⚠️ The TensorFlow example for Question Answering still needs a little work: calling model.fit raises ValueError: No gradients provided for any variable. @Rocketknight1 has provided a temporary workaround in which we set dummy_labels=True in to_tf_dataset.

This notebook contains all the code examples shown in the guide.

To do:

Update the TensorFlow code example for question answering once we have a better solution.

sgugger

Thanks a lot for working on this! I have a general comment on the code samples. The >>> prompts are usually used together with the doctest package, but doctest also requires ... at the start of every continuation line of a multi-line statement, which is not the case here.

We should decide whether we want this guide to be enabled for the doctests (in which case we should add the ...) or not (in which case the >>> just hurt readability and should be removed).
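For reference, here is a minimal sketch of the convention described above, using only the standard-library doctest module. The count_labels function and its sample data are hypothetical and are not taken from the guide; they only illustrate that a multi-line statement needs ... on its continuation lines, while expected output lines carry no prefix.

```python
import doctest


def count_labels(labels):
    """Count how often each label occurs (hypothetical helper, not from the guide).

    >>> counts = count_labels(["pos", "neg", "pos"])
    >>> for label in sorted(counts):
    ...     print(label, counts[label])
    neg 1
    pos 2
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Parse the docstring into a doctest and run it explicitly, so the check
# does not depend on how this snippet is imported or executed.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    count_labels.__doc__,            # text containing the >>> / ... examples
    {"count_labels": count_labels},  # names visible to the examples
    "count_labels",                  # name used in failure reports
    None,                            # no source filename
    0,                               # starting line number
)
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(test)
print(f"attempted={result.attempted} failed={result.failed}")
```

If the ... prefix were dropped from the print line, doctest would treat that line as expected output of an incomplete for statement and the example would fail, which is why unprefixed continuation lines cannot be left in a guide that is enabled for doctests.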

docs/source/custom_datasets.rst

LysandreJik

Nice! The content looks great to me. I think there's a good opportunity to do something about doctests here, which would tremendously help in maintaining this guide.

We have set up doctests for a few files, but we'll need to enable them for more files. @stevhliu, let us know if you'd like us to walk you through how they work and how they are set up so that your work can be tested, which should greatly reduce the maintenance cost down the road.

LysandreJik · 2021-11-09T19:12:55Z

Thank you, @stevhliu!

rewrite guides for fine-tuning with datasets

f3a5b4a

stevhliu added the Documentation label Oct 7, 2021

stevhliu requested review from Rocketknight1, LysandreJik and sgugger October 7, 2021 20:23

sgugger reviewed Oct 7, 2021


LysandreJik approved these changes Oct 12, 2021


simple qa code example

7a04097

stevhliu requested a review from sgugger October 15, 2021 20:40

huggingface deleted a comment from github-actions bot Nov 9, 2021

LysandreJik mentioned this pull request Nov 9, 2021

Your example code for WNUT NER produces array indexing ValueError #7937

Closed

4 tasks

Steven added 2 commits November 9, 2021 08:54

use anonymous rST links

b8f5abc

style

5fb6874

stevhliu marked this pull request as ready for review November 9, 2021 18:56

Merge branch 'master' into finetune-datasets-guide

74c5d61

LysandreJik merged commit e4d8f51 into huggingface:master Nov 9, 2021

stevhliu deleted the finetune-datasets-guide branch November 9, 2021 19:13
