ydshieh (Collaborator) commented Oct 10, 2025

What does this PR do?

It should have been gone a long, long time ago ...

Collaborator Author

I want to remove this example script under legacy rather than update it to handle the removed classes ...

I would love to remove the whole legacy directory, given how it describes itself:

# Legacy examples

This folder contains examples which are not actively maintained (mostly contributed by the community).

Using these examples together with a recent version of the library usually requires to make small (sometimes big) adaptations to get the scripts working.

May I? 🙏

Member

cc @ArthurZucker here as well, but in my opinion it's fine to delete the folder. Everything in it is very old, so these are not useful examples anyway.

Collaborator Author

This warning has been there since 2020:

DEPRECATION_WARNING = (
    "This dataset will be removed from the library soon, preprocessing should be handled with the 🤗 Datasets "
    "library. You can have a look at this example script for pointers: {0}"
)
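
For context, a template like this is typically filled in with `str.format` and emitted through the standard `warnings` module. The following is a minimal sketch, not the library's actual code: the URL, the helper name `warn_deprecated_dataset`, and the choice of `FutureWarning` are assumptions made for illustration.

```python
import warnings

DEPRECATION_WARNING = (
    "This dataset will be removed from the library soon, preprocessing should be handled with the 🤗 Datasets "
    "library. You can have a look at this example script for pointers: {0}"
)

# Hypothetical placeholder URL, not the one used by the library.
EXAMPLE_SCRIPT_URL = "https://example.com/run_mlm.py"


def warn_deprecated_dataset(url: str = EXAMPLE_SCRIPT_URL) -> str:
    """Fill the template with a pointer URL, emit it as a warning, and return the message."""
    message = DEPRECATION_WARNING.format(url)
    # stacklevel=2 attributes the warning to the caller rather than to this helper.
    warnings.warn(message, FutureWarning, stacklevel=2)
    return message
```

A dataset class would call such a helper from its constructor, so every instantiation reminds the user that the class is on its way out.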

Collaborator Author

I still need to rework 6 tests; that is not done yet.

Collaborator Author

ydshieh commented Oct 10, 2025

@Cyrilvallez Could you make the call, or discuss with the other core maintainers 🙏, on whether I can remove the file examples/legacy/run_language_modeling.py, or even the whole examples/legacy/ directory?

ydshieh changed the title from "Remove some custom datasets defined in codebase" to "[don't merge yet] Remove some custom datasets defined in codebase" on Oct 10, 2025
Collaborator Author

ydshieh commented Oct 10, 2025

[don't merge yet]

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker (Collaborator) left a comment

LGTM :) add it to the v5 issue

Collaborator Author

ydshieh commented Nov 4, 2025

@SunMarc Could you take a look at this commit and let me know if you are fine with it?

The file src/transformers/data/datasets/language_modeling.py is deleted, so there is no more LineByLineTextDataset.

Sorry, forgot the link

eb3199b
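
For readers who relied on `LineByLineTextDataset`, the idea behind it (one example per non-empty line of a text file) is small enough to reproduce. Below is a minimal sketch in plain Python, assuming that one-example-per-line behavior. `load_lines`, `encode_lines`, and the whitespace-based `toy_tokenize` are hypothetical names introduced here; a real setup would use a proper tokenizer and, as the deprecation message suggests, the 🤗 Datasets library.

```python
import os
import tempfile
from typing import Callable, Dict, List


def load_lines(path: str) -> List[str]:
    """Return the non-empty, non-whitespace lines of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return [line for line in f.read().splitlines() if line and not line.isspace()]


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one id per whitespace-separated word (its length)."""
    return [len(word) for word in text.split()]


def encode_lines(
    lines: List[str],
    tokenize: Callable[[str], List[int]] = toy_tokenize,
    block_size: int = 8,
) -> List[Dict[str, List[int]]]:
    """Build one example per line, truncated to at most block_size ids."""
    return [{"input_ids": tokenize(line)[:block_size]} for line in lines]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello world\n\n   \nremoving legacy code\n")
        print(encode_lines(load_lines(path)))
```

Swapping `toy_tokenize` for a real tokenizer callable keeps the rest of the sketch unchanged, which is the main reason the tokenizer is passed in as a parameter.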

ydshieh changed the title from "[don't merge yet] Remove some custom datasets defined in codebase" to "Remove some custom datasets defined in codebase" on Nov 4, 2025
Collaborator Author

ydshieh commented Nov 5, 2025

OK, merging now. eb3199b should be fine, but I will address any comments provided later.

ydshieh enabled auto-merge (squash) November 5, 2025 17:16
Collaborator Author

ydshieh commented Nov 5, 2025

Merge!

The failing test

tests/models/paligemma2/test_modeling_paligemma2.py::PaliGemma2ForConditionalGenerationModelTest::test_prompt_lookup_decoding_matches_greedy_search

is flaky!!!

ydshieh disabled auto-merge November 5, 2025 17:26
ydshieh merged commit 1a0ae4b into main Nov 5, 2025
22 of 24 checks passed
ydshieh deleted the remove_datasets branch November 5, 2025 17:26
Abdennacer-Badaoui pushed a commit to Abdennacer-Badaoui/transformers that referenced this pull request Nov 10, 2025
* how bad it woud be anyway?

* let's break all

* delete

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

5 participants