@shermansiu
Contributor

In May 2024, the following request was made to loosen the versioning requirements: #971

datasets 3 was released later that year in September 2024. Can we increase the upper bound accordingly?

@okhat
Collaborator

okhat commented Mar 10, 2025

Hey, thanks so much @shermansiu !

Q1: see

pyproject.toml changed significantly since poetry.lock was last generated. Run poetry lock to fix the lock file.

Q2: Are we sure datasets 3.0 will not just break stuff we rely on? Does it have breaking changes we need to consider?

@shermansiu
Contributor Author

shermansiu commented Mar 12, 2025

You're welcome!

A1: Done!
A2: There are four breaking changes in datasets 3.0.0:

  1. Remove deprecated code (huggingface/datasets#6996).
  • dspy does not use datasets.list_datasets, datasets.inspect_dataset, datasets.filesystems.S3FileSystem, datasets.filesystems.extract_path_from_uri, datasets.set_caching_enabled, datasets.exceptions, or datasets.utils. It does not use the use_auth_token, fs, ignore_verifications, kind, num_proc, branch, or format_type arguments anywhere in its codebase. It does not use the errors argument for TextConfig or the name argument for Cache. It does not use GenerateMode, get_from_cache, hash_url_to_filename, or download_custom. Many of these terms do not even appear in the dspy repo.
  2. Remove beam (huggingface/datasets#6987): removal of Apache Beam support.
  • dspy does not use Apache Beam. The only mention of the term "beam" in this repo relates to beam search, which is unrelated; its implementation does not rely on Apache Beam at all.
  3. Remove metrics (huggingface/datasets#6983): removes datasets.load_metric.
  • load_metric does not appear in the dspy repository.
  4. Remove tasks (huggingface/datasets#6999): removes the deprecated task argument in load_dataset(), the .prepare_for_task() method, and the datasets.tasks module.
  • None of the calls to datasets.load_dataset use the task argument; they use only path, name, split, and trust_remote_code. prepare_for_task does not appear in this repo. dspy does not use datasets.tasks.

TLDR: dspy does not use any of the stuff that was removed.
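
For anyone who wants to repeat this kind of check, here is a minimal sketch of a text scan for some of the removed names. It is not the method used for the audit above; the helper names `find_removed_usages` and `scan_repo` are hypothetical, the name list is only a subset of the removals listed above, and a plain text match can produce false positives (comments, strings) or miss dynamic usage.

```python
import re
from pathlib import Path

# A subset of the names removed in datasets 3.0.0, copied from the list above.
# This is illustrative, not an exhaustive list of the removals.
REMOVED_NAMES = [
    "list_datasets",
    "inspect_dataset",
    "S3FileSystem",
    "extract_path_from_uri",
    "set_caching_enabled",
    "load_metric",
    "prepare_for_task",
    "datasets.tasks",
]


def find_removed_usages(source, names=REMOVED_NAMES):
    """Return (line_number, name) pairs for each removed name found in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in names:
            # Require non-word characters on both sides so that longer
            # identifiers such as "my_load_metric" are not reported.
            if re.search(rf"(?<!\w){re.escape(name)}(?!\w)", line):
                hits.append((lineno, name))
    return hits


def scan_repo(root):
    """Scan every .py file under root; return {path: hits} for files with hits."""
    results = {}
    for path in Path(root).rglob("*.py"):
        hits = find_removed_usages(path.read_text(encoding="utf-8", errors="ignore"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_repo(".")` from the repository root returns an empty dict when none of the listed names appear in any Python file. Removed keyword arguments such as use_auth_token would need a separate, more careful check, since short argument names like fs or kind are too common for a plain text match.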

I ran poetry run pytest tests/ in a new environment with Python 3.9 and datasets 3, and all of the tests that were not skipped or marked xfail (expected to fail) passed.

[Screenshot: dspy pytests passing]

@okhat
Collaborator

okhat commented Mar 12, 2025

Thanks a lot @shermansiu ! (What an amazing level of detail in looking into this, I appreciate it.)

Will merge after checks are complete.

@okhat okhat merged commit 112d711 into stanfordnlp:main Mar 12, 2025
4 checks passed