Conversation

wengh
Collaborator

@wengh wengh commented Feb 13, 2025

This change allows use cases such as:

  • reading a dataset on a PySpark version that doesn't have DataSourceArrowWriter
  • writing a dataset to HuggingFace without datasets installed
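To illustrate the idea, here is a minimal sketch of deferred imports, not the actual code in this PR. The helper `lazy_import` and the class `ExampleDataSource` are hypothetical names, and the mapping of `datasets` to the read path and `huggingface_hub` to the write path is an assumption made for the example.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, feature: str) -> ModuleType:
    """Import a module at call time and explain which feature needs it if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the '{module_name}' package, which is not installed"
        ) from exc


class ExampleDataSource:
    """Hypothetical stand-in for a data source; nothing is imported at class definition."""

    def reader(self) -> ModuleType:
        # Assumption: only the read path needs `datasets`.
        return lazy_import("datasets", "Reading a dataset")

    def writer(self) -> ModuleType:
        # Assumption: only the write path needs `huggingface_hub`.
        return lazy_import("huggingface_hub", "Writing a dataset")
```

Because each import runs only inside the method that needs it, constructing `ExampleDataSource()` succeeds even when one of the packages is absent, and the error surfaces only when the corresponding path is used.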

@wengh wengh marked this pull request as ready for review February 13, 2025 02:43
@wengh wengh requested a review from lhoestq February 13, 2025 02:43
@lhoestq
Member

lhoestq commented Feb 13, 2025

Cool! Always nice to defer imports.

I think we should keep those dependencies mandatory though. Who makes a package that requires installing more packages to use its main feature? ^^'

btw datasets and huggingface_hub are often installed together (datasets does require huggingface_hub)

@wengh
Collaborator Author

wengh commented Feb 13, 2025

> I think we should keep those dependencies mandatory though. Who makes a package that requires installing more packages to use its main feature? ^^'

Yeah, that makes sense. I changed pyproject.toml to require both packages. Thanks for the correction!

Member

@lhoestq lhoestq left a comment

lgtm !

@wengh wengh merged commit 580864e into main Feb 14, 2025