Add keep_linebreaks parameter to text loader #1913

Merged
merged 1 commit into from
Feb 19, 2021

Conversation

lhoestq
Member

@lhoestq lhoestq commented Feb 19, 2021

As requested in #870 and huggingface/transformers#10269, there should be a parameter to keep the line breaks when loading a text dataset.
cc @sgugger @jncasey

@sgugger
Contributor

sgugger commented Feb 19, 2021

Just so I understand how it can be used in practice, do you have an example showing how to load a text dataset with this option?

@lhoestq
Member Author

lhoestq commented Feb 19, 2021

Sure! Here is an example:

from datasets import load_dataset

load_dataset("text", keep_linebreaks=True, data_files=...)

I'll update the documentation to explain this.
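
To make the effect of the option concrete, here is a rough, library-free sketch of the difference between dropping and keeping line breaks when a text file is split into one sample per line. The helper `read_lines` is hypothetical and only mirrors the `keep_linebreaks` flag by name; it is an illustration of the behavior under that assumption, not the loader's actual code.

```python
from typing import List


def read_lines(text: str, keep_linebreaks: bool = False) -> List[str]:
    """Hypothetical helper: split text into one sample per line.

    With keep_linebreaks=True each sample keeps its trailing newline,
    otherwise the newline is dropped.
    """
    return text.splitlines(keepends=keep_linebreaks)


raw = "first line\nsecond line\n"

stripped = read_lines(raw)                    # line breaks dropped
kept = read_lines(raw, keep_linebreaks=True)  # line breaks preserved

print(stripped)  # ['first line', 'second line']
print(kept)      # ['first line\n', 'second line\n']
```

Keeping the line breaks means the samples can be concatenated back into the original text, which is presumably what matters for the language-modeling use case raised in the linked issues.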

@sgugger
Contributor

sgugger commented Feb 19, 2021

Perfect!
