Update docs on trust_remote_code defaults to False (#6981)
* Set trust_remote_code defaults to False in docstrings

* Replace warning tip with version added in docstrings

* Update docs

* Rephrase

* Fix typo
albertvillanova committed Jun 19, 2024
1 parent 1d65718 commit a6ccf94
Showing 4 changed files with 36 additions and 32 deletions.
2 changes: 1 addition & 1 deletion docs/source/dataset_script.mdx
@@ -12,7 +12,7 @@ as long as your dataset repository has a [required structure](./repository_struc

<Tip warning=true>

-In the next major release, the new safety features of 🤗 Datasets will disable running dataset loading scripts by default, and you will have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
+For security reasons, 🤗 Datasets do not allow running dataset loading scripts by default, and you have to pass `trust_remote_code=True` to load datasets that require running a dataset script.

</Tip>

4 changes: 2 additions & 2 deletions docs/source/load_hub.mdx
@@ -106,7 +106,7 @@ Certain datasets repositories contain a loading script with the Python code used
Those datasets are generally exported to Parquet by Hugging Face, so that 🤗 Datasets can load the dataset fast and without running a loading script.

Even if a Parquet export is not available, you can still use any dataset with Python code in its repository with `load_dataset`.
-All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get a warning:
+All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get an error:

```py
>>> from datasets import get_dataset_config_names, get_dataset_split_names, load_dataset
@@ -120,6 +120,6 @@ All files and code uploaded to the Hub are scanned for malware (refer to the Hub

<Tip warning=true>

-In the next major release, the new safety features of 🤗 Datasets will disable running dataset loading scripts by default, and you will have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
+For security reasons, 🤗 Datasets do not allow running dataset loading scripts by default, and you have to pass `trust_remote_code=True` to load datasets that require running a dataset script.

</Tip>
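
The docs text changed above advises reviewing a loading script and its authors before passing `trust_remote_code=True`. One way a caller might enforce that is an allowlist wrapper. The sketch below is an illustration under assumptions and is not part of this commit: `load_with_reviewed_trust`, `REVIEWED_REPOS`, and the repository name are hypothetical, and the loader is passed in as an argument so the snippet runs without 🤗 Datasets installed.

```python
from typing import Any, Callable, FrozenSet

# Hypothetical allowlist: repositories whose loading script you have read.
REVIEWED_REPOS: FrozenSet[str] = frozenset({"my-org/reviewed-dataset"})


def load_with_reviewed_trust(
    loader: Callable[..., Any],
    path: str,
    reviewed: FrozenSet[str] = REVIEWED_REPOS,
    **kwargs: Any,
) -> Any:
    """Call `loader`, opting in to remote code only for reviewed repositories."""
    # Reject a caller-supplied flag so the trust decision lives in one place.
    if "trust_remote_code" in kwargs:
        raise TypeError("trust_remote_code is decided by the allowlist")
    # Repositories outside the allowlist get an explicit False.
    return loader(path, trust_remote_code=path in reviewed, **kwargs)
```

With a release covered by this commit, the intended call would look like `load_with_reviewed_trust(datasets.load_dataset, "my-org/reviewed-dataset", split="train")`. Passing the flag explicitly in both cases keeps the behavior the same regardless of the installed version's default.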
8 changes: 4 additions & 4 deletions src/datasets/hub.py
@@ -42,15 +42,15 @@ def convert_to_parquet(
`<org>/<dataset_name>`.
revision (`str`, *optional*): Branch of the source Hub dataset repository. Defaults to the `"main"` branch.
token (`bool` or `str`, *optional*): Authentication token for the Hugging Face Hub.
-trust_remote_code (`bool`, defaults to `True`): Whether you trust the remote code of the Hub script-based
+trust_remote_code (`bool`, defaults to `False`): Whether you trust the remote code of the Hub script-based
dataset to be executed locally on your machine. This option should only be set to `True` for repositories
where you have read the code and which you trust.
-<Tip warning={true}>
+<Changed version="2.20.0">
-`trust_remote_code` will default to False in the next major release.
+`trust_remote_code` defaults to `False` if not specified.
-</Tip>
+</Changed>
Returns:
`huggingface_hub.CommitInfo`
54 changes: 29 additions & 25 deletions src/datasets/load.py
@@ -1749,18 +1749,19 @@ def dataset_module_factory(
Directory to read/write data. Defaults to `"~/.cache/huggingface/datasets"`.
<Added version="2.16.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
Whether or not to allow for datasets defined on the Hub using a dataset script. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
-<Tip warning={true}>
+<Added version="2.16.0"/>
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+</Changed>
-<Added version="2.16.0"/>
**download_kwargs (additional keyword arguments): optional attributes for DownloadConfig() which will override
the attributes in download_config if supplied.
@@ -1961,18 +1962,19 @@ def metric_module_factory(
dynamic_modules_path (Optional str, defaults to HF_MODULES_CACHE / "datasets_modules", i.e. ~/.cache/huggingface/modules/datasets_modules):
Optional path to the directory in which the dynamic modules are saved. It must have been initialized with :obj:`init_dynamic_modules`.
By default, the datasets and metrics are stored inside the `datasets_modules` module.
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
Whether or not to allow for datasets defined on the Hub using a dataset script. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
-<Tip warning={true}>
+<Added version="2.16.0"/>
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+</Changed>
-<Added version="2.16.0"/>
**download_kwargs (additional keyword arguments): optional attributes for DownloadConfig() which will override
the attributes in download_config if supplied.
@@ -2078,18 +2080,18 @@ def load_metric(
revision (Optional ``Union[str, datasets.Version]``): if specified, the module will be loaded from the datasets repository
at this version. By default, it is set to the local version of the lib. Specifying a version that is different from
your local version of the lib might cause compatibility issues.
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
Whether or not to allow for datasets defined on the Hub using a dataset script. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
-<Tip warning={true}>
+<Added version="2.16.0"/>
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
-<Added version="2.16.0"/>
+</Changed>
Returns:
`datasets.Metric`
@@ -2220,18 +2222,19 @@ def load_dataset_builder(
**Experimental**. Key/value pairs to be passed on to the dataset file-system backend, if any.
<Added version="2.11.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
Whether or not to allow for datasets defined on the Hub using a dataset script. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
-<Tip warning={true}>
+<Added version="2.16.0"/>
+<Changed version="2.20.0">
-`trust_remote_code` will default to False in the next major release.
+`trust_remote_code` defaults to `False` if not specified.
-</Tip>
+</Changed>
-<Added version="2.16.0"/>
**config_kwargs (additional keyword arguments):
Keyword arguments to be passed to the [`BuilderConfig`]
and used in the [`DatasetBuilder`].
@@ -2481,18 +2484,19 @@ def load_dataset(
**Experimental**. Key/value pairs to be passed on to the dataset file-system backend, if any.
<Added version="2.11.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
Whether or not to allow for datasets defined on the Hub using a dataset script. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
-<Tip warning={true}>
+<Added version="2.16.0"/>
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+</Changed>
-<Added version="2.16.0"/>
**config_kwargs (additional keyword arguments):
Keyword arguments to be passed to the `BuilderConfig`
and used in the [`DatasetBuilder`].
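
To make the documented default concrete, here is a toy model of the behavior the updated docstrings describe: script-based loading is refused unless the caller opts in. This is a sketch only, not the library's implementation. `toy_load` and its `repo_has_script` parameter are hypothetical, and the use of `ValueError` is an assumption, since the docs above only say "an error".

```python
def toy_load(repo_has_script: bool, trust_remote_code: bool = False) -> str:
    """Toy stand-in for a loader with an opt-in gate for remote code."""
    if not repo_has_script:
        # Plain data files never need the flag.
        return "loaded from data files"
    if not trust_remote_code:
        # Assumption: ValueError is chosen for this sketch only.
        raise ValueError("loading script requires trust_remote_code=True")
    return "loaded by running the loading script"
```

Calling `toy_load(repo_has_script=True)` raises, while `toy_load(repo_has_script=True, trust_remote_code=True)` succeeds, which mirrors the documented change of the default from `True` to `False`.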