diff --git a/website/docs/faq.mdx b/website/docs/faq.mdx
index 159e493..4ca17d3 100644
--- a/website/docs/faq.mdx
+++ b/website/docs/faq.mdx
@@ -536,5 +536,15 @@ uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
 4. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method from the **Built-in** dropdown, which supports PDF parsing, and slect **MinerU** in **PDF parser**.
 5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
+---
+
+### How to configure MinerU-specific settings?
+1. Set `MINERU_EXECUTABLE` (default: `mineru`) to the path of the MinerU executable.
+2. Set `MINERU_DELETE_OUTPUT` to `0` to keep MinerU's output. (Default: `1`, which deletes temporary output)
+3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU.
+4. Set `MINERU_BACKEND` to `"pipeline"`. (Options: `"pipeline"` (default) | `"vlm-transformers"`)
+:::tip NOTE
+For information about other environment variables natively supported by MinerU, see [here](https://opendatalab.github.io/MinerU/usage/cli_tools/#environment-variables-description).
+:::
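
Review note: the four settings added in this hunk could be exercised with a shell snippet along the following lines. This is only a sketch. It assumes the variables are read from the environment of the process that invokes MinerU, which the diff does not state explicitly, and the output directory `/tmp/mineru-output` is a hypothetical path chosen for illustration.

```shell
# Sketch only: variable names and defaults come from the FAQ entry in this diff;
# the concrete values below are illustrative assumptions.

# Name or path of the MinerU executable (documented default: mineru).
export MINERU_EXECUTABLE="mineru"

# 0 keeps MinerU's output; the documented default, 1, deletes temporary output.
export MINERU_DELETE_OUTPUT="0"

# Hypothetical output directory; replace with a directory that exists on your host.
export MINERU_OUTPUT_DIR="/tmp/mineru-output"

# Documented options: "pipeline" (default) or "vlm-transformers".
export MINERU_BACKEND="pipeline"
```

Setting `MINERU_DELETE_OUTPUT` to `0` together with an explicit `MINERU_OUTPUT_DIR` makes it easier to inspect what MinerU produced when debugging a parsing run.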