fix: webscaper sometime not work #35450

Merged
hjlarry merged 1 commit into langgenius:main from hjlarry:p403
Apr 21, 2026

@hjlarry (Contributor) commented Apr 21, 2026

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

fix #35449

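ReadabiliPy always tries to use Node.js to parse the HTML. As the traceback below shows, its have_node() check runs npm install, and when that command fails the web scraper raises this error. More about ReadabiliPy: https://github.com/alan-turing-institute/ReadabiliPy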

Simulating the failure produces this traceback:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/simple_json.py", line 41, in simple_json_from_html_string
    if use_readability and not have_node():
                               ^^^^^^^^^^^
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/simple_json.py", line 36, in have_node
    run_npm_install()
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/utils.py", line 60, in run_npm_install
    cp = subprocess.run(["npm", "install"], check=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hejl/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['npm', 'install']' returned non-zero exit status 1.
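
For illustration only, here is a minimal sketch of one way a caller could guard against this failure. It is not necessarily the change made in this PR. parse_with_fallback is a hypothetical helper, and it assumes the parser accepts a use_readability keyword, as simple_json_from_html_string does in the traceback above.

```python
import subprocess
from typing import Any, Callable


def parse_with_fallback(html: str, parse: Callable[..., Any]) -> Any:
    """Hypothetical helper: try the Node.js-backed path first, then fall back.

    `parse` is assumed to accept `use_readability`, like ReadabiliPy's
    simple_json_from_html_string.
    """
    try:
        # This path may shell out to `npm install`, as seen in the traceback.
        return parse(html, use_readability=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: `npm install` exited with a non-zero status.
        # FileNotFoundError: the `npm` executable is not on PATH.
        # Either way, retry with the pure-Python parser.
        return parse(html, use_readability=False)
```

With ReadabiliPy installed, this could be called as parse_with_fallback(html, simple_json_from_html_string). Passing use_readability=False directly would avoid the Node.js path altogether, at the cost of never using Readability.js.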

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

@hjlarry requested a review from QuantumGhost as a code owner April 21, 2026 05:41
@dosubot (bot) added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Apr 21, 2026
@dosubot (bot) added the lgtm label (This PR has been approved by a maintainer) Apr 21, 2026
@hjlarry added this pull request to the merge queue Apr 21, 2026
Merged via the queue into langgenius:main with commit 3b24d8d Apr 21, 2026
27 checks passed
@hjlarry deleted the p403 branch April 21, 2026 06:15
HanqingZ pushed a commit to HanqingZ/dify that referenced this pull request Apr 23, 2026
asukaminato0721 pushed a commit to asukaminato0721/dify that referenced this pull request Apr 24, 2026
Development

Successfully merging this pull request may close these issues.

web scraper sometime not work