[BUG]: agent failed to get page links from vpc network #2114

Closed
shaozhijian2008 opened this issue Aug 14, 2024 · 1 comment
Labels
possible bug Bug was reported but is not confirmed or is unable to be replicated.

Comments


shaozhijian2008 commented Aug 14, 2024

How are you running AnythingLLM?

Not listed

What happened?

AnythingLLM is deployed in a k3s environment and works well, thanks to this manifest.

When using the agent with the web-scraping tool, normal public websites work, but a website on our VPC network fails.
The same VPC website can be reached with curl from inside the AnythingLLM pod (via exec).

The error message looks like this:
[collector] error: Failed to get page links from http://xx.xx.xx/. TimeoutError: Navigation timeout of 180000 ms exceeded
at new Deferred (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/util/Deferred.js:23:34)
at Deferred.create (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/util/Deferred.js:65:16)
at new LifecycleWatcher (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/LifecycleWatcher.js:72:46)
at CdpFrame.goto (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Frame.js:138:29)
at CdpFrame.<anonymous> (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/util/decorators.js:104:27)
at CdpPage.goto (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/api/Page.js:568:43)
at PuppeteerWebBaseLoader._scrape (/app/collector/node_modules/langchain/dist/document_loaders/web/puppeteer.cjs:49:20)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async PuppeteerWebBaseLoader.load (/app/collector/node_modules/langchain/dist/document_loaders/web/puppeteer.cjs:74:22)
at async getPageLinks (/app/collector/utils/extensions/WebsiteDepth/index.js:51:18)
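
The trace shows the timeout being raised by `page.goto` inside LangChain's `PuppeteerWebBaseLoader`, which accepts a `gotoOptions` object. One possible explanation, not confirmed in this issue, is that the page itself is reachable (as the curl test suggests) but navigation keeps waiting on sub-resources that cannot be loaded from inside the VPC. A minimal sketch for testing that hypothesis is to navigate with a shorter timeout and a less strict `waitUntil` condition. The helper name `buildGotoOptions` and its defaults are hypothetical and are not part of AnythingLLM:

```javascript
// Hypothetical helper (not part of AnythingLLM): builds the `gotoOptions`
// object that Puppeteer's page.goto() accepts.
// Assumption: the timeout is caused by waiting on sub-resources, not by the
// host being unreachable from the pod.
const WAIT_UNTIL_VALUES = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

function buildGotoOptions({ timeoutMs = 30_000, waitUntil = "domcontentloaded" } = {}) {
  if (!WAIT_UNTIL_VALUES.includes(waitUntil)) {
    throw new RangeError(`waitUntil must be one of: ${WAIT_UNTIL_VALUES.join(", ")}`);
  }
  if (!Number.isInteger(timeoutMs) || timeoutMs < 0) {
    throw new RangeError("timeoutMs must be a non-negative integer (0 disables the timeout)");
  }
  return { waitUntil, timeout: timeoutMs };
}

// Usage sketch (requires puppeteer and langchain to be installed; not run here):
//   const loader = new PuppeteerWebBaseLoader(url, {
//     launchOptions: { headless: true },
//     gotoOptions: buildGotoOptions({ timeoutMs: 30_000 }),
//   });
//   const docs = await loader.load();
```

If navigation succeeds with `domcontentloaded` but still times out with a network-idle condition, that points at blocked sub-resources rather than at the VPC host itself.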

Are there known steps to reproduce?

No response

@shaozhijian2008 shaozhijian2008 added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Aug 14, 2024

shaozhijian2008 commented Aug 14, 2024

Using the web-scraping tool runs into another error with a website that uses a self-signed certificate:
[collector] error: getPageContent failed to be fetched by puppeteer - falling back to fetch! Error: net::ERR_CERT_AUTHORITY_INVALID at https://xx.xx.xx/pages/6ec6c0/
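
`net::ERR_CERT_AUTHORITY_INVALID` means the headless browser does not trust the certificate's issuer. As a sketch only, Chromium can be started with the `--ignore-certificate-errors` flag through Puppeteer's launch `args`. Whether AnythingLLM exposes a way to set this is not stated in this issue, and the helper name `buildLaunchOptions` is hypothetical. Ignoring certificate errors removes TLS verification, so it should be limited to trusted internal hosts; adding the internal CA to the container's trust store is the safer alternative.

```javascript
// Hypothetical helper (not part of AnythingLLM): builds Puppeteer launch
// options, optionally telling Chromium to skip TLS certificate validation.
// Assumption: the target is a trusted internal site with a self-signed cert.
function buildLaunchOptions({ allowSelfSigned = false } = {}) {
  const args = [];
  if (allowSelfSigned) {
    // Chromium command-line switch; disables certificate checks for ALL sites
    // visited by this browser instance.
    args.push("--ignore-certificate-errors");
  }
  return { headless: true, args };
}

// Usage sketch (requires puppeteer and langchain to be installed; not run here):
//   const loader = new PuppeteerWebBaseLoader(url, {
//     launchOptions: buildLaunchOptions({ allowSelfSigned: true }),
//   });
//   const docs = await loader.load();
```

Keeping the flag behind an explicit opt-in parameter avoids silently disabling certificate checks for public websites.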

TuanBC pushed a commit to TuanBC/anything-llm that referenced this issue Aug 26, 2024