Persist scraped web pages #530

damms005 · 2024-04-03T12:23:54Z

This is a follow-up to #400.

It would be a good addition not to have to re-scrape the same webpage over and over, as that is wasteful.

Scraped pages could persist, perhaps in a system-wide cache, so that subsequent calls to /web http://already-scraped.com/specific-page re-scrape only if the page has not already been scraped or if the user explicitly asks for it, perhaps via a switch on the /web command.
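To make the idea concrete, here is a minimal sketch of what I have in mind. It is not aider's actual code: `PageCache`, the `scrape` callable, and the `refresh` flag are all hypothetical names, and the scraper is injected so that the caching logic stays independent of how pages are really fetched.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class PageCache:
    """Hypothetical on-disk cache of scraped pages, keyed by URL."""

    def __init__(self, cache_dir: Path, scrape: Callable[[str], str]):
        self.cache_dir = cache_dir
        self.scrape = scrape  # stand-in for whatever actually fetches the page
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url: str) -> Path:
        # Hash the URL so that any URL maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.txt"

    def get(self, url: str, refresh: bool = False) -> str:
        """Return the cached page, scraping only on a miss or when refresh=True."""
        path = self._path_for(url)
        if path.exists() and not refresh:
            return path.read_text(encoding="utf-8")
        content = self.scrape(url)
        path.write_text(content, encoding="utf-8")
        return content


def demo() -> list:
    """Exercise the cache with a fake scraper that counts its calls."""
    calls = []

    def fake_scrape(url: str) -> str:
        calls.append(url)
        return f"content of {url} (fetch #{len(calls)})"

    with tempfile.TemporaryDirectory() as tmp:
        cache = PageCache(Path(tmp), fake_scrape)
        url = "http://already-scraped.com/specific-page"
        first = cache.get(url)                 # cache miss: scrapes
        second = cache.get(url)                # cache hit: no scrape
        third = cache.get(url, refresh=True)   # forced re-scrape
    return [first == second, third != first, len(calls)]
```

In this sketch the `refresh` flag plays the role of the proposed switch on the /web command: the default path reuses the stored copy, and the user opts in to a fresh scrape when they suspect the page has changed.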

Many thanks for the awesome job!


paul-gauthier · 2024-04-11T16:30:45Z

Thanks for trying aider and filing this issue.

Re-scraping a webpage should only take a moment, and ensures you have a fresh copy of the data it contains. Persisting or caching the content could lead to problems with not picking up new page content.

Can you help me understand the problem you are having with re-scraping?

damms005 · 2024-04-12T10:33:30Z

Agreed. Although "should only take a moment" adds up when done multiple times a day, especially on a poor connection.

My specific use-case is when I need to use specific features of tools/frameworks like Laravel or Filament. I find myself needing to re-scrape in order to provide context to tasks.

I may also be using the tool wrong, you know 🤷‍♂️

nevercast · 2024-05-07T22:25:58Z

I wonder if this also fits into the broader RAG feature.

paul-gauthier added the enhancement New feature or request label Apr 11, 2024
