Persist scraped web pages #530

Open
damms005 opened this issue Apr 3, 2024 · 3 comments
Labels
enhancement (New feature or request)

Comments

@damms005

damms005 commented Apr 3, 2024

This is a follow-up to #400.

It would be a good addition to not have to re-scrape the same web page over and over, since that is wasteful.

Scraped pages could be persisted, perhaps in a system-wide cache, so that subsequent calls to /web http://already-scraped.com/specific-page re-scrape only if the page has not been scraped before or if the user explicitly asks for it, perhaps via a switch on the /web command.
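
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration, not aider's actual code: `get_page`, `cache_path`, the `force` flag, and the cache directory are all hypothetical names, and `scrape` stands in for whatever function really fetches the page.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a file inside the cache directory (hypothetical layout)."""
    # Hash the URL so that any URL becomes a safe, fixed-length file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.txt"


def get_page(
    url: str,
    scrape: Callable[[str], str],
    cache_dir: Path,
    force: bool = False,
) -> str:
    """Return the cached copy of `url`, scraping only when needed.

    `scrape` is a placeholder for the real scraping function.
    `force=True` plays the role of the proposed re-scrape switch.
    """
    path = cache_path(cache_dir, url)
    if path.exists() and not force:
        # Already scraped and no refresh requested: reuse the stored copy.
        return path.read_text(encoding="utf-8")

    # Not cached yet, or the user asked for a fresh copy.
    content = scrape(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content
```

With something like this, a plain call reuses the stored copy, and passing `force=True` (the hypothetical switch) fetches the page again and overwrites the stored copy.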

Many thanks for the awesome job!

paul-gauthier added the enhancement label on Apr 11, 2024
@paul-gauthier
Owner

Thanks for trying aider and filing this issue.

Re-scraping a webpage should only take a moment, and ensures you have a fresh copy of the data it contains. Persisting or caching the content could lead to problems with not picking up new page content.

Can you help me understand the problem you are having with re-scraping?

@damms005
Author

Agreed. Still, "should only take a moment" adds up when it happens multiple times a day, especially on a poor connection.

My specific use-case is when I need to use specific features of tools/frameworks like Laravel or Filament. I find myself needing to re-scrape in order to provide context to tasks.

I may also be using the tool wrong, yk 🤷‍♂️

@nevercast

I wonder if this also fits into the broader RAG feature.


3 participants